﻿<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[TechTalks]]></title><description><![CDATA[In-depth discussions about machine learning, deep learning, reinforcement learning, neural networks, artificial general intelligence, AI business, and other technology trends.]]></description><link>https://bdtechtalks.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!WLfM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd313081-7e92-406a-abfe-8766ca6d87fd_396x396.png</url><title>TechTalks</title><link>https://bdtechtalks.substack.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 09 Jun 2026 14:47:50 GMT</lastBuildDate><atom:link href="https://bdtechtalks.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ben Dickson]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[bdtechtalks@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[bdtechtalks@substack.com]]></itunes:email><itunes:name><![CDATA[Ben Dickson]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ben Dickson]]></itunes:author><googleplay:owner><![CDATA[bdtechtalks@substack.com]]></googleplay:owner><googleplay:email><![CDATA[bdtechtalks@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ben Dickson]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How Codev brings discipline to AI software development]]></title><description><![CDATA[Casual AI prompting breaks down as codebases grow. 
Codev introduces strict protocols and multi-model reviews to help teams ship maintainable software.]]></description><link>https://bdtechtalks.substack.com/p/how-codev-brings-discipline-to-ai</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/how-codev-brings-discipline-to-ai</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 09 Jun 2026 12:45:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5dlh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5dlh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5dlh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5dlh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5dlh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!5dlh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5dlh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:322810,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/201291038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5dlh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5dlh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!5dlh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5dlh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F925e7476-09f6-44a8-a705-2de558753bcd_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The software industry is currently obsessed with &#8220;<a 
href="https://bdtechtalks.com/2025/04/09/demystifying-vibe-coding/">vibe coding</a>,&#8221; the process of using conversational AI prompts to generate software on the fly. For the first hour of a project, the experience feels like magic. You type a sentence, code appears on the screen, and the application runs.</p><p>But unstructured chat hits a hard ceiling. Vibe coding tends to collapse under its own context drift once the codebase grows beyond several thousand lines of code.</p><p>The fundamental problem is that chat context is ephemeral. As a project grows, the AI must balance new feature requests against existing architectural rules. In a chat-based interface, instructions, early architectural decisions, and bug-fix logic get compressed and eventually scroll away. Once the AI loses that context, the system&#8217;s architecture breaks down. The AI starts hallucinating functions and breaking dependencies, leaving developers with a brittle codebase they no longer fully understand.</p><p><a href="https://github.com/cluesmith/codev">Codev</a>, an open-source platform designed to orchestrate AI coding tools, inverts this approach with a concept called &#8220;Context-Driven Development.&#8221; Instead of relying on chat logs to guide the AI, Codev requires developers to treat natural-language specifications as the true source code. These specifications are checked into Git alongside the software, so the AI&#8217;s instructions can be versioned, reviewed, and maintained with the same rigor as the code itself.</p><h2>The AI chief of staff</h2><p>To manage this spec-first process, Codev shifts developers away from using AI merely as a smart autocomplete. 
Instead, it pushes teams into a framework where human developers act as directors, orchestrating specialized AI agents that, in turn, coordinate other agents.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The system relies on an Architect-Builder pattern. The human developer acts as the client commissioning the software. An Architect agent acts as the project manager, and autonomous Builder agents work in parallel to actually write the code.</p><p>&#8220;Imagine you&#8217;re trying to commission a building. You would interact with the architect, and the architect would interact with the builders,&#8221; Waleed Kadous, the primary developer behind Codev, told TechTalks. 
&#8220;In the ideal case you have a large team of builders working in parallel and they&#8217;ll come back to the architect if they need a final check on their work or if they get stuck.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3eAL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3eAL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png 424w, https://substackcdn.com/image/fetch/$s_!3eAL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png 848w, https://substackcdn.com/image/fetch/$s_!3eAL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png 1272w, https://substackcdn.com/image/fetch/$s_!3eAL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3eAL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3eAL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png 424w, https://substackcdn.com/image/fetch/$s_!3eAL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png 848w, https://substackcdn.com/image/fetch/$s_!3eAL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png 1272w, https://substackcdn.com/image/fetch/$s_!3eAL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F897cfcd3-4dab-4bf3-bd4f-52bf021323e5_2048x1117.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In this setup, the Architect agent gathers choices, reviews the Builders&#8217; progress, and surfaces only the critical decisions to the human developer in a &#8220;Needs Attention&#8221; queue.</p><p>&#8220;Just like a real architect, it gathers all the choices to be made and then helps you make them, offering its suggestions with an eye to the project as a whole. But you, as the person commissioning the building, make the final call, and if you want to inspect every brick, you can,&#8221; Kadous said.</p><h2>Erasing workspace fragmentation</h2><p>Previous iterations of AI agent workflows were highly fragmented. Developers had to juggle their primary code editor, a browser tab for GitHub to check pull requests, and multiple terminal windows just to monitor what their autonomous agents were doing.</p><p>Codev 3.0 fixes this context switching by bringing the entire ecosystem directly into the integrated development environment (IDE). 
With a newly introduced VS Code extension, the agent terminals run natively inside the editor. A single sidebar shows the builders, backlog, pull requests, and the &#8220;Needs Attention&#8221; list. When an agent references a specific file or function during a task, clicking it opens the exact line of code instantly.</p><p>The 3.0 release also introduces a modular &#8220;forge&#8221; abstraction. Forges are repository management platforms like GitHub, GitLab, or Gitea. Historically, integrating AI agents with these platforms required hard-coding API calls for each specific service. Codev abstracts these platforms into a standardized set of 17 distinct operations, such as creating an issue, reading comments, or merging a pull request.</p><p>Because the AI sees &#8220;the forge&#8221; as a single skill with identical commands, teams can mix and match their stack without breaking the AI&#8217;s workflow. For example, a team can run a hybrid setup that uses Linear for bug tracking and GitHub for pull requests.</p><p>The agent&#8217;s real context lives in the repository itself, stored as specifications and plans in version control. The forge simply supplies the live operational data through a single API layer, meaning the AI frontend never needs to know which underlying platform is configured.</p><h2>Forcing discipline onto autonomous agents</h2><p>While frontier AI models are highly capable coders, they lack inherent discipline. Left to their own devices, autonomous agents will often take shortcuts, skip writing tests, or ignore the overarching system architecture to quickly solve the immediate prompt.</p><p>To prevent agents from going off the rails, Codev uses an orchestrator named &#8220;porch&#8221; to act as a sheriff, forcing models to adhere to strict, deterministic workflows.</p><p>&#8220;As one of the AIs themselves told me, they&#8217;re good at coding but bad at discipline,&#8221; Kadous said. 
&#8220;If the agent doesn&#8217;t do this, it&#8217;s not allowed to advance to the next stage of work, and it&#8217;s told to try again.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x89z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x89z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png 424w, https://substackcdn.com/image/fetch/$s_!x89z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png 848w, https://substackcdn.com/image/fetch/$s_!x89z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png 1272w, https://substackcdn.com/image/fetch/$s_!x89z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x89z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png" width="1400" height="781" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:781,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x89z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png 424w, https://substackcdn.com/image/fetch/$s_!x89z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png 848w, https://substackcdn.com/image/fetch/$s_!x89z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png 1272w, https://substackcdn.com/image/fetch/$s_!x89z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff38351e8-e100-469e-94db-6986b7aa65f9_1400x781.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The flagship workflow enforced by the sheriff is the SPIR protocol, which requires agents to walk through four strict phases:</p><p>- <strong>Specify: </strong>Define exactly why and what is being built in clear natural language.</p><p>- <strong>Plan: </strong>Break the specification down into how it will be built.</p><p>- <strong>Implement: </strong>Write the code, write the tests, and verify the requirements.</p><p>- <strong>Review:</strong> Ensure the code meets the quality bar.</p><p>At various stages of this protocol, Codev invokes a three-way, multi-model review. Different AI models have entirely different analytical blind spots. 
For instance, testing has shown that OpenAI&#8217;s Codex excels at catching edge cases and security surface area, while Anthropic&#8217;s Claude is better at spotting runtime semantics and protocol-level mistakes, and Google&#8217;s Gemini excels at overarching architecture.</p><p>As Kadous recounted, during a recent Codev development sprint, Codex flagged a Unix socket that was created without restrictive permissions, a flaw that would allow any local user on the machine to hijack a shell session. Both Claude and Gemini missed it. However, later in the same project, Claude caught an OAuth vulnerability where a secret validation token was placed on the wrong URL, opening the door to a severe cross-site request forgery attack. Both Codex and Gemini missed that vulnerability entirely.</p><p>Rather than relying on one model&#8217;s perspective, Codev brings in all three to review the code and render an independent opinion: Approve, Comment, or Request Changes.</p><p>If a reviewing model requests changes, the original Builder agent can push back in a &#8220;rebuttal-and-re-iterate&#8221; loop. 
The Builder can either implement the requested change or argue its case with the reviewer.</p><p>&#8220;When models fundamentally disagree, Codev doesn&#8217;t compute a winner. First there are a few rounds of negotiation between the agents, but if that fails, it surfaces the disagreement and escalates it to a person, because that disagreement is exactly the kind of thing a human should look at,&#8221; Kadous said. Averaging away the disagreement would throw out the most useful signal the system generates.</p><h2>The hard truth of context-driven development</h2><p>Spec-first development feels unnatural to developers accustomed to instant chat output. It asks them to slow down at the exact moment they are most eager to see code run.</p><p>&#8220;Every instinct trained by chat says you&#8217;re wasting time,&#8221; Kadous said. &#8220;So the hurdle isn&#8217;t intellectual; it&#8217;s learning to trust a process that front-loads the discipline before you&#8217;ve seen it pay off.&#8221;</p><p>But the Codev team&#8217;s own data suggests the discipline does pay off for larger projects. The team ran a controlled experiment comparing the SPIR protocol against unstructured prompting with Claude Code, using the same prompt and the same underlying model. SPIR scored 1.2 points higher overall, as judged by independent AI reviewers.</p><p>More importantly, the rigorous process excelled at the unglamorous tasks that separate a quick demo from shippable software. The SPIR protocol delivered roughly three times the test coverage and significantly better deployment readiness.</p><p>The catch is the cost. The SPIR run took roughly 3.7 times longer to execute and cost three to five times more in compute tokens.</p><p>The conclusion is pragmatic: vibe coding is genuinely the right call for throwaway weekend prototypes. But the structure of Context-Driven Development earns its keep when a team has to maintain the software long-term. 
Using this method, Codev has maintained productivity on codebases of up to 200,000 lines of source code.</p><h2>Guardrails and human-in-the-loop gates</h2><p>To safely scale this process across a team, Codev 3.0 decoupled autonomous builders from single branches. Historically, an AI agent would operate on one branch and issue one pull request. Now, a persistent workspace generates a sequence of pull requests over a feature&#8217;s life: first a pull request for the specification, then one for the plan, and finally one for the code implementation.</p><p>This multi-PR approach allows human teammates to review and tweak the AI&#8217;s intent before it spends compute tokens writing code. Kadous points to a recent feature he built with a frontend-focused teammate.</p><p>&#8220;Usually you have one reviewer for the spec,&#8221; Kadous said. But with the 3.0 features, he can pause at the spec stage so that his colleague can review the spec alongside him and sign off on it before any code is implemented. The teammate left comments on the specification in GitHub, and the Architect agent read those comments and revised the spec to address the concerns across both the frontend and backend.</p><p>When the Builder agents finally do write code, they execute entirely inside isolated Git worktrees. A worktree is a separate working directory, with its own checked-out files, that is linked to the main repository. If an autonomous agent fails, hallucinates, or thrashes, the damage stays inside that worktree, and the main working tree remains untouched.</p><p>At the end of the line, critical merge gates cannot be bypassed by AI.</p><p>&#8220;Approving a gate needs a command with an explicit flag whose literal name says a human approved it, and no code path can supply that flag automatically,&#8221; Kadous said. 
&#8220;The builder prompt forbids self-approval.&#8221;</p><h2>The evolution toward hybrid teams</h2><p>Codev is pushing the boundary toward true hybrid teams where AI agents actively coordinate tasks alongside human colleagues, rather than just acting as passive subordinates waiting for a prompt.</p><p>In the near future, software will become increasingly self-improving. Kadous anticipates systems where the AI automatically clusters user feedback and bug reports, translates them into actionable issues, and autonomously spawns its own builder agents to investigate and draft fixes.</p><p>Ultimately, the goal is not to remove humans from the loop, but to elevate their role from line-by-line coding to engineering oversight.</p><p>&#8220;I still feel the &#8216;humans vs machines&#8217; framing is naive and simplistic, and the question should be what humans and machines can do together that neither of them could do alone,&#8221; Kadous said.</p>]]></content:encoded></item><item><title><![CDATA[Scaling the harness: The next major bottleneck in agentic AI]]></title><description><![CDATA[Scaling LLMs hits limits when dealing with agentic AI tasks. 
For that, we need to look at the harness and the system built around the model(s).]]></description><link>https://bdtechtalks.substack.com/p/scaling-the-harness-the-next-major</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/scaling-the-harness-the-next-major</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 02 Jun 2026 17:20:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ji21!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ji21!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ji21!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ji21!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ji21!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ji21!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ji21!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ji21!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ji21!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ji21!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ji21!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9021e03b-1749-4577-aeb0-7b8d6c98e41f_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The dominant narrative of AI progress in recent years has been model scaling: building larger models and feeding them more data. But for long-horizon agentic AI, model scaling alone is not the ultimate solution.</p><p>AI agents don&#8217;t get their abilities from next-token prediction alone but from the system that wraps around the model to translate its answers into real-world behavior.</p><p>The next major bottleneck in agentic AI is &#8220;system scaling,&#8221; or scaling the &#8220;harness,&#8221; according to a <a href="https://arxiv.org/abs/2605.26112">new paper</a> from UC Berkeley. This approach treats the structured execution layer as a first-class object of design and optimization. 
As the author notes, &#8220;The dominant story of recent AI progress has been model scaling... For agentic AI, this story is now incomplete.&#8221;</p><p>Furthermore, &#8220;Once foundation models are embedded into tools, terminals, browsers, repositories, memory stores, and external services, their behavior is no longer determined by the model alone. It is determined by a system.&#8221;</p><p>This means that when building and evaluating AI systems, we should move beyond evaluating the model alone, look at the system and its scaffolding as a whole, and make sure every component is optimized.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Deconstructing the AI harness</h2><p>Modern agent frameworks operate as robust system infrastructures rather than basic prompt wrappers. Under the hood, an agentic system can be factored into six interacting components. 
They are:</p><ul><li><p>A reasoning substrate that includes one or more LLMs</p></li><li><p>A memory store that keeps track of long-term information that the agent needs</p></li><li><p>A context constructor that builds and cleans up the information included in the model&#8217;s context window</p></li><li><p>A skill-routing layer that decides which skills should be involved in each substep of the agent&#8217;s solution</p></li><li><p>An orchestration loop that manages the sequence of operations and coordinates how the context constructor draws from memory to feed the foundation model</p></li><li><p>A verification-and-governance layer that acts as a gatekeeper for both intermediate reasoning and external actions; it manages permissions, audit trails, and rollbacks, and ensures that outputs are checked before they affect the live environment or are written back into the persistent memory store</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yDZW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yDZW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png 424w, https://substackcdn.com/image/fetch/$s_!yDZW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png 848w, 
https://substackcdn.com/image/fetch/$s_!yDZW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png 1272w, https://substackcdn.com/image/fetch/$s_!yDZW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yDZW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png" width="696" height="273" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:273,&quot;width&quot;:696,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;AI harness&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="AI harness" title="AI harness" srcset="https://substackcdn.com/image/fetch/$s_!yDZW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png 424w, https://substackcdn.com/image/fetch/$s_!yDZW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png 848w, 
https://substackcdn.com/image/fetch/$s_!yDZW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png 1272w, https://substackcdn.com/image/fetch/$s_!yDZW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf8712f5-03cd-4e0e-bce2-01d58aab6e37_696x273.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While different agent harnesses are built for different purposes and audiences, they usually converge on the same main components. 
Claude Code acts as a vendor coding tool, OpenClaw operates as a multi-channel personal assistant, and CheetahClaws serves as an open-source research reference harness. Yet, all three must actively manage context governance, memory storage, and skill routing.</p><p>The structural differences between these harnesses are driven by their specific deployment priorities, such as enterprise reliability versus open-source reproducibility, rather than changes to the underlying foundation model.</p><h2>The three layers of agent time</h2><p>An AI system operates across distinct temporal scales to manage complex, long-term tasks. You can break this architecture down into three functional layers: prompts, skills, and memory.</p><p>Prompts operate strictly on a local, short-horizon timescale to define immediate goals. They dictate what the model should focus on in the current moment. However, prompts are fragile over extended execution lengths and transfer poorly to new scenarios.</p><p><a href="https://bdtechtalks.com/2025/10/20/anthropic-agent-skills/">Skills</a> operate on a task-level timescale as reusable execution patterns or tool workflows. Think of a skill as a predefined routine for searching a database or editing a configuration file. While highly effective for executing specific tasks, skills introduce new challenges. When an agent has dozens of skills, chaining them to solve ambiguous problems introduces routing, composition, and delegation challenges.</p><p><a href="https://bdtechtalks.com/2025/08/31/ai-agent-memory-frameworks/">Memory</a> functions as the longitudinal layer that preserves facts across sessions. It allows an agent to remember user preferences or project architectures weeks after the first interaction. 
However, memory is vulnerable to degradation, contamination, and over-generalization over time.</p><h2>The three major bottlenecks of system scaling</h2><p>Building a reliable autonomous agent requires solving three specific engineering roadblocks: context governance, trustworthy memory, and dynamic skill routing.</p><p>Expanding context capacity does not fix relevance. Unfiltered inputs create signal dilution, causing the model to suffer from an &#8220;exposure without access&#8221; failure where it misses crucial data buried in padding. Even LLMs that support million-token context windows suffer from context rot when their prompts grow long and fill up with conflicting or irrelevant information.</p><p>Context assembly must act as a strict selection policy that optimizes for a minimum sufficient context. &#8220;The hard problem of context is not capacity, but governance,&#8221; the paper notes, adding that &#8220;Long context does not indicate good context; tokens added without governance often degrade performance rather than improve it.&#8221;</p><p>Real-world tools prevent context flooding by employing aggressive management mechanisms. Recent <a href="https://medium.com/data-science-collective/everyone-analyzed-claude-codes-features-nobody-analyzed-its-architecture-1173470ab622">architectural analyses</a> of Claude Code reveal a five-tier compaction system. This includes routines like &#8220;micro-compact&#8221; for cleaning up old tool results and &#8220;context collapse&#8221; for summarizing long dialogue spans.</p><p>Furthermore, when tools emit massive text outputs (e.g., an endless server error log), the system avoids token bloat by writing the full output to the local disk and supplying only an 8-kilobyte preview to the LLM. 
This forces the model to act like a human developer, scanning the top of the log and only digging deeper if necessary.</p><p>Agent memory faces a completely different challenge: the &#8220;stale-but-confident&#8221; threat. This occurs when an agent erroneously tracks high-ranking semantic results that are completely outdated due to silent external drift. An agent might read an old note about how a codebase is structured, fail to realize the code was refactored yesterday, and confidently break the application. &#8220;The hard problem of agent memory is not storage, but trust,&#8221; the paper warns. Memory trust can only be sustained via just-in-time verification against the live environment.</p><p>Structurally, this is achieved via a &#8220;skeptical memory&#8221; framework. In Claude Code, an index file like MEMORY.md is treated strictly as an unverified pointer or hint. The agent is programmatically forced to verify the memory&#8217;s claims against the live file system before taking any destructive action. 
Additionally, systems maintain long-term memory hygiene by running background daemons like autoDream during idle times to resolve contradictions, compress insights, and bound memory growth before the agent degrades.</p><p>Finally, multi-agent configurations suffer from a &#8220;confident-but-unchecked&#8221; vulnerability. In this scenario, a specialized routing branch outputs highly plausible but completely unverified answers. &#8220;The hard problem of skill is not having skills, but routing and checking them,&#8221; the paper states. Harness engineering must tie skill selection directly to explicit post-condition checks to guarantee reliability.</p><h2>Evaluating and governing the evolving agent</h2><p>Evaluating systems solely via <a href="https://bdtechtalks.com/2025/12/15/why-ai-benchmarks-are-broken/">one-shot outcome metrics</a>, such as simple task success rates, masks hidden systemic liabilities, including high token costs, excessive tool-call latencies, and high retry rates. In a real-world deployment, an agent that brute-forces its way to a solution through endless retries wastes compute and risks API rate-limiting. Evaluation protocols must integrate process metrics that measure trajectory hygiene, verification overhead, and context efficiency over extended steps.</p><p>Multi-agent arrangements can enable parallel processing, but genuine collaboration breaks down without a standardized communication layer for state sharing, contradiction spotting, and uncertainty reporting. &#8220;A one-shot evaluation cannot reveal whether an agent&#8217;s memory becomes more useful, more noisy, or more dangerous over repeated use,&#8221; the paper notes.</p><p>To safeguard agents against persistent threats like memory poisoning and goal manipulation, a concrete governance standard must define exactly what persists, what updates, and what leaves an unalterable audit trail. 
&#8220;Without such standards, many so-called learning agents risk becoming opaque accumulations of prompts, notes, and heuristics rather than reliable adaptive systems,&#8221; the paper states.</p><p>While raw frontier-model reasoning capabilities remain indispensable, model capability alone is no longer an adequate baseline for evaluating or predicting agent success. &#8220;Agentic AI is moving from isolated model inference to persistent system execution,&#8221; per the paper.</p><p>The long-term roadmap of the AI sector will be defined by how securely and efficiently systems manage what the model remembers, what it retrieves, what actions it permits, and what it leaves fully auditable. &#8220;Scaling the harness, alongside scaling the model, defines the next major bottleneck of agentic AI,&#8221; the paper concludes.</p>]]></content:encoded></item><item><title><![CDATA[What makes Cursor's Composer 2.5 a good coding model (and what are the caveats)]]></title><description><![CDATA[A deep look at the self-distillation techniques that make Composer 2.5 such a great coding model (and the hidden tradeoffs they introduce to AI reasoning).]]></description><link>https://bdtechtalks.substack.com/p/what-makes-cursors-composer-25-a</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/what-makes-cursors-composer-25-a</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 26 May 2026 13:07:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xm9V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!xm9V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xm9V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xm9V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xm9V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xm9V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xm9V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg" width="1440" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xm9V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xm9V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xm9V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xm9V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a4d5e0b-64de-415b-bcef-9a9ced1779d2_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The prevailing Silicon Valley narrative assumes that massive, general-purpose frontier models will inevitably eat every industry vertical. Companies are pouring billions into training behemoths like OpenAI&#8217;s GPT-5.5 and Anthropic&#8217;s Opus 4.7, expecting raw parameter scale to solve all domain-specific problems.</p><p>In software engineering, the reality on the ground looks different. Writing, refactoring, and debugging code consume a massive volume of tokens. For the vast majority of daily engineering tasks (e.g., adding features, fixing bugs, and updating tests), speed and cost matter as much as raw intelligence.</p><p>This economic pressure has driven developers toward specialized coding agents. Cursor&#8217;s newly released <a href="https://cursor.com/blog/composer-2-5">Composer 2.5 model</a> has rapidly become the daily default for many engineers. 
At $0.50 per million input tokens and $2.50 per million output tokens, it makes high-volume agentic loops financially viable for small teams.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/lubinho_k/status/2057809475994812907&quot;,&quot;full_text&quot;:&quot;I am on the $200 Claude, $100 Codex, $20 Cursor Plan.\n\nAfter using Composer 2.5 for 8 hours straight while only using 8% of my $20 plan, I should reconsider my entire subscription stack.\n\nMaybe $100 Codex for complex stuff, and $60 Cursor for UI &amp;amp; Copy? &quot;,&quot;username&quot;:&quot;lubinho_k&quot;,&quot;name&quot;:&quot;Luckforest&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/2026687105624322053/nVP7IbaU_normal.jpg&quot;,&quot;date&quot;:&quot;2026-05-22T13:03:05.000Z&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/HI7O0AEa0AA6i0s.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/ajEoPrRaoj&quot;}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:207,&quot;retweet_count&quot;:33,&quot;like_count&quot;:1605,&quot;impression_count&quot;:133717,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:false}" data-component-name="Twitter2ToDOM"></div><p>Composer 2.5 is not perfect. On very complex tasks and edge cases, it still doesn&#8217;t match the power of frontier models like Opus 4.7 and GPT-5.5.</p><p>Yet the core achievement of Composer 2.5 remains intact. It demonstrates that specialized models do not need a larger parameter count to compete at the highest level. They need smarter post-training. By shifting the focus to algorithmic efficiency, Cursor is democratizing powerful agentic coding.</p><p>So, how did Cursor manage to create a model that is so damn good? 
Here&#8217;s what we know.</p><h2>The credit assignment problem and targeted RL</h2><p>Training a model to write code over long horizons introduces a major &#8220;credit assignment problem.&#8221; In standard reinforcement learning (RL), an agent interacts with an environment, takes a series of actions, and receives a reward at the end.</p><p>Imagine a coding agent writing a 500-line script that requires 10 different tool calls, such as searching the codebase, reading files, and executing tests. If the agent completes every other substep correctly but fails because it calls a nonexistent tool, the system assigns a single failing reward to the entire session: the model receives a zero. Because the feedback is delayed and sparse, the model has no way of knowing which specific token or action caused the failure. It might alter parts of its behavior that were perfectly fine, degrading its overall capability. 
The longer the trajectory, the sparser the training signal becomes.</p><p>Composer 2.5 solves this through what the company&#8217;s blog post calls &#8220;targeted RL with textual feedback.&#8221; Instead of waiting for the end of a rollout to penalize the model, the system intervenes exactly where the mistake occurs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Cwhn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Cwhn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!Cwhn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!Cwhn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!Cwhn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cwhn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Cwhn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!Cwhn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!Cwhn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!Cwhn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa04e183a-f63b-4dd4-ab97-bb66a970c7ed_1920x1080.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>When the agent makes a bad tool call during a long trajectory, the training pipeline momentarily pauses the sequence. It injects a local textual hint directly into the context, such as &#8220;Reminder: Available tools are [list of tools].&#8221; This gives the model a corrected probability map of what it should generate next, guided by the hint.</p><p>The system then applies the Kullback-Leibler (KL) divergence loss, which measures how far the model&#8217;s original prediction strayed from the corrected teacher distribution. The model adjusts its internal weights to pull its probabilities closer to the corrected path. Once the correction is made, the training resumes. This localized signal teaches the model exactly how to fix a specific behavior without spoiling the broader reinforcement learning objective over the full trajectory.</p><h2>Under the hood: OPSD vs. 
OPD and the cost of specialization</h2><p>To understand how Composer 2.5 achieves its economics, you need to look at two research papers on self-distillation referenced at the bottom of the blog post.</p><p>Distillation is a technique where a smaller, cheaper &#8220;student&#8221; model learns to mimic the outputs of a larger, more expensive &#8220;teacher&#8221; model.</p><p>Standard on-policy distillation (OPD) is highly effective but extremely expensive. It requires the massive teacher model (e.g., Claude Opus 4.7 or GPT-5.5) to actively run in parallel with the student. As the student generates its own trajectories (exploring different ways to solve a problem), the teacher evaluates every single step to provide supervision. Generating millions of tokens through a model of that size for every training run requires an enormous compute budget. It forces AI labs to choose between high-quality supervision and reasonable training costs.</p><p><a href="https://arxiv.org/abs/2601.18734">On-policy self-distillation</a> (OPSD) avoids the cost of running a separate teacher by using the same model as both the student and the teacher.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1zhV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1zhV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png 424w, 
https://substackcdn.com/image/fetch/$s_!1zhV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png 848w, https://substackcdn.com/image/fetch/$s_!1zhV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png 1272w, https://substackcdn.com/image/fetch/$s_!1zhV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1zhV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png" width="1456" height="622" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1zhV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png 424w, 
https://substackcdn.com/image/fetch/$s_!1zhV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png 848w, https://substackcdn.com/image/fetch/$s_!1zhV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png 1272w, https://substackcdn.com/image/fetch/$s_!1zhV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d10e89-f92e-41e3-b6b3-a0a2a7497822_1600x684.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Instead of calling an external oracle, OPSD leverages the model&#8217;s inherent ability to understand context. When provided with privileged in-context information (like the localized text hints used in targeted RL), the model&#8217;s next-token predictions instantly improve. The system uses the model&#8217;s hint-assisted output as the &#8220;teacher&#8221; target, and forces the standard, unassisted version of the model to match those probabilities. The student learns to internalize the logic of the hint without needing the hint present at inference time.</p><p>This self-contained teaching loop eliminates the need for an external frontier model during the RL phase and makes the training much more efficient.</p><p>There is a catch to this efficiency. While inference becomes incredibly cheap, generating active, on-policy rollouts for training shifts the cost burden upstream. Training a model via self-distillation requires the system to constantly generate and evaluate its own output. This process demands roughly two to four times the floating-point operations (FLOPs) of standard supervised fine-tuning.</p><p>This compute shift explains the recent infrastructure moves in the AI coding space. Cursor recently formed a <a href="https://cursor.com/blog/spacex-model-training">partnership with SpaceXAI</a> to secure access to its massive compute cluster, applying millions of GPUs to the problem. The massive cost of intelligence has not disappeared; it has simply moved from the user&#8217;s API bill to the developer&#8217;s training cluster.</p><h2>The SDFT advantage: Continual learning without forgetting</h2><p>Software engineering is a highly dynamic field. New programming frameworks emerge monthly, APIs deprecate without warning, and individual companies maintain highly idiosyncratic codebases. 
A coding agent must learn these new patterns quickly.</p><p>The traditional approach to teaching a model new information is to fine-tune it on a dataset of the new material. However, large language models suffer from &#8220;catastrophic forgetting.&#8221; When you adjust a model&#8217;s weights to aggressively learn a new language or framework, it often overwrites the foundational logic and reasoning skills it learned during initial pre-training.</p><p><a href="https://venturebeat.com/orchestration/mits-new-fine-tuning-method-lets-llms-learn-new-skills-without-losing-old">Self-distillation fine-tuning</a> (SDFT) addresses this by creating a protective feedback loop during the learning process.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!muEe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!muEe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg 424w, https://substackcdn.com/image/fetch/$s_!muEe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg 848w, https://substackcdn.com/image/fetch/$s_!muEe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!muEe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!muEe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!muEe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg 424w, https://substackcdn.com/image/fetch/$s_!muEe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg 848w, https://substackcdn.com/image/fetch/$s_!muEe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!muEe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775f0521-c53b-4c55-bdfb-fb1d13867aea_1600x893.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>When the model is introduced to new codebase patterns, it does not just blindly update its parameters based on the new text. First, the model generates its own reasoning pathways and explanations regarding the new data. The system then forces the model to distill its own generated logic. It evaluates how the new information integrates with the established rules of software development it already knows. By anchoring the training process to the model&#8217;s existing internal representations, SDFT constrains how much the core weights can shift.</p><p>The model acquires the new syntax and idiosyncratic developer patterns while preserving its baseline reasoning capabilities. It learns to adapt to a company&#8217;s specific coding style without forgetting how to execute fundamental software architecture.</p><h2>The danger zones: Information leakage and reward hacking</h2><p>Self-distillation and automated reinforcement learning democratize powerful agents, but they introduce severe alignment risks. When a model acts as its own supervisor, optimizing purely for self-generated rewards, the training process can quickly derail.</p>
      <p>
          <a href="https://bdtechtalks.substack.com/p/what-makes-cursors-composer-25-a">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How prompt injection broke Nvidia's sandboxed OpenClaw agent]]></title><description><![CDATA[Research into Nvidia&#8217;s NemoClaw reveals that sandboxes don't stop AI agents like OpenClaw from leaking data. We need to rethink security from first principles.]]></description><link>https://bdtechtalks.substack.com/p/how-prompt-injection-broke-nvidias</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/how-prompt-injection-broke-nvidias</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 19 May 2026 09:31:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VFxF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VFxF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VFxF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VFxF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VFxF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg 
1272w, https://substackcdn.com/image/fetch/$s_!VFxF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VFxF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VFxF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VFxF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VFxF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!VFxF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5152c680-da4a-461e-b368-bfb327684946_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The rapid adoption of autonomous AI agents like OpenClaw has introduced a fundamental security challenge: traditional defenses cannot predict what an LLM-driven application will do at runtime.</p><p>While containerization and virtual sandboxes isolate malicious execution from the underlying host machine, recent findings show that a sandbox alone cannot 
prevent an agent from being manipulated into leaking data or rewriting its own instructions.</p><p>Security firm Lasso recently disclosed <a href="https://www.lasso.security/blog/sandboxed-ai-agents-attack-surface">multiple vulnerabilities in NemoClaw</a>, Nvidia&#8217;s sandboxed environment for running OpenClaw. The research reveals that malicious actors can use subtle <a href="https://bdtechtalks.com/tag/prompt-injection-attacks/">prompt injection attacks</a> to exploit the autonomous nature of AI agents to distribute malware, bypass static detection filters, and persistently alter an agent&#8217;s core identity.</p><p>Because an agent&#8217;s execution path is determined dynamically by the text it reads, standard security measures are insufficient to protect the systems that host it.</p><h2>The promise of isolated agency in NemoClaw</h2><p>Nvidia designed <a href="https://www.nvidia.com/en-us/ai/nemoclaw/">NemoClaw</a> to provide a secure infrastructure for running OpenClaw. In a standard setup, an AI agent takes a user&#8217;s prompt, breaks it down into tasks, and executes code, installs packages, or calls APIs to achieve the goal. If an agent downloads an untrusted package or encounters a prompt injection attack, it could run malicious commands directly on the host system.</p>
<p>NemoClaw addresses this problem by enforcing isolation. It deploys the agent within a dedicated sandbox using Docker or Kubernetes primitives. This configuration separates the host machine&#8217;s file system and network architecture from the environment where the agent executes code. If an agent is compromised or runs a destructive script, the impact remains confined inside the container.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6kUm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6kUm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png 424w, https://substackcdn.com/image/fetch/$s_!6kUm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png 848w, https://substackcdn.com/image/fetch/$s_!6kUm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png 1272w, 
https://substackcdn.com/image/fetch/$s_!6kUm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6kUm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png" width="1440" height="1524" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9285239-708e-42a1-a7de-9293a797a944_1440x1524.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1524,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6kUm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png 424w, https://substackcdn.com/image/fetch/$s_!6kUm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png 848w, https://substackcdn.com/image/fetch/$s_!6kUm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png 1272w, 
https://substackcdn.com/image/fetch/$s_!6kUm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9285239-708e-42a1-a7de-9293a797a944_1440x1524.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The sandboxed approach represents the industry&#8217;s primary strategy for running untrusted, LLM-generated code. 
By locking down the operating environment, developers operate under the assumption that the underlying infrastructure is insulated from the fluid and unpredictable behavior of the language model.</p><p>However, isolating the runtime environment does not change the fact that the agent inside still retains access to sensitive workspace files, configuration keys, and external network communication channels.</p><h2>Bypassing safeguards to leak secrets and poison the soul</h2><p>Lasso demonstrated two distinct attack vectors against the NemoClaw environment that exploit how agents autonomously handle external data and dependencies.</p><p>The first attack targets the egress boundary of the sandbox using dependency poisoning and obfuscated payloads. When an agent is tasked with a project (e.g., tracking cryptocurrency prices), it frequently needs to install third-party packages from repositories like npm or PyPI.</p><p>Malicious actors can publish packages that contain lifecycle hooks, such as postinstall or preinstall scripts. These scripts run automatically when the package manager installs the dependency.</p><p>In the Lasso demonstration, the malicious package commanded the agent to read a configuration file containing internal access keys. To exfiltrate this file without triggering static security alarms, the researchers employed an emoji-encoding technique. By translating the sensitive data into strings of emojis, the payload successfully bypassed GitHub&#8217;s automated secret scanning algorithms and OpenClaw&#8217;s internal filters. 
Once the data reached its destination, the model easily decoded the non-standard characters, completing the exfiltration chain.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!peGu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!peGu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!peGu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!peGu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!peGu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!peGu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!peGu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!peGu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!peGu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!peGu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb885e386-3d73-427b-b8d3-9bb5a59093cf_2048x2048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Noy Pearl, an AI security researcher at Lasso, told TechTalks that emoji-encoding was simply one choice among many. &#8220;Emoji-encoding was the technique we chose in order to bypass GitHub static scans, there are many other ways we could have achieved this exfiltration,&#8221; Pearl said. &#8220;The issue is the fact that as long as the agent has connection to the outer world, no static mechanism can fully protect you.&#8221;</p><p>The second attack vector involves agent configuration poisoning. Autonomous agents rely on specific anchoring files within their workspace directory to define their rules of engagement, system instructions, operational boundaries, and long-term memory. In OpenClaw architectures, this file is often called SOUL.md. It acts as the cognitive blueprint for the agent, dictating how it should behave, make decisions, and respond to the user.</p><p>The researchers used indirect prompt injection to force the agent to modify its own SOUL.md file. 
When the researchers fed the agent a malicious text file during a standard task, the embedded instructions overrode the agent&#8217;s core programming.</p><p>The agent then rewrote its memory file, embedding a persistent backdoor. Because the agent references SOUL.md at the start of every new session, the behavioral corruption remains active indefinitely, transforming a temporary injection into a permanent compromise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QRCk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QRCk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png 424w, https://substackcdn.com/image/fetch/$s_!QRCk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png 848w, https://substackcdn.com/image/fetch/$s_!QRCk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!QRCk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QRCk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png" width="1456" height="1626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1626,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QRCk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png 424w, https://substackcdn.com/image/fetch/$s_!QRCk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png 848w, https://substackcdn.com/image/fetch/$s_!QRCk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!QRCk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F374a9028-f6f2-4c01-894a-87c876e5deed_1834x2048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The non-sandboxed alternative and the IDEsaster threat</h2><p>While the NemoClaw sandbox left sensitive data exposed to egress manipulation, it represents a significantly more secure posture than the industry baseline. Most developer-oriented AI assistants and agentic coding platforms omit the sandboxing layer entirely. 
Instead, these tools execute code directly on the host operating system, inheriting the permissions, system privileges, and network access of the local user.</p><p>Security researchers have documented a broad class of vulnerabilities across non-sandboxed developer tools under the term &#8220;<a href="https://thehackernews.com/2025/12/researchers-uncover-30-flaws-in-ai.html">IDEsaster</a>.&#8221; This includes more than 30 separate common vulnerabilities and exposures (CVEs) across popular code editors and AI agents such as Cursor, Windsurf, Kiro, and Zed.</p><p>In Cursor, for instance, the <a href="https://www.catonetworks.com/blog/curxecute-rce/">CurXecute</a> vulnerability allowed a prompt injection attack to chain directly into local code execution, resulting in full host compromise. Similarly, GitHub Copilot&#8217;s &#8220;YOLO mode&#8221; auto-approval toggle could be flipped silently via an injected .vscode/settings.json file. 
This exploit carried a critical CVSS severity score of 9.6 because it enabled automated arbitrary code execution on the user&#8217;s machine without warning.</p><p>When an agent operates without isolation, the risk shifts from simple data exfiltration to total machine takeover. Pearl warned that without a dedicated container layer, a compromised sandbox immediately leads to a compromised host.</p><p>&#8220;The attacker will be able to escalate their attack to the host&#8217;s filesystem and network - which on a developer&#8217;s machine usually means sensitive files, SSH keys, cloud credentials and tokens, browser cookies and any local service the computer has access to,&#8221; Pearl said.</p><p>He added that NemoClaw and OpenShell are unusual because they ship any kind of sandbox at all, noting that stripping that isolation away turns a simple credential exfiltration chain into full host compromise on the very first attack.</p><h2>Rethinking the shared responsibility model for AI agents</h2><p>Following Lasso&#8217;s disclosure, Nvidia stated that these attack scenarios fell outside the scope of their official bug bounty program because the sandbox behaved exactly as it was configured to run. This reaction highlights a growing friction point between infrastructure vendors and AI application developers regarding the traditional shared responsibility model.</p><p>In conventional software, a sandbox is considered successful if a guest program cannot break out into the host system. But when the guest program is an autonomous agent guided by an LLM, the threat actor is inside the perimeter. If the agent can be tricked by external text into volunteering secrets through an open network port, the integrity of the sandbox becomes secondary to the failure of behavioral control.</p><p>Pearl said that this type of response has become a recognizable pattern across the industry as vendors race to ship AI capabilities. 
&#8220;The sandbox behaved as configured is a fine argument when the thing running inside is a deterministic program,&#8221; Pearl said. &#8220;It doesn&#8217;t survive contact with LLM-driven agents, whose behavior is shaped at runtime by every piece of text they ingest.&#8221;</p><p>For software engineers integrating these technologies, this shifts the defensive burden. Builders cannot treat infrastructure platforms as inherently secure environments. Instead, threat models must adapt to assume that the LLM running inside the sandbox will occasionally be manipulated or operate as an inside threat actor.</p><p>Security strategies must move past simple boundary configuration and actively restrict or audit operations via strict runtime logging, human-in-the-loop checkpoints, and explicit permission policies.</p><h2>Implementing intent-based runtime security</h2><p>Traditional web defenses rely heavily on static allowlists, which match outbound domain names or IP addresses against an approved index. In an agentic environment, static allowlists fail because they only answer whether a connection to a specific domain is permitted. They cannot evaluate whether the data being transmitted is relevant or necessary for the task the user requested.</p><p>To secure autonomous systems without introducing prohibitive processing lag, Pearl suggests that engineers must deploy &#8220;intent-based&#8221; security guardrails specifically at the sandbox egress boundary. While internal activities like file edits or workspace execution require monitoring, they are inherently local and recoverable via snapshot restoration. 
Network exfiltration, by contrast, is irreversible the moment a data packet leaves the container.</p><p>In the emoji-based attack, NemoClaw flagged the postinstall script as malicious only after running it. By then, the sensitive information had already been sent to the attackers&#8217; server.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dbS4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dbS4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png 424w, https://substackcdn.com/image/fetch/$s_!dbS4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png 848w, https://substackcdn.com/image/fetch/$s_!dbS4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png 1272w, https://substackcdn.com/image/fetch/$s_!dbS4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dbS4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png" width="1456" height="712" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:712,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dbS4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png 424w, https://substackcdn.com/image/fetch/$s_!dbS4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png 848w, https://substackcdn.com/image/fetch/$s_!dbS4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png 1272w, https://substackcdn.com/image/fetch/$s_!dbS4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96a2f806-7f80-4b6f-b190-0c3c63e093aa_2048x1001.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>An effective intent-based defense system relies on a two-layer verification architecture built directly into the container&#8217;s network egress controller. When an agent attempts an outbound request, the traffic must first pass through a static cross-referencing and provenance tracking layer.</p><p>This primary layer tracks the sensitive files (e.g., .env, secrets.json, or SOUL.md) that the agent read during the active session. If the payload contains data derived from a credential file and is bound for an unrecognized server, the engine blocks the packet immediately at policy-engine speeds without calling an LLM.</p><p>If the static pattern matching is inconclusive and cannot determine intent, the system triggers a reasoning-blind alignment check as a lightweight fallback judge. This secondary model inspects only two data points: the user&#8217;s original text prompt and the proposed outbound payload. 
The judge answers a narrow binary question: does this outbound payload logically align with the user&#8217;s explicit instruction?</p><p>Pearl said that this approach preserves the core autonomy of the agent where it matters most, keeping the execution path fast. &#8220;The vast majority of actions inside the sandbox don&#8217;t pay any latency cost,&#8221; Pearl said. &#8220;The intent check fires only when the agent tries to cross the egress boundary, which is exactly the moment a human would have wanted oversight anyway. The principle to lead with: spend your latency budget on irreversibility.&#8221;</p><h2>Balancing convenience and architectural control</h2><p>The dependency risks identified in Lasso&#8217;s research expose a gap in traditional software supply chain defenses. Standard practices emphasize &#8220;shift-left&#8221; security, which relies on signing packages, pinning versions, and running dependency scanners before software enters a production pipeline. These controls assume that developers can vet all source code prior to compilation.</p><p>Agentic systems break this assumption because the code execution path is determined dynamically at runtime based on the prompt context and the model&#8217;s intermediate planning decisions. Disabling package manager lifecycle hooks completely by using flags like --ignore-scripts is a necessary starting precaution, but it remains a shallow defense due to the sheer volume of language-specific installer hooks available.</p><p>A deeper structural challenge is the direct tradeoff between platform utility and total security lockdown. 
In theory, engineers could mitigate configuration poisoning by locking down identity files like SOUL.md, making them entirely read-only or modifiable only through an external command-line interface outside the agent&#8217;s reach.</p><p>However, allowing an agent to dynamically update its own instructions based on ongoing feedback is precisely the capability that makes agentic frameworks valuable to builders. Pearl said that the workflow where a user tells their agent to update its own soul is a core feature, not an accident. &#8220;Ultimately it&#8217;s the classic productivity-versus-security tradeoff: the more you lock identity edits behind an out-of-band CLI, the safer the agent - and the slower and clunkier the workflow users signed up for,&#8221; Pearl said.</p><p>Relying solely on human-in-the-loop (HITL) oversight to review every outbound SQL query, API call, or file edit introduces approval fatigue. As the explicit &#8220;YOLO modes&#8221; in tools like <a href="https://bdtechtalks.com/2026/04/27/claude-code-api-token-leak/">Claude Code</a> or <a href="https://bdtechtalks.com/2022/07/05/github-copilot-large-language-model-product-management/">GitHub Copilot</a> show, users inevitably bypass security prompts when repetitive confirmations slow down their development velocity. 
Looking forward, the architectural challenge for developers is to engineer system guardrails where the secure path requires no added friction, concentrating defensive resources precisely where an autonomous decision causes irreversible damage.</p>]]></content:encoded></item><item><title><![CDATA[Unpacking Gemma 4’s multi-token prediction (and why you should care)]]></title><description><![CDATA[How Gemma 4&#8217;s multi-token prediction and community-driven DFlash are speeding up local LLM throughput by 3-6x.]]></description><link>https://bdtechtalks.substack.com/p/unpacking-gemma-4s-multi-token-prediction</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/unpacking-gemma-4s-multi-token-prediction</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 12 May 2026 13:02:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!90dm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!90dm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!90dm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!90dm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!90dm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!90dm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!90dm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!90dm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!90dm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!90dm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!90dm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F190b5dc1-e10b-4417-86a7-3b4de263edd3_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Google just made a major upgrade to its Gemma 4 family of open-weight LLMs that significantly improves the inference speed of 
the models.</p><p>By integrating <a href="https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/">multi-token prediction</a> (MTP) into its architecture, Gemma 4 breaks away from the traditional one-token-at-a-time approach, increasing throughput on consumer-grade hardware.</p><p>The efficiency of an LLM is often constrained by memory bandwidth rather than raw computational power. In standard autoregressive generation, a model predicts the next token, appends it to the sequence, and repeats the process. This requires moving the model&#8217;s massive weight matrices from memory to the processor for every single token. On local devices like MacBooks or PCs with limited memory bandwidth, this &#8220;memory wall&#8221; creates a hard limit on how fast the model can respond, regardless of the GPU&#8217;s clock speed.</p><p>Multi-token prediction changes this dynamic by predicting several tokens in parallel. Instead of asking the model &#8220;What is the next token?&#8221; MTP asks &#8220;What are the next n tokens?&#8221; By predicting multiple tokens at once, Gemma 4 leverages the parallel processing capabilities of modern GPUs more effectively, reducing the number of times the model weights must be fetched from memory. This results in a direct performance boost for the end user, making local interactions feel more responsive and fluid.</p><h2>How multi-token prediction accelerates inference</h2><p>To understand how Gemma 4 achieves this speedup, it is helpful to view the system as a partnership between a fast &#8220;drafter&#8221; and a high-quality &#8220;target&#8221; model. 
In a standard setup, the target model (i.e., the massive neural network containing the bulk of the parameters) does all the heavy lifting for each token it produces.</p><p>In Gemma 4, the architecture includes smaller, specialized models that act as drafters. These drafters work ahead of the main model, guessing what the next few tokens might be.</p><p>Once the drafter proposes a sequence of tokens, the main model reviews them in a single pass. This is a form of parallel processing; because the main model already has the suggested tokens, it can use its &#8220;attention&#8221; mechanism to verify them all at once. Attention is the mathematical process the model uses to weigh the importance of different tokens in a prompt to understand context. Because verifying a sequence is computationally &#8220;cheaper&#8221; than generating it from scratch, the main model can confirm the entire sequence in roughly the same time it would normally take to generate just one token.</p><p>This verification process is not an &#8220;all-or-nothing&#8221; deal. 
If the drafter suggests four tokens and the main model agrees with the first three but finds the fourth unlikely, it accepts the three correct ones, generates its own fourth token, and then restarts the drafting process. This ensures that while the speed increases, the quality of the output remains identical to a traditional model. The user gets the intelligence of the large model with the speed of the smaller drafter.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NWuJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NWuJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png 424w, https://substackcdn.com/image/fetch/$s_!NWuJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png 848w, https://substackcdn.com/image/fetch/$s_!NWuJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png 1272w, https://substackcdn.com/image/fetch/$s_!NWuJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NWuJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png" width="1000" height="562" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NWuJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png 424w, https://substackcdn.com/image/fetch/$s_!NWuJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png 848w, https://substackcdn.com/image/fetch/$s_!NWuJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png 1272w, https://substackcdn.com/image/fetch/$s_!NWuJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff43ae764-19c3-493b-add7-e0ea6bd8f25e_1000x562.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Experiments show that you can get up to 3x acceleration on both on-device and the larger versions of Gemma 4.</p><h2>Managing the overhead of speculative drafting</h2><p>Implementing MTP is not without its costs. Running additional drafter heads requires extra compute and memory. Google addressed this memory tax through an architecture that shares resources between the main model and the drafters.</p><p>A key component of this is KV cache sharing. The <a href="https://bdtechtalks.com/2026/02/23/llm-sparse-attention/">Key-Value (KV) cache</a> is the model&#8217;s short-term memory. It stores the attention values of previously seen tokens so the model doesn&#8217;t have to re-calculate them for new tokens. 
By allowing the main model and the drafters to use the same cache, Gemma 4 avoids duplicating data in the VRAM.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/googlegemma/status/2051694045869879749&quot;,&quot;full_text&quot;:&quot;https://t.co/BvHkG5TaBF&quot;,&quot;username&quot;:&quot;googlegemma&quot;,&quot;name&quot;:&quot;Google Gemma&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/2038662245631320064/uWfEb6yw_normal.png&quot;,&quot;date&quot;:&quot;2026-05-05T16:02:33.000Z&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:21,&quot;retweet_count&quot;:151,&quot;like_count&quot;:1003,&quot;impression_count&quot;:147434,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>Another optimization involves &#8220;shared target activations.&#8221; The drafters do not start their predictions from a blank slate. Instead, they use the internal representations (the &#8220;activations&#8221;) already calculated by the main model&#8217;s deeper layers. This means the drafters are essentially piggybacking on the work the main model has already done.</p><p>Additionally, the smaller Gemma 4 models (E2B and E4B) have an &#8220;efficient embedder&#8221; feature that further reduces the memory costs. Normal LLMs have a matrix that maps the activations to the entire token space (around 260,000 for Gemma 4). 
Efficient embedders use clustering methods that summarize groups of related tokens into a smaller space, which reduces the size of the projection matrix and keeps the drafter heads extremely lightweight.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pHIX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pHIX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png 424w, https://substackcdn.com/image/fetch/$s_!pHIX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png 848w, https://substackcdn.com/image/fetch/$s_!pHIX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!pHIX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pHIX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png" width="1456" height="1817" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pHIX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png 424w, https://substackcdn.com/image/fetch/$s_!pHIX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png 848w, https://substackcdn.com/image/fetch/$s_!pHIX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!pHIX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdde04b37-ba58-4d23-bfcd-3c57310984e7_1641x2048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>However, MTP comes with a potential compute penalty. If the drafter is not well-aligned with the target model and produces sequences that are consistently rejected, the system actually becomes slower. Every rejected draft represents wasted GPU cycles that could have been used for standard generation. This is why the drafters in Gemma 4 are trained specifically to align with the main model, ensuring a high acceptance rate. Without this alignment, the overhead of the MTP heads would outweigh the benefits of parallel verification.</p><h2>Beyond MTP: DFlash and block diffusion</h2><p>The open nature of Gemma 4 has already allowed the research community to push its performance limits further. 
The team at Z-Lab has integrated a technique called DFlash with Gemma 4, achieving even higher inference speeds.</p><p>DFlash optimizes the way the GPU handles the model&#8217;s computations, targeting the bottlenecks in the KV cache and the attention layers that MTP doesn&#8217;t solve on its own. Experiments show that DFlash can increase the speed of Gemma 4 by up to 6x.</p><p>DFlash incorporates a concept known as &#8220;Block Diffusion.&#8221; While MTP still works by predicting tokens one at a time, Block Diffusion treats the generation process more like an image generator might. It works with &#8220;blocks&#8221; of representations in the embedding space. Instead of just guessing the next token, the model refines a whole block of information simultaneously. This approach differs from MTP by moving away from strictly sequential logic to a more holistic, parallel refinement of the output.</p><p>The results on Gemma 4 show that these combined techniques can significantly reduce latency. By optimizing the underlying GPU kernels (the low-level code that tells the hardware how to perform math), DFlash ensures that the memory transfers are as efficient as possible. 
This community-driven improvement demonstrates that the baseline performance of Gemma 4 is just a starting point; the architecture is flexible enough to accommodate advanced optimization layers that weren&#8217;t part of the original release.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/zhijianliu_/status/2051900751673467097&quot;,&quot;full_text&quot;:&quot;DFlash for Gemma 4: Up to 6x Faster. &#9889;&#9889;\n\nGreat to see MTP land natively in Gemma 4 today. If you want to push it further, try DFlash &#8212; open source, same quality, more speed!!\n\n<a class=\&quot;tweet-url\&quot; href=\&quot;http://github.com/z-lab/dflash\&quot;>github.com/z-lab/dflash</a>&quot;,&quot;username&quot;:&quot;zhijianliu_&quot;,&quot;name&quot;:&quot;Zhijian Liu&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1693998286569713665/YRBp5-k1_normal.jpg&quot;,&quot;date&quot;:&quot;2026-05-06T05:43:56.000Z&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://substackcdn.com/image/upload/w_1028,c_limit,q_auto:best/l_twitter_play_button_rvaygk,w_88/lsfbcmzdaod7b88ekvwk&quot;,&quot;link_url&quot;:&quot;https://t.co/gZNeiRtFh4&quot;}],&quot;quoted_tweet&quot;:{&quot;full_text&quot;:&quot;Gemma 4: Now up to 3x Faster. &#9889;\n\nSame quality, way more speed. 
Our new MTP drafters allow Gemma 4 to predict multiple tokens at once, effectively tripling your output speed without compromising intelligence.&quot;,&quot;username&quot;:&quot;googledevs&quot;,&quot;name&quot;:&quot;Google for Developers&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/2042284161327648768/XHGnZShq_normal.jpg&quot;},&quot;reply_count&quot;:73,&quot;retweet_count&quot;:183,&quot;like_count&quot;:1519,&quot;impression_count&quot;:460999,&quot;expanded_url&quot;:null,&quot;video_url&quot;:&quot;https://video.twimg.com/amplify_video/2051900232976486400/vid/avc1/928x720/VGZnRuEqG2XHGgom.mp4&quot;,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2>Why open weights are the engine of AI progress</h2><p>The rapid development of DFlash for Gemma 4 highlights the practical benefits of open-source AI. When a model is released with open weights and a documented architecture, it ceases to be a static product and becomes a platform for innovation. Researchers can inspect how the MTP heads interact with the main trunk of the model, allowing them to write specialized code that optimizes those specific pathways.</p><p>The integration of DFlash would have been impossible if Gemma 4 had not been released as an open model with an Apache 2.0 license. The community would have been limited to using the model as provided by Google, with no way to tweak it for specific hardware or edge-case applications. Open weights allow for a crowdsourced R&amp;D department where thousands of independent developers can work on making a model faster, smaller, or more accurate. This ecosystem accelerates the transition of AI from giant data centers to everyday devices like smartphones and laptops.</p><p>Unfortunately, the current dynamics of the market encourage frontier labs to keep most of their models and the details of their architecture and training secret. 
However, the activity surrounding Gemma 4 suggests that the future of AI is not just about the size of the model, but the transparency of its design.</p><p>As more developers build on these open foundations, the gap between &#8220;research grade&#8221; and &#8220;daily use&#8221; AI continues to shrink. By sharing the underlying technology, Google has provided the blueprints for the community to make AI more accessible and efficient for everyone.</p>]]></content:encoded></item><item><title><![CDATA[How to scale LLMs to 100 million tokens without blowing up memory costs]]></title><description><![CDATA[Memory Sparse Attention (MSA) scales LLM context windows to an unprecedented 100 million tokens while preserving accuracy.]]></description><link>https://bdtechtalks.substack.com/p/how-to-scale-llms-to-100-million</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/how-to-scale-llms-to-100-million</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 05 May 2026 14:21:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aAEX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aAEX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aAEX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!aAEX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!aAEX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!aAEX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aAEX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aAEX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!aAEX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!aAEX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!aAEX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffadc6610-ec63-464e-82b8-f3e5f029b532_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Long-term memory remains a key challenge for large language models. The industry is currently maxing out at effective context windows of around 1 million tokens, which impedes the development of complex applications like massive multi-agent systems and processing very large text corpora.</p><p><a href="https://arxiv.org/abs/2603.23516">Memory Sparse Attention</a> (MSA), a new technique developed by researchers at Evermind, Shanda Group, and Peking University, addresses the shortcomings of current long-memory solutions. The architecture enables models to extend their context window up to 100 million tokens while preserving their reasoning accuracy.</p><p>The key innovation of MSA is a differentiable, end-to-end routing mechanism. The model learns to compress massive document collections into precomputed attention values and retrieve only the most relevant document chunks directly into the model&#8217;s active working memory during generation. MSA represents one of several emerging optimization techniques that allow developers to build AI applications capable of handling massive documents and developing long-term memory skills for dynamic environments.</p><h2>The challenge of long memory</h2><p>LLMs struggle with long-term, fine-grained memory retention. Standard full-attention mechanisms become computationally constrained as data grows because of their memory requirements. To process language, models compute how every token relates to every other token in a sequence. As the sequence gets longer, the computation required to track these relationships grows quadratically.</p><p>The effective <a href="https://bdtechtalks.com/2024/04/26/llm-infinite-context-fine-tuning-rag/">context window for most modern LLMs</a> is capped between 128,000 and 1 million tokens. 
To put this in perspective, cognitive science estimates human lifelong memory holds the equivalent of 200 to 300 million tokens.</p><p>This hard limit challenges complex applications that require long, persistent context. When attempting to comprehend extensive novel series (e.g., <em>A Song of Ice and Fire</em> or the <em>Harry Potter</em> series), standard models inevitably drop early plot points and subtle character details. When building digital twins to replicate human behavior, or maintaining consistent personas in role-playing, the AI will eventually forget its identity and break character as the conversation history overflows the available context window.</p><p>Similarly, the long-term history of multi-agent systems becomes unmanageable because the models cannot reliably retrieve granular decisions or past interactions to inform current reasoning. 
The core challenge for AI developers is scaling LLM memory without sacrificing computational efficiency, architectural compatibility, or reasoning precision.</p><h2>Requirements for an effective memory system</h2><p>In their paper, the researchers specify five core characteristics for an effective long-term memory system:</p><p>1- The system must offer architectural compatibility, integrating easily with mainstream LLM architectures rather than requiring isolated base models.</p><p>2- It must provide lifetime memory, scaling to handle massive context lengths while keeping computational overhead low and avoiding degradation in reasoning quality.</p><p>3- To ensure high-precision retrieval and storage, the memory mechanism needs end-to-end trainability, meaning it is fully differentiable and jointly optimized with the generation process as opposed to operating as an external retrieval system.</p><p>4- It also requires straightforward memory management for storing and updating context.</p><p>5- Finally, the system needs robustness against catastrophic forgetting. As the model processes vast amounts of conflicting information over time, it must retain its structural integrity and avoid overwriting critical historical knowledge.</p><p>Current approaches miss the mark on these requirements:</p><p>Parameter-based memory, such as continuous fine-tuning, updates model weights to store knowledge but incurs massive training overhead. It is also vulnerable to catastrophic forgetting when exposed to conflicting data.</p><p>External storage, such as standard <a href="https://bdtechtalks.com/2023/12/04/rag-document-retrieval-optimization/">retrieval-augmented generation</a> (RAG) pipelines, relies on semantic text embeddings decoupled from the LLM&#8217;s internal reasoning space. 
Because it is not end-to-end differentiable, standard RAG inherently hits a performance ceiling, offering shallow semantic matching that struggles with complex, multi-hop reasoning.</p><p>Latent state compression, such as techniques that <a href="https://bdtechtalks.substack.com/p/how-sparse-attention-is-solving-ais">compress the KV cache</a> into a fixed chunk of memory, is also limited in the granularity of information it can hold. As context lengthens, compressing long history into fixed-size states inevitably leads to severe information loss and precision degradation.</p><p>Overall, current approaches suffer from two fundamental limitations:</p><p>1- The limited scalability of high-fidelity memory forces a trade-off between how accurately a model remembers and how much data it can hold. Methods guaranteeing high-precision retrieval have fixed context limits, while those scaling to massive capacities struggle to maintain precision.</p><p>2- There are no end-to-end trainable solutions. 
To build systems that scale, developers stitch together disconnected parts, creating an optimization gap that makes it impossible to build a memory pipeline meeting all core criteria listed above.</p><h2>How Memory Sparse Attention works</h2><p>To overcome the strict trade-offs of existing methods, MSA redefines how LLMs interact with their context by integrating memory retrieval and answer generation into a single, jointly-optimized latent state framework.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VP16!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VP16!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png 424w, https://substackcdn.com/image/fetch/$s_!VP16!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png 848w, https://substackcdn.com/image/fetch/$s_!VP16!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png 1272w, https://substackcdn.com/image/fetch/$s_!VP16!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!VP16!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png" width="996" height="475" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:475,&quot;width&quot;:996,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VP16!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png 424w, https://substackcdn.com/image/fetch/$s_!VP16!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png 848w, https://substackcdn.com/image/fetch/$s_!VP16!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png 1272w, https://substackcdn.com/image/fetch/$s_!VP16!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafccc3d7-6265-4f1d-9fbf-bc82da92322c_996x475.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Memory Sparse Attention (MSA) (source: arXiv)</figcaption></figure></div><p>It relies on an end-to-end trainable sparse attention mechanism. MSA replaces standard dense self-attention with a document-based retrieval system operating directly within the model&#8217;s native representation space.</p><p>MSA segments documents, calculates their attention values, and compresses their hidden states into compact &#8220;Routing Keys&#8221; and standard Key-Value (KV) matrices. The KV cache is essentially the model&#8217;s scratchpad, storing the mathematical representations of previously processed tokens so it doesn&#8217;t have to recalculate them. 
In long-context tasks, the KV cache can grow very large and unwieldy. The Routing Keys are a compressed version of that data: they act like index cards, letting the model search for relevant documents without perusing the dense attention values.</p><p>During generation, a specialized projector matches the user&#8217;s query against these Routing Keys to dynamically fetch only the most relevant document chunks into the active context. To make this learnable through gradient descent, MSA introduces an additional loss function during pre-training that teaches the internal router to distinguish between relevant and irrelevant chunks in the latent routing space. This gives the model high-precision retrieval while avoiding the limitations of decoupled RAG systems.</p><p>The architecture also implements a technique called &#8220;document-wise positional encoding.&#8221; In an ideal scenario, building a 1-million-token context model would involve training it on 1-million-token examples.</p><p>Because that is computationally impractical, researchers use Rotary Position Embedding (RoPE) to allow models to work beyond their training length. A model trained on 64k tokens might extrapolate to 128k or 256k tokens. However, standard RoPE fails when the context grows drastically, to 10 or 100 million tokens.</p><p>MSA solves this with document-wise RoPE, which assigns each document in the context its own position IDs, starting from zero. Inside a document, each token is assigned a position relative to the start of that specific document. This tweak decouples the positional semantics from the total number of documents in memory.</p><p>To the model&#8217;s internal attention mechanism, the 10,000th document in a massive database looks structurally identical to the first document. Because no document ever exceeds the length of the model&#8217;s training window, the model can effectively extrapolate. 
This allows developers to train efficiently on shorter 64k-token contexts while robustly extrapolating to 100 million tokens during inference, preventing catastrophic forgetting and precision loss.</p><p>A secondary benefit is that, with documents processed and indexed independently, developers can easily change the contents of individual documents and recompute their KV values without invalidating the entire cache.</p><h2>Handling the hardware limits</h2><p>To make MSA practical in deployment settings, the researchers designed two key innovations for retrieving and handling the model&#8217;s memory. The first is a multi-hop &#8220;Memory Interleave&#8221; mechanism. For complex queries requiring deep reasoning across scattered data, MSA doesn&#8217;t do single-shot retrieval from its memory store. Instead, it uses an iterative process that alternates between retrieving documents, appending them to the query context, and refining its search based on newly acquired evidence until it has sufficient context to generate a final answer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uodw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uodw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uodw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!uodw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uodw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uodw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uodw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uodw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!uodw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uodw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f64a11e-2a78-495e-a749-eb1abb4e8091_2048x1117.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">MSA Memory Interleave</figcaption></figure></div><p>The second innovation tackles memory parallelism. 
Handling the KV cache for long contexts is a significant hardware bottleneck. According to the paper&#8217;s estimates, the compressed KV cache for 100 million tokens requires approximately 169GB of memory. Holding all that data in VRAM is unsustainable.</p><p>To solve this, MSA implements a two-part hardware optimization pipeline called &#8220;Memory Parallel,&#8221; relying on a tiered storage strategy. As we discussed above, when MSA processes documents, it creates lightweight Routing Keys that act like index cards. These keys are small enough, requiring about 56GB for 100 million tokens, that they can be safely stored directly on the fast, low-latency VRAM of the GPUs.</p><p>The massive bulk of the actual text data, the Content KVs, is offloaded entirely from the GPU and stored in the host machine&#8217;s cheaper CPU DRAM. On a standard AI compute node consisting of two NVIDIA A800 GPUs with a combined 160GB of VRAM, the system splits the 56GB Routing Keys in half. GPU 1 gets half the index, and GPU 2 gets the other half.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DdWZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DdWZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DdWZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!DdWZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DdWZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DdWZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DdWZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DdWZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!DdWZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DdWZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a43494-5e2a-4561-be8c-b5e9a5980e72_2048x1117.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">MSA Memory Parallel Architecture</figcaption></figure></div><p>When a query arrives, it broadcasts to both GPUs 
simultaneously. Each GPU independently searches its own half of the index. The GPUs then compare scores globally to identify the highest-scoring documents and fetch only those documents&#8217; Content KVs from CPU DRAM.</p><h2>Putting it all together</h2><p>In production, MSA relies on a three-stage inference process. Take a task with a 100-million-token context. First, an offline stage prepares the memory bank: the model runs a one-time forward pass over the entire document corpus, generating standard KV matrices alongside the specialized Routing Keys. The system then chunks and compresses these matrices and stores them in the memory bank.</p>
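<p>The sharded lookup described in the Memory Parallel section can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper&#8217;s implementation: the dot-product scoring, the function names (search_shard, route), the two-dimensional keys, and the dictionaries standing in for GPU VRAM and CPU DRAM are all hypothetical.</p>

```python
import heapq


def dot(a, b):
    """Plain dot product, used here as a stand-in relevance score (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def search_shard(query, shard, k):
    """Score every routing key held by one shard; return its local top-k as (score, doc_id)."""
    scored = ((dot(query, key), doc_id) for doc_id, key in shard.items())
    return heapq.nlargest(k, scored)


def route(query, shards, content_store, k=2):
    """Send the query to every shard, merge the local winners, fetch only the winning content."""
    candidates = []
    for shard in shards:  # sequential here; in the article each GPU searches in parallel
        candidates.extend(search_shard(query, shard, k))
    winners = heapq.nlargest(k, candidates)  # global comparison of scores
    return [(doc_id, content_store[doc_id]) for _, doc_id in winners]


# Toy data: four documents, each with a tiny hypothetical routing key.
routing_keys = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.9, 0.1], 3: [0.2, 0.8]}
doc_ids = sorted(routing_keys)
half = len(doc_ids) // 2

# Each "GPU" holds half of the index; the bulky content stays in "CPU DRAM".
gpu_shards = [
    {d: routing_keys[d] for d in doc_ids[:half]},
    {d: routing_keys[d] for d in doc_ids[half:]},
]
cpu_dram = {d: f"content-kv-{d}" for d in doc_ids}

top_docs = route([1.0, 0.0], gpu_shards, cpu_dram, k=2)
print(top_docs)
```

<p>Merging each shard&#8217;s local top-k is enough to recover the global top-k, because any document that ranks in the global top-k must also rank in the top-k of its own shard. Only the small candidate lists cross the shard boundary, and only the winning entries are read from the slower store.</p>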
      <p>
          <a href="https://bdtechtalks.substack.com/p/how-to-scale-llms-to-100-million">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The silent threat of Claude Code (and other AI coding assistants)]]></title><description><![CDATA[A new study reveals how AI coding assistants like Claude Code are quietly hoarding and publishing sensitive API keys to code repositories.]]></description><link>https://bdtechtalks.substack.com/p/the-silent-threat-of-claude-code</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/the-silent-threat-of-claude-code</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 28 Apr 2026 13:32:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!CzGF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CzGF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CzGF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CzGF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CzGF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!CzGF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CzGF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CzGF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!CzGF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CzGF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!CzGF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F794c508d-f181-49d1-842e-6f646278ad19_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A <a href="https://www.lakera.ai/blog/your-ai-coding-assistant-just-shipped-your-api-keys">recent study</a> by cybersecurity firm Lakera reveals that AI coding assistants like Claude Code are inadvertently hoarding and leaking sensitive API keys during public package releases. 
While these tools accelerate the software development lifecycle, they also introduce hidden vulnerabilities into the automated software supply chain.</p><p>Claude Code caches approved terminal commands in a hidden local file. When a developer selects an &#8220;allow always&#8221; option to bypass repetitive prompts, any credentials passed within that command become permanently stored on the local machine. If the developer publishes the project to a public registry without explicitly ignoring this hidden directory, those stored API keys ship globally alongside the source code.</p><p>Industry experts emphasize the novelty and scale of this risk as AI agents move deeply into developer workflows. This means AI tool companies must adapt their tools to this new reality. At the same time, developers must take measures to avoid exposing their software libraries to the threats posed by AI coding tools.</p><p>&#8220;AI tooling is evolving at breakneck speed, and in many ways, this is the most software we&#8217;ve ever seen created and deployed without mature secure defaults both in the generated code itself and in the surrounding developer environment,&#8221; Steve Guiguere, Principal AI Security Advocate at Check Point Software, told TechTalks.</p><h2>How Claude Code leaks your sensitive data</h2><p>Claude Code operates using a strict permission system for shell commands. When the assistant attempts to run a command it has not executed before, it presents the developer with authorization options. 
Selecting &#8220;allow always&#8221; writes the exact command string to a hidden file located at &#8220;.claude/settings.local.json&#8221; within the root of the project directory.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Developers routinely execute authenticated API calls, run deployment scripts, or log into cloud services directly from the terminal. If an environment variable or API key is prepended to one of these commands, the AI agent logs it as a permanent allowlist entry. The agent is functioning exactly as designed, remembering state to reduce friction. But at the same time, it creates a static record of sensitive data.</p><p>The exposure occurs during the package publishing phase. Package managers like npm build distribution archives directly from the contents of the project directory. The &#8220;.claude/&#8221; folder acts similarly to a &#8220;.env&#8221; file, signaling that it contains personal, environment-specific data. However, it lacks the widespread ecosystem awareness that typically prevents environment files from shipping. 
With npm, maintainers can exclude files through &#8220;.npmignore&#8221; or allowlist what ships through the &#8220;files&#8221; field in &#8220;package.json,&#8221; but neither mechanism keeps the &#8220;.claude/&#8221; directory out by default.</p><p>To measure the impact, Lakera built a service that monitors the npm registry&#8217;s changes feed. Across a scan window of roughly 46,500 packages, the firm identified 428 packages containing a &#8220;.claude/settings.local.json&#8221; file. Of those, 33 files across 30 packages contained live credentials. Roughly one in 13 of the shipped settings files exposed sensitive data to the public.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hX8s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hX8s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png 424w, https://substackcdn.com/image/fetch/$s_!hX8s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png 848w, https://substackcdn.com/image/fetch/$s_!hX8s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png 1272w, https://substackcdn.com/image/fetch/$s_!hX8s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hX8s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png" width="1400" height="781" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:781,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hX8s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png 424w, https://substackcdn.com/image/fetch/$s_!hX8s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png 848w, https://substackcdn.com/image/fetch/$s_!hX8s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png 1272w, https://substackcdn.com/image/fetch/$s_!hX8s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe598c96e-0073-4bbc-807b-9c8c2152330d_1400x781.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Traditional automated safeguards frequently miss these exposures. Existing secret scanning tools like GitHub Advanced Security are highly effective at finding known credential patterns within source code and version control histories. &#8220;This case is different because the credentials are embedded inside an AI tool&#8217;s local settings file as part of approved shell command strings,&#8221; Guiguere said. The AI assistants have created entirely new locations where secrets quietly accumulate outside the view of established security workflows.</p><p>The underlying vulnerability affects any ecosystem that packages files from a project directory. 
Build tools for Python source distributions (PyPI), RubyGems, and Maven all select files and publish archives based on directory contents, so they carry the same exposure risk if the hidden &#8220;.claude/&#8221; directory is visible to the packaging process, according to Lakera.</p><p>Although no widespread, automated exploitation of these specific files has been publicly reported, the research shows that the capability exists. &#8220;If researchers can identify a repeatable way to discover exposed credentials in public registries, we have to assume adversaries can do the same, and likely will,&#8221; Guiguere warned. &#8220;In security, the right assumption is often that once a weakness is practical and economically interesting, it will eventually be operationalized.&#8221;</p><h2>Countermeasures for developers and enterprises</h2><p>Developers can immediately mitigate this risk by manually adding the &#8220;.claude/&#8221; directory to their &#8220;.npmignore&#8221; and &#8220;.gitignore&#8221; files. Package managers also offer preview mechanisms that allow developers to inspect an archive before it goes live. Running commands like &#8220;npm pack --dry-run&#8221; or using equivalent artifact inspection tools in other languages lets developers confirm that hidden AI state files are excluded from the final release.</p><p>For creators of AI tools, automatically generating or updating these ignore files during the tool&#8217;s initialization would act as a strong secure-by-default mechanism. Despite the logic of this approach, the industry is trending toward a cloud-style shared responsibility model, according to Guiguere.</p><p>&#8220;The platform will be securable, but not necessarily secure by default,&#8221; he said. 
&#8220;Providers will offer mechanisms and guidance, while developers and enterprises remain responsible for configuring and enforcing protections around those tools.&#8221;</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>This is a pattern we&#8217;ve seen in other security incidents, such as a recently reported <a href="https://bdtechtalks.com/2026/04/20/anthropic-mcp-vulnerability/">vulnerability in the Model Context Protocol (MCP)</a> that exposed computers to remote code execution (RCE) attacks. The maintainers of the protocol and MCP tools said this was a feature and that developers are responsible for preventing such incidents. Striking the right balance between flexibility and security remains an unsolved problem in AI tools, and for the time being, developers must be especially cautious.</p><p>But relying on individual developers to remember manual file updates scales poorly across large organizations. Enterprises require automated preventative guardrails built directly into their software delivery pipelines. 
Platform engineering teams should implement policy checks that automatically fail builds if &#8220;.claude/&#8221; or similar agent directories appear in publishable artifacts.</p><p>The presence of local AI agents also introduces a new endpoint security dimension. Because tools like Claude Code run locally, they create risk at the developer workstation long before code ever reaches a registry. Enterprises need endpoint controls capable of auditing development directories where sensitive agent state accumulates, shifting the burden of security from individual developer hygiene to managed enterprise controls.</p><h2>The future of agentic AI security</h2><p>The integration of always-watching AI agents forces a fundamental shift in how the industry views command-line hygiene. Historically, passing an API key directly into a local terminal command via a tool like curl was relatively safe because local bash histories are rarely packaged and published.</p><p>AI coding assistants disrupt that model. &#8220;If an AI agent is watching and recording operational context, developers need to stop thinking of the terminal as purely ephemeral,&#8221; Guiguere said. &#8220;AI coding assistants change that model because they observe, approve, remember, and sometimes persist commands as part of their operating model.&#8221;</p><p>To harden systems against inadvertent credential hoarding, engineers must recognize that an AI agent running on a desktop is an application runtime. It requires the same architectural discipline expected of cloud infrastructure. 
Future security models will require sandboxing agents, mounting only the specific directories they need to function, and strictly enforcing the principle of least privilege.</p><p>Where secrets are required for complex workflows, they must be retrieved dynamically from controlled secret stores or secret managers using scoped, short-lived access, rather than relying on hard-coded credentials embedded in repeatable command-line approvals.</p>]]></content:encoded></item><item><title><![CDATA[The 'by design' security flaw of Model Context Protocol (MCP)]]></title><description><![CDATA[Security researchers have uncovered a massive architectural flaw in Anthropic's Model Context Protocol, exposing millions of AI applications to remote takeovers.]]></description><link>https://bdtechtalks.substack.com/p/the-by-design-security-flaw-of-model</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/the-by-design-security-flaw-of-model</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 21 Apr 2026 15:11:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!A9vm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A9vm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A9vm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!A9vm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!A9vm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!A9vm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A9vm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A9vm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!A9vm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!A9vm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!A9vm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d656f80-01a7-45b2-954f-803d45d28289_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>An architectural vulnerability baked into the core of Anthropic&#8217;s <a href="https://bdtechtalks.com/2025/03/31/model-context-protocol-mcp/">Model Context Protocol</a> (MCP) exposes millions of AI applications to remote command execution. Security researchers at OX Security discovered a <a href="https://www.ox.security/resource-category/whitepapers-and-reports/mother-of-all-ai-supply-chains/">fundamental flaw</a> in how the protocol handles local process execution, allowing attackers to hijack servers, exfiltrate private data, and infiltrate enterprise networks.</p><p>Because the vulnerability exists at the protocol layer, the blast radius is massive. It affects over 150 million downloads, leaves more than 200,000 public servers potentially exposed, and has resulted in over 10 Common Vulnerabilities and Exposures (CVEs). The research team successfully executed commands on six live production platforms with paying customers and bypassed security checks on 9 out of 11 major MCP marketplaces.</p><p>The vulnerability largely remains in the wild because maintainers consider it a &#8220;feature, not a bug,&#8221; which leaves developers responsible for staying vigilant. The report highlights the kind of care you need to put into your AI applications as you adopt new technologies.</p><h2>How expected behavior turns into remote execution</h2><p>To understand the scope of the threat, you have to understand the role of MCP. Released by Anthropic in November 2024, MCP acts as a universal plug adapter for AI agents. Large language models (LLMs) cannot inherently browse your local files or query a private SQL database. MCP bridges this gap. 
An MCP adapter translates the LLM&#8217;s requests into actions that an external service, such as a web search engine or a database, can understand.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Because MCP sits at the junction between the AI model and external tools, it holds highly privileged access to source code, chat logs, and API keys.</p><p>The vulnerability stems from how MCP manages local connections through its Standard Input/Output (STDIO) interface. STDIO is a basic mechanism computers use to pass text between running programs. When you instruct an MCP adapter to start a local server, you pass it a command string. 
MCP executes this string on the host machine&#8217;s operating system to spin up the connection.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oiEl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oiEl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png 424w, https://substackcdn.com/image/fetch/$s_!oiEl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png 848w, https://substackcdn.com/image/fetch/$s_!oiEl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png 1272w, https://substackcdn.com/image/fetch/$s_!oiEl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oiEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png" width="1456" height="775" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:775,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oiEl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png 424w, https://substackcdn.com/image/fetch/$s_!oiEl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png 848w, https://substackcdn.com/image/fetch/$s_!oiEl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png 1272w, https://substackcdn.com/image/fetch/$s_!oiEl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6c1f3a-81d7-42f1-b6dd-8236bb46edb8_1638x872.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">How the MCP STDIO vulnerability works (source: OX Security)</figcaption></figure></div><p>If the command successfully starts an MCP connection, the system proceeds. However, if the command is malicious and fails to establish a valid MCP connection, the system simply returns an error to the user. The critical failure is that the underlying operating system command still runs. There are no sanitization warnings or roadblocks. An attacker can pass a malicious command, receive a connection error, and walk away with full control of the server.</p><p>The architecture makes it easy to distribute exploits. Developers frequently browse community marketplaces to download custom MCP configurations. OX Security uploaded a harmless proof-of-concept payload to 11 directories. Nine of them, including popular hubs like LobeHub and Cursor Directory, published the payload without any security review (GitHub was the only one that rejected the submission, and Cline did not respond). 
A single malicious listing could grant an attacker hidden access to thousands of developer machines.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mJRH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mJRH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png 424w, https://substackcdn.com/image/fetch/$s_!mJRH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png 848w, https://substackcdn.com/image/fetch/$s_!mJRH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!mJRH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mJRH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png" width="1456" height="987" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:987,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mJRH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png 424w, https://substackcdn.com/image/fetch/$s_!mJRH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png 848w, https://substackcdn.com/image/fetch/$s_!mJRH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!mJRH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1899dd7-9d92-489a-9cd4-e6495f60e80b_1640x1112.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Code repositories vulnerable to MCP attack (source: OX Security)</figcaption></figure></div><p>In practice, this flaw manifests differently depending on the application environment.</p><p>LangFlow, an open-source framework owned by IBM, allows users to build AI workflows through a web interface. Researchers found over 915 publicly accessible LangFlow instances online. The platform exposes its MCP configuration panel without requiring authentication. Anyone on the internet can request a session token, send a crafted network request containing a malicious STDIO command, and take over the underlying server without ever logging in.</p><p>The threat also extends to local development environments through prompt injection. AI-powered Integrated Development Environments (IDEs) like Windsurf and Cursor read local files and browse the web to assist developers. 
If a developer visits an attacker-controlled website, the site can feed a hidden, malicious instruction to the AI assistant. The AI then proposes an edit to the local mcp.json file to include a dangerous STDIO command. In the case of Windsurf, the system applied the edit and executed the command immediately, requiring zero clicks or approvals from the developer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9gHa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9gHa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png 424w, https://substackcdn.com/image/fetch/$s_!9gHa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png 848w, https://substackcdn.com/image/fetch/$s_!9gHa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png 1272w, https://substackcdn.com/image/fetch/$s_!9gHa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9gHa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png" width="1456" height="482" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9gHa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png 424w, https://substackcdn.com/image/fetch/$s_!9gHa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png 848w, https://substackcdn.com/image/fetch/$s_!9gHa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png 1272w, https://substackcdn.com/image/fetch/$s_!9gHa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74fbcb07-d4eb-40d7-97ca-898c634a6486_1638x542.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">MCP vulnerability in LangFlow (source: OX Security)</figcaption></figure></div><h2>The balance between flexibility and security</h2><p>When OX Security reported these findings, the response from protocol maintainers and IDE vendors was uniform. Anthropic, LangChain, and Microsoft all stated that the behavior is expected and operates by design.</p><p>The core of their argument is flexibility. To maintain MCP as an unopinionated, universal standard, the protocol avoids putting strict limitations on what developers can execute. LangChain maintains that application authors are responsible for validating and sanitizing inputs. IDE vendors argue that users must actively trust their workspace environments.</p><p>This approach creates a structural failure. It shifts the burden of complex security sanitization onto tens of thousands of downstream developers building chat apps and internal tools. 
Many of these developers are not security engineers, guaranteeing that vulnerabilities will emerge at scale.</p><p>Moshe Siman Tov Bustan, security research team lead at OX Security, argues that safety does not require sacrificing functionality.</p><p>&#8220;The protocol can be redesigned in a way that doesn&#8217;t lose its utility,&#8221; Bustan told TechTalks. &#8220;By letting developers choose the available functionality, it could block arbitrary commands by default, but let developers enable them by passing a flag that clearly states the function enables arbitrary command execution and should not be exposed to user input.&#8221;</p><h2>Securing the AI stack</h2><p>Until the foundational protocols are hardened, engineering teams need to adopt strict architectural controls to protect their environments from MCP-based exploits.</p><p><strong>Implement manifest-only execution:</strong> The pattern of passing raw, user-supplied strings directly into a shell execution environment must stop. 
Teams should transition to a manifest model, where applications only execute pre-defined server aliases locked within a static configuration file.</p><p><strong>Enforce strict sandboxing:</strong> AI agents should operate in low-privilege environments. &#8220;If an MCP needs to communicate with a remote server, it should be allowed to contact only that specific server. The same applies to filesystem access and any other behavior,&#8221; advises Bustan.</p><p><strong>Require explicit opt-ins:</strong> Developers should introduce a mandatory flag whenever dynamic STDIO arguments are active. This makes dangerous modes easily searchable for security linters and audit teams.</p><p><strong>Adopt least-privilege secret management:</strong> API keys and OAuth tokens provided to MCP servers must be strictly scoped to their intended function. The presence of broad, full-permission keys should trigger automated warnings during the deployment process.</p><p><strong>Establish marketplace verification:</strong> Registries need to mature beyond open uploads. Marketplaces should require developers to submit a standardized security manifest detailing exactly which network resources and file systems an MCP server will access before allowing users to download it.</p><h2>Navigating the &#8220;Army of Juniors&#8221;</h2><p>The vulnerabilities within MCP highlight a broader shift in how software is built today. The widespread use of AI coding assistants and <a href="https://bdtechtalks.com/2025/04/09/demystifying-vibe-coding/">vibe-coding tools</a> allows individuals with limited security maturity to generate and deploy functional code at an unprecedented speed.</p><p>OX Security refers to this dynamic as the &#8220;Army of Juniors&#8221; effect. 
AI systems are highly effective at writing integration logic, but they rarely question the architectural safety of the underlying protocols they use.</p><p>&#8220;We consider this cascading vulnerability to align with this AI coding anti-pattern, because AI services are often written by AI-native developers,&#8221; Bustan said. &#8220;If the AI doesn&#8217;t question the underlying implementation, the developer behind the AI won&#8217;t either.&#8221;</p><p>This dynamic makes foundational security critical. Protocol maintainers must embed &#8220;secure by design&#8221; principles directly into SDKs and package dependencies, rather than shifting the burden of sanitizing inputs downstream. As autonomous systems gain more access to sensitive data and critical infrastructure, relying on traditional tools or even other AI systems to catch anomalies is no longer enough.</p><p>&#8220;While code is rapidly evolving and being written at scale, security expertise and manual reviews are now needed more than ever,&#8221; Bustan said. &#8220;Our analysis shows that we cannot blindly trust automated AI reviews to check code security. Especially in areas where the code handles sensitive information, human manual reviews can identify architectural flaws that are often introduced by AI agents.&#8221;</p>]]></content:encoded></item><item><title><![CDATA[The hidden trap of LLM self-distillation]]></title><description><![CDATA[Optimizing LLMs for concise answers can destroy their ability to explore alternative solutions on difficult problems. 
New study reveals the hidden cost of self-distillation.]]></description><link>https://bdtechtalks.substack.com/p/the-hidden-trap-of-llms-self-distillation</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/the-hidden-trap-of-llms-self-distillation</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 14 Apr 2026 14:28:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1RBd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1RBd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1RBd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1RBd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1RBd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1RBd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1RBd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1RBd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1RBd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1RBd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1RBd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb1ac4c-babe-4cc4-90d7-24f526ace9d1_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Self-distillation has emerged as an effective post-training paradigm for large language models, often improving performance while shortening reasoning traces. However, <a href="https://arxiv.org/abs/2603.24472">recent research</a> by Microsoft Research, KAIST, and Seoul National University reveals a major flaw in this approach.</p><p>In mathematical reasoning, self-distillation inadvertently suppresses behaviors that allow models to explore alternative hypotheses and self-correct during complex problem-solving. 
As a result, the models become significantly less accurate on out-of-distribution problems.</p><p>The key takeaway is that optimizing post-training solely to reinforce concise, correct reasoning traces can quietly destroy a model&#8217;s ability to generalize. Across various open-weight models, researchers found that self-distillation can cause performance drops of up to 40% on unseen tasks.</p><p>For models to maintain their robust reasoning abilities, they must be exposed to different levels of uncertainty during training.</p><h2>What is self-distillation?</h2><p>In <a href="https://bdtechtalks.com/2023/09/18/what-is-llm-compression/">standard distillation pipelines</a>, a massive teacher model provides training signals to a smaller, more efficient student model. Self-distillation alters this formula by employing two instances of the exact same model as both teacher and student.</p><p>The student model generates reasoning sequences based solely on a standard input prompt. 
Meanwhile, the teacher model receives a much richer context, such as the ground-truth solution, environment feedback, or other auxiliary signals.</p><p>The training process tries to minimize the divergence between the student and teacher&#8217;s next-token distributions. Because the teacher is guided by privileged information, it naturally produces highly concise and confident reasoning trajectories with minimal uncertainty. By training the student to match these predictions, the model is encouraged to internalize the hints derived from the rich context. The model distills information available at training time without requiring an external teacher.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J415!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J415!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png 424w, https://substackcdn.com/image/fetch/$s_!J415!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png 848w, https://substackcdn.com/image/fetch/$s_!J415!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png 1272w, https://substackcdn.com/image/fetch/$s_!J415!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J415!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J415!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png 424w, https://substackcdn.com/image/fetch/$s_!J415!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png 848w, https://substackcdn.com/image/fetch/$s_!J415!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png 1272w, https://substackcdn.com/image/fetch/$s_!J415!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F583df4cd-ecbb-491b-8d4a-33b4e080e9cd_1600x893.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">LLM self-distillation</figcaption></figure></div><p>When combined with methods like <a href="https://bdtechtalks.com/2025/12/01/reinforcement-learning-for-llms-rlvr/">Reinforcement Learning from Verifiable Rewards</a> (RLVR), a popular training technique that rewards the model when its final output matches an objectively correct answer, self-distillation leads to highly efficient performance gains in agentic environments and scientific reasoning domains. 
In these domains, the approach achieves higher accuracy while compressing the reasoning process, leading to shorter and more effective model responses.</p><p>Yet, these impressive gains do not translate uniformly across all cognitive tasks, as the experiments of the new study show.</p><h2>Testing the limits of self-distillation</h2><p>To investigate the impact of self-distillation on mathematical problem-solving, the researchers ran extensive experiments with several open-weight language models, including a distilled 7B version of <a href="https://bdtechtalks.com/2025/02/10/demystifying-deepseek-r1-the-model-that-shocked-the-ai-industry/">DeepSeek-R1</a>, Qwen3-8B, and Olmo3-7B-Instruct.</p><p>The models were trained using the <a href="https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed">DAPO-Math-17k</a> dataset, which contains thousands of mathematical problems. To test out-of-distribution generalization, they evaluated the fine-tuned checkpoints on unseen or more challenging math benchmarks, including AIME24, AIME25, AMC23, and MATH500.</p><p>The researchers compared Group Relative Policy Optimization (GRPO) against Reinforcement Learning via Self-Distillation (SDPO). 
They also conducted off-policy supervised fine-tuning experiments, contrasting models trained on standard unguided responses against models trained on concise, solution-guided responses.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!51hj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!51hj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png 424w, https://substackcdn.com/image/fetch/$s_!51hj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png 848w, https://substackcdn.com/image/fetch/$s_!51hj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png 1272w, https://substackcdn.com/image/fetch/$s_!51hj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!51hj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png" width="701" height="169" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:169,&quot;width&quot;:701,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!51hj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png 424w, https://substackcdn.com/image/fetch/$s_!51hj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png 848w, https://substackcdn.com/image/fetch/$s_!51hj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png 1272w, https://substackcdn.com/image/fetch/$s_!51hj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a2a1825-dda9-4c1f-bc9f-a155539bec80_701x169.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">LLM self-distillation results (source: arXiv)</figcaption></figure></div><p>The baseline GRPO consistently yielded modest performance gains on the out-of-distribution benchmarks. It also prompted a slight increase in response length.</p><p>SDPO resulted in a sharp drop in response length and led to substantial performance degradation. 
Performance dropped by roughly 40% on the AIME24 benchmark and 15% on AMC23. The off-policy experiments mirrored this trend. Training on concise, solution-guided trajectories drastically degraded benchmark scores, even though the training dataset consisted entirely of correct mathematical traces.</p><p>The researchers also observed very different outcomes when they varied the size and diversity of the training tasks. When trained on a small number of questions, ranging from 1 to 128 problems, SDPO proved highly efficient. It achieved high training scores while compressing response lengths by up to eight times compared to GRPO.</p><p>However, as the task coverage expanded to include hundreds or thousands of diverse problems, the dynamic reversed entirely. GRPO&#8217;s out-of-distribution performance scaled consistently as the dataset grew. SDPO struggled to accommodate the broader range of reasoning patterns, resulting in severe performance drops on evaluation benchmarks when trained on the larger problem sets.</p><h2>The importance of expressing uncertainty</h2><p>To understand the root cause of these performance drops, the researchers focused on &#8220;epistemic verbalization,&#8221; the model&#8217;s explicit expression of uncertainty during its reasoning process with tokens such as &#8220;wait,&#8221; &#8220;hmm,&#8221; &#8220;perhaps,&#8221; and &#8220;maybe.&#8221;</p><p>Large language models do not plan their entire answer in advance; they generate it one token at a time, with each token conditioned on the text produced so far. Tokens like &#8220;wait&#8221; or &#8220;perhaps&#8221; act as functional computational steps for the model.</p><p>When a model verbalizes uncertainty, it keeps alternative hypotheses open and reduces its uncertainty gradually. Conversely, when this behavior is artificially suppressed, the model loses the capacity to iteratively refine its beliefs. 
It prematurely commits to incorrect hypotheses with limited opportunity for recovery or self-correction.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C2IG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C2IG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png 424w, https://substackcdn.com/image/fetch/$s_!C2IG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png 848w, https://substackcdn.com/image/fetch/$s_!C2IG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png 1272w, https://substackcdn.com/image/fetch/$s_!C2IG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C2IG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png" width="704" height="129" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:129,&quot;width&quot;:704,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C2IG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png 424w, https://substackcdn.com/image/fetch/$s_!C2IG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png 848w, https://substackcdn.com/image/fetch/$s_!C2IG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png 1272w, https://substackcdn.com/image/fetch/$s_!C2IG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70e4a268-2245-48fb-b1a0-9f803b4e76cf_704x129.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The researchers found that self-distillation inherently suppresses these epistemic signals due to the highly informative context provided to the teacher model. 
When the teacher possesses the final correct solution, it generates a reasoning trajectory filled with strong hints and minimal expressed uncertainty.</p><p>By forcing the student model to mimic this output, the training process encourages the student to imitate a highly confident reasoning style that presupposes information it lacks at inference time. As the conditioning context becomes richer, the model generates answers more confidently and systematically strips away its own epistemic verbalizations.</p><p>This trained removal of uncertainty has implications for out-of-distribution generalization. The value of epistemic verbalization scales directly with the generalization demands of a task. When task coverage is narrow and repetitive, suppressing uncertainty enables rapid optimization and efficiency.</p><p>Yet, as task diversity increases, the aggressive removal of epistemic markers actively interferes with the model&#8217;s ability to optimize across diverse tasks. 
Stripped of its exploratory mechanisms, the model cannot successfully navigate unseen and challenging problems.</p><h2>Practical implications</h2><p>If you want to train a self-distilled model or use one, the core tradeoff you must consider is response efficiency versus generalizable reasoning capability. Self-distillation can impressively compress a model&#8217;s response length, as shown in the experiments. It filters out unnecessary verbosity and significantly drives down inference compute costs. But this compression directly risks eliminating the vital signals that models rely on to self-correct and adjust hypotheses mid-generation.</p><p>You can comfortably apply self-distillation in narrow, well-defined domains where the task coverage is limited, familiar, or highly repetitive. For instance, the researchers found that self-distillation is very effective in specific scientific domains like chemistry or specialized coding environments. In these datasets, the underlying problem structures remain very similar, even if surface details change. In these precise scenarios, explicit expressions of uncertainty are largely redundant. They can be safely removed to make responses faster and potentially more accurate.</p><p>Conversely, developers should avoid relying heavily on self-distillation in broad, complex domains that demand strong out-of-distribution generalization beyond the initial training examples. When a model must handle a vast array of unseen, non-overlapping problem types, preserving its ability to express uncertainty and iteratively refine its beliefs is critical for success. If self-distillation is applied to these broad problem sets, the aggressive, trained removal of epistemic signals acts as a straitjacket. 
It actively interferes with the model&#8217;s ability to adapt to new challenges, fundamentally capping its reasoning potential.</p>]]></content:encoded></item><item><title><![CDATA[The art of AI harness engineering]]></title><description><![CDATA[The recent leak of Anthropic's Claude Code reveals a hard truth: as LLMs become commoditized, the sophisticated engineering harness built around them is becoming the real moat.]]></description><link>https://bdtechtalks.substack.com/p/the-art-of-ai-harness-engineering</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/the-art-of-ai-harness-engineering</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 07 Apr 2026 13:44:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0F9r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0F9r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0F9r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0F9r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!0F9r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0F9r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0F9r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0F9r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0F9r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!0F9r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0F9r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b07d7f4-7cd7-480b-a21b-dd884c090733_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In March 2026, a simple packaging error exposed roughly 512,000 lines of <a 
href="https://venturebeat.com/technology/claude-codes-source-code-appears-to-have-leaked-heres-what-we-know">Anthropic&#8217;s Claude Code</a>. Version 2.1.88 of the npm package shipped with an unobfuscated source map file, revealing the complete TypeScript architecture of the company&#8217;s flagship coding assistant. Among other things, this leak shattered the popular narrative that AI models are becoming so advanced they can do anything out of the box with simple prompts.</p><p>What the source code revealed was not a thin wrapper around a language model. It was a sophisticated harness: a complex orchestration layer, test-time reasoning loops, and persistent memory systems that act as the operating system for the AI agent.</p><p>The reality of production-grade AI applications contradicts the belief that AI will replace developers. The raw model is merely a component. The true moat, and the key to building reliable AI software, lies in the engineered scaffolding built around it.</p><h2>The anatomy of a brilliant harness: Inside Claude Code</h2><p>Claude Code is designed to compensate for the key limitations of the underlying models and to wrap them in a robust system that developers and users can apply to different tasks.</p><p>At the core of Claude Code sits a self-healing query loop built as a state machine. Every AI model has a context window, a hard limit on the amount of text it can process at one time. 
Dumping an entire project&#8217;s history into this window balloons token costs and causes the model to lose track of information and hallucinate.</p><p>To prevent this, the Claude Code query loop dynamically manages state across iterations. It automatically compacts messages to free up tokens. If the model exhausts its output budget mid-task, the harness silently injects instructions to resume without apology. If a tool fails, it steps through a sequence of recovery strategies. The loop absorbs these failures so the user never sees them.</p><p>Claude Code also addresses the ephemeral nature of AI memory through a process that mimics how humans solidify memory when they sleep. Normally, when you close a terminal session, the model forgets your architecture decisions, build commands, and coding patterns. Claude Code solves this with a background daemon called autoDream. After 24 hours of inactivity and at least five sessions, this subagent wakes up. It reads the project&#8217;s memory directory, consolidates learnings, deletes contradictions, and rewrites the memory index. 
It organizes past context while the developer sleeps so the next session starts faster and with accurate recall.</p><p>The harness enforces strict constraints. Instead of giving the model raw shell access, which is noisy and dangerous, Claude Code provides opinionated, validated tools that run in concurrency-safe batches. Anthropic also built compile-time feature elimination to prevent internal experimental tools from reaching external users.</p><p>The irony of the leak is that this code was removed from the executable binary through dead-code elimination, but the standard build pipeline failed to exclude the source map. Even when orchestrating frontier AI models, standard software engineering practices and build configurations dictate success or failure.</p><h2>The Pareto shift: Industry proof that scaffolds win</h2>
      <p>
          <a href="https://bdtechtalks.substack.com/p/the-art-of-ai-harness-engineering">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How GhostClaw exploits macOS and OpenClaw to steal developer credentials]]></title><description><![CDATA[As developers rush to run local AI agents on Mac Minis, GhostClaw malware exploits macOS binaries to silently harvest credentials.]]></description><link>https://bdtechtalks.substack.com/p/how-ghostclaw-exploits-macos-and</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/how-ghostclaw-exploits-macos-and</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 31 Mar 2026 13:28:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RFi3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RFi3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RFi3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RFi3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RFi3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!RFi3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RFi3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RFi3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RFi3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RFi3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!RFi3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb215277-c4ab-4812-99f7-d78d237c47af_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Threat actors are exploiting the rapid adoption of AI agents by designing malware that targets the agent itself. 
A new malware campaign, known as GhostClaw or GhostLoader, targets AI-assisted workflows and GitHub repositories to deliver credential-stealing payloads.</p><p>First discovered by <a href="https://research.jfrog.com/post/ghostclaw-unmasked/">JFrog Security Research</a> and later analyzed by <a href="https://www.jamf.com/blog/ghostclaw-ghostloader-malware-github-repositories-ai-workflows/">Jamf Threat Labs</a>, GhostClaw represents a new vector in software supply chain attacks. Instead of exclusively relying on human developers to download malicious packages, the operators build traps for AI agents like OpenClaw to trigger autonomously. Once executed, the malware establishes a persistent Remote Access Trojan (RAT), harvesting system credentials, browser data, developer tokens, and cryptocurrency wallets.</p><p>The campaign preys on the high-level system permissions developers grant to local AI agents. GhostClaw shows how the bot is becoming the primary attack surface and should be a wake-up call for development teams relying on these frameworks to automate coding tasks.</p><h2>The mechanics of GhostClaw</h2><p>To understand how GhostClaw operates, you first need to look at how developers deploy new AI tools. OpenClaw is an open-source AI agent that acts as an autonomous, always-on coding assistant. Because it requires significant compute power to run local models continuously, its popularity has sparked a global surge in Mac Mini sales. Developers use Apple&#8217;s unified memory architecture to host these resource-heavy local AI servers.</p>
<p>GhostClaw operators designed their campaign to target this specific environment. The malware is heavily optimized for macOS, using native AppleScript and local directories to blend into the background.</p><p>The attack begins with social engineering. Attackers stage GitHub repositories that impersonate legitimate developer utilities, trading bots, or AI plugins. To avoid immediate detection, they leave these repositories benign for an incubation period of five to seven days. During this time, they gather stars and build follower counts to project credibility. After establishing trust, they swap in the malicious payload.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7IIu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7IIu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png 424w, https://substackcdn.com/image/fetch/$s_!7IIu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png 848w, 
https://substackcdn.com/image/fetch/$s_!7IIu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png 1272w, https://substackcdn.com/image/fetch/$s_!7IIu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7IIu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png" width="1280" height="746" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:746,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7IIu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png 424w, https://substackcdn.com/image/fetch/$s_!7IIu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png 848w, 
https://substackcdn.com/image/fetch/$s_!7IIu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png 1272w, https://substackcdn.com/image/fetch/$s_!7IIu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc06e57ef-dfc6-423f-b94a-99ec1ac52c19_1280x746.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For developers using AI frameworks, the trap is set within a SKILL.md file. 
AI agents use skills to interact with the outside world, such as reading local files, executing shell commands, or managing emails. The SKILL.md format defines these external capabilities for the agent.</p><p>In the GhostClaw repositories, the SKILL.md file contains no malicious code. It defines benign metadata, dependencies, and commands. However, when an AI agent or a human developer follows the repository&#8217;s setup instructions (usually by running an install.sh script or installing dependencies via a package manager), they trigger a multi-stage infection.</p><p>JFrog researchers observed this behavior in malicious npm packages masquerading as OpenClaw installers. The package configuration appears normal, and the exposed source code contains harmless decoy utilities. The actual malware hides in installation scripts that run automatically. Using a postinstall hook, a feature of npm (Node Package Manager) that runs a package&#8217;s script automatically once the package has been installed, the script silently reinstalls the package globally and places a malicious binary on the system path.</p><p>From there, an obfuscated first-stage dropper takes over. It checks the host architecture, verifies the macOS version, and checks whether Node.js is installed. If Node.js is missing, the script downloads it using the curl command with the -k flag. This flag tells curl to skip Transport Layer Security (TLS) certificate verification, allowing the download to proceed even when the server&#8217;s identity cannot be verified.</p><p>While skipping TLS verification might seem like an advanced evasion tactic, it points to a lack of sophisticated infrastructure. Setting up proper TLS certificates requires time and resources.</p><p>&#8220;Honestly, in most cases this comes down to operational laziness on the attacker&#8217;s side rather than any deliberate evasion strategy,&#8221; Jaron Bradley, Director at Jamf Threat Labs, told TechTalks. 
&#8220;Setting up proper TLS verification requires a level of infrastructure investment that many threat actors simply don&#8217;t bother with &#8212; especially when curl -k gets the job done just as well for their purposes.&#8221;</p><p>To maintain the illusion of a benign package, the malware actively works to keep its victim unsuspecting. The dropper executes a branded, terminal-based installation experience complete with fake progress bars. This manufactured output gives the impression that the OpenClaw agent or the promised developer tool is successfully installing on the host. By simulating a standard, time-consuming dependency installation, the script lulls the developer (or the AI agent monitoring the command-line logs) into a false sense of security right before it launches its credential-stealing prompts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wIAY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wIAY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png 424w, https://substackcdn.com/image/fetch/$s_!wIAY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png 848w, https://substackcdn.com/image/fetch/$s_!wIAY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png 1272w, 
https://substackcdn.com/image/fetch/$s_!wIAY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wIAY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png" width="1280" height="885" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:885,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wIAY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png 424w, https://substackcdn.com/image/fetch/$s_!wIAY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png 848w, https://substackcdn.com/image/fetch/$s_!wIAY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png 1272w, 
https://substackcdn.com/image/fetch/$s_!wIAY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6b5062b-914d-4c7b-b6c0-6ab637b5de6e_1280x885.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Defending the AI workflow</h2><p>GhostClaw&#8217;s execution phase highlights why traditional security tools struggle to detect modern malware. The operators use a technique known as &#8220;Living off the Land&#8221; (LotL). 
Instead of dropping custom executable files that antivirus software can easily flag, LotL attacks use legitimate, pre-installed operating system tools to carry out malicious actions.</p><p>On UNIX-based platforms like macOS, administrators rely on thousands of native binaries for standard scripting and management. GhostClaw exploits two specific tools: dscl and osascript.</p><p>The dscl (Directory Service command line utility) tool natively creates, reads, and manages directory data. GhostClaw abuses it to silently validate system passwords behind the scenes. The malware also uses osascript, a tool for executing AppleScript, to generate native-looking prompts that trick users into handing over their credentials.</p><p>Because system administrators and power users run heavy, script-based workflows using these exact binaries, the malicious activity looks nearly identical to legitimate work. This overlap makes it difficult for Endpoint Detection and Response (EDR) tools to flag the behavior without generating excessive false positives.</p><p>To detect this activity, security engineers must analyze the full execution chain.</p><p>&#8220;When you see [dscl] being invoked in a context that doesn&#8217;t align with normal admin workflows, that&#8217;s a reliable signal worth investigating,&#8221; Bradley said. &#8220;Security engineers should be looking at the full execution chain: what spawned the process, what user context it ran under, and whether it&#8217;s showing up alongside other suspicious behaviors like unusual osascript prompts or outbound connections shortly after.&#8221;</p>
<p>For individual developers, defense requires a shift in daily habits. Attackers actively exploit the &#8220;set it and forget it&#8221; mentality inherent in package management. Developers rarely re-examine an open-source package after the initial installation. Preventing an infection requires vetting repositories before running their setup scripts. This means checking the commit history for sudden anomalies, reviewing the code in setup scripts, and remaining skeptical of projects that accumulate stars rapidly without corresponding development activity.</p><p>Developers operating outside the Apple ecosystem must also remain vigilant. While the current GhostClaw campaign is heavily optimized for macOS directories, the malware contains logic to perform automated actions if it discovers it is running on a Windows system. The vectors themselves (GitHub, npm, and AI agents) are entirely platform-agnostic.</p><p>&#8220;The malware was also equipped to perform automated actions if it discovered it was running on a Windows system,&#8221; Bradley said.</p><h2>The evolving attack surface of AI agents</h2><p>The GhostClaw campaign demonstrates a fundamental vulnerability in the current trajectory of AI development. As developers transition from reactive coding copilots to autonomous agents, the security paradigm is fracturing.</p><p>AI agents execute actions on behalf of the user. To function effectively, they require deep system permissions, including shell access, file system control, and browser manipulation. 
When developers grant an agent these privileges, they implicitly trust that the agent will only execute safe commands.</p><p>GhostClaw flips this dynamic. By hiding malicious execution chains within the standard setup workflows that agents follow, the malware bypasses the human entirely. Tricking the agent achieves the exact same result as tricking the developer.</p><p>&#8220;This shift is already underway,&#8221; notes Bradley. &#8220;Autonomous coding agents have shown enormous promise, and that promise has attracted a wave of developers eager to adopt them &#8212; often before the security implications have been fully thought through. When adoption outpaces security, attackers notice.&#8221;</p><p>This problem is compounded by widespread misconfigurations. Early adopters are eager to deploy autonomous agents but frequently overlook the security implications. Security researchers have already identified <a href="https://www.bitsight.com/blog/openclaw-ai-security-risks-exposed-instances">thousands of OpenClaw instances</a> accidentally exposed to the open internet, running with default settings that listen for external connections without authentication.</p><p>When developers grant an AI agent root access to their machine, any malicious instruction the agent ingests becomes a system-level threat. Autonomous coding agents hold significant promise for productivity, but until the industry implements strict sandboxing and permission controls for AI workflows, the bot will remain a highly lucrative attack surface for threat actors.</p><p>&#8220;As long as that gap exists, we&#8217;ll absolutely see more of this,&#8221; Bradley said. 
&#8220;Tricking the agent is tricking the human &#8212; the bot becomes the attack surface precisely because developers trust its output implicitly.&#8221;</p>]]></content:encoded></item><item><title><![CDATA[Inside V-JEPA 2.1, the huge upgrade to Meta's world model]]></title><description><![CDATA[AI models have historically struggled to balance motion tracking with spatial detail. Meta&#8217;s V-JEPA 2.1 solves this, pushing the boundaries of video self-supervised learning.]]></description><link>https://bdtechtalks.substack.com/p/inside-v-jepa-21-the-huge-upgrade</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/inside-v-jepa-21-the-huge-upgrade</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Wed, 25 Mar 2026 14:41:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!EZEb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EZEb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EZEb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EZEb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!EZEb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EZEb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EZEb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:234365,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/192101246?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EZEb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!EZEb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EZEb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EZEb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcb1d199-10fe-42f1-847f-62e851a99058_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>Researchers at Meta have released V-JEPA 2.1, the latest iteration of their video world model. Current artificial intelligence models often struggle to simultaneously capture the global dynamics of a video and the fine-grained, local spatial details necessary for precise physical interactions. <a href="https://arxiv.org/abs/2603.14482">V-JEPA 2.1</a> bridges this gap by introducing several innovations to its architecture, training recipe, and training data.</p><p>Experiments show that V-JEPA 2.1 delivers more accurate results in robotic grasping, predicting object interactions, and estimating 3D depth, along with much faster planning in autonomous navigation of the physical world. These are the kinds of advances that can unlock new applications for AI in the physical world.</p><h2>The state estimation challenge in world models</h2><p>To navigate the unpredictable physical world, AI systems need <a href="https://bdtechtalks.com/tag/world-models/">world models</a> that enable them to perceive their environment, predict future outcomes, and plan their actions effectively. At the core of building these world models is the &#8220;state estimation&#8221; problem: the AI must learn how to take noisy, low-level perceptual inputs, such as raw pixels from a camera feed, and translate them into a reliable, structured summary of the current world state.</p><p>One of the key approaches for solving the state estimation challenge is <a href="https://bdtechtalks.com/2020/03/23/yann-lecun-self-supervised-learning/">self-supervised learning</a> from video. Instead of relying on humans to painstakingly label every object, depth layer, or action in a dataset, self-supervised learning allows models to learn directly from the raw data itself. 
If done properly, these models can naturally learn rich representations that capture the fundamental rules of reality, such as scene geometry, object dynamics, and intrinsic physical properties.</p><p>Despite the rapid progress in this field, a major hurdle remains. It is incredibly difficult to train a model that simultaneously captures the global dynamics of a scene, which is needed for high-level action recognition, while preserving fine-grained, dense spatio-temporal structures, which are needed for precise tracking, localization, and geometry throughout visual sequences (e.g., video input).</p><p>Currently, the AI community relies on two diverging approaches, each with a significant limitation. On one side are video-first models, like the earlier <a href="https://bdtechtalks.com/2025/04/28/v-jepa-intuitive-physics/">V-JEPA family</a>. Joint Embedding Predictive Architectures (JEPA) have proven highly effective at global video understanding. They excel in environments that require modeling motion and dynamics, making them incredibly promising for embodied agents that need to predict and plan future actions. 
Their major weakness is that their learned representations struggle to extract fine-grained, local spatial structures, making them less suited for tasks that require understanding pixel-perfect details (e.g., understanding the clear boundaries between objects).</p><p>On the other end of the spectrum are image-first models, such as <a href="https://ai.meta.com/research/dinov3/">DINO</a>. These image-based approaches yield high-quality, dense features that are perfect for precise <a href="https://bdtechtalks.com/2021/06/21/object-detection-deep-learning/">object detection and segmentation</a>. Because these models are primarily trained on static images, they do not directly learn temporal dynamics or motion from video. Developers are left to choose between an AI that understands how things move but lacks precise spatial detail, and an AI that understands where things are but does not grasp motion over time.</p><h2>How V-JEPA 2 works</h2><p>To understand the advances behind V-JEPA 2.1, we first need to look at how its predecessor works. V-JEPA 2 relies on a mask-denoising objective to learn latent representations. Imagine taking a video, chopping it into tiny patches across space and time, and hiding a large chunk of those patches from the model. 
The model is then tasked with predicting the abstract, mathematical representations of the hidden patches using only the patches it can still see.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W_fi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W_fi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png 424w, https://substackcdn.com/image/fetch/$s_!W_fi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png 848w, https://substackcdn.com/image/fetch/$s_!W_fi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png 1272w, https://substackcdn.com/image/fetch/$s_!W_fi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W_fi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png" width="1456" height="682" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:682,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W_fi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png 424w, https://substackcdn.com/image/fetch/$s_!W_fi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png 848w, https://substackcdn.com/image/fetch/$s_!W_fi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png 1272w, https://substackcdn.com/image/fetch/$s_!W_fi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b0f5148-cd49-408f-80c3-3fecd5939a53_1600x750.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>It does this using a two-part system. First, an encoder processes the visible patches into context tokens. Next, a predictor takes these context tokens, combines them with blank mask tokens that carry information about exactly where and when the missing patches belong, and tries to output the correct representation for those missing pieces. The predictions are then compared with the representations the encoder produces for those same patches when it sees the original, unmasked video.</p><p>During training, the system uses a loss function to penalize the model for incorrect predictions. In V-JEPA 2, this loss is applied only to the masked tokens. The model receives no supervision or correction on how it encodes the visible context tokens. Because the model isn&#8217;t explicitly forced to ground those visible patches in their exact local, spatial reality, it takes shortcuts that cause it to miss important details in the videos, such as the boundaries between objects. 
This results in grainy segmentation of the objects.</p><h2>The innovations behind V-JEPA 2.1</h2><p>To fix the shortcut problem of previous models, researchers rebuilt the architecture with four major innovations. First, instead of only evaluating the model on the hidden video patches, V-JEPA 2.1 applies a &#8220;dense predictive loss&#8221; to all tokens. Both the masked pieces it needs to predict and the visible pieces it uses for context are supervised. This forces the model to ground every single token in its precise spatial and temporal location and learn higher quality representations.</p><p>Normally, a model&#8217;s loss is only calculated at the very end of its processing network. V-JEPA 2.1 introduces &#8220;deep self-supervision,&#8221; which applies this loss hierarchically at multiple intermediate layers of the encoder. This allows local spatial information to flow more effectively into the final layers, improving performance across both fine-grained and high-level vision tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OT0W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OT0W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png 424w, https://substackcdn.com/image/fetch/$s_!OT0W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png 848w, 
https://substackcdn.com/image/fetch/$s_!OT0W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png 1272w, https://substackcdn.com/image/fetch/$s_!OT0W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OT0W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png" width="1456" height="659" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:659,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OT0W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png 424w, https://substackcdn.com/image/fetch/$s_!OT0W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png 848w, 
https://substackcdn.com/image/fetch/$s_!OT0W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png 1272w, https://substackcdn.com/image/fetch/$s_!OT0W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff943b048-4728-4ced-be23-2a4f6b6e9d7c_1600x724.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Earlier video models handled static images awkwardly by essentially duplicating an image 16 times to mimic a video clip, a method that 
wasted compute power and confused the model by treating static images as video. V-JEPA 2.1 introduces &#8220;modality-specific tokenizers,&#8221; using a 2D processor for images and a 3D processor for video. Both feed into a single, shared encoder that can handle both formats in their native form.</p><p>Finally, the researchers showed that these architectural upgrades scale effectively. They expanded the training data to 163 million images and videos, and increased the model&#8217;s size from 300 million to 2 billion parameters. This combination led to across-the-board performance gains in real-world downstream applications.</p><p>The V-JEPA 2.1 family includes multiple variants trained on this massive dataset. The flagship model is ViT-G, featuring 2 billion parameters and offering the best performance. A slightly smaller but still highly capable model, ViT-g, operates with 1 billion parameters. The researchers also used <a href="https://bdtechtalks.com/2023/09/18/what-is-llm-compression/">model distillation</a> to compress the knowledge of the massive ViT-G model into smaller, highly efficient variants. 
These include ViT-L, a distilled model with 300 million parameters, and ViT-B, the most lightweight variant with 80 million parameters.</p><h2>V-JEPA 2.1 in action</h2><p>The researchers evaluated V-JEPA 2.1 against the industry&#8217;s best visual AI models, including its predecessor V-JEPA 2, video models like InternVideo2, and image foundation models like DINOv2 and the 7-billion parameter DINOv3.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XcLB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XcLB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png 424w, https://substackcdn.com/image/fetch/$s_!XcLB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png 848w, https://substackcdn.com/image/fetch/$s_!XcLB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png 1272w, https://substackcdn.com/image/fetch/$s_!XcLB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XcLB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png" 
width="1456" height="590" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:590,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XcLB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png 424w, https://substackcdn.com/image/fetch/$s_!XcLB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png 848w, https://substackcdn.com/image/fetch/$s_!XcLB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png 1272w, https://substackcdn.com/image/fetch/$s_!XcLB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51bcc5e8-d60e-4e70-9121-b0253015a426_1600x648.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The model was deployed zero-shot on a tabletop Franka Panda robotic arm to perform reach, grasp, and pick-and-place tasks. V-JEPA 2.1 achieved a 20 percent improvement in grasping success rate over V-JEPA 2. Previous models failed because they could not fully comprehend depth, leading them to close their grippers too early or open them mid-transit and drop the object. V-JEPA 2.1&#8217;s rich, pixel-level depth understanding allows robots to physically interact with objects fluidly and reliably.</p><p>In autonomous navigation tests, the model was tasked with moving toward a visual goal using latent world models on datasets like TartanDrive, SCAND, and SACSoN. V-JEPA 2.1 achieved state-of-the-art trajectory accuracy on TartanDrive, while planning roughly 10 times faster than the previous best. It reduced the required internal simulation steps from 128 down to just 8, dropping planning time from over 100 seconds to 10.6 seconds. 
This has important implications for autonomous navigation applications, where speed and precision can make a huge difference, such as autonomous drone rescue.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wGVP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wGVP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png 424w, https://substackcdn.com/image/fetch/$s_!wGVP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png 848w, https://substackcdn.com/image/fetch/$s_!wGVP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png 1272w, https://substackcdn.com/image/fetch/$s_!wGVP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wGVP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png" width="986" height="332" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:332,&quot;width&quot;:986,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wGVP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png 424w, https://substackcdn.com/image/fetch/$s_!wGVP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png 848w, https://substackcdn.com/image/fetch/$s_!wGVP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png 1272w, https://substackcdn.com/image/fetch/$s_!wGVP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17416c54-1368-4f01-a7b7-52bc8a1d9069_986x332.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>They then tested the model on forecasting human actions using first-person video datasets like Ego4D and EPIC-KITCHENS-100. V-JEPA 2.1 showed large improvements over incumbents on both benchmarks, marking a 35 percent relative improvement over the previous state of the art on Ego4D and setting a new record on EPIC-KITCHENS-100. This translates well to augmented reality applications and collaborative AI, where an assistant can predict human actions and provide real-time information or interventions exactly when needed.</p><p>When testing the model&#8217;s ability to map 3D geometric structures from 2D images using the NYUv2 dataset and define strict object boundaries on the ADE20K dataset, V-JEPA 2.1 improved substantially over V-JEPA 2. It even outperformed the much larger DINOv3 model on depth estimation. This is important for self-driving cars and mixed-reality headsets. 
A vehicle needs to instantly distinguish between a flat painting of a person on the side of a truck and a real 3D pedestrian crossing the street.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oDfW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oDfW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png 424w, https://substackcdn.com/image/fetch/$s_!oDfW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png 848w, https://substackcdn.com/image/fetch/$s_!oDfW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png 1272w, https://substackcdn.com/image/fetch/$s_!oDfW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oDfW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png" width="1456" height="699" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:699,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oDfW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png 424w, https://substackcdn.com/image/fetch/$s_!oDfW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png 848w, https://substackcdn.com/image/fetch/$s_!oDfW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png 1272w, https://substackcdn.com/image/fetch/$s_!oDfW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec4a03a1-a281-45e8-89e4-0ef5704de05f_1545x742.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The model was also evaluated on tracking a specific object across frames using the YouTube-VOS dataset and global action recognition via the Something-Something-v2 benchmark. The model performed impressively and set a new record on Something-Something-v2. In applications like sports broadcasting or security surveillance, subjects move fast, change shape, and become temporarily hidden behind other objects. V-JEPA 2.1 features are temporally consistent enough that, for example, a camera system could lock onto a specific hockey player and maintain a flawless tracking mask through rapid camera pans and visual distractions.</p><p>There is more work to be done. The current iteration of V-JEPA 2.1 is heavily focused on learning better visual representations. 
While V-JEPA 2 explored building complete world models on top of these representations, fully realizing world models with the new dense prediction capabilities of V-JEPA 2.1 remains an ongoing area of research.</p><p>The researchers have made <a href="https://github.com/facebookresearch/vjepa2">their code</a> and pretrained models publicly available to facilitate further research and applications. As the team notes, &#8220;We hope that these contributions will foster research in learning strong representations for physical world modelling, while empowering many applications in video understanding&#8221;.</p>]]></content:encoded></item><item><title><![CDATA[AI won't kill SaaS, but major shifts are coming]]></title><description><![CDATA[The recent tech selloff sparked fears of a SaaSpocalypse. Here is why the death of software subscriptions is a myth, and how AI agents are creating a developer boom.]]></description><link>https://bdtechtalks.substack.com/p/ai-wont-kill-saas-but-major-shifts</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/ai-wont-kill-saas-but-major-shifts</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 17 Mar 2026 14:13:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pzmh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pzmh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!pzmh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!pzmh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!pzmh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!pzmh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pzmh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:311558,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/191255842?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pzmh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!pzmh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!pzmh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!pzmh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85016d42-4bec-47c9-a016-6db548aa816c_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>In early February 2026, the release of Anthropic&#8217;s Claude Cowork triggered a <a href="https://fortune.com/2026/02/06/anthropic-claude-opus-4-6-stock-selloff-new-upgrade/">massive selloff</a> in the technology sector, wiping $285 billion off the market capitalization of major software stocks in a single day. Investors panicked. The prevailing narrative suggested that AI agents and <a href="https://bdtechtalks.com/2025/04/09/demystifying-vibe-coding/">vibe coding</a> would immediately eradicate the need for traditional software as a service (SaaS).</p><p>The market assumed that anyone with an internet connection could simply ask a large language model to generate a custom enterprise resource planning system or communication platform on the fly. This sentiment has ebbed and flowed since the release of ChatGPT.</p><p>A closer look at the very companies building these revolutionary models reveals a different reality. The leading artificial intelligence laboratories still rely heavily on established SaaS products to run their daily operations (the CEOs of both <a href="https://www.businessinsider.com/openai-sam-altman-slack-his-most-used-app-not-chatgpt-2024-1">OpenAI</a> and <a href="https://www.youtube.com/shorts/HAQTMMzHUd8">Anthropic</a> are on record saying their organizations use Slack). They have access to the most advanced code-generation tools on the planet, yet they continue to pay for off-the-shelf software. 
They do this because enterprise software involves much more than generating a functional user interface.</p><p>The immediate panic ignored the structural realities of enterprise IT, but it did signal a genuine, permanent shift in the software market. SaaS is not going away, but the dynamics will surely change, and software companies need to adapt.</p><h2>The shifting economics of buy versus build</h2><p>The tension between buying pre-packaged software and building custom internal tools always comes down to fundamental economics. Historically, SaaS provided immense value because building proprietary software was prohibitively expensive.</p><p>A standard communication platform like Slack or project management tool like Notion typically costs around $20 per seat every month. A company with 100 employees spends roughly $24,000 annually for that single service. Developing, deploying, and maintaining a proprietary alternative requires hiring a team of dedicated developers, securing server architecture, and managing continuous updates. 
In the pre-AI era, the million-dollar price tag of an in-house build made the $24,000 SaaS subscription an obvious, unavoidable business expense.</p><p>Artificial intelligence fundamentally changes the math behind this calculation. Generative coding tools drastically reduce the initial friction and financial burden of software development. Industry data from AppDirect shows that vibe coding and AI-enabled development can drive <a href="https://www.appdirect.com/blog/build-vs-buy-software-how-ai-enabled-software-development-and-vibe-coding-are-changing-the-game#tl;dr">up to a 70% reduction</a> in overall development costs. This cost compression shifts the breakeven point between buying and building.</p><p>Consider a mid-market organization with 300 employees relying on a standard enterprise stack of communication, customer relationship management, human resources, and project management tools. If these platforms together cost $150 per seat each month, the company faces an annual software expenditure of $540,000.</p><p>With AI lowering the barrier to entry, that same company can alter its strategy. Instead of renewing expensive vendor contracts, the organization can hire two highly experienced software engineers to act as AI orchestrators. Paying those engineers fully loaded salaries totaling $320,000, plus an estimated $60,000 annually for cloud hosting and API token consumption, brings the total annual in-house cost to $380,000. On these figures, the company saves roughly $160,000 a year while gaining fully customized, compliant internal tools. The massive profit margins traditional SaaS companies have enjoyed for the last decade face severe downward pressure as these alternatives become accessible.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/unclebobmartin/status/2031354684145873192&quot;,&quot;full_text&quot;:&quot;AI agents have vastly changed the build vs buy calculus. 
The vast majority of tools can be built at virtually no cost.&quot;,&quot;username&quot;:&quot;unclebobmartin&quot;,&quot;name&quot;:&quot;Uncle Bob Martin&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1985766360924704768/wtFmI695_normal.jpg&quot;,&quot;date&quot;:&quot;2026-03-10T13:01:12.000Z&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:85,&quot;retweet_count&quot;:22,&quot;like_count&quot;:326,&quot;impression_count&quot;:32403,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><h2>The middle-market void and the rise of CSaaS</h2><p>The transition away from standard SaaS will not happen uniformly across the business landscape. Massive enterprises possess the capital and infrastructure to bring software development entirely in-house. Meanwhile, very small companies and early-stage startups will likely stick with cheap, off-the-shelf SaaS products because their SaaS costs still don&#8217;t justify building and running their own software, and dedicating any internal resources to software maintenance remains a distraction from their core business.</p>
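<p>To see why headcount matters so much to this split, the buy-versus-build arithmetic from the mid-market example can be reproduced in a few lines. This is only a rough sketch: the function names (annual_saas_cost, annual_build_cost, breakeven_seats) are hypothetical, the inputs are the article&#8217;s own example figures, and it assumes the in-house cost stays flat regardless of headcount, which real projects rarely manage.</p>

```python
import math


def annual_saas_cost(seats, price_per_seat_month):
    """Yearly subscription spend for a per-seat SaaS stack."""
    return seats * price_per_seat_month * 12


def annual_build_cost(engineer_salaries, hosting_and_tokens):
    """Yearly in-house cost: fully loaded salaries plus cloud and API spend."""
    return engineer_salaries + hosting_and_tokens


def breakeven_seats(build_cost, price_per_seat_month):
    """Smallest headcount at which subscriptions cost at least as much as building.

    Assumption (not from the article): build cost does not grow with headcount.
    """
    return math.ceil(build_cost / (price_per_seat_month * 12))


# Figures taken from the article's 300-employee example.
saas = annual_saas_cost(seats=300, price_per_seat_month=150)
build = annual_build_cost(engineer_salaries=320_000, hosting_and_tokens=60_000)

print(saas)                         # 540000
print(build)                        # 380000
print(saas - build)                 # 160000
print(breakeven_seats(build, 150))  # 212
```

<p>Under these simplified assumptions, a company on the same $150-per-seat stack would need roughly 212 seats before the subscriptions cost more than the two-engineer team, which fits the point that small companies have little financial reason to build.</p>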
      <p>
          <a href="https://bdtechtalks.substack.com/p/ai-wont-kill-saas-but-major-shifts">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How C-JEPA gives AI causal world models]]></title><description><![CDATA[By forcing AI to understand cause and effect instead of just predicting pixels, C-JEPA is laying the groundwork for smarter, more predictable autonomous systems.]]></description><link>https://bdtechtalks.substack.com/p/inside-c-jepa-the-architecture-that</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/inside-c-jepa-the-architecture-that</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 10 Mar 2026 16:17:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!F3FB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F3FB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F3FB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!F3FB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!F3FB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!F3FB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F3FB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/190523167?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F3FB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!F3FB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!F3FB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!F3FB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d5b254c-9c51-4dbe-bbee-38778623f037_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Picture a baseball flying toward a swinging bat. 
A human observer can effortlessly predict that the ball will abruptly change direction and speed upon impact. We possess an intuitive grasp of physics and causality. For artificial intelligence, however, predicting this kind of interaction-dependent object dynamics is incredibly difficult. Learning the causal relations and interactions of objects remains a key challenge for AI systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9PYD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9PYD!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif 424w, https://substackcdn.com/image/fetch/$s_!9PYD!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif 848w, https://substackcdn.com/image/fetch/$s_!9PYD!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif 1272w, https://substackcdn.com/image/fetch/$s_!9PYD!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9PYD!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif" width="640" height="360" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:360,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;baseball bat hitting ball&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="baseball bat hitting ball" title="baseball bat hitting ball" srcset="https://substackcdn.com/image/fetch/$s_!9PYD!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif 424w, https://substackcdn.com/image/fetch/$s_!9PYD!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif 848w, https://substackcdn.com/image/fetch/$s_!9PYD!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif 1272w, https://substackcdn.com/image/fetch/$s_!9PYD!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92b3670c-960c-474d-8240-a742570a0c7f_640x360.gif 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Enter <a href="https://arxiv.org/abs/2602.11389">C-JEPA</a>, a new world model built on the Joint Embedding Predictive Architecture (JEPA). C-JEPA is designed to tackle these exact object dynamics. It embeds a causal inductive bias directly into the learning process, preventing the AI from adopting meaningless shortcuts. By doing so, it enables the model to reason accurately about object interactions and handle counterfactual scenarios.</p><p>In empirical tests, C-JEPA demonstrated significant improvements over other architectures in visual question answering, particularly in counterfactual reasoning (e.g., what happens if the bat misses the ball). Beyond reasoning, C-JEPA offers a highly efficient framework for building interaction-aware AI applications. 
It can execute complex predictive control tasks over eight times faster than standard models, using roughly 1% of the typical input features, which drastically reduces computational and memory overhead.</p><p>While the model still needs to show its mettle in the messiness of the real world, it could be an important milestone for AI in the physical world, such as robotics and self-driving cars.</p><h2>Object-centric world models</h2><p>To understand why C-JEPA matters, we have to look at how AI understands its environment. A &#8220;world model&#8221; serves as an AI system&#8217;s internal simulation of the physical space it operates in. Instead of simply reacting to immediate inputs frame-by-frame, an AI equipped with a world model learns the underlying rules and dynamics of a complex environment. The primary advantage of a world model is that it operates within a compressed &#8220;latent space.&#8221; Rather than trying to predict the future pixel-by-pixel (which is a computationally expensive and brittle process when dealing with high-dimensional observations like video feeds), the model works with abstract, mathematically dense representations. This makes prediction much more efficient and precise.</p>
<p>However, if a model attempts to learn physics directly from raw pixels, it encounters severe visual noise. It might associate the wrong elements with motion or causation, getting confused by a moving shadow, a sudden change in lighting, or background static. For a world model to be effective, it must filter out this noise and focus on what actually drives the scene.</p><p>This is the problem object-centric models solve. Instead of learning representations from uniform grids of pixels, these models abstract the visual data into distinct entities. They parse the world much like humans and animals do, recognizing that a scene consists of separate objects (e.g., a ball, a block, and a table) rather than an array of color patches. By treating objects as whole entities, these models become significantly more efficient and theoretically more capable of understanding a scene.</p><p>Yet, current object-centric world modeling approaches face a major hurdle. While these models successfully break a video down into distinct entities, simply handing this roster of objects to an AI does not guarantee it will learn how they actually affect one another. In practice, when predicting what an object will do next, an AI might rely on &#8220;object self-dynamics.&#8221; It looks at a specific object&#8217;s past trajectory and blindly extrapolates it forward, completely ignoring the rest of the environment. 
If a ball is flying toward a swinging bat, a lazy AI model predicts that the ball just keeps flying right through the bat, failing to predict the impending collision and change in course.</p><p>In other cases, the model might exploit incidental correlations, which are spurious, coincidental patterns in the training data. For example, if in the training data, a robot arm always happens to move when a specific block is on the table, the model might falsely learn that the block&#8217;s presence caused the arm to move. It is exploiting a coincidence rather than learning the <a href="https://bdtechtalks.com/2021/03/15/machine-learning-causality/">actual causal structure</a> of the physical world.</p><h2>Learning causal world models</h2><p>Acknowledging the shortcomings of current models, researchers have developed several clever ways to force AI to pay attention to causality and interactions. One method involves separating temporal dynamics from object interactions. This approach hard-wires the neural network to explicitly divide its processing. It uses one pathway to calculate an object&#8217;s independent motion over time and a completely separate pathway to calculate how it affects other objects.</p><p>Another technique is attention sparsity. Neural networks naturally try to look at everything at once. This technique mathematically restricts the model&#8217;s attention, forcing it to focus only on the most critical, genuine interactions rather than background noise.</p><p>Developers also rely on graph structures, which involves giving the AI a predefined map. They impose a fixed relational graph on the data, explicitly telling the model which objects are connected or allowed to interact in the environment. 
Finally, instead of building a fundamentally smarter general world model, some approaches rely on task-specific methods tailored only to the final application, such as a specific robotic reinforcement learning task, to handle the physics.</p><p>The main problem with these approaches is that they act as structural constraints or external patches rather than fundamental changes to what the model is actually trying to learn.</p><p>The open question that the C-JEPA researchers set out to answer is how to adjust the training process to force the model to learn interactions and causality naturally. In machine learning, the learning objective is the ultimate goal the model optimizes for during training. If the objective is simply to reconstruct the next frame, the model will find the laziest shortcut to do that. The holy grail is to design a training objective where the only mathematical way for the model to succeed is to deeply understand the causal web of interactions.</p><h2>C-JEPA</h2><p>Before diving into C-JEPA, it helps to understand the foundation it is built upon: the Joint Embedding Predictive Architecture. Introduced by <a href="https://bdtechtalks.com/2022/03/07/yann-lecun-ai-self-supervised-learning/">Yann LeCun</a>, JEPA is a <a href="https://bdtechtalks.com/2020/03/23/yann-lecun-self-supervised-learning/">self-supervised learning architecture</a>. Self-supervised models don&#8217;t require labeled training examples. Traditional self-supervised models often rely on granular reconstruction objectives. For example, if you&#8217;re building a model to understand video frames, during training, you hide a patch of a video and the model tries to redraw the exact missing pixels. 
This creates massive computational overhead, especially for tasks where pixel-level prediction is unnecessary, such as autonomous driving or robotics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f9t2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f9t2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg 424w, https://substackcdn.com/image/fetch/$s_!f9t2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg 848w, https://substackcdn.com/image/fetch/$s_!f9t2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!f9t2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f9t2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg" width="696" height="392" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:392,&quot;width&quot;:696,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;JEPA model&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="JEPA model" title="JEPA model" srcset="https://substackcdn.com/image/fetch/$s_!f9t2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg 424w, https://substackcdn.com/image/fetch/$s_!f9t2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg 848w, https://substackcdn.com/image/fetch/$s_!f9t2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!f9t2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24d062cd-b991-4ae4-bbf9-8245c7b442dc_696x392.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">JEPA architecture (source: Meta)</figcaption></figure></div><p>JEPA abandons the granular reconstruction objective. Instead, it learns to predict outcomes within a compressed representation space. The system uses an encoder, a neural network component that compresses raw inputs like video frames into abstract, mathematical representations called &#8220;latent embeddings.&#8221; These embeddings capture the core semantic structure of the data rather than the visual noise.</p><p>When parts of the data are masked or hidden, the JEPA predictor model does not try to generate a visible image of the missing content. Instead, it calculates what the mathematical embedding of the missing part should be, operating in a much lower-dimensional space. 
The model&#8217;s predicted embedding is then compared directly against the actual, true embedding produced by a frozen target encoder, and the system optimizes its parameters to align them.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Because JEPA focuses on modeling the predictive relationships between abstract concepts, it doesn&#8217;t need a heavy image decoder to translate its thoughts back into raw pixels. This decoder-free design makes JEPA-style models highly compute-efficient. 
The representations it learns are low-dimensional and suitable for rapid autonomous decision-making, planning, and robust world modeling.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eUfL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eUfL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png 424w, https://substackcdn.com/image/fetch/$s_!eUfL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png 848w, https://substackcdn.com/image/fetch/$s_!eUfL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png 1272w, https://substackcdn.com/image/fetch/$s_!eUfL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eUfL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png" width="792" height="660" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:792,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eUfL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png 424w, https://substackcdn.com/image/fetch/$s_!eUfL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png 848w, https://substackcdn.com/image/fetch/$s_!eUfL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png 1272w, https://substackcdn.com/image/fetch/$s_!eUfL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0754e158-5cac-44a2-aa8a-7d9c114516e6_792x660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">C-JEPA training (source: arXiv)</figcaption></figure></div><p>However, previous versions of JEPA such as <a href="https://venturebeat.com/ai/meta-releases-i-jepa-a-machine-learning-model-that-learns-high-level-abstractions-from-images">I-JEPA</a> (for images) and <a href="https://bdtechtalks.com/2025/04/28/v-jepa-intuitive-physics/">V-JEPA</a> (for videos) used masking techniques that operated on random patches of an image. If you hide a square of a video frame, the AI just learns to fill in missing textures or local pixel patterns. This teaches the AI local correlations, but it completely fails to teach it about physics or object-level interactions (as we talked about earlier).</p><p>C-JEPA fixes this limitation by masking at the entity level. Instead of hiding random pixels, the system identifies an entire object, like our baseball, and masks its latent trajectory across a window of time. 
It leaves behind only a minimal identity anchor so the model knows what is missing, but hides what the object is currently doing. C-JEPA must then predict the trajectory of the missing objects in latent space. For example, imagine being shown the first few frames of the flying ball and the swinging bat, after which the ball is hidden and you are asked to guess what happens next.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HNnb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HNnb!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif 424w, https://substackcdn.com/image/fetch/$s_!HNnb!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif 848w, https://substackcdn.com/image/fetch/$s_!HNnb!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif 1272w, https://substackcdn.com/image/fetch/$s_!HNnb!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HNnb!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif" width="400" height="300" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:300,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;billiard balls&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="billiard balls" title="billiard balls" srcset="https://substackcdn.com/image/fetch/$s_!HNnb!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif 424w, https://substackcdn.com/image/fetch/$s_!HNnb!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif 848w, https://substackcdn.com/image/fetch/$s_!HNnb!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif 1272w, https://substackcdn.com/image/fetch/$s_!HNnb!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bdcdad-2f40-4c41-a1a2-8d7ac443d632_400x300.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Under this framing, the model can no longer succeed through trivial temporal interpolation. The lazy momentum shortcut of assuming an object keeps moving in a straight line is no longer viable. The only way the model can successfully minimize its prediction error is by analyzing the other objects in the scene. For example, in the billiard-ball sequence above, if you hide the yellow ball&#8217;s trajectory and the AI sees the white ball suddenly bounce away, it is forced to infer that the hidden ball must have collided with it. Interaction reasoning becomes functionally necessary to solve the puzzle.</p><p>To parse a raw video into distinct entities, C-JEPA relies on a frozen object-centric encoder. 
In their experiments, the researchers used a model called VideoSAUR, which is built on top of Meta&#8217;s <a href="https://dinov2.metademolab.com/">DINOv2</a> vision model.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tfQ6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tfQ6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png 424w, https://substackcdn.com/image/fetch/$s_!tfQ6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png 848w, https://substackcdn.com/image/fetch/$s_!tfQ6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png 1272w, https://substackcdn.com/image/fetch/$s_!tfQ6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tfQ6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png" width="793" height="434" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:434,&quot;width&quot;:793,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tfQ6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png 424w, https://substackcdn.com/image/fetch/$s_!tfQ6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png 848w, https://substackcdn.com/image/fetch/$s_!tfQ6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png 1272w, https://substackcdn.com/image/fetch/$s_!tfQ6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f1dd072-ba8a-49f9-9f91-5e8131278c6e_793x434.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">C-JEPA object masking (source: arXiv)</figcaption></figure></div><p>The researchers note that C-JEPA can also incorporate auxiliary variables. This means the model isn&#8217;t designed to be a passive video watcher. Developers can plug in a robot&#8217;s actual actions or proprioception (i.e., data from its internal joint sensors) as distinct inputs alongside the visual objects. Because of this flexible architecture, the model learns how its own physical commands intervene in the scene, making it highly effective for complex robotic planning and control.</p><h2>C-JEPA in action</h2><p>The researchers evaluated C-JEPA on visual reasoning and predictive control tasks.</p><p>For visual reasoning, the team tested C-JEPA on <a href="https://bdtechtalks.com/2020/05/04/clevrer-dataset-ai-video-reasoning/">CLEVRER</a>, a synthetic video question-answering benchmark built around complex, multi-object collisions. 
The AI is required to watch a video and answer descriptive, predictive, explanatory, and counterfactual questions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fAF0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fAF0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png 424w, https://substackcdn.com/image/fetch/$s_!fAF0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png 848w, https://substackcdn.com/image/fetch/$s_!fAF0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png 1272w, https://substackcdn.com/image/fetch/$s_!fAF0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fAF0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png" width="558" height="363" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:363,&quot;width&quot;:558,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fAF0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png 424w, https://substackcdn.com/image/fetch/$s_!fAF0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png 848w, https://substackcdn.com/image/fetch/$s_!fAF0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png 1272w, https://substackcdn.com/image/fetch/$s_!fAF0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff71e6823-44e9-4436-adc9-b30dd18bc17e_558x363.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>While C-JEPA showed consistent improvements across all question types, the standout metric was its ability to handle counterfactuals, such as predicting what would happen if a specific object were removed from the scene. When compared against OC-JEPA, a baseline model using the exact same architecture but without the object-level history masking, C-JEPA achieved a massive 20% absolute gain in counterfactual reasoning accuracy.</p><p>For predictive control, they deployed the model in the Push-T manipulation task. This is a robotic control environment where an agent must push a T-shaped block into a specific target goal using contact-rich interactions. They pitted C-JEPA against DINO-WM, a heavy, state-of-the-art patch-based world model. 
While both models successfully completed the control task at comparable rates, C-JEPA accomplished this using only 1.02% of the total input feature size, showing immense efficiency gains.</p><p>Because C-JEPA compresses the scene into such a tiny, mathematically dense object-centric footprint, calculating future rollouts requires far less compute. In head-to-head testing on a single GPU, C-JEPA executed its model predictive control plans over eight times faster than DINO-WM. It took just 673 seconds to evaluate 50 trajectories, compared to DINO-WM&#8217;s 5,763 seconds.</p><p>C-JEPA is not without flaws. One key limitation is that its performance ceiling is strictly bound by the quality and fidelity of the underlying object-centric encoder used to parse the scene. If the upstream encoder (i.e., VideoSAUR in their current architecture) struggles to separate objects cleanly, it weakens the intended causal intervention of the masking process.</p><p>The model also needs to be tested and adapted to more complex environments. While C-JEPA showed massive improvements on benchmarks like CLEVRER and Push-T, these are still relatively constrained environments. The authors explicitly note that evaluating C-JEPA in much more complex environments with richer, highly unpredictable interactions is a necessary next step to prove its viability as a universal world model.</p><p>JEPA is a very promising and underexplored area of research. With LeCun recently <a href="https://bdtechtalks.com/2025/11/24/what-is-next-for-yann-lecun-after-his-departure-from-meta/">having left Meta</a> to focus on JEPA-inspired world models, we can expect more exciting advances and applications in the field.</p>]]></content:encoded></item><item><title><![CDATA[Inside FlashOptim, the new trick that cuts LLM training memory by 50 percent]]></title><description><![CDATA[Training large language models usually requires a cluster of GPUs. 
FlashOptim changes the math, enabling full-parameter training on fewer accelerators.]]></description><link>https://bdtechtalks.substack.com/p/inside-flashoptim-the-new-trick-that</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/inside-flashoptim-the-new-trick-that</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 03 Mar 2026 16:46:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6k14!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6k14!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6k14!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6k14!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6k14!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!6k14!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6k14!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:321640,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/189783564?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6k14!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6k14!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!6k14!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!6k14!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82d065e6-dc59-4f4e-8e10-f8e177cab5d8_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Training large language models is an expensive endeavor, largely due to the massive accelerator memory required for each 
parameter during the training process. To reduce these costs, researchers at Databricks introduced <a href="https://arxiv.org/abs/2602.23349">FlashOptim</a>, a suite of memory-optimization techniques designed for common deep learning optimizers. FlashOptim acts as a drop-in replacement that slashes per-parameter memory consumption by more than 50 percent. It achieves this without sacrificing training throughput or model quality. According to the research team, this efficiency &#8220;enables practitioners and researchers with limited hardware to train larger models than previously feasible.&#8221;</p><h2>The memory bottleneck of LLM training</h2><p>Before exploring how FlashOptim works, it helps to understand why training a neural network demands so much hardware. During training, every model parameter drags along additional variables that must also be stored in the GPU&#8217;s memory. First, you have the parameters themselves, which are the actual neural network weights being learned.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Developers frequently rely on mixed-precision training to speed up calculations, executing forward and backward passes using 16-bit floating-point numbers. 
However, standard practice requires keeping a high-precision 32-bit master weight in memory to avoid rounding errors when accumulating very small gradient updates. Second, the training system calculates a gradient for every single parameter during the backward pass. Gradients dictate the direction and magnitude of the required update, and they typically occupy another 4 bytes of memory per parameter since they are stored as 32-bit floats.</p><p>Third, modern optimizers like Adam or AdamW track historical statistics to smooth out the learning trajectory. Adam maintains two specific state variables for every parameter: momentum, which is a running average of past gradients, and variance, a running average of squared gradients. Since both states are usually maintained in 32-bit precision, the optimizer alone eats up 8 bytes of memory per parameter. Finally, the model calculates intermediate outputs, known as &#8220;activations,&#8221; during the forward pass. The system must temporarily hold these activations in memory because the backward pass requires them to compute the gradients. 
Unlike the weights, gradients, and optimizer states, which scale strictly with the size of the model, activation memory scales based on your batch size (the number of training examples you feed the model before updating the weights).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!78I8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!78I8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!78I8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!78I8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!78I8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!78I8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2580548,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/189783564?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!78I8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!78I8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!78I8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!78I8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dfcc509-28a0-4c21-9b78-839c3fedc770_1600x873.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>When you combine the parameters, gradients, and optimizer states, a standard training setup using Adam demands roughly 16 bytes of memory for every single parameter: 4 for the master weight, 4 for the gradient, and 8 for the two optimizer states. This means that a developer who wants to train a 7-billion-parameter language model must provision at least 112 gigabytes (7 billion &#215; 16 bytes) of accelerator memory purely to hold the model and its optimization variables. That calculation does not even include the activation memory needed to process the data batches.</p><h2>Current approaches fall short</h2><p>The deep learning community has developed several workarounds to deal with these hardware constraints, but each comes with significant trade-offs. One common method is distributed training with tensor sharding. 
Frameworks such as PyTorch&#8217;s <a href="https://huggingface.co/docs/transformers/en/fsdp">Fully Sharded Data Parallel</a> partition the memory load across a cluster of GPUs. While this is the standard operating procedure inside well-resourced tech organizations, it strictly requires access to a fleet of accelerators. For independent developers, researchers, or smaller teams working with a single GPU, this approach is simply not an option.</p><p>Another alternative is CPU offloading. GPU memory is expensive and scarce, but host system memory is relatively cheap and abundant. Offloading techniques temporarily move certain memory-hungry tensors out of the GPU and into the host machine&#8217;s RAM, pulling them back only when the accelerator needs them for a specific calculation. The downside is that moving gigabytes of data back and forth over a PCIe bus creates a massive communication bottleneck. This shuffling introduces added overhead and complexity that ultimately slows down the training loop.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KIY6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KIY6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!KIY6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png 848w, 
https://substackcdn.com/image/fetch/$s_!KIY6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!KIY6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KIY6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2515770,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/189783564?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KIY6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png 424w, 
https://substackcdn.com/image/fetch/$s_!KIY6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!KIY6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!KIY6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd21c7b06-5fcb-40e6-9e4b-163c441d27a8_1600x873.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A third popular workaround involves parameter-efficient methods, such as <a href="https://bdtechtalks.com/2023/05/22/what-is-lora/">low-rank adaptation</a> (LoRA). Instead of updating every single parameter in a massive model, these techniques freeze the vast majority of the original weights. The system then only calculates gradients and optimizer states for a tiny subset of the original weights, or for a small set of new auxiliary weights injected into the architecture. The catch is that intentionally ignoring most of the network fundamentally alters the training dynamics. Parameter-efficient fine-tuning is an approximation that does not follow the exact same learning trajectory as full-parameter fine-tuning, which can limit performance on complex tasks.</p><h2>Redesigning memory efficiency with FlashOptim</h2><p>The Databricks researchers took a different route, building FlashOptim as a set of techniques to compress parameter-associated memory directly within common deep learning optimizers. FlashOptim achieves this through improved float splitting, companded optimizer state quantization, and fused optimized kernels.</p><p>Developers typically keep a 32-bit master weight alongside a downcasted 16-bit version used for the actual forward and backward passes. Keeping both in memory is highly redundant because the 16-bit weight stores little information that isn&#8217;t already in the master weight. 
Previous attempts to split these weights stored the 16-bit base weight along with a 16-bit error correction, but this method wasted valuable data bits trying to cover the massive range of standard floating-point numbers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wefs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wefs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!Wefs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!Wefs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!Wefs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wefs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2540806,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/189783564?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wefs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!Wefs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!Wefs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!Wefs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4174c6a6-e53f-4d51-9e69-1597d228f667_1600x873.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The Databricks team made a clever observation: under round-to-nearest rules, the rounding error between a 32-bit master weight and its 16-bit downcast version must fall within a microscopic, predictable interval. Instead of storing a wide-ranging float, FlashOptim&#8217;s &#8220;improved float splitting&#8221; technique rescales this tiny error interval and maps it to the nearest 8-bit integer. By combining the 16-bit base weight with this 8-bit error correction, FlashOptim successfully reconstructs a 24-bit master weight. 
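</p><p>The splitting idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FlashOptim&#8217;s actual kernel: it assumes the 16-bit base is simply the nearest 16-bit prefix of the float32 bit pattern (a bfloat16-style layout), it works on raw bit patterns of finite weights, and the helper names <code>split_weight</code> and <code>merge_weight</code> are hypothetical.</p>
<pre>
```python
import struct

def float_to_bits(x):
    """Bit pattern of x when stored as an IEEE-754 float32."""
    return struct.unpack("=I", struct.pack("=f", x))[0]

def bits_to_float(b):
    """Inverse of float_to_bits: interpret 32 bits as a float32."""
    return struct.unpack("=f", struct.pack("=I", b))[0]

def split_weight(w):
    """Split a finite float32 weight into a 16-bit base and an 8-bit correction."""
    bits = float_to_bits(w)
    # Nearest 16-bit prefix; ties round up in this sketch (an assumption).
    base = (bits + 32768) // 65536
    # Rounding error in units of the lowest float32 bit: always in [-32768, 32767].
    err = bits - base * 65536
    # Rescale that narrow interval onto a signed 8-bit integer.
    corr = max(-128, min(127, round(err / 256)))
    return base, corr

def merge_weight(base, corr):
    """Rebuild an approximate float32 weight from the 24 stored bits."""
    return bits_to_float(base * 65536 + corr * 256)

w = bits_to_float(float_to_bits(0.0123456789))  # a float32-representable weight
base, corr = split_weight(w)
# Original weight, 16-bit copy alone, and the 24-bit reconstruction.
print(w, bits_to_float(base * 65536), merge_weight(base, corr))
```
</pre>
<p>Under these assumptions, the reconstructed value differs from the original by at most 255 steps of the lowest float32 bit, whereas the 16-bit copy alone can be off by up to 32,768 steps.</p><p>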
This innovation cuts the total weight memory requirement from 4 bytes down to 3 bytes per parameter, with virtually no loss in precision.</p><p>The second major breakthrough is &#8220;companded optimizer state quantization.&#8221; Traditional attempts to shrink the optimizer state simply group the numbers and squeeze them into 8-bit integers. This linear quantization implicitly assumes that optimizer values are distributed evenly across the spectrum. However, the researchers&#8217; measurements showed that optimizer state distributions severely violate this assumption. Variance, for instance, accumulates squared gradients, producing heavily skewed, heavy-tailed distributions. Forcing these highly skewed numbers into evenly spaced 8-bit bins creates massive quantization errors. Before converting the numbers, FlashOptim applies a mathematical trick called a companding function, which compresses extreme values and reshapes the data distribution so it is more uniform. After this companding step, the values fit into 8-bit bins with significantly reduced error. 
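</p><p>To see why companding helps, the following Python sketch compares plain linear 8-bit quantization with a companded variant on a handful of heavy-tailed values. It is only an illustration: the square-root compander, the single shared group scale, the sample values, and the function names are all assumptions chosen for simplicity, not the functions FlashOptim itself uses.</p>
<pre>
```python
import math

LEVELS = 255  # largest code that fits in an unsigned 8-bit integer

def quantize_linear(values):
    """Evenly spaced 8-bit codes, sharing one scale across the group."""
    scale = max(values)
    return [round(v / scale * LEVELS) for v in values], scale

def dequantize_linear(codes, scale):
    return [c / LEVELS * scale for c in codes]

def quantize_companded(values):
    """Square-root compander (an assumed choice), then the same 8-bit rounding."""
    scale = math.sqrt(max(values))
    return [round(math.sqrt(v) / scale * LEVELS) for v in values], scale

def dequantize_companded(codes, scale):
    # Undo the compander by squaring the decoded value.
    return [(c / LEVELS * scale) ** 2 for c in codes]

def mean_relative_error(original, decoded):
    return sum(abs(o - d) / o for o, d in zip(original, decoded)) / len(original)

# Heavy-tailed sample, standing in for one group of variance estimates.
variance = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]

lin_codes, lin_scale = quantize_linear(variance)
cmp_codes, cmp_scale = quantize_companded(variance)

lin_error = mean_relative_error(variance, dequantize_linear(lin_codes, lin_scale))
cmp_error = mean_relative_error(variance, dequantize_companded(cmp_codes, cmp_scale))
print(lin_codes, round(lin_error, 3))
print(cmp_codes, round(cmp_error, 3))
```
</pre>
<p>With these sample values, linear quantization collapses the four smallest entries to code 0, while the companded version loses only the smallest one, so its mean relative error is lower.</p><p>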
This reduces the optimizer state from 8 bytes per parameter down to just 2 bytes, plus a tiny fraction of a byte required for group scaling factors.</p><p>FlashOptim packages these techniques into fused, optimized kernels. Splitting weights, dequantizing states, performing math updates, and re-compressing everything require moving a lot of data back and forth. Implementing this naively would create a massive memory bandwidth bottleneck. FlashOptim solves this by implementing the entire optimizer step as a single fused Triton kernel designed for Nvidia hardware. The GPU pulls the compressed data into its fast local memory, unpacks it, calculates the update, compresses the results, and writes it all out in one seamless operation. This allows FlashOptim to cut memory consumption without causing any practical slowdown during training.</p><h2>FlashOptim in action</h2><p>To test the framework&#8217;s real-world viability, the researchers ran FlashOptim on several standard vision and language benchmarks. This included pretraining a <a href="https://bdtechtalks.com/2019/09/02/openai-gpt-2-machine-learning-fake-news/">GPT-2</a> architecture and running supervised fine-tuning on the larger Llama-3.1-8B model. 
Across the <a href="https://bdtechtalks.com/2023/07/31/what-is-gradient-descent/">stochastic gradient descent</a> (SGD), AdamW, and Lion optimizers, models trained with FlashOptim matched the loss trajectories, convergence rates, and final validation accuracies of their standard, memory-hungry counterparts.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C9QI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C9QI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png 424w, https://substackcdn.com/image/fetch/$s_!C9QI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png 848w, https://substackcdn.com/image/fetch/$s_!C9QI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png 1272w, https://substackcdn.com/image/fetch/$s_!C9QI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C9QI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png" width="397" height="218" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:218,&quot;width&quot;:397,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C9QI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png 424w, https://substackcdn.com/image/fetch/$s_!C9QI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png 848w, https://substackcdn.com/image/fetch/$s_!C9QI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png 1272w, https://substackcdn.com/image/fetch/$s_!C9QI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef14374f-9c14-4a29-ad82-0e5337303d4a_397x218.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>During the Llama-3.1-8B fine-tuning test, peak GPU memory dropped from 175 gigabytes to 113 gigabytes, representing a 36 percent overall reduction. Looking closer at the breakdown, the optimizer memory shrank by 61 percent, and the parameter memory dropped by 50 percent. 
Because FlashOptim executes its mathematical operations inside highly efficient fused kernels, these compressions don&#8217;t slow down the training process. In fact, the optimizer step time during the Llama-3.1 test actually dropped slightly from 12.5 milliseconds to 11.5 milliseconds.</p><p>For developers, FlashOptim&#8217;s greatest value might be its simplicity. It provides drop-in replacements for common optimizers, meaning developers do not need to rewrite their training loops, alter optimization semantics, or invent new tuning strategies. The researchers plan to release FlashOptim as an open-source PyTorch library on GitHub.</p>]]></content:encoded></item><item><title><![CDATA[How sparse attention is solving AI's memory bottleneck]]></title><description><![CDATA[As AI agents take on longer tasks, the KV cache of LLMs has become a massive bottleneck. Discover how sparse attention techniques are freeing up GPU memory.]]></description><link>https://bdtechtalks.substack.com/p/how-sparse-attention-is-solving-ais</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/how-sparse-attention-is-solving-ais</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 24 Feb 2026 16:32:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2rdQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2rdQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!2rdQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2rdQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2rdQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2rdQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2rdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:370913,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/189033314?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2rdQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2rdQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2rdQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2rdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49141d0f-5d48-4dc5-88e2-dd7b20014c88_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>LLMs are getting pulled into longer and messier workflows, handling large inputs and generating longer and longer token sequences. Coding assistants need to keep track of repositories, issue threads, terminal outputs, and earlier edits. Research agents need to carry facts across long documents and tool calls. Deep think systems may generate several parallel chains before settling on an answer.</p><p>All of that means more tokens in memory, and more time spent attending to past tokens. In practice, that pushes modern models into a bottleneck that is less about raw model size and more about how they store and read attention state. That state is the key-value cache, or KV cache. It is one of the reasons autoregressive generation is fast enough to be usable at all, but it is also one of the main reasons long-context inference gets expensive and slow.</p><p>Recent research shows that if you want better long reasoning and multi-agent workflows, you need to address the attention memory bottleneck and optimize attention while preserving accuracy. And this focus has led to some very interesting &#8220;sparse attention&#8221; techniques.</p><h2>Why attention becomes a memory problem</h2><p>In <a href="https://bdtechtalks.com/2022/05/02/what-is-the-transformer/">transformer models</a>, each new token is processed by attention layers that compare the current token&#8217;s query vector against key vectors from earlier tokens, and then combine the corresponding value vectors. 
In autoregressive generation, the model creates text one token at a time, so it would be wasteful to recompute keys and values for the entire prior sequence at every step. Instead, it stores them in the KV cache and reuses them.</p><p>Attending over every cached token this way (aka &#8220;dense attention&#8221;) creates a heavy compute and memory tax. Dense causal attention means each new token attends to all prior tokens, so the per-token work grows with the length of the sequence and the total work grows quadratically.</p><p>But compute is only part of it. The KV cache grows linearly with the number of tokens in the context, it sits in GPU VRAM, and reading it from high-bandwidth memory can dominate runtime during generation. At larger sequence lengths and batch sizes, the latency contribution from KV cache reads becomes the dominant term. 
That is why long prompts and long outputs slow systems down even when the model architecture itself has not changed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x_4W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x_4W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!x_4W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!x_4W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!x_4W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x_4W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x_4W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!x_4W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!x_4W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!x_4W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff81b2899-6601-4aa1-bc86-f1073b146ffd_1600x873.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It is also worth noting the difference between &#8220;prefill&#8221; and &#8220;decode,&#8221; which matters a lot for application behavior. Prefill is the first pass over the prompt, where the model processes the input tokens and fills the KV cache. Decode is the token-by-token generation loop that follows, where each new token reads from that cache. Many optimizations help decode more than prefill. The challenge of processing all the input tokens (documents, in-context learning examples, etc.) and storing attention values remains unsolved.</p><h2>The basic idea behind sparse attention</h2><p>Sparse attention starts from a simple observation: most tokens in a long sequence are not equally useful for every next-token decision. If a model can identify which past tokens matter for the current query token, it can avoid reading and scoring the full history. That cuts memory movement and compute.</p><p>There are different ways to make attention sparse. Some methods permanently remove (or &#8220;evict&#8221;) tokens from the cache. 
Some keep the cache but retrieve only a subset of pages or blocks. Others change the architecture so fewer layers rely on full self-attention. The common goal is the same: reduce the amount of attention values that the model must carry and touch at inference time.</p><p>A simple method is &#8220;sliding-window attention,&#8221; which keeps a fixed number of tokens in the KV cache. As the model&#8217;s context grows, the sliding window evicts older tokens to free space for new ones.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kq8s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kq8s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!kq8s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!kq8s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!kq8s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!kq8s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kq8s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!kq8s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png 848w, https://substackcdn.com/image/fetch/$s_!kq8s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!kq8s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb47ce7af-c96a-4242-a511-518c953fccf7_1600x873.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Other early techniques lean on heuristics to evict tokens. For example, <a href="https://github.com/schwartz-lab-NLP/TOVA">Token Omission via Attention</a> (TOVA) drops tokens based on attention values. TOVA looks at the current step and checks which tokens attention heads are focusing on. It keeps the top-ranked tokens and deletes the rest. TOVA assumes that if a piece of information isn&#8217;t relevant to the current thought process, it won&#8217;t be needed later. <a href="https://github.com/FMInference/H2O">Heavy-Hitter Oracle</a> (H2O), another heuristics-based method, tracks which tokens have been paid attention to frequently over time. It assumes that important concepts (like a character&#8217;s name in a story or a variable x in code) will be looked at repeatedly. 
It keeps these important tokens, which it calls &#8220;Heavy Hitters,&#8221; and a sliding window of the most recent tokens, evicting everything else.</p><p>These methods can be fast and easy to deploy, but they tend to hit a wall when compression becomes aggressive. The key issue is information loss. If a system drops a token too early, later steps cannot recover it. That hurts exact retrieval, long-range dependencies, and multi-step reasoning. Another problem is that most LLMs use <a href="https://nvidia.github.io/TensorRT-LLM/advanced/gpt-attention.html">Grouped Query Attention</a> (GQA), where multiple query heads share the same key and value heads to save memory. Standard eviction methods often delete a cache entry because one query head isn&#8217;t using it, accidentally starving other heads in the group that did need it.</p><p>Another method, Query-Aware Sparsity (<a href="https://github.com/mit-han-lab/Quest">Quest</a>), keeps the cache but tries to retrieve only the most relevant blocks from memory. It groups tokens into fixed-size pages (e.g., blocks of 16 tokens). 
During the generation step, it uses a heuristic to estimate which specific pages contain relevant information for the current token being generated. It then copies only those pages from the GPU&#8217;s memory (HBM) to faster on-chip memory (SRAM). This makes it possible to better manage the scarce and expensive high-speed memory. Because the attention bottleneck is usually caused by moving data from memory to the chip, Quest speeds up generation significantly by moving less data. However, it does not solve &#8220;Out of Memory&#8221; (OOM) errors. In fact, Quest creates a slight memory overhead because it has to store additional metadata to manage the pages.</p><h2>DeepSeek Sparse Attention (DSA)</h2><p><a href="https://bdtechtalks.com/2025/12/05/deepseek-v3-2-efficiency/">DeepSeek Sparse Attention</a> (DSA) is one of the more important recent ideas in this space because it tries to keep the efficiency gains of sparsity without the usual quality collapse. First used in DeepSeek-V3.2, the mechanism has two parts: a &#8220;lightning indexer&#8221; that scores relevance between the current query token and earlier tokens, and a token-selection stage that keeps a fixed number of top-scoring KV entries for full attention computation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cbdP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cbdP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png 424w, 
https://substackcdn.com/image/fetch/$s_!cbdP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png 848w, https://substackcdn.com/image/fetch/$s_!cbdP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png 1272w, https://substackcdn.com/image/fetch/$s_!cbdP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cbdP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png" width="696" height="362" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06994796-7e16-4edb-bc82-e205500bc622_696x362.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:362,&quot;width&quot;:696,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cbdP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png 424w, 
https://substackcdn.com/image/fetch/$s_!cbdP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png 848w, https://substackcdn.com/image/fetch/$s_!cbdP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png 1272w, https://substackcdn.com/image/fetch/$s_!cbdP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06994796-7e16-4edb-bc82-e205500bc622_696x362.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Unlike heuristics-based methods, DSA requires training, which means it cannot be plugged into any model out of the box. Training proceeds in two stages: the indexer is trained first, and the model is then adapted to the sparse pattern. The indexer warm-up runs for 1,000 steps with dense attention and frozen base weights. The model then switches to a sparse training stage in which the model and indexer are trained together. During sparse training, DeepSeek selects 2,048 KV tokens per query token.</p>
      <p>
          <a href="https://bdtechtalks.substack.com/p/how-sparse-attention-is-solving-ais">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[New jailbreak attack dupes image generation models]]></title><description><![CDATA[Semantic Chaining exploits the fragmented safety architecture of multimodal models, bypassing filters by hiding prohibited intent within a sequence of benign edits.]]></description><link>https://bdtechtalks.substack.com/p/new-jailbreak-attack-dupes-image</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/new-jailbreak-attack-dupes-image</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 17 Feb 2026 15:47:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ewFL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ewFL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ewFL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ewFL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ewFL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!ewFL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ewFL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261530,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/188274464?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ewFL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ewFL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!ewFL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ewFL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169d07be-a691-47b6-8360-c818c9f57128_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>NeuralTrust researchers have identified a critical vulnerability in the safety architecture of leading multimodal models, 
including Grok 4, Gemini Nano Banana Pro, and Seedance 4.5. The technique, named &#8220;<a href="https://neuraltrust.ai/blog/semantic-chaining">Semantic Chaining</a>,&#8221; allows users to bypass core safety filters and generate prohibited content by exploiting the models&#8217; ability to perform complex, multi-stage image modifications. This discovery demonstrates a functional flaw in how multimodal intent is governed, proving that even advanced models can be guided to produce policy-violating outputs by bypassing &#8220;black box&#8221; safety layers.</p><h2>Weaponizing the workflow</h2><p>Semantic Chaining differs from traditional jailbreaks that rely on a single, overtly harmful prompt. Instead, the attacker introduces a chain of semantically &#8220;safe&#8221; instructions that converge on a forbidden result. The attack works by weaponizing the model&#8217;s own inferential reasoning and compositional abilities against its safety guardrails. Current safety filters typically scan for &#8220;bad words&#8221; or specific concepts in isolated prompts, lacking the reasoning depth to track &#8220;latent intent&#8221; (the underlying, unstated goal of the user) across a multi-step instruction chain.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The exploit follows a specific four-step pattern to circumvent safety protocols. First, the user establishes a &#8220;safe base&#8221; by asking the model to imagine a generic, non-problematic scene, such as a historical setting. This creates a neutral initial context and habituates the model to the task. The second step involves a &#8220;first substitution,&#8221; where the user instructs the model to change one element of the original scene. This permitted alteration habituates the model to working through subsequent modifications and shifts its focus from creation to modification.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BRsv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BRsv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!BRsv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png 848w, 
https://substackcdn.com/image/fetch/$s_!BRsv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!BRsv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BRsv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BRsv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png 424w, https://substackcdn.com/image/fetch/$s_!BRsv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png 848w, 
https://substackcdn.com/image/fetch/$s_!BRsv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png 1272w, https://substackcdn.com/image/fetch/$s_!BRsv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc786e37b-aef0-4455-92c7-6476da2a8b2b_1600x873.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Semantic Chaining attack</figcaption></figure></div><p>Once the model is in this modification mode, the attacker executes the 
&#8220;critical pivot.&#8221; The user commands the model to replace another key element with a highly sensitive or controversial topic. Because the model focuses on modifying an existing image rather than creating a new one, the safety filters fail to recognize the emerging prohibited context. Finally, the attacker concludes by telling the model to &#8220;answer only with the image.&#8221; The result is a fully rendered, prohibited image that successfully bypasses moderation layers in models like <a href="https://bdtechtalks.com/2025/09/22/xai-grok-4-fast/">Grok 4</a> and <a href="https://bdtechtalks.com/2025/11/20/googles-nano-banana-pro/">Gemini Nano Banana Pro</a>.</p><p>A significant aspect of this vulnerability is its ability to bypass text-based safety filters by rendering prohibited information directly into the generated image. Models that would typically refuse to provide text instructions on sensitive topics in a standard chat response can be forced to write these exact instructions onto a generated image. 
For example, a user might ask the model to create an image of a &#8220;technical diagram&#8221; or &#8220;educational poster&#8221; within a neutral scene.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FNzC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FNzC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png 424w, https://substackcdn.com/image/fetch/$s_!FNzC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png 848w, https://substackcdn.com/image/fetch/$s_!FNzC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png 1272w, https://substackcdn.com/image/fetch/$s_!FNzC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FNzC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png" width="1456" height="765" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/268f034c-429e-42c2-879f-2daef648f014_1600x841.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:765,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FNzC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png 424w, https://substackcdn.com/image/fetch/$s_!FNzC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png 848w, https://substackcdn.com/image/fetch/$s_!FNzC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png 1272w, https://substackcdn.com/image/fetch/$s_!FNzC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F268f034c-429e-42c2-879f-2daef648f014_1600x841.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example of semantic chaining attack (source: NeuralTrust)</figcaption></figure></div><p>The user then instructs the model to replace the generic text on that poster with specific, prohibited instructions. The safety filters, which scan the chat output for prohibited text, remain blind to the &#8220;bad words&#8221; being drawn pixel-by-pixel into the image. This effectively turns the image generation engine into a bypass for the model&#8217;s entire text-safety alignment. Even when the text is large and readable to humans, standard Optical Character Recognition (OCR) safety filters often fail to catch it due to font choices, stylization, perspective, or rendering artifacts.</p><h2>Why the safety layer fails</h2><p>To understand why this attack works, it is necessary to distinguish between the model&#8217;s &#8220;attention&#8221; and its safety overlay. 
Attention mechanisms in <a href="https://bdtechtalks.com/2022/05/02/what-is-the-transformer/">transformer models</a> allow the AI to focus on different parts of the input sequence to understand context and relationships. One might assume the model &#8220;forgets&#8221; the safety constraints or loses track of the context during the modification steps. However, Alessandro Pignati, an AI researcher at NeuralTrust, clarified this distinction in comments provided to TechTalks.</p><p>&#8220;From a technical perspective, this is not a failure of the attention mechanism, nor a case where the model &#8216;loses&#8217; the original context,&#8221; Pignati said. &#8220;The model correctly attends to all parts of the prompt, but safety evaluation is applied in a fragmented way. 
Each instruction is interpreted as a legitimate transformation of an otherwise allowed concept, and the system never reassesses the global intent that emerges from chaining those transformations together.&#8221; The model does not forget safety constraints; rather, the safety layer fails to reason over the cumulative semantic effect of the prompt.</p><p>This vulnerability is not limited to native chat interfaces. Pignati noted that it also applies to API-based usage, even when the entire attack is contained within a single prompt that embeds multiple semantic operations. Current APIs typically evaluate safety at the request or output level and do not expose indicators or flags signaling that a prompt contains progressive semantic escalation. Consequently, the request appears compliant to the application, making this class of attack difficult for developers to detect.</p><h2>Hardening the architecture</h2><p>The discovery of Semantic Chaining suggests that traditional safety filters are insufficient against intent-based attacks. NeuralTrust argues that enterprises need a defense that can track and govern the entire instruction chain in real time. The company proposes &#8220;Shadow AI,&#8221; a browser plugin that acts as a proactive governance layer. By monitoring the search bar and input fields, the plugin intercepts the intent at the source before the query reaches the model.</p><p>Beyond browser-side interventions, model providers face the challenge of architecting safety layers that can understand intent across multiple turns. Pignati suggests that simply extending the safety context window is unlikely to solve the problem, because the core issue is that safety mechanisms operate locally while the attack exploits global semantic structure. &#8220;Model providers need safety layers that evaluate intent across the entire prompt, even when it is presented as a sequence of transformations,&#8221; Pignati said. 
&#8220;This implies shifting from turn-based or instruction-based filtering to intent-aware analysis that can recognize when benign operations converge toward a disallowed outcome, regardless of how they are phrased.&#8221;</p><p>This shift is increasingly critical as the industry moves toward agentic systems that can plan and execute complex workflows. While one might expect that agents with better long-term memory would be better at detecting latent intent, Pignati warns that the opposite may be true. &#8220;While improved memory and planning could theoretically help detect latent intent, increased complexity also introduces more opportunities for semantic fragmentation,&#8221; he said. &#8220;Without treating intent aggregation as a first-class safety primitive, more capable agentic workflows are likely to amplify, rather than reduce, this class of blind spots.&#8221;</p>]]></content:encoded></item><item><title><![CDATA[RePo provides an innovative solution to long-context tasks in LLMs]]></title><description><![CDATA[RePo, Sakana AI&#8217;s new technique, solves the "needle in a haystack" problem by allowing LLMs to organize their own memory.]]></description><link>https://bdtechtalks.substack.com/p/repo-provides-an-innovative-solution</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/repo-provides-an-innovative-solution</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 03 Feb 2026 18:00:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IIuo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!IIuo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IIuo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IIuo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IIuo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IIuo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IIuo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg" width="1440" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:271189,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/186766559?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IIuo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IIuo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IIuo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IIuo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ca764db-4fc1-4dc7-a90c-d2bec66e51db_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A new technique developed by researchers at Sakana AI, called <a href="https://arxiv.org/abs/2512.14391">Context Re-Positioning</a> (RePo), allows large language models (LLMs) to dynamically reorganize their internal view of their input data to better handle long-context tasks.</p><p>LLMs process information in a strictly linear fashion, reading input from left to right regardless of the content&#8217;s actual structure. While this mimics how humans read a novel, it fails to capture the complexity of real-world data such as code repositories, databases, or scattered documents in a retrieval-augmented generation (RAG) pipeline.</p><p>The core friction in current architectures lies in how they handle position. To handle their input sequences, LLMs must assign a position value to each token. 
Standard methods like <a href="https://blog.eleuther.ai/rotary-embeddings/">Rotary Positional Embedding</a> (RoPE) assign a fixed integer index to every token in the sequence. These methods rely on a &#8220;locality bias,&#8221; assuming that information physically closer together in the input stream is more semantically relevant. While this holds true for simple chat interfaces, it breaks down in complex tasks where the answer to a user&#8217;s query might be buried thousands of tokens away in a retrieved document.</p><p>This rigidity creates what the researchers describe as &#8220;extraneous cognitive load.&#8221; Drawing on <a href="https://en.wikipedia.org/wiki/Cognitive_load">cognitive load theory</a>, the paper argues that forcing a model to track the arbitrary physical distance between related concepts wastes finite &#8220;working memory&#8221; (attention capacity). Instead of focusing on reasoning, the model spends resources managing the presentation of the data. 
RePo aims to eliminate this waste by allowing the model to virtually &#8220;move&#8221; tokens closer together in the mathematical space used for attention, without changing their actual order in the input text.</p><h2>How RePo works</h2><p>RePo introduces a lightweight, differentiable neural network module that sits before the model&#8217;s attention mechanism. Instead of using a pre-defined integer sequence (0, 1, 2&#8230;), this module predicts a scalar position value based on the content of the token itself. By analyzing the &#8220;what&#8221; (the token&#8217;s hidden state), the module determines the &#8220;where.&#8221; This allows the model to project tokens into a continuous mathematical space where semantically related items are clustered together, effectively shortening the distance the attention mechanism must traverse.</p><p>The training process for RePo is straightforward because the module is fully differentiable. It is optimized jointly with the rest of the model using standard backpropagation. Because the assigned positions are continuous real values rather than fixed integers, the model can adjust them granularly to minimize the standard next-token prediction loss. This means the model isn&#8217;t explicitly taught how to organize data; instead, it discovers effective organization strategies on its own (e.g., clustering related concepts) simply as a way to predict the next token more accurately during training on general data.</p><p>In comments provided to TechTalks, paper co-author Huayang Li explained that this mechanism allows the model to combat the limitations of recency bias. &#8220;RePo has the potential to alleviate [recency bias] by dynamically assigning closer positions to relevant information,&#8221; Li said. 
This shift moves the architecture from a passive reader that accepts the input order as truth, to an active organizer that restructures data to minimize the loss in next-token prediction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4tbJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4tbJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png 424w, https://substackcdn.com/image/fetch/$s_!4tbJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png 848w, https://substackcdn.com/image/fetch/$s_!4tbJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png 1272w, https://substackcdn.com/image/fetch/$s_!4tbJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4tbJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png" width="1456" height="475" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:475,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4tbJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png 424w, https://substackcdn.com/image/fetch/$s_!4tbJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png 848w, https://substackcdn.com/image/fetch/$s_!4tbJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png 1272w, https://substackcdn.com/image/fetch/$s_!4tbJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F589f1255-47a2-41bf-98dc-dd77376f8d6e_1600x522.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">How RePo organizes attention values (source: arXiv)</figcaption></figure></div><p>The flexibility of RePo leads to surprising emergent behaviors where the model invents its own organizational schemes. In one visualization from the paper, the model physically clustered the &#8220;prompt,&#8221; &#8220;question,&#8221; and &#8220;answer&#8221; tokens together in the positional space, even though they were separated by long passages of text. The researchers also observed a &#8220;mirror effect,&#8221; where the model assigned negative position values, effectively learning to count backwards or rotate in a reverse direction to better capture relationships in the data.</p><p>These patterns suggest that when given the freedom, models prefer non-linear structures. Li noted that this adaptability is particularly visible when processing structured data. &#8220;When the input is a table, the model can adaptively learn positional assignments that segment table rows,&#8221; he said. 
The model recognizes that a cell in Row 1 is related to the corresponding cell in Row 2 and assigns them similar positional values, reconstructing the table&#8217;s structure that is usually lost when flattened into text.</p><h2>RePo in action</h2><p>To validate the technique, the researchers applied RePo to an OLMo-2 1B backbone and evaluated it against standard baselines. The most significant gains appeared in tasks involving &#8220;noisy&#8221; context, where the model must identify relevant facts hidden among irrelevant information. On the RULER benchmark, which tests long-context understanding, RePo outperformed standard RoPE models by over 11 points on variable tracking tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JSrp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JSrp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png 424w, https://substackcdn.com/image/fetch/$s_!JSrp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png 848w, https://substackcdn.com/image/fetch/$s_!JSrp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JSrp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JSrp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png" width="1438" height="1054" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1054,&quot;width&quot;:1438,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JSrp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png 424w, https://substackcdn.com/image/fetch/$s_!JSrp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png 848w, https://substackcdn.com/image/fetch/$s_!JSrp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JSrp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb7f81ae-fd07-4d67-8375-921c06b5fc61_1438x1054.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">RePo performance on different tasks (source: arXiv)</figcaption></figure></div><p>This improvement is not merely a result of the model learning to ignore noise. According to Li, RePo actively highlights the signal. 
&#8220;RePo is able to assign more attention to the most critical &#8216;needle&#8217; tokens in a noisy context compared with standard attention mechanisms,&#8221; Li said. By assigning these critical tokens distinct positions, the model prevents them from being drowned out by the surrounding text, effectively solving the &#8220;needle-in-a-haystack&#8221; problem at the architectural level.</p><p>The model also demonstrated strong extrapolation capabilities. Although trained on a context length of 4,000 tokens, RePo maintained high accuracy when tested on contexts up to 16,000 tokens. In Question Answering (QA) tasks over these extended lengths, the method outperformed RoPE and other baselines by more than 13 points. This suggests that the learned positional strategies generalize better to unseen lengths than fixed arithmetic progressions.</p><h2>Engineering implications</h2><p>For developers looking to implement RePo, the technique offers a path to upgrade existing models without starting from scratch. The module adds only 0.9% to the total parameter count and is compatible with standard inference optimizations like FlashAttention and vLLM. 
This means engineering teams can integrate RePo into high-throughput production pipelines without sacrificing speed.</p><p>However, the method is not without challenges. Li warned that simple supervised fine-tuning (SFT) is insufficient to instill these positional strategies. Implementing RePo requires a specific training regimen known as &#8220;Continual Pre-Training&#8221; (CPT). Unlike simple fine-tuning, which might use a small targeted dataset, CPT involves resuming the training of a pre-existing checkpoint (like OLMo-2) on a massive amount of general data. &#8220;Based on our recent experiments with 1B and 7B models, CPT on more than 50B tokens yields significantly better results,&#8221; Li said.</p><p>As AI systems evolve toward agentic workflows that manage massive, long-term contexts, techniques like RePo may become essential components of the stack and help organize memory.</p><p>&#8220;I view RePo as orthogonal to agentic systems,&#8221; Li said. &#8220;However, a &#8216;memory sorting&#8217; mechanism could serve as additional information that benefits long-context management in such systems.&#8221;</p>]]></content:encoded></item><item><title><![CDATA[How MIT’s new framework solves LLM's memory barrier and 'context rot' problem]]></title><description><![CDATA[Brute-forcing larger context windows is hitting a mathematical wall. 
Recursive language model (RLM) solves "context rot" to process 10 million tokens and beyond.]]></description><link>https://bdtechtalks.substack.com/p/how-mits-new-framework-solve-llms</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/how-mits-new-framework-solve-llms</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Tue, 27 Jan 2026 16:08:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!keMM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!keMM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!keMM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!keMM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!keMM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!keMM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!keMM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:219750,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/185974946?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!keMM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!keMM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!keMM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!keMM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F883c3382-a8c5-4796-acc2-3f9a5c6ddd60_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Recursive Language Models (RLMs), a new framework developed by researchers at MIT CSAIL, provide a solution to the limited 
context window of large language models (LLMs). <a href="https://arxiv.org/abs/2512.24601">This approach</a> enables models to process arbitrarily long prompts without incurring massive memory costs or requiring the models to undergo special training to extend their context windows.</p><p>RLMs treat long prompts as part of an external environment, allowing the LLM to programmatically examine, decompose, and extract snippets of the prompt. The system is designed to be compatible with existing models, serving as a drop-in replacement for standard inference frameworks. Experiments show that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform base LLMs in quality.</p><h2>How recursive language models work</h2><p>The concept behind RLMs draws inspiration from the way computers move data between active memory and permanent storage. A computer&#8217;s RAM is limited and can only hold a certain amount of data at any given time. To work around this, computers store large-scale data on a hard drive and only fetch small &#8220;chunks&#8221; into the fast main memory as needed.</p>
<p>RLMs apply this logic to the limited context window of LLMs by treating the text data as part of an environment that the model can interact with. Instead of forcing the entire document into the neural network&#8217;s context window, the RLM keeps the text outside the model and selectively retrieves only the necessary pieces when required.</p><p>Rather than loading the full prompt and documents directly into the model, the RLM loads them into a Read-Eval-Print Loop (REPL) environment powered by Python. The LLM receives &#8220;general context&#8221; about the data, such as the total length of the string, but does not see the text itself initially.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ei0U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ei0U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png 424w, https://substackcdn.com/image/fetch/$s_!Ei0U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png 848w, 
https://substackcdn.com/image/fetch/$s_!Ei0U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png 1272w, https://substackcdn.com/image/fetch/$s_!Ei0U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ei0U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png" width="1456" height="1109" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1109,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ei0U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png 424w, https://substackcdn.com/image/fetch/$s_!Ei0U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png 848w, 
https://substackcdn.com/image/fetch/$s_!Ei0U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png 1272w, https://substackcdn.com/image/fetch/$s_!Ei0U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c2b8268-ef86-4f31-bac6-a9933215b4b0_1600x1219.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The LLM interacts with the document through coding commands. 
For example, it might check the first 500 characters to understand the format or use regular expressions to search for specific keywords like &#8220;festival&#8221; or &#8220;Chapter 1.&#8221; When the LLM finds a relevant snippet via code, it pulls that specific data into its active context window to analyze it and decide on the next steps.</p><p>This reliance on code generation means RLMs effectively require &#8220;reasoning&#8221;- or &#8220;coding&#8221;-grade models (e.g., <a href="https://bdtechtalks.com/2025/08/09/openai-gpt-5/">GPT-5</a>, Claude 3.5 Sonnet, or Qwen-Coder) to work reliably. Smaller general-purpose open models (like Llama 3 8B) would likely struggle to navigate the Python environment without specific distillation or fine-tuning.</p><p>The framework is called &#8220;recursive&#8221; because the model can write code that calls the model itself on specific chunks of the data. If the prompt is a long book, the LLM might write code to split the book into chapters. Then, inside a loop, it can issue a recursive query on each chapter individually to summarize it.</p><p>Although users interact with an RLM like any other LLM, it is composed of multiple models under the hood to maximize efficiency. The architecture typically includes a &#8220;root language model&#8221; that is powered by a strong LLM such as GPT-5 or <a href="https://bdtechtalks.com/2025/11/18/google-gemini-3-0-pro/">Gemini 3</a>. This root model acts as the orchestrator: it interacts with the user, plans the solution, and sends commands to the REPL environment. Then there is the &#8220;recursive LM,&#8221; which is usually a smaller, faster model (e.g., GPT-5-mini) acting as the worker. The recursive LM is called by the root LM&#8217;s code to process specific &#8220;chunks&#8221; or snippets of the text. 
For example, it can summarize the chunks of text retrieved from the full prompt.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!358f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!358f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png 424w, https://substackcdn.com/image/fetch/$s_!358f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png 848w, https://substackcdn.com/image/fetch/$s_!358f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png 1272w, https://substackcdn.com/image/fetch/$s_!358f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!358f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png" width="1456" height="592" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!358f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png 424w, https://substackcdn.com/image/fetch/$s_!358f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png 848w, https://substackcdn.com/image/fetch/$s_!358f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png 1272w, https://substackcdn.com/image/fetch/$s_!358f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3b8b206-4e7d-4070-bf19-ed7984dc276f_1600x651.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Because the prompt is stored in the environment&#8217;s memory rather than the model&#8217;s active context window, the model can handle inputs orders of magnitude larger than its native context window, such as processing 10 million tokens on a model technically limited to 272k tokens. Notably, the RLM looks and behaves exactly like a standard LLM to the user. It accepts a string prompt and returns a string answer, allowing users to swap a standard model for an RLM without changing their code or workflow.</p><p>This approach represents a natural evolution from prompt engineering to optimizing how models manage their own memory. The code for RLM is currently available on <a href="https://github.com/alexzhang13/rlm">GitHub</a>. The researchers plan to integrate RLMs directly into DSPy, a popular framework for programming language models.</p><h2>Why it matters</h2>
      <p>
          <a href="https://bdtechtalks.substack.com/p/how-mits-new-framework-solve-llms">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Inside Rho-Alpha, Microsoft’s new robotics model]]></title><description><![CDATA[The new architecture upgrades Vision-Language-Action models with tactile data to bridge the gap between semantic reasoning and low-level motor control.]]></description><link>https://bdtechtalks.substack.com/p/inside-rho-alpha-microsofts-new-robotics</link><guid isPermaLink="false">https://bdtechtalks.substack.com/p/inside-rho-alpha-microsofts-new-robotics</guid><dc:creator><![CDATA[Ben Dickson]]></dc:creator><pubDate>Sat, 24 Jan 2026 17:24:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nIVn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nIVn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nIVn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nIVn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nIVn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!nIVn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nIVn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg" width="1440" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:231236,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://bdtechtalks.substack.com/i/185647624?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nIVn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nIVn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!nIVn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nIVn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d948bb3-fd5a-4f08-bcce-497b17dc732e_1440x900.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>While large language models (LLMs) have mastered the art of processing text and images, they remain largely confined to 
the digital realm. Moving from generating code to folding laundry requires a fundamental shift in how AI perceives the world. Microsoft is attempting to bridge this gap with <a href="https://www.microsoft.com/en-us/research/story/advancing-ai-for-the-physical-world/">Rho-alpha</a> (&#9076;&#593;), a new robotics foundation model designed to bring adaptivity to physical tasks.</p><p>Rho-alpha falls under the category of Vision-Language-Action (VLA) models. These systems ingest visual data and natural language commands to output robot arm actions. However, standard VLAs often struggle with precision tasks where vision is obstructed or insufficient, such as manipulating a slippery object or inserting a plug behind a desk. Rho-alpha addresses this by integrating tactile sensing directly into its decision-making process, a capability Microsoft refers to as &#8220;VLA+.&#8221;</p><h2>The architecture of VLA+</h2><p>The core innovation of Rho-alpha lies in how it processes sensory data. Most multimodal models attempt to tokenize every input, converting images and text into discrete units that a transformer can process. However, tactile feedback is a high-frequency, continuous signal that represents force and resistance and can&#8217;t be represented as discrete tokens.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://bdtechtalks.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">TechTalks is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>To handle this, Microsoft engineered a split architecture. The model uses a standard vision-language model (VLM) backbone, derived from Microsoft&#8217;s Phi family, to handle high-level reasoning and semantic understanding. Motor control itself is managed by a specialized module called the &#8220;action expert,&#8221; which is attached to the VLM. Tactile data is fused with image, text, and proprioception embeddings inside the action expert; it bypasses the VLM component and is not tokenized.</p><p>In comments to TechTalks, Andrey Kolobov, Principal Research Manager at Microsoft Research, explained that this architecture allows the system to bypass the slower reasoning components when immediate physical reaction is needed.</p><p>&#8220;The model treats tactile as a continuous data source, providing information on the currently applied forces at the gripper fingertips,&#8221; Kolobov said.</p><p>This bypass mechanism is critical for latency. Feeding high-frequency force data through a massive transformer would introduce delays that undermine real-time control. Because tactile data is fused in the smaller, faster action expert, the robot can react to physical resistance quickly while still leveraging the VLM for broader context.</p><p>&#8220;We view the purpose of physical sensing modalities as helping our model be more reactive and adaptive,&#8221; Kolobov added. 
&#8220;Accordingly, we feed these modalities into the action expert, which is a small fraction of the overall architecture, bypassing the VLM.&#8221;</p><p>The long-term goal, Kolobov said, is to have the action expert or a part of it operate on proprioception and physical sensing modalities at a significantly higher frequency than on visual and language data.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;b2c62e8c-1dcf-451b-9af4-29d5d452463d&quot;,&quot;duration&quot;:null}"></div><h2>Establishing priors in simulation</h2><p>Training a model to interact with the physical world presents a data scarcity challenge. Unlike text, which can be scraped from the web in petabytes, robot interaction data is expensive and slow to collect. Microsoft addresses this by training Rho-alpha in a simulated environment using Nvidia Isaac Sim.</p><p>A persistent problem in robotics is the difference between the simulated environment and the real world, a hurdle known as the &#8220;sim-to-real gap.&#8221; However, Microsoft&#8217;s approach sidesteps the need to bridge that gap perfectly. The goal of the simulation is not to create a 1:1 replica of the physical world, but to teach the model general concepts of physics and force.</p><p>&#8220;We actually don&#8217;t rely on the sim-to-real gap being small and do only conventional data augmentation,&#8221; Kolobov said. &#8220;The purpose of using simulated data during training is to give [the model] a rough prior idea of what tactile and force feedback looks like and how it can be useful.&#8221;</p><p>By learning these &#8220;priors&#8221; in simulation, the model enters the real world already understanding, for example, that a spike in force readings usually means it has hit an obstacle. This allows it to fine-tune its policy with significantly less real-world data.</p><h2>Online learning and forgetting</h2><p>Once deployed, Rho-alpha continues to learn through human interaction. 
If the robot fails a task, a human operator can intervene via teleoperation (using devices like a 3D mouse) to correct the movement. The model ingests this feedback to update its policy.</p><p>However, this online learning capability introduces the risk of &#8220;<a href="https://en.wikipedia.org/wiki/Catastrophic_interference">catastrophic forgetting</a>,&#8221; where learning a new task causes the model to lose proficiency in previous ones.</p><p>&#8220;As the model learns from feedback on a given task, its performance on tasks not being exercised at the moment may degrade, unless care is taken to combat this,&#8221; Kolobov noted.</p><p>To mitigate this, the system can aggregate data and perform updates at regular intervals, effectively &#8220;reminding&#8221; the model of past experiences to maintain a balanced skill set.</p><h2>Bimanual manipulation and future applications</h2><p>Currently, Rho-alpha is optimized for bimanual (two-armed) manipulation. 
While many tasks can theoretically be performed with a single arm, the coordination of two end-effectors significantly improves efficiency in industrial settings.</p><p>&#8220;In many scenarios beyond pick-and-place, from folding laundry to packaging food to assembly, performing tasks with two end-effectors rather than one increases execution speed and robustness &#8211; and hence throughput,&#8221; Kolobov explained.</p><p>The model does have hardware limitations in its current state. It supports manipulation only, meaning it cannot control the mobile base of a robot or the body of a humanoid. Furthermore, the training data is heavily biased toward two-finger grippers, so using complex multi-fingered hands or suction cups would require additional post-training data.</p><p>Despite these constraints, the architecture offers a glimpse into the future of physical AI. By separating high-level semantic reasoning from low-level, high-frequency motor control, Microsoft is building a system that can think like an LLM but act with the reflexes required for the real world.</p>]]></content:encoded></item></channel></rss>