MUSE-Autoskill: ByteDance's Fix for AI Agents That Forget What They Learn

In short: ByteDance’s MUSE-Autoskill treats an agent skill as a lifecycle asset that gets created, remembered, tested, patched, and migrated, instead of a throwaway prompt. On the SkillsBench benchmark, human-written skills lifted its accuracy from 53.19% to 68.40%, and on the 35 tasks where it generated its own skill, accuracy reached 87.94%. Your coding agent spends twenty minutes working out a tricky deploy step. It works. The next day you hand it a nearly identical task, and it starts from zero: the same dead ends, the same twenty minutes. It read the docs, ran the commands, and even jotted down a lesson, but that lesson stayed trapped inside one task. When the task ended, the experience went with it. Agents forget, and for anyone who uses them daily this is a familiar, costly habit. On May 27, 2026, the ByteDance team released MUSE-Autoskill to attack exactly that: how an agent turns the experience it builds while doing tasks into skills it can reuse over the long term. There is more than one way to solve continual learning for agents. Some approaches update the model weights, some optimize the outer workflow, and some externalize experience into memory and skills. This article focuses first on the two approaches most closely tied to skills. ...

 · 17 min · hohoda

SkillOpt: Stop Hand-Writing AI Agent Skills. Train Them.

In short: Microsoft Research’s SkillOpt turns AI agent skills into trainable artifacts. Instead of hand-writing CLAUDE.md, AGENTS.md, or best_skill.md and hoping the rules work, SkillOpt runs the agent, studies its failures, applies bounded text edits, validates the candidate skill, and keeps only changes that improve performance. Every serious AI agent user eventually starts writing instruction files: CLAUDE.md, AGENTS.md, best_skill.md, project rules, tool-use notes, formatting constraints, debugging routines. The pattern is familiar. You watch the agent fail a few times, write a better rule, rerun the task, then add another note. After a while, the instruction file becomes a small operating manual. If you work with Claude Code, Codex, Cursor, or any agent that lives inside a real project, this file quickly becomes part of the product. It tells the agent how to inspect files, when to run tests, how to format answers, which tool calls are safe, what to avoid in production code, and how to recover from common mistakes. The problem is that most of these files are written by feel. You notice a failure, write a rule, and hope the next run behaves better. Sometimes it does. Sometimes the new rule helps one task and harms another. Sometimes the instruction sounds precise to you but remains too vague for the model that has to act on it. ...

 · 14 min · hohoda
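To make the loop in the SkillOpt summary above concrete, here is a minimal Python sketch of a "run, study failures, edit, validate, keep only improvements" cycle. It is not SkillOpt's code or API: `optimize_skill`, `evaluate`, `propose`, and the `max_edit_chars` cap are hypothetical names, a "bounded edit" is approximated by truncating one appended rule, and the toy evaluator simply checks whether a task's keyword appears in the skill text. In a real setup, `evaluate` would run the agent on a task and `propose` would ask a model to revise the skill from failure traces.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interfaces (not SkillOpt's API):
#   evaluate(skill_text, task) -> score in [0.0, 1.0]
#   propose(skill_text, failed_tasks) -> a short rule to add to the skill
Evaluator = Callable[[str, str], float]
Proposer = Callable[[str, list], str]


@dataclass
class OptimizationResult:
    skill: str
    score: float
    accepted_edits: int


def mean_score(skill: str, tasks: Sequence[str], evaluate: Evaluator) -> float:
    """Average score of one skill text over a set of tasks."""
    return sum(evaluate(skill, t) for t in tasks) / len(tasks)


def optimize_skill(
    skill: str,
    train_tasks: Sequence[str],
    val_tasks: Sequence[str],
    evaluate: Evaluator,
    propose: Proposer,
    rounds: int = 5,
    max_edit_chars: int = 200,
) -> OptimizationResult:
    best = mean_score(skill, val_tasks, evaluate)
    accepted = 0
    for _ in range(rounds):
        # 1. Run the current skill and collect the tasks it still fails.
        failures = [t for t in train_tasks if evaluate(skill, t) < 1.0]
        if not failures:
            break
        # 2. Propose an edit from the failures; truncation is a crude stand-in for a "bounded" edit.
        edit = propose(skill, failures)[:max_edit_chars]
        candidate = skill + "\n" + edit
        # 3. Validate the candidate and keep it only if the score strictly improves.
        score = mean_score(candidate, val_tasks, evaluate)
        if score > best:
            skill, best = candidate, score
            accepted += 1
    return OptimizationResult(skill, best, accepted)


# Toy stand-ins: a task "passes" if its keyword appears in the skill text.
def toy_evaluate(skill: str, task: str) -> float:
    return 1.0 if task in skill else 0.0


def toy_propose(skill: str, failures: list) -> str:
    return f"- Always {failures[0]} before answering."


tasks = ["run tests", "format json", "check lint"]
result = optimize_skill("# Project skill", tasks, tasks, toy_evaluate, toy_propose)
print(result.accepted_edits, result.score)  # prints: 3 1.0
```

The acceptance check is the design choice that matters: an edit survives only if the validation score strictly improves, which is what stops a rule that helps one task from quietly harming another. The toy reuses one task list for brevity; in practice, separate training and validation tasks guard against overfitting the skill to the very failures it was edited on.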

Humans Domesticate AI. AI Is Domesticating Us Too.

In short: AI agents do more than automate work. As humans domesticate AI with prompts, evals, and workflows, AI is also domesticating us by taking over the first move: the first outline, the first judgment, the first messy sentence, the first uncomfortable question. Wheat, Humans, and the Direction of Domestication. In Sapiens: A Brief History of Humankind, Yuval Noah Harari makes a slightly uncomfortable point about the agricultural revolution: perhaps humans did not domesticate wheat so much as wheat domesticated humans. It sounds like a clever reversal at first, but the accounting is fairly plain. Wheat started as a wild grass in the Middle East. Over time it spread across the world, occupied enormous amounts of land, and got humans to clear fields, bend their backs, pull weeds, dig channels, build granaries, and stop wandering. Wheat did well. Human backs, less so. That story is useful before talking about AI, because it cuts through a lot of vague language about technology changing the world. A tool is not always something you use and then put back on the table. Stay with it long enough and it starts changing your movements, your schedule, and your sense of what feels normal. Wheat changed posture and settlement. The internet changed attention. AI is reaching a little further inward. It is changing how we begin to think about things. ...

 · 10 min · hohoda

Code as a Trained Output: The New Model of AI Coding

In short: AI coding agents are changing the status of code. In mature agentic workflows, code is no longer only written by humans; it is repeatedly generated, tested, corrected, and selected by an optimization loop. That makes tests look like loss functions, production failures look like generalization failures, architecture look like inductive bias, and harness engineering look like optimizer design. Introduction: A Shift We Have Not Yet Named Precisely. Over the past eighteen months, software development has undergone a quiet but forceful restructuring. Tools such as Cursor, Claude Code, and Codex are pushing us away from the old workflow of “humans write code, machines assist with completion” toward something structurally different: humans describe intent, define constraints, and provide feedback, while agents repeatedly generate, run, and revise code until some convergence condition is met. Most industry commentary still frames this shift in productivity terms: “AI makes us write code N times faster.” That framing misses a more basic ontological question: in this new workflow, what has happened to the nature of code itself? ...

 · 18 min · hohoda

Why AI Agents Drift: Belief State Is the Real Bottleneck, Not Context Length

In short: Many AI agents look productive but are actually drifting — confidently executing the wrong moves on a wrong picture of the situation. The bottleneck for the next phase of agent systems is not larger context windows or stronger base models; it is whether the system can construct and maintain a stable belief state. This piece argues why belief state quality is the right optimization target, proposes five proxy metrics to measure it, and lays out where to put incremental engineering resources next. AI agents that look productive often turn out to be drifting — confidently executing the wrong moves on a wrong picture of the situation. Competition in agent systems is shifting from “whose model is stronger” toward “who can keep producing higher-quality belief state.” If you accept that framing, several seemingly unrelated problems suddenly line up: the same model behaves very differently inside different product shells; long-running agents fail not because they cannot answer but because their judgment of the situation is wrong; context windows keep growing, but system capability does not scale linearly with them; and scattered engineering pieces — skill, memory, retrieval, tool use, trace, summary — all start to matter at the same time. ...

 · 25 min · hohoda

Is AI Making Us Give Up Too Soon? What a 1,222-Person Study Revealed

In short: A new randomized study (N = 1,222) shows that AI assistance can improve performance in the moment, while reducing independent performance once AI is removed and increasing how often people give up. The strongest negative effect appears in users who ask AI for direct answers. The fix is not to stop using AI, but to change when you bring it in. Ten minutes with an AI assistant. That is all it took, in a new randomized study of 1,222 people, for participants to perform worse on the next problem without AI — and to give up on that problem more often. Not because they were lazy. Because they had stopped expecting hard things to feel hard. This is the second time in a year that careful research has pointed at the same shape of risk. The familiar version of the question is whether AI is making us lazy. Every new tool brings a version of this worry. Calculators made people do less mental arithmetic. Search engines made people remember less. Navigation apps made people worse at finding their way around. ...

 · 11 min · hohoda

Compression Is All You Need

Inside a new Freedman paper: a googol hidden in 100 tokens, and why mathematics is a three-thousand-year AlphaZero run. In March this year, Michael Freedman, who won the Fields Medal back in 1986, published a paper with a few collaborators. The title is brash: Compression Is All You Need: Modeling Mathematics. I am borrowing it for this essay, because once you see what they measured, no other title does the job. They did something that sounds dull at first. They took Mathlib, the Lean 4 library with roughly half a million theorems, definitions, and lemmas, turned the whole thing into a dependency graph, and measured two numbers for every element. One they call wrapped length: how many tokens you write in the Lean source to state this thing. The other is unwrapped length: if you recursively expand every reference down to the base axioms, how many raw symbols you end up with. Then they went looking for the deepest element in Mathlib. They found a theorem in algebraic geometry called AlgebraicGeometry.Scheme.exists_hom_hom_comp_eq_comp_of_locallyOfFiniteType. Wrapped, it takes about 100 tokens. Fully unwrapped, it contains around $10^{104}$ raw symbols. ...

 · 9 min · hohoda
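The wrapped/unwrapped distinction in the Freedman excerpt above is easy to reproduce on a toy dependency graph. The sketch below is an illustration under stated assumptions, not the paper's methodology: each entry is a hypothetical list of tokens, any token that names another entry is expanded recursively, every other token counts as one raw symbol, and the graph is assumed acyclic, as a theorem library's dependency graph is. Because each level mentions the previous one twice, the unwrapped length roughly doubles per level while the wrapped length stays at five tokens.

```python
# Hypothetical toy model: a "library" maps each name to the tokens of its statement.
# Tokens that name another entry are references; every other token is one raw symbol.
Library = dict[str, list[str]]


def make_chain(depth: int) -> Library:
    """d0 is a single axiom symbol; each d_k mentions d_{k-1} twice."""
    lib: Library = {"d0": ["x"]}
    for k in range(1, depth + 1):
        prev = f"d{k - 1}"
        lib[f"d{k}"] = ["(", prev, "*", prev, ")"]
    return lib


def wrapped_length(lib: Library, name: str) -> int:
    """Tokens written to state the entry, with each reference counted as one token."""
    return len(lib[name])


def unwrapped_length(lib: Library, name: str) -> int:
    """Raw symbols after recursively expanding every reference (assumes an acyclic graph)."""
    memo: dict[str, int] = {}

    def expand(n: str) -> int:
        if n not in memo:
            # References expand to their own unwrapped length; other tokens count as 1.
            memo[n] = sum(expand(t) if t in lib else 1 for t in lib[n])
        return memo[n]

    return expand(name)


lib = make_chain(100)
print(wrapped_length(lib, "d100"))                   # 5
print(unwrapped_length(lib, "d100") == 2**102 - 3)   # True
```

With 100 levels the stored statement is still 5 tokens, yet it expands to 2^102 - 3 (about 5 × 10^30) raw symbols. Memoizing `expand` keeps the computation linear in the number of entries even though the expanded size grows exponentially, which is why unwrapped lengths like the ones in the paper can be counted without ever writing the expansion out.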

5,000 Feeds, 20 Highlights: Your AI Agent Is Killing Your Serendipity

A friend recently showed me his new tool, beaming with excitement. He follows about 5,000 people on X. Researchers, founders, investors, developers, media figures — after years of accumulating, his feed had long since become a bottomless waterfall. He’d tried “read later” apps before, bookmarking over a thousand articles and actually reading five. Like most people. Now he uses an AI agent that reads the full output of all 5,000 accounts, compressing everything into 20 curated highlights per day. Fifty-four structured briefings in ten days. What used to take two hours to skim now takes five minutes. Ninety-five percent of noise, filtered out. “The root of information anxiety is the cost of filtering,” he said. “Hand the filtering to an agent, and the anxiety disappears.” He’s right. But only about the first half. The anxiety does disappear. What also disappears is everything you didn’t know you needed to know. Five thousand tweets compressed to twenty. Among the 4,980 discarded, there might have been one from a field you’ve never followed, using logic you’ve never encountered, explaining a problem you thought you’d already figured out. ...

 · 12 min · hohoda

10 Claude Code Skill-Writing Patterns the Docs Don't Teach You

On March 31, Anthropic accidentally shipped a source map file in their Claude Code npm package — and for a brief window, the complete TypeScript source (512,000 lines across ~1,900 files) was publicly accessible. The community archived it before Anthropic could pull it down. I spent a few days going through the built-in skills: simplify, batch, skillify, and a dozen others. Most of the community attention went to the hidden feature flags and the easter egg pet system. What caught my eye was less flashy: the way Anthropic’s engineers write their own skills differs from what the official docs teach. Claude Code Skills has two official references — the Skills docs and the Agent Skills Best Practices guide. Both are worth reading. Neither prepares you for what the built-in skills actually look like. This post distills 10 patterns that are in the source but not in the docs. Each one shows a ❌ typical doc-style approach vs ✅ the actual built-in skill approach. If you write SKILL.md files for Claude Code, these patterns change how you structure them. ...

 · 10 min · hohoda

web-access: Browser Automation Skill for Claude Code Agents

Claude Code ships with search and fetch. They work fine for public pages with clean HTML. The moment you need anything behind a login, inside a JS-rendered app, or across multiple sites in parallel, they hit a wall. This is not a model problem. The tools just were not designed for that. Independent developer Eze (一泽Eze) built web-access to close that gap: an Agent Skill for Claude Code (and OpenClaw) that adds real browser automation, parallel tab management, and automatic site-experience memory. It is published at eze-is/web-access under the MIT license. Language note: the skill is currently Chinese-only. There is no official English version yet. The installation prompt in this article has been translated, but the skill’s internal documentation loads in Chinese. Keep that in mind if you are working with a model that performs better with English context. What Claude Code’s built-in web tools cannot do. Claude Code gives agents two web tools: search, which queries Brave Search and returns summaries, and fetch, which pulls the plain-text content of a URL. OpenClaw’s web_search and web_fetch follow the same pattern. ...

 · 5 min · hohoda