{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Claude Blog",
  "home_page_url": "https://claude.com/blog",
  "description": "Practical guidance and best practices for building with Claude",
  "language": "en",
  "items": [
    {
      "id": "/blog/building-with-claude-managed-agents",
      "url": "https://claude.com/blog/building-with-claude-managed-agents",
      "title": "The evolution of agentic surfaces: building with Claude Managed Agents",
      "content_html": "\u003cp\u003eGetting an agent into production takes more than a good prompt. The agent needs somewhere to run the code it writes, credentials to reach your data, observable sessions, and infrastructure that scales with usage. On the Applied AI team, we work at the intersection of product, research, and the customers building on Claude—and we see the same pattern repeatedly: infrastructure is what separates a prototype from a production agent. All too often, teams burn development cycles on security, state management, permissioning, and harness tuning. \u003c/p\u003e\u003cp\u003e\u003ca href=\"https://platform.claude.com/docs/en/managed-agents/overview\" target=\"_blank\"\u003eClaude Managed Agents\u003c/a\u003e, our suite of composable APIs for building and deploying production-grade agents, pairs an agent harness tuned for performance with production infrastructure, allowing teams to go from prototype to launch in days rather than months. In this post, we\u0026#39;ll cover the evolution of Anthropic’s agentic building blocks, why we built Claude Managed Agents, and how teams are using it in production today. \u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eEvolving the agent architecture\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eWhen we opened up Claude to developers in 2023, the API was deliberately simple: tokens in, tokens out. You sent a prompt, Claude returned a completion, and you built the harness and underlying infrastructure.\u003c/p\u003e\u003cp\u003eThe API grew steadily richer over the years, but the contract underneath never changed: one request, one model turn, and your application decides what happens next. For a long time, that was enough. Summarizing a document, classifying a support ticket, rewriting a block of text—the kind of work that fits comfortably in a single turn.\u003c/p\u003e\u003cp\u003eOver time, however, the tasks people wanted to hand off stopped fitting.  
They wanted Claude to carry a task all the way through, look something up, act on it, see what changed, and decide what to do next. And they wanted it to operate \u003cem\u003ein\u003c/em\u003e the systems their work already ran on, like a codebase, internal wiki, or ticketing system.\u003c/p\u003e\u003cp\u003eWith the API, turning Claude into an agent meant building your own loop: ask the model what to do, run the tool, feed the result back, and repeat. You were responsible for building and deploying the agent scaffolding, which may need tuning as models evolve. For agents that require full customization, this approach makes sense. For agentic workloads that are more predictable and less complex, optimizing harnesses as models and products evolved became tedious. \u003c/p\u003e\u003cfigure style=\"max-width:4800pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a298c28f950480f89a8dfcf_01%20_%20Messages%20API.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003e\u003ca href=\"https://code.claude.com/docs/en/overview\" target=\"_blank\"\u003eClaude Code\u003c/a\u003e, the agentic coding tool we launched in 2025 that lets Claude interact directly with your codebase, contained our own version of that harness: the loop, tool execution, subagents, context management, and rich capabilities that made it an effective agent. Developers naturally wanted similar harness machinery for their own agents across various domains.\u003c/p\u003e\u003cp\u003eTo enable teams to build agents on top of the Claude Code harness, we released \u003ca href=\"https://code.claude.com/docs/en/agent-sdk/overview\" target=\"_blank\"\u003eClaude Agent SDK\u003c/a\u003e. Claude Agent SDK gives developers tools to build their own agents on the same machinery that runs Claude Code instead of maintaining a homegrown loop. 
For a lot of teams, this is when agents became practical: the harness arrived already tuned for Claude with infrastructure primitives and it kept improving as Claude Code did.\u003c/p\u003e\u003cp\u003eEven with a harness, though, deploying agents in production environments can be challenging for several reasons:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eHosting and scaling.\u003c/strong\u003e Where does the agent run, how long can a process stay alive for a multi-hour task, and what scales it when usage grows? \u003c/li\u003e\u003cli\u003e\u003cstrong\u003eSession management.\u003c/strong\u003e Where does an agent\u0026#39;s history and progress live? Can a run survive an interruption and resume unencumbered? Can you go back and inspect what happened in previous sessions?\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eFilesystem management. \u003c/strong\u003eDoing real work means producing artifacts: editing code, writing files, building outputs. Where does the agent get a workspace to act on, and what happens to that workspace between runs?\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eExecution isolation.\u003c/strong\u003e The code Claude writes has to execute somewhere. What\u0026#39;s the blast radius if it\u0026#39;s wrong, and what boundary would you actually trust in production?\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eCredentials. \u003c/strong\u003eThe agent needs access to your systems. How does it get that access without exposing proprietary information to the code it generates?\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eObservability. \u003c/strong\u003eWhen an agent works autonomously for an hour and does something surprising, can you reconstruct every step it took?\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eWith the Agent SDK, many elements of the aforementioned production infrastructure are provided through Claude Code’s machinery. 
The agent gets a real filesystem to work in, session state is persisted locally or on external storage, and observability is exportable through OpenTelemetry into whatever monitoring stack you already run.\u003c/p\u003e\u003cfigure style=\"max-width:4800pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a298c53aaeeee508f2b3166_02%20_%20Claude%20Agent%20SDK.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eHowever, as teams increasingly moved agents out of local development and into production, they needed a way to deploy them at scale on managed infrastructure. And as models and their surrounding harnesses became more advanced (running longer, executing more code, touching more systems, and taking more actions), scaling, security, and sandboxing became more challenging.\u003c/p\u003e\u003cp\u003eSeveral of these hurdles stem from a common architectural choice: the agent harness often runs \u003cem\u003einside the same container\u003c/em\u003e as the filesystem it works on. A container has to spin up (paying a startup cost) before Claude can think, the agent and the code it executes live right next to your credentials, and when the container dies, the run dies with it.\u003c/p\u003e\u003cp\u003eManaged Agents solves these problems by \u003ca href=\"https://www.anthropic.com/engineering/managed-agents\" target=\"_blank\"\u003edecoupling the brain from the hands\u003c/a\u003e. The harness that calls Claude runs separately from the sandbox where code executes, and the session, an append-only log of every model call, tool call, and result, connects the two. 
Claude can start reasoning before any container exists, the sandbox stays far away from your credentials, and a whole run can be reconstructed from its session at any point.\u003c/p\u003e\u003cfigure style=\"max-width:4800pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a298c97d4a887f2666a50b6_03%20_%20Claude%20Managed%20Agents.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003ch2\u003e\u003cstrong\u003eWhen and why to use Claude Managed Agents \u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eWhen building with Managed Agents, you define the task, the tools, and the guardrails, and we run the agent on our infrastructure and handle the agentic loop underneath: giving the agent an execution environment to call tools, recovering when something fails, orchestrating multiple agents, and more.\u003c/p\u003e\u003cp\u003eWhen the harness doesn’t evolve alongside model intelligence, \u003ca href=\"https://www.anthropic.com/engineering/harness-design-long-running-apps\" target=\"_blank\"\u003ethe agent breaks down\u003c/a\u003e. On Claude Sonnet 4.5, an agent would rush to finish as it neared the end of its context, cutting work short rather than using the room it had left, a pattern called \u0026#34;context anxiety.\u0026#34; Our fix was to add context resets to the harness, baking in an assumption that Claude needed help staying coherent near the limit. That assumption didn\u0026#39;t survive the next model. On Claude Opus 4.5, the behavior was gone, and the resets we\u0026#39;d added were just overhead.\u003c/p\u003e\u003cp\u003eFor most organizations, maintaining a harness is overhead that doesn\u0026#39;t differentiate their product. Harnesses have to be tuned for certain model behaviors; primitives like compaction, tool execution, and caching work differently on Claude than on other models. 
With Claude Managed Agents, the harness evolves alongside the model, allowing teams to focus on what will differentiate their agents: \u003cstrong\u003econtext management and domain expertise.\u003c/strong\u003e \u003c/p\u003e\u003cp\u003eTo enable developers to configure the context and tools necessary to build effective agents, Managed Agents is built around three primary resources: agents, environments, and sessions. An \u003cem\u003eagent\u003c/em\u003e is a configuration: a model, a prompt, a set of tools, and the guardrails around them. An \u003cem\u003eenvironment\u003c/em\u003e is the execution context the agent runs in: the sandbox container, its networking rules, and the packages pre-installed in it, hosted on our cloud or on infrastructure you control. Each run is a \u003cem\u003esession\u003c/em\u003e, which pairs an agent with an environment and gets its own isolated sandbox instance. Sessions persist their full event history, sandbox state, and outputs server-side, so long-running work can pause, resume cleanly, and be traced step by step after the fact. With Managed Agents, you can define an agent and an environment once, then run many sessions against the same configuration as your workload grows.\u003c/p\u003e\u003cfigure style=\"max-width:4800pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a29a18bb07e245f8389acb9_04%20_%20Agents_%20environments_%20sessions%20(2).png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003ch2\u003e\u003cstrong\u003eBuilding for production and scale on Managed Agents\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eWithin Applied AI, we see agents go from prototype to production both inside Anthropic and across our customers’ systems, across coding, finance, support, legal, and a dozen other domains. 
This gives us a clear view of what separates a demo from a production-ready agent and where teams often get stuck.\u003c/p\u003e\u003cp\u003eBelow, we share the most common reasons to build on a managed service like Claude Managed Agents: \u003c/p\u003e\u003cp\u003e\u003cstrong\u003e1. Credentials are kept out of the sandbox.\u003c/strong\u003e When everything runs in one container, the code Claude generates sits right next to your credentials, so a prompt injection could lead the model to leak a token by convincing it to read its own environment. We can protect against this by setting up robust guardrails within the same container, but decoupling the architecture enables a much more secure approach by keeping credentials out of the sandbox entirely. Tokens for MCP servers, CLIs, and GitHub repos live in a separate vault, and a proxy fetches and decrypts them only on demand. Managed Agents provides \u003ca href=\"https://platform.claude.com/docs/en/managed-agents/vaults\" target=\"_blank\"\u003eVaults\u003c/a\u003e that handle credentials out of the box, so you don’t need to run your own secret store, transmit tokens on every call, or lose track of which end user an agent acted on behalf of. Vault credentials are protected with envelope encryption before storage, and retrieval requires a signed request token for verification.\u003c/p\u003e\u003cfigure style=\"max-width:4800pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a29a19cebb4eb7adac0a8ec_05%20_%20Managed%20Agents%20runtime%20(1).png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003e\u003cstrong\u003e2. Lower latency from eliminated sandbox overhead.\u003c/strong\u003e Latency is top of mind for many enterprise teams, since users acutely feel it when they’re waiting for Claude to respond. 
Without the Managed Agents architecture, a container has to be spun up for every session, even the ones where the agent only needs to think and never runs a tool. That setup time is wasted, and the user feels it as a delay before the first response. With Managed Agents, Claude begins reasoning immediately while the environment spins up in parallel, and sessions that never run a tool skip the container entirely. This means the user sees the first token without waiting on container startup, and the environment is ready by the time the agent needs to run something. In our testing, that cut time-to-first-token by roughly 60% in the median case (p50) and by over 90% in the slowest cases (p95). \u003c/p\u003e\u003cp\u003e\u003cstrong\u003e3. Reliable, persistent sessions that enable session management, observability, and memory. \u003c/strong\u003eInstead of request/response, Managed Agents thinks in terms of \u003cem\u003eevents. \u003c/em\u003eA session is an ongoing stream of events: every model call, tool call, and result is appended to a log that lives outside the process running the agent. With this architecture, you get real-time updates as events stream in while the agent works, and you can resume any session later with no database or save-points to manage. History is preserved between interactions unless you delete the session, and when a session goes idle its container is checkpointed so you can pick up cleanly from where it paused. And because the whole run is already a record of events, observability and memory come with it: the Claude Developer Console offers a native visual timeline view of your agent sessions, and a debugging experience that allows you to examine any transcript in depth. Managed Agents also comes with features like Memory and Dreaming that build on this session durability. 
\u003ca href=\"https://platform.claude.com/docs/en/managed-agents/dreams\" target=\"_blank\"\u003eDreaming\u003c/a\u003e is a scheduled process that reviews your agent sessions and memory stores, extracts patterns, and curates memories so your agents improve over time. Dreaming refines memory between sessions so that it can improve from recurring mistakes and user preferences by reading from the persistent session logs.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003e4. Flexibility in Anthropic-managed or self-hosted cloud containers.\u003c/strong\u003e By default, with Managed Agents, you can delegate both orchestration and tool execution to Anthropic-managed cloud containers. This makes hosting and scaling simple and easy, delivering a faster path to production. Because the brain is decoupled from the hands in Managed Agents, the hands can live anywhere, including inside your Virtual Private Cloud (VPC). Thus, we also offer \u003ca href=\"https://platform.claude.com/docs/en/managed-agents/self-hosted-sandboxes\" target=\"_blank\"\u003eself-hosted sandboxes\u003c/a\u003e for teams that want control over tool execution, so the agent’s code, filesystem, and network egress never leave their environment. We also provide \u003ca href=\"https://platform.claude.com/docs/en/agents-and-tools/mcp-tunnels/overview\" target=\"_blank\"\u003eMCP tunnels\u003c/a\u003e, which let you connect Claude to Model Context Protocol (MCP) servers that run inside your private network. 
So self-hosted sandboxes control \u003cem\u003ewhere the agent’s code executes\u003c/em\u003e, and MCP tunnels control \u003cem\u003ehow Anthropic reaches MCP servers in your network\u003c/em\u003e, giving you the ability to control exactly what stays inside your boundary.\u003c/p\u003e\u003cfigure class=\"w-richtext-align-center w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a298e427c7a804ea4295163_image7.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003cfigcaption\u003e\u003cem\u003eThe built-in observability console for Claude Managed Agents records every event, so you can scrub the timeline, open any step, and read its raw payload.\u003c/em\u003e\u003c/figcaption\u003e\u003c/figure\u003e\u003cp\u003eBeyond these features, additional capabilities include outcomes that let an agent grade its own work against a rubric, multiagent orchestration, permission policies, and webhooks. Learn more \u003ca href=\"https://platform.claude.com/docs/en/managed-agents/overview\"\u003ehere\u003c/a\u003e.\u003c/p\u003e\u003ch3\u003e\u003cstrong\u003eHow customers are building on Managed Agents today\u003c/strong\u003e\u003c/h3\u003e\u003cp\u003eAcross industries, customers are already shipping agents in production with Claude Managed Agents. Here are a few examples:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003ca href=\"https://claude.com/customers/notion\" target=\"_blank\"\u003eNotion\u003c/a\u003e runs its Custom Agents on Managed Agents: teams assign work to Claude straight from a task board, Claude picks up the docs, meeting notes, and connected data around each task, and the finished code, decks, and sites land back in the workspace for review. 
Dozens of tasks run in parallel, and their team has described an early prototype turning roughly twelve hours of work into twenty minutes.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://claude.com/customers/rakuten\" target=\"_blank\"\u003eRakuten\u003c/a\u003e used Managed Agents to ship specialist agents across product, sales, marketing, and finance, each live within about a week.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://claude.com/customers/sentry\" target=\"_blank\"\u003eSentry\u003c/a\u003e paired its Seer debugging agent with a Claude agent that writes the patch and opens the PR, built in weeks instead of months by a single engineer.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://claude.com/blog/claude-managed-agents\" target=\"_blank\"\u003eAsana\u003c/a\u003e built AI Teammates that pick up tasks inside projects, and \u003ca href=\"https://claude.com/blog/claude-managed-agents\" target=\"_blank\"\u003eAtlassian\u003c/a\u003e put developer agents into Jira workflows. \u003c/li\u003e\u003c/ul\u003e\u003ch2\u003e\u003cstrong\u003eGetting started with Claude Managed Agents\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eWe built Managed Agents to make it as easy as possible to spin up agents through Claude Code and the Claude Developer Console at \u003ca href=\"http://platform.claude.com\" target=\"_blank\"\u003eplatform.claude.com\u003c/a\u003e. 
The Console’s quickstart, for example, lets you start from an agent template or describe an agent in plain language, then turn it into a production-ready agent you can secure and deploy in minutes.\u003c/p\u003e\u003cfigure class=\"w-richtext-align-center w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a298e9b866a4402a3c9bb5d_image5.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003cfigcaption\u003e\u003cem\u003eThe agent quickstart at platform.claude.com: start from a template or describe what you want to build.\u003c/em\u003e\u003c/figcaption\u003e\u003c/figure\u003e\u003cfigure class=\"w-richtext-align-center w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a298ebdff6d26839e052c63_image9.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003cfigcaption\u003e\u003cem\u003eA few steps later: the agent is created, the environment is configured, and a session is live. The console streams the run as it happens.\u003c/em\u003e\u003c/figcaption\u003e\u003c/figure\u003e\u003cp\u003eIn Claude Code, the \u003ca href=\"https://platform.claude.com/docs/en/agents-and-tools/agent-skills/claude-api-skill\" target=\"_blank\"\u003e/claude-api skill\u003c/a\u003e is included by default and gives Claude detailed, up-to-date reference material for building applications on Claude Managed Agents. We highly recommend using it to follow best practices when setting up your Managed Agents application. 
Get started by running /claude-api managed-agents-onboard for an interview-driven walkthrough for setting up a new Managed Agent from scratch.\u003c/p\u003e\u003cfigure class=\"w-richtext-align-center w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a298ef3765ce453971174cd_image6.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003ch2\u003e\u003cstrong\u003eThe future of building managed agents\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eAs teams share what they’re building with Managed Agents, we see that the time they used to spend on production infrastructure now goes to what differentiates their agents: managing context and tailoring the experience to users. Now, when a new model comes out, you update your agent to use it, rerun your evals, and ship the improvement without touching the architecture underneath.\u003c/p\u003e\u003cp\u003eWe’re excited to see what you build.\u003c/p\u003e\u003cp\u003e\u003ca href=\"https://platform.claude.com/docs/en/managed-agents/overview\" target=\"_blank\"\u003e\u003cstrong\u003e\u003cem\u003eGet started\u003c/em\u003e\u003c/strong\u003e\u003c/a\u003e\u003cstrong\u003e\u003cem\u003e with Claude Managed Agents.\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003eThis article was written by Gagan Bhat and Isabella He, Members of Technical Staff on Anthropic’s Applied AI team. They\u0026#39;d like to thank Hema Thanki, Jess Yan, and Molly Vorwerck for their contributions.\u003c/em\u003e\u003c/p\u003e",
      "summary": "Claude Managed Agents allows teams to build and deploy agents in production environments reliably at scale. Here’s why and how teams are using it.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Agents"
      ]
    },
    {
      "id": "/blog/whats-new-in-claude-managed-agents",
      "url": "https://claude.com/blog/whats-new-in-claude-managed-agents",
      "title": "New in Claude Managed Agents: run agents on a schedule and store environment variables in vaults",
      "content_html": "\u003cp\u003eStarting today, Claude Managed Agents can run on a schedule and securely access CLI tools and other authenticated services. Both features are now available in public beta on the Claude Platform.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eRun agents on a schedule\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eAgents can now run on a schedule, completing routine work automatically. A \u003ca href=\"https://platform.claude.com/docs/en/managed-agents/scheduled-deployments\"\u003escheduled deployment\u003c/a\u003e gives an agent a cron schedule. Each time the schedule fires, the agent starts a new session and completes its task, with no scheduler for you to build or host. \u003c/p\u003e\u003cp\u003eUse it for recurring work like a nightly data sync, a weekly compliance scan, or a daily digest. Once a deployment is live, you can pause, resume, or archive it at any time, or trigger additional runs on demand.\u003c/p\u003e\u003cfigure style=\"max-width:3840pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a2704ab5b6bc1de3bb952fc_Claude-Console-Scheduled-Deployments.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eTeams are already using scheduled deployments to automate recurring work:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003ca href=\"https://claude.com/customers/rakuten-qa\"\u003eRakuten\u003c/a\u003e uses scheduled deployments to analyze spreadsheet data and produce reports and decks on a weekly or monthly schedule. Teams also monitor production logs and metrics, allowing product managers to see application health without creating a dashboard.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://actively.ai/\"\u003eActively AI\u003c/a\u003e uses Managed Agents to power cross-account agentic search for sales teams. 
Scheduled deployments refresh answers regularly, simplifying their stack by replacing scheduling infrastructure the team initially built themselves.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://ando.so\"\u003eAndo\u003c/a\u003e uses scheduled deployments to keep hiring and sales teams moving. Agents autonomously watch channels for proposed next steps, follow up when they\u0026#39;re due, and send meeting reminders.\u003c/li\u003e\u003c/ul\u003e\u003ch2\u003e\u003cstrong\u003eStore environment variables in vaults to authenticate CLIs and other tools\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eAgents \u003ca href=\"https://claude.com/blog/building-agents-that-reach-production-systems-with-mcp\"\u003econnect to external systems\u003c/a\u003e through direct API calls, CLIs, and MCP. Now we\u0026#39;re extending \u003ca href=\"https://platform.claude.com/docs/en/managed-agents/vaults\"\u003evaults\u003c/a\u003e to support environment variables, so CLIs and other tools can make authenticated requests. CLIs let agents drive existing command-line tools directly through a shell, making them a fast, lightweight integration path. Register an API key with an environment variable name and the domains it can reach, and the CLIs installed in an agent\u0026#39;s sandbox can use it to make authenticated API calls.\u003c/p\u003e\u003cp\u003eThe agent never sees your key because the sandbox only holds a placeholder. The real key is attached at the network boundary, and only on requests to domains you allow, so it only goes where you’ve approved. To change a key, update it in the vault, and running sessions will pick up the new value on their next call. Most CLIs that send their key in an HTTP request work this way, including the Browserbase, KERNEL, Notion, Ramp, and Sentry CLIs. 
\u003ca href=\"https://docs.browserbase.com/integrations/anthropic/managed-agents/quickstart\"\u003eBrowserbase\u003c/a\u003e and \u003ca href=\"https://www.kernel.sh/docs/integrations/claude-managed-agents\"\u003eKERNEL\u003c/a\u003e give Managed Agents browser capabilities for the first time, so agents can navigate and interact with the web alongside their other tools.\u003c/p\u003e\u003cfigure style=\"max-width:3840pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a27074e40b19ba74e79b134_Claude-Managed-Agents-CLI-credential-vaults-diagram%20(1).png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eTeams are using environment variables in vaults to give agents secure access to authenticated tools:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003ca href=\"https://claude.com/customers/notion-qa\"\u003eNotion\u003c/a\u003e uses environment variables in vaults to roll out its CLI alongside MCP tools, adding file-upload capabilities to its agents without API tokens ever being handed to the model.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://www.browserbase.com/\"\u003eBrowserbase\u003c/a\u003e built its public catalog of browser skills using the \u003ca href=\"https://www.npmjs.com/package/browse\"\u003ebrowse CLI\u003c/a\u003e, authenticated through vaults. A scheduled deployment periodically validates the catalog to keep it accurate.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://www.kernel.sh/docs/integrations/claude-managed-agents\"\u003eKERNEL\u003c/a\u003e uses environment variables in vaults to securely connect agents to the databases where it tracks usage and customer conversations. 
The agent flags usage surges as they happen, so the team can confirm with customers whether the activity is intended.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://getmilana.ai/\"\u003eMilana\u003c/a\u003e uses environment variables in vaults to securely connect its AI product engineer to a customer\u0026#39;s codebase. The agent finds and fixes bugs automatically, with large-scale data analysis running faster than before.\u003c/li\u003e\u003c/ul\u003e",
      "summary": "Claude Managed Agents can now run on a schedule and securely access CLI tools and other authenticated services.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Product announcements"
      ]
    },
    {
      "id": "/blog/claude-for-foundation-models",
      "url": "https://claude.com/blog/claude-for-foundation-models",
      "title": "Building intelligent apps for Apple platforms with Claude in the Foundation Models framework",
      "content_html": "\u003cp\u003eToday we\u0026#39;re releasing Foundation Models framework support for Claude through a new Swift package that lets Apple developers use Apple\u0026#39;s Foundation Models framework to call Claude for more complex workflows.\u003c/p\u003e\u003cfigure style=\"max-width:2048pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a26f71ab79bc169ff9bdec4_8dfc12d1.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eApple’s Foundation Models framework gives developers access to tap into models natively from Swift. It is very easy to use and can return typed Swift values through guided generation in as few as three lines of code. Developers can use this to tap into Apple’s on-device models for fast, local tasks like summarization or extraction.\u003c/p\u003e\u003cp\u003eDevelopers can now use Apple’s Foundation Models framework to hand off to Claude when a request calls for multi-step reasoning, code generation, and more. Claude can also search the web for current information and execute code for data analysis. Stream Claude\u0026#39;s response back into the same view.\u003c/p\u003e\u003cp\u003eBecause Apple\u0026#39;s framework returns typed Swift values from @Generable annotations, developers arrive at the Claude API call with clean inputs instead of raw user text.\u003c/p\u003e\u003ch2\u003eWhat this unlocks\u003c/h2\u003e\u003cp\u003eThe Foundation Models framework already powers a range of intelligent on-device features — journaling apps that surface personalized prompts, document apps that summarize contracts, learning apps that explain a concept at a student\u0026#39;s level. 
Adding Claude extends each of those patterns.\u003c/p\u003e\u003cfigure style=\"max-width:2048pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a26f71ab79bc169ff9bdec1_7c4a5aaf.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eA journaling app can generate daily prompts on-device, then ask Claude to find threads across months of entries. A study app can define a term on-device, then hand off to Claude when the student follows up with \u0026#34;why does this matter for everything else we\u0026#39;ve covered?\u0026#34;\u003c/p\u003e\u003cp\u003eIt\u0026#39;s one experience for the user, backed by the right model for each step.\u003c/p\u003e\u003ch2\u003eGetting started\u003c/h2\u003e\u003cp\u003eClaude support will be available tomorrow through Apple\u0026#39;s Foundation Models framework on iOS 27, iPadOS 27, macOS 27, visionOS 27, and watchOS 27. Add the package to your project, sign in with an Anthropic API key, and pass typed outputs from Apple\u0026#39;s on-device model into a Claude request — the package handles streaming, tool calls, and structured responses back into your SwiftUI view.\u003c/p\u003e",
      "summary": "A new Swift package connects Apple's Foundation Models framework to Claude. Hand off complex reasoning from on-device models with typed Swift outputs.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Product announcements"
      ]
    },
    {
      "id": "/blog/observability-for-developers-building-connectors",
      "url": "https://claude.com/blog/observability-for-developers-building-connectors",
      "title": "Observability for developers building connectors",
      "content_html": "\u003ch2\u003eMonitor, debug, and improve connectors\u003c/h2\u003e\u003cp\u003ePublished connectors in the \u003ca href=\"https://claude.ai/directory/connectors\"\u003edirectory\u003c/a\u003e now have a dashboard showing how they’re performing across Claude product surfaces. Connector owners can use it to:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eTrack adoption.\u003c/strong\u003e Monitor active users, total tool calls, and directory rank over time.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eDiagnose errors and latency.\u003c/strong\u003e See health score, error rates, and latency at a glance, with per-tool error breakdowns to pinpoint what\u0026#39;s failing.\u003cstrong\u003e‍\u003c/strong\u003e\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eBreak down usage by product.\u003c/strong\u003e Compare tool calls across Claude, Claude Code, Cowork, and more to understand where users are engaging.\u003c/li\u003e\u003c/ul\u003e\u003cfigure class=\"w-richtext-align-center w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a26eb0505466f798299b38a_MCP%20Observability.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003cfigcaption\u003e\u003cem\u003eStylized view of observability for connectors. Data is illustrative.\u003c/em\u003e\u003c/figcaption\u003e\u003c/figure\u003e\u003cp\u003eAvailable today in public beta. Find it in Claude under\u003ca href=\"https://claude.ai/admin-settings/directory/submissions\"\u003e Directory\u003c/a\u003e in\u003ca href=\"https://claude.ai/admin-settings/organization\"\u003e Organization settings\u003c/a\u003e. Requires Admin or Owner access on a Team or Enterprise plan. 
On Enterprise, Owners can also delegate access with a \u003ca href=\"https://support.claude.com/en/articles/13930452-manage-custom-roles-on-enterprise-plans\"\u003ecustom role\u003c/a\u003e that has the Directory management or Libraries permission.\u003c/p\u003e\u003ch2\u003eJoining the directory\u003c/h2\u003e\u003cp\u003eConnectors are built on the \u003ca href=\"https://modelcontextprotocol.io/docs/getting-started/intro\"\u003eModel Context Protocol (MCP)\u003c/a\u003e. There are over 300 third-party connectors in the \u003ca href=\"https://claude.ai/directory/connectors\"\u003edirectory\u003c/a\u003e, used by millions of people every day. You can now submit your MCP server to the directory directly in Claude. \u003ca href=\"https://claude.com/docs/connectors/building/submission\"\u003eLearn more\u003c/a\u003e.\u003c/p\u003e",
      "summary": "Monitor connector performance across Claude, diagnose errors and latency, and submit your MCP server to the directory in-app. Public beta now live.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Product announcements"
      ]
    },
    {
      "id": "/blog/the-claude-cowork-product-guide",
      "url": "https://claude.com/blog/the-claude-cowork-product-guide",
      "title": "The Claude Cowork product guide",
      "content_html": "\u003cp\u003eMost AI tools are conversational. You ask a question, you get an answer, and the work of turning that answer into something useful—a deck, a doc, a spreadsheet, an email—is still manual. \u003c/p\u003e\u003cp\u003eClaude Cowork, our new knowledge work agent, lets you delegate this work to Claude so you can focus on solving more the strategic and creative problems that\u0026#39;s occupying your time. Running in the Claude desktop app, Claude Cowork reads and writes local files, works across connected apps like Slack and Google Drive, and carries multi-step tasks through to real deliverables, with citations back to the actual files and messages. You describe the goal, desired outcome and cadence, and Claude lays out the steps and does the work, ensuring that you\u0026#39;re along every step of the way.\u003c/p\u003e\u003cp\u003eTo help you get started, we put together a practical product guide for Claude Cowork. We share: \u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eWhat makes Claude Cowork different\u003c/strong\u003e from conversational AI tools and its core capabilities, including local file access, subagents, long-running work, and scheduled tasks\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eA product matrix for understanding when to use what tool:\u003c/strong\u003e chat for conversational drafting, Claude Code for coding, and Claude Cowork for cross-app knowledge work\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eSetup and system requirements\u003c/strong\u003e, the permissions model, and what to try first\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eSeven common Claude Cowork workflows\u003c/strong\u003e, including research briefs, meeting prep, and recurring reports\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eHow to use plugins\u003c/strong\u003e for specific knowledge work like marketing and product management\u003c/li\u003e\u003c/ul\u003e\u003cp\u003e\u003cstrong\u003eCheck it out, 
\u003c/strong\u003e\u003ca href=\"https://cdn.prod.website-files.com/6889473510b50328dbb70ae6/6a2313fa599bd2e2270fda75_Claude-eBook-Claude-Cowork-product-guide-06052026.pdf\"\u003e\u003cstrong\u003ehere\u003c/strong\u003e\u003c/a\u003e\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e\u003cp\u003eGet started with \u003ca href=\"https://support.claude.com/en/articles/13345190-get-started-with-claude-cowork\"\u003eClaude Cowork\u003c/a\u003e today.\u003c/p\u003e",
      "summary": "We share how to get started with Claude Cowork, from setting up the tool to kicking off your first task.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Enterprise AI"
      ]
    },
    {
      "id": "/blog/how-anthropic-uses-claude-gtm-engineering",
      "url": "https://claude.com/blog/how-anthropic-uses-claude-gtm-engineering",
      "title": "How one Anthropic seller rebuilt his team's workflows with Claude Code",
      "content_html": "\u003cp\u003eBefore joining Anthropic in 2024, Jared Sires had never written a line of code. And why would he? He was a startup account executive. \u003c/p\u003e\u003cp\u003eAs is often the case at fast-growing companies, Jared’s book quickly grew to 600 or 700 accounts. With 10 to 15 customer calls a day and an expanding account list, Jared found himself at his desk, answering customer emails until 9 or 10 p.m. every night. \u003c/p\u003e\u003cp\u003e\u0026#34;It was almost impossible to manage my inbox,\u0026#34; he says. \u0026#34;And doing outbound on top of that, you don’t really know where to focus.\u0026#34;\u003c/p\u003e\u003cp\u003eJared turned to Claude Code for help. With no coding experience, he created CLAFTS, short for \u003cem\u003eClaude Drafts\u003c/em\u003e: an application that lives inside Gmail and uses the Claude API to draft replies to customer emails. It took multiple iterations, but eventually Jared estimates CLAFTS was saving him two to three hours a day. He shared it in Slack the next morning, and within 24 hours, others from the sales organization had started using it with similar results. \u003c/p\u003e\u003cp\u003eThat led to a shift in Jared’s role at Anthropic: today, he is product manager of the go-to-market team, a role focused exclusively on identifying problems in how the Anthropic sales organization operates and building Claude-powered solutions to fix them. In this new role, Jared has built tools that automate customer communications, research customer backgrounds before calls, and generate follow-up emails from meeting transcripts. He then packages them as a plugin inside \u003ca href=\"https://claude.com/product/cowork\" target=\"_blank\"\u003eClaude Cowork\u003c/a\u003e so the whole sales team can use them. 
\u003c/p\u003e\u003cp\u003eHe describes the shift in his career as “the most empowering thing I’ve ever experienced.”\u003c/p\u003e\u003cp\u003eHere\u0026#39;s how Jared used Claude to handle his inbox at scale, what he\u0026#39;s building next, and best practices GTM teams can take from his approach.\u003c/p\u003e\u003cdiv class=\"w-embed w-iframe\"\u003e\u003cdiv style=\"position:relative;padding-bottom:56.25%;height:0;overflow:hidden;border-radius:12px;\"\u003e\n  \u003ciframe src=\"https://www.youtube.com/embed/n4ZxEznNaIY\" style=\"position:absolute;top:0;left:0;width:100%;height:100%;border:0;\" title=\"How Anthropic uses Claude in GTM Engineering\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen=\"\"\u003e\u003c/iframe\u003e\n\u003c/div\u003e\u003c/div\u003e\u003ch2\u003e\u003cstrong\u003eA sales rep buried in administrative tasks\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eEmail volume wasn’t the only challenge Jared faced as an account executive. Anthropic ships product changes every 24 to 48 hours, and customer questions tend to land on the most recent details: batch API SLAs, prompt caching discounts, model pricing, SDK behavior. Answering them well meant searching across Slack, Google Docs, internal knowledge bases, and the developer documentation—and doing it again day after day, with a slightly different set of facts.\u003c/p\u003e\u003cp\u003e\u0026#34;Having to relay technical documentation to customers is pretty hard, especially here at Anthropic when your products evolve so quickly,\u0026#34; Jared says.\u003c/p\u003e\u003cp\u003eOne of his first experiments with Claude was narrow and practical. Using Apps Script (Google\u0026#39;s lightweight development platform) and Claude, Jared pulled product usage data from internal systems and had Claude rank his accounts each morning by how fast they were growing. 
\u003c/p\u003e\u003cp\u003e\u0026#34;Each day Claude would give me a brief on who I needed to focus on based on how much they were using,\u0026#34; he says. With 700 accounts in his book, the daily brief helped him determine where to spend his outbound time.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eDeveloping CLAFTS\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eThe harder challenge was Jared’s inbox and the late hours he had to spend responding to emails. \u003c/p\u003e\u003cp\u003eWith Claude Code, he started to develop a system that drafts replies to customer emails in his voice. \u0026#34;Claude Code, having the terminology \u0026#39;code\u0026#39; at the end of it, made me feel a little bit intimidated just to even start,\u0026#34; he says. \u0026#34;But after a certain time frame, I understood the power of it being able to hook up to my computer and answer things about files on it.\u0026#34;\u003c/p\u003e\u003cp\u003eCLAFTS is built on roughly 4,300 lines of code, almost all of it written by Claude Code. It pulls context from a shared Google Drive folder and other third-party tools, references Anthropic\u0026#39;s public documentation through web search, and matches Jared\u0026#39;s writing style. By the time he opens his drafts folder at the end of the day, the responses are waiting for review.\u003c/p\u003e\u003cp\u003eWhenever Anthropic ships a product change, the documentation reflects it and Claude picks up the change on the next draft. \u0026#34;Claude is able to use web search to understand our latest documentation from our website and reference that material when generating emails,\u0026#34; Jared says. \u0026#34;I don\u0026#39;t need to keep all of that in my head.\u0026#34;\u003c/p\u003e\u003cp\u003eOut of the box, Claude’s writing tended to be longer and heavier on hedging phrases, so Jared reworked the system prompt until the drafts matched his own writing style. 
\u0026#34;I\u0026#39;ve probably gone through hundreds of iterations with CLAFTS in the system prompt to generate different pieces of writing for me,\u0026#34; he says.\u003c/p\u003e\u003cp\u003eNext, he developed the CLAFTS Tones feature, which uses pattern matching to mimic his voice across different relationships. Customers, peers, and family threads all read differently, and the drafts adjust to each.\u003c/p\u003e\u003cp\u003eJared tested the feature by writing himself a sequence of increasingly angry emails on his personal account. Claude picked up the tone, then refused to keep going.\u003c/p\u003e\u003cp\u003e\u0026#34;Claude started to mimic that, and at some point I started to have refusals because Claude didn\u0026#39;t want to generate angry emails to customers,\u0026#34; he says. \u0026#34;That was when I knew CLAFTS Tones was working.\u0026#34;\u003c/p\u003e\u003ch3\u003e\u003cstrong\u003eMeasuring the impact of CLAFTS\u003c/strong\u003e\u003c/h3\u003e\u003cp\u003eWhile CLAFTS saves Jared 10-15 hours per week, the shift he cares about most is the accuracy of the work. With Claude pulling current product details from documentation on every draft, the answers customers receive are tied to whatever shipped most recently rather than to whatever Jared happened to remember.\u003c/p\u003e\u003cp\u003e\u0026#34;Before CLAFTS, I felt like I was doing more administrative work than actually spending time with customers,\u0026#34; Jared says. \u0026#34;After CLAFTS, I was actually able to do more of what I wanted to do, which is sales.\u0026#34;\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eScaling Anthropic’s GTM toolkit \u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eBeyond Jared, John Albert, a BDR who regularly worked past midnight emailing customers, helped co-build CLAFTS. The rest of the business development team came on board once they saw their teammate was getting hours back each day. 
From there, they did most of the evangelism themselves.\u003c/p\u003e\u003cp\u003eThe next set of tools Jared built was a set of \u003ca href=\"https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview\"\u003eskills\u003c/a\u003e bookending his calendar: daily brief and daily recap.\u003c/p\u003e\u003cp\u003eEach morning, the daily brief skill reads his calendar, runs a web search on whoever he\u0026#39;s meeting with, and produces talking points before the first call. The skill connects to Google Calendar and CRM data through \u003ca href=\"https://www.anthropic.com/news/model-context-protocol\"\u003eMCP servers\u003c/a\u003e, pulling relevant information about each customer.\u003c/p\u003e\u003cp\u003eAnd at the end of the day, the daily recap skill pulls from Google Docs and meeting notes to draft follow-up emails, similar to CLAFTS.\u003c/p\u003e\u003cp\u003e\u0026#34;You couple those together and you get Claude managing your daily tasks, which essentially becomes an agent,\u0026#34; he says. \u003c/p\u003e\u003cp\u003eJared’s work now pushes further into agent territory as he’s experimenting with the \u003ca href=\"https://code.claude.com/docs/en/agent-sdk/overview\"\u003eAgent SDK\u003c/a\u003e and chaining workflows where the output of one Claude run feeds the input of the next.\u003c/p\u003e\u003cp\u003eTo make sure the tools he builds with Claude Code scale across the wider team he supports, he ships them with Claude Cowork, packaging skills and MCP connectors into a plugin anyone can install in minutes.\u003c/p\u003e\u003cp\u003eWithin months of launching the Sales plugin, roughly 80 percent of Anthropic’s sales org was using it. The remaining 20 percent are largely new hires, which Jared considers the next challenge to tackle, since the skills were built specifically to help people ramp faster.\u003c/p\u003e\u003cp\u003eBefore the plugin, every new hire used to spend weeks figuring out their own workflow. 
Now a new hire can install it on day one and have 20-plus skills already wired into the tools they use: Salesforce, Intercom, Gong, Google Calendar, Gmail, Google Drive, and BigQuery.\u003c/p\u003e\u003cp\u003eTwo skills anchor the sales team plugin:\u003c/p\u003e\u003col role=\"list\"\u003e\u003cli\u003e \u003ccode\u003e/customer-context \u003c/code\u003epulls a 360-degree account view across all those sources in about 90 seconds. \u003c/li\u003e\u003cli\u003e\u003ccode\u003e/pipeline-management\u003c/code\u003e surfaces at-risk deals, forecasting guidance, and progression recommendations.\u003c/li\u003e\u003c/ol\u003e\u003cp\u003eThe package also integrates with Cowork\u0026#39;s scheduling feature, which lets reps queue skills to run automatically.\u003c/p\u003e\u003cp\u003e\u0026#34;Our sales people can get back to having meaningful conversations instead of going to update all these different applications,” he says.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eArchitecting what’s next \u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eJared’s role has changed alongside the tooling. As a GTM Architect, he now sits in design conversations with product engineers and helps shape new tools for Anthropic’s sales team. \u003c/p\u003e\u003cp\u003e\u0026#34;I feel like with the technical barrier dissolving, I\u0026#39;m almost able to design more products and have senior engineers help me implement to the final stretch,\u0026#34; Jared says. \u0026#34;I\u0026#39;m able to augment and do more things.\u0026#34;\u003c/p\u003e\u003cp\u003eFor sellers wondering whether they could build something similar, his advice is to open Claude Code, find one task that\u0026#39;s slowing them down, and ask Claude how to build a solution for it.\u003c/p\u003e\u003cp\u003e\u0026#34;If you told me I was going to be a go-to-market product manager at Anthropic a year ago, I would be pretty surprised,\u0026#34; he says. \u0026#34;I never had the technical chops to be in these conversations. 
With Claude, I\u0026#39;m able to design and build things that don’t just improve my own day-to-day workflows, but also those of my broader team. I have space to work more creatively and strategically, and there’s no turning back.\u0026#34;\u003c/p\u003e\u003cp\u003e\u003cem\u003eGet started with \u003c/em\u003e\u003ca href=\"https://claude.ai/\" target=\"_blank\"\u003e\u003cem\u003eClaude\u003c/em\u003e\u003c/a\u003e\u003cem\u003e today. Stay tuned for more stories in the \u0026#34;How Anthropic uses Claude\u0026#34; series.\u003c/em\u003e\u003c/p\u003e",
      "summary": "How one Anthropic account executive turned GTM product manager used Claude Code to save 10-15 hours per week on email and pre-call research.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Claude Code"
      ]
    },
    {
      "id": "/blog/how-anthropic-enables-self-service-data-analytics-with-claude",
      "url": "https://claude.com/blog/how-anthropic-enables-self-service-data-analytics-with-claude",
      "title": "How Anthropic enables self-service data analytics with Claude",
      "content_html": "\u003cp\u003eAs many data science and data engineering teams can attest, enabling self-service business analytics has traditionally been a slog. \u003c/p\u003e\u003cp\u003eMaking the data model more accessible to less technical coworkers via wide and denormalized tables often leads to overlapping views with inconsistent definitions as the business scales (and does little to bridge the gap for employees with little desire to learn SQL). Alternatively, creating more ringfenced environments for users often misses the long tail of business questions and leads to metric and dashboard bloat as teams silo their work.\u003c/p\u003e\u003cp\u003eThe rise of LLMs provides an additional path for self-service analytics that avoids those challenges. However, pointing Claude at a warehouse and letting the agents execute can create a false sense of precision. \u003c/p\u003e\u003cp\u003eThe initial elation of liberation from ad-hoc requests turns into dread with the realization that this setup separates stakeholders from the underlying infrastructure, documentation, and expertise that previously steered them toward carefully curated datasets. \u003c/p\u003e\u003cp\u003eAt Anthropic, 95% of business analytics queries are automated via Claude, with ~95% accuracy in aggregate. By giving this often rote, repetitive work to Claude, our data science team can focus on more strategic work like causal modeling, forecasting, and machine learning. \u003c/p\u003e\u003cp\u003eAfter meeting with dozens of Anthropic’s top Claude Code users and having seen myriad design patterns for analytics agents, we’ve cultivated some best practices for other data teams working with LLMs. 
In this post, we’ll share these tips and approaches to maximizing Claude’s ability to drive self-serve business insights, including:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003eWhy analytics accuracy is a context and verification problem, not a code generation issue;\u003c/li\u003e\u003cli\u003eThe three failure modes that cause most errors; \u003c/li\u003e\u003cli\u003eThe agentic analytics stack we built to address these errors;\u003c/li\u003e\u003cli\u003eHow we measure effectiveness; and\u003c/li\u003e\u003cli\u003eA basic template for how we create the majority of our skills (see the appendix)\u003c/li\u003e\u003c/ul\u003e\u003ch2\u003e\u003cstrong\u003eData is not software\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eLLMs\u0026#39; generative abilities are a double-edged sword: the mechanisms that enable creative solutions to complex problems can also hallucinate erroneous output. To fully understand the challenges with analytics agents, it’s useful to compare them to coding agents.\u003c/p\u003e\u003cp\u003eCoding is an open-ended solution space that rewards the models\u0026#39; creativity, while documentation and tests provide natural guardrails against hallucination. In contrast, for analytics use cases, there’s often only a single correct answer using a single correct source in which there’s no deterministic way of proving the correctness. \u003c/p\u003e\u003cfigure style=\"max-width:698pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a20480bedac32484c00d6b9_4a7645b6.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eFor self-service agentic business analytics, the complexity mainly lies in the ambiguity of the data. 
The central problem comes down to our \u003cstrong\u003e\u003cem\u003eability to map a user’s question to specific and up-to-date entities in our data model and know the correct way of working with them\u003c/em\u003e\u003c/strong\u003e. If we can do that, then the resulting execution and SQL become trivial.\u003c/p\u003e\u003cp\u003eWe’ve identified three attributes of this problem that account for an overwhelming majority of inaccurate responses:\u003c/p\u003e\u003col role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eConcept \u0026lt;\u0026gt; entity ambiguity\u003c/strong\u003e: with hundreds of viable options in a data model (out of potentially millions of fields), the agent is unable to choose the correct fields that best answer a user’s question. For example, in measuring the number of active users: what actions constitute being “active”? Do you include fraudulent users? What lookback window do you use?\u003c/li\u003e\u003c/ol\u003e\u003col start=\"2\" role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eData staleness\u003c/strong\u003e: data sources, business definitions, and schemas change constantly; assets and agent knowledge go stale and start returning subtly wrong answers.\u003c/li\u003e\u003c/ol\u003e\u003col start=\"3\" role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eRetrieval failure\u003c/strong\u003e: the right information may actually be in the data model and properly annotated, but given the vastness of the search space, the agent simply doesn’t find it.\u003c/li\u003e\u003c/ol\u003e",
      "summary": "Tips and approaches to maximizing Claude’s ability to drive self-serve data insights",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Enterprise AI"
      ]
    },
    {
      "id": "/blog/lessons-from-building-claude-code-how-we-use-skills",
      "url": "https://claude.com/blog/lessons-from-building-claude-code-how-we-use-skills",
      "title": "Lessons from building Claude Code: How we use skills",
      "content_html": "\u003cp\u003eSkills have become one of the most used extension points in Claude Code. They’re flexible, easy to make, and easy to distribute.\u003c/p\u003e\u003cp\u003eBut this flexibility also makes it hard to know what works best. What type of skills are worth making? How do you structure a skill? When do you share them with others?\u003c/p\u003e\u003cp\u003eWe\u0026#39;ve been using skills in Claude Code extensively at Anthropic with hundreds of them in active use. These are the lessons we\u0026#39;ve learned about using skills to accelerate our development.\u003c/p\u003e\u003ch2\u003eWhat are skills?\u003c/h2\u003e\u003cp\u003eSkills are folders of instructions, scripts, and resources that agents can discover and use to do things more accurately and efficiently. This blog post assumes familiarity with skills basics; if you’re new, start with our \u003ca href=\"https://anthropic.skilljar.com/introduction-to-agent-skills\"\u003eIntroduction to agent skills course on Skilljar\u003c/a\u003e.\u003c/p\u003e\u003cp\u003eA common misconception we hear about skills is that they are “just markdown files.” They’re actually folders that can include scripts, assets, data, etc. that the agent can discover, explore and manipulate. \u003c/p\u003e\u003cp\u003eIn Claude Code, skills also have a \u003ca href=\"https://code.claude.com/docs/en/skills#frontmatter-reference\"\u003ewide variety of configuration options\u003c/a\u003e including registering dynamic hooks.\u003c/p\u003e\u003cp\u003eWe’ve found that some of the most effective skills in Claude Code use these configuration options and folder structure effectively. \u003c/p\u003e",
      "summary": "What we learned building and scaling hundreds of skills internally at Anthropic.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Claude Code"
      ]
    },
    {
      "id": "/blog/best-practices-for-getting-started-with-claude-cowork",
      "url": "https://claude.com/blog/best-practices-for-getting-started-with-claude-cowork",
      "title": "Best practices for getting started with Claude Cowork",
      "content_html": "\u003cdiv class=\"w-embed\"\u003e\u003cstyle\u003e\n  /* Fluid breakout for blog embeds + code blocks */\n  .u-rich-text-blog .w-embed,\n  .u-rich-text-blog pre.w-code-block {\n    --max-w: 860px;\n    --gutter: 24px;\n    --available: calc(100vw - (var(--gutter) * 2));\n    --w: min(var(--max-w), var(--available));\n    width: var(--w);\n    max-width: var(--w);\n    margin-left: calc((640px - var(--w)) / 2);\n    margin-right: calc((640px - var(--w)) / 2);\n    box-sizing: border-box;\n  }\n  @media (max-width: 720px) {\n    .u-rich-text-blog .w-embed,\n    .u-rich-text-blog pre.w-code-block {\n      width: 100%;\n      max-width: 100%;\n      margin-left: 0;\n      margin-right: 0;\n    }\n    /* Constrain the post column to the viewport so nothing overflows the page */\n    .blog_post_layout.u-column-custom,\n    .blog_post_content_wrap,\n    .u-rich-text-blog {\n      max-width: 100% !important;\n      box-sizing: border-box;\n    }\n    html,\n    body {\n      overflow-x: hidden;\n    }\n  }\n  /* Embed inner wrappers: scroll horizontally when content overflows */\n  .u-rich-text-blog .w-embed figure {\n    width: 100% !important;\n    max-width: 100% !important;\n    margin: 0 !important;\n  }\n  .u-rich-text-blog .w-embed figure \u003e div {\n    width: 100% !important;\n    max-width: 100% !important;\n    overflow-x: auto !important;\n    -webkit-overflow-scrolling: touch;\n  }\n  /* Tables: ratios on wider screens, natural width + scroll on mobile */\n  .u-rich-text-blog .w-embed table {\n    width: 100% !important;\n    table-layout: fixed !important;\n  }\n  .u-rich-text-blog .w-embed table th:nth-child(1),\n  .u-rich-text-blog .w-embed table td:nth-child(1) {\n    width: 22%;\n  }\n  .u-rich-text-blog .w-embed table th:nth-child(2),\n  .u-rich-text-blog .w-embed table td:nth-child(2) {\n    width: 39%;\n  }\n  .u-rich-text-blog .w-embed table th:nth-child(3),\n  .u-rich-text-blog .w-embed table td:nth-child(3) {\n    width: 
39%;\n  }\n  .u-rich-text-blog .w-embed td code,\n  .u-rich-text-blog .w-embed th code {\n    overflow-wrap: anywhere;\n    word-break: break-word;\n    white-space: normal;\n  }\n  @media (max-width: 639px) {\n    .u-rich-text-blog .w-embed table {\n      width: auto !important;\n      min-width: 640px !important;\n      table-layout: auto !important;\n    }\n    .u-rich-text-blog .w-embed table th,\n    .u-rich-text-blog .w-embed table td {\n      min-width: 0 !important;\n      width: auto !important;\n    }\n  }\n  /* Code blocks */\n  .u-rich-text-blog pre.w-code-block {\n    overflow-x: auto;\n    -webkit-overflow-scrolling: touch;\n  }\n  @media (max-width: 639px) {\n    .u-rich-text-blog pre.w-code-block {\n      font-size: 0.82rem;\n    }\n  }\n\u003c/style\u003e\u003c/div\u003e\u003cp\u003eIn 2024, we had Claude in a chat window. You asked a question and you got an answer, but it was up to you to turn that answer into something useful. In 2025, Claude Code let engineers ship at a pace that made the rest of us a little jealous. \u003c/p\u003e\u003cp\u003eThis year, we can all catch up with \u003ca href=\"https://claude.com/product/cowork\" target=\"_blank\"\u003eClaude Cowork\u003c/a\u003e.\u003c/p\u003e\u003cp\u003eI started using Claude Code last year for long, multi-step tasks that chat wasn’t equipped to handle. Within a week, I went from not knowing what a terminal was to building out \u003ca href=\"https://claude.com/blog/how-anthropic-uses-claude-marketing\" target=\"_blank\"\u003eClaude Code workflows that completed 30-minute tasks in 30 seconds\u003c/a\u003e. I was using Claude Code for non-technical work because Claude Cowork didn’t exist yet.  \u003c/p\u003e\u003cp\u003eNow, 90% of my work happens in \u003ca href=\"https://claude.com/product/cowork\" target=\"_blank\"\u003eClaude Cowork\u003c/a\u003e. 
In this post, I\u0026#39;ll show you how to tell which of your tasks belong there, walk through real examples from my own work, and get you to your first finished deliverable in about ten minutes.\u003c/p\u003e\u003ch2\u003eUsing Chat vs Claude Cowork vs Claude Code\u003c/h2\u003e\u003cp\u003eIf your job is non-technical knowledge work (emails, decks, spreadsheets, docs, meetings, and \u0026#34;can you pull together a summary of this\u0026#34;), then Claude Cowork is for you. You don\u0026#39;t need to know how to code. You don\u0026#39;t need to know what an \u0026#34;agent\u0026#34; is or how to build one. \u003c/p\u003e\u003cp\u003eIf you\u0026#39;ve spent the last two years with an AI chat tab open among 100 other tabs and files, copy-pasting prompts or questions into it and copy-pasting the answers back out, you already know how to use Claude Cowork. It\u0026#39;s that, minus the copy-pasting.\u003c/p\u003e\u003cp\u003eThe same Claude models power chat, Claude Cowork, Claude Code, Claude Design, and every other place Claude appears. These are separate workspaces that you use for different types of work, but the same models run inside all of them. Here’s a framework for deciding when to use each one: \u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eChat\u003c/strong\u003e is often how knowledge workers get introduced to Claude. You bring what you have to Claude: upload a file, paste some text, describe what\u0026#39;s going on, and get an answer. Chat is for answers, brainstorming, and thinking out loud.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eClaude Cowork\u003c/strong\u003e in the Claude desktop app flips that around. Instead of bringing your work to Claude, you bring Claude to your work. You point it at a folder on your computer, connect it to the apps you already use, and tell it what you want done. 
With Claude Cowork, you describe an outcome, step away, and come back to finished work.\u003cstrong\u003e‍\u003c/strong\u003e\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eClaude Code\u003c/strong\u003e is made for developers building and shipping software. If your work lives in code, start there.\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eMany people don\u0026#39;t know that Claude Cowork and Claude Code run on the same \u003ca href=\"https://code.claude.com/docs/en/how-claude-code-works\"\u003eengine\u003c/a\u003e under the hood. \u003c/p\u003e\u003ch3\u003eWhen should you use Claude Cowork?\u003c/h3\u003e\u003cp\u003eUnderstanding when to use Claude Cowork vs chat is the spot where most people get stuck, so here\u0026#39;s my rule of thumb:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eUse chat \u003c/strong\u003eif what you want fits in a few exchanges, like a question, an explanation, a brainstorm, or a gut check. \u003c/li\u003e\u003cli\u003e\u003cstrong\u003eUse Claude Cowork\u003c/strong\u003e if what you need is a deliverable, for example, a file someone will open, a deck someone will present, or a spreadsheet to be sorted. Use it for anything that’s multi-step or touches more than one file/file type or more than one app, or that you\u0026#39;d describe as a task rather than a question. 
With Claude Cowork, you are \u003cem\u003edelegating\u003c/em\u003e work to Claude.\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eA few examples of where the line falls:\u003c/p\u003e\u003cdiv class=\"w-embed\"\u003e\u003cfigure\u003e\n  \u003cdiv role=\"region\" tabindex=\"0\"\u003e\n    \u003ctable\u003e\n      \u003ccolgroup\u003e\n        \u003ccol style=\"width: 70%\"/\u003e\n        \u003ccol style=\"width: 30%\"/\u003e\n      \u003c/colgroup\u003e\n      \u003cthead\u003e\n        \u003ctr\u003e\n          \u003cth scope=\"col\"\u003eSample question or task\u003c/th\u003e\n          \u003cth scope=\"col\"\u003eUse\u003c/th\u003e\n        \u003c/tr\u003e\n      \u003c/thead\u003e\n      \u003ctbody\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eWhat should I cover in our business review meeting?\u003c/td\u003e\n          \u003ctd\u003eChat\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eRead the last three months of meeting notes in this Google Drive folder and build me a QBR deck using our template.\u003c/td\u003e\n          \u003ctd\u003eClaude Cowork\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eHow do I VLOOKUP something?\u003c/td\u003e\n          \u003ctd\u003eChat\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eGo through my spreadsheets and change all the VLOOKUP to INDEX MATCH.\u003c/td\u003e\n          \u003ctd\u003eClaude Cowork\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eSuggest a better title tag and meta description for this page.\u003c/td\u003e\n          \u003ctd\u003eChat\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eUse the new title tags and meta descriptions for these 30 pages from this sheet and update them using the CMS connector.\u003c/td\u003e\n          \u003ctd\u003eClaude Cowork\u003c/td\u003e\n        \u003c/tr\u003e\n  
    \u003c/tbody\u003e\n    \u003c/table\u003e\n  \u003c/div\u003e\n\u003c/figure\u003e\u003c/div\u003e\u003cp\u003eThe most common mistake is reaching for chat for everything and never feeling the difference Claude Cowork can make. The opposite mistake is handing Claude Cowork one-off questions, then waiting around for something chat would\u0026#39;ve answered in five seconds.\u003c/p\u003e\u003ch3\u003eThe five ingredients of a Claude Cowork-shaped task\u003c/h3\u003e\u003cp\u003eIf you’re not sure what projects to delegate to Claude Cowork when you’re first getting started, run them through this checklist. You don\u0026#39;t need all five criteria, but a good candidate hits a few:\u003c/p\u003e\u003col role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eMore than one thing goes in.\u003c/strong\u003e Multiple files, a whole folder, or a file plus some connectors. If there\u0026#39;s only one input, chat probably handles it fine (you should still experiment).\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eA file comes out.\u003c/strong\u003e You need a deliverable that you can attach, present, share, or repurpose: a doc, a deck, a spreadsheet, or a CSV. \u003c/li\u003e\u003cli\u003e\u003cstrong\u003eYou\u0026#39;ll do it again.\u003c/strong\u003e One-offs are fine, but recurring tasks are the sweet spot. You can schedule them to run before you\u0026#39;re even at your desk.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eYou already know what good looks like.\u003c/strong\u003e You\u0026#39;re familiar with the shape of the output, so you can tell in 15 seconds whether the output is right, wrong, or 70% there.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eThe middle is the boring part.\u003c/strong\u003e The thinking lives at the start (deciding what you want) and the end (deciding if it\u0026#39;s right). 
Everything in between (extract, compile, reconcile, and reformat) is what you hand off.\u003c/li\u003e\u003c/ol\u003e\u003ch2\u003eHow I use Claude Cowork at Anthropic\u003c/h2\u003e\u003cp\u003eI manage growth marketing at Anthropic, so my examples are marketing-flavored. Don\u0026#39;t read these looking for a workflow to copy—that\u0026#39;s not going to be helpful in the long run. Watch how each one hits a few items from the checklist above, because that\u0026#39;s the pattern you\u0026#39;ll be looking for in your own Claude Cowork workflows.\u003c/p\u003e\u003ch3\u003eDaily briefing\u003c/h3\u003e\u003cp\u003eThe number of Slack channels and emails a marketer receives every day can be  overwhelming. I have a \u0026#34;daily briefing\u0026#34; task that runs every morning at 6am. Claude Cowork is connected to my Slack and Gmail, and my prompt tells it to review my unread emails and the channels I care about, sort them into buckets, and produce a short report.\u003c/p\u003e\u003cfigure style=\"max-width:1815pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1e1e027cf5a76278798b40_CleanShot%202026-05-19%20at%2014.56.25.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eThe report gives me a TLDR of what to look into, flagged emails grouped by type, channel summaries, and any overnight product-related incidents that could have impacted marketing. Anyone drowning in Slack and email can run some version of this workflow.\u003c/p\u003e\u003ch3\u003eBudget pacing\u003c/h3\u003e\u003cp\u003ePart of my job includes budget pacing for performance marketing. It\u0026#39;s the kind of work nobody wants because it\u0026#39;s boring and tedious. Many performance marketing teams track daily spend and run rate in Google Sheets to estimate pacing to goal. 
Either you\u0026#39;re manually exporting daily spend from each channel and pasting it into the sheet, or you\u0026#39;re paying for a third-party tool to extract, transform, and load data for you. \u003c/p\u003e\u003cfigure style=\"max-width:4272pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1e1e995df01f6fda548ac6_CleanShot%202026-05-19%20at%2013.33.32%402x.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eWith Claude Cowork, I connect to Google Ads and Meta Ads and create a live artifact (basically an HTML dashboard) in the desktop app that automatically pulls in my daily spend and calculates pacing for me. I can also just tell Claude in plain English how to filter my campaigns and what to look out for.\u003c/p\u003e\u003cp\u003eRun that against the checklist above: multiple sources in (every channel\u0026#39;s spend), a file out (in this case it\u0026#39;s the dashboard), I rerun it constantly, and the middle is the mindless soul-sucking download-copy-paste grind I absolutely do not want to do myself. Since the ad platforms are integrated through my connectors, I can update this dashboard at any time.\u003c/p\u003e\u003ch3\u003eReporting\u003c/h3\u003e\u003cp\u003eInstead of exporting a pile of CSVs and building pivot tables or combining files manually, I have Claude Cowork connected to Google Search Console. 
It pulls what I care about (queries, countries, pages) and reconciles it into a single sheet, instead of Google\u0026#39;s default of one CSV per dimension when you export data manually.\u003c/p\u003e\u003cfigure style=\"max-width:3630pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1e1ed16cb3dd91de3b5612_CleanShot%202026-05-19%20at%2013.46.03%402x.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eI also give Claude the context on what to focus on, like looking at the last seven days vs the prior seven, filtering to only specific countries, flagging anything that moved meaningfully, and writing up the report in the template that I want. From there, I can tweak anything or ask Claude follow-up questions.\u003c/p\u003e\u003cfigure style=\"max-width:3630pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1e1f056c0d01abd9a59479_CleanShot%202026-05-19%20at%2013.46.22%402x.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eWith scheduling in Claude Cowork, this runs automatically every week. Reporting used to take me ~30 minutes a week; now it takes five, and I spend them on the part that needs my judgement: filling in missing context and workshopping the callouts.\u003c/p\u003e\u003cp\u003eThese are just some examples of how I use Claude Cowork, but they barely scratch the surface. 
For more best practices, check out another article I wrote with a \u003ca href=\"https://www.linkedin.com/feed/update/urn:li:activity:7448056387772833795/\" target=\"_blank\"\u003emore detailed walkthrough\u003c/a\u003e of a complex use case that spans plugins, skills, local MCPs, and \u003ca href=\"https://support.claude.com/en/articles/13947068-assign-tasks-from-anywhere-in-claude-cowork\" target=\"_blank\"\u003eDispatch\u003c/a\u003e.\u003c/p\u003e\u003ch2\u003eYour first 10 minutes with Claude Cowork\u003c/h2\u003e\u003cp\u003eFirst time opening the app? Here’s how to get started: \u003c/p\u003e\u003col role=\"list\"\u003e\u003cli\u003eOpen the Claude desktop app and switch to the Claude Cowork tab.\u003c/li\u003e\u003cli\u003eGive Claude something to work with. Drop in a few files, point it at a folder on your computer, or connect an app you frequently use (Slack, Gmail, Notion, CRM, etc.). The difference between a mediocre Claude Cowork output and a great one is almost never your prompt; it\u0026#39;s whether you\u0026#39;re giving Claude enough rich context to work with.\u003c/li\u003e\u003cli\u003eTell Claude the outcome you want. Describe the deliverable you want at the end and provide any necessary context.\u003c/li\u003e\u003cli\u003eStart with a real task you know well. You already know what \u0026#34;good\u0026#34; looks like for it, so you\u0026#39;ll see immediately where Claude is strong and where it needs context from you.\u003c/li\u003e\u003cli\u003eMake Claude ask you questions before it starts. This is the single most useful habit I’ve built. 
Include this as part of your prompt: \u003cem\u003eBefore we begin, repeat my ask back to me so we\u0026#39;re aligned, then ask me as many clarifying questions as you have.\u003c/em\u003e\u003c/li\u003e\u003c/ol\u003e\u003cp\u003eThis surfaces things you didn\u0026#39;t think to specify, like which time period are we looking at, what does \u0026#34;good\u0026#34; mean here, or what edge cases do you know that Claude doesn\u0026#39;t. The trap is assuming Claude already knows what\u0026#39;s obvious to you. Answering five questions up front costs you 30 seconds. Finding those same gaps afterwards costs you time and tokens, and it\u0026#39;s a pain to fix.\u003c/p\u003e\u003cp\u003eStill not sure what to hand off? Ask Claude. Claude has memory and can search your past conversations, so you can ask it which tasks you do most often and which ones to try in Claude Cowork.\u003c/p\u003e\u003cfigure style=\"max-width:1815pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1e1f566c0d01abd9a5a6a9_CleanShot%202026-05-19%20at%2015.05.52.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003ch3\u003eWhen I still reach for chat\u003c/h3\u003e\u003cp\u003eI still use chat extensively to talk through a positioning problem, pressure-test an idea before I commit to it, or to ask random questions like why my dog keeps licking the bed.\u003c/p\u003e\u003cp\u003eThe point isn\u0026#39;t that chat is the \u0026#34;old\u0026#34; thing. Chat is for when the output is a thought in your head, and Claude Cowork is for when the output is something you’ll hand to someone else.\u003c/p\u003e\u003ch2\u003eGo build something\u003c/h2\u003e\u003cp\u003ePick one repetitive task you do every week, try using\u003ca href=\"https://claude.com/product/cowork\" target=\"_blank\"\u003e Claude Cowork\u003c/a\u003e for it, and see what comes back. 
The first few tasks might feel a little awkward, but after a few tries you\u0026#39;ll quickly go from \u0026#34;how do I use this\u0026#34; to \u0026#34;what do I hand it next.\u0026#34;\u003c/p\u003e\u003cp\u003e\u003cem\u003eThis article was written by Austin Lau, on the growth team at Anthropic, and expresses his opinions, usage patterns, and advice on Claude Cowork.\u003c/em\u003e\u003c/p\u003e",
      "summary": "When to use Claude Cowork instead of Claude Code or chat, how to decide what workflows to delegate, and concrete steps to get started.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Enterprise AI"
      ]
    },
    {
      "id": "/blog/running-an-ai-native-engineering-org",
      "url": "https://claude.com/blog/running-an-ai-native-engineering-org",
      "title": "Running an AI-native engineering org",
      "content_html": "\u003cdiv class=\"w-embed\"\u003e\u003cstyle\u003e\n  /* Fluid breakout for blog embeds + code blocks */\n  .u-rich-text-blog .w-embed,\n  .u-rich-text-blog pre.w-code-block {\n    --max-w: 860px;\n    --gutter: 24px;\n    --available: calc(100vw - (var(--gutter) * 2));\n    --w: min(var(--max-w), var(--available));\n    width: var(--w);\n    max-width: var(--w);\n    margin-left: calc((640px - var(--w)) / 2);\n    margin-right: calc((640px - var(--w)) / 2);\n    box-sizing: border-box;\n  }\n  @media (max-width: 720px) {\n    .u-rich-text-blog .w-embed,\n    .u-rich-text-blog pre.w-code-block {\n      width: 100%;\n      max-width: 100%;\n      margin-left: 0;\n      margin-right: 0;\n    }\n    /* Constrain the post column to the viewport so nothing overflows the page */\n    .blog_post_layout.u-column-custom,\n    .blog_post_content_wrap,\n    .u-rich-text-blog {\n      max-width: 100% !important;\n      box-sizing: border-box;\n    }\n    html,\n    body {\n      overflow-x: hidden;\n    }\n  }\n  /* Embed inner wrappers: scroll horizontally when content overflows */\n  .u-rich-text-blog .w-embed figure {\n    width: 100% !important;\n    max-width: 100% !important;\n    margin: 0 !important;\n  }\n  .u-rich-text-blog .w-embed figure \u003e div {\n    width: 100% !important;\n    max-width: 100% !important;\n    overflow-x: auto !important;\n    -webkit-overflow-scrolling: touch;\n  }\n  /* Tables: ratios on wider screens, natural width + scroll on mobile */\n  .u-rich-text-blog .w-embed table {\n    width: 100% !important;\n    table-layout: fixed !important;\n  }\n  .u-rich-text-blog .w-embed table th:nth-child(1),\n  .u-rich-text-blog .w-embed table td:nth-child(1) {\n    width: 22%;\n  }\n  .u-rich-text-blog .w-embed table th:nth-child(2),\n  .u-rich-text-blog .w-embed table td:nth-child(2) {\n    width: 39%;\n  }\n  .u-rich-text-blog .w-embed table th:nth-child(3),\n  .u-rich-text-blog .w-embed table td:nth-child(3) {\n    width: 
39%;\n  }\n  .u-rich-text-blog .w-embed td code,\n  .u-rich-text-blog .w-embed th code {\n    overflow-wrap: anywhere;\n    word-break: break-word;\n    white-space: normal;\n  }\n  @media (max-width: 639px) {\n    .u-rich-text-blog .w-embed table {\n      width: auto !important;\n      min-width: 640px !important;\n      table-layout: auto !important;\n    }\n    .u-rich-text-blog .w-embed table th,\n    .u-rich-text-blog .w-embed table td {\n      min-width: 0 !important;\n      width: auto !important;\n    }\n  }\n  /* Code blocks */\n  .u-rich-text-blog pre.w-code-block {\n    overflow-x: auto;\n    -webkit-overflow-scrolling: touch;\n  }\n  @media (max-width: 639px) {\n    .u-rich-text-blog pre.w-code-block {\n      font-size: 0.82rem;\n    }\n  }\n\u003c/style\u003e\u003c/div\u003e\u003cfigure style=\"padding-bottom:56.206088992974244%\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-video\"\u003e\u003cdiv\u003e\u003ciframe src=\"https://www.youtube.com/embed/igO8iyca2_g?start=433\" title=\"Running an AI-native engineering org\" scrolling=\"no\" frameborder=\"0\" allowfullscreen=\"true\"\u003e\u003c/iframe\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eFor years, engineering bandwidth was the expensive part of building applications. Every process we used to have around software planning and shipping, first waterfall and then agile, was built around that cost. \u003c/p\u003e\u003cp\u003eI started my career in the early 2000s working on Visual Studio. In those days, we shipped software on CD-ROMs with hard manufacturing deadlines. Once we could distribute software online, we began shipping updates continuously. Now we’re changing the way we work again, this time around the time and people it takes to write software. \u003c/p\u003e\u003cp\u003eOn the Claude Code team, writing code, writing tests, and refactoring rarely slow us down anymore. 
But bottlenecks didn’t disappear when agentic coding removed the need to type code by hand. Verification, code review, and security took their place.\u003c/p\u003e\u003cp\u003eWe can all generate a lot of code really fast now, but this also brings up new questions: Is this code correct? How is it maintained? And one of the top questions I get from fellow engineering leaders: “How are humans keeping up with how you’re doing code reviews?”\u003c/p\u003e\u003ch2\u003eThe processes that quietly stopped working\u003c/h2\u003e\u003cp\u003eWe all put processes in place for a reason: to close a gap or make something work better. But when that gap no longer exists and those processes become obsolete, they rarely go away on their own. When the Claude Code team began using agentic coding as our default way of working, a lot of our existing processes stopped working. Here are the norms we rewrote, and why. \u003c/p\u003e\u003ch3\u003ePlanning: shift roadmaps to just in time\u003c/h3\u003e\u003cp\u003eThe old norm was to spend a lot more time pre-planning because coding time was expensive. When I first joined the Claude Code team, we wrote a pretty good six-month roadmap, and then \u003cem\u003ebecause\u003c/em\u003e of Claude Code, so many things changed that it was out of date by month three. \u003c/p\u003e\u003cp\u003eEngineering speed and throughput are different now, so the way we plan sprints has changed. I call it just-in-time (JIT) planning, almost like JIT compiling: how do you do just the right amount at the right time? Our planning ritual shifted away from design docs toward discussions in PRs or prototypes. The space moves fast, so we don’t do a lot of product reviews. 
Our process now is to prototype, get a lot of internal users on it, and start acting on their feedback.\u003c/p\u003e\u003ch3\u003eContext gathering: ask Claude, not the author\u003c/h3\u003e\u003cp\u003eWhen engineers wrote code, the first step to getting an answer to most questions was to find the person who wrote the code. Now, since all our PRs are assisted by Claude, \u0026#34;Who made this change?\u0026#34; is no longer sufficient. Our new norm is to go a level deeper: what do you actually need to know? For instance: Are you looking for who caused a regression? An expert to answer a customer question? Or context on a decision? Then ask Claude that question and consider whether it can answer directly, often with more data and context.\u003c/p\u003e\u003cp\u003eOn the Claude Code team, no matter what that question is, our process is to also ask “Is there a way to automate it?” For example, summarizing customer feedback channels every morning went from a ritual I did manually with my coffee to something Claude runs automatically in the background.\u003c/p\u003e\u003ch3\u003eCode review: trust but verify\u003c/h3\u003e\u003cp\u003eWe use \u003ca href=\"https://code.claude.com/docs/en/code-review\" target=\"_blank\"\u003eCode Review\u003c/a\u003e heavily. Claude handles all the style and linting and the PR feedback requests, catches bugs and fixes them before a full commit, and adds tests. Where we still definitely want a human is in areas that call for expertise. \u003c/p\u003e\u003cp\u003eThe new norm is human review where it matters: for legal review, I always want my legal partner involved in risk tolerance. For trust boundaries and security-sensitive code, I want the domain experts. Product managers and designers also need to be involved for product sense and taste. \u003c/p\u003e\u003cp\u003eIt’s important to continually evaluate, though, because the right balance of trust vs. verify will keep changing as the models improve. 
What you need humans for today might look different with the next model.\u003c/p\u003e\u003ch3\u003eTeam makeup: blurring roles\u003c/h3\u003e\u003cp\u003eClaude and AI have reshaped roles across the team. Our PMs code a lot now, which is fun to see. With Claude, nontraditional coders can now do more engineering, and engineers take on things like content and design, work that was traditionally not on the technical side. \u003c/p\u003e\u003cp\u003eOn the Claude Code engineering team, I’ve indexed heavily on two profiles. One is creative builders with product sense: the dreamers who are deeply curious and passionate about shipping products that solve problems. The other one is engineers with deep systems expertise. For example, when I joined the team, I noticed we were missing experts with systems backgrounds, and we needed them when building \u003ca href=\"https://www.anthropic.com/news/claude-code-on-the-web\" target=\"_blank\"\u003eClaude Code on the Web\u003c/a\u003e to ensure we could run Claude everywhere. \u003c/p\u003e\u003cp\u003eWhat I index on less, on the other hand, is raw throughput; the models handle that. 
The more important question is where you still need human expertise, and that’s where I’d focus.\u003c/p\u003e\u003cdiv class=\"w-embed\"\u003e\u003cfigure\u003e\n  \u003cdiv role=\"region\" tabindex=\"0\"\u003e\n    \u003ctable\u003e\n      \u003cthead\u003e\n        \u003ctr\u003e\n          \u003cth scope=\"col\"\u003e\u003c/th\u003e\n          \u003cth scope=\"col\"\u003eBefore\u003c/th\u003e\n          \u003cth scope=\"col\"\u003eAfter\u003c/th\u003e\n        \u003c/tr\u003e\n      \u003c/thead\u003e\n      \u003ctbody\u003e\n        \u003ctr\u003e\n          \u003cth scope=\"row\"\u003ePlanning\u003c/th\u003e\n          \u003ctd\u003eSix-month product roadmaps.\u003c/td\u003e\n          \u003ctd\u003eJust-in-time (JIT) planning: prototype, put internal users on it, and act on their feedback.\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003cth scope=\"row\"\u003eContext gathering\u003c/th\u003e\n          \u003ctd\u003eFind the person who wrote the code and ask them.\u003c/td\u003e\n          \u003ctd\u003eAsk Claude first. Then ask whether it can be automated.\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003cth scope=\"row\"\u003eCode review\u003c/th\u003e\n          \u003ctd\u003eHumans review everything.\u003c/td\u003e\n          \u003ctd\u003eClaude handles style, bugs, and tests. Humans review where domain expertise is important.\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003cth scope=\"row\"\u003eTeam makeup\u003c/th\u003e\n          \u003ctd\u003eFixed roles: engineers write code, PMs plan, designers design.\u003c/td\u003e\n          \u003ctd\u003eRoles blur: PMs prototype, engineers take on design and content. 
Hire for creative builders and deep systems expertise.\u003c/td\u003e\n        \u003c/tr\u003e\n      \u003c/tbody\u003e\n    \u003c/table\u003e\n  \u003c/div\u003e\n\u003c/figure\u003e\u003c/div\u003e\u003ch2\u003eHow we rolled out our new norms\u003c/h2\u003e\u003cp\u003eAs these norms changed, some aspects were mandated as team principles, and others we let small sub-teams (pods) figure out on their own. There is a set of core Claude Code team principles that are non-negotiable “must dos”:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eRelentlessly dogfood your product:\u003c/strong\u003e Every Claude Code team member, including cross-functional partners, uses Claude Code (and also Claude Cowork). We’re always thinking of ways to get Claude to help us do our work faster and more efficiently. \u003c/li\u003e\u003cli\u003e\u003cstrong\u003eKeep the team as flat as possible:\u003c/strong\u003e When I joined Claude Code, I wanted every manager to start out as an IC first, learn how to be an effective engineer on the team by shipping, and really live through and understand what it’s like to be an engineer at Anthropic. We have one overall team mission on Claude Code and Claude Cowork. Managers support pods of work while keeping the team agile so people can move to where the work is.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eDon’t hesitate to kill processes that no longer work:\u003c/strong\u003e Finally, we relentlessly question why we do things the way we do. When something doesn’t make sense anymore, team members have explicit permission to question and kill old processes. \u003c/li\u003e\u003c/ul\u003e\u003cp\u003eWithin these few rules, though, each pod has a lot of agency. They have room to adapt how they use Claude to do triage, how they run any planning rituals or standups, and which workflows get “Claudified” first. 
\u003c/p\u003e\u003ch2\u003eHow to know your new processes are sticking\u003c/h2\u003e\u003cp\u003eHere are three numbers every engineering leader should start tracking now as they roll out changes.\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eOnboarding ramp time goes down:\u003c/strong\u003e How soon can an engineer, a designer, or a PM start being effective? On our team, ramp-up is much faster than a year ago, and engineers now ship real code within their first week.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003ePR cycle time goes down:\u003c/strong\u003e This one\u0026#39;s interesting to dig into because it might help you identify where your pipeline is struggling to scale. Because we’re generating so much more code, build systems and continuous integration (CI) can struggle to keep up.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eClaude-assisted commits go up:\u003c/strong\u003e For us, by default, every commit is Claude-assisted. I don\u0026#39;t think I\u0026#39;ve seen a non-Claude-assisted commit in the last four months.\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eOn the third bullet, don\u0026#39;t confuse throughput with success. Throughput is one metric, but the real metric is progress on the problem you\u0026#39;re trying to solve. With the right alignment, throughput can help you solve problems faster.\u003c/p\u003e\u003ch2\u003eGetting started\u003c/h2\u003e\u003cp\u003eIf I were to leave you with one thing: \u003cstrong\u003epick your noisiest workflow.\u003c/strong\u003e That could be your most expensive workflow, the one you dread, or the one your team doesn\u0026#39;t look forward to. And ask: is it still serving its purpose? If so, can you automate it? \u003c/p\u003e\u003cp\u003eI was once on a team that had an expensive weekly review, with a large number of people in a meeting room. I noticed everybody was on their laptops except when it was their time to give a status report. 
They would pop their head up, say the status, and go back down to their laptops. I asked one simple question: “Why are we having this meeting again? It seems like an expensive use of our time.” And just that one question made everyone realize it wasn’t needed. So we canceled it.\u003c/p\u003e\u003cp\u003eSo, ask yourself: what\u0026#39;s one piece of your engineering workflow that you might consider automating or even dropping altogether? \u003c/p\u003e\u003cp\u003e‍\u003c/p\u003e",
      "summary": "How the Claude Code engineering team’s processes and structure changed once agentic coding became the default way of working.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Claude Code"
      ]
    },
    {
      "id": "/blog/a-harness-for-every-task-dynamic-workflows-in-claude-code",
      "url": "https://claude.com/blog/a-harness-for-every-task-dynamic-workflows-in-claude-code",
      "title": "A harness for every task: dynamic workflows in Claude Code",
      "content_html": "\u003cp\u003eLast week, we released \u003ca href=\"https://code.claude.com/docs/en/workflows\"\u003edynamic workflows\u003c/a\u003e in Claude Code. Claude can now write its own  \u003ca href=\"https://code.claude.com/docs/en/glossary#agentic-harness\"\u003eharness\u003c/a\u003e on the fly, custom-built for the task at hand.\u003c/p\u003e\u003cp\u003eWhile the default Claude Code harness is built for coding, it is also useful for many other types of tasks because, as it turns out, many tasks resemble coding tasks. But there are certain classes of tasks where we have had to build custom harnesses on top of Claude Code to achieve peak performance such as \u003ca href=\"https://support.claude.com/en/articles/11088861-using-research-on-claude\"\u003eResearch\u003c/a\u003e, \u003ca href=\"https://support.claude.com/en/articles/11932705-automated-security-reviews-in-claude-code\"\u003esecurity analysis\u003c/a\u003e, \u003ca href=\"https://code.claude.com/docs/en/agent-teams\"\u003eagent teams\u003c/a\u003e, or \u003ca href=\"https://code.claude.com/docs/en/code-review\"\u003eCode Review\u003c/a\u003e.\u003c/p\u003e\u003cp\u003eWorkflows allow you to dynamically create harnesses built on top of Claude Code that enable Claude to solve all of those problems more natively. You can also share and reuse these workflows with others.\u003c/p\u003e\u003cp\u003eIn this article, I’ll cover my initial workflows experiences and learnings so you can best take full advantage. Keep in mind, best practices are still developing: dynamic workflows often use more tokens and are best suited for complex, high value tasks.\u003c/p\u003e\u003ch2\u003eExample prompts\u003c/h2\u003e\u003cp\u003eBefore diving into the technical details, I’d like to start with several example prompts to get you thinking about the possibilities with workflows:\u003c/p\u003e\u003cp\u003e\u0026#34;This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it. 
Form competing theories about the race, and don\u0026#39;t stop until one theory survives the evidence.\u0026#34;\u003c/p\u003e\u003cp\u003e\u0026#34;Using a workflow, go through my last 50 sessions, mine them for corrections I keep making, and turn the recurring ones into \u003ccode\u003eCLAUDE.md\u003c/code\u003e rules.\u0026#34;\u003c/p\u003e\u003cp\u003e\u0026#34;Use a workflow to dig through #incidents in Slack for the past six months and find recurring root causes where nobody has filed a ticket.\u0026#34;\u003c/p\u003e\u003cp\u003e\u0026#34;Take my business plan and run a workflow where different agents tear it apart from an investor\u0026#39;s, a customer\u0026#39;s, and a competitor\u0026#39;s perspective.\u0026#34;\u003c/p\u003e\u003cp\u003e\u0026#34;Here\u0026#39;s a folder of 80 resumes. Use a workflow to rank them for the backend role and double-check the top ten. Interview me using the AskUserQuestion tool for a rubric.\u0026#34;\u003c/p\u003e\u003cp\u003e\u0026#34;I need a name for this CLI tool. 
Use a workflow to brainstorm a bunch of options and run a tournament to pick the top 3.\u0026#34;\u003c/p\u003e\u003cp\u003e\u0026#34;Use a workflow to rename our User model to Account everywhere.\u0026#34;\u003c/p\u003e\u003cp\u003e\u0026#34;Go through my blog post draft and verify every technical claim against the codebase using a workflow. I don\u0026#39;t want to ship anything wrong.\u0026#34;\u003c/p\u003e\u003ch2\u003eHow dynamic workflows work\u003c/h2\u003e\u003cp\u003eDynamic workflows execute a JavaScript file with a few special functions that help spawn and coordinate \u003ca href=\"https://code.claude.com/docs/en/sub-agents\"\u003esubagents\u003c/a\u003e:\u003c/p\u003e\u003cfigure style=\"max-width:1760pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f1684f559cc83ff4b465b_image1.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eDynamic workflows also have access to standard JavaScript built-in objects like JSON, Math, and Array to help process data.\u003c/p\u003e\u003cp\u003eIt’s particularly useful to know that dynamic workflows can decide which model an agent uses and whether subagents run in their own worktree, allowing Claude to choose the intelligence level and isolation needed.\u003c/p\u003e\u003cp\u003eIf a workflow is interrupted, for example by user action or quitting the terminal, resuming the session will allow the workflow to pick up where it left off.\u003c/p\u003e\u003ch2\u003eWhy dynamic workflows\u003c/h2\u003e\u003cp\u003eWhen you ask the default Claude Code harness to do a task, it needs to both plan and execute in the same context window. 
For many coding tasks, this is highly effective, but it can break down over long-running, massively parallel, highly structured and/or adversarial tasks.\u003c/p\u003e\u003cp\u003eThis is because the longer Claude works on a complex task in a single context window, the more it becomes susceptible to a few specific failure modes:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eAgentic laziness\u003c/strong\u003e refers to when Claude stops before finishing a particularly complex, multi-part task and declares the job done after partial progress, for example addressing 35 of the 50 items in a security review.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eSelf-preferential bias \u003c/strong\u003erefers to Claude’s tendency to prefer its own results or findings, especially when asked to verify or judge them against a rubric. \u003c/li\u003e\u003cli\u003e\u003cstrong\u003eGoal drift \u003c/strong\u003erefers to the gradual loss of fidelity to the original objective across many turns, especially after compaction. Each summarization step is lossy, and details like edge-case requirements or \u0026#34;don\u0026#39;t do X\u0026#34; constraints can get lost.\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eCreating a workflow helps combat these by orchestrating separate Claude subagents with their own context windows and focused, isolated goals.\u003c/p\u003e\u003ch2\u003eDynamic vs static workflows\u003c/h2\u003e\u003cp\u003eYou may have previously created a static workflow using the Claude Agent SDK or \u003ccode\u003eclaude -p\u003c/code\u003e to coordinate multiple instances of Claude Code together. \u003c/p\u003e\u003cp\u003eBut because static workflows need to work for all edge cases, they are usually more generic. 
With \u003ca href=\"https://www.anthropic.com/news/claude-opus-4-8\" target=\"_blank\"\u003eClaude Opus 4.8\u003c/a\u003e and dynamic workflows, Claude is now intelligent enough to write a custom harness tailor-made for your use case.\u003c/p\u003e\u003cfigure style=\"max-width:1999pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f3a0e17e2844bed86f22a_image9.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003ch2\u003eHelpful patterns when using dynamic workflows\u003c/h2\u003e\u003cp\u003eYou can start using dynamic workflows just by asking Claude to make one, or by using the trigger word “\u003ccode\u003eultracode\u003c/code\u003e” to ensure that Claude Code creates a workflow. \u003c/p\u003e\u003cp\u003eBut building a mental model for how dynamic workflows work will help you understand when to use them and how you might nudge Claude via prompts.\u003c/p\u003e\u003cp\u003eThere are a few common patterns that Claude might use and compose together when building workflows:\u003c/p\u003e\u003cfigure style=\"max-width:1999pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f16d86247e586b929a407_image10.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003ch3\u003eClassify-and-act\u003c/h3\u003e\u003cp\u003eUse a classifier agent to decide on the type of task, and then route to different agents or behavior based on the task. Or, use a classifier at the end to determine output.\u003c/p\u003e\u003ch3\u003eFan-out-and-synthesize\u003c/h3\u003e\u003cp\u003eSplit up a task into many smaller steps, run an agent on each step and then synthesize those results. 
This is particularly useful when there are a large number of smaller steps, or when each step benefits from its own clean context window so the steps don\u0026#39;t interfere with or cross-contaminate each other. The synthesize step is a barrier—it waits for all the fan-out agents, then merges their structured outputs into one result.\u003c/p\u003e\u003ch3\u003eAdversarial verification\u003c/h3\u003e\u003cp\u003eFor each spawned agent, run a separate spawned agent to adversarially verify its output against a rubric or criteria.\u003c/p\u003e\u003ch3\u003eGenerate-and-filter\u003c/h3\u003e\u003cp\u003eGenerate a number of ideas on a topic, filter them by a rubric or by verification, dedupe them, and return only the highest-quality, tested ideas.\u003c/p\u003e\u003ch3\u003eTournament\u003c/h3\u003e\u003cp\u003eInstead of dividing the work, have agents compete on it. Spawn N agents that each attempt the same task using different approaches, prompts, or models. A judging agent then compares the results pairwise until you have a winner.\u003c/p\u003e\u003ch3\u003eLoop until done\u003c/h3\u003e\u003cp\u003eFor tasks with an unknown amount of work, keep spawning agents in a loop until a stop condition is met (no new findings, or no more errors in the logs) instead of running a fixed number of passes.\u003c/p\u003e\u003ch2\u003eUse cases\u003c/h2\u003e\u003cp\u003eThink creatively about when and how to ask Claude Code to make dynamic workflows. I’ve found that workflows are sometimes even more useful for non-technical work.\u003c/p\u003e\u003ch3\u003eMigrations and refactors\u003c/h3\u003e\u003cp\u003e\u003ca href=\"https://bun.com/\" target=\"_blank\"\u003eBun\u003c/a\u003e was rewritten from Zig to Rust using workflows. You can read more about how that was done in \u003ca href=\"https://x.com/jarredsumner/status/2060050578026189172\" target=\"_blank\"\u003eJarred’s X thread\u003c/a\u003e. 
\u003c/p\u003e\u003cp\u003eThe key is to break the task down into units of work that can be handled independently, for example call sites, failing tests, or modules. Spin off a subagent in its own worktree for each fix, have another agent adversarially review it, and then merge the results. Consider telling the agents not to use resource-intensive commands so that you can maximally parallelize without running out of resources on your machine.\u003c/p\u003e\u003ch3\u003eDeep research\u003c/h3\u003e\u003cp\u003eWe published a deep research skill (\u003ccode\u003e/deep-research\u003c/code\u003e) inside Claude Code that uses dynamic workflows. Specifically, it fans out web searches, fetches sources, adversarially verifies their claims, and synthesizes a cited report.\u003c/p\u003e\u003cp\u003eBut this sort of research applies to more than web searches. For example, you can ask Claude to compile a status report from context in Slack or to research how a feature works by exploring a codebase in depth.\u003c/p\u003e\u003ch3\u003eDeep verification\u003c/h3\u003e\u003cfigure style=\"max-width:1999pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f1721824a27cf13da87f4_image2.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eConversely, if you have a report and want to check and source every factual claim it makes, you can generate a workflow in which one agent identifies all of the factual claims and then spins off a subagent to check each one in detail. You could also have a verification agent check each sourcing subagent to make sure its source is high quality. 
\u003c/p\u003e\u003ch3\u003eSorting\u003c/h3\u003e\u003cfigure style=\"max-width:1999pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f173ce727a972001584cc_image3.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eYou may have a list of items that you want to sort by a qualitative measure that Claude Code is good at evaluating, for example support tickets sorted by bug severity. But if you try to sort 1000+ rows in one prompt, quality degrades and the list won\u0026#39;t fit in context. Instead, run a tournament, a pipeline of pairwise-comparison agents (comparative judgment is more reliable than absolute scoring), or bucket-rank in parallel and then merge. Each comparison is its own agent, so the deterministic loop holds the bracket and only the running order stays in context.\u003c/p\u003e\u003ch3\u003eMemory and rule adherence\u003c/h3\u003e\u003cfigure style=\"max-width:1999pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f17517076bb59050d90bb_image8.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eIf you have a particular set of rules that you find Claude misses or struggles with, even when they are in your \u003ccode\u003eCLAUDE.md\u003c/code\u003e files, create a workflow with a list of rules that must be checked by verifier agents—one verifier per rule. 
Adding a subagent with a skeptic persona to review the rules and make sure they are in line will help avoid too many false positives.\u003c/p\u003e\u003cp\u003eThe reverse direction works too: mine your recent sessions and code review comments for corrections you keep making, cluster them with parallel agents, adversarially verify each candidate (would this rule have prevented a real mistake?), and then distill the survivors back into a \u003ccode\u003eCLAUDE.md\u003c/code\u003e.\u003c/p\u003e\u003ch3\u003eRoot-cause investigation\u003c/h3\u003e\u003cp\u003eDebugging works best when you come up with several independent hypotheses and test them, but if you’re only using one context window, Claude can run into self-preferential bias.\u003c/p\u003e\u003cp\u003eA workflow can structurally prevent this by spinning up agents to generate hypotheses from disjoint evidence, for example separate agents for logs, files, and data. Each hypothesis can then face a panel of verifiers and refuters.\u003c/p\u003e\u003cp\u003eThis isn\u0026#39;t just for code. Workflows can be used for sales (why did sales drop in March?), data engineering (why did this pipeline fail?), or any post-mortem exercise.\u003c/p\u003e\u003ch3\u003eTriaging at scale\u003c/h3\u003e\u003cfigure style=\"max-width:1999pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f1778dc00d34cca70819d_image6.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eEvery team has a support queue, bug reports, or some other backlog that cannot be fully processed by humans.\u003c/p\u003e\u003cp\u003eA triage workflow classifies each item, dedupes against what\u0026#39;s already tracked, and takes action. This could mean attempting the fix or escalating to a human user.\u003c/p\u003e\u003cp\u003eA useful pattern for triage workflows is quarantine. 
This involves barring the agents that read untrusted public content from taking high-privilege actions, which are instead done by the agents in charge of acting on the information.\u003c/p\u003e\u003cp\u003ePair triage workflows with \u003ccode\u003e/loop\u003c/code\u003e to have Claude do this continuously.\u003c/p\u003e\u003ch3\u003eExploration and taste\u003c/h3\u003e\u003cp\u003eWorkflows can be useful when exploring different approaches to a solution, especially when the choice is taste-based, like design or naming, and would benefit from a rubric.\u003c/p\u003e\u003cp\u003eTry asking Claude to explore a bunch of solutions, and give a review agent a rubric for what a good solution looks like. The task is complete when the review agent judges that a solution meets the criteria. Solutions can also be ordered or selected via a tournament based on the rubric.\u003c/p\u003e\u003ch3\u003eEvals\u003c/h3\u003e\u003cp\u003eYou can run lightweight evals for particular tasks by spinning off separate agents in a worktree and then spinning off comparison agents to compare and grade the specific outputs against a rubric. For example, you can evaluate and then refine a skill you’ve created against particular criteria.\u003c/p\u003e\u003ch3\u003eModel and intelligence routing\u003c/h3\u003e\u003cp\u003eCreate a classifier agent tuned to your tasks that decides which model to use. This can be helpful when your task will involve many tool calls, and conducting research prior to execution can identify the best model for the job.\u003c/p\u003e\u003cp\u003eFor example, the best model for the task “explain how the auth module works” depends on how many files the auth module contains and the shape of the codebase. A classifier agent can do this research and then route to Sonnet or Opus based on the expected complexity of the task.\u003c/p\u003e\u003ch2\u003eWhen not to use dynamic workflows\u003c/h2\u003e\u003cp\u003eWorkflows are new. 
While there are many use cases where they produce outsized results, workflows are not needed for every task and may end up using significantly more tokens.\u003c/p\u003e\u003cp\u003eIt’s best to use workflows creatively to push Claude Code in ways that you haven’t previously. For regular coding tasks, ask yourself: does it really need more compute? For example, most traditional coding tasks do not need a panel of 5 reviewers.\u003c/p\u003e\u003ch2\u003eTips for building dynamic workflows\u003c/h2\u003e\u003ch3\u003ePrompting\u003c/h3\u003e\u003cp\u003eDetailed prompts that use the specific techniques described above produce the best results with dynamic workflows.\u003c/p\u003e\u003cp\u003eWorkflows are not just for large tasks. You can prompt the model to use a “quick workflow.” For example, you can create a quick adversarial review of an assumption.\u003c/p\u003e\u003ch3\u003eCombine with \u003ccode\u003e/goal\u003c/code\u003e and \u003ccode\u003e/loop\u003c/code\u003e\u003c/h3\u003e\u003cp\u003eWhen using workflows that can be repeated, for example triage, research, or verification, pair them with \u003ccode\u003e/loop\u003c/code\u003e to run them at regular intervals, and with \u003ccode\u003e/goal\u003c/code\u003e to set a hard completion requirement.\u003c/p\u003e\u003ch3\u003eToken usage budgets\u003c/h3\u003e\u003cp\u003eYou can set explicit token usage budgets for dynamic workflows to limit how many tokens a task uses. You can prompt it with a budget like: “use 10k tokens,” which will set the cap.\u003c/p\u003e\u003ch3\u003eSaving and sharing dynamic workflows\u003c/h3\u003e\u003cp\u003eYou can save workflows by pressing “s” in the workflow menu. You can check these into \u003ccode\u003e~/.claude/workflows\u003c/code\u003e or distribute them via a skill. 
\u003c/p\u003e\u003cfigure class=\"w-richtext-align-center w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f17b1ca20533e666c867c_image4.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eTo share them via a skill, put your JavaScript workflow files in the skill folder and reference them in the \u003ccode\u003eSKILL.md\u003c/code\u003e. To allow for more flexibility, you may want to prompt Claude to think of the workflows in the skill as a template instead of a script that needs to be run verbatim.\u003c/p\u003e\u003cfigure style=\"max-width:1999pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a1f17cb835cf4f9fd5da921_image7.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003ch2\u003eA new starting point for discovery\u003c/h2\u003e\u003cp\u003eWorkflows are a helpful new way to extend Claude Code. I encourage you to think of them as a starting point to explore new ways to use Claude to help accomplish your tasks. There is still much to discover in how to use them best. Let me know what you find.\u003c/p\u003e\u003cp\u003e\u003cem\u003eThis article was written by Thariq Shihipar and Sid Bidasaria, members of technical staff at Anthropic working on Claude Code.\u003c/em\u003e\u003c/p\u003e",
      "summary": "Claude Code can now write and orchestrate its own multi-agent harness on the fly. Here's how dynamic workflows work, and the patterns that get the most out of them.",
      "date_published": "0001-01-01T00:00:00Z",
      "tags": [
        "Claude Code"
      ]
    },
    {
      "id": "/blog/introducing-dynamic-workflows-in-claude-code",
      "url": "https://claude.com/blog/introducing-dynamic-workflows-in-claude-code",
      "title": "Introducing dynamic workflows in Claude Code",
      "content_html": "\u003cp\u003e\u003cstrong\u003eUpdate: \u003c/strong\u003eDynamic workflows are now generally available.\u003c/p\u003e\u003cp\u003eToday we\u0026#39;re introducing dynamic workflows in Claude Code, helping Claude take on the most challenging tasks end-to-end. Work you\u0026#39;d normally plan in quarters now finishes in days. Claude dynamically writes orchestration scripts that run tens to hundreds of parallel subagents in a single session, checking its work before anything reaches you.\u003c/p\u003e\u003cp\u003eSome problems are too big for one pass by a single agent, especially in complex, legacy codebases: a bug hunt across an entire service, a migration that touches hundreds of files, a plan you want stress-tested from every angle before you commit to it. Dynamic workflows can handle all of these end-to-end.\u003c/p\u003e\u003cfigure style=\"max-width:2048pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a186b2e070156fbb2df90ad_166befe7.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003c/figure\u003e\u003cp\u003eDynamic workflows are generally available in the Claude Code CLI, Desktop, and the VS code extension for Pro, Max, Team, and Enterprise plans, as well as on the Claude API, on Amazon Bedrock, Vertex AI, and Microsoft Foundry.\u003c/p\u003e\u003cp\u003eNote: Dynamic workflows can consume substantially more tokens than a typical Claude Code session, so we recommend starting on a scoped task to get a feel for usage in your work. \u003c/p\u003e\u003cp\u003eFor the best experience, turn on auto mode when using dynamic workflows. 
From there, you have two ways to start a workflow:\u003c/p\u003e\u003col role=\"list\"\u003e\u003cli\u003eAsk Claude to create a dynamic workflow directly (e.g., “Create a workflow”), or\u003c/li\u003e\u003cli\u003eSwitch on a new Claude Code-specific setting called \u003ccode\u003eultracode\u003c/code\u003e. This is accessible through the effort menu, and it sets the effort level to xhigh while letting Claude decide automatically when to use a workflow to handle your task.\u003c/li\u003e\u003c/ol\u003e\u003ch2\u003e\u003cstrong\u003eDynamic workflows in action\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eEarly access users and teams inside Anthropic have been using dynamic workflows for a wide range of use cases, including:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eCodebase-wide bug hunts, profiler-guided optimization audits, and security audits:\u003c/strong\u003e Claude searches a service or repo in parallel, then runs independent verification on every finding so the report surfaces real issues. The same shape works for hardening passes: auth checks, input validation, and unsafe patterns across an entire codebase.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eLarge migrations and modernization efforts:\u003c/strong\u003e Claude can handle framework swaps, API deprecations, and language ports that span thousands of files end-to-end.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eCritical work you need checked twice:\u003c/strong\u003e When the cost of a wrong answer is high, a workflow gives Claude independent attempts at the problem and adversarial agents working to break the result before you see it.\u003c/li\u003e\u003c/ul\u003e",
      "summary": "Dynamic workflows in Claude Code let Claude tackle the most challenging tasks by executing across 10s to 100s of parallel subagents, and checking its work before anything reaches you.",
      "date_published": "2026-05-28T00:00:00Z",
      "tags": [
        "Product announcements"
      ]
    },
    {
      "id": "/blog/using-llms-to-secure-source-code",
      "url": "https://claude.com/blog/using-llms-to-secure-source-code",
      "title": "Using LLMs to secure source code",
      "content_html": "\u003cp\u003eModel capabilities are advancing quickly, and unevenly. We’ve been \u003ca href=\"https://www.anthropic.com/glasswing\" target=\"_blank\"\u003eworking with security teams\u003c/a\u003e to find and fix vulnerabilities in their own code and open source software, and the work has given us a better understanding of how to use models to secure source code. \u003cstrong\u003eOur primary takeaway: discovery is now straightforward to parallelize, and the bottleneck has shifted to verification, triage, and patching\u003c/strong\u003e. \u003c/p\u003e\u003cp\u003eTo give some indication of this discrepancy, as part of \u003ca href=\"https://www.anthropic.com/research/glasswing-initial-update\" target=\"_blank\"\u003eour own scanning\u003c/a\u003e of open source software, as of May 22, 2026, we had disclosed 1,596 vulnerabilities. To our knowledge, 97 of these have been patched.\u003c/p\u003e\u003cp\u003eThis guide walks through how you can work with Claude Opus to build a threat model, discover vulnerabilities in your codebase, then verify, triage, and patch them. While we don’t have all the answers, we’ll share how teams have scaled discovery and what’s helped in the later stages. \u003cem\u003eGet started today with the \u003c/em\u003e\u003ca href=\"https://github.com/anthropics/defending-code-reference-harness\" target=\"_blank\"\u003e\u003cem\u003eaccompanying repo\u003c/em\u003e\u003c/a\u003e which includes skills for interactive workflows and a demo harness for autonomous scanning; we’ll call out the skill that implements each step as you read.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eThe find-and-fix loop\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eTeams finding and fixing the most vulnerabilities converged on a variation of existing best practices. 
We’ve distilled them into a sequence of six steps:\u003c/p\u003e\u003col role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eThreat model:\u003c/strong\u003e Decide what counts as a vulnerability before you start scanning.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eSandbox:\u003c/strong\u003e Build a sandbox environment to isolate agents and prove exploits.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eDiscovery: \u003c/strong\u003eHave models look for vulnerabilities in your source code.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eVerification:\u003c/strong\u003e Independently confirm which findings are actually exploitable.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eTriage:\u003c/strong\u003e Deduplicate findings, assign severity, and prioritize what needs fixing.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003ePatching:\u003c/strong\u003e Apply the fix, confirm the vulnerability is nullified, and search for variants.\u003c/li\u003e\u003c/ol\u003e\u003cfigure style=\"max-width:1999pxpx\" class=\"w-richtext-align-fullwidth w-richtext-figure-type-image\"\u003e\u003cdiv\u003e\u003cimg alt=\"\" src=\"https://cdn.prod.website-files.com/68a44d4040f98a4adf2207b6/6a17340b26b22fa10e3ed68e_image1.png\" loading=\"lazy\"/\u003e\u003c/div\u003e\u003cfigcaption\u003eA one-time investment in threat modeling and sandboxing powers the defender\u0026#39;s loop—a repeating cycle of discovery, verification, triage, and patching—where the bottleneck isn\u0026#39;t finding vulnerabilities but everything that comes after.\u003c/figcaption\u003e\u003c/figure\u003e\u003cp\u003eThe first two steps—building a threat model and a sandbox—are the setup for the rest of the loop. These are typically done once per codebase and revisited when the underlying system changes. The next four steps are the loop you’ll run against the source: discover, verify, triage, and patch.\u003c/p\u003e\u003cp\u003eThe first run on a codebase typically has the highest number of findings. 
Subsequent runs tend to have fewer—though often more complex—vulnerabilities, as the simpler ones were patched in prior runs. However, don’t expect the \u003cem\u003en\u003csup\u003eth\u003c/sup\u003e\u003c/em\u003e run to have zero new findings. Models are stochastic, and a large codebase can have a long tail of vulnerabilities that continue to trickle in even when the code is unchanged.\u003c/p\u003e\u003cp\u003eOn your first iteration with a codebase, you should run the loop multiple times, deciding when to stop based on the number of net-new findings and your risk tolerance for that system. After that first iteration, continue to scan (1) periodically or (2) whenever the code meaningfully changes.\u003c/p\u003e\u003cp\u003eNext, we’ll walk through each step in detail, explaining why it matters, what it produces, and how to implement it. \u003c/p\u003e\u003ch2\u003e\u003cstrong\u003e1. Threat model: Define what counts as a vulnerability\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eThe most common cause of false positives is that the model lacks a good understanding of your trust boundaries. The model might flag code as vulnerable because it assumes a client could send corrupted values or an attacker could control the config, even though these inputs are \u003cem\u003etrusted\u003c/em\u003e in your environment. Conversely, the model might assume that an internet-facing service is internal-only and thus under-report true vulnerabilities. In both cases, the model is wrong about the threat model, not the code.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne team noticed a pattern across their findings: the model performed best on systems with well-documented threat models, system design docs, requirements, and constraints. 
When the threat model was well-defined, the model\u0026#39;s findings \u0026#34;were exploitable 90 percent of the time.\u0026#34;\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eYou can work with Claude to build a threat model in two steps:\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eFirst, bootstrap from the code, docs, and vulnerability history.\u003c/strong\u003e Feed the model what you would hand a new security engineer on day one: architecture docs, wikis, entry points, git history, and past vulnerabilities. This helps overcome the challenge of inferring implicit knowledge, trade-offs, and design decisions from code alone. Then, ask the model to create a threat model that includes the system context, assets, entry points, and trust boundaries. Finally, have the model cluster past bugs and list the relevant vulnerability classes. Make sure the threat model documents what vulnerabilities you do and don’t care about, and why. \u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne team reviewed hundreds of past CVE and security-fix commits, distilled them into \u0026#34;bug-shape\u0026#34; hints, and asked the model two questions: was the fix complete, and was it applied everywhere else? They found three exploitable issues in an hour. As they put it: \u0026#34;\u0026#39;What have people exploited in the past\u0026#39; is sometimes a much easier cheat-code towards success than \u0026#39;find me vulnerabilities in this codebase.\u0026#39;\u0026#34;\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003e\u003cstrong\u003eSecond, have the model interview someone who knows the system well. \u003c/strong\u003eConsider \u003ca href=\"https://github.com/adamshostack/4QuestionFrame\" target=\"_blank\"\u003eShostack\u0026#39;s four questions\u003c/a\u003e: \u003cem\u003eWhat are we building? What can go wrong? What are we doing about it? Did we do a good job?\u003c/em\u003e Run the bootstrap step first so the interviewee isn’t starting from scratch. 
This way, they can start from a draft instead of spending hours researching and building a threat model themselves. And while the interview step is optional, it adds context the model can’t get from the code or docs, which improves the threat model.\u003c/p\u003e\u003cp\u003eA few practices can make a big difference:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eConsider your dependencies’ security policies.\u003c/strong\u003e Many open-source projects publish one. Examples include vLLM’s \u003ca href=\"https://docs.vllm.ai/en/latest/usage/security.html\" target=\"_blank\"\u003e\u003ccode\u003esecurity.md\u003c/code\u003e\u003c/a\u003e, SQLite\u0026#39;s \u003ca href=\"https://www.sqlite.org/security.html\" target=\"_blank\"\u003e\u0026#34;Defense Against the Dark Arts\u0026#34;\u003c/a\u003e, and \u003ca href=\"https://github.com/ImageMagick/ImageMagick/security/policy\" target=\"_blank\"\u003eImageMagick\u0026#39;s security policy\u003c/a\u003e. Your threat model should reference them directly instead of recreating a policy from scratch.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eName what is trusted.\u003c/strong\u003e If you trust config files or authenticated clients, document it in the threat model. These assumptions help separate non-exploitable bugs from actual exploits.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eInclude a \u003ccode\u003eTHREAT_MODEL.md\u003c/code\u003e with the code.\u003c/strong\u003e Keep it in the repo and update it as the code changes. The discovery agent can then read it before searching, skipping known non-issues.\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eYou’ll use the threat model in two places. In discovery, as scope: partition the code, prioritize targets, and skip what is out of scope. This helps with large codebases you cannot scan entirely. 
In triage, as a filter: after scanning broadly, use the threat model to better calibrate severity to your system and environment.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne team scanning a large project had a 40% false positive rate and dug into why. The findings were reproducible and the PoCs proved exploitability. But the dev team who owned the code dismissed them as false positives because the bugs didn\u0026#39;t fit the project\u0026#39;s threat model. Another team\u0026#39;s CISO put it succinctly: \u0026#34;[The model has] good context of the code, but not good context of us.\u0026#34;\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003e\u003cstrong\u003eTry the \u003c/strong\u003e\u003ca href=\"https://github.com/anthropics/defending-code-reference-harness/tree/main/.claude/skills/threat-model\" target=\"_blank\"\u003e\u003cstrong\u003e\u003ccode\u003ethreat-model\u003c/code\u003e skill\u003c/strong\u003e\u003c/a\u003e\u003cstrong\u003e.\u003c/strong\u003e It walks through both steps described in this section: \u003ccode\u003ebootstrap\u003c/code\u003e derives a draft from your code, CVEs, and git history, and \u003ccode\u003einterview\u003c/code\u003e walks a system owner through Shostack’s four questions to refine it. The output is a \u003ccode\u003eTHREAT_MODEL.md\u003c/code\u003e file that the discovery and triage steps consume.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003e2. Sandbox: Run agents safely and verify exploitability\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003e\u003cstrong\u003eOne purpose of the sandbox is to protect your systems.\u003c/strong\u003e To enable models to run safely and autonomously, you need a strong isolation layer. Without it, the agent may overshoot the target and do something unexpected.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne team told the model it had no network access—when it actually did—and the model discovered it could fetch from GitHub anyway. Another team observed an agent answer a GitHub issue mid-scan. 
Neither action was malicious, but both demonstrated the need to enforce constraints via code and configuration.\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eMatch the isolation to your threat model. Containers are fine for the discovery agent reading code, but run the target and its PoCs in a microVM (like Firecracker) or a full VM with egress locked down so nothing can reach your production systems. And never have credentials (\u003ccode\u003e~/.aws\u003c/code\u003e, \u003ccode\u003e~/.ssh\u003c/code\u003e, \u003ccode\u003e.env\u003c/code\u003e) available to the agent.\u003c/p\u003e\u003cp\u003eGive the sandbox network access only while you’re setting it up. Pull the dependencies, build, install tools, deploy the target, and run the existing tests to confirm everything works. Then, snapshot the environment and remove its network access. During scanning, allow traffic only to the model API, routed through a local proxy. Load the snapshot at the start of each run so every scan begins from the same clean slate.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eAnother purpose of the sandbox is to prove exploitability. \u003c/strong\u003eDuring static scanning, the model reads code and hypothesizes what might break, but it cannot test if a path is reachable or if there\u0026#39;s a compensating control. As a result, the model might flag unexploitable code-correctness bugs that you don’t actually care about. When teams built a sandbox where the agent could compile code, run tests, and detonate a proof of concept, non-exploitable findings dropped significantly.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne offensive-security team built a harness that gives the agent a test bed, with a simple verification rule: it’s only a true positive if the agent can build a proof of concept and run it on the test bed. 
Their assessment after six weeks was that \u0026#34;the biggest efficacy lever has been giving the model test beds, live systems, and running the PoCs.\u0026#34;\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eWhen building sandboxes, pin as much as you can so every run uses the same code in the same environment: image tags, commit SHAs, dependencies, and build commands. Cache a  local copy so the build requires no network, and aim for the container to be durable so multiple testing loops can just load it.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne team\u0026#39;s scan flagged a vulnerability that turned out to be a byproduct of the agent downloading an older version of the library instead of what was actually deployed. This was caught by an engineer who read the transcript and spotted that a different dependency was being downloaded. They now build Docker containers with dependencies pinned to match production, so the finding agent and the verification agent operate on the same artifacts an attacker would.\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eIt’s important to build sandboxes that are faithful enough to production. Excluding dependencies (like a queue or datastore) can lead to under-reporting bugs that may exist in production. Conversely, ignoring production defenses (like a WAF or auth gateway) leads to the model reporting unexploitable findings that your prod environment already mitigates.\u003c/p\u003e\u003cp\u003eNonetheless, if building a representative sandbox is impractical because of cloud dependencies, data stores, or other real-world complexities, start with the discovery step (below) instead. You don’t necessarily need to run PoCs in a sandbox. Frontier models are good at finding vulnerabilities from just analyzing source code. Several teams, including our own, have found this effective. 
The trade-off comes in the verification phase: without a running target, you can’t prove findings with a PoC, so budget more time for verification. You can also invest in the sandbox later, once the volume of findings justifies it.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eRefer to the \u003c/strong\u003e\u003ca href=\"https://github.com/anthropics/defending-code-reference-harness/tree/main/harness\" target=\"_blank\"\u003e\u003cstrong\u003eharness \u003ccode\u003eREADME.md\u003c/code\u003e\u003c/strong\u003e\u003c/a\u003e\u003cstrong\u003e for a reference sandbox. \u003c/strong\u003eIn this implementation, agents and targets run in gVisor-isolated containers with egress locked to the model API. The target is built from a Dockerfile pinned to a specific commit, with \u003ca href=\"https://github.com/anthropics/defending-code-reference-harness/blob/main/scripts/setup_sandbox.sh\" target=\"_blank\"\u003e\u003ccode\u003esetup_sandbox.sh\u003c/code\u003e\u003c/a\u003e handling the setup phase.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003e3. Discovery: Provide rich context, shorter prompts, and useful tools\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eGive the discovery agent access to context it can load as needed, such as the threat model, architecture docs, and results of past scans. When the agent understands your trust boundaries and how the system is actually deployed, it can better identify vulnerabilities specific to your system. \u003c/p\u003e\u003cp\u003eWe’ve found that frontier models benefit from simpler prompts during the discovery phase. Counterintuitively, more prescriptive prompts make discovery worse: long checklists tend to reduce the model’s creativity and surface fewer novel bugs. 
Here are some prompting tips that helped in the discovery phase:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eProvide the goal and context.\u003c/strong\u003e Indicate the “why” and “what” (why you’re scanning, what a finding that matters looks like, what system is being scanned) and leave “how to scan for vulnerabilities” to the model. Frontier models are increasingly good at security tasks, and being overly prescriptive can narrow what they try.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eTry asking for a specific vulnerability class.\u003c/strong\u003e If you’d like to focus on a specific type of vulnerability guided by prior CVEs or the codebase’s language, say that. Describe the vulnerability class, what it does and where it tends to live, so the model can recognize it in your codebase.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eDefine the output. \u003c/strong\u003eAsk for a structured report with predefined fields, and order them so the model’s reasoning builds on each field. Example fields include rationale, finding, impact, and severity. Include an escape hatch so the model can exit early for weak findings.\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eGive the model tools to search through and read the codebase, such as grep and glob. Also let the model use the security-specific tools your team relies on, such as SAST scanners or fuzzers. Ask the model what tools are needed for a specific task and make them available. Finally, let the model build tools as needed: recent frontier models are increasingly good at writing the tools they need.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eIn addition to source code, one pentesting team gave the discovery agent tools to send requests, check the responses, and query traffic logs. 
As a result, the agent didn’t need to guess whether a path could be reached and could test each candidate against the running application as it went, improving their true-positive rate to nearly 100 percent.\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eHave the model do a first pass over the system to partition the search space, such as by attack surface, endpoint, or component. Then, feed those partitions to parallel discovery agents so they don’t converge on the same shallow bugs. Finally, run a system-level pass that takes the partition-level findings as context to search for vulnerabilities.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eTeams that tried to brute-force discovery quickly hit diminishing returns. From one team: \u0026#34;We initially tried to just horizontally scale and send more agents, but saw limiting returns.\u0026#34; Another increased the number of focus areas and parallel agents and got \u0026#34;tons of issues\u0026#34;, most of them duplicates of each other.\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eIf you have a sandbox to run the target, ask the discovery agent to build a PoC of the finding, such as a script, a crashing input, or a failing test. Building the PoC helps the agent iterate and pin down the finding, and the artifact gives the verification agent concrete evidence to evaluate. Nonetheless, findings the agent can’t reproduce can still be reported, flagged as unproven, so you keep recall high.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eThe \u003c/strong\u003e\u003ca href=\"https://github.com/anthropics/defending-code-reference-harness/tree/main/.claude/skills/vuln-scan\" target=\"_blank\"\u003e\u003cstrong\u003e\u003ccode\u003evuln-scan\u003c/code\u003e skill\u003c/strong\u003e\u003c/a\u003e is helpful in this stage. It reads your \u003ccode\u003eTHREAT_MODEL.md\u003c/code\u003e, partitions the target into focus areas, and fans out parallel review agents per area. 
The output is a set of structured findings that the next steps consume directly.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003e4. Verification: Filter out non-exploitable findings\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eDiscovery optimizes for recall; verification optimizes for \u003cem\u003eprecision\u003c/em\u003e. In other words, discovery should find as many vulnerabilities as possible—even unlikely ones—and verification should exclude findings that are not actually exploitable. When an agent tries to do \u003cem\u003eboth\u003c/em\u003e in the same step, it can self-censor and exclude exploitable true positives. We learned this the hard way: when we asked discovery agents to also verify findings, they filtered out true positives that a separate verification step would have confirmed.\u003c/p\u003e\u003cp\u003eThe verifier agent should be independent from the discovery agent. Run the verifier in a fresh container without a shared filesystem or conversation history. If the verifier is exposed to the discovery agent’s reasoning, it may simply agree instead of testing the claim. Thus, give the verifier only (1) the proof of concept or written finding and (2) the codebase, so it can search for mitigations the finder missed (e.g., upstream validation, auth gates, type constraints, or unreachable code). \u003c/p\u003e\u003cp\u003eIf a single verification pass still lets too many unexploitable findings through, try running multiple independent verifiers. They can consider different angles or run with different models. Then, take the majority vote. Also consider having a separate judge to decide between the discovery and verification agents’ results.\u003c/p\u003e\u003cp\u003ePrompt the verification agent to disprove the discovery agent’s findings. Have the verifier assume each finding is a false positive and search for reasons the finding is wrong. Include clear criteria that the verifier agent can use to determine if the finding is a true positive. 
This matters most when the discovery agent’s output doesn’t include a PoC. Aim to exclude as many non-exploitable findings as possible to reduce effort on manual reviews.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eAcross the teams we’ve worked with, adding an adversarial verifier roughly halved the rate of non-exploitable findings from the discovery phase. Requiring that verifier to also build a proof of concept confirming the exploit brought the false positive rate to near zero. Together, these two steps helped to reduce the downstream triage and patching load significantly.\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eIf you’re able to sufficiently reproduce your production environment in a sandbox (see step 2), prompt the verifier agent to build and execute a reproducible proof of concept (PoC). If the PoC works, you can conclude the finding is exploitable. Note that the inverse isn’t true—failure to produce a working PoC is not proof of a false positive.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne team scanning open-source packages built a verification step that helped to close the loop: scan the package, generate a proof of concept, then deploy a mock application that uses the package and triggers the PoC. Their take was that: \u0026#34;Validation is the biggest holdup and the PoC is the validation.\u0026#34;\u003c/em\u003e\u003c/blockquote\u003e\u003ch2\u003e\u003cstrong\u003e5. Triage: Deduplicate by root cause, rank by preconditions and impact\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eWhile verification confirms a finding is exploitable, triage assesses patching priority. Previously, when discovery took more effort, the engineer who found the bug also triaged it. Now, with models capable of finding a hundred candidates before lunch, triage has become the bottleneck.\u003c/p\u003e\u003cp\u003eProper triage helps prevent alert fatigue. 
If you submit too many bugs that are duplicated or have an inflated severity, product engineers may stop reading them, even the ones that need immediate patching. Open source maintainers are especially likely to be overwhelmed by untriaged findings since they receive reports from many different users that rely on their software.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eMultiple teams shared the same lesson: if we send product engineers a pile of findings where a majority are non-exploitable, they will lose trust in the reports and give up. They also prioritize critical and high findings to avoid overwhelming the engineers downstream. Other teams found a win by pointing the model at their existing backlog—open findings from prior scanners, prior models, bug-bounty intake—and cleared hundreds of stale items in days.\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eTo deduplicate findings, consider the root cause. Scanners often flag one bug at multiple call sites or report multiple symptoms of a single root cause. Here’s one practical approach: First, use a cheap deterministic pass: same file, same category, vulnerability line numbers within ten lines of each other. 
Then, have a model apply qualitative rules to what remains:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eTreat as duplicate\u003c/strong\u003e: the same root cause worded differently; the same vulnerability reported at multiple call sites; a missing global protection (like an auth check) reported per endpoint; or a cause and its consequence flagged in the same path.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eTreat as distinct\u003c/strong\u003e: different vulnerability classes in the same file; different variables reaching different sinks; two independent bugs inside one helper; the same missing check on two endpoints, but each requires its own fix.\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eIf your harness generates PoCs and patches for each finding, another approach to deduplicate findings is to check if the patch for one finding also disarms the PoCs of others.\u003cbr/\u003e\u003c/p\u003e\u003cp\u003eAfter deduplication, rate the severity of each finding based on:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eReachability.\u003c/strong\u003e Can an attacker reach this code from a real entry point, or is it only reachable from internal code and endpoints?\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eAttacker control.\u003c/strong\u003e Does untrusted input reach the sink intact, or does something upstream sanitize or constrain it?\u003c/li\u003e\u003cli\u003e\u003cstrong\u003ePreconditions.\u003c/strong\u003e What has to be in place for the bug to trigger: a non-default setting, a specific feature flag, a narrow time window the attacker has to hit?\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eAuthentication.\u003c/strong\u003e Can an unauthenticated attacker trigger it, or does it require a logged-in user or an admin?\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eRead vs. write.\u003c/strong\u003e Can the attacker only read data, or also modify it? 
\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eBlast radius.\u003c/strong\u003e If the PoC fires, who is affected? One user or all users, one tenant or the platform, userland or the kernel?\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eTo turn the rubric into a score, have the model write out its answer to each question before assigning a severity. Going through the evidence first keeps the model from anchoring on the bug class (“SQL injection, so critical”) and then inflating the severity to match. As a starting point: zero preconditions with unauthenticated remote access is critical or high severity. One or two preconditions, or an authenticated path, is medium. Three or more, or local-only, is low. Adjust the thresholds to your system.\u003c/p\u003e\u003cp\u003eModels may inflate severity because they have insufficient context. They may not know what inputs an attacker actually controls, or they may not be able to see compensating controls. As an example of the former, a SQL injection is critical if triggered by an unauthenticated request but a non-issue if triggered by an admin-only config file. As an example of the latter, an upstream WAF or authentication layer that prevents exploitation may not be visible from the source code alone.\u003c/p\u003e\u003cp\u003eThe solution is to provide a threat model during triage that tells the model which types of vulnerabilities you do and don’t care about in your system. For example, clarifying that \u0026#34;we trust authenticated clients\u0026#34; can downgrade or remove a whole class of critical findings.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne team found the model is often overconfident unless it is grounded in something it can verify or has more context on whether a behavior is expected under the threat model. 
Their fix was to give the triage agent the same threat model the discovery agent gets.\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003e\u003cstrong\u003eTry the \u003c/strong\u003e\u003ca href=\"https://github.com/anthropics/defending-code-reference-harness/tree/main/.claude/skills/triage\" target=\"_blank\"\u003e\u003cstrong\u003e\u003ccode\u003etriage\u003c/code\u003e skill\u003c/strong\u003e\u003c/a\u003e\u003cstrong\u003e.\u003c/strong\u003e It does both verification and triage: multi-vote verification per finding, deduplication across runs, and re-ranking by derived exploitability. The output is a short, ranked, owned list instead of a raw dump.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003e6. Patching: Close the loop and improve context for the next cycle\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003ePatching is where you close the loop and fix the vulnerabilities. It also helps to improve the threat model based on verified findings—updating trust boundaries or components that need more scrutiny—and feed past findings into the next scan’s context. Each cycle hardens the codebase and makes the next scan better informed.\u003c/p\u003e\u003cp\u003eBefore patching, write a new test that fails with the existing code. Then, implement the fix and confirm the same test now passes without breaking anything else. (Yes, it’s test-driven development). If you don\u0026#39;t add a test, the fix can silently regress and it can be hard to retroactively prove the bug was real.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne pentester found that their generated patches were inconsistent—some good, some bad—until the harness told the model to validate patches by re-running the proof of concept against the patched code. By giving the model feedback to iterate against, patch quality jumped, saving time on human review.\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eModels may narrowly address findings at a specific call site instead of the root cause. 
Simply prompting the model to identify and fix the root cause can be effective. Then, have the model look for variants at two levels: (1) same pattern, where there are other call sites or copies of the same buggy code elsewhere, and (2) same class, where a codebase with one SQL injection vulnerability tends to have more SQL injection vulnerabilities. Update the threat model with the validated findings and patches to close the loop.\u003c/p\u003e\u003cp\u003eBefore you ship the patch, run an adversarial check. Have a new discovery agent probe the patch as an attacker to confirm the patch is comprehensive. Then, simplify the generated patch to address patches that are too invasive. Minimal patches are easier to review and less likely to introduce new bugs. Prompt for the smallest change that fixes the root cause—no refactoring, no drive-by cleanups, no reformatting.\u003c/p\u003e\u003cblockquote\u003e\u003cem\u003eOne team on their most common patch failure: \u0026#34;The recommended patches tend to be as restrictive as possible, to the point that they would break connections with other services. It would address the issue, but break the dependencies that allow the service to work in the first place.\u0026#34;\u003c/em\u003e\u003c/blockquote\u003e\u003cp\u003eYou can validate each patch against a ladder of checks, starting with the cheapest:\u003c/p\u003e\u003col role=\"list\"\u003e\u003cli\u003e\u003cstrong\u003eBuild.\u003c/strong\u003e The patch compiles and the new tests pass.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eTry to reproduce.\u003c/strong\u003e The original PoC should stop working. This catches ineffective patches.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eCheck for regressions.\u003c/strong\u003e The original test suite still passes. This catches broken or over-restrictive patches.\u003c/li\u003e\u003cli\u003e\u003cstrong\u003eRe-attack.\u003c/strong\u003e A fresh discovery agent runs an adversarial check. 
This catches incomplete patches.\u003c/li\u003e\u003c/ol\u003e\u003cp\u003eFinally, while the model can write the patch, a human still needs to own it. Generated patches can fail in predictable ways—fixing the symptom instead of the root cause, blocking legitimate input, or removing access to a dependent service. The goal is to validate each patch as much as possible so that human review requires less effort and the dev team can focus on nuances the model might be unaware of (e.g., incoming changes, code style), with minimal rework of the patches.\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eTry the \u003c/strong\u003e\u003ca href=\"https://github.com/anthropics/defending-code-reference-harness/tree/main/.claude/skills/patch\" target=\"_blank\"\u003e\u003cstrong\u003e\u003ccode\u003epatch\u003c/code\u003e skill\u003c/strong\u003e\u003c/a\u003e\u003cstrong\u003e.\u003c/strong\u003e It consumes the triage output and generates a candidate diff per finding, with an independent reviewer agent checking each one.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eGetting started\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eTry running the loop yourself. Clone \u003ca href=\"https://github.com/anthropics/defending-code-reference-harness\" target=\"_blank\"\u003e\u003ccode\u003edefending-code-reference-harness\u003c/code\u003e\u003c/a\u003e and run \u003ccode\u003e/quickstart\u003c/code\u003e in Claude Code. It walks you through an interactive workflow, from threat modeling to scanning to triage, on a demo target. The repo also includes an autonomous harness and a \u003ccode\u003e/customize\u003c/code\u003e skill to update the harness for your environment.\u003c/p\u003e\u003cp\u003eThen, run it on your own code. Pick a service or package. Bootstrap a threat model from the code and docs, and go through the interview. Invest in building a sandbox of your environment. Scan. Verify the findings with an independent agent. Triage based on your criteria and review everything rated high and above. 
Patch. Then re-scan periodically.\u003c/p\u003e\u003cp\u003eYour first scan will surface more findings than you’d expect. Most will require verification and triage. Budget for the pipeline \u003cem\u003eafter\u003c/em\u003e the scan before you budget for more scanning.\u003c/p\u003e\u003cp\u003eSome resources you might find helpful:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003e\u003ca href=\"https://www.anthropic.com/product/security\" target=\"_blank\"\u003eClaude Security\u003c/a\u003e: Anthropic’s managed product for agentic vulnerability detection and patching.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://github.com/anthropics/defending-code-reference-harness\" target=\"_blank\"\u003e\u003ccode\u003edefending-code-reference-harness\u003c/code\u003e\u003c/a\u003e: Companion repo with skills for interactive workflows and a demo harness for autonomous runs.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://github.com/anthropics/claude-code-security-review\" target=\"_blank\"\u003e\u003ccode\u003eclaude-code-security-review\u003c/code\u003e action\u003c/a\u003e: GitHub Action that runs Claude as a security reviewer on every pull request.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://platform.claude.com/cookbook/tool-use-threat-intel-enrichment-agent\" target=\"_blank\"\u003eThreat Intelligence Enrichment Agent\u003c/a\u003e: Cookbook to build an agent that enriches indicators of compromise against threat intel feeds.\u003c/li\u003e\u003cli\u003e\u003ca href=\"https://platform.claude.com/cookbook/claude-agent-sdk-06-the-vulnerability-detection-agent\" target=\"_blank\"\u003eVulnerability Detection Agent\u003c/a\u003e: Cookbook to build an agent that builds a threat model, scans for vulnerabilities, and triages findings into a structured report.\u003c/li\u003e\u003c/ul\u003e\u003ch2\u003e\u003cstrong\u003eMoving forward\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eWe believe it’s getting easier for models to \u003ca 
href=\"https://red.anthropic.com/2026/exploit-evals\" target=\"_blank\"\u003efind and exploit vulnerabilities\u003c/a\u003e in code. Thus, our work as defenders is to find and fix the vulnerabilities in our code before adversaries exploit them. Some teams have gone as far as connecting their harnesses to events, where a bug bounty report triggers an automated variant analysis, a security review triggers scanning and has candidate findings attached, or a verified vulnerability updates the static analysis tooling to prevent it in the future. \u003c/p\u003e\u003cp\u003eThe work is critical and high-stakes. But done right, it’s the start of a larger, more hopeful shift, where we’ll be \u003cem\u003eable\u003c/em\u003e to find and fix vulnerabilities before attackers exploit them.\u003c/p\u003e\u003cp\u003eIf you’d like to stay connected to our work on cybersecurity, please sign up for our mailing list \u003ca href=\"https://claude.com/form/cybersecurity-mailing-list\" target=\"_blank\"\u003e\u003cstrong\u003ehere\u003c/strong\u003e\u003c/a\u003e.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eWritten by Eugene Yan and Henna Dattani, with contributions from Michael Molash, Abel Ribbink, Justin Young, Ben Morris, David Dworken, and Hasnain Lakhani. This work draws upon our experiences working with models for security at Anthropic and the valuable insights shared by our partners and customers, for which we’re deeply grateful.\u003c/p\u003e",
      "summary": "We share best practices for how you can work with Claude Opus to build a threat model, discover vulnerabilities in your codebase, then verify, triage, and patch them.",
      "date_published": "2026-05-27T00:00:00Z",
      "tags": [
        "Enterprise AI"
      ]
    },
    {
      "id": "/blog/how-coderabbit-used-claude-to-build-an-agent-orchestration-system",
      "url": "https://claude.com/blog/how-coderabbit-used-claude-to-build-an-agent-orchestration-system",
      "title": "How CodeRabbit used Claude to build an agent orchestration system",
      "content_html": "\u003cp\u003e\u003cem\u003eIn our series,\u003c/em\u003e \u003cstrong\u003e\u003cem\u003eHow startups build with Claude\u003c/em\u003e\u003c/strong\u003e, we highlight how startups are transforming their industries with AI. In this article, we share how CodeRabbit built an agent orchestration layer that plans before AI generates code.\u003c/p\u003e\u003cdiv class=\"w-embed\"\u003e\u003cfigure\u003e\n  \u003cdiv role=\"region\" tabindex=\"0\"\u003e\n    \u003ctable\u003e\n      \u003cthead\u003e\n        \u003ctr\u003e\n          \u003cth colspan=\"2\"\u003eThe quick pitch\u003c/th\u003e\n        \u003c/tr\u003e\n      \u003c/thead\u003e\n      \u003ctbody\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eName\u003c/td\u003e\n          \u003ctd\u003eCodeRabbit\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eFounded\u003c/td\u003e\n          \u003ctd\u003e2023\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eFounders\u003c/td\u003e\n          \u003ctd\u003eHarjot Gill, CEO\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eStack\u003c/td\u003e\n          \u003ctd\u003eClaude Platform, Claude Code\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eScale\u003c/td\u003e\n          \u003ctd\u003eReviews 2 million PRs per week across 15,000+ customers\u003c/td\u003e\n        \u003c/tr\u003e\n      \u003c/tbody\u003e\n    \u003c/table\u003e\n  \u003c/div\u003e\n\u003c/figure\u003e\u003c/div\u003e\u003cp\u003eAI coding tools have collapsed the time between idea and working prototype. CodeRabbit, an AI code review platform, has noticed a different trend climbing alongside that throughput: code that compiles and passes tests but doesn\u0026#39;t do what the team actually meant to build. \u003c/p\u003e\u003cp\u003eDavid Loker, VP of AI at CodeRabbit, locates the cause upstream of the model. 
Experienced developers often assume coding agents understand the same context they do, so they don’t write down requirements that feel obvious to them. The coding agent then fills the gaps with whatever it considers plausible.\u003c/p\u003e\u003cp\u003eTo close that gap, CodeRabbit used Claude to design and build an agent orchestration system that runs a structured planning phase before any code is generated. The team\u0026#39;s working thesis is that planning quality determines output quality, and the cheaper code generation gets, the more expensive it becomes to move in the wrong direction.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eAddressing the internal knowledge gap in AI coding\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eWhen the CodeRabbit team studied AI-generated pull requests across their customer base, the most frequent failure mode was code that compiled and passed tests, yet still didn\u0026#39;t solve the problem it was built to solve.\u003c/p\u003e\u003cp\u003e\u0026#34;As we gain experience as developers, we internalize knowledge,\u0026#34; Loker says. \u0026#34;All those things are in our head, and we assume other developers know them too. But then we make that assumption of the AI system as well, that it also implicitly understands. We\u0026#39;re not even aware that we\u0026#39;re assuming those things.\u0026#34;\u003c/p\u003e\u003cp\u003eVague prompts force the underlying system to fill gaps with whatever it considers plausible. That guess often diverges from what the developer had in mind. \u003c/p\u003e\u003cp\u003eLoker offers a personal example. While building a memory system on a side project, he spent hours iterating with a coding agent until everything ran. When he asked the agent how to use it, the instructions told him to pass in a user token. There was no login page. He had specified that the system required users but never said users needed a way to sign in. 
The agent filled the gap, and hours of work landed in a product missing a front door.\u003c/p\u003e\u003cp\u003e\u0026#34;What ends up happening is you build a lot more stuff on top of it, then much later you find there\u0026#39;s a problem,\u0026#34; Loker says. \u0026#34;In AI workflows, late validation can be very expensive.\u0026#34;\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eAn orchestration layer that runs before AI coding solutions\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eCodeRabbit\u0026#39;s response was to insert a planning system in front of code generation. It coordinates multiple Claude models to analyze requirements and surface assumptions before producing a structured execution plan that defines what should be built and what constraints it needs to satisfy.\u003c/p\u003e\u003cp\u003e\u0026#34;This planning system is not meant to replace Claude Code\u0026#39;s Plan Mode,\u0026#34; Loker says. \u0026#34;It\u0026#39;s a higher level orchestration that happens before Claude Code, to point it in a really narrow and right direction where everything that needs to be explicit is made explicit, and we are aware of all assumptions that are being made.\u0026#34;\u003c/p\u003e\u003cp\u003eThe output is a collaborative product requirements document (PRD): a plan created with full context, validated by stakeholders across the team, and reviewed before implementation starts. Claude Code picks up that plan and uses it to generate a fine-grained implementation plan. The plan becomes a shared artifact that captures what was decided and why, which helps teams avoid rework, validate later that the output matched the original intent, and onboard new engineers.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eRouting across the Claude model family\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eCodeRabbit matches each model tier to task complexity to optimize for cost and latency. 
Opus drives the orchestration loop and the higher-level strategic work of understanding the problem and setting overall direction. Sonnet takes that output and sequences it into structured planning steps. Haiku handles narrowly scoped operations like context distillation and targeted tool use, where the question is specific enough that a smaller model can answer it well.\u003c/p\u003e\u003cp\u003e\u0026#34;If Haiku does as well as Sonnet on a given task, we use Haiku,\u0026#34; Loker says. \u0026#34;If the evaluation harness tells us the plan quality improves when we give Opus more room, we give it more room. We don\u0026#39;t guess.\u0026#34;\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eBuilding an eval harness for plan quality\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eCodeRabbit had a mature evaluation system for code review, but nothing for evaluating planning output. Building that infrastructure became its own project.\u003c/p\u003e\u003cp\u003eThe system started with hand-tuned examples and manual inspection. The team developed a library of LLM judges that scored specific dimensions of plan quality. Because plans eventually produce code, the team could also measure whether the generated code worked, whether it contained extra scope, and how many tokens it took to get there. Running the same task with and without the planning step gave them a way to isolate the value of planning itself.\u003c/p\u003e\u003cp\u003e\u0026#34;We didn\u0026#39;t realize what the right level of detail was going to be for that plan,\u0026#34; Loker says. Plans that were too granular went stale the moment the codebase shifted. Plans that were too high-level left room for the agent to fill in assumptions, which was the original problem the planning layer was meant to solve. 
Finding the working level of abstraction took iteration, which is what the eval harness made possible.\u003c/p\u003e\u003ch2\u003e\u003cstrong\u003eCatching errors before any code gets written\u003c/strong\u003e\u003c/h2\u003e\u003cp\u003eIn an AI-native coding workflow, many of the decisions that used to surface during code review are now made earlier, in the planning layer. Building a plan that the team can review and align on before code generation starts catches mistakes early.\u003c/p\u003e\u003cp\u003e\u0026#34;What we\u0026#39;ve built, using the Claude ecosystem, is a team-wide planning system,\u0026#34; Loker says. \u0026#34;The plan itself becomes a quality gate. If we can make sure the quality of that plan is really good upfront, the downstream effect is very pronounced. You end up with a lot better code at the end of it.\u0026#34;\u003c/p\u003e\u003cdiv class=\"w-embed\"\u003e\u003cfigure\u003e\n  \u003cdiv role=\"region\" tabindex=\"0\"\u003e\n    \u003ctable\u003e\n      \u003cthead\u003e\n        \u003ctr\u003e\n          \u003cth colspan=\"2\"\u003eBest practices from the CodeRabbit team\u003c/th\u003e\n        \u003c/tr\u003e\n      \u003c/thead\u003e\n      \u003ctbody\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eWhat outcome are you actually trying to create, and how do you measure it?\u003c/td\u003e\n          \u003ctd\u003eBe explicit in your specifications to the AI, and also define what you want in the MPP (maximum possible product).\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eWhat assumptions are still implicit?\u003c/td\u003e\n          \u003ctd\u003eAsk Claude: what is missing? 
Are there any parts of the plan that are coming out as implicit assumptions instead of explicit specifications?\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eWhat workflows or edge cases are easy to forget?\u003c/td\u003e\n          \u003ctd\u003eAsk Claude to help identify places or cases that you may not have taken into account.\u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n          \u003ctd\u003eHow will you know the output matches intent before rollout?\u003c/td\u003e\n          \u003ctd\u003eCreate a record of work: a chronicle of planning artifacts that is saved and reused.\u003c/td\u003e\n        \u003c/tr\u003e\n      \u003c/tbody\u003e\n    \u003c/table\u003e\n  \u003c/div\u003e\n\u003c/figure\u003e\u003c/div\u003e\u003cp\u003e\u003cstrong\u003eBuild your startup on the \u003c/strong\u003e\u003ca href=\"https://platform.claude.com/login?returnTo=%2F%3F\" target=\"_blank\"\u003e\u003cstrong\u003eClaude Platform\u003c/strong\u003e\u003c/a\u003e\u003cstrong\u003e.\u003c/strong\u003e\u003c/p\u003e",
      "summary": "CodeRabbit built a layer on Claude that sits between a coding request and a coding agent, producing a structured coding plan the team can review before any code gets generated.",
      "date_published": "2026-05-27T00:00:00Z",
      "tags": [
        "Claude Code"
      ]
    },
    {
      "id": "/blog/zero-trust-for-ai-agents",
      "url": "https://claude.com/blog/zero-trust-for-ai-agents",
      "title": "Zero Trust for AI agents",
      "content_html": "\u003cp\u003eFrontier AI models are compressing the timeline between vulnerability and exploit from months to hours. Defenders who adopt these tools find and fix bugs faster; attackers who adopt them, or who simply wait for defenders\u0026#39; patches and reverse-engineer them into exploits, move faster too. This is not a future concern: models can already find serious vulnerabilities that traditional tooling and human reviewers have missed for years.\u003c/p\u003e\u003cp\u003eThis acceleration matters twice for any organization deploying agents. The infrastructure your agents run on is exposed to AI-accelerated offense like the rest of your estate, and the agents themselves introduce autonomy to interpret goals, select tools, and execute multi-step operations. Traditional access controls won\u0026#39;t prevent agents from misusing legitimate permissions, and monitoring needs to account for attacks designed to succeed through persistence rather than exploitation.\u003c/p\u003e\u003cp\u003e\u003ca href=\"https://en.wikipedia.org/wiki/Zero_trust_architecture\"\u003eZero Trust\u003c/a\u003e—trust nothing, verify everything, and assume breach has already occurred—gives security leaders a proven foundation to address this. But the principles need new shape for agentic systems: identities that are cryptographically rooted, permissions scoped per task, memory protected against poisoning, and defensive operations that run at the speed of autonomous attackers. 
\u003c/p\u003e\u003cp\u003eTo help security and risk leaders build for this shift, we put together a practical framework for deploying autonomous AI agents in the enterprise.\u003c/p\u003e\u003cp\u003eIn this guide, we share:\u003c/p\u003e\u003cul role=\"list\"\u003e\u003cli\u003eThe security considerations unique to agentic systems, including tool access, autonomous decision-making, context persistence, and multi-agent coordination\u003c/li\u003e\u003cli\u003eThe current threat landscape for agents, including prompt injection, tool poisoning, identity and privilege abuse, memory poisoning, and supply chain attacks\u003c/li\u003e\u003cli\u003eA three-tier Zero Trust framework (Foundation, Advanced, and Optimized) mapped to organizational maturity and risk tolerance\u003c/li\u003e\u003cli\u003eAn eight-phase implementation workflow covering identity, access scoping, sandboxing, input and output controls, and memory safeguards\u003c/li\u003e\u003cli\u003eHow to run agentic security operations (Agentic SOAR) fast enough to contend with AI-accelerated attackers\u003c/li\u003e\u003cli\u003eCompliance alignment for regulated industries including healthcare, finance, and government\u003c/li\u003e\u003c/ul\u003e\u003cp\u003eThe organizations best positioned for this shift will be the ones whose fundamentals are strong enough that AI-assisted scanning finds fewer bugs in the first place, and whose agent deployments are architected for breach from day one.\u003c/p\u003e\u003cp\u003eCheck it out, \u003ca href=\"https://cdn.prod.website-files.com/6889473510b50328dbb70ae6/6a1611a04085d7cd3dadc924_Claude-eBook-Zero-Trust-for-AI-Agents-05182026.pdf\" target=\"_blank\"\u003ehere\u003c/a\u003e.\u003c/p\u003e\u003cp\u003eGet started with \u003ca href=\"https://www.anthropic.com/product/security\" target=\"_blank\"\u003eClaude Security\u003c/a\u003e today.\u003c/p\u003e",
      "summary": "A Zero Trust framework for deploying autonomous AI agents in the enterprise, covering current threats, a tiered architecture, an eight-phase implementation workflow, and agentic SOAR.",
      "date_published": "2026-05-27T00:00:00Z",
      "tags": [
        "Enterprise AI"
      ]
    }
  ]
}
