
The Future of MCP: From Always-On to On-Demand Loading

An architectural proposal for Claude Code's next evolution - assigning MCP servers to specific agents and skills instead of loading all at session start.

January 12, 2026 · 9 min read · By Claude World

The Problem: Context Budget Under Pressure

If you’re using Claude Code with multiple MCP servers, you’ve probably noticed something: your context window starts filling up before you even type your first message.

Here’s a typical scenario:

| MCP Server | Approximate Token Cost | Source |
| --- | --- | --- |
| GitHub (27 tools) | ~18,000 | Issue #11364 |
| AWS MCP servers | ~18,300 | Issue #7172 |
| Cloudflare | ~15,000+ | Community reports |
| Sentry | ~14,000 | Community reports |
| Playwright (21 tools) | ~13,647 | Scott Spence |
| Supabase | ~12,000+ | Community reports |
| Average per tool | ~550-850 | Issue #11364 |

Real-world impact: With 7 MCP servers active, tool definitions alone consume 67,300 tokens (33.7% of 200k context). Even a minimal 3-server setup consumes 42,600 tokens (21.3%).

The irony? Many of these MCPs won’t be used in a given session. That Sentry integration sitting at 14,000 tokens? You might not need it today. But it’s there, eating context, every single time.
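To make the budget pressure concrete, here is a small sketch that adds up the overhead using the approximate per-server costs from the table above and a 200k-token window (the figures are the community-reported estimates, not exact measurements):

```python
# Approximate tool-definition token costs per MCP server (from the table above).
MCP_COSTS = {
    "github": 18_000,
    "aws": 18_300,
    "cloudflare": 15_000,
    "sentry": 14_000,
    "playwright": 13_647,
    "supabase": 12_000,
}

CONTEXT_WINDOW = 200_000  # tokens

def overhead(servers):
    """Total tool-definition tokens and the fraction of the context window they eat."""
    total = sum(MCP_COSTS[s] for s in servers)
    return total, total / CONTEXT_WINDOW

# A "minimal" 3-server setup already burns over a fifth of the window:
total, frac = overhead(["cloudflare", "sentry", "playwright"])
print(f"{total:,} tokens = {frac:.1%} of context")
```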

The Observation: Fork Architecture is Already Here

Claude Code 2.1.x introduced a powerful feature: context: fork for skills. This allows skills to run in isolated contexts, with their own tool permissions and state.

# A skill with forked context
---
description: Deploy to production
context: fork
allowed-tools: [Bash, Read]
---

When this skill runs, it gets its own context bubble. Changes don’t pollute the main conversation. It’s clean, it’s isolated, it works.

Here’s the insight: If we can fork contexts, why can’t we fork MCP access too?

The Proposal: MCP Context Isolation

Important distinction: This proposal is fundamentally different from typical “lazy loading” approaches. We’re not suggesting dynamic runtime loading into the main context. Instead, we propose isolating MCPs into forked agent/skill contexts, keeping the main context permanently clean.

| Approach | Main Context | Loading Time | Complexity |
| --- | --- | --- | --- |
| Traditional Lazy Loading | Gets populated when MCP is needed | Runtime dynamic | High (state management) |
| Our Proposal: Context Isolation | Always stays clean | At fork creation | Low (uses existing context: fork) |

Traditional Lazy Loading:
Main Context ──[need MCP]──> Load MCP ──> Main Context (now occupied)

Our Proposal (Context Isolation):
Main Context (stays clean)
    └── Fork Agent Context ──> Load MCPs ──> Isolated Context
                                              └── Released when done

Imagine this architecture:

# agents/database-specialist.md
---
name: database-specialist
description: Database operations expert
tools: [Read, Bash, Grep]
mcp: [postgres, redis]        # ← Only loads for this agent
context: fork
---
# skills/deploy/SKILL.md
---
description: Deploy to production
mcp: [vercel, github]         # ← Only loads during /deploy
context: fork
---

The main session stays lean:

Main Session (Lean)

    ├── Base MCPs only: filesystem, memory
    │   (~1,300 tokens instead of 20,000+)

    ├── Task: database-specialist (forked)
    │   └── Loads: postgres, redis (only here)

    └── Skill: /deploy (forked)
        └── Loads: vercel, github (only here)
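The key property of this layout is that MCP tool definitions only ever occupy a fork's context, never the main one. A toy model makes the accounting explicit (the token figures mirror the illustrative numbers above; the classes are not real Claude Code internals):

```python
class Context:
    """Toy context: tracks tokens consumed by loaded MCP tool definitions."""
    def __init__(self, base_tokens=0):
        self.tokens = base_tokens

    def load_mcps(self, costs):
        self.tokens += sum(costs)

# Main session: base MCPs only (filesystem + memory, ~1,300 tokens).
main = Context(base_tokens=1_300)

# A forked agent context inherits the lean baseline, then loads its own MCPs.
fork = Context(base_tokens=main.tokens)
fork.load_mcps([12_000, 4_000])  # e.g. postgres + redis (illustrative costs)

print(main.tokens)  # main context untouched
print(fork.tokens)  # cost paid only inside the fork, released when it ends
```

When the fork ends, its context (and the 16,000 tokens of MCP definitions) is simply discarded; the main session never paid for it.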

Real-World Example: The Modern Work Hub Problem

Here’s a scenario many of us face daily:

The Multi-Platform Reality

Modern knowledge workers juggle an overwhelming number of platforms:

| Category | Platforms |
| --- | --- |
| Code Hosting | GitHub, GitLab, Bitbucket |
| Project Management | Jira, Linear, Asana, Notion |
| Communication | Slack, Discord, Teams, Email |
| CI/CD | Vercel, Netlify, AWS |
| Monitoring | Sentry, Datadog, PagerDuty |
| CRM/Business | Salesforce, HubSpot, Stripe |

Each of these has an MCP. Now imagine the dilemma:

Option A: Install Everything

All MCPs loaded at session start:
- github (~2,000 tokens)
- gitlab (~2,500 tokens)
- slack (~3,000 tokens)
- jira (~4,000 tokens)
- notion (~2,500 tokens)
- sentry (~14,000 tokens)
- ...and 10 more

Total: 50,000+ tokens before you even start working.
That's 25% or more of your 200k context budget. Gone.

Option B: Separate by Project

Project A: github + vercel
Project B: gitlab + jira
Project C: slack + notion

This defeats the entire purpose. Claude Code’s power is being a unified command center - one place to orchestrate all your tools. Fragmenting by project means:

  • Switching contexts constantly
  • Losing cross-platform insights
  • No unified workflow automation

Option C: The On-Demand Future

# skills/work-hub/SKILL.md
---
description: Unified work management hub
mcp:
  communication: [slack, discord]
  code: [github, gitlab]
  projects: [jira, notion]
  monitoring: [sentry]
context: fork
---

Now Claude Code becomes what it should be: a true central nervous system for your digital work life. Need to:

  • Check GitHub PR + notify Slack + update Jira? One command.
  • Review GitLab MR + post to Discord + log in Notion? One command.
  • Debug Sentry error + create GitHub issue + assign in Linear? One command.

Each workflow loads only the MCPs it needs. Your main context stays pristine.

Current Workaround (Not Ideal)

Yes, you can use skills today to wrap functionality without MCPs - making API calls directly via bash/curl. But that’s:

  • More fragile (no MCP error handling)
  • More verbose (raw API vs. semantic tools)
  • Missing the point (we have MCPs for a reason)

The critical issue is authentication:

| Aspect | MCP | Skills + Scripts |
| --- | --- | --- |
| Credential Management | Centralized in settings.json | Scattered across .env, scripts, env vars |
| Security | Environment isolation | Risk of exposure in logs/shell history |
| Token Refresh | Handled automatically | Manual implementation required |
| Error Handling | Standardized responses | Different per API |

# MCP approach - Clean & Secure
mcp: [github]
# Credentials in settings.json, isolated, never exposed

# Script approach - Credentials scattered everywhere
# Option 1: .env file (needs management)
# Option 2: Hardcoded in script (dangerous)
# Option 3: Pass every time (tedious, error-prone)

MCP’s value isn’t just the tools—it’s the centralized, secure credential management. Context Isolation preserves this benefit while solving the context consumption problem.
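The centralization point can be sketched in a few lines: with one settings.json-style file, every credential lookup goes through a single accessor instead of being duplicated per script. (The `env` field shape mirrors common MCP client configs; treat the exact schema and the token value as placeholders.)

```python
import json

# Hypothetical settings.json fragment: one file holds every server's credentials.
SETTINGS = json.loads("""
{
  "mcpServers": {
    "github": {
      "command": "github-mcp",
      "env": {"GITHUB_TOKEN": "ghp_example"}
    }
  }
}
""")

def credentials_for(server):
    """Single source of truth: no .env files, hardcoded secrets, or shell history."""
    return SETTINGS["mcpServers"][server].get("env", {})

print(credentials_for("github"))
```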

The on-demand architecture isn’t just an optimization. It’s what unlocks Claude Code’s potential as a universal work orchestrator.

Why This Makes Sense

1. Context Efficiency

Your main conversation keeps its full context budget. MCPs load only when the specific agent or skill that needs them runs.

2. Granular Permissions

Instead of “this session has access to everything,” you get layered control:

Layer 0: Main Context (minimal)
   └── filesystem (read-only), memory

Layer 1: Development Agents
   └── code-reviewer: + git (read)
   └── debugger: + bash (sandboxed)

Layer 2: Specialized Skills
   └── /deploy: + vercel, github (push)
   └── /db-migrate: + postgres (write)

Layer 3: Admin Operations
   └── /production-access: all (with confirmation)
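One simple way to model the layers above is cumulatively: each layer inherits everything below it and adds its own grants. The tool labels here are illustrative, not real Claude Code identifiers:

```python
# Toy model of the layered permission scheme; labels are illustrative.
LAYERS = {
    0: {"filesystem:read", "memory"},                   # main context
    1: {"git:read", "bash:sandboxed"},                  # development agents
    2: {"vercel", "github:push", "postgres:write"},     # specialized skills
    3: {"*"},                                           # admin, behind confirmation
}

def allowed(layer):
    """Permissions available at a given layer: the union of all layers up to it."""
    tools = set()
    for level in range(layer + 1):
        tools |= LAYERS[level]
    return tools

print(sorted(allowed(1)))  # base tools plus the development-agent grants
```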

3. Progressive Security

Rather than “all or nothing” permissions, you get defense in depth. A code review doesn’t need database write access. A deployment doesn’t need Sentry access.

4. Ecosystem Scalability

The MCP ecosystem is exploding. Dozens of new servers every week. The “load everything at start” model simply doesn’t scale.

Implementation Possibilities

The ideal solution combines two-sided configuration for maximum flexibility and backward compatibility:

MCP-Side: Lazy Loading Flag

In settings.json, each MCP can declare whether it should load at session start:

{
  "mcpServers": {
    "memory": {
      "command": "...",
      "lazy": false    // Always load (default, backward compatible)
    },
    "github": {
      "command": "...",
      "lazy": true     // Don't load until requested
    },
    "postgres": {
      "command": "...",
      "lazy": true     // Don't load until requested
    }
  }
}

Backward compatibility: Omitting lazy or setting lazy: false maintains current behavior.

Agent/Skill-Side: Frontmatter Declaration

Agents and skills declare which MCPs they need:

# agents/database-specialist.md
---
name: database-specialist
description: Database operations expert
tools: [Read, Bash, Grep]
mcp:
  required: [postgres]    # Must have
  optional: [redis]       # Nice to have
context: fork
---

Loading Logic

| MCP lazy Setting | Agent/Skill Declaration | Result |
| --- | --- | --- |
| false (or omitted) | - | ✅ Load at session start (current behavior) |
| true | Not declared | ❌ Don't load |
| true | mcp: [xxx] | ✅ Load when agent/skill runs |
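The decision table reduces to a few lines of logic. This is a sketch of the proposed behavior, not an existing Claude Code API; the parameter names are ours:

```python
def should_load(lazy, declared_by_active_unit, at_session_start):
    """Loading decision from the table above (proposed semantics, hypothetical names).

    lazy: the MCP's `lazy` flag in settings.json (False when omitted)
    declared_by_active_unit: True if the running agent/skill declares this MCP
    at_session_start: True when evaluating the initial session load
    """
    if not lazy:
        return at_session_start  # current eager behavior, fully backward compatible
    return declared_by_active_unit and not at_session_start

# The three rows of the table:
print(should_load(lazy=False, declared_by_active_unit=False, at_session_start=True))
print(should_load(lazy=True,  declared_by_active_unit=False, at_session_start=False))
print(should_load(lazy=True,  declared_by_active_unit=True,  at_session_start=False))
```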

This dual approach enables:

  • Gradual migration: Move heavy MCPs to lazy: true one at a time
  • Zero breaking changes: Existing configs work unchanged
  • Fine-grained control: Both infrastructure and application level settings

The Evidence: Claude Code is Heading This Way

Look at the evolution:

| Version | Feature | Trend |
| --- | --- | --- |
| 2.0.65 | Context awareness, status line | Tracking context usage |
| 2.1.0 | context: fork for skills | Isolation architecture |
| 2.1.1 | Agent frontmatter | Configurable agents |
| 2.1.3 | Skills = Commands unified | Simplification |
| 2.2.x? | On-demand MCP? | Logical next step |

The pieces are there. The architecture supports it. The need is clear.

Challenges to Consider

| Challenge | Possible Solution |
| --- | --- |
| MCP startup latency | Warm pool, pre-connect on first mention |
| State after fork ends | Stateless design, session-level cache |
| Tool discovery | Lazy manifest - tools declared but not loaded |
| Credential scope | Environment inheritance with limits |

These are solvable problems. The fork architecture already handles most of them.

The Vision: Claude Code as Universal Work Orchestrator

This proposal isn’t just about saving tokens. It’s about unlocking what Claude Code can truly become.

Today, Claude Code is a powerful coding assistant. With on-demand MCP loading, it transforms into something far more ambitious: a universal orchestrator for your entire digital work life.

The architecture pieces are already in place:

  • context: fork provides isolation
  • Agent/Skill frontmatter provides declaration
  • The MCP ecosystem provides integration

What’s missing is the connection: letting agents and skills declare and load their own MCPs on demand.

This is the natural next step. The question isn’t if this will happen, but when and how.


This proposal is published at claude-world.com as part of our ongoing exploration of Claude Code’s architectural possibilities.