# The Future of MCP: From Always-On to On-Demand Loading

*An architectural proposal for Claude Code's next evolution: assigning MCP servers to specific agents and skills instead of loading them all at session start.*
## The Problem: Context Budget Under Pressure
If you’re using Claude Code with multiple MCP servers, you’ve probably noticed something: your context window starts filling up before you even type your first message.
Here’s a typical scenario:
| MCP Server | Approximate Token Cost | Source |
|---|---|---|
| GitHub (27 tools) | ~18,000 | Issue #11364 |
| AWS MCP servers | ~18,300 | Issue #7172 |
| Cloudflare | ~15,000+ | Community reports |
| Sentry | ~14,000 | Community reports |
| Playwright (21 tools) | ~13,647 | Scott Spence |
| Supabase | ~12,000+ | Community reports |
| Average per tool | ~550-850 | Issue #11364 |
**Real-world impact:** With 7 MCP servers active, tool definitions alone can consume ~67,300 tokens (33.7% of a 200k context window). Even a minimal 3-server setup consumes ~42,600 tokens (21.3%).
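The budget arithmetic above is easy to sanity-check. A quick sketch (token counts are the approximations cited in the table; 200k is Claude's standard context window):

```python
# Fraction of the context window eaten by MCP tool definitions alone.
CONTEXT_WINDOW = 200_000

def budget_share(tokens: int) -> float:
    """Fraction of the context window consumed before any conversation happens."""
    return tokens / CONTEXT_WINDOW

seven_servers = 67_300   # cited total for a 7-server setup
three_servers = 42_600   # cited total for a minimal 3-server setup

print(f"{budget_share(seven_servers):.1%}")   # roughly a third of the window
print(f"{budget_share(three_servers):.1%}")   # over a fifth of the window
```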
The irony? Many of these MCPs won’t be used in a given session. That Sentry integration sitting at 14,000 tokens? You might not need it today. But it’s there, eating context, every single time.
## The Observation: Fork Architecture Is Already Here

Claude Code 2.1.x introduced a powerful feature: `context: fork` for skills. This allows skills to run in isolated contexts, each with its own tool permissions and state.
```yaml
# A skill with forked context
---
description: Deploy to production
context: fork
allowed-tools: [Bash, Read]
---
```
When this skill runs, it gets its own context bubble. Changes don’t pollute the main conversation. It’s clean, it’s isolated, it works.
Here’s the insight: If we can fork contexts, why can’t we fork MCP access too?
## The Proposal: MCP Context Isolation

**Important distinction:** This proposal is fundamentally different from typical "lazy loading" approaches. We're not suggesting dynamic runtime loading into the main context. Instead, we propose isolating MCPs into forked agent/skill contexts, keeping the main context permanently clean.
| Approach | Main Context | Loading Time | Complexity |
|---|---|---|---|
| Traditional Lazy Loading | Gets populated when MCP is needed | Runtime dynamic | High (state management) |
| Our Proposal: Context Isolation | Always stays clean | At fork creation | Low (uses existing context: fork) |
Traditional lazy loading:

```text
Main Context ──[need MCP]──> Load MCP ──> Main Context (now occupied)
```

Our proposal (context isolation):

```text
Main Context (stays clean)
└── Fork Agent Context ──> Load MCPs ──> Isolated Context
    └── Released when done
```
Imagine this architecture:
```yaml
# agents/database-specialist.md
---
name: database-specialist
description: Database operations expert
tools: [Read, Bash, Grep]
mcp: [postgres, redis]   # ← only loads for this agent
context: fork
---
```

```yaml
# skills/deploy/SKILL.md
---
description: Deploy to production
mcp: [vercel, github]    # ← only loads during /deploy
context: fork
---
```
The main session stays lean:
```text
Main Session (lean)
│
├── Base MCPs only: filesystem, memory
│   (~1,300 tokens instead of 20,000+)
│
├── Task: database-specialist (forked)
│   └── Loads: postgres, redis (only here)
│
└── Skill: /deploy (forked)
    └── Loads: vercel, github (only here)
```
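The fork model can be expressed as a toy simulation. In the sketch below, only the filesystem+memory total (~1,300 tokens) matches the diagram; the per-MCP costs for `postgres` and `redis` are made-up illustrative numbers:

```python
from dataclasses import dataclass, field

# Illustrative token costs; only the filesystem+memory total matches the diagram.
MCP_COSTS = {"filesystem": 800, "memory": 500, "postgres": 6_000, "redis": 4_000}

@dataclass
class Context:
    mcps: list[str] = field(default_factory=list)

    @property
    def tokens(self) -> int:
        """Tokens consumed by this context's MCP tool definitions."""
        return sum(MCP_COSTS[m] for m in self.mcps)

def run_forked(base: list[str], declared: list[str]) -> int:
    """A fork sees base + declared MCPs; its cost vanishes when the fork ends."""
    return Context(base + declared).tokens

main = Context(["filesystem", "memory"])
fork_cost = run_forked(main.mcps, ["postgres", "redis"])
assert main.tokens == 1_300    # the main context never grows
assert fork_cost == 11_300     # heavy MCPs exist only inside the fork
```

The key property is that `main.tokens` is the same before, during, and after the fork: isolation, not deferral.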
## Real-World Example: The Modern Work Hub Problem
Here’s a scenario many of us face daily:
### The Multi-Platform Reality
Modern knowledge workers juggle an overwhelming number of platforms:
| Category | Platforms |
|---|---|
| Code Hosting | GitHub, GitLab, Bitbucket |
| Project Management | Jira, Linear, Asana, Notion |
| Communication | Slack, Discord, Teams, Email |
| CI/CD | Vercel, Netlify, AWS |
| Monitoring | Sentry, Datadog, PagerDuty |
| CRM/Business | Salesforce, HubSpot, Stripe |
Each of these has an MCP. Now imagine the dilemma:
### Option A: Install Everything

```text
All MCPs loaded at session start:
- github   (~2,000 tokens)
- gitlab   (~2,500 tokens)
- slack    (~3,000 tokens)
- jira     (~4,000 tokens)
- notion   (~2,500 tokens)
- sentry   (~14,000 tokens)
- ...and 10 more
```

Total: 50,000+ tokens before you even start working. That's a quarter of a 200k context budget. Gone.
### Option B: Separate by Project

```text
Project A: github + vercel
Project B: gitlab + jira
Project C: slack + notion
```
This defeats the entire purpose. Claude Code’s power is being a unified command center - one place to orchestrate all your tools. Fragmenting by project means:
- Switching contexts constantly
- Losing cross-platform insights
- No unified workflow automation
### Option C: The On-Demand Future

```yaml
# skills/work-hub/SKILL.md
---
description: Unified work management hub
mcp:
  communication: [slack, discord]
  code: [github, gitlab]
  projects: [jira, notion]
  monitoring: [sentry]
context: fork
---
```
Now Claude Code becomes what it should be: a true central nervous system for your digital work life. Need to:
- Check GitHub PR + notify Slack + update Jira? One command.
- Review GitLab MR + post to Discord + log in Notion? One command.
- Debug Sentry error + create GitHub issue + assign in Linear? One command.
Each workflow loads only the MCPs it needs. Your main context stays pristine.
### Current Workaround (Not Ideal)
Yes, you can use skills today to wrap functionality without MCPs - making API calls directly via bash/curl. But that’s:
- More fragile (no MCP error handling)
- More verbose (raw API vs. semantic tools)
- Missing the point (we have MCPs for a reason)
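To make that fragility concrete, here is a minimal sketch of the script-based workaround. The GitHub REST endpoint is real, but `list_open_prs` is a hypothetical helper; note how credential handling and error handling both fall entirely on you:

```python
# Hypothetical script-based workaround: call the GitHub REST API directly
# instead of going through an MCP server.
import json
import os
import urllib.request

def list_open_prs(repo: str) -> list[str]:
    # Credential handling is entirely on you: env var, .env file, or worse.
    token = os.environ["GITHUB_TOKEN"]
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/pulls?state=open",
        headers={"Authorization": f"Bearer {token}"},
    )
    # Errors arrive as raw HTTP exceptions, not standardized MCP responses.
    with urllib.request.urlopen(req) as resp:
        return [pr["title"] for pr in json.load(resp)]
```

Every additional platform means another script like this, another token to stash, and another error dialect to handle.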
The critical issue is authentication:
| Aspect | MCP | Skills + Scripts |
|---|---|---|
| Credential Management | Centralized in settings.json | Scattered across .env, scripts, env vars |
| Security | Environment isolation | Risk of exposure in logs/shell history |
| Token Refresh | Handled automatically | Manual implementation required |
| Error Handling | Standardized responses | Different per API |
```yaml
# MCP approach - clean & secure
mcp: [github]
# Credentials live in settings.json: isolated, never exposed
```

```text
# Script approach - credentials scattered everywhere
# Option 1: .env file (needs management)
# Option 2: hardcoded in the script (dangerous)
# Option 3: passed in every time (tedious, error-prone)
```
MCP’s value isn’t just the tools—it’s the centralized, secure credential management. Context Isolation preserves this benefit while solving the context consumption problem.
The on-demand architecture isn’t just an optimization. It’s what unlocks Claude Code’s potential as a universal work orchestrator.
## Why This Makes Sense

### 1. Context Efficiency
Your main conversation keeps its full context budget. MCPs load only when the specific agent or skill that needs them runs.
### 2. Granular Permissions
Instead of “this session has access to everything,” you get layered control:
```text
Layer 0: Main context (minimal)
└── filesystem (read-only), memory

Layer 1: Development agents
├── code-reviewer: + git (read)
└── debugger:      + bash (sandboxed)

Layer 2: Specialized skills
├── /deploy:     + vercel, github (push)
└── /db-migrate: + postgres (write)

Layer 3: Admin operations
└── /production-access: all (with confirmation)
```
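One way such layering might compose (all tool names and layer grants below are hypothetical): each fork's tool set is the union of the base layer and the layers it explicitly opts into, so no fork inherits more than it declared.

```python
# Hypothetical layered permission model: a fork's tools are the union of the
# base layer and any layers it explicitly opts into.
LAYERS: dict[str, set[str]] = {
    "main":        {"filesystem:read", "memory"},
    "code-review": {"git:read"},
    "deploy":      {"vercel", "github:push"},
    "db-migrate":  {"postgres:write"},
}

def effective_tools(*names: str) -> set[str]:
    """Tools visible inside a fork that opted into the given layers."""
    return set().union(*(LAYERS[n] for n in names))

# A deploy fork never sees database write access, and vice versa.
deploy = effective_tools("main", "deploy")
assert "github:push" in deploy and "postgres:write" not in deploy
```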
### 3. Progressive Security
Rather than “all or nothing” permissions, you get defense in depth. A code review doesn’t need database write access. A deployment doesn’t need Sentry access.
### 4. Ecosystem Scalability

The MCP ecosystem is exploding, with dozens of new servers appearing every week. The "load everything at start" model simply doesn't scale.
## Implementation Possibilities

The ideal solution combines configuration on both sides, the MCP and the agent/skill, for maximum flexibility and backward compatibility:

### MCP-Side: Lazy Loading Flag

In `settings.json`, each MCP server can declare whether it should load at session start:
```jsonc
{
  "mcpServers": {
    "memory": {
      "command": "...",
      "lazy": false   // always load (default, backward compatible)
    },
    "github": {
      "command": "...",
      "lazy": true    // don't load until requested
    },
    "postgres": {
      "command": "...",
      "lazy": true    // don't load until requested
    }
  }
}
```
**Backward compatibility:** Omitting `lazy` or setting `lazy: false` maintains current behavior.
### Agent/Skill-Side: Frontmatter Declaration

Agents and skills declare which MCPs they need:

```yaml
# agents/database-specialist.md
---
name: database-specialist
description: Database operations expert
tools: [Read, Bash, Grep]
mcp:
  required: [postgres]   # must have
  optional: [redis]      # nice to have
context: fork
---
```
### Loading Logic

| MCP `lazy` Setting | Agent/Skill Declaration | Result |
|---|---|---|
| `false` (or omitted) | - | ✅ Load at session start (current behavior) |
| `true` | Not declared | ❌ Don't load |
| `true` | `mcp: [xxx]` | ✅ Load when the agent/skill runs |
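In pseudocode terms, the decision table reduces to two predicates. This is a minimal sketch: the `lazy` field follows the proposed `settings.json` flag, and everything else is hypothetical.

```python
def should_load_at_start(server: dict) -> bool:
    """An MCP loads at session start unless it opts into lazy loading."""
    return not server.get("lazy", False)

def should_load_for_fork(server: dict, declared: set[str], name: str) -> bool:
    """A lazy MCP loads inside a fork only if the agent/skill declares it."""
    return server.get("lazy", False) and name in declared

# Mirrors the table: lazy omitted → eager; lazy + undeclared → never;
# lazy + declared → loads when the agent/skill runs.
assert should_load_at_start({}) is True
assert should_load_at_start({"lazy": True}) is False
assert should_load_for_fork({"lazy": True}, {"postgres"}, "postgres") is True
assert should_load_for_fork({"lazy": True}, set(), "postgres") is False
```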
This dual approach enables:
- Gradual migration: move heavy MCPs to `lazy: true` one at a time
- Zero breaking changes: existing configs work unchanged
- Fine-grained control: settings at both the infrastructure and the application level
## The Evidence: Claude Code Is Heading This Way
Look at the evolution:
| Version | Feature | Trend |
|---|---|---|
| 2.0.65 | Context awareness, status line | Tracking context usage |
| 2.1.0 | context: fork for skills | Isolation architecture |
| 2.1.1 | Agent frontmatter | Configurable agents |
| 2.1.3 | Skills = Commands unified | Simplification |
| 2.2.x? | On-demand MCP? | Logical next step |
The pieces are there. The architecture supports it. The need is clear.
## Challenges to Consider
| Challenge | Possible Solution |
|---|---|
| MCP startup latency | Warm pool, pre-connect on first mention |
| State after fork ends | Stateless design, session-level cache |
| Tool discovery | Lazy manifest - tools declared but not loaded |
| Credential scope | Environment inheritance with limits |
These are solvable problems. The fork architecture already handles most of them.
## The Vision: Claude Code as Universal Work Orchestrator
This proposal isn’t just about saving tokens. It’s about unlocking what Claude Code can truly become.
Today, Claude Code is a powerful coding assistant. With on-demand MCP loading, it transforms into something far more ambitious: a universal orchestrator for your entire digital work life.
The architecture pieces are already in place:
- `context: fork` provides isolation
- Agent/Skill frontmatter provides declaration
- The MCP ecosystem provides integration
What’s missing is the connection: letting agents and skills declare and load their own MCPs on demand.
This is the natural next step. The question isn’t if this will happen, but when and how.
*This proposal is published at claude-world.com as part of our ongoing exploration of Claude Code's architectural possibilities.*