# The Future of MCP: From Always-On to On-Demand Loading

*An architectural proposal for Claude Code's next evolution: assigning MCP servers to specific agents and skills instead of loading them all at session start.*
## The Problem: Context Budget Under Pressure
If you’re using Claude Code with multiple MCP servers, you’ve probably noticed something: your context window starts filling up before you even type your first message.
Here’s a typical scenario:
| MCP Server | Approximate Token Cost | Source |
|---|---|---|
| GitHub (27 tools) | ~18,000 | Issue #11364 |
| AWS MCP servers | ~18,300 | Issue #7172 |
| Cloudflare | ~15,000+ | Community reports |
| Sentry | ~14,000 | Community reports |
| Playwright (21 tools) | ~13,647 | Scott Spence |
| Supabase | ~12,000+ | Community reports |
| Average per tool | ~550-850 | Issue #11364 |
**Real-world impact:** With 7 MCP servers active, tool definitions alone can consume ~67,300 tokens (33.7% of a 200k context window). Even a minimal 3-server setup consumes ~42,600 tokens (21.3%).
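The budget arithmetic above is easy to sanity-check. A quick sketch (token counts are the approximations cited in the table; 200k is Claude's standard context window):

```python
# Fraction of the context window eaten by MCP tool definitions alone.
CONTEXT_WINDOW = 200_000

def budget_share(tokens: int) -> float:
    """Fraction of the context window consumed before any conversation happens."""
    return tokens / CONTEXT_WINDOW

seven_servers = 67_300   # cited total for a 7-server setup
three_servers = 42_600   # cited total for a minimal 3-server setup

print(f"{budget_share(seven_servers):.1%}")   # roughly a third of the window
print(f"{budget_share(three_servers):.1%}")   # over a fifth of the window
```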
The irony? Many of these MCPs won’t be used in a given session. That Sentry integration sitting at 14,000 tokens? You might not need it today. But it’s there, eating context, every single time.
## The Observation: Fork Architecture Is Already Here

Claude Code 2.1.x introduced a powerful feature: `context: fork` for skills. This allows skills to run in isolated contexts, each with its own tool permissions and state.
```yaml
# A skill with forked context
---
description: Deploy to production
context: fork
allowed-tools: [Bash, Read]
---
```
When this skill runs, it gets its own context bubble. Changes don’t pollute the main conversation. It’s clean, it’s isolated, it works.
Here’s the insight: If we can fork contexts, why can’t we fork MCP access too?
## The Proposal: MCP Context Isolation

**Important distinction:** This proposal is fundamentally different from typical "lazy loading" approaches. We're not suggesting dynamic runtime loading into the main context. Instead, we propose isolating MCPs into forked agent/skill contexts, keeping the main context permanently clean.
| Approach | Main Context | Loading Time | Complexity |
|---|---|---|---|
| Traditional Lazy Loading | Gets populated when MCP is needed | Runtime dynamic | High (state management) |
| Our Proposal: Context Isolation | Always stays clean | At fork creation | Low (uses existing context: fork) |
Traditional lazy loading:

```text
Main Context ──[need MCP]──> Load MCP ──> Main Context (now occupied)
```

Our proposal (context isolation):

```text
Main Context (stays clean)
└── Fork Agent Context ──> Load MCPs ──> Isolated Context
    └── Released when done
```
Imagine this architecture:
```yaml
# agents/database-specialist.md
---
name: database-specialist
description: Database operations expert
tools: [Read, Bash, Grep]
mcp: [postgres, redis]   # ← only loads for this agent
context: fork
---
```

```yaml
# skills/deploy/SKILL.md
---
description: Deploy to production
mcp: [vercel, github]    # ← only loads during /deploy
context: fork
---
```
The main session stays lean:
```text
Main Session (lean)
│
├── Base MCPs only: filesystem, memory
│   (~1,300 tokens instead of 20,000+)
│
├── Task: database-specialist (forked)
│   └── Loads: postgres, redis (only here)
│
└── Skill: /deploy (forked)
    └── Loads: vercel, github (only here)
```
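The fork model can be expressed as a toy simulation. In the sketch below, only the filesystem+memory total (~1,300 tokens) matches the diagram; the per-MCP costs for `postgres` and `redis` are made-up illustrative numbers:

```python
from dataclasses import dataclass, field

# Illustrative token costs; only the filesystem+memory total matches the diagram.
MCP_COSTS = {"filesystem": 800, "memory": 500, "postgres": 6_000, "redis": 4_000}

@dataclass
class Context:
    mcps: list[str] = field(default_factory=list)

    @property
    def tokens(self) -> int:
        """Tokens consumed by this context's MCP tool definitions."""
        return sum(MCP_COSTS[m] for m in self.mcps)

def run_forked(base: list[str], declared: list[str]) -> int:
    """A fork sees base + declared MCPs; its cost vanishes when the fork ends."""
    return Context(base + declared).tokens

main = Context(["filesystem", "memory"])
fork_cost = run_forked(main.mcps, ["postgres", "redis"])
assert main.tokens == 1_300    # the main context never grows
assert fork_cost == 11_300     # heavy MCPs exist only inside the fork
```

The key property is that `main.tokens` is the same before, during, and after the fork: isolation, not deferral.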
## Real-World Example: The Modern Work Hub Problem
Here’s a scenario many of us face daily:
### The Multi-Platform Reality
Modern knowledge workers juggle an overwhelming number of platforms:
| Category | Platforms |
|---|---|
| Code Hosting | GitHub, GitLab, Bitbucket |
| Project Management | Jira, Linear, Asana, Notion |
| Communication | Slack, Discord, Teams, Email |
| CI/CD | Vercel, Netlify, AWS |
| Monitoring | Sentry, Datadog, PagerDuty |
| CRM/Business | Salesforce, HubSpot, Stripe |
Each of these has an MCP. Now imagine the dilemma:
### Option A: Install Everything

```text
All MCPs loaded at session start:
- github   (~2,000 tokens)
- gitlab   (~2,500 tokens)
- slack    (~3,000 tokens)
- jira     (~4,000 tokens)
- notion   (~2,500 tokens)
- sentry   (~14,000 tokens)
- ...and 10 more
```

Total: 50,000+ tokens before you even start working. That's a quarter of a 200k context budget. Gone.
### Option B: Separate by Project

```text
Project A: github + vercel
Project B: gitlab + jira
Project C: slack + notion
```
This defeats the entire purpose. Claude Code’s power is being a unified command center - one place to orchestrate all your tools. Fragmenting by project means:
- Switching contexts constantly
- Losing cross-platform insights
- No unified workflow automation
### Option C: The On-Demand Future

```yaml
# skills/work-hub/SKILL.md
---
description: Unified work management hub
mcp:
  communication: [slack, discord]
  code: [github, gitlab]
  projects: [jira, notion]
  monitoring: [sentry]
context: fork
---
```
Now Claude Code becomes what it should be: a true central nervous system for your digital work life. Need to:
- Check GitHub PR + notify Slack + update Jira? One command.
- Review GitLab MR + post to Discord + log in Notion? One command.
- Debug Sentry error + create GitHub issue + assign in Linear? One command.
Each workflow loads only the MCPs it needs. Your main context stays pristine.
### Current Workaround (Not Ideal)
Yes, you can use skills today to wrap functionality without MCPs - making API calls directly via bash/curl. But that’s:
- More fragile (no MCP error handling)
- More verbose (raw API vs. semantic tools)
- Missing the point (we have MCPs for a reason)
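To make that fragility concrete, here is a minimal sketch of the script-based workaround. The GitHub REST endpoint is real, but `list_open_prs` is a hypothetical helper; note how credential handling and error handling both fall entirely on you:

```python
# Hypothetical script-based workaround: call the GitHub REST API directly
# instead of going through an MCP server.
import json
import os
import urllib.request

def list_open_prs(repo: str) -> list[str]:
    # Credential handling is entirely on you: env var, .env file, or worse.
    token = os.environ["GITHUB_TOKEN"]
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/pulls?state=open",
        headers={"Authorization": f"Bearer {token}"},
    )
    # Errors arrive as raw HTTP exceptions, not standardized MCP responses.
    with urllib.request.urlopen(req) as resp:
        return [pr["title"] for pr in json.load(resp)]
```

Every additional platform means another script like this, another token to stash, and another error dialect to handle.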
The critical issue is authentication:
| Aspect | MCP | Skills + Scripts |
|---|---|---|
| Credential Management | Centralized in settings.json | Scattered across .env, scripts, env vars |
| Security | Environment isolation | Risk of exposure in logs/shell history |
| Token Refresh | Handled automatically | Manual implementation required |
| Error Handling | Standardized responses | Different per API |
```yaml
# MCP approach - clean & secure
mcp: [github]
# Credentials live in settings.json: isolated, never exposed
```

```text
# Script approach - credentials scattered everywhere
# Option 1: .env file (needs management)
# Option 2: hardcoded in the script (dangerous)
# Option 3: passed in every time (tedious, error-prone)
```
MCP’s value isn’t just the tools—it’s the centralized, secure credential management. Context Isolation preserves this benefit while solving the context consumption problem.
The on-demand architecture isn’t just an optimization. It’s what unlocks Claude Code’s potential as a universal work orchestrator.
## Why This Makes Sense

### 1. Context Efficiency
Your main conversation keeps its full context budget. MCPs load only when the specific agent or skill that needs them runs.
### 2. Granular Permissions
Instead of “this session has access to everything,” you get layered control:
```text
Layer 0: Main context (minimal)
└── filesystem (read-only), memory

Layer 1: Development agents
├── code-reviewer: + git (read)
└── debugger:      + bash (sandboxed)

Layer 2: Specialized skills
├── /deploy:     + vercel, github (push)
└── /db-migrate: + postgres (write)

Layer 3: Admin operations
└── /production-access: all (with confirmation)
```
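One way such layering might compose (all tool names and layer grants below are hypothetical): each fork's tool set is the union of the base layer and the layers it explicitly opts into, so no fork inherits more than it declared.

```python
# Hypothetical layered permission model: a fork's tools are the union of the
# base layer and any layers it explicitly opts into.
LAYERS: dict[str, set[str]] = {
    "main":        {"filesystem:read", "memory"},
    "code-review": {"git:read"},
    "deploy":      {"vercel", "github:push"},
    "db-migrate":  {"postgres:write"},
}

def effective_tools(*names: str) -> set[str]:
    """Tools visible inside a fork that opted into the given layers."""
    return set().union(*(LAYERS[n] for n in names))

# A deploy fork never sees database write access, and vice versa.
deploy = effective_tools("main", "deploy")
assert "github:push" in deploy and "postgres:write" not in deploy
```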
### 3. Progressive Security
Rather than “all or nothing” permissions, you get defense in depth. A code review doesn’t need database write access. A deployment doesn’t need Sentry access.
### 4. Ecosystem Scalability

The MCP ecosystem is exploding, with dozens of new servers appearing every week. The "load everything at start" model simply doesn't scale.
## Implementation Possibilities

The ideal solution combines configuration on both sides, the MCP and the agent/skill, for maximum flexibility and backward compatibility:

### MCP-Side: Lazy Loading Flag

In `settings.json`, each MCP server can declare whether it should load at session start:
```jsonc
{
  "mcpServers": {
    "memory": {
      "command": "...",
      "lazy": false   // always load (default, backward compatible)
    },
    "github": {
      "command": "...",
      "lazy": true    // don't load until requested
    },
    "postgres": {
      "command": "...",
      "lazy": true    // don't load until requested
    }
  }
}
```
**Backward compatibility:** Omitting `lazy` or setting `lazy: false` maintains current behavior.
### Agent/Skill-Side: Frontmatter Declaration

Agents and skills declare which MCPs they need:

```yaml
# agents/database-specialist.md
---
name: database-specialist
description: Database operations expert
tools: [Read, Bash, Grep]
mcp:
  required: [postgres]   # must have
  optional: [redis]      # nice to have
context: fork
---
```
### Loading Logic

| MCP `lazy` Setting | Agent/Skill Declaration | Result |
|---|---|---|
| `false` (or omitted) | - | ✅ Load at session start (current behavior) |
| `true` | Not declared | ❌ Don't load |
| `true` | `mcp: [xxx]` | ✅ Load when the agent/skill runs |
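In pseudocode terms, the decision table reduces to two predicates. This is a minimal sketch: the `lazy` field follows the proposed `settings.json` flag, and everything else is hypothetical.

```python
def should_load_at_start(server: dict) -> bool:
    """An MCP loads at session start unless it opts into lazy loading."""
    return not server.get("lazy", False)

def should_load_for_fork(server: dict, declared: set[str], name: str) -> bool:
    """A lazy MCP loads inside a fork only if the agent/skill declares it."""
    return server.get("lazy", False) and name in declared

# Mirrors the table: lazy omitted → eager; lazy + undeclared → never;
# lazy + declared → loads when the agent/skill runs.
assert should_load_at_start({}) is True
assert should_load_at_start({"lazy": True}) is False
assert should_load_for_fork({"lazy": True}, {"postgres"}, "postgres") is True
assert should_load_for_fork({"lazy": True}, set(), "postgres") is False
```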
This dual approach enables:
- Gradual migration: move heavy MCPs to `lazy: true` one at a time
- Zero breaking changes: existing configs work unchanged
- Fine-grained control: settings at both the infrastructure and the application level
## The Evidence: Claude Code Is Heading This Way
Look at the evolution:
| Version | Feature | Trend |
|---|---|---|
| 2.0.65 | Context awareness, status line | Tracking context usage |
| 2.1.0 | context: fork for skills | Isolation architecture |
| 2.1.1 | Agent frontmatter | Configurable agents |
| 2.1.3 | Skills = Commands unified | Simplification |
| 2.2.x? | On-demand MCP? | Logical next step |
The pieces are there. The architecture supports it. The need is clear.
## Challenges to Consider
| Challenge | Possible Solution |
|---|---|
| MCP startup latency | Warm pool, pre-connect on first mention |
| State after fork ends | Stateless design, session-level cache |
| Tool discovery | Lazy manifest - tools declared but not loaded |
| Credential scope | Environment inheritance with limits |
These are solvable problems. The fork architecture already handles most of them.
## The Vision: Claude Code as Universal Work Orchestrator
This proposal isn’t just about saving tokens. It’s about unlocking what Claude Code can truly become.
Today, Claude Code is a powerful coding assistant. With on-demand MCP loading, it transforms into something far more ambitious: a universal orchestrator for your entire digital work life.
The architecture pieces are already in place:
- `context: fork` provides isolation
- Agent/Skill frontmatter provides declaration
- The MCP ecosystem provides integration
What’s missing is the connection: letting agents and skills declare and load their own MCPs on demand.
This is the natural next step. The question isn’t if this will happen, but when and how.
*This proposal is published at claude-world.com as part of our ongoing exploration of Claude Code's architectural possibilities.*