Here's a simple experiment. Ask Claude to evaluate a startup idea, a productivity tool for remote teams, nothing special. You'll get a competent, bloodless assessment about "target market considerations" and "competitive differentiation." Four paragraphs of consulting-speak.
Now load a single file, 1,875 tokens of structured context describing how Paul Graham thinks, communicates, and evaluates ideas, and ask the same question.
The response is unrecognisable. Claude stops talking about "target markets" and starts asking whether the founders have personally experienced the problem. It flags that the idea is hiding behind what PG calls the "unsexy filter": nobody wants to build boring tools, which is exactly why boring tools are fertile territory. It applies the well-vs-broad-shallow test: is there a narrow group that desperately needs this, or a broad group that mildly wants it? It catches the pitch's implicit assumption that growth will come from marketing rather than from the product being genuinely useful, and calls it out as "test-hacking," PG's term for optimising proxies rather than building something people actually want.
Same model. Same question. The only variable is 1,875 tokens of structured context.
Structured context: a designed representation of how a specific professional thinks, communicates, and operates, built from deliberate observation and loaded into an AI tool each session. Unlike platform memory, which captures facts passively, structured context captures reasoning patterns, decision frameworks, and voice. You own it, you control it, and it works across any model.
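To make the definition concrete, here is a minimal sketch of what such a profile file might contain and how it gets loaded per session. The schema and every field name below are illustrative assumptions, not Learned Context's actual format:

```python
import json

# Hypothetical structured-context profile. The point is that reasoning
# patterns, frameworks, and voice are captured explicitly, not inferred
# passively by the platform.
profile = {
    "identity": {"name": "Paul Graham", "role": "essayist, YC co-founder"},
    "reasoning_patterns": [
        "Evaluates ideas by founder-problem fit before market size",
        "Treats tedious, unglamorous work (schlep) as a signal of opportunity",
    ],
    "frameworks": {
        "unsexy_filter": "Boring tools are fertile territory precisely because nobody wants to build them",
        "well_vs_broad_shallow": "Prefer a narrow group that desperately needs it over a broad group that mildly wants it",
    },
    "voice": ["conversational but precise", "no consulting-speak"],
}

def build_system_prompt(profile: dict, task: str) -> str:
    """Prepend the owned, structured profile to the session's task."""
    return f"CONTEXT PROFILE:\n{json.dumps(profile, indent=2)}\n\nTASK:\n{task}"

prompt = build_system_prompt(profile, "Evaluate this startup idea: ...")
```

The profile is just data you own; the per-session step is nothing more than prepending it to the conversation.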
To make the delta concrete: I pitched Learned Context to Claude twice, once plain, once with the PG profile loaded, and asked what Paul Graham would say. Here's what came back.
Vanilla Claude:

I think the idea has potential. Paul Graham would likely evaluate it through several lenses:
Market opportunity: The AI tools market is growing rapidly, and there's increasing demand for personalisation. Context engineering is an emerging discipline, and positioning yourself at the intersection of professional productivity and AI could be valuable.
Key considerations: Is there a clear target market? Professionals is broad, you'll need to narrow down. What's the competitive moat? Platform memory features are improving rapidly. How do you acquire customers?
PG would probably ask: Are you building something people want? Have you talked to potential users? What does the MVP look like?
I'd recommend focusing on a specific vertical first, maybe consultants or executives, and proving the value there before expanding.
With the PG profile loaded:

The first question is whether you've personally experienced this problem, not whether the market is large. Have you? Because the pitch describes a system for professionals, but I want to know if you built it because you genuinely couldn't get useful output from your AI without structured context, or because you saw "context engineering" trending and decided it was a market.
The unsexy filter is working in your favour. Nobody wants to sit down and do a calibration session. That's schlep. But schlep is signal: if the upfront investment produces output that's measurably different, the people who do the work will never go back.
The real question is the well-vs-broad-shallow test. Is there a narrow group who desperately need this, who would be genuinely upset if it disappeared, or a broad group who think it sounds cool but never finish the calibration?
The platforms will add better memory. Your bet is that passive, platform-controlled memory hits a ceiling. Is the structured approach architecturally superior, or just temporarily ahead? Everything depends on that.
In this article:

- A first-person experiment: building a Paul Graham AI profile and testing it against vanilla Claude
- Why platform memory (ChatGPT, Claude) captures facts about you but not how you think
- The research on context degradation: why compact, structured context outperforms larger, unstructured context
- What a demo profile can and can't do, and why building your own matters more
The term arrived before the practice
On June 18, 2025, Tobi Lutke posted on X that he preferred "context engineering" over "prompt engineering" because "it describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM."[1] A week later, Andrej Karpathy endorsed the term: "context engineering is the delicate art and science of filling the context window with just the right information for the next step."[2] In September, Anthropic published an engineering blog post formalising it as a discipline: "the set of strategies for curating and maintaining the optimal set of tokens during LLM inference."[3]
"Building with language models is becoming less about finding the right words and more about answering a broader question: what configuration of context is most likely to generate the model's desired behaviour?"
Anthropic, "Effective Context Engineering for AI Agents," September 2025
By November, MIT Technology Review framed the arc from "vibe coding" to context engineering as the story of 2025 in software development.[4]
But almost all of this discourse is about developer tooling. Agents, RAG pipelines, MCP servers, multi-agent architectures. The context being engineered is code context, tool context, system context. Almost nobody is applying context engineering to a structured representation of how a specific professional thinks, communicates, and operates.
What your AI actually remembers about you
If you use ChatGPT with memory enabled, the system decides what to keep, passively, based on what it thinks matters. You don't control the structure. You don't control what gets stored or discarded.
In 2025, it failed catastrophically. Twice.
On February 5th, a backend update wiped months of accumulated context for users across the platform: personalised preferences, project details, fictional worlds that creative writers had built over months of interaction. Over 300 complaint threads appeared on r/ChatGPTPro alone.[5] No post-mortem from OpenAI. No recovery. On November 6th, it happened again.[6]
Claude's memory system, which rolled out to all plans in March 2026, is more thoughtful. It uses "Memory Synthesis," processing conversations roughly every 24 hours and distilling what it considers long-term-worthy.[7] But it is still passive, still platform-controlled, still optimised for what the platform decides "work-relevant" means rather than what you do.
These are memory features, not memory systems. A feature is something the platform adds to its product. A system is something you build, own, and control. When ChatGPT's memory wiped, users lost everything because they'd outsourced their context to the platform. If your context lives in a structured file you own, the worst that happens is you close the tab.
| Dimension | Platform memory | Structured context |
|---|---|---|
| Ownership | Platform-controlled | User-owned file |
| Structure | Passive, unstructured | Designed schema |
| Portability | Locked to one platform | Works across any LLM |
| Wipe risk | Two platform failures in 2025 | Local file, your backup |
| Day-one effort | Zero (auto-captures) | Calibration session required |
| Depth at month six | Name, role, surface preferences | Reasoning patterns, stakeholder dynamics, voice |
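The portability row is easy to demonstrate: because the context is a plain file you own, the same bytes can be handed to any model as a system prompt. A sketch, where the file path, its contents, and the provider payload shapes are all purely illustrative:

```python
from pathlib import Path

def load_context(path: Path) -> str:
    """Read the user-owned profile; the file, not the platform, is the source of truth."""
    return path.read_text(encoding="utf-8")

def session_payload(provider: str, profile_text: str, user_message: str) -> dict:
    # The same profile text slots into any provider's system/user message shape.
    return {
        "provider": provider,
        "system": profile_text,
        "messages": [{"role": "user", "content": user_message}],
    }

profile_path = Path("pg_profile.md")
profile_path.write_text("## Reasoning patterns\n- founder-problem fit first\n", encoding="utf-8")

text = load_context(profile_path)
claude_call = session_payload("claude", text, "Evaluate this idea.")
gpt_call = session_payload("chatgpt", text, "Evaluate this idea.")
```

If a platform wipes its memory store, the local file is untouched; switching models means changing one string, not rebuilding months of context.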
The physics of context
The argument for structured context isn't only about platform reliability. There is a deeper structural problem, and it shows up in the research.
In 2025, Chroma published a study testing 18 frontier models (GPT-4.1, Claude Opus 4, Gemini 2.5 Pro, and others) on how they handle increasing context length.[8] Every single model degraded as input grew. Performance declined continuously across all tested lengths: there was no safe zone below a model's stated maximum. Models claiming 200K-token context windows degraded well before reaching capacity.
The mechanisms are architectural. The "lost in the middle" problem, documented by Liu et al. at Stanford and the University of Washington, showed that LLMs exhibit a U-shaped attention pattern: tokens at the beginning and end of the context receive disproportionate attention, while information in the middle gets de-emphasised regardless of relevance.[9] This is a property of how Rotary Position Embedding encodes distance between tokens in transformer attention. Newer models have narrowed the effect, but Chroma's results confirm it persists across every frontier model tested.
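One practical response to the U-shaped pattern, my own sketch rather than anything from the Liu et al. paper, is to reorder context so the strongest material sits at the edges of the window, where attention concentrates:

```python
def edge_order(chunks_ranked_best_first: list[str]) -> list[str]:
    """Alternate the highest-ranked chunks between the front and back of the
    context, pushing the weakest material toward the middle, where attention
    is lowest."""
    front, back = [], []
    for i, chunk in enumerate(chunks_ranked_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

ranked = ["best", "second", "third", "fourth", "fifth"]
ordered = edge_order(ranked)
# "best" opens the window, "second" closes it, "fifth" lands mid-window.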
Chroma also isolated a third mechanism: distractor interference. Adding semantically similar but irrelevant content causes degradation beyond what context length alone explains. The noise-to-signal ratio compounds with every additional piece of context.
The fix isn't bigger windows. It's better structure.
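A sketch of what "better structure" can mean in practice: score candidate chunks against the task and pack only the strongest under a token budget, instead of stuffing the window and inviting distractor interference. The word-overlap scorer and the four-characters-per-token estimate below are crude stand-ins for whatever retrieval scoring and tokenizer you actually use:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def relevance(chunk: str, task: str) -> int:
    # Naive word overlap; real systems would use embeddings or a reranker.
    return len(set(chunk.lower().split()) & set(task.lower().split()))

def pack_context(chunks: list[str], task: str, budget_tokens: int) -> list[str]:
    """Keep the most relevant chunks that fit the budget; drop distractors."""
    ranked = sorted(chunks, key=lambda c: relevance(c, task), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = estimate_tokens(chunk)
        if used + cost <= budget_tokens:
            selected.append(chunk)
            used += cost
    return selected

chunks = [
    "founder has personally experienced the remote-team problem",
    "unrelated note about office snack budgets and parking",
    "narrow group of users desperately needs the tool",
]
picked = pack_context(chunks, "does a narrow group desperately need this tool", budget_tokens=30)
```

The budget forces a choice, and the scoring makes the choice principled: the distractor about snack budgets never reaches the window.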
Anthropic demonstrated this directly. Their multi-agent research system (an Opus 4 lead agent delegating to Sonnet 4 subagents with clean, focused context windows) outperformed a single Opus 4 agent by 90.2% on research tasks.[10] The improvement came entirely from managing context, not from a better model.
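The delegation pattern is easy to sketch even without the model calls: the lead agent hands each subagent only its subtask plus a short shared brief, never the full accumulated transcript. Everything here (the brief, the subtasks, the fake `run_subagent`) is illustrative, not Anthropic's implementation:

```python
def run_subagent(context: str) -> str:
    # Stand-in for a model call; here it just reports what the subagent saw.
    return f"findings from {len(context)} chars of context"

def lead_agent(brief: str, subtasks: list[str]) -> list[str]:
    """Delegate with clean, focused windows instead of forwarding everything."""
    results = []
    for task in subtasks:
        focused = f"{brief}\n\nYOUR SUBTASK:\n{task}"  # no transcript included
        results.append(run_subagent(focused))
    return results

transcript = "..." * 5000  # a long accumulated conversation the subagents never see
brief = "Research question: does structured context beat platform memory?"
results = lead_agent(brief, ["survey memory features", "summarise context-rot research"])
```

Each subagent's window stays a few hundred characters while the lead agent's transcript grows unbounded; the focus, not the model, is what changes.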
What I built and what it proved
The Paul Graham demo profile started as an experiment. Could I build a profile of a well-known thinker that let people experience context engineering without building their own?
I chose PG because his thinking patterns are extensively documented: over 200 essays spanning two decades, plus interviews, talks, and the autobiographical "What I Worked On." I drew on roughly 30 of these for the profile, selecting essays that most clearly revealed his reasoning patterns and decision frameworks. If you can't build a strong context profile from that corpus, the approach doesn't work.
The process followed Learned Context's v0.3 calibration engine, a seven-stage process that extracts reasoning patterns, communication style, domain expertise, operational habits, and stakeholder dynamics from representative source material. Every observation had to pass three tests: distinctiveness (would this be true of most senior professionals?), contrastive depth (what does the person do and not do?), and actionability (if removed, would the AI produce different output?).
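Those three tests translate naturally into a filter over candidate observations. The predicates below are toy stand-ins of my own, not the calibration engine's actual checks; a real pass involves judgment, not booleans:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    text: str
    distinctive: bool   # False if it would be true of most senior professionals
    has_contrast: bool  # captures what the person does AND does not do
    actionable: bool    # removing it would change the AI's output

def passes_calibration(obs: Observation) -> bool:
    # An observation earns its tokens only if it clears all three tests.
    return obs.distinctive and obs.has_contrast and obs.actionable

candidates = [
    Observation("values clear communication",
                distinctive=False, has_contrast=False, actionable=False),
    Observation("asks about founder-problem fit before market size; never opens with TAM",
                distinctive=True, has_contrast=True, actionable=True),
]
kept = [o.text for o in candidates if passes_calibration(o)]
```

Generic platitudes fail the distinctiveness test and get dropped; only observations that would actually change the AI's output survive into the profile.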
What emerged was a two-file system: a core profile at 1,875 tokens and a context library at 3,627 tokens. The core profile loads every session, enough for the AI to reason and communicate in PG's patterns. The context library loads alongside when deeper framework application or voice fidelity is needed. Together they occupy just over 5,500 tokens and produce recognisably different AI behaviour within seconds.
The split mirrors how professional context actually works. The profile is what a calibration session produces on day one. The context library represents what compounds over months of use: frameworks confirmed, voice patterns refined, stakeholder dynamics observed across dozens of interactions. For this demo I pre-built it from essays. For a real user it accumulates automatically.
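The two-file split implies a simple loading rule: the core profile ships with every session, and the library joins it only when the task needs deeper framework work or voice fidelity. The token figures are the article's; the loader itself is a hypothetical sketch:

```python
CORE_TOKENS = 1875     # core profile: loads every session
LIBRARY_TOKENS = 3627  # context library: loads only when depth is needed

def session_budget(needs_depth: bool) -> int:
    """Return how many profile tokens this session carries."""
    return CORE_TOKENS + (LIBRARY_TOKENS if needs_depth else 0)

quick_question = session_budget(needs_depth=False)
framework_review = session_budget(needs_depth=True)
```

Quick sessions pay under 2K tokens; a full framework review pays just over 5,500, still a tiny fraction of a 200K window, and structured rather than noise.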
What a demo profile can and can't do
The PG profile captures frameworks, decision patterns, and communication style extracted from published material. It can make an AI reason through problems the way PG reasons, write in his conversational-but-precise register, and apply his named frameworks (the schlep filter, well-vs-broad-shallow, bus ticket theory) to new situations.
It cannot be Paul Graham. It doesn't have his lived experience: the years painting in Florence, the specific founders he's backed, the private doubts that never made it into essays. It doesn't have the tacit knowledge that comes from founding YC and watching three thousand companies succeed or fail. What it produces is Paul Graham lite: a thinking partner that reasons in his patterns, not a replacement for his judgement.
Individual memory, the full texture of lived professional experience, is not something that can be captured from the outside. That is precisely why building your own profile matters more than test-driving someone else's.
The demo shows what context engineering can do with public material alone. Your own profile, built from your actual decisions, conversations, and stakeholders, captures things no external observer could reconstruct. You are the only source of your own signal.
The compounding problem
Context engineering for professionals faces a fair objection: platform memory is passive. It just works (when it doesn't wipe). A structured system requires upfront investment.
I'll concede it: platform memory is easier on day one. If all you need is an AI that remembers your name and job title, it's fine.
But context compounds.
A professional who has been using a structured context system for six months has a profile that captures their reasoning architecture, not "prefers data-driven decisions," but "separates decisions by reversibility, not importance, and moves at 60% confidence on reversible choices." It captures their communication patterns across audiences, not "professional tone," but "formality stays constant, directness shifts: no preamble for direct reports, full build-from-premise for the board." It captures stakeholder dynamics that no platform would ever store: who they defer to on which domains, whose commitments need follow-up, where trust has been established or broken.
That accumulated context changes what the AI can do. Not incrementally. Structurally.
What to watch
Capital is flowing into context infrastructure. Mem0 raised $24 million in October 2025 to build "the memory layer for AI apps," processing 186 million API calls per quarter by Q3.[11] Meta acquired Limitless AI, the pendant that recorded your conversations, in December 2025.[12]
But there is a gap. Mem0 is developer infrastructure: an API, not a user-facing system. Limitless was passive capture: recording everything and hoping the AI figures out what matters. The second-brain tools (Obsidian, Notion, Capacities) store documents, not professional identity.
If structured context can produce recognisably different AI behaviour from public essays alone, material reconstructed from the outside, the question is what it produces when the source material is you.
Same model. Same question. Different context. Different everything.
Notes
1. Tobi Lutke, post on X, June 18, 2025.
2. Andrej Karpathy, post on X, June 25, 2025.
3. Anthropic, "Effective Context Engineering for AI Agents," September 29, 2025.
4. MIT Technology Review, "From Vibe Coding to Context Engineering: 2025 in Software Development," November 5, 2025.
5. WebProNews, "ChatGPT's Fading Recall: Inside the 2025 Memory Wipe Crisis," 2025.
6. Piunikaweb, "Some ChatGPT Users Report Memories Wiped Clean," November 5, 2025.
7. Claude Help Center, Release Notes, March 2026.
8. Chroma Research, "Context Rot: How Increasing Input Tokens Impacts LLM Performance," 2025.
9. Nelson F. Liu et al., "Lost in the Middle: How Language Models Use Long Contexts," Stanford and University of Washington, 2023.
10. Anthropic, "How We Built Our Multi-Agent Research System," June 14, 2025.
11. TechCrunch, "Mem0 Raises $24M from YC, Peak XV and Basis Set," October 28, 2025.
12. TechCrunch, "Meta Acquires AI Device Startup Limitless," December 5, 2025.
