Here's a simple experiment. Ask Claude to evaluate a startup idea, a productivity tool for remote teams, nothing special. You'll get a competent, bloodless assessment about "target market considerations" and "competitive differentiation." Four paragraphs of consulting-speak.
Now load a single file, 1,875 tokens of structured context describing how Paul Graham thinks, communicates, and evaluates ideas, and ask the same question.
The response is unrecognisable. Claude stops talking about "target markets" and starts asking whether the founders have personally experienced the problem. It flags that the idea is hiding behind what PG calls the "unsexy filter": nobody wants to build boring tools, which is exactly why boring tools are fertile territory. It applies the well-vs-broad-shallow test: is there a narrow group that desperately needs this, or a broad group that mildly wants it? It catches the pitch's implicit assumption that growth will come from marketing rather than from the product being genuinely useful, and calls it out as "test-hacking," PG's term for optimising proxies rather than building something people actually want.
Same model. Same question. The only variable is 1,875 tokens of structured context.
Structured context: a designed representation of how a specific professional thinks, communicates, and operates, built from deliberate observation and loaded into an AI tool each session. Unlike platform memory, which captures facts passively, structured context captures reasoning patterns, decision frameworks, and voice. You own it, you control it, and it works across any model.
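To make the definition concrete, here is a minimal sketch of what such a profile file might contain and how it gets loaded per session. The schema and every field name below are illustrative assumptions, not Learned Context's actual format:

```python
import json

# Hypothetical structured-context profile. The point is that reasoning
# patterns, frameworks, and voice are captured explicitly, not inferred
# passively by the platform.
profile = {
    "identity": {"name": "Paul Graham", "role": "essayist, YC co-founder"},
    "reasoning_patterns": [
        "Evaluates ideas by founder-problem fit before market size",
        "Treats tedious, unglamorous work (schlep) as a signal of opportunity",
    ],
    "frameworks": {
        "unsexy_filter": "Boring tools are fertile territory precisely because nobody wants to build them",
        "well_vs_broad_shallow": "Prefer a narrow group that desperately needs it over a broad group that mildly wants it",
    },
    "voice": ["conversational but precise", "no consulting-speak"],
}

def build_system_prompt(profile: dict, task: str) -> str:
    """Prepend the owned, structured profile to the session's task."""
    return f"CONTEXT PROFILE:\n{json.dumps(profile, indent=2)}\n\nTASK:\n{task}"

prompt = build_system_prompt(profile, "Evaluate this startup idea: ...")
```

The profile is just data you own; the per-session step is nothing more than prepending it to the conversation.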
To make the delta concrete: I pitched Learned Context to Claude twice, once plain, once with the PG profile loaded, and asked what Paul Graham would say. Here's what came back.
Vanilla Claude:

I think the idea has potential. Paul Graham would likely evaluate it through several lenses:
Market opportunity: The AI tools market is growing rapidly, and there's increasing demand for personalisation. Context engineering is an emerging discipline, and positioning yourself at the intersection of professional productivity and AI could be valuable.
Key considerations: Is there a clear target market? Professionals is broad, you'll need to narrow down. What's the competitive moat? Platform memory features are improving rapidly. How do you acquire customers?
PG would probably ask: Are you building something people want? Have you talked to potential users? What does the MVP look like?
I'd recommend focusing on a specific vertical first, maybe consultants or executives, and proving the value there before expanding.
With the PG profile loaded:

The first question is whether you've personally experienced this problem, not whether the market is large. Have you? Because the pitch describes a system for professionals, but I want to know if you built it because you genuinely couldn't get useful output from your AI without structured context, or because you saw "context engineering" trending and decided it was a market.
The unsexy filter is working in your favour. Nobody wants to sit down and do a calibration session. That's schlep. But schlep is signal: if the upfront investment produces output that's measurably different, the people who do the work will never go back.
The real question is the well-vs-broad-shallow test. Is there a narrow group who desperately need this, who would be genuinely upset if it disappeared, or a broad group who think it sounds cool but never finish the calibration?
The platforms will add better memory. Your bet is that passive, platform-controlled memory hits a ceiling. Is the structured approach architecturally superior, or just temporarily ahead? Everything depends on that.
In this article:

- A first-person experiment: building a Paul Graham AI profile and testing it against vanilla Claude
- Why platform memory (ChatGPT, Claude) captures facts about you but not how you think
- The research on context degradation: why compact, structured context outperforms larger, unstructured context
- What a demo profile can and can't do, and why building your own matters more
The term arrived before the practice
On June 18, 2025, Tobi Lutke posted on X that he preferred "context engineering" over "prompt engineering" because "it describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM."[1] A week later, Andrej Karpathy endorsed the term: "context engineering is the delicate art and science of filling the context window with just the right information for the next step."[2] In September, Anthropic published an engineering blog post formalising it as a discipline: "the set of strategies for curating and maintaining the optimal set of tokens during LLM inference."[3]
"Building with language models is becoming less about finding the right words and more about answering a broader question: what configuration of context is most likely to generate the model's desired behaviour?"
Anthropic, "Effective Context Engineering for AI Agents," September 2025
By November, MIT Technology Review framed the arc from "vibe coding" to context engineering as the story of 2025 in software development.[4]
But almost all of this discourse is about developer tooling. Agents, RAG pipelines, MCP servers, multi-agent architectures. The context being engineered is code context, tool context, system context. Almost nobody is applying context engineering to a structured representation of how a specific professional thinks, communicates, and operates.
What your AI actually remembers about you
If you use ChatGPT with memory enabled, the system decides what to keep, passively, based on what it thinks matters. You don't control the structure. You don't control what gets stored or discarded.
In 2025, it failed catastrophically. Twice.
On February 5th, a backend update wiped months of accumulated context for users across the platform: personalised preferences, project details, fictional worlds that creative writers had built over months of interaction. Over 300 complaint threads appeared on r/ChatGPTPro alone.[5] No post-mortem from OpenAI. No recovery. On November 6th, it happened again.[6]
Claude's memory system, which rolled out to all plans in March 2026, is more thoughtful. It uses "Memory Synthesis," processing conversations roughly every 24 hours and distilling what it considers long-term-worthy.[7] But it is still passive, still platform-controlled, still optimised for what the platform decides "work-relevant" means rather than what you do.
These are memory features, not memory systems. A feature is something the platform adds to its product. A system is something you build, own, and control. When ChatGPT's memory wiped, users lost everything because they'd outsourced their context to the platform. If your context lives in a structured file you own, the worst that happens is you close the tab.
| Dimension | Platform memory | Structured context |
|---|---|---|
| Ownership | Platform-controlled | User-owned file |
| Structure | Passive, unstructured | Designed schema |
| Portability | Locked to one platform | Works across any LLM |
| Wipe risk | Two platform failures in 2025 | Local file, your backup |
| Day-one effort | Zero (auto-captures) | Calibration session required |
| Depth at month six | Name, role, surface preferences | Reasoning patterns, stakeholder dynamics, voice |
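The portability row is easy to demonstrate: because the context is a plain file you own, the same bytes can be handed to any model as a system prompt. A sketch, where the file path, its contents, and the provider payload shapes are all purely illustrative:

```python
from pathlib import Path

def load_context(path: Path) -> str:
    """Read the user-owned profile; the file, not the platform, is the source of truth."""
    return path.read_text(encoding="utf-8")

def session_payload(provider: str, profile_text: str, user_message: str) -> dict:
    # The same profile text slots into any provider's system/user message shape.
    return {
        "provider": provider,
        "system": profile_text,
        "messages": [{"role": "user", "content": user_message}],
    }

profile_path = Path("pg_profile.md")
profile_path.write_text("## Reasoning patterns\n- founder-problem fit first\n", encoding="utf-8")

text = load_context(profile_path)
claude_call = session_payload("claude", text, "Evaluate this idea.")
gpt_call = session_payload("chatgpt", text, "Evaluate this idea.")
```

If a platform wipes its memory store, the local file is untouched; switching models means changing one string, not rebuilding months of context.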
The physics of context
The argument for structured context isn't only about platform reliability. There is a deeper structural problem, and it shows up in the research.
In 2025, Chroma published a study testing 18 frontier models (GPT-4.1, Claude Opus 4, Gemini 2.5 Pro, and others) on how they handle increasing context length.[8] Every single model degraded as input grew. Performance declined continuously across all tested lengths: there was no safe zone below a model's stated maximum. Models claiming 200K-token context windows degraded well before reaching capacity.
The mechanisms are architectural. The "lost in the middle" problem, documented by Liu et al. at Stanford and the University of Washington, showed that LLMs exhibit a U-shaped attention pattern: tokens at the beginning and end of the context receive disproportionate attention, while information in the middle gets de-emphasised regardless of relevance.[9] This is a property of how Rotary Position Embedding encodes distance between tokens in transformer attention. Newer models have narrowed the effect, but Chroma's results confirm it persists across every frontier model tested.
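One practical response to the U-shaped pattern, my own sketch rather than anything from the Liu et al. paper, is to reorder context so the strongest material sits at the edges of the window, where attention concentrates:

```python
def edge_order(chunks_ranked_best_first: list[str]) -> list[str]:
    """Alternate the highest-ranked chunks between the front and back of the
    context, pushing the weakest material toward the middle, where attention
    is lowest."""
    front, back = [], []
    for i, chunk in enumerate(chunks_ranked_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

ranked = ["best", "second", "third", "fourth", "fifth"]
ordered = edge_order(ranked)
# "best" opens the window, "second" closes it, "fifth" lands mid-window.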
Chroma also isolated a third mechanism: distractor interference. Adding semantically similar but irrelevant content causes degradation beyond what context length alone explains. The noise-to-signal ratio compounds with every additional piece of context.
The fix isn't bigger windows. It's better structure.
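A sketch of what "better structure" can mean in practice: score candidate chunks against the task and pack only the strongest under a token budget, instead of stuffing the window and inviting distractor interference. The word-overlap scorer and the four-characters-per-token estimate below are crude stand-ins for whatever retrieval scoring and tokenizer you actually use:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def relevance(chunk: str, task: str) -> int:
    # Naive word overlap; real systems would use embeddings or a reranker.
    return len(set(chunk.lower().split()) & set(task.lower().split()))

def pack_context(chunks: list[str], task: str, budget_tokens: int) -> list[str]:
    """Keep the most relevant chunks that fit the budget; drop distractors."""
    ranked = sorted(chunks, key=lambda c: relevance(c, task), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        cost = estimate_tokens(chunk)
        if used + cost <= budget_tokens:
            selected.append(chunk)
            used += cost
    return selected

chunks = [
    "founder has personally experienced the remote-team problem",
    "unrelated note about office snack budgets and parking",
    "narrow group of users desperately needs the tool",
]
picked = pack_context(chunks, "does a narrow group desperately need this tool", budget_tokens=30)
```

The budget forces a choice, and the scoring makes the choice principled: the distractor about snack budgets never reaches the window.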
Anthropic demonstrated this directly. Their multi-agent research system (an Opus 4 lead agent delegating to Sonnet 4 subagents with clean, focused context windows) outperformed a single Opus 4 agent by 90.2% on research tasks.[10] The improvement came entirely from managing context, not from a better model.
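The delegation pattern is easy to sketch even without the model calls: the lead agent hands each subagent only its subtask plus a short shared brief, never the full accumulated transcript. Everything here (the brief, the subtasks, the fake `run_subagent`) is illustrative, not Anthropic's implementation:

```python
def run_subagent(context: str) -> str:
    # Stand-in for a model call; here it just reports what the subagent saw.
    return f"findings from {len(context)} chars of context"

def lead_agent(brief: str, subtasks: list[str]) -> list[str]:
    """Delegate with clean, focused windows instead of forwarding everything."""
    results = []
    for task in subtasks:
        focused = f"{brief}\n\nYOUR SUBTASK:\n{task}"  # no transcript included
        results.append(run_subagent(focused))
    return results

transcript = "..." * 5000  # a long accumulated conversation the subagents never see
brief = "Research question: does structured context beat platform memory?"
results = lead_agent(brief, ["survey memory features", "summarise context-rot research"])
```

Each subagent's window stays a few hundred characters while the lead agent's transcript grows unbounded; the focus, not the model, is what changes.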
What I built and what it proved
The Paul Graham demo profile started as an experiment. Could I build a profile of a well-known thinker that let people experience context engineering without building their own?
I chose PG because his thinking patterns are extensively documented: over 200 essays spanning two decades, plus interviews, talks, and the autobiographical "What I Worked On." I drew on roughly 30 of these for the profile, selecting essays that most clearly revealed his reasoning patterns and decision frameworks. If you can't build a strong context profile from that corpus, the approach doesn't work.
The process followed Learned Context's v0.3 calibration engine, a seven-stage process that extracts reasoning patterns, communication style, domain expertise, operational habits, and stakeholder dynamics from representative source material. Every observation had to pass three tests: distinctiveness (would this be true of most senior professionals?), contrastive depth (what does the person do and not do?), and actionability (if removed, would the AI produce different output?).
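Those three tests translate naturally into a filter over candidate observations. The predicates below are toy stand-ins of my own, not the calibration engine's actual checks; a real pass involves judgment, not booleans:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    text: str
    distinctive: bool   # False if it would be true of most senior professionals
    has_contrast: bool  # captures what the person does AND does not do
    actionable: bool    # removing it would change the AI's output

def passes_calibration(obs: Observation) -> bool:
    # An observation earns its tokens only if it clears all three tests.
    return obs.distinctive and obs.has_contrast and obs.actionable

candidates = [
    Observation("values clear communication",
                distinctive=False, has_contrast=False, actionable=False),
    Observation("asks about founder-problem fit before market size; never opens with TAM",
                distinctive=True, has_contrast=True, actionable=True),
]
kept = [o.text for o in candidates if passes_calibration(o)]
```

Generic platitudes fail the distinctiveness test and get dropped; only observations that would actually change the AI's output survive into the profile.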
What emerged was a two-file system: a core profile at 1,875 tokens and a context library at 3,627 tokens. The core profile loads every session, enough for the AI to reason and communicate in PG's patterns. The context library loads alongside when deeper framework application or voice fidelity is needed. Together they occupy just over 5,500 tokens and produce recognisably different AI behaviour within seconds.
The split mirrors how professional context actually works. The profile is what a calibration session produces on day one. The context library represents what compounds over months of use: frameworks confirmed, voice patterns refined, stakeholder dynamics observed across dozens of interactions. For this demo I pre-built it from essays. For a real user it accumulates automatically.
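The two-file split implies a simple loading rule: the core profile ships with every session, and the library joins it only when the task needs deeper framework work or voice fidelity. The token figures are the article's; the loader itself is a hypothetical sketch:

```python
CORE_TOKENS = 1875     # core profile: loads every session
LIBRARY_TOKENS = 3627  # context library: loads only when depth is needed

def session_budget(needs_depth: bool) -> int:
    """Return how many profile tokens this session carries."""
    return CORE_TOKENS + (LIBRARY_TOKENS if needs_depth else 0)

quick_question = session_budget(needs_depth=False)
framework_review = session_budget(needs_depth=True)
```

Quick sessions pay under 2K tokens; a full framework review pays just over 5,500, still a tiny fraction of a 200K window, and structured rather than noise.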
What a demo profile can and can't do
The PG profile captures frameworks, decision patterns, and communication style extracted from published material. It can make an AI reason through problems the way PG reasons, write in his conversational-but-precise register, and apply his named frameworks (the schlep filter, well-vs-broad-shallow, bus ticket theory) to new situations.
It cannot be Paul Graham. It doesn't have his lived experience: the years painting in Florence, the specific founders he's backed, the private doubts that never made it into essays. It doesn't have the tacit knowledge that comes from founding YC and watching three thousand companies succeed or fail. What it produces is Paul Graham lite: a thinking partner that reasons in his patterns, not a replacement for his judgement.
Individual memory, the full texture of lived professional experience, is not something that can be captured from the outside. That is precisely why building your own profile matters more than test-driving someone else's.
The demo shows what context engineering can do with public material alone. Your own profile, built from your actual decisions, conversations, and stakeholders, captures things no external observer could reconstruct. You are the only source of your own signal.
The compounding problem
Context engineering for professionals faces a fair objection: platform memory is passive. It just works (when it doesn't wipe). A structured system requires upfront investment.
I'll concede it: platform memory is easier on day one. If all you need is an AI that remembers your name and job title, it's fine.
But context compounds.
A professional who has been using a structured context system for six months has a profile that captures their reasoning architecture, not "prefers data-driven decisions," but "separates decisions by reversibility, not importance, and moves at 60% confidence on reversible choices." It captures their communication patterns across audiences, not "professional tone," but "formality stays constant, directness shifts: no preamble for direct reports, full build-from-premise for the board." It captures stakeholder dynamics that no platform would ever store: who they defer to on which domains, whose commitments need follow-up, where trust has been established or broken.
That accumulated context changes what the AI can do. Not incrementally. Structurally.
What to watch
Capital is flowing into context infrastructure. Mem0 raised $24 million in October 2025 to build "the memory layer for AI apps," processing 186 million API calls per quarter by Q3.[11] Meta acquired Limitless AI, the pendant that recorded your conversations, in December 2025.[12]
But there is a gap. Mem0 is developer infrastructure: an API, not a user-facing system. Limitless was passive capture: recording everything and hoping the AI figures out what matters. The second-brain tools (Obsidian, Notion, Capacities) store documents, not professional identity.
If structured context can produce recognisably different AI behaviour from public essays alone, material reconstructed from the outside, the question is what it produces when the source material is you.
Same model. Same question. Different context. Different everything.
Notes
1. Tobi Lutke, post on X, June 18, 2025.
2. Andrej Karpathy, post on X, June 25, 2025.
3. Anthropic, "Effective Context Engineering for AI Agents," September 29, 2025.
4. MIT Technology Review, "From Vibe Coding to Context Engineering: 2025 in Software Development," November 5, 2025.
5. WebProNews, "ChatGPT's Fading Recall: Inside the 2025 Memory Wipe Crisis," 2025.
6. Piunikaweb, "Some ChatGPT Users Report Memories Wiped Clean," November 5, 2025.
7. Claude Help Center, Release Notes, March 2026.
8. Chroma Research, "Context Rot: How Increasing Input Tokens Impacts LLM Performance," 2025.
9. Nelson F. Liu et al., "Lost in the Middle: How Language Models Use Long Contexts," Stanford and University of Washington, 2023.
10. Anthropic, "How We Built Our Multi-Agent Research System," June 14, 2025.
11. TechCrunch, "Mem0 Raises $24M from YC, Peak XV and Basis Set," October 28, 2025.
12. TechCrunch, "Meta Acquires AI Device Startup Limitless," December 5, 2025.
