Karpathy’s LLM Wiki Setup Guide: Build Your AI-Powered Knowledge Base


📺 Article based on the original video by Teacher’s TechWatch ↗

Every time you query a RAG system, it starts from zero. No memory of your last conversation, no understanding of how that document connects to the one you read last month. Andrej Karpathy’s LLM Wiki approach flips this entirely—by building persistent AI knowledge, you transform note-taking from repeated scratch queries into a connected understanding that grows with every session. I spent a week testing this setup, and the difference from traditional retrieval is immediate and striking.

What Is an LLM Wiki and Why Traditional RAG Falls Short

Most tutorials treat RAG as the gold standard for AI document querying. I’ve found that framing misses the actual limitation. Let me show you what I mean.

The scratch query problem explained

Traditional RAG searches from scratch on every single query. You upload a document, ask a question, get an answer—and the system immediately forgets the interaction happened. Ask a follow-up five minutes later, and it behaves like you’ve never shared a document before.

That’s not a flaw in the implementation. It’s baked into how RAG works. Each query starts with zero context from previous conversations.

Imagine asking a reference librarian for help on your research topic. They answer your question, you leave the building, and return the next day with a follow-up. They have no idea who you are or what you discussed yesterday. “Sorry, you’ll need to explain everything again from the start.”

Sound familiar? That’s exactly what using traditional RAG with your personal documents feels like.

Why persistent memory changes everything

An LLM Wiki flips this entirely. Instead of ephemeral retrieval, you get persistent knowledge storage that accumulates understanding across sessions. The AI doesn’t just answer individual questions—it builds on previous interactions, growing smarter as it learns your information landscape.

This isn’t the same as “chat history.” True persistent memory means the system makes connections you haven’t explicitly requested yet. You mention a project deadline today; six sessions later, the system automatically links it to your planning notes without being asked.

The core difference comes down to this: traditional RAG treats your documents like a library you repeatedly query. An LLM Wiki treats your knowledge like a living system that compounds in value over time.

Which approach actually serves your work? For most people I’ve talked to, the scratch-and-forget model feels broken once they’ve experienced what cumulative understanding looks like.

Understanding the Architecture: How LLM Wiki Memory Works

The RAG model is useful but fundamentally forgetful: as the librarian analogy above showed, every query starts fresh no matter how many times you’ve asked before. The LLM Wiki architecture flips this entirely. Instead of retrieving answers, it builds understanding that compounds over time, like a researcher who returns to the same library daily and gets faster and smarter with each visit.

Document processing pipeline breakdown

Here’s what actually happens when you feed a document into the system. First, it gets chunked — broken into manageable pieces, usually a few hundred tokens each. Then an embedding model converts each chunk into a vector, essentially a numerical fingerprint of its meaning. These vectors get stored in a vector database, ready for comparison.

When you search later, the system isn’t parsing keywords — it’s finding meaning. A query about “climate impact on agriculture” will match a chunk discussing “drought effects on crop yields” even if those exact words never appear together. Most chunking strategies aim for roughly 500-1,500 tokens per segment, a sweet spot between enough context to preserve meaning and enough granularity for precise retrieval.
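
To make that concrete, here’s a minimal Python sketch of the pipeline, assuming the open-source sentence-transformers library and an in-memory store. The model name, chunk sizes, and word-based splitting are my illustrative choices, not details from Karpathy’s setup:

```python
# Minimal sketch of the pipeline: chunk a document, embed each chunk, store vectors.
# Assumes `pip install sentence-transformers numpy`; model and sizes are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, widely used embedding model

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping word-based chunks (a stand-in for token-based chunking)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def build_index(doc_text: str) -> tuple[list[str], np.ndarray]:
    """Embed every chunk; each vector is the 'numerical fingerprint' of its meaning."""
    chunks = chunk(doc_text)
    vectors = model.encode(chunks, normalize_embeddings=True)
    return chunks, np.asarray(vectors)

def search(query: str, chunks: list[str], vectors: np.ndarray, k: int = 3):
    """Rank chunks by cosine similarity to the query (vectors are pre-normalized)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q
    top = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), chunks[i]) for i in top]
```

With this in place, a query like “climate impact on agriculture” ranks a chunk about “drought effects on crop yields” highly even with no shared keywords, because the comparison happens in embedding space rather than on words.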

Semantic memory systems and embeddings

The magic isn’t in the storage — it’s in the semantic layer. Semantic similarity search works like a mental thesaurus that understands concepts rather than just words. When the system finds two chunks that are conceptually related, it creates an explicit link between them.

Over time, this builds a web of connections that mirrors how your own memory associates ideas. The system starts recognizing patterns you haven’t explicitly defined yet. This is where most tutorials get it wrong — they focus on the technology instead of this emergent understanding that happens when connections accumulate.
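
One hedged sketch of how that linking step might work, continuing the pipeline code above (the 0.6 threshold is an assumption you would tune against your own notes, not a value from the video):

```python
# Continuing the earlier sketch: persist explicit links between chunks whose
# embeddings sit close together. The threshold is an illustrative assumption.
import numpy as np

SIMILARITY_THRESHOLD = 0.6

def discover_links(chunks: list[str], vectors: np.ndarray) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every chunk pair above the similarity threshold."""
    sims = vectors @ vectors.T  # pairwise cosine similarities (vectors pre-normalized)
    links = []
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            if sims[i, j] >= SIMILARITY_THRESHOLD:
                links.append((i, j, float(sims[i, j])))
    return links  # write these to disk so later sessions inherit the connections
```

Persisting these links, rather than recomputing similarity per query, is what lets understanding accumulate instead of evaporating after each session.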

Graph-based connection discovery

This is where the architecture really diverges from traditional approaches. The system maintains a knowledge graph — an explicit map of how concepts relate across all your documents. When you add a new note about neural networks, the graph might connect it to your earlier notes on machine learning basics, a research paper you uploaded last month, and a YouTube video transcript from two weeks ago.

These connections persist. The system “remembers” that you linked these ideas before, even across long time gaps. Unlike RAG’s single-query context window, LLM Wiki accumulates understanding layer by layer. Each interaction builds on the last, creating the kind of cumulative knowledge that makes the system feel less like a search engine and more like a collaborator who actually knows your work.
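
A sketch of what that persistent graph layer could look like, using networkx purely as a stand-in for whatever graph store a real setup uses; the note names and file name are illustrative:

```python
# Sketch of a persistent knowledge graph; networkx is a stand-in graph store.
# Note names and the GraphML file are illustrative, not from Karpathy's setup.
import networkx as nx

G = nx.Graph()

def link_notes(graph: nx.Graph, a: str, b: str, reason: str) -> None:
    """Record that two notes relate, and why, so later sessions inherit the edge."""
    graph.add_edge(a, b, reason=reason)

link_notes(G, "neural-networks.md", "ml-basics.md", "prerequisite concept")
link_notes(G, "neural-networks.md", "video-transcript.md", "same technique, different source")

# Serializing to disk is the step that makes the memory survive across sessions.
nx.write_graphml(G, "knowledge-graph.graphml")

G = nx.read_graphml("knowledge-graph.graphml")  # a later session reloads the same map
print(list(G.neighbors("neural-networks.md")))  # every note already linked to this one
```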

Setting Up Your LLM Wiki: Tools and Prerequisites

Obsidian as Your Local-First Knowledge Base

The foundation of this whole system is Obsidian, and here’s why that matters: it stores everything as plain markdown files on your computer. No cloud, no subscription, no wondering whether some startup will pivot and take your notes with it.

Your notes live as simple text files you could theoretically open in Notepad if you wanted. That’s the kind of ownership that makes local-first note-taking different from the alternatives. The graph view is where things get interesting—you can literally watch connections emerge between your notes as you build the wiki. I’ve found that seeing those links form helps you think differently about how information relates.
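
Because the vault is plain markdown on disk, any script can walk it. A minimal sketch, assuming a vault path you would substitute with your own:

```python
# Sketch: walk an Obsidian vault and collect every [[wiki-link]] per note.
# The vault path is an assumption; point it at your own vault.
import re
from pathlib import Path

VAULT = Path("~/Documents/my-vault").expanduser()
WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # target in [[target]], [[target|alias]], [[target#heading]]

links: dict[str, list[str]] = {}
for note in VAULT.rglob("*.md"):
    links[note.stem] = WIKILINK.findall(note.read_text(encoding="utf-8"))

print(f"{len(links)} notes, {sum(len(v) for v in links.values())} outgoing links")
```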

Claude Code Integration for Agentic Capabilities

Claude Code is the agentic layer that connects an LLM directly to that document collection. Where traditional RAG approaches pull context only when asked, Claude Code can maintain persistent memory across sessions. Think of it like having a research assistant who actually remembers what you looked at last month—not just the current conversation.

This is the real shift: the AI becomes an active participant in your knowledge ecosystem, not just a responder to queries. It can discover connections you might have missed and build on existing work over time.
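
One concrete mechanism behind this is Claude Code’s CLAUDE.md convention: it reads a CLAUDE.md file in the working directory at the start of each session, which makes that file a natural home for standing instructions about your vault. The contents below are an illustrative example, not Karpathy’s actual file:

```markdown
# CLAUDE.md: persistent context for this vault (illustrative example)

- Notes in this vault are atomic: one idea per file, connected with [[wiki-links]].
- Folder layout: projects/, concepts/, reference/, daily/.
- When answering a question, follow wiki-links outward from the most relevant note.
- At the end of a session, record any new cross-note connections you noticed.
```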

Plugin Ecosystem and Essential Extensions

Here’s the thing about setup—it’s surprisingly minimal. You need markdown files, an LLM interface, and an Obsidian vault structure. That’s genuinely it. No elaborate infrastructure required.

The plugin ecosystem does expand what’s possible if you want to go deeper, but you can start with just the basics and scale up as your workflow reveals what actually helps.

Building Your Knowledge Base: Step-by-Step Implementation

Most people jump straight into dumping documents into their note app and expect magic to happen. It doesn’t work that way. After studying Karpathy’s approach, I’ve learned that a knowledge base is only as good as the structure you build into it from day one.

Organizing Your First Notes for AI Comprehension

Start with a folder structure that mirrors how you actually think about your work. Don’t overthink this — I usually create folders like `projects/`, `concepts/`, `reference/`, and `daily/`. The goal is to have a place for everything so you’re not hunting through a flat list of 500 notes.

Here’s where most people get it wrong: they write long, narrative notes like they’re drafting emails. Instead, write atomic notes — one discrete idea per file. Think of each note like a single Lego brick. When you write “Transformers use attention mechanisms to weight token relationships,” that’s an atomic note. When you write a 1,500-word essay titled “Notes on AI,” that’s a mess waiting to happen. Atomic notes are ideal for chunking because AI systems can recombine them contextually rather than trying to extract meaning from walls of text.
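
For a concrete picture, here’s what a single atomic note might look like on disk; the links are illustrative:

```markdown
# Transformers use attention to weight token relationships

Attention lets each token score its relevance to every other token in the
sequence, so the model learns relationships without fixed context windows.

Related: [[ml-basics]], [[attention-paper]]
```

Each note like this becomes one clean chunk the system can embed, retrieve, and recombine.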

Feeding Documents to Your LLM System

Once your notes are atomic, connect Obsidian to Claude Code through the plugin. You can feed entire folders or individual notes depending on what you’re working on. Karpathy’s setup treats these documents as a persistent memory layer — unlike traditional RAG, which queries from scratch on every request, your LLM Wiki accumulates understanding over time.

One thing that surprised me: Claude Code can actually read your existing Obsidian vault and suggest connections between notes you didn’t realize were related. It’s like having a research assistant who notices patterns you’d miss.

Creating Interconnected Note Systems

This is where the magic happens. Use consistent wiki-links — `[[note-name]]` syntax — to explicitly connect related ideas. When you link liberally, Obsidian’s graph view becomes a map of your knowledge rather than just a list of files.

Check the graph view weekly to spot orphaned notes — those floating alone with no connections. A note with zero links is basically invisible to your AI system. I aim for every note to link to at least two others.
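
That weekly check can be automated too. A small sketch that flags orphans by scanning wiki-links, using the same pattern as the vault-walking sketch earlier (the vault path is again an assumption):

```python
# Sketch: flag "orphaned" notes with no incoming or outgoing wiki-links.
# The vault path is an assumption; adjust for your own setup.
import re
from pathlib import Path

VAULT = Path("~/Documents/my-vault").expanduser()
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

outgoing: dict[str, set[str]] = {}
for note in VAULT.rglob("*.md"):
    outgoing[note.stem] = set(WIKILINK.findall(note.read_text(encoding="utf-8")))

incoming: dict[str, set[str]] = {stem: set() for stem in outgoing}
for src, targets in outgoing.items():
    for target in targets:
        incoming.setdefault(target, set()).add(src)

orphans = sorted(s for s in outgoing if not outgoing[s] and not incoming[s])
print("Orphaned notes:", ", ".join(orphans) or "none")
```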

The payoff? When you ask Claude Code about something in your vault, it pulls from a web of connected context rather than searching cold. That’s the difference between querying a stranger and consulting your own brain.

Real-World Applications: From Research to Daily Workflows

Research and literature synthesis

If you’ve ever tried to remember which paper you read three months ago that mentioned a specific technique, you know how quickly academic knowledge becomes fragmented. With an LLM Wiki, you’re building persistent understanding — the system remembers the relationships between papers you uploaded in January when you’re writing your literature review in April.

Researchers have found this especially valuable when returning to a research direction after a break. You don’t start from zero; the connections you built months ago are still there, waiting.

Project documentation and technical specs

Here’s where I think this gets genuinely exciting for developers. How many hours have you spent trying to reconstruct why a particular implementation decision was made? “I know this looks weird, but trust me, there was a reason.”

An LLM Wiki becomes a kind of institutional memory that doesn’t quit. When a new developer joins the team, the system can surface not just what the code does, but the reasoning behind architectural choices — conversations you’ve captured, trade-offs you’ve discussed. It’s like having a teammate who has perfect recall of every documentation change from day one.

Personal knowledge management for continuous learning

Writers can track how an argument evolves across drafts without writing manual summaries. This is where it gets almost eerie — the system surfaces connections your brain might miss because you’re too close to the material.

Sound familiar? That “aha” moment when two ideas from completely different areas suddenly click together. What surprises me is how often these unexpected connections prove the most valuable. It’s like having a research assistant who reads everything you feed it and occasionally taps you on the shoulder with a note that says “you might want to look at this.”

Compare this to traditional RAG systems: you’d re-query for context each time, essentially starting from scratch. LLM Wiki surfaces relevant connections proactively — it builds on your knowledge incrementally, like a conversation that remembers everything you’ve ever discussed.

Frequently Asked Questions

What is the difference between LLM Wiki and RAG retrieval?

The core difference is persistence. Traditional RAG starts from scratch on every query—it searches your documents fresh each time with no memory of what it found before. An LLM Wiki, by contrast, builds cumulative understanding where connections discovered in one session inform the next. What I’ve found is that RAG works fine for one-off questions, but if you’re researching a complex topic over weeks, the persistent memory of an LLM Wiki prevents you from rediscovering the same information repeatedly.

How do I connect Obsidian to Claude Code for persistent AI memory?

The connection typically happens through Obsidian’s plugin ecosystem and Claude Code’s document awareness capabilities. You install the relevant plugins in Obsidian (like the Local REST API plugin), then configure Claude Code to point to your vault directory. In practice, the setup takes about 15-20 minutes if you follow Karpathy’s guide—the trickier part is getting the embedding pipeline right so Claude understands your note relationships, not just individual files.

Can I build an LLM Wiki without coding experience?

Yes, and this is exactly what Karpathy’s guide targets. You don’t need to write code if you use the pre-built tools: Obsidian handles the note organization, and the connection to Claude Code is configuration-based, not programmatic. The only technical step is setting up the embedding pipeline, which has beginner-friendly automation now. If you’ve ever dragged files into a folder, you can build a basic LLM Wiki in under an hour.

What are the best practices for organizing notes in a personal knowledge base?

Link aggressively and use atomic notes. Each note should cover one concept (so you can reference it from multiple contexts), and you should create links between related ideas even if they seem tangentially connected—Claude Code’s semantic understanding picks up on these relationships. I’ve found that notes with 15-30 connections in Obsidian’s graph view tend to produce the most useful AI responses. Avoid deep folder hierarchies; flat structures with robust linking outperform nested directories for LLM access.

How does semantic memory in AI systems differ from traditional database retrieval?

Database retrieval finds exact matches or keyword overlaps—it’s basically search. Semantic memory understands meaning and context. If you ask “what projects used Python” in a database, you’d need the exact term; an AI with semantic memory might return results mentioning “the scripting work” or “the ML pipeline” because it understands those are functionally related. The practical difference is that with semantic memory, you can ask vague questions and still get relevant answers because the system grasps concepts, not just keywords.

If you’re already maintaining a note-taking system, try importing one month of notes into Obsidian and querying it through Claude Code—you’ll immediately feel the difference between scratch retrieval and cumulative understanding.

Subscribe to Fix AI Tools for weekly AI & tech insights.

Onur

AI Content Strategist & Tech Writer

Covers AI, machine learning, and enterprise technology trends.