<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://blog.aarshpandey.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://blog.aarshpandey.com/" rel="alternate" type="text/html" /><updated>2026-04-06T11:34:51+00:00</updated><id>https://blog.aarshpandey.com/feed.xml</id><title type="html">Aarsh’s Blog</title><subtitle>Thoughts on code, design, and everything in between.</subtitle><author><name>Aarsh</name></author><entry><title type="html">How RAG Works</title><link href="https://blog.aarshpandey.com/posts/how-rag-works/" rel="alternate" type="text/html" title="How RAG Works" /><published>2026-04-06T00:00:00+00:00</published><updated>2026-04-06T00:00:00+00:00</updated><id>https://blog.aarshpandey.com/posts/how-rag-works</id><content type="html" xml:base="https://blog.aarshpandey.com/posts/how-rag-works/"><![CDATA[<p>Large language models are impressive, but they have a fundamental constraint: they only know what was in their training data, frozen at a point in time. Ask GPT-4 about something that happened last week, or about your company’s internal docs, and it either hallucinates or admits ignorance.</p>

<p>Retrieval-Augmented Generation — RAG — is the standard solution.</p>

<h2 id="the-core-idea">The core idea</h2>

<p>RAG is two steps:</p>

<ol>
  <li><strong>Retrieve</strong> relevant documents for the user’s query</li>
  <li><strong>Generate</strong> a response using those documents as context</li>
</ol>

<p>Instead of relying on what the model memorized during training, you hand it fresh information at inference time. The model becomes a reasoning engine over your data, not a knowledge store.</p>

<h2 id="step-1-indexing">Step 1: Indexing</h2>

<p><img src="/assets/images/rag-indexing.svg" alt="Indexing pipeline — documents are chunked, embedded, and stored in a vector database" /></p>

<p>Before you can retrieve anything, you need to index your documents. This happens offline.</p>

<p>Each document (or chunk of a document) gets converted into a vector — a list of numbers that encodes its semantic meaning. This is done by an <em>embedding model</em>, a neural net trained to map text into a high-dimensional space where similar meanings land close together.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"The cat sat on the mat."  →  [0.12, -0.84, 0.33, ...]
"A feline rested on a rug." →  [0.11, -0.81, 0.35, ...]  # close!
"Quarterly revenue is up."  →  [0.92,  0.14, -0.67, ...]  # far
</code></pre></div></div>

<p>You store these vectors in a <em>vector database</em> — Pinecone, Weaviate, pgvector, Chroma, and others. The database is optimized for one operation: finding the N vectors closest to a query vector.</p>

<h2 id="step-2-retrieval">Step 2: Retrieval</h2>

<p><img src="/assets/images/rag-retrieval.svg" alt="Retrieval and generation — query is embedded, matched against the vector DB, top chunks are passed to the LLM" /></p>

<p>When a user asks a question, you embed their query using the same embedding model, then run a nearest-neighbor search against your vector database.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>query: "what is our refund policy?"
  → embed query → [0.43, -0.21, ...]
  → search vector DB → top 3 matching chunks
  → return: [chunk_42, chunk_7, chunk_91]
</code></pre></div></div>

<p>The chunks that come back are semantically similar to the question — they might not share a single keyword, but they’re conceptually related.</p>

<h2 id="step-3-generation">Step 3: Generation</h2>

<p>Now you build a prompt that includes both the user’s question and the retrieved chunks, and send it to the LLM:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>System: You are a helpful assistant. Answer based only on the context below.

Context:
[chunk_42] Our refund policy allows returns within 30 days...
[chunk_7]  Refunds are processed within 5-7 business days...
[chunk_91] Digital products are non-refundable unless...

User: What is your refund policy?
</code></pre></div></div>

<p>The model reads the context and generates a grounded answer. If the context doesn’t contain the answer, a well-prompted model should say so rather than invent one.</p>

<h2 id="why-it-works">Why it works</h2>

<p>The key insight is that LLMs are exceptional at reading comprehension and synthesis. Give them a passage and a question, and they’ll reliably extract the relevant information and reason over it. RAG exploits this — instead of asking the model to <em>remember</em>, you ask it to <em>read</em>.</p>

<p>This largely sidesteps the hallucination problem for factual questions: the model is constrained to what you gave it.</p>

<h2 id="the-moving-parts">The moving parts</h2>

<p>A production RAG system has a few components worth knowing:</p>

<p><strong>Chunking strategy</strong>: You split documents into chunks before embedding. Too large and the chunk adds noise; too small and you lose context. Typical chunks are 200–500 tokens, sometimes with overlap so context doesn’t get cut at boundaries.</p>

<p><strong>Embedding model</strong>: Separate from the LLM. OpenAI’s <code class="language-plaintext highlighter-rouge">text-embedding-3-small</code>, Cohere’s <code class="language-plaintext highlighter-rouge">embed-v3</code>, or open models like <code class="language-plaintext highlighter-rouge">bge-m3</code> are common choices. The embedding model determines retrieval quality.</p>

<p><strong>Retrieval depth (top-k)</strong>: How many chunks to retrieve. More context = more coverage but longer prompts and more noise. 3–5 is a common default.</p>

<p><strong>Reranking</strong>: A lightweight model that reorders retrieved chunks by relevance before they go into the prompt. Catches cases where vector similarity diverges from actual usefulness.</p>

<h2 id="what-rag-doesnt-solve">What RAG doesn’t solve</h2>

<p>RAG is not a silver bullet.</p>

<p>If the right chunk isn’t in your index, retrieval fails silently — the model answers from a bad or empty context. Garbage in, garbage out.</p>

<p>Multi-hop questions are hard: “Who manages the team that owns the billing service?” requires connecting two facts that might live in different documents. Standard RAG retrieves one neighborhood; it doesn’t traverse graphs.</p>

<p>And RAG adds latency. Every query now involves an embedding call, a vector search, and a larger prompt.</p>

<hr />

<p>RAG is fundamentally simple: look something up, then answer with it in hand. The complexity lives in the details — chunking, embedding quality, retrieval tuning, prompt design. But the architecture is just a smart use of the LLM’s best skill: reading.</p>]]></content><author><name>Aarsh</name></author><category term="ai" /><category term="llm" /><summary type="html"><![CDATA[Large language models are impressive, but they have a fundamental constraint: they only know what was in their training data, frozen at a point in time. Ask GPT-4 about something that happened last week, or about your company’s internal docs, and it either hallucinates or admits ignorance.]]></summary></entry><entry><title type="html">The Art of Building in Public</title><link href="https://blog.aarshpandey.com/posts/the-art-of-building-in-public/" rel="alternate" type="text/html" title="The Art of Building in Public" /><published>2026-04-01T00:00:00+00:00</published><updated>2026-04-01T00:00:00+00:00</updated><id>https://blog.aarshpandey.com/posts/the-art-of-building-in-public</id><content type="html" xml:base="https://blog.aarshpandey.com/posts/the-art-of-building-in-public/"><![CDATA[<p>There’s a temptation when building anything to wait until it’s <em>done</em> before showing it. To polish every edge, handle every edge case, and only then let people in.</p>

<p>That instinct is wrong.</p>

<h2 id="the-myth-of-the-finished-thing">The myth of the finished thing</h2>

<p>Nothing is ever really finished. Software gets deprecated. Designs go stale. Essays accrue footnotes. The gap between “done enough to ship” and “actually done” is infinite—and filling it is often the thing that stops work from ever reaching anyone.</p>

<p>When you build in public, you opt out of that trap.</p>

<h2 id="what-building-in-public-actually-means">What building in public actually means</h2>

<p>It doesn’t mean tweeting every keystroke or posting daily updates that nobody asked for. It means:</p>

<ul>
  <li><strong>Sharing work at the 70% mark</strong> — when it’s useful but still shapeable.</li>
  <li><strong>Writing about problems before you’ve solved them</strong> — which forces clearer thinking.</li>
  <li><strong>Being honest about what you don’t know</strong> — which attracts collaborators who do.</li>
</ul>

<p>The thing you’re afraid to share is usually exactly the thing someone else needs to see.</p>

<h2 id="the-feedback-loop-you-cant-manufacture">The feedback loop you can’t manufacture</h2>

<p>The only way to know if an idea is good is to expose it to reality. Imagined feedback is worse than useless — it’s noise that feels like signal.</p>

<p>Real feedback, even uncomfortable feedback, grounds you. It shortens the distance between what you think you’re building and what you’re actually building.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>idea → build → share → feedback → better idea
</code></pre></div></div>

<p>The only dangerous step is the one where “share” gets replaced by “polish forever.”</p>

<h2 id="start-with-one-post">Start with one post</h2>

<p>You don’t need an audience to start. You need a habit. Write the post you wished existed when you were stuck. Publish it somewhere—anywhere. See what happens.</p>

<p>Most of the time: nothing. Occasionally: someone finds it years later exactly when they needed it, and it saves them three hours.</p>

<p>That’s enough.</p>]]></content><author><name>Aarsh</name></author><category term="thinking" /><category term="craft" /><summary type="html"><![CDATA[There’s a temptation when building anything to wait until it’s done before showing it. To polish every edge, handle every edge case, and only then let people in.]]></summary></entry><entry><title type="html">Understanding Go Interfaces from First Principles</title><link href="https://blog.aarshpandey.com/posts/understanding-go-interfaces/" rel="alternate" type="text/html" title="Understanding Go Interfaces from First Principles" /><published>2026-03-22T00:00:00+00:00</published><updated>2026-03-22T00:00:00+00:00</updated><id>https://blog.aarshpandey.com/posts/understanding-go-interfaces</id><content type="html" xml:base="https://blog.aarshpandey.com/posts/understanding-go-interfaces/"><![CDATA[<p>Most explanations of Go interfaces start with the syntax. This one won’t.</p>

<p>Instead, let’s start with the problem interfaces solve, and work backwards to why Go’s approach is surprisingly elegant.</p>

<h2 id="the-problem-coupling">The problem: coupling</h2>

<p>When module A depends directly on module B, you’ve created coupling. A can’t exist without B. Testing A requires B. Changing B risks breaking A.</p>

<p>The classic solution is <em>indirection</em>: instead of depending on a concrete thing, depend on a description of what you need.</p>

<p>That description is an interface.</p>

<h2 id="what-makes-gos-interfaces-different">What makes Go’s interfaces different</h2>

<p>In most languages (Java, C#), interfaces are <em>declared</em>. You say “this type implements that interface” explicitly.</p>

<p>Go’s interfaces are <em>inferred</em>. A type satisfies an interface if it has the right methods — no declaration required.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">Stringer</span> <span class="k">interface</span> <span class="p">{</span>
    <span class="n">String</span><span class="p">()</span> <span class="kt">string</span>
<span class="p">}</span>

<span class="k">type</span> <span class="n">Point</span> <span class="k">struct</span> <span class="p">{</span>
    <span class="n">X</span><span class="p">,</span> <span class="n">Y</span> <span class="kt">float64</span>
<span class="p">}</span>

<span class="c">// Point implicitly satisfies Stringer</span>
<span class="k">func</span> <span class="p">(</span><span class="n">p</span> <span class="n">Point</span><span class="p">)</span> <span class="n">String</span><span class="p">()</span> <span class="kt">string</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"(%v, %v)"</span><span class="p">,</span> <span class="n">p</span><span class="o">.</span><span class="n">X</span><span class="p">,</span> <span class="n">p</span><span class="o">.</span><span class="n">Y</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">Point</code> never mentions <code class="language-plaintext highlighter-rouge">Stringer</code>. It just has a <code class="language-plaintext highlighter-rouge">String()</code> method, and that’s enough.</p>

<h2 id="why-this-matters">Why this matters</h2>

<p>This design decision has a profound consequence: <strong>interfaces are defined by the consumer, not the producer.</strong></p>

<p>The package that <em>needs</em> a logger defines <code class="language-plaintext highlighter-rouge">Logger</code>. The package that <em>provides</em> logging just has methods. If they happen to match, they fit together — even if they were written years apart by different people.</p>

<p>This is the opposite of most OOP languages, where the producer decides the contract.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// In your package — you define this</span>
<span class="k">type</span> <span class="n">Logger</span> <span class="k">interface</span> <span class="p">{</span>
    <span class="n">Log</span><span class="p">(</span><span class="n">msg</span> <span class="kt">string</span><span class="p">,</span> <span class="n">level</span> <span class="kt">int</span><span class="p">)</span>
<span class="p">}</span>

<span class="c">// In some other package — they don't know you exist</span>
<span class="k">type</span> <span class="n">ZapLogger</span> <span class="k">struct</span> <span class="p">{</span> <span class="o">...</span> <span class="p">}</span>
<span class="k">func</span> <span class="p">(</span><span class="n">z</span> <span class="n">ZapLogger</span><span class="p">)</span> <span class="n">Log</span><span class="p">(</span><span class="n">msg</span> <span class="kt">string</span><span class="p">,</span> <span class="n">level</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span> <span class="o">...</span> <span class="p">}</span>

<span class="c">// They fit anyway</span>
<span class="k">var</span> <span class="n">l</span> <span class="n">Logger</span> <span class="o">=</span> <span class="n">ZapLogger</span><span class="p">{}</span>
</code></pre></div></div>

<h2 id="the-empty-interface">The empty interface</h2>

<p><code class="language-plaintext highlighter-rouge">interface{}</code> (or <code class="language-plaintext highlighter-rouge">any</code> in modern Go) is the interface with no methods. Every type satisfies it, so it can hold any value.</p>

<p>It’s Go’s escape hatch. Use it sparingly — it trades away type safety for flexibility.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">PrintAnything</span><span class="p">(</span><span class="n">v</span> <span class="n">any</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="a-practical-heuristic">A practical heuristic</h2>

<p>Keep interfaces small. One or two methods is ideal. <code class="language-plaintext highlighter-rouge">io.Reader</code> has one. <code class="language-plaintext highlighter-rouge">io.Writer</code> has one. These small interfaces compose beautifully — <code class="language-plaintext highlighter-rouge">io.ReadWriter</code> is just both combined.</p>

<p>The moment an interface has seven methods, it’s describing a class, not a capability.</p>

<hr />

<p>The next time you’re reaching for a concrete type in a function signature, ask: what’s the minimum capability I actually need here? Name that capability. Make it an interface. The code will be better for it.</p>]]></content><author><name>Aarsh</name></author><category term="go" /><category term="programming" /><summary type="html"><![CDATA[Most explanations of Go interfaces start with the syntax. This one won’t.]]></summary></entry><entry><title type="html">Notes on Local-First Software</title><link href="https://blog.aarshpandey.com/posts/notes-on-local-first-software/" rel="alternate" type="text/html" title="Notes on Local-First Software" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://blog.aarshpandey.com/posts/notes-on-local-first-software</id><content type="html" xml:base="https://blog.aarshpandey.com/posts/notes-on-local-first-software/"><![CDATA[<p>The cloud made software better in a lot of ways. Sync across devices. Collaboration. No data loss when your laptop dies.</p>

<p>But it also made software <em>worse</em> in a way we’ve mostly accepted: <strong>your data lives somewhere else</strong>, and access to it depends on someone else’s uptime, business model, and goodwill.</p>

<p>Local-first software is an attempt to get the benefits of the cloud without giving up ownership.</p>

<h2 id="the-seven-ideals">The seven ideals</h2>

<p>The original <a href="https://www.inkandswitch.com/local-first/">local-first paper</a> from Ink &amp; Switch laid out seven properties:</p>

<ol>
  <li><strong>Fast</strong> — no round-trip latency for reads</li>
  <li><strong>Multi-device</strong> — works across your machines</li>
  <li><strong>Offline</strong> — works without internet</li>
  <li><strong>Collaboration</strong> — still supports real-time editing</li>
  <li><strong>Longevity</strong> — data survives the vendor</li>
  <li><strong>Privacy</strong> — you control your data</li>
  <li><strong>User control</strong> — you can export, modify, extend</li>
</ol>

<p>Most cloud software hits 2 and 4. Most desktop software hits 1, 3, and 6. Local-first tries to hit all seven.</p>

<h2 id="the-hard-part-sync">The hard part: sync</h2>

<p>Offline + multi-device + collaboration creates a distributed systems problem. When two devices edit the same document without talking to each other, how do you merge the results?</p>

<p>The promising answer is <strong>CRDTs</strong> — Conflict-free Replicated Data Types. They’re data structures designed to merge in a way that’s always consistent, without requiring coordination.</p>

<p>The tradeoff: CRDTs work well for text and counters, but get complicated for structured data with complex relationships.</p>
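<p>The simplest CRDT shows why coordination-free merging is possible at all: a grow-only counter where each replica increments only its own slot, and merging takes the per-replica maximum. A minimal sketch:</p>

```go
package main

import "fmt"

// GCounter is a grow-only counter CRDT: each replica increments only its own
// slot, and merge takes the per-replica maximum. Because merge is commutative,
// associative, and idempotent, replicas converge in any exchange order.
type GCounter map[string]int

func (g GCounter) Inc(replica string) { g[replica]++ }

func (g GCounter) Value() int {
    total := 0
    for _, v := range g {
        total += v
    }
    return total
}

// Merge folds other into g, keeping the larger count per replica.
func (g GCounter) Merge(other GCounter) {
    for id, v := range other {
        if v > g[id] {
            g[id] = v
        }
    }
}

func main() {
    a := GCounter{} // replica A, edited offline
    b := GCounter{} // replica B, edited offline
    a.Inc("A")
    a.Inc("A")
    b.Inc("B")
    a.Merge(b)
    b.Merge(a)
    fmt.Println(a.Value(), b.Value()) // both converge to 3
}
```

<p>Counters are the easy case; text CRDTs apply the same merge discipline to sequences of characters, which is where the real difficulty lives.</p>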

<h2 id="where-its-going">Where it’s going</h2>

<p>Tools like Obsidian, Logseq, and Linear (in parts) lean local-first. The <a href="https://automerge.org/">Automerge</a> library makes CRDTs accessible for app developers. SQLite-based sync tools like libSQL are exploring the same space for databases.</p>

<p>It’s not solved. But the gap between “cloud app” and “local app” is narrowing in the right direction.</p>

<hr />

<p>I don’t think local-first will replace everything. Some things genuinely need a central server. But for the apps where you’d be upset if the company shut down — your notes, your writing, your finances — it’s worth asking whether the data should live on <em>your</em> machine first.</p>]]></content><author><name>Aarsh</name></author><category term="software" /><category term="design" /><summary type="html"><![CDATA[The cloud made software better in a lot of ways. Sync across devices. Collaboration. No data loss when your laptop dies.]]></summary></entry></feed>