
My Remix App is Talking Back: Building a Local AI Assistant That Actually Understands My Codebase

A deep dive into using Ollama and the Vercel AI SDK within Remix to create a self-documenting, context-aware developer tool that stays strictly on your machine.


I have a confession to make: I forget how my own code works roughly forty-five minutes after I write it. It's a talent, really. I’ll spend three hours architecting a beautiful, complex state machine in a Remix loader, walk away to grab a coffee, and return to find a screen filled with symbols that look like they were written by a caffeinated squirrel.

Usually, the solution is to copy-paste the mess into a browser tab, pray that I don't accidentally leak proprietary logic to a corporate LLM, and ask, "What on earth does this do?"

But last week, I got tired of the context switching. I wanted my app to explain itself _to me_, inside the dev environment, without my code ever leaving my MacBook. So, I decided to build a local AI assistant directly into my Remix project using Ollama and the Vercel AI SDK.

The result? My app is officially talking back. And honestly? It’s smarter than I am on Monday mornings.

Why Go Local?

Before we dive into the npm install madness, let's talk about why you'd bother with local AI.

  1. Privacy: Your code stays on your disk. Period. No "training on your data" shenanigans.
  2. Cost: Ollama is free. The only price you pay is in fan noise and battery life.
  3. Latency: If you're on a plane or stuck with spotty train Wi-Fi, you can still get your "AI fix" without waiting on a cloud round-trip.
  4. The "Cool" Factor: There's something inherently satisfying about seeing your GPU usage spike because your computer is thinking about your code.

The Stack

For this build, we’re using:

  • Remix: Our favorite full-stack web framework.
  • Ollama: The engine that runs LLMs (like Llama 3 or Mistral) locally.
  • Vercel AI SDK: The glue that makes streaming responses and chat state management feel like a breeze.

Step 1: Getting the Engine Running

If you haven't installed Ollama yet, go do that now. Once it's installed, pull a model. I've been using deepseek-coder or llama3 lately for development tasks.

ollama pull llama3

Verify it's running by hitting http://localhost:11434. If you get an "Ollama is running" message, you're golden.
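
If you'd rather verify from code than from the browser, a quick sanity check could look like the sketch below. It assumes the default port; /api/tags is Ollama's endpoint for listing the models you've pulled. The script name and location are my own choice.

// scripts/check-ollama.ts
// Run with something like: npx tsx scripts/check-ollama.ts
const res = await fetch('http://localhost:11434/api/tags')

if (!res.ok) {
  throw new Error(`Ollama answered with ${res.status} - is the server running?`)
}

// /api/tags returns the models currently available locally.
const { models } = (await res.json()) as { models: { name: string }[] }
console.log('Available models:', models.map((m) => m.name).join(', '))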

Step 2: The Remix Backend (The Action)

Remix handles server-side logic beautifully. We need an Action to handle the incoming chat prompts. The Vercel AI SDK has a provider for Ollama, which makes this almost too easy.

First, install the dependencies:

npm install ai @ai-sdk/ollama

Now, let's create a resource route. I put mine in app/routes/api.chat.ts. This keeps the AI logic separate from my UI.

// app/routes/api.chat.ts
import { createOllama } from '@ai-sdk/ollama'
import { ActionFunctionArgs } from '@remix-run/node'
import { streamText } from 'ai'

const ollama = createOllama({
  baseURL: 'http://localhost:11434/api',
})

export async function action({ request }: ActionFunctionArgs) {
  const { messages, context } = await request.json()

  // Here is the "secret sauce": we inject codebase context
  // into the system prompt.
  const systemPrompt = `
    You are a helpful assistant integrated into a Remix project. 
    You have access to the following project context:
    ${context}
    
    Answer questions concisely and provide code snippets where helpful.
  `

  const result = await streamText({
    model: ollama('llama3'),
    system: systemPrompt,
    messages,
  })

  return result.toDataStreamResponse()
}

Note: the context variable is where we'll feed the LLM information about our files. Without it, this is just a generic chatbot. With it, it's a collaborator.

Step 3: Feeding the Beast (Context Injection)

An AI is only as good as the context you give it. If I ask "Why is my loader failing?", the AI needs to see the loader.

In a real-world scenario, you might use a Vector DB or a RAG (Retrieval-Augmented Generation) setup. But for a local dev tool? We can be a bit more "scrappy." I wrote a small utility function that reads the relevant files in my app/routes directory and turns them into a string.

// utils/get-context.ts
import fs from 'fs/promises'
import path from 'path'

export async function getCodebaseContext() {
  // For this demo, let's just grab the current route's code
  // In a real app, you might use a file-walker or focus on specific files
  const routePath = path.join(process.cwd(), 'app/routes/_index.tsx')
  const content = await fs.readFile(routePath, 'utf-8')

  return `Current File (_index.tsx):\n\`\`\`tsx\n${content}\n\`\`\``
}
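
The chat UI in the next step fetches this context over HTTP, so it also needs a small resource route to expose it. Here's a minimal sketch; the app/routes/api.get-context.ts path and the relative import back to the utility are my choices, not something the post prescribes.

// app/routes/api.get-context.ts
import { json } from '@remix-run/node'

// Adjust this import to wherever you keep the utility from Step 3.
import { getCodebaseContext } from '../../utils/get-context'

export async function loader() {
  // Re-read the file on every request so "Sync Code" always gets fresh context.
  const context = await getCodebaseContext()
  return json({ context })
}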

Step 4: The UI (The Chat Interface)

Over in my main route, I want a slick floating chat bubble. The useChat hook from the Vercel AI SDK handles the message state, loading states, and the actual streaming.

// app/routes/_index.tsx
import { useState } from 'react'
import { useChat } from 'ai/react'

export default function Index() {
  const [context, setContext] = useState('No context loaded')

  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({
      api: '/api/chat',
      body: {
        context: context, // We send our codebase snippet here
      },
    })

  // A quick way to refresh the context before asking
  const refreshContext = async () => {
    // You could call a separate loader here to get the file strings
    const res = await fetch('/api/get-context')
    const data = await res.json()
    setContext(data.context)
  }

  return (
    <div className="p-8 font-sans">
      <h1 className="text-3xl font-bold">My Awesome Remix App</h1>

      {/* The Chat UI */}
      <div className="fixed bottom-4 right-4 w-96 bg-white border rounded-xl shadow-2xl flex flex-col h-[500px]">
        <div className="p-4 border-b bg-gray-50 flex justify-between">
          <span className="font-semibold text-sm">
            Local Assistant (Llama 3)
          </span>
          <button onClick={refreshContext} className="text-xs text-blue-500">
            Sync Code
          </button>
        </div>

        <div className="flex-1 overflow-y-auto p-4 space-y-4">
          {messages.map((m) => (
            <div
              key={m.id}
              className={`p-2 rounded ${m.role === 'user' ? 'bg-blue-100 ml-8' : 'bg-gray-100 mr-8'}`}
            >
              <p className="text-sm font-bold capitalize">{m.role}</p>
              <p className="text-sm">{m.content}</p>
            </div>
          ))}
        </div>

        <form onSubmit={handleSubmit} className="p-4 border-t">
          <input
            className="w-full p-2 border rounded-md"
            value={input}
            placeholder="Ask about this page..."
            onChange={handleInputChange}
            disabled={isLoading}
          />
        </form>
      </div>
    </div>
  )
}

The "Oh No" Moments (Lessons Learned)

It wasn't all sunshine and rainbows. Building this, I ran into a few things that might trip you up:

  1. Token Limits are Real: Even local models have limits. If you try to feed your entire node_modules folder into the system prompt, Ollama will likely just cry (or return nonsense). Be selective about the context you send; I've sketched a crude guard for this after the list.
  2. Streaming Quirks: Sometimes the stream would stutter. I realized my MacBook was trying to throttle the GPU to save battery. Plugging into power suddenly made the AI "think" 2x faster.
  3. Prompt Engineering: I initially forgot to tell the AI it was in a _Remix_ project. It kept suggesting Express.js middleware solutions that made no sense in a Remix loader. Be specific in your system prompt!
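
If you want a cheap guardrail for that first point, you can clamp the context string before it ever reaches the system prompt. This is a rough sketch: the helper name and the 12,000-character budget are arbitrary picks of mine, not a measured limit for any particular model.

// utils/clamp-context.ts
// Crude token-limit guard: cap the context by character count.
// maxChars is an arbitrary default; tune it for the model you're running.
export function clampContext(context: string, maxChars = 12_000): string {
  if (context.length <= maxChars) return context
  return context.slice(0, maxChars) + '\n\n[...context truncated...]'
}

Characters are a blunt proxy for tokens, but for a local dev tool it's enough to keep the prompt from quietly ballooning.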

So, Is It Worth It?

Honestly? Yes. Having a little sidecar chat that knows exactly what loader I'm working on has saved me dozens of trips to the documentation.

There's something incredibly empowering about owning the entire stack—from the UI framework down to the weights of the AI model. No API keys, no monthly subscriptions to a dozen different "AI for dev" tools, and no data leaving my machine.

Now, if I can just get it to write my unit tests while I go get that second coffee, I’ll be truly living in the future.

_Have you experimented with local LLMs in your dev workflow? I’d love to hear how you’re handling context—hit me up on the socials and let's compare notes!_