Complete Guide

How to Build a RAG Chatbot

Next.js 14 + Supabase pgvector + Claude AI. Build production-ready AI chatbots with custom knowledge bases in under a day.

Build Time: 4-6 hrs · Monthly Cost: $10-25 · Lines of Code: ~500 · Dependencies: 5

Understanding RAG

What is Retrieval-Augmented Generation?

RAG (Retrieval-Augmented Generation) is a technique that combines the power of large language models with your own custom knowledge base. Instead of relying solely on what the AI was trained on, RAG retrieves relevant information from your documents and includes it in the prompt.

This means you can build chatbots that answer questions about your specific domain - whether that's company policies, product documentation, legal guidelines, or tax regulations.
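In code, the whole RAG loop is just three steps: retrieve, augment, generate. Here is a minimal sketch using the same model and SDK this guide builds on; the retrieve parameter is a hypothetical stand-in for the vector search you'll wire up in Steps 1-4:

import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

async function answerWithRag(
  question: string,
  retrieve: (q: string) => Promise<string[]>  // hypothetical: your vector search
) {
  // 1. Retrieve: pull the most relevant chunks from your knowledge base.
  const chunks = await retrieve(question);

  // 2. Augment: inject the retrieved text into the prompt.
  // 3. Generate: the model answers grounded in your documents.
  const { text } = await generateText({
    model: anthropic('claude-sonnet-4-20250514'),
    system: `Answer using only this context:\n\n${chunks.join('\n\n')}`,
    prompt: question,
  });
  return text;
}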

Real Example: We built SK TaxGPT in a day - a chatbot that answers Saskatchewan small business tax questions using 17 custom knowledge documents with 70+ chunks of tax information.

RAG vs Fine-Tuning

RAG

- Easy to update knowledge
- No training costs
- Works in hours
- Cites sources

Fine-Tuning

- Expensive to update
- Requires training
- Takes days/weeks
- No source tracking

For most use cases, RAG is the better choice. Fine-tuning is only worth it when you need to change the model's behavior or tone fundamentally.

System Design

RAG Architecture Overview

User Question → Chat UI → Generate Embedding (all-MiniLM-L6-v2) → Vector Search (Supabase pgvector) → Retrieve Context (top 6 chunks) → Claude AI Response (streaming)

- Next.js 14: App Router, API Routes, Server Components
- Supabase: PostgreSQL + pgvector extension
- Vercel AI SDK: useChat hook, streaming responses
- Claude AI: Claude Sonnet for intelligent responses
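
Five npm packages cover the entire stack (the same five counted in the stats above):

npm install ai @ai-sdk/anthropic @ai-sdk/react @supabase/supabase-js @xenova/transformers

You'll also need SUPABASE_URL and SUPABASE_SERVICE_KEY in your environment (both used in the code below), plus an Anthropic key; the @ai-sdk/anthropic provider reads ANTHROPIC_API_KEY by default.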

Step 1

Set Up Supabase with pgvector

First, create a Supabase project and enable the pgvector extension. This gives you a PostgreSQL database with vector similarity search capabilities.

1. Enable pgvector Extension

-- Run in Supabase SQL Editor
CREATE EXTENSION IF NOT EXISTS vector;

2. Create Documents Table

CREATE TABLE tax_documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  embedding vector(384),  -- MiniLM dimensions
  metadata JSONB DEFAULT '{}',
  province TEXT DEFAULT 'SK',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create index for fast similarity search
CREATE INDEX ON tax_documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
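
Note: ivfflat builds its clusters from whatever rows exist at index creation time, so recall is better if you create this index after ingesting your documents in Step 3. pgvector's rule of thumb is lists ≈ rows / 1000 for tables under a million rows, so 100 is a sensible default at this scale.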

3. Create Search Function

CREATE OR REPLACE FUNCTION match_tax_documents(
  query_embedding vector(384),
  match_threshold float DEFAULT 0.3,
  match_count int DEFAULT 6,
  filter_province text DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  metadata JSONB,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    tax_documents.id,
    tax_documents.content,
    tax_documents.metadata,
    1 - (tax_documents.embedding <=> query_embedding) AS similarity
  FROM tax_documents
  WHERE 1 - (tax_documents.embedding <=> query_embedding) > match_threshold
    AND (filter_province IS NULL OR tax_documents.province = filter_province)
  -- order by raw distance: equivalent to similarity DESC, but avoids the
  -- plpgsql ambiguity between the output column and the alias, and lets
  -- the ivfflat index be used
  ORDER BY tax_documents.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
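
Once the function exists, you can smoke-test the RPC wiring from a throwaway script (run with npx tsx, for example). The random vector won't match anything meaningful; before Step 3 ingests data it simply returns an empty list without erroring:

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

// Random 384-dim vector: enough to prove the function runs end to end.
const fake = Array.from({ length: 384 }, () => Math.random());

const { data, error } = await supabase.rpc('match_tax_documents', {
  query_embedding: fake,
  match_threshold: -1, // accept everything for the smoke test
  match_count: 3,
});
console.log(error ?? data);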
Step 2

Create Knowledge Documents

Write markdown files containing your domain knowledge. Structure them with clear headers and sections - this helps the chunking process preserve context.

Example Document Structure

# Home Office Deduction Guide

## Overview
If you work from home, you may be able to deduct
a portion of your home expenses...

## Eligibility Requirements
You can claim home office expenses if either:
1. Your home is your principal place of business
2. You regularly meet clients at your home

## Deductible Expenses
- Heat, electricity, water
- Home insurance
- Property taxes
- Internet (business portion)
- Mortgage interest (NOT principal)

## Calculation Method
Square footage of office / Total home square footage
× Eligible expenses = Deduction

## Sources
- CRA Folio S4-F2-C2
- Form T2125

Pro Tip: Document Quality Matters

The quality of your RAG system is directly tied to your knowledge documents. Write clear, well-structured content with authoritative sources. Include specific numbers, rules, and examples. The more comprehensive your documents, the better your chatbot answers.

Step 3

Build the Ingestion Script

Create a script that reads your markdown files, chunks them into smaller pieces, generates embeddings, and stores them in Supabase.

scripts/ingest-docs.ts

import { createClient } from '@supabase/supabase-js';
import { pipeline } from '@xenova/transformers';
import * as fs from 'fs';
import * as path from 'path';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

// Load embedding model
const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2'
);

async function generateEmbedding(text: string) {
  const output = await embedder(text, {
    pooling: 'mean',
    normalize: true
  });
  return Array.from(output.data);
}

// Simple chunking function
function chunkText(text: string, maxChars = 1500) {
  // Split before each H2; the lookahead keeps the "## " prefix with its section
  const sections = text.split(/\n(?=## )/);
  const chunks: string[] = [];

  for (const section of sections) {
    if (section.length <= maxChars) {
      chunks.push(section);
    } else {
      // Split long sections by paragraphs
      const paragraphs = section.split('\n\n');
      let current = '';
      for (const para of paragraphs) {
        if ((current + para).length > maxChars) {
          if (current) chunks.push(current);
          current = para;
        } else {
          current += (current ? '\n\n' : '') + para;
        }
      }
      if (current) chunks.push(current);
    }
  }
  return chunks;
}

async function ingestFile(filePath: string) {
  const content = fs.readFileSync(filePath, 'utf-8');
  const fileName = path.basename(filePath, '.md');
  const chunks = chunkText(content);

  for (const chunk of chunks) {
    const embedding = await generateEmbedding(chunk);

    await supabase.from('tax_documents').insert({
      content: chunk,
      embedding,
      metadata: {
        source: fileName,
        title: fileName.replace(/-/g, ' ')
      }
    });
  }

  console.log(`Ingested ${chunks.length} chunks from ${fileName}`);
}

// Run ingestion
const dataDir = './app/taxgpt/data';
const files = fs.readdirSync(dataDir, { recursive: true })
  .filter(f => f.toString().endsWith('.md'));

for (const file of files) {
  await ingestFile(path.join(dataDir, file.toString()));
}
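
The script uses top-level await, so run it with a loader that supports ESM out of the box, e.g. npx tsx scripts/ingest-docs.ts. One caveat visible in the code: inserts are unconditional, so re-running the script duplicates rows. Clear the table (DELETE FROM tax_documents) before re-ingesting.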

Why Local Embeddings?

We use @xenova/transformers with all-MiniLM-L6-v2 for embeddings. This runs locally with no API costs or network round trips. The trade-off is 384 dimensions instead of OpenAI's 1536, but for most RAG use cases retrieval quality is comparable.
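
You can sanity-check the model's quality yourself in a few lines. A standalone sketch: because we pass normalize: true, the dot product of two embeddings is their cosine similarity, so related sentences should score visibly higher than unrelated ones:

import { pipeline } from '@xenova/transformers';

const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function embed(text: string): Promise<number[]> {
  const out = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(out.data);
}

// Normalized vectors: dot product == cosine similarity.
const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);

const a = await embed('How do I deduct home office expenses?');
const b = await embed('Claiming a workspace-in-the-home deduction');
const c = await embed('What is the capital of France?');

console.log('related:', dot(a, b).toFixed(3));   // expect noticeably higher
console.log('unrelated:', dot(a, c).toFixed(3)); // expect lower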

Step 4

Create the API Route

The API route is where the magic happens: receive the user's question, generate an embedding, search for relevant chunks, build a context-enriched prompt, and stream the AI response.

app/api/chat/route.ts

import { anthropic } from '@ai-sdk/anthropic';
import { streamText, convertToModelMessages, UIMessage } from 'ai';
import { createClient } from '@supabase/supabase-js';
import { pipeline } from '@xenova/transformers';

export const runtime = 'nodejs';
export const maxDuration = 60;

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

// Singleton embedder
let embedder: any = null;
async function getEmbedder() {
  if (!embedder) {
    embedder = await pipeline(
      'feature-extraction',
      'Xenova/all-MiniLM-L6-v2'
    );
  }
  return embedder;
}

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Get last message text
  const lastMessage = messages[messages.length - 1];
  const query = lastMessage.parts
    ?.filter(p => p.type === 'text')
    .map(p => p.text)
    .join('') || '';

  // 1. Generate embedding for user question
  const embed = await getEmbedder();
  const output = await embed(query, {
    pooling: 'mean',
    normalize: true
  });
  const embedding = Array.from(output.data);

  // 2. Search for relevant chunks
  const { data: docs } = await supabase.rpc('match_tax_documents', {
    query_embedding: embedding,
    match_threshold: 0.3,
    match_count: 6
  });

  // 3. Build context
  const context = docs
    ?.map(d => `[Source: ${d.metadata?.source}]\n${d.content}`)
    .join('\n\n---\n\n') || '';

  // 4. System prompt with context
  const systemPrompt = `You are a helpful assistant.

## Retrieved Documents
${context || 'No relevant documents found.'}

Answer based on the retrieved documents. If uncertain,
say so and recommend consulting a professional.`;

  // 5. Stream response
  const result = streamText({
    model: anthropic('claude-sonnet-4-20250514'),
    system: systemPrompt,
    messages: await convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}
Step 5

Build the Chat UI

Use the Vercel AI SDK's useChat hook for a smooth chat experience with streaming responses.

app/chat/page.tsx (Simplified)

'use client';

import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';
import { useState } from 'react';

export default function ChatPage() {
  const { messages, sendMessage, status } = useChat({
    transport: new DefaultChatTransport({
      api: '/api/chat',
    }),
  });
  const [input, setInput] = useState('');
  // 'submitted' covers the gap before the first token streams back
  const isLoading = status === 'submitted' || status === 'streaming';

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isLoading) return;
    sendMessage({ text: input.trim() });
    setInput('');
  };

  // Extract text from message parts (AI SDK v6)
  const getContent = (msg: typeof messages[0]) => {
    return msg.parts
      .filter(p => p.type === 'text')
      .map(p => p.text)
      .join('');
  };

  return (
    <div className="max-w-2xl mx-auto p-4">
      <div className="space-y-4 mb-4">
        {messages.map(m => (
          <div
            key={m.id}
            className={`p-3 rounded ${
              m.role === 'user'
                ? 'bg-blue-100 ml-auto'
                : 'bg-gray-100'
            }`}
          >
            {getContent(m)}
          </div>
        ))}
      </div>

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          placeholder="Ask a question..."
          className="flex-1 p-2 border rounded"
          disabled={isLoading}
        />
        <button
          type="submit"
          disabled={isLoading}
          className="px-4 py-2 bg-blue-600 text-white rounded"
        >
          Send
        </button>
      </form>
    </div>
  );
}
Monthly Costs

Production Cost Breakdown

- Supabase: $0 (free tier: 500 MB + 50K requests)
- Vercel: $0 (Hobby tier for personal projects)
- Embeddings: $0 (local with Transformers.js)
- Claude API: $5-20 (based on usage; Sonnet is ~$3/M input and $15/M output tokens)

Total: $10-25/month

For a production RAG chatbot with custom knowledge base and streaming AI responses.
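
As a rough sanity check on the Claude line item, here is a back-of-envelope estimate. The traffic and token counts are assumptions for illustration; your real numbers will differ with context size and usage:

// Assumed: Sonnet at $3/M input and $15/M output tokens.
const chatsPerMonth = 1_000;
const inputTokensPerChat = 3_000;  // question + 6 retrieved chunks + history
const outputTokensPerChat = 500;

const inputCost = (chatsPerMonth * inputTokensPerChat / 1e6) * 3;
const outputCost = (chatsPerMonth * outputTokensPerChat / 1e6) * 15;

console.log(`~$${(inputCost + outputCost).toFixed(2)}/month`); // ~$16.50/month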

Want Us to Build Your RAG Chatbot?

We built SK TaxGPT in a single day. Custom AI chatbots for your business starting at $3,000.