Next.js 14 + Supabase pgvector + Claude AI. Build production-ready AI chatbots with custom knowledge bases in under a day.
At a glance:
- Build time: 4-6 hours
- Monthly cost: $10-25
- Lines of code: ~500
- Dependencies: 5
RAG (Retrieval-Augmented Generation) is a technique that combines the power of large language models with your own custom knowledge base. Instead of relying solely on what the AI was trained on, RAG retrieves relevant information from your documents and includes it in the prompt.
This means you can build chatbots that answer questions about your specific domain - whether that's company policies, product documentation, legal guidelines, or tax regulations.
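Conceptually, every request follows the same three steps. Here is a minimal sketch of that shape; the helper signatures are placeholders, and the real implementations appear later in this guide:

// Conceptual shape of a RAG request (illustrative sketch, not the final code)
type Chunk = { content: string; similarity: number };

async function answerWithRag(
  question: string,
  embed: (text: string) => Promise<number[]>,
  search: (embedding: number[], k: number) => Promise<Chunk[]>,
  generate: (system: string, user: string) => Promise<string>
): Promise<string> {
  const queryEmbedding = await embed(question);    // 1. embed the question
  const chunks = await search(queryEmbedding, 6);  // 2. retrieve the closest chunks
  const context = chunks.map(c => c.content).join('\n\n---\n\n');
  const system = `Answer using the following context:\n${context}`;
  return generate(system, question);               // 3. generate a grounded answer
}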
Real Example: We built SK TaxGPT in a day - a chatbot that answers Saskatchewan small business tax questions using 17 custom knowledge documents with 70+ chunks of tax information.
RAG vs. Fine-Tuning: for most use cases, RAG is the better choice. Fine-tuning is only worth it when you need to fundamentally change the model's behavior or tone.
The request pipeline: User Question → Chat UI → Generate Embedding (all-MiniLM-L6-v2) → Vector Search (Supabase pgvector) → Retrieve Context (top 6 chunks) → Claude AI Response (streaming).
The stack:
- Next.js 14: App Router, API Routes, Server Components
- Supabase: PostgreSQL + pgvector extension
- Vercel AI SDK: useChat hook, streaming responses
- Claude AI: Claude Sonnet for intelligent responses
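That stack comes down to five npm packages (the five dependencies counted above). A minimal setup sketch; pin versions as you see fit. The Supabase env var names match what the code below expects, and ANTHROPIC_API_KEY is the variable the @ai-sdk/anthropic provider reads by default:

npm install ai @ai-sdk/anthropic @ai-sdk/react @supabase/supabase-js @xenova/transformers

# .env.local
SUPABASE_URL=your-project-url
SUPABASE_SERVICE_KEY=your-service-role-key
ANTHROPIC_API_KEY=your-anthropic-key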
First, create a Supabase project and enable the pgvector extension. This gives you a PostgreSQL database with vector similarity search capabilities.
-- Run in Supabase SQL Editor
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE tax_documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding vector(384), -- MiniLM dimensions
metadata JSONB DEFAULT '{}',
province TEXT DEFAULT 'SK',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create index for fast similarity search
CREATE INDEX ON tax_documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Vector-similarity search function, callable from the app via supabase.rpc()
CREATE OR REPLACE FUNCTION match_tax_documents(
query_embedding vector(384),
match_threshold float DEFAULT 0.3,
match_count int DEFAULT 6,
filter_province text DEFAULT NULL
)
RETURNS TABLE (
id UUID,
content TEXT,
metadata JSONB,
similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
tax_documents.id,
tax_documents.content,
tax_documents.metadata,
1 - (tax_documents.embedding <=> query_embedding) AS similarity
FROM tax_documents
WHERE 1 - (tax_documents.embedding <=> query_embedding) > match_threshold
AND (filter_province IS NULL OR tax_documents.province = filter_province)
ORDER BY similarity DESC
LIMIT match_count;
END;
$$;

Write markdown files containing your domain knowledge. Structure them with clear headers and sections; this helps the chunking step preserve context.
# Home Office Deduction Guide

## Overview
If you work from home, you may be able to deduct a portion of your home expenses...

## Eligibility Requirements
You can claim home office expenses if either:
1. Your home is your principal place of business
2. You regularly meet clients at your home

## Deductible Expenses
- Heat, electricity, water
- Home insurance
- Property taxes
- Internet (business portion)
- Mortgage interest (NOT principal)

## Calculation Method
Square footage of office / Total home square footage × Eligible expenses = Deduction

## Sources
- CRA Folio S4-F2-C2
- Form T2125
The quality of your RAG system is directly tied to your knowledge documents. Write clear, well-structured content with authoritative sources. Include specific numbers, rules, and examples. The more comprehensive your documents, the better your chatbot answers.
Create a script that reads your markdown files, chunks them into smaller pieces, generates embeddings, and stores them in Supabase.
import { createClient } from '@supabase/supabase-js';
import { pipeline } from '@xenova/transformers';
import * as fs from 'fs';
import * as path from 'path';
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_KEY!
);
// Load embedding model
const embedder = await pipeline(
'feature-extraction',
'Xenova/all-MiniLM-L6-v2'
);
async function generateEmbedding(text: string) {
const output = await embedder(text, {
pooling: 'mean',
normalize: true
});
return Array.from(output.data);
}
// Simple chunking function
function chunkText(text: string, maxChars = 1500) {
const sections = text.split(/\n## /);
const chunks: string[] = [];
for (const section of sections) {
if (section.length <= maxChars) {
chunks.push(section);
} else {
// Split long sections by paragraphs
const paragraphs = section.split('\n\n');
let current = '';
for (const para of paragraphs) {
if ((current + para).length > maxChars) {
if (current) chunks.push(current);
current = para;
} else {
current += (current ? '\n\n' : '') + para;
}
}
if (current) chunks.push(current);
}
}
return chunks;
}
async function ingestFile(filePath: string) {
const content = fs.readFileSync(filePath, 'utf-8');
const fileName = path.basename(filePath, '.md');
const chunks = chunkText(content);
for (const chunk of chunks) {
const embedding = await generateEmbedding(chunk);
    const { error } = await supabase.from('tax_documents').insert({
      content: chunk,
      embedding,
      metadata: {
        source: fileName,
        title: fileName.replace(/-/g, ' ')
      }
    });
    if (error) throw error; // surface insert failures instead of silently dropping chunks
}
console.log(`Ingested ${chunks.length} chunks from ${fileName}`);
}
// Run ingestion
const dataDir = './app/taxgpt/data';
const files = fs.readdirSync(dataDir, { recursive: true })
.filter(f => f.toString().endsWith('.md'));
for (const file of files) {
await ingestFile(path.join(dataDir, file.toString()));
}

We use @xenova/transformers with all-MiniLM-L6-v2 for embeddings. This runs locally with no API costs. The trade-off is 384 dimensions instead of OpenAI's 1,536, but for most RAG use cases the quality is excellent and retrieval accuracy is comparable.
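Before wiring up the API route, it helps to smoke-test retrieval directly. A minimal sketch (the sample question is illustrative); run it the same way as the ingestion script, e.g. with npx tsx:

import { createClient } from '@supabase/supabase-js';
import { pipeline } from '@xenova/transformers';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

// Embed a test question with the same model used at ingestion
const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2'
);
const output = await embedder('Can I deduct my home internet bill?', {
  pooling: 'mean',
  normalize: true
});

// Call the match_tax_documents function from Step 1
const { data, error } = await supabase.rpc('match_tax_documents', {
  query_embedding: Array.from(output.data),
  match_threshold: 0.3,
  match_count: 6
});
if (error) throw error;

for (const doc of data ?? []) {
  console.log(doc.similarity.toFixed(3), doc.metadata?.source);
}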
The API route is where the magic happens: receive the user's question, generate an embedding, search for relevant chunks, build a context-enriched prompt, and stream the AI response.
import { anthropic } from '@ai-sdk/anthropic';
import { streamText, convertToModelMessages, UIMessage } from 'ai';
import { createClient } from '@supabase/supabase-js';
import { pipeline } from '@xenova/transformers';
export const runtime = 'nodejs';
export const maxDuration = 60;
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_KEY!
);
// Singleton embedder
let embedder: any = null;
async function getEmbedder() {
if (!embedder) {
embedder = await pipeline(
'feature-extraction',
'Xenova/all-MiniLM-L6-v2'
);
}
return embedder;
}
export async function POST(req: Request) {
const { messages }: { messages: UIMessage[] } = await req.json();
// Get last message text
const lastMessage = messages[messages.length - 1];
const query = lastMessage.parts
?.filter(p => p.type === 'text')
.map(p => p.text)
.join('') || '';
// 1. Generate embedding for user question
const embed = await getEmbedder();
const output = await embed(query, {
pooling: 'mean',
normalize: true
});
const embedding = Array.from(output.data);
// 2. Search for relevant chunks
const { data: docs } = await supabase.rpc('match_tax_documents', {
query_embedding: embedding,
match_threshold: 0.3,
match_count: 6
});
// 3. Build context
const context = docs
?.map(d => `[Source: ${d.metadata?.source}]\n${d.content}`)
.join('\n\n---\n\n') || '';
// 4. System prompt with context
const systemPrompt = `You are a helpful assistant.
## Retrieved Documents
${context || 'No relevant documents found.'}
Answer based on the retrieved documents. If uncertain,
say so and recommend consulting a professional.`;
// 5. Stream response
const result = streamText({
model: anthropic('claude-sonnet-4-20250514'),
system: systemPrompt,
messages: await convertToModelMessages(messages),
});
return result.toUIMessageStreamResponse();
}

Use the Vercel AI SDK's useChat hook for a smooth chat experience with streaming responses.
'use client';
import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport } from 'ai';
import { useState } from 'react';
export default function ChatPage() {
const { messages, sendMessage, status } = useChat({
transport: new DefaultChatTransport({
api: '/api/chat',
}),
});
const [input, setInput] = useState('');
const isLoading = status === 'streaming';
const handleSubmit = (e: React.FormEvent) => {
e.preventDefault();
if (!input.trim() || isLoading) return;
sendMessage({ text: input.trim() });
setInput('');
};
// Extract text from message parts (AI SDK v6)
const getContent = (msg: typeof messages[0]) => {
return msg.parts
.filter(p => p.type === 'text')
.map(p => p.text)
.join('');
};
return (
<div className="max-w-2xl mx-auto p-4">
<div className="space-y-4 mb-4">
{messages.map(m => (
<div
key={m.id}
className={`p-3 rounded ${
m.role === 'user'
? 'bg-blue-100 ml-auto'
: 'bg-gray-100'
}`}
>
{getContent(m)}
</div>
))}
</div>
<form onSubmit={handleSubmit} className="flex gap-2">
<input
value={input}
onChange={e => setInput(e.target.value)}
placeholder="Ask a question..."
className="flex-1 p-2 border rounded"
disabled={isLoading}
/>
<button
type="submit"
disabled={isLoading}
className="px-4 py-2 bg-blue-600 text-white rounded"
>
Send
</button>
</form>
</div>
);
}

Monthly cost breakdown:

- Supabase: $0 (free tier: 500 MB + 50K requests)
- Vercel: $0 (Hobby tier for personal projects)
- Embeddings: $0 (local with Transformers.js)
- Claude API: $5-20 (based on usage, ~$3/M input tokens)

Total: $10-25/month for a production RAG chatbot with a custom knowledge base and streaming AI responses.
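As a rough worked example (the traffic numbers are assumptions for illustration): at ~$3 per million input tokens, 1,000 questions a month with ~2,000 tokens of retrieved context and history per request works out to about 2M input tokens, or roughly $6, with output tokens billed separately at a higher per-token rate.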
We built SK TaxGPT in a single day. Custom AI chatbots for your business starting at $3,000.