Two ways to use this template
- 1. Click "Copy prompt" below
- 2. Paste into Cursor, Claude Code, Codex, or any coding agent
- 3. Your agent builds the app — it asks questions along the way so the result is exactly what you want
Follow the steps below to set things up manually, at your own pace.
Lakebase pgvector
Enable vector similarity search in Lakebase using the pgvector extension. Covers extension setup, vector table design, insert and cosine retrieval helpers, and IVFFlat/HNSW index options.
Enable vector similarity search in your Lakebase Postgres database using the pgvector extension, with a server-side module for storing and querying embeddings.
When done, you will have:
- The pgvector extension enabled on your Lakebase instance
- A vector embedding table with configurable dimensions
- Server-side functions for inserting documents and performing similarity search
- An IVFFlat or HNSW index for efficient nearest-neighbor queries
Prerequisites
Verify these Databricks workspace features are enabled before starting. If any check fails, ask your workspace admin to enable the feature.
- Databricks CLI authenticated. Run
databricks auth profilesand confirm at least one profile showsValid: YES. If none do, authenticate withdatabricks auth login --host <workspace-url> --profile <PROFILE>. - Lakebase Postgres available. Run
databricks postgres list-projects --profile <PROFILE>and confirm the command succeeds. Anot enablederror means Lakebase is not available to this identity. - A provisioned Lakebase project. Complete the Create a Lakebase Instance template first. You will enable the
vectorextension against its primary endpoint. databricks psqlavailable in your CLI. Rundatabricks psql --helpand confirm the subcommand exists. If it does not, upgrade the Databricks CLI (see Set Up Your Local Dev Environment).
Vector Search with Lakebase and pgvector
Enable vector similarity search in Lakebase using the pgvector extension.
This template assumes you have already completed the Create a Lakebase Instance template and have a Lakebase project provisioned.
1. Enable pgvector
databricks psql --project <project-name> --profile <PROFILE> -- -c "
CREATE EXTENSION IF NOT EXISTS vector;
"
2. Create embedding table
CREATE SCHEMA IF NOT EXISTS vectors;
CREATE TABLE IF NOT EXISTS vectors.documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding VECTOR(1024),
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Vector dimensions:
VECTOR(1024)must match your embedding model's output dimension.databricks-gte-large-enanddatabricks-bge-large-enboth produce 1024 dimensions. If you use a different model (for example, a 768- or 1536-dimension model), changeVECTOR(1024)to match.
3. Server-side vector store module
Create server/lib/vector-store.ts with table setup, insert, and similarity search. Call setupVectorTables(appkit) from server.ts before starting the server.
server/lib/vector-store.ts:
import type { Application } from "express";
interface AppKitWithLakebase {
lakebase: {
query(
text: string,
params?: unknown[],
): Promise<{ rows: Record<string, unknown>[] }>;
};
server: {
extend(fn: (app: Application) => void): void;
};
}
export async function setupVectorTables(appkit: AppKitWithLakebase) {
try {
await appkit.lakebase.query("CREATE EXTENSION IF NOT EXISTS vector");
} catch (err: unknown) {
const code = (err as { code?: string }).code;
if (code === "42501") {
console.log(
"[vectors] Skipping extension creation — insufficient privileges (likely already exists)",
);
} else {
throw err;
}
}
const { rows } = await appkit.lakebase.query(
`SELECT 1 FROM information_schema.tables
WHERE table_schema = 'vectors' AND table_name = 'documents'`,
);
if (rows.length > 0) return;
await appkit.lakebase.query(`CREATE SCHEMA IF NOT EXISTS vectors`);
await appkit.lakebase.query(`
CREATE TABLE IF NOT EXISTS vectors.documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding VECTOR(1024),
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
)
`);
}
export async function insertDocument(
appkit: AppKitWithLakebase,
input: {
content: string;
embedding: number[];
metadata?: Record<string, unknown>;
},
) {
const result = await appkit.lakebase.query(
`INSERT INTO vectors.documents (content, embedding, metadata)
VALUES ($1, $2::vector, $3)
RETURNING id, content, metadata, created_at`,
[
input.content,
JSON.stringify(input.embedding),
JSON.stringify(input.metadata ?? {}),
],
);
return result.rows[0];
}
export async function retrieveSimilar(
appkit: AppKitWithLakebase,
queryEmbedding: number[],
limit = 5,
) {
const result = await appkit.lakebase.query(
`SELECT id, content, metadata, 1 - (embedding <=> $1::vector) AS similarity
FROM vectors.documents
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1::vector
LIMIT $2`,
[JSON.stringify(queryEmbedding), limit],
);
return result.rows;
}
Distance operators:
<=>cosine (default for text),<->L2,<#>inner product.
4. Create an index
Add after inserting initial data (IVFFlat needs representative data to build):
CREATE INDEX IF NOT EXISTS idx_documents_embedding
ON vectors.documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
ANALYZE vectors.documents;
For higher recall without tuning, use
USING hnsw (embedding vector_cosine_ops)instead.