Skip to main content

Lakebase plugin

Provides a PostgreSQL connection pool for Databricks Lakebase Autoscaling with automatic OAuth token refresh.

Key features:

  • Standard pg.Pool compatible with any PostgreSQL library or ORM
  • Automatic OAuth token refresh (1-hour tokens, 2-minute refresh buffer)
  • Token caching to minimize API calls
  • Built-in OpenTelemetry instrumentation (query duration, pool connections, token refresh)
  • AppKit logger configured by default for query and connection events

Getting started with the Lakebase

The easiest way to get started with the Lakebase plugin is to use the Databricks CLI to create a new Databricks app with AppKit installed and the Lakebase plugin.

Prerequisites

Steps

  1. Firstly, create a new Lakebase Postgres Autoscaling project according to the Get started documentation.
  2. To add the Lakebase plugin to your project, run the databricks apps init command and interactively select the Lakebase plugin. The CLI will guide you through picking a Lakebase project, branch, and database.
    • When asked, select Yes to deploy the app to Databricks Apps right after its creation.

Basic usage

import { createApp, lakebase, server } from "@databricks/appkit";

await createApp({
plugins: [server(), lakebase()],
});

Accessing the pool

After initialization, access Lakebase through the AppKit.lakebase object:

const AppKit = await createApp({
plugins: [server(), lakebase()],
});

await AppKit.lakebase.query(`CREATE SCHEMA IF NOT EXISTS app`);

await AppKit.lakebase.query(`CREATE TABLE IF NOT EXISTS app.orders (
id SERIAL PRIMARY KEY,
user_id VARCHAR(255) NOT NULL,
amount DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)`);

const result = await AppKit.lakebase.query(
"SELECT * FROM app.orders WHERE user_id = $1",
[userId],
);

// Raw pg.Pool (for ORMs or advanced usage)
const pool = AppKit.lakebase.pool;

// ORM-ready config objects
const ormConfig = AppKit.lakebase.getOrmConfig(); // { host, port, database, ... }
const pgConfig = AppKit.lakebase.getPgConfig(); // pg.PoolConfig

Configuration

Environment variables

The required environment variables are:

VariableDescription
LAKEBASE_ENDPOINTEndpoint resource path (e.g. projects/.../branches/.../endpoints/...)
PGHOSTLakebase host (auto-injected in production by the postgres Databricks Apps resource)
PGDATABASEDatabase name (auto-injected in production by the postgres Databricks Apps resource)
PGSSLMODETLS mode - set to require (auto-injected in production by the postgres Databricks Apps resource)

When deployed to Databricks Apps with a postgres database resource configured, PGHOST, PGDATABASE, PGSSLMODE, PGUSER, PGPORT, and PGAPPNAME are automatically injected by the platform. Only LAKEBASE_ENDPOINT must be set explicitly:

env:
- name: LAKEBASE_ENDPOINT
valueFrom: postgres

For local development, the .env file is automatically generated by databricks apps init with the correct values for your Lakebase project.

For the full configuration reference (SSL, pool size, timeouts, logging, ORM examples), see the @databricks/lakebase README.

Pool configuration

Pass a pool object to override any defaults:

await createApp({
plugins: [
lakebase({
pool: {
max: 10, // Max pool connections (default: 10)
connectionTimeoutMillis: 5000, // Connection timeout ms (default: 10000)
idleTimeoutMillis: 30000, // Idle connection timeout ms (default: 30000)
},
}),
],
});

On-Behalf-Of (OBO) — per-user connections

When your app needs Row-Level Security (RLS) or per-user data isolation, use asUser(req) to execute queries using a per-user Lakebase connection pool. Each user's pool is authenticated with their Databricks identity, so PostgreSQL's current_user reflects the actual user.

Prerequisites

  1. Enable user authorization in your Databricks App with the postgres scope. See User authorization for setup instructions. In your databricks.yml:

    resources:
    apps:
    app:
    user_api_scopes:
    - postgres

    Apps scaffolded with databricks apps init and the Lakebase plugin include this automatically.

  2. Each app user needs a Postgres role in Lakebase. Create one with the Databricks CLI:

    databricks postgres create-role "projects/{project_id}/branches/{branch_id}" \
    --json '{"spec": {"identity_type": "USER", "postgres_role": "user@example.com"}}'

    Alternatively, create roles in the Lakebase UI under Branch OverviewAdd role.

    note

    Do not grant databricks_superuser to OBO users — superusers bypass RLS. Use fine-grained grants instead.

Usage

No configuration needed — just call asUser(req):

const AppKit = await createApp({
plugins: [server(), lakebase()],
});

// Service principal query (default — bypasses RLS as table owner)
const all = await AppKit.lakebase.query("SELECT * FROM app.orders");

// User-scoped query (per-user pool, RLS enforced)
app.get("/api/my-orders", async (req, res) => {
const result = await AppKit.lakebase
.asUser(req)
.query("SELECT * FROM app.orders ORDER BY created_at DESC");
res.json(result.rows);
});

When asUser(req) is called:

  1. The user's token and identity are extracted from x-forwarded-access-token and x-forwarded-email headers (set automatically by Databricks Apps).
  2. A per-user pg.Pool is created (or reused) with the user's OAuth credentials.
  3. query() and pool use the user's pool — current_user in PostgreSQL reflects the user's identity.

Row-Level Security example

-- As the service principal (during app setup):
ALTER TABLE app.orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_orders ON app.orders
FOR ALL TO PUBLIC
USING (owner = current_user);

-- Grant access so OBO users can query
GRANT USAGE ON SCHEMA app TO PUBLIC;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA app TO PUBLIC;

How it works

  • The service principal pool (AppKit.lakebase.pool) is always created and used for DDL operations, seeding, and admin queries.
  • Per-user pools are created on the first asUser(req) call and cached by user identity. Each pool has its own OAuth token refresh cycle.
  • Idle connections within per-user pools close automatically (30s idle timeout). Empty pool objects are cleaned up periodically.
  • On shutdown, all pools (SP + user) are closed gracefully.
  • In development mode (NODE_ENV=development), if no user token is available, asUser(req) falls back to the SP pool with a warning.
RLS and superusers

PostgreSQL superusers bypass Row-Level Security entirely. Users with the databricks_superuser role will see all rows regardless of RLS policies. For RLS enforcement, use fine-grained grants instead of the superuser role.

Database Permissions

When you create the app with the Lakebase resource using the Getting started guide, the Service Principal is automatically granted CONNECT_AND_CREATE permission on the postgres resource. This lets the Service Principal connect to the database and create new objects, but not access any existing schemas or tables.

Local development

To develop locally against a deployed Lakebase database:

  1. Deploy the app first. The Service Principal creates the database schema and tables on first deploy. Apps generated from databricks apps init handle this automatically - they check if tables exist on startup and skip creation if they do.

  2. Grant databricks_superuser (skip if you are the Lakebase project owner — you already have full access):

    # Create a new role with databricks_superuser
    databricks postgres create-role "projects/{project_id}/branches/{branch_id}" \
    --json '{"spec": {"identity_type": "USER", "postgres_role": "user@example.com", "membership_roles": ["DATABRICKS_SUPERUSER"]}}'

    To grant superuser to an existing role, use update-role:

    databricks postgres update-role \
    "projects/{project_id}/branches/{branch_id}/roles/{role_id}" \
    "spec.membership_roles" \
    --json '{"spec": {"membership_roles": ["DATABRICKS_SUPERUSER"]}}'

    Alternatively, you can manage roles in the Lakebase Autoscaling UI under your project's Branch Overview page → Add role / Edit role.

  3. Run locally - your Databricks user identity (email) is used for OAuth authentication. The databricks_superuser role gives full DML access (read/write data) but not DDL (creating schemas or tables) - that's why deploying first matters (see note below).

For other users, repeat step 2 to create an OAuth role with databricks_superuser for each user.

tip

Postgres password authentication is a simpler alternative that avoids OAuth role permission complexity. However, it requires you to set up a password for the user in the Branch Overview page in the Lakebase Autoscaling UI.

Why deploy first?

When the app is deployed, the Service Principal creates schemas and tables and becomes their owner. databricks_superuser gives full DML access (read/write) but not DDL, so local development works only after the schema exists.

If you run npm run dev first, your credentials own the schema and the deployed app hits permission denied. To recover, export any data first (pg_dump or a temporary schema copy), then drop the schema and redeploy. After redeploying, the Service Principal recreates the schema on startup. (PostgreSQL schema ownership is tied to the role that created it and cannot be reassigned by regular users.)

Fine-grained permissions

For most use cases, databricks_superuser is sufficient. If you need schema-level grants instead, refer to the official documentation:

SQL script for fine-grained grants

Deploy and run the app at least once before executing these grants so the Service Principal initializes the database schema first.

Replace subject with the user email and schema with your schema name:

CREATE EXTENSION IF NOT EXISTS databricks_auth;

DO $$
DECLARE
subject TEXT := 'your-subject'; -- User email like name@databricks.com
schema TEXT := 'your_schema'; -- Replace 'your_schema' with your schema name
BEGIN
-- Create OAuth role for the Databricks identity
PERFORM databricks_create_role(subject, 'USER');

-- Connection and schema access
EXECUTE format('GRANT CONNECT ON DATABASE "databricks_postgres" TO %I', subject);
EXECUTE format('GRANT ALL ON SCHEMA %s TO %I', schema, subject);

-- Privileges on existing objects
EXECUTE format('GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA %s TO %I', schema, subject);
EXECUTE format('GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA %s TO %I', schema, subject);
EXECUTE format('GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA %s TO %I', schema, subject);
EXECUTE format('GRANT ALL PRIVILEGES ON ALL PROCEDURES IN SCHEMA %s TO %I', schema, subject);

-- Default privileges on future objects
EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON TABLES TO %I', schema, subject);
EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON SEQUENCES TO %I', schema, subject);
EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON FUNCTIONS TO %I', schema, subject);
EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON ROUTINES TO %I', schema, subject);
END $$;