---
sidebar_position: 4
---

# Lakebase plugin

Provides a PostgreSQL connection pool for Databricks Lakebase Autoscaling with automatic OAuth token refresh.

**Key features:**
- Standard `pg.Pool` compatible with any PostgreSQL library or ORM
- Automatic OAuth token refresh (1-hour tokens, 2-minute refresh buffer)
- Token caching to minimize API calls
- Built-in OpenTelemetry instrumentation (query duration, pool connections, token refresh)
- AppKit logger configured by default for query and connection events

## Getting started with the Lakebase

The easiest way to get started with the Lakebase plugin is to use the Databricks CLI to create a new Databricks app with AppKit installed and the Lakebase plugin.

### Prerequisites

- [Node.js](https://nodejs.org) v22+ environment with `npm`
- Databricks CLI (v1.0.0 or higher): install and configure it according to the [official tutorial](https://docs.databricks.com/aws/en/dev-tools/cli/tutorial).
- A new Databricks app with AppKit installed. See [Bootstrap a new Databricks app](../index.md#quick-start-options) for more details.

### Steps

1. Firstly, create a new Lakebase Postgres Autoscaling project according to the [Get started documentation](https://docs.databricks.com/aws/en/oltp/projects/get-started).
1. To add the Lakebase plugin to your project, run the `databricks apps init` command and interactively select the **Lakebase** plugin. The CLI will guide you through picking a Lakebase project, branch, and database.
    - When asked, select **Yes** to deploy the app to Databricks Apps right after its creation.

## Basic usage

```ts
import { createApp, lakebase, server } from "@databricks/appkit";

await createApp({
  plugins: [server(), lakebase()],
});
```

## Accessing the pool

After initialization, access Lakebase through the `AppKit.lakebase` object:

```ts
const AppKit = await createApp({
  plugins: [server(), lakebase()],
});

await AppKit.lakebase.query(`CREATE SCHEMA IF NOT EXISTS app`);

await AppKit.lakebase.query(`CREATE TABLE IF NOT EXISTS app.orders (
  id SERIAL PRIMARY KEY,
  user_id VARCHAR(255) NOT NULL,
  amount DECIMAL(10, 2) NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)`);

const result = await AppKit.lakebase.query(
  "SELECT * FROM app.orders WHERE user_id = $1",
  [userId],
);

// Raw pg.Pool (for ORMs or advanced usage)
const pool = AppKit.lakebase.pool;

// ORM-ready config objects
const ormConfig = AppKit.lakebase.getOrmConfig();  // { host, port, database, ... }
const pgConfig = AppKit.lakebase.getPgConfig();    // pg.PoolConfig
```

## Configuration

### Environment variables

The required environment variables are:

| Variable | Description |
|---|---|
| `LAKEBASE_ENDPOINT` | Endpoint resource path (e.g. `projects/.../branches/.../endpoints/...`) |
| `PGHOST` | Lakebase host (auto-injected in production by the `postgres` Databricks Apps resource) |
| `PGDATABASE` | Database name (auto-injected in production by the `postgres` Databricks Apps resource) |
| `PGSSLMODE` | TLS mode - set to `require` (auto-injected in production by the `postgres` Databricks Apps resource) |

When deployed to Databricks Apps with a `postgres` database resource configured, `PGHOST`, `PGDATABASE`, `PGSSLMODE`, `PGUSER`, `PGPORT`, and `PGAPPNAME` are automatically injected by the platform. Only `LAKEBASE_ENDPOINT` must be set explicitly:

```yaml
env:
  - name: LAKEBASE_ENDPOINT
    valueFrom: postgres
```

For local development, the `.env` file is automatically generated by `databricks apps init` with the correct values for your Lakebase project.

For the full configuration reference (SSL, pool size, timeouts, logging, ORM examples), see the [`@databricks/lakebase` README](https://github.com/databricks/appkit/blob/main/packages/lakebase/README.md).

### Pool configuration

Pass a `pool` object to override any defaults:

```ts
await createApp({
  plugins: [
    lakebase({
      pool: {
        max: 10,                      // Max pool connections (default: 10)
        connectionTimeoutMillis: 5000, // Connection timeout ms (default: 10000)
        idleTimeoutMillis: 30000,      // Idle connection timeout ms (default: 30000)
      },
    }),
  ],
});
```

## On-Behalf-Of (OBO) — per-user connections

When your app needs Row-Level Security (RLS) or per-user data isolation, use `asUser(req)` to execute queries using a per-user Lakebase connection pool. Each user's pool is authenticated with their Databricks identity, so PostgreSQL's `current_user` reflects the actual user.

### Prerequisites

1. **Enable user authorization** in your Databricks App with the **`postgres`** scope. See [User authorization](https://docs.databricks.com/aws/en/dev-tools/databricks-apps/auth#user-authorization) for setup instructions. In your `databricks.yml`:
   ```yaml
   resources:
     apps:
       app:
         user_api_scopes:
           - postgres
   ```
   Apps scaffolded with `databricks apps init` and the Lakebase plugin include this automatically.

2. Each app user needs a **Postgres role** in Lakebase. Create one with the Databricks CLI:

   ```bash
   databricks postgres create-role "projects/{project_id}/branches/{branch_id}" \
     --json '{"spec": {"identity_type": "USER", "postgres_role": "user@example.com"}}'
   ```

   Alternatively, create roles in the Lakebase UI under **Branch Overview** → **Add role**.

   :::note
   Do not grant `databricks_superuser` to OBO users — superusers bypass RLS. Use [fine-grained grants](#fine-grained-permissions) instead.
   :::

### Usage

No configuration needed — just call `asUser(req)`:

```ts
const AppKit = await createApp({
  plugins: [server(), lakebase()],
});

// Service principal query (default — bypasses RLS as table owner)
const all = await AppKit.lakebase.query("SELECT * FROM app.orders");

// User-scoped query (per-user pool, RLS enforced)
app.get("/api/my-orders", async (req, res) => {
  const result = await AppKit.lakebase
    .asUser(req)
    .query("SELECT * FROM app.orders ORDER BY created_at DESC");
  res.json(result.rows);
});
```

When `asUser(req)` is called:
1. The user's token and identity are extracted from `x-forwarded-access-token` and `x-forwarded-email` headers (set automatically by Databricks Apps).
2. A per-user `pg.Pool` is created (or reused) with the user's OAuth credentials.
3. `query()` and `pool` use the user's pool — `current_user` in PostgreSQL reflects the user's identity.

### Row-Level Security example

```sql
-- As the service principal (during app setup):
ALTER TABLE app.orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_orders ON app.orders
  FOR ALL TO PUBLIC
  USING (owner = current_user);

-- Grant access so OBO users can query
GRANT USAGE ON SCHEMA app TO PUBLIC;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA app TO PUBLIC;
```

### How it works

- The **service principal pool** (`AppKit.lakebase.pool`) is always created and used for DDL operations, seeding, and admin queries.
- **Per-user pools** are created on the first `asUser(req)` call and cached by user identity. Each pool has its own OAuth token refresh cycle.
- Idle connections within per-user pools close automatically (30s idle timeout). Empty pool objects are cleaned up periodically.
- On shutdown, all pools (SP + user) are closed gracefully.
- In development mode (`NODE_ENV=development`), if no user token is available, `asUser(req)` falls back to the SP pool with a warning.

:::caution[RLS and superusers]
PostgreSQL superusers bypass Row-Level Security entirely. Users with the `databricks_superuser` role will see all rows regardless of RLS policies. For RLS enforcement, use [fine-grained grants](#fine-grained-permissions) instead of the superuser role.
:::

## Database Permissions

When you create the app with the Lakebase resource using the [Getting started](#getting-started-with-the-lakebase) guide, the Service Principal is automatically granted `CONNECT_AND_CREATE` permission on the `postgres` resource. This lets the Service Principal connect to the database and create new objects, but **not access any existing schemas or tables.**

### Local development

To develop locally against a deployed Lakebase database:

1. **Deploy the app first.** The Service Principal creates the database schema and tables on first deploy. Apps generated from `databricks apps init` handle this automatically - they check if tables exist on startup and skip creation if they do.

2. **Grant `databricks_superuser`** (skip if you are the Lakebase project owner — you already have full access):

   ```bash
   # Create a new role with databricks_superuser
   databricks postgres create-role "projects/{project_id}/branches/{branch_id}" \
     --json '{"spec": {"identity_type": "USER", "postgres_role": "user@example.com", "membership_roles": ["DATABRICKS_SUPERUSER"]}}'
   ```

   To grant superuser to an existing role, use [`update-role`](https://docs.databricks.com/aws/en/dev-tools/cli/reference/postgres-commands#databricks-postgres-update-role):

   ```bash
   databricks postgres update-role \
     "projects/{project_id}/branches/{branch_id}/roles/{role_id}" \
     "spec.membership_roles" \
     --json '{"spec": {"membership_roles": ["DATABRICKS_SUPERUSER"]}}'
   ```

   Alternatively, you can manage roles in the Lakebase Autoscaling UI under your project's **Branch Overview** page → **Add role** / **Edit role**.

3. **Run locally** - your Databricks user identity (email) is used for OAuth authentication. The `databricks_superuser` role gives full **DML access** (read/write data) but **not DDL** (creating schemas or tables) - that's why deploying first matters (see note below).

For other users, repeat step 2 to create an OAuth role with `databricks_superuser` for each user.

:::tip
[Postgres password authentication](https://docs.databricks.com/aws/en/oltp/projects/authentication#overview) is a simpler alternative that avoids OAuth role permission complexity. However, it requires you to set up a password for the user in the **Branch Overview** page in the Lakebase Autoscaling UI.
:::

:::info[Why deploy first?]
When the app is deployed, the Service Principal creates schemas and tables and becomes their owner. `databricks_superuser` gives full DML access (read/write) but not DDL, so local development works only after the schema exists.

If you run `npm run dev` first, your credentials own the schema and the deployed app hits `permission denied`. To recover, export any data first (`pg_dump` or a temporary schema copy), then drop the schema and redeploy. After redeploying, the Service Principal recreates the schema on startup. (PostgreSQL schema ownership is tied to the role that created it and cannot be reassigned by regular users.)
:::

### Fine-grained permissions

For most use cases, `databricks_superuser` is sufficient. If you need schema-level grants instead, refer to the official documentation:

- [Manage database permissions](https://docs.databricks.com/aws/en/oltp/projects/manage-roles-permissions)
- [Postgres roles](https://docs.databricks.com/aws/en/oltp/projects/postgres-roles)

<details>
<summary>SQL script for fine-grained grants</summary>

Deploy and run the app at least once before executing these grants so the Service Principal initializes the database schema first.

Replace `subject` with the user email and `schema` with your schema name:

```sql
CREATE EXTENSION IF NOT EXISTS databricks_auth;

DO $$
DECLARE
  subject TEXT := 'your-subject';  -- User email like name@databricks.com
  schema TEXT := 'your_schema'; -- Replace 'your_schema' with your schema name
BEGIN
  -- Create OAuth role for the Databricks identity
  PERFORM databricks_create_role(subject, 'USER');

  -- Connection and schema access
  EXECUTE format('GRANT CONNECT ON DATABASE "databricks_postgres" TO %I', subject);
  EXECUTE format('GRANT ALL ON SCHEMA %s TO %I', schema, subject);

  -- Privileges on existing objects
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA %s TO %I', schema, subject);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA %s TO %I', schema, subject);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA %s TO %I', schema, subject);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL PROCEDURES IN SCHEMA %s TO %I', schema, subject);

  -- Default privileges on future objects
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON TABLES TO %I', schema, subject);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON SEQUENCES TO %I', schema, subject);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON FUNCTIONS TO %I', schema, subject);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA %s GRANT ALL ON ROUTINES TO %I', schema, subject);
END $$;
```

</details>
