Designing a Multi-Tenant Data Platform
Multi-tenancy is one of those infrastructure concerns that seems straightforward until you start implementing it. The naive approach — a separate database per tenant — is simple to reason about but creates operational nightmares at scale. The shared-database approach is operationally clean but demands rigorous isolation guarantees. This post explains how Cupel achieves strong tenant isolation using a shared PostgreSQL database with Row-Level Security, per-organization Temporal task queues, Kubernetes namespace isolation, and layered encryption.
The Spectrum of Multi-Tenant Isolation
Multi-tenant architectures exist on a spectrum. At one end is full infrastructure isolation: each tenant gets their own database server, application instances, and network boundary. At the other end is complete sharing: all tenants share everything, separated only by a tenant_id column. Most production systems land somewhere in between.
Database-Per-Tenant: The Operational Tax
A separate database per tenant provides strong isolation by default. Tenant A's queries cannot accidentally touch Tenant B's data, even if application code has bugs. But the operational costs compound quickly:
- Connection pool management scales linearly with tenant count. At 200 tenants with 20 connections each, you manage 4,000 database connections.
- Schema migrations must be applied to every database individually. A migration that takes 30 seconds per database takes over an hour across 200 tenants.
- Cross-tenant analytics and internal reporting require federated queries or ETL pipelines that aggregate data from all databases.
- Monitoring, alerting, and backup strategies multiply by the number of tenants.
Shared Database with RLS: The Cupel Approach
Cupel uses a single PostgreSQL database with Row-Level Security (RLS) policies that enforce tenant isolation at the database layer. Every table that contains tenant-specific data includes an org_id column, and RLS policies ensure that queries only return rows belonging to the authenticated organization.
Row-Level Security is enforced by the PostgreSQL engine itself, not by the application. Even if application code omits a WHERE clause filtering by org_id, the database will only return rows matching the current session's tenant context. This is a defense-in-depth layer that operates below the application.
PostgreSQL RLS Implementation
The core pattern involves three pieces: session variables, RLS policies, and a session factory function that wires them together.
The Session Factory
Every database operation in Cupel goes through get_tenant_session(), which creates a SQLAlchemy session with the tenant context set as a PostgreSQL session variable:
```python
from collections.abc import Iterator
from contextlib import contextmanager

from sqlalchemy import text
from sqlalchemy.orm import Session


@contextmanager
def get_tenant_session(org_id: str, db_engine) -> Iterator[Session]:
    """Create a database session scoped to a specific organization.

    Sets the PostgreSQL session variable `app.current_org_id` which is
    referenced by all RLS policies. Every query executed within this
    session is automatically filtered to the specified organization.

    Args:
        org_id: The organization identifier to scope this session to.
        db_engine: SQLAlchemy engine instance.

    Yields:
        A SQLAlchemy Session with RLS context active.

    Raises:
        ValueError: If org_id is empty or None.
    """
    if not org_id:
        raise ValueError("org_id is required for tenant-scoped sessions")
    with Session(db_engine) as session:
        # SET cannot take bind parameters on drivers that bind server-side,
        # so use set_config(). The third argument (is_local=true) scopes the
        # variable to the current transaction, so pooled connections never
        # leak tenant context across checkouts.
        session.execute(
            text("SELECT set_config('app.current_org_id', :org_id, true)"),
            {"org_id": org_id},
        )
        try:
            yield session
            session.commit()
        except Exception:
            session.rollback()
            raise
```
The RLS Policy Pattern
Each table with tenant data has an RLS policy that references the session variable:
```sql
-- Enable RLS on the pipelines table
ALTER TABLE pipelines ENABLE ROW LEVEL SECURITY;
-- Force RLS even for table owners (prevents accidental bypass)
ALTER TABLE pipelines FORCE ROW LEVEL SECURITY;
-- Policy: users can only see rows belonging to their organization
CREATE POLICY tenant_isolation_policy ON pipelines
    USING (org_id = current_setting('app.current_org_id')::uuid);

-- Same pattern applied to all tenant-scoped tables
ALTER TABLE pipeline_runs ENABLE ROW LEVEL SECURITY;
ALTER TABLE pipeline_runs FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation_policy ON pipeline_runs
    USING (org_id = current_setting('app.current_org_id')::uuid);

ALTER TABLE quality_results ENABLE ROW LEVEL SECURITY;
ALTER TABLE quality_results FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation_policy ON quality_results
    USING (org_id = current_setting('app.current_org_id')::uuid);
```
The FORCE ROW LEVEL SECURITY directive is critical. Without it, table owners (typically the role used by the application) bypass RLS policies entirely. This is a common misconfiguration that silently breaks tenant isolation.
Handling Administrative Queries
Some operations require cross-tenant access: billing aggregation, platform monitoring, compliance auditing. These use a separate database role with explicit RLS bypass privileges:
```sql
-- Admin role that bypasses RLS for cross-tenant operations.
-- BYPASSRLS is required: for a role without it, setting row_security = off
-- makes queries against RLS-protected tables fail rather than bypass them.
CREATE ROLE cupel_admin NOLOGIN BYPASSRLS;

-- Application role used by the API — RLS always active
CREATE ROLE cupel_app NOLOGIN;
ALTER ROLE cupel_app SET row_security = on;
```
The API gateway exclusively uses cupel_app. Administrative services use cupel_admin through a separate, audited code path.
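The audited path can be made structural rather than conventional: every cross-tenant call goes through a wrapper that refuses to run without an audit reason. A minimal sketch — the helper and logger names are hypothetical, not Cupel's actual wiring:

```python
import logging

# Hypothetical audit logger for the cupel_admin code path.
audit_log = logging.getLogger("cupel.admin_audit")


def audited_admin_query(reason: str, run_query):
    """Run a cross-tenant query through the audited admin path.

    Every admin-path access must carry a human-readable reason, written to
    the audit log before the query executes. `run_query` stands in for a
    callable that executes against a cupel_admin-role session.
    """
    if not reason:
        raise ValueError("admin access requires an audit reason")
    audit_log.info("cross-tenant access: %s", reason)
    return run_query()
```

Making the reason a required argument means a missing audit trail is a code-review-visible bug, not a silent omission.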
Temporal Workflow Isolation
Database isolation handles data at rest, but workflow execution also needs isolation. Cupel uses per-organization Temporal task queues to ensure that one tenant's pipeline workload cannot starve another tenant's workers.
```python
def get_task_queue(org_id: str) -> str:
    """Return the Temporal task queue name for an organization.

    Each organization's workflows execute on a dedicated task queue,
    serviced by workers provisioned for that organization. This prevents
    noisy-neighbor effects where one tenant's heavy workload delays
    another tenant's pipeline execution.

    Args:
        org_id: The organization identifier.

    Returns:
        Task queue name in the format 'cupel-hosted-{org_id}'.
    """
    return f"cupel-hosted-{org_id}"
```
Workers are provisioned per task queue, and each queue has its own rate limits and concurrency controls. If Organization A submits 500 pipeline runs simultaneously, only Organization A's queue backs up. Organization B's pipelines continue executing on their dedicated workers without delay.
Per-org task queues also provide a natural observability boundary. When investigating a performance issue, operators can inspect a single organization's task queue depth, worker utilization, and workflow latency without filtering through cross-tenant noise.
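Worker provisioning can key off the same naming scheme. The sketch below pairs each organization's queue with tier-based concurrency limits; the tier names and numbers are illustrative, not Cupel's actual settings:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerConfig:
    task_queue: str
    max_concurrent_activities: int


# Hypothetical per-tier limits; real values are deployment-specific.
TIER_CONCURRENCY = {"free": 5, "team": 20, "enterprise": 100}


def worker_config(org_id: str, tier: str) -> WorkerConfig:
    """Build the worker settings for one organization's dedicated queue."""
    return WorkerConfig(
        task_queue=f"cupel-hosted-{org_id}",
        max_concurrent_activities=TIER_CONCURRENCY.get(tier, 5),
    )
```

Because the limit is attached to the queue rather than shared globally, a burst from one organization exhausts only its own concurrency budget.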
Kubernetes Namespace Isolation
For the Cupel Hosted deployment model, each organization runs within its own Kubernetes namespace. This provides process-level isolation, resource limits, and network boundaries:
```yaml
# Per-org namespace with resource quotas
apiVersion: v1
kind: Namespace
metadata:
  name: cupel-org-acme-corp
  labels:
    cupel.io/org-id: "acme-corp"
    cupel.io/tier: "enterprise"
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: cupel-org-acme-corp
spec:
  hard:
    requests.cpu: "32"
    requests.memory: 128Gi
    limits.cpu: "64"
    limits.memory: 256Gi
    pods: "100"
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-org-traffic
  namespace: cupel-org-acme-corp
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              cupel.io/org-id: "acme-corp"
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              cupel.io/org-id: "acme-corp"
    - to:
        - namespaceSelector:
            matchLabels:
              cupel.io/component: "control-plane"
```
The NetworkPolicy ensures that pods in one organization's namespace cannot communicate with pods in another organization's namespace. The only permitted cross-namespace traffic is to the shared control plane.
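The namespace name itself follows the convention shown in the manifest above (cupel-org-{org_id}) and must stay within Kubernetes' DNS-1123 label rules. A small helper to derive and validate it — the validation step is an assumption for illustration, not documented Cupel behavior:

```python
import re


def org_namespace(org_id: str) -> str:
    """Derive the per-org namespace name (assumed convention: cupel-org-{org_id}).

    Kubernetes namespace names must be valid DNS-1123 labels: lowercase
    alphanumerics and hyphens, at most 63 characters, starting and ending
    with an alphanumeric.
    """
    name = f"cupel-org-{org_id.lower()}"
    if not re.fullmatch(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?", name):
        raise ValueError(f"invalid namespace name: {name}")
    return name
```

Validating at provisioning time keeps a malformed organization identifier from failing later, deep inside a kubectl apply.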
Encryption Key Isolation
Data at rest is encrypted using per-organization KMS keys. Each organization has a dedicated AWS KMS data key used for encrypting their staging data, connector credentials, and sensitive configuration:
```python
def get_org_encryption_key(org_id: str) -> str:
    """Retrieve the KMS key ARN for an organization.

    Each organization has a dedicated KMS key for encrypting their data
    at rest. This ensures that a key compromise for one organization
    does not expose another organization's data.

    Args:
        org_id: The organization identifier.

    Returns:
        AWS KMS key ARN for the organization.
    """
    return f"arn:aws:kms:us-east-1:123456789:key/cupel-org-{org_id}"
```
Enterprise customers on the Cupel Hosted plan can bring their own KMS keys (BYOK), giving them complete control over their encryption keys and the ability to revoke Cupel's access to their data at any time.
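With BYOK in the picture, key resolution becomes a lookup with a customer-key override. A sketch, with the registry dict standing in for whatever store actually holds BYOK registrations:

```python
def resolve_org_key_arn(org_id: str, byok_keys: dict[str, str]) -> str:
    """Pick the KMS key for an organization.

    BYOK customers register their own key ARN, which takes precedence;
    everyone else gets the Cupel-managed per-org key. The `byok_keys`
    dict is a stand-in for the real registration store.
    """
    if org_id in byok_keys:
        return byok_keys[org_id]
    return f"arn:aws:kms:us-east-1:123456789:key/cupel-org-{org_id}"
```

Because the BYOK key lives in the customer's own AWS account, revoking Cupel's grant on it immediately renders the encrypted data unreadable to the platform.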
Redis Key Isolation
Redis is used for schema caching, real-time pipeline status, and session management. All Redis keys are prefixed with the organization ID to prevent key collisions and enable per-org cache invalidation:
```python
def redis_key(org_id: str, namespace: str, key: str) -> str:
    """Generate a tenant-scoped Redis key.

    All Redis keys follow the pattern org:{org_id}:{namespace}:{key}
    to ensure complete key space isolation between organizations.

    Args:
        org_id: The organization identifier.
        namespace: Logical grouping (e.g., 'cache', 'status', 'session').
        key: The specific key within the namespace.

    Returns:
        Fully qualified Redis key string.
    """
    return f"org:{org_id}:{namespace}:{key}"
```
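The shared prefix also makes per-org invalidation a pattern match: flushing one tenant's cache is a SCAN plus DEL over a glob that cannot match another tenant's keys. A sketch of the pattern builder, mirroring the scheme above:

```python
def org_invalidation_pattern(org_id: str, namespace: str = "cache") -> str:
    """Glob pattern covering every key in one org's Redis namespace.

    Because all keys carry the org:{org_id}: prefix, iterating this
    pattern (e.g. via SCAN ... MATCH) and deleting matches can never
    touch another organization's keys.
    """
    return f"org:{org_id}:{namespace}:*"
```

In practice this would be paired with SCAN rather than KEYS so the invalidation does not block Redis on a large key space.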
The Defense-in-Depth Model
No single isolation mechanism is sufficient on its own. RLS prevents data leaks at the database layer. Per-org task queues prevent workload interference at the orchestration layer. Kubernetes namespaces prevent process-level cross-contamination. Per-org KMS keys prevent encryption key sharing. Redis key prefixing prevents cache poisoning.
Each layer guards against different failure modes. If application code has a bug that omits tenant filtering, RLS catches it. If a worker process is compromised, Kubernetes network policies limit lateral movement. If a KMS key is rotated or revoked, only the affected organization's data is impacted.
The shared-database approach requires more engineering discipline than database-per-tenant, but it provides significantly better operational characteristics at scale. Schema migrations are applied once. Connection pools are shared. Cross-tenant analytics run against a single database. The isolation guarantees come from the enforcement layers described above, not from physical separation.
What This Means for Cupel Users
Organizations using Cupel operate in complete isolation from other tenants without managing separate infrastructure. Teams within an organization benefit from shared resources (component libraries, data product catalogs) while maintaining workspace-level isolation enforced by the same RLS pattern. The multi-layered isolation model means that even in the shared SaaS deployment, tenant data boundaries are enforced by the database engine itself, not by application-level filtering that can be bypassed by bugs or misconfigurations.