← Back to posts
comparison guide self-hosted · · 14 min read

Complete Guide to Self-Hosted JupyterHub for Multi-User Data Science 2026

Deploy JupyterHub on your own server to give your team managed, isolated notebook environments. Docker Compose setup, authentication, spawners, and resource management.

OS
Editorial Team

Jupyter notebooks have become the de facto standard for data science, machine learning, and scientific computing. But running individual Jupyter Notebook servers for every team member is a maintenance nightmare. JupyterHub solves this by providing a multi-user server that spawns, manages, and proxies individual notebook instances — all from a single deployment you control.

This guide walks you through setting up JupyterHub on your own infrastructure, choosing the right authenticator and spawner, configuring resource limits, and comparing it to hosted alternatives.

Why Self-Host Your Notebook Platform

When you hand your data team access to a notebook environment, you’re giving them access to arbitrary code execution. Here’s why running your own JupyterHub instance beats paying for a hosted service:

Data Stays On Your Infrastructure

Notebook work often involves proprietary datasets, customer data, or unreleased research. Self-hosting ensures raw data, intermediate results, and trained models never leave your network. There are no third-party data processing agreements to negotiate, no audit trails pointing to external servers, and no risk of a vendor breach exposing your data.

Full Control Over the Environment

Hosted notebook platforms lock you into specific Python versions, pre-installed libraries, and resource tiers. With a self-hosted JupyterHub, you decide:

  • Which Python versions and package repositories are available
  • Whether users get GPU access and which GPU models
  • How much RAM and CPU each user can consume
  • Which system-level tools ( compilers, database clients, system libraries) are installed
  • Network policies — can notebooks reach the internet, internal databases, or both

Cost Efficiency at Scale

Hosted notebook services charge per compute hour or per user seat. At small scale this seems affordable, but costs compound quickly:

Team SizeHosted (est. monthly)Self-Hosted (VPS)
5 users$150 – $300$40 – $80
20 users$600 – $1,500$80 – $200
50 users$1,500 – $4,000$150 – $400

Self-hosting requires upfront configuration effort, but the marginal cost per additional user is near zero. A single 8-core, 32 GB RAM VPS at ~$80/month comfortably handles 20–30 concurrent notebook users with reasonable resource limits.

Compliance and Auditing

If your organization operates under GDPR, HIPAA, SOC 2, or similar frameworks, self-hosting gives you direct control over:

  • Access logs and authentication records
  • Data encryption at rest and in transit
  • Backup and retention policies
  • Network segmentation and firewall rules

What Is JupyterHub?

JupyterHub is a multi-user hub that manages and proxies multiple instances of the single-user Jupyter Notebook server. It was originally developed by the Jupyter Project team (Project Jupyter) and is open source under a modified BSD license.

Architecture

JupyterHub consists of three main components:

  1. Hub — The central management server. Handles authentication, spawns user servers, and tracks running instances. Built with Python using Tornado as the async web framework.

  2. Proxy — Routes HTTP requests between users and their individual notebook servers. The default proxy is configurable-http-proxy (CHP), a Node.js-based reverse proxy.

  3. Single-User Notebook Servers — One per authenticated user. Each runs in its own process or container with isolated filesystem, environment variables, and resource limits.

The flow works like this: a user visits your JupyterHub URL → authenticates → the Hub spawns a single-user server → the proxy routes all subsequent requests to that server.

Spawners: How Servers Are Launched

The spawner is the most important configuration decision. It determines how individual notebook servers are launched and isolated:

SpawnerIsolationBest ForResource Overhead
LocalProcessSpawnerNone (same OS user or separate Unix users)Single-server developmentMinimal
[docker](https://www.docker.com/)SpawnerDocker containersTeams sharing one hostLow–kubernetes KubeSpawner
SystemdSpawnersystemd servicesLinux servers with systemdLow
SSHSpawnerSSH to remote hostsDistributed compute clustersVariable

For most teams starting out, DockerSpawner is the sweet spot — it gives container isolation without the complexity of Kubernetes.

Authenticators: How Users Log In

The authenticator controls who can access your JupyterHub:

AuthenticatorUse CaseSetup Complexity
DummyAuthenticatorTesting onlyTrivial
FirstUseAuthenticatorSimple self-signupLow
OAuthenticator (GitHub, Google, GitLab)Organizations with SSOMedium
LDAPAuthenticatorCorporate LDAP/Active DirectoryMedium–High
NativeAuthenticatorSelf-managed user databaseLow
PAMAuthenticator (default)System user accountsLow

Docker Compose Deployment Guide

The fastest way to get JupyterHub running is with Docker Compose. This setup uses DockerSpawner for container isolation and FirstUseAuthenticator for simple user management.

Prerequisites

  • A Linux server (Ubuntu 22.04+, Debian 12+, or equivalent)
  • Docker Engine and Docker Compose installed
  • A domain name pointing to your server (for TLS)
  • At least 4 GB RAM (8 GB recommended for 5+ concurrent users)

Step 1: Create the Project Directory

1
2
mkdir -p ~/jupyterhub/data
cd ~/jupyterhub

Step 2: Write the Docker Compose File

Create docker-compose.yml:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
version: "3.8"

services:
  jupyterhub:
    image: jupyterhub/jupyterhub:latest
    container_name: jupyterhub
    restart: unless-stopped
    ports:
      - "8000:8000"
    volumes:
      - ./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./data:/srv/jupyterhub/data
    environment:
      - DOCKER_NETWORK_NAME=jupyterhub-network
    networks:
      - jupyterhub-network

networks:
  jupyterhub-network:
    name: jupyterhub-network
    driver: bridge

Step 3: Write the JupyterHub Configuration

Create jupyterhub_config.py:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
import os

# --- Networking ---
c.JupyterHub.ip = "0.0.0.0"
c.JupyterHub.port = 8000

# --- Authenticator: FirstUse (users set password on first login) ---
c.JupyterHub.authenticator_class = "firstuseauthenticator.FirstUseAuthenticator"
c.FirstUseAuthenticator.create_users = False  # Admin must pre-create users

# --- Spawner: Docker ---
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "jupyter/scipy-notebook:latest"
c.DockerSpawner.network_name = os.environ["DOCKER_NETWORK_NAME"]
c.DockerSpawner.remove = True  # Delete container on stop (data persists in volumes)

# --- Per-User Volumes ---
c.DockerSpawner.volumes = {
    "jupyterhub-user-{username}": "/home/jovyan/work"
}

# --- Resource Limits ---
c.DockerSpawner.mem_limit = "2G"
c.DockerSpawner.cpu_limit = 2.0
c.DockerSpawner.extra_host_config = {
    "memswap_limit": "2G",  # Prevent swap usage beyond RAM limit
}

# --- Admin Users ---
c.Authenticator.admin_users = {"admin"}

# --- Pre-Spawn Hook: Pull image before spawning ---
c.DockerSpawner.pull_policy = "always"

This configuration creates a production-ready setup with:

  • Container isolation: Each user gets their own Docker container
  • Persistent storage: User notebooks survive container recreation via Docker volumes
  • Resource limits: 2 GB RAM and 2 CPU cores per user
  • No swap: Prevents a single user from consuming swap and starving others
  • Automatic image pulls: Always uses the latest notebook image

Step 4: Install Required Python Packages in the Hub

The Hub container needs dockerspawner and firstuseauthenticator. Create a Dockerfile:

1
2
3
4
5
FROM jupyterhub/jupyterhub:latest

RUN pip install --no-cache-dir \
    dockerspawner \
    firstuseauthenticator

Update the compose service to use this Dockerfile:

1
2
3
4
5
6
services:
  jupyterhub:
    build:
      context: .
      dockerfile: Dockerfile
    # ... rest stays the same

Step 5: Create Admin User

1
2
3
4
docker compose run jupyterhub bash -c \
  "python -c \"from firstuseauthenticator import FirstUseAuthenticator; \
  a = FirstUseAuthenticator(); \
  a.add_user('admin')\""

Step 6: Launch

1
docker compose up -d

Visit http://your-server-ip:8000 and log in as admin with a password of your choice.

Step 7: Add TLS with a Reverse Proxy

For production, put JupyterHub behind a reverse proxy with TLS termination. Here’s an Nginx configuration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
server {
    listen 443 ssl http2;
    server_name notebooks.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/notebooks.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/notebooks.yourdomain.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support (required for Jupyter kernel communication)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Increase timeouts for long-running notebook operations
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
    }
}

Update jupyterhub_config.py to be aware of the proxy:

1
2
3
c.JupyterHub.bind_url = "https://notebooks.yourdomain.com/"
c.JupyterHub.port = 8000
c.JupyterHub.ip = "127.0.0.1"  # Only listen on localhost

Restart JupyterHub after updating the configuration:

1
docker compose restart

Managing Users and Environments

Creating and Removing Users

With FirstUseAuthenticator, you can pre-create user accounts:

1
2
# In jupyterhub_config.py
c.Authenticator.allowed_users = {"alice", "bob", "charlie"}

Users in this set can log in. Users not in this set will be rejected, even if they guess a valid username/password pair.

To manage users at runtime, use the JupyterHub admin panel at /hub/admin, or the REST API:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# List all users
curl -H "Authorization: token $JUPYTERHUB_API_TOKEN" \
  https://notebooks.yourdomain.com/hub/api/users

# Create a user
curl -X POST \
  -H "Authorization: token $JUPYTERHUB_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "newuser"}' \
  https://notebooks.yourdomain.com/hub/api/users/newuser

# Delete a user
curl -X DELETE \
  -H "Authorization: token $JUPYTERHUB_API_TOKEN" \
  https://notebooks.yourdomain.com/hub/api/users/olduser

Providing Custom Notebook Environments

Different teams need different software stacks. You can offer multiple notebook images and let users choose:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# In jupyterhub_config.py
c.DockerSpawner.image = "jupyter/scipy-notebook:latest"

c.DockerSpawner.extra_create_kwargs = {
    "environment": {
        "GRANT_SUDO": "yes",
    }
}

# Alternative: let users pick from a profile list
c.KubeSpawner.profile_list = [
    {
        "display_name": "Python (SciPy Stack)",
        "slug": "scipy",
        "default": True,
        "kubespawner_override": {
            "image": "jupyter/scipy-notebook:latest",
        },
    },
    {
        "display_name": "R (IRKernel)",
        "slug": "r",
        "kubespawner_override": {
            "image": "jupyter/r-notebook:latest",
        },
    },
    {
        "display_name": "GPU-enabled (PyTorch)",
        "slug": "gpu",
        "kubespawner_override": {
            "image": "jupyter/pytorch-notebook:latest",
            "extra_resource_limits": {"nvidia.com/gpu": "1"},
        },
    },
]

Building a Custom Notebook Image

For teams with specific requirements, build your own image:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
FROM jupyter/scipy-notebook:latest

# Install system dependencies
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq-dev \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python packages
USER $NB_USER
RUN pip install --no-cache-dir \
    psycopg2-binary \
    sqlalchemy \
    pandas-gbq \
    plotly \
    seaborn \
    scikit-learn \
    xgboost \
    lightgbm

# Pre-download a large dataset
RUN mkdir -p /home/$NB_USER/data
COPY datasets/ /home/$NB_USER/data/

Build and reference it in your config:

1
docker build -t myorg/notebook:2026.04 .
1
c.DockerSpawner.image = "myorg/notebook:2026.04"

Scaling Beyond a Single Server

Docker Compose works great for teams up to ~30 users. Beyond that, you need Kubernetes.

Kubernetes Deployment with KubeSpawner

First, install JupyterHub on Kubernetes using the official Helm chart:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
# Add the JupyterHub Helm repository
helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
helm repo update

# Create a values file (config.yaml)
cat > config.yaml << 'EOF'
hub:
  config:
    Authenticator:
      admin_users:
        - admin
      allowed_users:
        - alice
        - bob
    JupyterHub:
      authenticator_class: oauthenticator.github.GitHubOAuthenticator
    GitHubOAuthenticator:
      client_id: "YOUR_GITHUB_CLIENT_ID"
      client_secret: "YOUR_GITHUB_CLIENT_SECRET"
      oauth_callback_url: "https://notebooks.yourdomain.com/hub/oauth_callback"

singleuser:
  image:
    name: jupyter/scipy-notebook
    tag: latest
  memory:
    limit: 4G
    guarantee: 1G
  cpu:
    limit: 2
    guarantee: 0.5
  storage:
    dynamic:
      storageClass: standard
    capacity: 10Gi

proxy:
  service:
    type: LoadBalancer
EOF

# Deploy
helm install jhub jupyterhub/jupyterhub \
  --namespace jupyterhub \
  --create-namespace \
  -f config.yaml

KubeSpawner gives you:

  • Horizontal pod autoscaling — spawn pods across multiple nodes
  • Node selectors and tolerations — route GPU workloads to GPU nodes
  • Resource quotas — enforce team-level budgets
  • Persistent volume claims — per-user storage that survives pod restarts

JupyterHub vs Alternatives

FeatureJupyterHub (self-hosted)Google ColabDatabricks CommunitySageMaker Studio
CostServer cost onlyFree tier / $10+/moFree tier / $0.40+/DBUPay per usage
Data locationYour serverGoogle CloudDatabricks cloudAWS
Multi-userYesShared notebooksWorkspace-basedStudio domains
Custom environmentsFull controlLimitedLimitedModerate
GPU accessYour hardwareFree tier limitedPaidPaid
Offline accessYesNoNoNo
API accessFull REST APILimited REST APIREST APIAWS SDK
Persistent storageUnlimited (disk space)12 GB (ephemeral)Workspace storageEFS / S3
Auth integrationOAuth, LDAP, PAM, customGoogle accountDatabricks accountAWS IAM
Kubernetes nativeYes (KubeSpawner)NoNoManaged EKS

When to Choose JupyterHub

  • Your team has 5+ regular notebook users
  • Data cannot leave your infrastructure (compliance, privacy, IP)
  • You need custom software stacks that hosted platforms don’t support
  • You want to manage costs predictably rather than per-seat or per-hour
  • You need integration with existing identity providers (LDAP, GitHub, GitLab)

When a Hosted Service Makes Sense

  • You’re a solo researcher or student with occasional needs
  • You need instant access to high-end GPUs without hardware investment
  • Your team is geographically distributed and you don’t want to manage infrastructure
  • You want zero-maintenance setup and don’t mind vendor lock-in

Advanced Configuration Tips

Pre-Pull Images Across the Cluster

When using multiple nodes, pulling large notebook images on first spawn causes delays. Use the image-puller daemonset:

1
2
3
4
5
6
# Enable continuous image pulling with Helm
helm upgrade jhub jupyterhub/jupyterhub \
  --namespace jupyterhub \
  -f config.yaml \
  --set prePuller.hook.enabled=true \
  --set prePuller.continuous.enabled=true

Automatic Idle Culling

Save resources by stopping idle servers:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# In jupyterhub_config.py
c.JupyterHub.services = [
    {
        "name": "cull-idle",
        "admin": True,
        "command": [
            "python", "-m", "jupyterhub_idle_culler",
            "--timeout=3600",       # 1 hour idle timeout
            "--cull-every=300",     # Check every 5 minutes
            "--concurrency=10",     # Process up to 10 at once
        ],
    }
]

Install the culler package:

1
RUN pip install --no-cache-dir jupyterhub-idle-culler

Shared Project Directories

For team collaboration, mount a shared volume:

1
2
3
4
c.DockerSpawner.volumes = {
    "jupyterhub-user-{username}": "/home/jovyan/work",
    "jupyterhub-shared": "/home/jovyan/shared",
}

Create the shared volume:

1
docker volume create jupyterhub-shared

Set permissions so all users can read and write:

1
2
docker run --rm -v jupyterhub-shared:/data alpine \
  sh -c "chmod 777 /data"

Backup and Disaster Recovery

Your JupyterHub data lives in Docker volumes. Back them up regularly:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/bin/bash
# backup.sh — Run via cron: 0 2 * * * /opt/jupyterhub/backup.sh

BACKUP_DIR="/backup/jupyterhub/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# Backup the hub database
docker run --rm \
  -v jupyterhub-data:/data:ro \
  -v "$BACKUP_DIR:/backup" \
  alpine tar czf /backup/hub-data.tar.gz -C /data .

# Backup all user volumes
for vol in $(docker volume ls -q --filter "name=jupyterhub-user-"); do
  docker run --rm \
    -v "$vol:/data:ro" \
    -v "$BACKUP_DIR:/backup" \
    alpine tar czf "/backup/${vol}.tar.gz" -C /data .
done

# Backup the shared volume
docker run --rm \
  -v jupyterhub-shared:/data:ro \
  -v "$BACKUP_DIR:/backup" \
  alpine tar czf /backup/jupyterhub-shared.tar.gz -C /data .

# Keep only the last 7 days of backups
find /backup/jupyterhub -maxdepth 1 -mtime +7 -type d -exec rm -rf {} +

echo "Backup completed: $BACKUP_DIR"

To restore a user’s data:

1
2
3
4
5
docker volume create jupyterhub-user-alice
docker run --rm \
  -v jupyterhub-user-alice:/data \
  -v /backup/jupyterhub/2026-04-14:/source:ro \
  alpine tar xzf /source/jupyterhub-user-alice.tar.gz -C /data

Monitoring and Health Checks

JupyterHub exposes a Prometheus-compatible metrics endpoint. Enable it in your config:

1
2
3
4
5
6
7
8
# In jupyterhub_config.py
c.JupyterHub.load_roles = [
    {
        "name": "prometheus",
        "scopes": ["read:users", "read:servers", "read:services"],
        "services": ["prometheus"],
    }
]

Query the metrics at /hub/metrics:

1
2
curl -H "Authorization: token $PROMETHEUS_TOKEN" \
  https://notebooks.yourdomain.com/hub/metrics

Key metrics to monitor:

MetricWhat It Tells YouAlert Threshold
jupyterhub_running_serversActive notebook instances> 80% of max capacity
jupyterhub_spawn_fails_totalFailed server spawns> 5 in 10 minutes
jupyterhub_auth_failures_totalFailed login attempts> 20 in 5 minutes
jupyterhub_cull_requests_totalIdle servers stoppedInformational

Conclusion

JupyterHub gives you enterprise-grade notebook infrastructure without the enterprise price tag. A single afternoon of setup gives your team persistent, isolated, and fully customizable computational environments — backed by your own storage, your own auth, and your own network policies.

The Docker Compose approach in this guide gets you from zero to running in under an hour. When you’re ready to scale, the Helm chart and KubeSpawner handle the transition to Kubernetes seamlessly. The investment you make in self-hosting pays dividends in data control, cost predictability, and environment flexibility that hosted platforms simply cannot match.

Frequently Asked Questions (FAQ)

Which one should I choose in 2026?

The best choice depends on your specific requirements:

  • For beginners: Start with the simplest option that covers your core use case
  • For production: Choose the solution with the most active community and documentation
  • For teams: Look for collaboration features and user management
  • For privacy: Prefer fully open-source, self-hosted options with no telemetry

Refer to the comparison table above for detailed feature breakdowns.

Can I migrate between these tools?

Most tools support data import/export. Always:

  1. Backup your current data
  2. Test the migration on a staging environment
  3. Check official migration guides in the documentation

Are there free versions available?

All tools in this guide offer free, open-source editions. Some also provide paid plans with additional features, priority support, or managed hosting.

How do I get started?

  1. Review the comparison table to identify your requirements
  2. Visit the official documentation (links provided above)
  3. Start with a Docker Compose setup for easy testing
  4. Join the community forums for troubleshooting
Advertise here