Data analysts, scientists, and engineers need interactive environments where they can write code, run queries, visualize results, and share findings — all in a single document. Self-hosted notebook platforms provide this capability while keeping your data and code within your own infrastructure.

Apache Zeppelin, Polynote, and JupyterLab are three leading open-source notebook platforms, each with distinct strengths. Zeppelin excels at multi-language notebooks with built-in visualizations, Polynote offers seamless Scala/Python interoperability, and JupyterLab provides the most extensible notebook experience with hundreds of kernels and extensions.

In this guide, we compare all three platforms and provide a Docker Compose configuration, a feature summary, and deployment notes for each.

What Are Self-Hosted Data Notebooks?

Notebook platforms combine executable code cells with rich text, visualizations, and interactive widgets into a single document. Unlike traditional IDEs, notebooks let you run code incrementally, inspect intermediate results, and create shareable reports — all in a web-based interface.

Self-hosting a notebook platform gives you:

  • Full data control — your datasets, models, and analysis results never leave your network
  • Custom kernel support — install any language runtime or data processing library
  • Team collaboration — share notebooks, results, and dashboards across your organization
  • Integration with existing tools — connect to your databases, data warehouses, and version control systems

Apache Zeppelin

Apache Zeppelin (6,620+ stars) is a web-based notebook platform that supports multiple language backends called “interpreters.” It’s designed for data engineering teams who need to work with Spark, SQL, Python, and other languages in a single notebook.

Key Features

  • Multi-interpreter support — Spark, SQL, Python, Shell, Markdown, and 20+ built-in interpreters
  • Built-in visualizations — bar charts, pie charts, line graphs, and tables without external libraries
  • Collaborative editing — multiple users can edit the same notebook simultaneously
  • Paragraph scheduling — run notebook cells in parallel or on a schedule
  • Native Spark integration — first-class Apache Spark support with automatic context management

Docker Compose Setup

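version: "3.8"
services:
  zeppelin:
    image: apache/zeppelin:0.11.1
    ports:
      - "8080:8080"
    volumes:
      # Persist notebooks and logs on the host
      - ./notebooks:/notebook
      - ./logs:/logs
      # The official image keeps its configuration under /opt/zeppelin/conf
      - ./conf/zeppelin-site.xml:/opt/zeppelin/conf/zeppelin-site.xml
    environment:
      - ZEPPELIN_LOG_DIR=/logs
      - ZEPPELIN_NOTEBOOK_DIR=/notebook
    restart: unless-stopped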

Configuration

conf/zeppelin-site.xml:

<configuration>
  <property>
    <name>zeppelin.server.addr</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>zeppelin.server.port</name>
    <value>8080</value>
  </property>
  <property>
    <name>zeppelin.interpreter.connect.timeout</name>
    <value>60000</value>
  </property>
</configuration>
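
To sanity-check the file before starting the container, the properties can be read back with Python's standard library. This is a minimal sketch and not part of Zeppelin: `load_zeppelin_properties` is a hypothetical helper name, and the embedded XML simply mirrors the configuration above.

```python
import xml.etree.ElementTree as ET

# Sample content mirroring conf/zeppelin-site.xml from this guide.
SITE_XML = """
<configuration>
  <property>
    <name>zeppelin.server.addr</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>zeppelin.server.port</name>
    <value>8080</value>
  </property>
  <property>
    <name>zeppelin.interpreter.connect.timeout</name>
    <value>60000</value>
  </property>
</configuration>
"""


def load_zeppelin_properties(xml_text: str) -> dict[str, str]:
    """Return a {name: value} dict from a zeppelin-site.xml document."""
    root = ET.fromstring(xml_text.strip())
    props: dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:  # skip malformed entries that have no <name>
            props[name.strip()] = (value or "").strip()
    return props


props = load_zeppelin_properties(SITE_XML)
print(props["zeppelin.server.port"])  # prints 8080
```

To check a real file, read it from disk and pass its text to the same function.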

Pros and Cons

| Pros | Cons |
| --- | --- |
| Best-in-class Spark integration | Web UI feels dated compared to JupyterLab |
| Built-in chart visualizations | Smaller extension ecosystem |
| Collaborative editing out of the box | Interpreter management can be complex |
| Multiple language support per notebook | Less active development than Jupyter |

Polynote

Polynote (4,590+ stars) by Netflix is a polyglot notebook designed specifically for Scala and Python interoperability. It addresses a common pain point: Scala developers who need to use Python libraries (or vice versa) without context-switching between environments.

Key Features

  • Seamless Scala/Python interop — use Python libraries directly from Scala cells and share variables
  • Dependency management — declare Maven, PyPI, and Ivy dependencies in the notebook metadata
  • Built-in data visualizations — pandas DataFrame rendering, Vega-Lite charts
  • Kernel isolation — each notebook runs in its own process for stability
  • Versioned notebook state — inspect intermediate results from previous cell executions

Docker Compose Setup

version: "3.8"
services:
  polynote:
    image: polynote/polynote:latest
    ports:
      - "8192:8192"
    volumes:
      - ./notebooks:/notebooks
      - ./config.yml:/opt/polynote/config.yml
    environment:
      - POLYNOTE_HOME=/opt/polynote
    restart: unless-stopped

config.yml:

listen:
  host: 0.0.0.0
  port: 8192

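storage:
  # Directory inside the container where notebooks are stored
  dir: /notebooks

# Default dependencies made available to every notebook
dependencies:
  # JVM (Maven/Ivy coordinate) dependencies are listed under "scala"
  scala:
    - org.apache.spark:spark-sql_2.12:3.5.0
  python:
    - pandas
    - matplotlib
    - numpy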

Pros and Cons

| Pros | Cons |
| --- | --- |
| Unique Scala/Python interoperability | Smaller community, less active development |
| Clean, modern web interface | Limited to Scala and Python |
| Built-in dependency resolution | No collaborative editing |
| Great for JVM-based data teams | Fewer visualization options than Zeppelin |

JupyterLab

JupyterLab (15,100+ stars) is the most widely adopted notebook platform, evolved from the classic Jupyter Notebook. It provides a flexible, extensible IDE-like interface with support for hundreds of language kernels, extensions, and integrations.

Key Features

  • 100+ language kernels — Python, R, Julia, Scala, JavaScript, Go, and many more
  • Extension ecosystem — 1,000+ JupyterLab extensions for Git, debuggers, and more
  • Multi-panel interface — editors, terminals, consoles, and notebooks in a single window
  • Real-time collaboration — multiple users can edit the same notebook simultaneously (with Jupyter Server extension)
  • Interactive widgets — ipywidgets, Voila dashboards, and custom UI components

Docker Compose Setup

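version: "3.8"
services:
  jupyterlab:
    image: jupyter/scipy-notebook:latest
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/home/jovyan/work
    environment:
      # Login token read by Jupyter Server; define JUPYTER_TOKEN in your shell or a .env file
      - JUPYTER_TOKEN=${JUPYTER_TOKEN:?set JUPYTER_TOKEN first}
      # GRANT_SUDO only takes effect when the container runs as root (user: root)
      - GRANT_SUDO=yes
    command: >
      start.sh jupyter lab
      --ServerApp.root_dir=/home/jovyan/work
    restart: unless-stopped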

Configuration

jupyter_server_config.py:

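c = get_config()  # noqa: F821 (provided by Jupyter when the file is loaded)

c.ServerApp.ip = '0.0.0.0'             # listen on all interfaces inside the container
c.ServerApp.port = 8888
c.ServerApp.open_browser = False       # no browser is available in a container
c.ServerApp.allow_origin = '*'         # accepts any origin; restrict this in production
c.ContentsManager.allow_hidden = True  # show dotfiles in the file browser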

Pros and Cons

| Pros | Cons |
| --- | --- |
| Largest ecosystem of kernels and extensions | Can feel overwhelming for new users |
| Industry standard for data science | Resource-heavy for simple tasks |
| Excellent Python and R support | Multi-language notebooks less seamless than Zeppelin |
| Active development and community | Requires more setup for Spark integration |

Comparison Table

| Feature | Apache Zeppelin | Polynote | JupyterLab |
| --- | --- | --- | --- |
| GitHub Stars | 6,620+ | 4,590+ | 15,100+ |
| Primary Languages | Scala, Python, SQL, Shell | Scala, Python | Any (100+ kernels) |
| Multi-Language Per Notebook | ✅ Native | ✅ Native (Scala+Python) | ⚠️ One kernel per notebook; other languages via cell magics |
| Built-in Visualizations | ✅ Charts, tables | ✅ Vega-Lite, pandas | ✅ Via matplotlib/plotly |
| Collaborative Editing | ✅ Built-in | ❌ No | ✅ With extension |
| Spark Integration | ✅ First-class | ✅ Good | ⚠️ Requires setup |
| Extension Ecosystem | Limited | Minimal | 1,000+ extensions |
| Dependency Management | Interpreter config | Maven + PyPI | pip/conda per kernel |
| Real-time Collaboration | ✅ | ❌ | ✅ With extension |
| Docker Image Size | ~1.5 GB | ~800 MB | ~7 GB (SciPy) |
| Best For | Data engineering teams | Scala/Python teams | General data science |

Choosing the Right Notebook Platform

Choose Apache Zeppelin if: Your team works heavily with Apache Spark and needs built-in SQL, Python, and Shell support in a single notebook. The collaborative editing and built-in visualizations make it ideal for data engineering workflows.

Choose Polynote if: Your team uses both Scala and Python and needs seamless interoperability between them. The ability to share variables across language boundaries without serialization overhead is Polynote’s standout feature.

Choose JupyterLab if: You need the most flexible, extensible notebook platform with support for virtually any programming language. The massive extension ecosystem and industry-standard status make it the safest choice for most data science teams.
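
The three recommendations above can be condensed into a small decision helper. This is only an illustrative sketch: the function name, the two yes/no questions, and the order in which they are checked are assumptions made for the example, not a formal selection method.

```python
def recommend_platform(needs_scala_python_interop: bool, spark_heavy: bool) -> str:
    """Map two yes/no answers to one of the three platforms in this guide."""
    # Assumption: Scala/Python interop is the most specific need, so check it first.
    if needs_scala_python_interop:
        return "Polynote"
    # Spark-centric teams are pointed at Zeppelin's native Spark integration.
    if spark_heavy:
        return "Apache Zeppelin"
    # Everything else falls back to the most general-purpose option.
    return "JupyterLab"


print(recommend_platform(needs_scala_python_interop=False, spark_heavy=True))
# prints Apache Zeppelin
```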

Why Self-Host Your Data Notebook Platform?

Cloud notebook services (Google Colab, Databricks notebooks, SageMaker) offer convenience but come with data transfer costs, usage limits, and vendor lock-in. Self-hosting keeps your data, code, and compute resources under your control.

Data governance — sensitive datasets, customer information, and proprietary models never leave your infrastructure. This is essential for organizations with strict data residency or compliance requirements.

Unlimited compute — cloud notebooks impose memory, CPU, and runtime limits. Self-hosted platforms can use your full cluster capacity for long-running analyses, large dataset processing, and model training.

Custom environment — install any library, connect to any database, and configure any tool your team needs without waiting for cloud platform updates or dealing with sandbox restrictions.

For related data science infrastructure, see our JupyterHub multi-user platform guide and our ELN lab notebook comparison.

FAQ

Can Zeppelin and JupyterLab run side by side?

Yes. They use different default ports (Zeppelin: 8080, JupyterLab: 8888) and can run on the same server. You can also route both through a reverse proxy with different path prefixes for unified access.
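
Before starting several platforms on one host, it can be worth confirming that no two services claim the same host port. A minimal sketch, assuming the host ports from the Compose files in this guide; `find_port_conflicts` is a hypothetical helper name.

```python
from collections import defaultdict

# Host ports published by the Compose files in this guide.
HOST_PORTS = {"zeppelin": 8080, "polynote": 8192, "jupyterlab": 8888}


def find_port_conflicts(host_ports: dict[str, int]) -> dict[int, list[str]]:
    """Return {port: [services]} for every host port claimed by more than one service."""
    by_port: dict[int, list[str]] = defaultdict(list)
    for service, port in host_ports.items():
        by_port[port].append(service)
    return {port: sorted(names) for port, names in by_port.items() if len(names) > 1}


print(find_port_conflicts(HOST_PORTS))  # prints {} because the default ports do not clash
```

If you remap a port (for example, publishing another service on 8080), the helper reports the clash before Docker does.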

Does Polynote support languages other than Scala and Python?

Polynote is specifically designed for Scala and Python interoperability. It doesn’t support other languages natively. If you need R, Julia, or other kernels, JupyterLab is a better choice.

How do I add custom Python libraries to JupyterLab?

If you’re using the official Jupyter Docker Stacks images, extend the image with a Dockerfile that installs additional packages via pip install. For a temporary installation, run %pip install package-name in a notebook cell; the %pip magic installs into the environment of the running kernel, and the package is lost when the container is recreated.
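
Whichever route you take, it helps to confirm from inside a notebook which packages the kernel can actually see. A minimal sketch using only the standard library; `missing_packages` and the package names passed to it are illustrative.

```python
from importlib import metadata


def missing_packages(names: list[str]) -> list[str]:
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Example: check the libraries a notebook expects before running it.
print(missing_packages(["pandas", "matplotlib", "numpy"]))
```

An empty list means every listed package is available to the kernel.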

Can Zeppelin connect to external databases?

Yes. Zeppelin has built-in JDBC interpreters that can connect to PostgreSQL, MySQL, Oracle, and other databases. You need to place the appropriate JDBC driver JAR files in Zeppelin’s interpreter/jdbc/ directory.

Is JupyterLab suitable for production dashboards?

JupyterLab itself is an interactive development environment. For production dashboards, consider using Voila (which converts notebooks to standalone web apps) or Panel (which creates interactive dashboards from Python code).

How do I secure a self-hosted notebook platform?

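The three platforms differ in what they offer out of the box. JupyterLab uses token-based authentication by default and can integrate with OAuth providers, typically through JupyterHub. Zeppelin handles authentication through Apache Shiro, which can be backed by LDAP or Active Directory; anonymous access is enabled until you configure it. Polynote ships with little built-in access control, so place it behind an authenticating reverse proxy. In all cases, run behind a reverse proxy with TLS, and restrict access to your internal network or VPN.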
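
Where access is protected by a shared token, the token should be long and random rather than hand-typed. A minimal sketch of generating one with Python's standard library; `generate_token` is a hypothetical helper name. Jupyter Server, for example, reads a token from the `JUPYTER_TOKEN` environment variable.

```python
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a random hex token suitable for use as a login secret."""
    # token_hex draws from the OS cryptographic random source.
    return secrets.token_hex(num_bytes)


token = generate_token()
print(len(token))  # prints 64 (two hex characters per random byte)
```

Store the value in a secrets manager or an untracked .env file rather than committing it alongside your Compose files.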