Finding the right hyperparameters for a model can make the difference between mediocre and state-of-the-art performance. Manual tuning is slow and error-prone. Automated hyperparameter optimization (HPO) frameworks systematically search the parameter space, finding better configurations with fewer training runs.
This guide compares the three leading open-source HPO frameworks for self-hosted deployment: Optuna (Preferred Networks), Ray Tune (Anyscale/Ray project), and Hyperopt (Montreal Institute for Learning Algorithms). We’ll cover search algorithms, distributed execution, and Docker deployment configurations, then help you choose the right tool for your training infrastructure.
For the training frameworks that these HPO tools optimize, see our distributed training comparison. If you need to track HPO experiments alongside training runs, our experiment tracking guide covers the best tools. And for monitoring GPU utilization during HPO sweeps, check our GPU monitoring guide.
Why Automated Hyperparameter Optimization
Hyperparameters — learning rates, batch sizes, network depths, regularization strengths — control how a model learns. Getting them right is critical, but the search space is often enormous. A neural network with 10 tunable hyperparameters, each with just 5 candidate values, has nearly 10 million possible combinations.
Automated HPO frameworks address this challenge through:
- Intelligent search — Bayesian optimization and tree-structured Parzen estimators find good configurations faster than grid search
- Early stopping — Pruning bad trials mid-training saves compute resources
- Parallel execution — Running multiple configurations simultaneously across GPUs or nodes
- Reproducibility — Tracking every trial’s parameters, metrics, and outcomes
Optuna: Tree-Structured Bayesian Optimization
Optuna is a lightweight, Python-native HPO framework that uses the Tree-structured Parzen Estimator (TPE) as its default sampler for efficient parameter search. Its define-by-run API makes it easy to express complex search spaces with conditional parameters.
Architecture
Optuna uses a central Study object that manages trials. Each trial samples hyperparameters using TPE, which builds probability distributions over good and bad parameter configurations and samples from the better region. Optuna also supports CMA-ES (Covariance Matrix Adaptation Evolution Strategy) for continuous parameter spaces.
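A minimal sketch of the Study/trial flow, assuming a toy objective in place of a real training loop and SQLite as an illustrative storage URL (PostgreSQL or MySQL would be used when multiple workers share a study):

```python
import optuna

def objective(trial):
    # Define-by-run: hyperparameters are sampled inside the objective itself.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Placeholder score; a real objective would train a model and return
    # its validation loss.
    return (lr - 1e-3) ** 2 + 0.1 * dropout

study = optuna.create_study(
    study_name="example-study",
    direction="minimize",
    sampler=optuna.samplers.TPESampler(),   # default; CmaEsSampler() suits continuous spaces
    pruner=optuna.pruners.MedianPruner(),
    storage="sqlite:///optuna.db",          # a shared RDB lets parallel workers join the study
    load_if_exists=True,
)
study.optimize(objective, n_trials=100)
print(study.best_params)
```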
Docker Deployment
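A minimal docker-compose sketch, assuming PostgreSQL as the study storage and the official optuna-dashboard image (which is assumed to ship a PostgreSQL driver); image tags, credentials, and the worker setup are placeholders to adapt:

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: optuna
      POSTGRES_PASSWORD: optuna      # replace with a real secret
      POSTGRES_DB: optuna
    volumes:
      - optuna-data:/var/lib/postgresql/data

  dashboard:
    image: ghcr.io/optuna/optuna-dashboard:latest
    command: postgresql+psycopg2://optuna:optuna@postgres:5432/optuna
    ports:
      - "8080:8080"
    depends_on:
      - postgres

  # Worker containers run your study script against the same storage URL;
  # scale them out to parallelize trials.

volumes:
  optuna-data:
```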
Search Space Definition
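A sketch of a conditional search space using the define-by-run API; the parameter names and ranges are arbitrary examples:

```python
import optuna

def objective(trial):
    # The optimizer choice gates which hyperparameters exist in this trial.
    optimizer = trial.suggest_categorical("optimizer", ["adam", "sgd"])
    if optimizer == "adam":
        lr = trial.suggest_float("adam_lr", 1e-5, 1e-2, log=True)
    else:
        lr = trial.suggest_float("sgd_lr", 1e-4, 1e-1, log=True)
        momentum = trial.suggest_float("momentum", 0.5, 0.99)  # only sampled for SGD
    n_layers = trial.suggest_int("n_layers", 1, 5)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64, 128])
    # Placeholder return value; substitute a real validation metric.
    return (lr - 1e-3) ** 2 + n_layers * 1e-4

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
```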
Key Strengths
- Define-by-run API for dynamic search spaces
- Built-in pruning (Median, Hyperband, Patient, Percentile)
- Web dashboard for real-time visualization
- PostgreSQL/MySQL/SQLite storage backends
- Multi-objective optimization
- Integration with PyTorch, TensorFlow, scikit-learn, XGBoost, LightGBM
Ray Tune: Distributed Hyperparameter Search at Scale
Ray Tune is part of the Ray distributed computing ecosystem. It provides scalable hyperparameter search with support for multiple search algorithms, schedulers, and seamless integration with the broader Ray ecosystem (data processing, serving, reinforcement learning).
Architecture
Ray Tune runs on top of the Ray core distributed runtime. It schedules trials across a cluster of nodes, giving each trial its own CPU or GPU allocation. Tune integrates with external search algorithms (Optuna, Hyperopt, Ax, Nevergrad, BOHB) as drop-in samplers.
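A minimal sketch of running Tune against an existing Ray cluster; the trainable is a stand-in for a real training function, and the per-trial resource numbers are illustrative:

```python
import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def trainable(config):
    # Placeholder for a training loop; return final metrics as a dict.
    loss = (config["lr"] - 1e-3) ** 2 + 0.1 * config["dropout"]
    return {"loss": loss}

ray.init(address="auto")  # connect to a cluster started with `ray start --head`; omit for local mode

tuner = tune.Tuner(
    tune.with_resources(trainable, {"cpu": 2, "gpu": 1}),  # resources reserved per trial
    param_space={
        "lr": tune.loguniform(1e-5, 1e-1),
        "dropout": tune.uniform(0.0, 0.5),
    },
    tune_config=tune.TuneConfig(
        metric="loss",
        mode="min",
        num_samples=100,
        scheduler=ASHAScheduler(),  # early-stops underperforming trials
    ),
)
results = tuner.fit()
print(results.get_best_result().config)
```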
Docker Deployment
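A minimal docker-compose sketch for a head node plus workers, assuming the rayproject/ray image; the image tag, replica count, and shared-memory size are illustrative:

```yaml
services:
  ray-head:
    image: rayproject/ray:2.9.3
    command: ray start --head --port=6379 --dashboard-host=0.0.0.0 --block
    ports:
      - "8265:8265"   # Ray Dashboard
      - "6379:6379"   # cluster port for workers and clients
    shm_size: 8gb

  ray-worker:
    image: rayproject/ray:2.9.3
    command: ray start --address=ray-head:6379 --block
    depends_on:
      - ray-head
    shm_size: 8gb
    deploy:
      replicas: 2     # scale out for more concurrent trials
```

GPU workers additionally need the NVIDIA container runtime and a GPU-enabled image variant.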
Search Configuration
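A sketch of the dict-based search space (tune.*) with Optuna plugged in as the search backend via OptunaSearch (requires the optuna package); the trainable and parameter ranges are placeholders:

```python
from ray import tune
from ray.tune.search.optuna import OptunaSearch

def trainable(config):
    # Placeholder objective; a real trainable would train and evaluate a model.
    score = (config["lr"] - 1e-3) ** 2 + 1e-4 * config["batch_size"]
    return {"loss": score}

search_space = {
    "lr": tune.loguniform(1e-5, 1e-1),
    "batch_size": tune.choice([16, 32, 64, 128]),
    "weight_decay": tune.uniform(0.0, 0.1),
}

tuner = tune.Tuner(
    trainable,
    param_space=search_space,
    tune_config=tune.TuneConfig(
        search_alg=OptunaSearch(metric="loss", mode="min"),  # TPE under the hood
        num_samples=200,
    ),
)
results = tuner.fit()
```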
Key Strengths
- Massive scalability — thousands of trials across hundreds of nodes
- Built-in cluster management with Ray autoscaler
- Multiple scheduler algorithms (HyperBand, ASHA, PBT, FIFO)
- Integrates with Optuna, Hyperopt, Ax, Nevergrad, BOHB as search backends
- Population-Based Training (PBT) for dynamic hyperparameter adjustment
- Ray Dashboard for real-time monitoring
- Native integration with Ray Serve for model serving
Hyperopt: Bayesian Optimization Foundation
Hyperopt is one of the earliest HPO frameworks, implementing Sequential Model-Based Optimization (SMBO) with Tree-Structured Parzen Estimators. It provides the foundational algorithms that many newer frameworks build upon.
Architecture
Hyperopt exposes an fmin interface: you define an objective function and a search space using its domain-specific language (DSL). The TPE algorithm splits past trials into good and bad groups, maintains a distribution over each, and samples candidates that maximize the ratio between the two distributions.
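A minimal fmin sketch; the objective is a stand-in for a real training run, and the log-space bounds approximate a 1e-5 to 1e-1 learning-rate range:

```python
import math
from hyperopt import fmin, tpe, hp, Trials

def objective(params):
    # Placeholder loss; a real objective would train a model with these params
    # and return its validation loss (or a dict with "loss" and "status").
    return (params["lr"] - 1e-3) ** 2 + 0.1 * params["dropout"]

space = {
    # hp.loguniform takes bounds in log space.
    "lr": hp.loguniform("lr", math.log(1e-5), math.log(1e-1)),
    "dropout": hp.uniform("dropout", 0.0, 0.5),
}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)
print(best)
```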
Docker Deployment
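A minimal docker-compose sketch for MongoDB-backed distributed search; the mongo tag, database name, and the on-the-fly pip install in the worker image are illustrative shortcuts:

```yaml
services:
  mongo:
    image: mongo:7
    ports:
      - "27017:27017"
    volumes:
      - hyperopt-data:/data/db

  worker:
    image: python:3.11-slim
    command: >
      sh -c "pip install hyperopt pymongo &&
             hyperopt-mongo-worker --mongo=mongo:27017/hyperopt_db --poll-interval=1"
    depends_on:
      - mongo
    deploy:
      replicas: 4     # each worker pulls trials from the shared Mongo queue

volumes:
  hyperopt-data:
```

The driver process then calls fmin with `trials=MongoTrials("mongo://mongo:27017/hyperopt_db/jobs", exp_key="my_exp")` so that trial evaluation is farmed out to the workers.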
Search Space Definition
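A sketch of the hp.* DSL, including a nested hp.choice for conditional parameters; the names and ranges are arbitrary examples:

```python
import math
from hyperopt import hp
from hyperopt.pyll.stochastic import sample

space = {
    # Nested choice: each optimizer branch carries its own parameters.
    "optimizer": hp.choice("optimizer", [
        {"type": "adam", "lr": hp.loguniform("adam_lr", math.log(1e-5), math.log(1e-2))},
        {"type": "sgd",  "lr": hp.loguniform("sgd_lr", math.log(1e-4), math.log(1e-1)),
         "momentum": hp.uniform("momentum", 0.5, 0.99)},
    ]),
    "batch_size": hp.choice("batch_size", [16, 32, 64, 128]),
    "n_layers": hp.quniform("n_layers", 1, 5, 1),   # integer-valued (returned as a float)
}

# Draw one random configuration to sanity-check the space definition.
print(sample(space))
```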
Key Strengths
- Lightweight, minimal dependencies
- TPE algorithm — the foundation for Optuna’s default sampler and for Ray Tune’s Hyperopt search backend
- MongoDB-based distributed execution
- Simple search space DSL
- Random search baseline for comparison
- Integration with scikit-learn, Keras, PyTorch
Comparison Table
| Feature | Optuna | Ray Tune | Hyperopt |
|---|---|---|---|
| Search Algorithms | TPE, CMA-ES, Grid, Random | TPE (via Optuna), Hyperopt, Ax, Nevergrad, BOHB, Random, Grid | TPE, Random |
| Pruning/Scheduling | Median, Hyperband, Patient, Percentile | ASHA, HyperBand, PBT, FIFO, Median | None (manual) |
| Distributed Execution | RDB-based (PostgreSQL, MySQL) | Ray cluster (native distributed) | MongoDB-based |
| Web Dashboard | Optuna Dashboard | Ray Dashboard (built-in) | None |
| Multi-Objective | Yes (NSGA-II, MOTPE) | Yes | No |
| Search Space API | Define-by-run (Python) | Dict-based (tune.*) | DSL (hp.*) |
| Framework Integration | PyTorch, TF, sklearn, XGBoost, LightGBM | PyTorch, TF, sklearn, XGBoost, JAX, RLlib | PyTorch, TF, Keras, sklearn |
| Population-Based Training | No | Yes (PBT) | No |
| Cold Start | Random init | Random init | Random init |
| Ease of Setup | Simple (pip + DB) | Moderate (Ray cluster) | Moderate (MongoDB) |
| Scalability | Medium (DB-bounded) | High (Ray cluster) | Medium (Mongo-bounded) |
| Active Maintainer | Preferred Networks | Anyscale / Ray community | MILA / community |
| GitHub Stars | 14,100+ | 42,400+ (Ray) | 7,600+ |
| Last Update | 2026-05 | 2026-05 | 2026-03 |
Choosing the Right HPO Framework
Choose Optuna when:
- You want the best balance of features and simplicity
- You need a web dashboard for monitoring trials
- Your search space has conditional parameters (define-by-run API excels here)
- You’re running on a single node or small cluster with a shared database
- You need multi-objective optimization
Choose Ray Tune when:
- You need to run hundreds or thousands of trials across a GPU cluster
- You want Population-Based Training for dynamic hyperparameter adjustment
- You’re already using Ray for data processing or model serving
- You need the flexibility to swap between multiple search algorithms
- You want built-in autoscaling for cloud or on-premise clusters
Choose Hyperopt when:
- You want a lightweight dependency with minimal setup
- You’re implementing custom search algorithms on top of TPE
- You need a proven, battle-tested algorithm as a baseline
- Your infrastructure already runs MongoDB for other services
Why Self-Host Hyperparameter Optimization
Cost Efficiency: HPO runs many trials, each consuming GPU hours. Cloud GPU costs multiply quickly — 200 trials at 2 hours each on an A100 instance can cost thousands of dollars. Self-hosted clusters amortize hardware costs across thousands of experiments.
Privacy: Hyperparameter search results reveal information about your models and datasets. Self-hosted HPO keeps trial histories, configurations, and performance metrics within your infrastructure.
Unlimited Trials: Cloud providers impose quotas on GPU usage and concurrent instances. Self-hosted clusters let you run HPO sweeps continuously without hitting rate limits or quota walls.
For the broader ML pipeline that HPO fits into, see our ML pipeline orchestration guide for automating the full training-to-deployment workflow, and our feature store comparison for feature engineering before HPO.
FAQ
What is the difference between grid search, random search, and Bayesian optimization?
Grid search exhaustively tries every combination of hyperparameters — efficient only when the search space is tiny. Random search samples randomly from the parameter space and often outperforms grid search with fewer trials because it doesn’t waste computation on irrelevant parameters. Bayesian optimization (TPE, used by all three frameworks) builds a probabilistic model of the objective function and uses it to select the most promising configurations, typically finding better results with fewer trials than random search.
How many trials do I need for effective hyperparameter optimization?
As a starting point, 50-100 trials is usually enough to find a good configuration for most models. For complex models with many hyperparameters, 200-500 trials may be needed. The exact number depends on the search space size and the sensitivity of your model to hyperparameter changes. Use pruning to stop unpromising trials early and save compute.
Can I resume an HPO study after the cluster restarts?
Yes — all three frameworks support persistence. Optuna uses RDB storage (PostgreSQL, MySQL, SQLite) that persists across restarts. Ray Tune saves checkpoints to disk and can resume from the latest state. Hyperopt with MongoDB storage maintains trial state in the database.
Should I use the same HPO framework for all my projects?
Not necessarily. For single-node experiments with quick iterations, Optuna’s simplicity is ideal. For large-scale cluster searches, Ray Tune’s distributed execution is worth the setup complexity. Hyperopt works well as a lightweight baseline or when you need to implement custom search algorithms.
How does pruning save compute during HPO?
Pruning stops trials early when their intermediate metrics compare poorly with those of previous trials. For example, under median pruning, if trial #50’s validation loss after 5 epochs is worse than the median validation loss that previous trials reported at epoch 5, it gets terminated. This can save 30-60% of compute by not wasting GPU hours on configurations that clearly won’t converge to good results.
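As a rough sketch of how this hooks into a training loop, here is Optuna’s report/should_prune pattern; the per-epoch losses are placeholders for real validation results:

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    val_loss = float("inf")
    for epoch in range(20):
        # Placeholder for one epoch of training + validation.
        val_loss = (lr - 1e-3) ** 2 + 1.0 / (epoch + 1)
        trial.report(val_loss, step=epoch)
        if trial.should_prune():        # worse than the median of prior trials at this step
            raise optuna.TrialPruned()
    return val_loss

study = optuna.create_study(
    direction="minimize",
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=5),  # don't prune before epoch 5
)
study.optimize(objective, n_trials=50)
```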
Can I use HPO with distributed training frameworks?
Yes — HPO and distributed training are complementary. Each HPO trial can itself use distributed training (e.g., DeepSpeed or FSDP across multiple GPUs). However, this multiplies resource requirements. A common pattern is to use single-GPU trials for HPO to find good hyperparameters, then scale up to distributed training for the final model training run.
What is Population-Based Training (PBT) and when should I use it?
PBT runs multiple trials in parallel and periodically replaces underperforming configurations with perturbed copies of better ones. Unlike standard HPO where each trial runs independently, PBT allows trials to learn from each other. It’s particularly useful when you want to find not just good hyperparameters but also good learning rate schedules or training dynamics. Only Ray Tune supports PBT natively.
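A sketch of a PBT scheduler configuration in Ray Tune; the mutation ranges and perturbation interval are illustrative, and the scheduler is passed to tune.Tuner via TuneConfig:

```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=5,               # exploit/explore every 5 reported iterations
    hyperparam_mutations={
        "lr": tune.loguniform(1e-5, 1e-1),
        "momentum": [0.8, 0.9, 0.95, 0.99],
    },
)

# Used as: tune.Tuner(trainable, tune_config=tune.TuneConfig(
#     metric="loss", mode="min", num_samples=8, scheduler=pbt))
```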