← Back to posts
comparison guide self-hosted · · 14 min read

Feast vs Featureform vs Hopsworks: Best Self-Hosted ML Feature Store 2026

Compare Feast, Featureform, and Hopsworks — the top open source ML feature stores for 2026. Self-hosting guide with Docker, Helm, and Kubernetes deployment examples.

OS
Editorial Team

A feature store is a centralized platform that manages, stores, and serves machine learning features for both training and inference. It solves one of the most common pain points in production ML: the gap between how features are computed during experimentation versus how they are served in production. Without a feature store, data science teams often rebuild feature pipelines from scratch for every model, leading to training-serving skew, duplicated effort, and inconsistent results.

In this guide, we compare three leading open source feature store platforms — Feast, Featureform, and Hopsworks — covering architecture, deployment options, supported backends, and real-world suitability. Whether you are running a small team with a handful of models or an enterprise ML platform with hundreds of features, this comparison will help you choose the right tool.

Why Self-Host a Feature Store?

Third-party feature store solutions exist (Tecton, AWS SageMaker Feature Store, Databricks Feature Store), but self-hosting offers several advantages:

  • Data sovereignty — features never leave your infrastructure, critical for regulated industries (healthcare, finance)
  • Cost control — no per-feature or per-request pricing; you pay only for your own compute
  • Customization — integrate with existing data stacks (PostgreSQL, Redis, Kafka, S3) without vendor lock-in
  • Low-latency serving — deploy the online store close to your inference service, reducing round-trip time
  • Full observability — log feature access, monitor drift, and audit feature lineage using your own monitoring stack

For teams already running self-hosted ML infrastructure — experiment tracking, model registries, and data pipelines — a self-hosted feature store is the natural next piece. If you are already using tools like MLflow for ML experiment tracking, integrating a feature store completes the MLOps pipeline.

What Is a Feature Store?

A feature store provides three core capabilities:

  1. Feature engineering — define features once using declarative schemas (entities, transformations, time windows)
  2. Offline store — historical feature values stored in a data warehouse or lake for model training and backfilling
  3. Online store — low-latency key-value store (Redis, DynamoDB) for real-time feature retrieval during inference

Additional features include feature versioning, point-in-time correctness (avoiding data leakage), feature sharing across teams, and monitoring for feature drift.

Feast: The Open Source Feature Store Pioneer

Repository: feast-dev/feast Stars: 6,974 | Language: Python | Last Active: April 2026

Feast (Feature Store) is the most widely adopted open source feature store. Originally developed by Gojek and Tecton, it is now maintained under the Linux Foundation’s LF AI & Data umbrella. Feast provides a Python SDK for defining features and supports a wide range of offline and online store backends.

Architecture

Feast follows a decoupled architecture:

  • Feature definitions — written in Python using the Feast SDK (entities, features, data sources)
  • Offline store — stores historical feature data (PostgreSQL, BigQuery, Snowflake, Redshift, DuckDB, Parquet files)
  • Online store — serves features at low latency (Redis, SQLite, Datastore, DynamoDB, Snowflake)
  • Registry — metadata store (file-based or database-backed) that tracks feature definitions and versions

Installation and Setup

Install Feast via pip:

1
pip install feast

Initialize a new Feast repository:

1
2
feast init my_feature_repo
cd my_feature_repo

This creates a basic project structure with feature_store.yaml, a data/ directory, and a sample feature definition file.

Configuration

The feature_store.yaml defines your project’s backend connections:

1
2
3
4
5
6
7
8
project: my_project
registry: data/registry.db
provider: local
online_store:
    type: redis
    connection_string: "localhost:6379"
offline_store:
    type: duckdb

For production deployments on kubernetes, Feast provides Helm charts:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# values.yaml for Feast Helm chart
feast:
  core:
    registry:
      path: "s3://my-bucket/registry.db"
  online:
    type: redis
    host: "redis-master.default.svc.cluster.local"
    port: 6379
  offline:
    type: bigquery

Deploy with Helm:

1
2
helm repo add feast https://feast-dev.github.io/feast/charts
helm install feast feast/feast-core -f values.yaml

Defining Features

Here is how you define a feature set in Feast:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
from feast import Entity, FeatureService, FeatureView, Field
from feast.types import Float32, Int64, String
from feast.infra.offline_stores.duckdb import DuckDBSource

driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats = FeatureView(
    name="driver_stats",
    entities=[driver],
    schema=[
        Field(name="driver_id", dtype=Int64),
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
    online=True,
    ttl=timedelta(hours=2),
)

Materialize features to the online store:

1
feast materialize-incremental $(date -u +%Y-%m-%dT%H:%M:%S)

Strengths

  • Largest ecosystem — most stars, contributors, and community support of any open source feature store
  • Flexible backends — supports the widest range of offline and online store combinations
  • Python-native — integrates seamlessly with pandas, scikit-learn, PyTorch, and TensorFlow workflows
  • Point-in-time correctness — built-in support for historical feature retrieval with correct timestamps
  • LF AI & Data project — governed by a neutral foundation, reducing vendor risk

Limitations

  • No built-in UI — feature management is entirely code-driven; no web dashboard out of the box
  • No native model training integration — you need external tools (MLflow, Kubeflow) for the training pipeline
  • Kubernetes complexity — Helm deployment requires familiarity with K8s concepts
  • No built-in feature monitoring — drift detection requires integration with external monitoring tools

Featureform: The Virtual Feature Store

Repository: featureform/featureform Stars: 1,969 | Language: Go | Last Active: July 2025

Featureform takes a fundamentally different approach: instead of moving data into its own storage layer, it acts as a virtual feature store that sits on top of your existing data infrastructure. You define features as transformations over data sources you already have (PostgreSQL, Snowflake, BigQuery, Redis, Spark), and Featureform orchestrates the computation and retrieval without duplicating the data.

Architecture

Featureform’s architecture centers on a provider model:

  • Providers — connectors to your existing data infrastructure (offline providers for training data, online providers for serving)
  • Orchestrator — schedules and executes feature computations using your compute infrastructure (Spark, Ray, local)
  • API layer — REST/gRPC interface for feature retrieval during inference
  • No data movement — features stay in your existing databases; Featureform manages metadata and orchestration only

Installation

Featurefdockerns as a binary or Docker container:

1
2
# Using the official installer
curl -sfL https://install.featureform.com | sh

Or via Docker Compose:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
version: "3.8"

services:
  featureform:
    image: featureformcom/featureform:latest
    ports:
      - "6565:6565"  # gRPC API
      - "8080:8080"  # HTTP API
    environment:
      - FEATUREFORM_PROVIDER_TYPE=redis
      - FEATUREFORM_PROVIDER_CONNECTION_STRING=redis:6379
      - FEATUREFORM_OFFLINE_PROVIDER_TYPE=postgres
      - FEATUREFORM_OFFLINE_PROVIDER_HOST=postgres
      - FEATUREFORM_OFFLINE_PROVIDER_PORT=5432
    depends_on:
      - redis
      - postgres

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: featureform
      POSTGRES_DB: featureform
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Start with:

1
docker compose up -d

Defining Features

Featureform uses Python decorators to define features over existing data sources:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
import featureform as ff

# Register existing data sources
ff.register_postgres(
    name="postgres_db",
    host="localhost",
    port="5432",
    database="featureform",
    user="postgres",
    password="featureform",
)

ff.register_redis(
    name="redis_db",
    host="localhost",
    port="6379",
)

# Define a training set from existing tables
@ff.training_set(name="fraud_detection_training", source="postgres_db")
def fraud_training_set(transactions):
    return f"""
        SELECT
            transaction_id,
            amount,
            merchant_category,
            user_age,
            is_fraud
        FROM transactions
        WHERE created_at > '2025-01-01'
    """

# Define a feature that transforms existing data
@ff.feature(name="user_avg_spend", source="postgres_db")
def user_avg_spend(users):
    return f"""
        SELECT
            user_id,
            AVG(transaction_amount) as avg_spend
        FROM user_transactions
        GROUP BY user_id
    """

ff.register_training_set("fraud_detection_training")
ff.register_feature("user_avg_spend", variant="v1")

Strengths

  • Zero data duplication — features live in your existing databases, no ETL into a separate store
  • Go-based server — single binary, fast startup, low resource footprint
  • Simple deployment — Docker Compose works out of the box; no Kubernetes required
  • Provider flexibility — mix and match providers (PostgreSQL for offline, Redis for online, Spark for compute)
  • SQL-native — features defined as SQL queries, accessible to data analysts who do not know Python

Limitations

  • Smaller community — fewer contributors and less community activity compared to Feast
  • Less mature — fewer integrations, fewer production deployments documented
  • Limited offline store options — primarily designed for SQL-based offline stores; less support for data lake formats
  • No Helm chart — Kubernetes deployment requires manual setup
  • Development pace has slowed — last significant push was in mid-2025

Hopsworks: The Full-Stack AI Platform

Repository: logicalclocks/hopsworks Stars: 1,292 | Language: Java | Last Active: February 2025

Hopsworks is not just a feature store — it is a complete data-intensive AI platform that includes a feature store, model registry, training pipeline, and serving layer. Built by Logical Clocks (founded by the researchers behind Apache Hops), it uses Apache Hops as its execution engine and provides a web-based UI for managing the entire ML lifecycle.

Architecture

Hopsworks integrates multiple components into a single platform:

  • Feature Store — online (Redis/MySQL) and offline (Hive/Iceberg) stores with point-in-time correctness
  • Model Registry — versioned model storage with metadata and lineage tracking
  • Training Pipeline — built-in Jupyter notebooks and Spark-based feature computation
  • Serving Layer — REST/gRPC endpoints for model and feature serving
  • Web UI — full management interface for features, models, projects, and monitoring

Installation

Hopsworks provides an installer script for bare-metal or VM deployment:

1
2
3
4
5
6
# Download the installer
wget https://repo.hops.works/install.sh
chmod +x install.sh

# Run the installer (requires root)
sudo ./install.sh --action install

For Docker-based local development:

1
2
3
4
5
docker run -d \
  --name hopsworks \
  -p 8080:8080 \
  -p 9080:9080 \
  logicalclocks/hopsworks:latest

Access the UI at http://localhost:8080. The default credentials are admin:admin (change immediately in production).

Using the Feature Store

Hopsworks provides both a Python SDK and a web UI for feature management. Here is the Python approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import hopsworks

# Connect to the Hopsworks feature store
project = hopsworks.login()
fs = project.get_feature_store()

# Create a feature group
fg = fs.create_feature_group(
    name="driver_features",
    description="Driver behavior features for fraud detection",
    version=1,
    primary_key=["driver_id"],
    event_time="timestamp",
    online_enabled=True,
)

# Ingest data from a pandas DataFrame
fg.insert(driver_df)

# Retrieve features for training
feature_view = fs.get_feature_view(
    name="driver_fv",
    version=1,
)
training_data = feature_view.get_training_data()

The web UI provides visual feature management, including:

  • Feature browsing and search across all projects
  • Feature lineage visualization (source → transformation → feature group → training dataset)
  • Feature statistics and distribution analysis
  • Model deployment monitoring and A/B testing dashboards

Strengths

  • Complete platform — feature store, model registry, training, and serving in one package
  • Web UI included — no need for separate dashboards; full management interface out of the box
  • Enterprise-grade — built-in multi-tenancy, RBAC, and audit logging
  • Apache Hops integration — scalable Spark-based feature computation
  • Iceberg support — modern table format for the offline store with ACID guarantees

Limitations

  • Heavy deployment — requires significant resources (minimum 16 GB RAM, multi-core CPU)
  • Java/Scala stack — less Python-native compared to Feast; SDK is good but core is JVM-based
  • Opinionated architecture — designed around the Hops ecosystem; harder to integrate with non-Hops tools
  • Slower development — last major commit was in early 2025; community is smaller than Feast
  • Complex initial setup — the installer requires root access and configures many services

Comparison Table

FeatureFeastFeatureformHopsworks
Stars6,9741,9691,292
LanguagePythonGoJava
ArchitectureDedicated storeVirtual layerFull platform
Online StoresRedis, SQLite, Datastore, DynamoDB, SnowflakeRedis, MySQL, CassandraRedis, MySQL
Offline StoresPostgreSQL, BigQuery, Snowflake, Redshift, DuckDB, ParquetPostgreSQL, BigQuery, Snowflake, SparkHive, Iceberg
Compute EnginePython (local), Dataflow, SparkSpark, Ray, localApache Hops (Spark)
Web UINoNo (CLI/REST only)Yes (built-in)
KubernetesHelm charts availableManual setupBuilt-in (Hopsworks K8s)
Docker SupportYes (SDK only)Yes (full server)Yes (limited local dev)
Feature MonitoringExternal integrationExternal integrationBuilt-in
Model RegistryExternal (MLflow, etc.)ExternalBuilt-in
Multi-tenancyNoLimitedYes (projects + RBAC)
Point-in-TimeYesYesYes
Best ForTeams wanting flexibilityTeams with existing data infraTeams wanting an all-in-one platform

Deployment Comparison

Feast on Kubernetes (Production)

Feast is designed for cloud-native deployment. The recommended production setup uses Helm:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
# Add the Feast Helm repository
helm repo add feast https://feast-dev.github.io/feast/charts
helm repo update

# Create a namespace
kubectl create namespace feast

# Install Feast Core
helm install feast-core feast/feast-core \
  --namespace feast \
  --set registry.type=s3 \
  --set registry.bucket=my-feast-registry \
  --set online.type=redis \
  --set online.host=redis-master.default.svc.cluster.local

# Install Feast Online Serving
helm install feast-online feast/feast-online \
  --namespace feast \
  --set redis.host=redis-master.default.svc.cluster.local

Featureform with Docker Compose (Development)

Featureform is the easiest to get running locally:

1
2
3
4
docker compose up -d

# Verify the API is running
curl http://localhost:8080/api/health

Hopsworks on a VM (All-in-One)

Hopsworks ships as a bundled installer:

1
2
3
4
5
# Minimum requirements: 4 vCPUs, 16 GB RAM, 50 GB disk
sudo ./install.sh --action install

# After installation, access the UI
# http://<server-ip>:8080

Feature Engineering Workflow Comparison

Here is how the end-to-end workflow differs across the three platforms:

Feast Workflow

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# 1. Define features in Python
feast apply

# 2. Materialize features to online store
feast materialize 2025-01-01T00:00:00 2026-04-19T00:00:00

# 3. Retrieve features in Python for training
from feast import FeatureStore
store = FeatureStore(repo_path=".")
training_df = store.get_historical_features(
    features=["driver_stats:conv_rate", "driver_stats:avg_daily_trips"],
    entity_df=entity_df,
).to_df()

# 4. Train model and save to external registry
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier().fit(X, y)
# Save to MLflow (see our ML experiment tracking guide for details)

Featureform Workflow

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# 1. Connect and define sources via decorators
# 2. Create training sets and features via decorators
ff.apply()  # Registers everything with the orchestrator

# 3. Retrieve features at inference time
import featureform as ff
client = ff.Client(host="localhost:6565")
features = client.get_features(
    feature_names=["user_avg_spend.v1"],
    entity_rows=[{"user_id": "u123"}],
)

# 4. Get training data
training_data = client.get_training_set(
    "fraud_detection_training",
    resource_type="TrainingSet",
)

Hopsworks Workflow

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
import hopsworks

# 1. Connect to project
project = hopsworks.login()
fs = project.get_feature_store()

# 2. Create and populate feature groups
fg = fs.create_feature_group(name="transactions", version=1)
fg.insert(transaction_df)

# 3. Create feature view and get training data
fv = fs.get_feature_view("fraud_detection", version=1)
X_train, y_train = fv.get_training_data()

# 4. Train and register model in built-in registry
mr = project.get_model_registry()
model = mr.python.create_model(name="fraud_model", version=1)
model.save(model_binary)

Choosing the Right Feature Store

Choose Feast if:

  • You want the most mature and actively developed open source feature store
  • Your team is Python-centric and uses pandas/scikit-learn/TensorFlow
  • You need maximum flexibility in backend choices (multiple offline/online store combinations)
  • You already have a Kubernetes cluster and are comfortable with Helm
  • You want LF AI & Data governance and long-term project sustainability

Choose Featureform if:

  • You want to avoid data duplication and keep features in existing databases
  • You prefer SQL-based feature definitions over Python SDKs
  • You need a lightweight, single-binary deployment (no Kubernetes required)
  • Your data infrastructure already includes PostgreSQL, Snowflake, or BigQuery
  • You are a smaller team that values simplicity over ecosystem breadth

Choose Hopsworks if:

  • You want a complete ML platform (feature store + model registry + serving + UI)
  • You need multi-tenancy and role-based access control
  • You value a web-based management interface for non-technical team members
  • You are building on Apache Spark and want deep Hops ecosystem integration
  • You need built-in feature monitoring and model deployment capabilities

For teams building a comprehensive MLOps stack, consider how a feature store integrates with other components. Our guides on self-hosted data orchestration with Apache Airflow, Prefect, and Dagster and ML experiment tracking with MLflow, ClearML, and Aim cover the adjacent pieces of the pipeline.

FAQ

What is a feature store and why do I need one?

A feature store is a centralized system for managing machine learning features. It stores both historical features (for training) and real-time features (for inference), ensuring consistency between your training and production environments. Without a feature store, teams often rebuild feature computation logic for every model, leading to training-serving skew, duplicated effort, and inconsistent predictions across models.

Can I use a feature store without Kubernetes?

Yes. Featureform runs as a single binary and works with Docker Compose — no Kubernetes required. Feast can run locally using the Python SDK for development and testing, though production deployments typically use Helm on Kubernetes. Hopsworks provides a bare-metal installer that works on any Linux VM.

Which feature store has the best community support?

Feast has the largest community by a significant margin, with nearly 7,000 GitHub stars, regular releases, and backing from the LF AI & Data foundation. It has the most documentation, tutorials, and third-party integrations. Featureform and Hopsworks have smaller but active communities.

Do I need a feature store for batch (offline) ML models?

Feature stores are most valuable when you have real-time inference needs, multiple models sharing features, or a large team where feature discovery and reuse matters. For a single batch model with a simple feature set, a feature store may be overkill. But as your ML footprint grows — more models, more teams, more features — the organizational benefits compound quickly.

How does a feature store prevent data leakage?

Feature stores enforce point-in-time correctness: when retrieving historical features for training, they return only the feature values that were available at the time of each training event. This prevents future data from leaking into training features, which would artificially inflate model performance and cause failures in production.

Can I migrate from one feature store to another?

Migration between feature stores requires re-defining feature schemas and re-materializing data. Feast and Featureform use different abstraction layers, so there is no direct migration path. Hopsworks has its own internal format. Plan for migration effort proportional to your feature count — expect to rebuild feature definitions and re-run historical feature computation.

Advertise here