Feast vs Featureform vs Hopsworks: Best Self-Hosted ML Feature Store 2026

A feature store is a centralized platform that manages, stores, and serves machine learning features for both training and inference. It solves one of the most common pain points in production ML: the gap between how features are computed during experimentation versus how they are served in production. Without a feature store, data science teams often rebuild feature pipelines from scratch for every model, leading to training-serving skew, duplicated effort, and inconsistent results.

In this guide, we compare three leading open source feature store platforms — Feast, Featureform, and Hopsworks — covering architecture, deployment options, supported backends, and real-world suitability. Whether you are running a small team with a handful of models or an enterprise ML platform with hundreds of features, this comparison will help you choose the right tool.

Why Self-Host a Feature Store?

Third-party feature store solutions exist (Tecton, AWS SageMaker Feature Store, Databricks Feature Store), but self-hosting offers several advantages:

Data sovereignty — features never leave your infrastructure, critical for regulated industries (healthcare, finance)
Cost control — no per-feature or per-request pricing; you pay only for your own compute
Customization — integrate with existing data stacks (PostgreSQL, Redis, Kafka, S3) without vendor lock-in
Low-latency serving — deploy the online store close to your inference service, reducing round-trip time
Full observability — log feature access, monitor drift, and audit feature lineage using your own monitoring stack

For teams already running self-hosted ML infrastructure — experiment tracking, model registries, and data pipelines — a self-hosted feature store is the natural next piece. If you are already using tools like MLflow for ML experiment tracking, integrating a feature store completes the MLOps pipeline.

What Is a Feature Store?

A feature store provides three core capabilities:

Feature engineering — define features once using declarative schemas (entities, transformations, time windows)
Offline store — historical feature values stored in a data warehouse or lake for model training and backfilling
Online store — low-latency key-value store (Redis, DynamoDB) for real-time feature retrieval during inference

Additional features include feature versioning, point-in-time correctness (avoiding data leakage), feature sharing across teams, and monitoring for feature drift.

Feast: The Open Source Feature Store Pioneer

Repository: feast-dev/feast Stars: 6,974 | Language: Python | Last Active: April 2026

Feast (Feature Store) is the most widely adopted open source feature store. Originally developed by Gojek and Tecton, it is now maintained under the Linux Foundation’s LF AI & Data umbrella. Feast provides a Python SDK for defining features and supports a wide range of offline and online store backends.

Architecture

Feast follows a decoupled architecture:

Feature definitions — written in Python using the Feast SDK (entities, features, data sources)
Offline store — stores historical feature data (PostgreSQL, BigQuery, Snowflake, Redshift, DuckDB, Parquet files)
Online store — serves features at low latency (Redis, SQLite, Datastore, DynamoDB, Snowflake)
Registry — metadata store (file-based or database-backed) that tracks feature definitions and versions

Installation and Setup

Install Feast via pip:

1
pip install feast

Initialize a new Feast repository:

1
2
feast init my_feature_repo
cd my_feature_repo

This creates a basic project structure with feature_store.yaml, a data/ directory, and a sample feature definition file.

Configuration

The feature_store.yaml defines your project’s backend connections:

1
2
3
4
5
6
7
8
project: my_project
registry: data/registry.db
provider: local
online_store:
    type: redis
    connection_string: "localhost:6379"
offline_store:
    type: duckdb

For production deployments on kubernetes, Feast provides Helm charts:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# values.yaml for Feast Helm chart
feast:
  core:
    registry:
      path: "s3://my-bucket/registry.db"
  online:
    type: redis
    host: "redis-master.default.svc.cluster.local"
    port: 6379
  offline:
    type: bigquery

Deploy with Helm:

1
2
helm repo add feast https://feast-dev.github.io/feast/charts
helm install feast feast/feast-core -f values.yaml

Defining Features

Here is how you define a feature set in Feast:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
from feast import Entity, FeatureService, FeatureView, Field
from feast.types import Float32, Int64, String
from feast.infra.offline_stores.duckdb import DuckDBSource

driver = Entity(name="driver", join_keys=["driver_id"])

driver_stats = FeatureView(
    name="driver_stats",
    entities=[driver],
    schema=[
        Field(name="driver_id", dtype=Int64),
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
    online=True,
    ttl=timedelta(hours=2),
)

Materialize features to the online store:

1
feast materialize-incremental $(date -u +%Y-%m-%dT%H:%M:%S)

Strengths

Largest ecosystem — most stars, contributors, and community support of any open source feature store
Flexible backends — supports the widest range of offline and online store combinations
Python-native — integrates seamlessly with pandas, scikit-learn, PyTorch, and TensorFlow workflows
Point-in-time correctness — built-in support for historical feature retrieval with correct timestamps
LF AI & Data project — governed by a neutral foundation, reducing vendor risk

Limitations

No built-in UI — feature management is entirely code-driven; no web dashboard out of the box
No native model training integration — you need external tools (MLflow, Kubeflow) for the training pipeline
Kubernetes complexity — Helm deployment requires familiarity with K8s concepts
No built-in feature monitoring — drift detection requires integration with external monitoring tools

Featureform: The Virtual Feature Store

Repository: featureform/featureform Stars: 1,969 | Language: Go | Last Active: July 2025

Featureform takes a fundamentally different approach: instead of moving data into its own storage layer, it acts as a virtual feature store that sits on top of your existing data infrastructure. You define features as transformations over data sources you already have (PostgreSQL, Snowflake, BigQuery, Redis, Spark), and Featureform orchestrates the computation and retrieval without duplicating the data.

Architecture

Featureform’s architecture centers on a provider model:

Providers — connectors to your existing data infrastructure (offline providers for training data, online providers for serving)
Orchestrator — schedules and executes feature computations using your compute infrastructure (Spark, Ray, local)
API layer — REST/gRPC interface for feature retrieval during inference
No data movement — features stay in your existing databases; Featureform manages metadata and orchestration only

Installation

Featurefdockerns as a binary or Docker container:

1
2
# Using the official installer
curl -sfL https://install.featureform.com | sh

Or via Docker Compose:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
version: "3.8"

services:
  featureform:
    image: featureformcom/featureform:latest
    ports:
      - "6565:6565"  # gRPC API
      - "8080:8080"  # HTTP API
    environment:
      - FEATUREFORM_PROVIDER_TYPE=redis
      - FEATUREFORM_PROVIDER_CONNECTION_STRING=redis:6379
      - FEATUREFORM_OFFLINE_PROVIDER_TYPE=postgres
      - FEATUREFORM_OFFLINE_PROVIDER_HOST=postgres
      - FEATUREFORM_OFFLINE_PROVIDER_PORT=5432
    depends_on:
      - redis
      - postgres

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: featureform
      POSTGRES_DB: featureform
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Start with:

1
docker compose up -d

Defining Features

Featureform uses Python decorators to define features over existing data sources:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
import featureform as ff

# Register existing data sources
ff.register_postgres(
    name="postgres_db",
    host="localhost",
    port="5432",
    database="featureform",
    user="postgres",
    password="featureform",
)

ff.register_redis(
    name="redis_db",
    host="localhost",
    port="6379",
)

# Define a training set from existing tables
@ff.training_set(name="fraud_detection_training", source="postgres_db")
def fraud_training_set(transactions):
    return f"""
        SELECT
            transaction_id,
            amount,
            merchant_category,
            user_age,
            is_fraud
        FROM transactions
        WHERE created_at > '2025-01-01'
    """

# Define a feature that transforms existing data
@ff.feature(name="user_avg_spend", source="postgres_db")
def user_avg_spend(users):
    return f"""
        SELECT
            user_id,
            AVG(transaction_amount) as avg_spend
        FROM user_transactions
        GROUP BY user_id
    """

ff.register_training_set("fraud_detection_training")
ff.register_feature("user_avg_spend", variant="v1")

Strengths

Zero data duplication — features live in your existing databases, no ETL into a separate store
Go-based server — single binary, fast startup, low resource footprint
Simple deployment — Docker Compose works out of the box; no Kubernetes required
Provider flexibility — mix and match providers (PostgreSQL for offline, Redis for online, Spark for compute)
SQL-native — features defined as SQL queries, accessible to data analysts who do not know Python

Limitations

Smaller community — fewer contributors and less community activity compared to Feast
Less mature — fewer integrations, fewer production deployments documented
Limited offline store options — primarily designed for SQL-based offline stores; less support for data lake formats
No Helm chart — Kubernetes deployment requires manual setup
Development pace has slowed — last significant push was in mid-2025

Hopsworks: The Full-Stack AI Platform

Repository: logicalclocks/hopsworks Stars: 1,292 | Language: Java | Last Active: February 2025

Hopsworks is not just a feature store — it is a complete data-intensive AI platform that includes a feature store, model registry, training pipeline, and serving layer. Built by Logical Clocks (founded by the researchers behind Apache Hops), it uses Apache Hops as its execution engine and provides a web-based UI for managing the entire ML lifecycle.

Architecture

Hopsworks integrates multiple components into a single platform:

Feature Store — online (Redis/MySQL) and offline (Hive/Iceberg) stores with point-in-time correctness
Model Registry — versioned model storage with metadata and lineage tracking
Training Pipeline — built-in Jupyter notebooks and Spark-based feature computation
Serving Layer — REST/gRPC endpoints for model and feature serving
Web UI — full management interface for features, models, projects, and monitoring

Installation

Hopsworks provides an installer script for bare-metal or VM deployment:

1
2
3
4
5
6
# Download the installer
wget https://repo.hops.works/install.sh
chmod +x install.sh

# Run the installer (requires root)
sudo ./install.sh --action install

For Docker-based local development:

1
2
3
4
5
docker run -d \
  --name hopsworks \
  -p 8080:8080 \
  -p 9080:9080 \
  logicalclocks/hopsworks:latest

Access the UI at http://localhost:8080. The default credentials are admin:admin (change immediately in production).

Using the Feature Store

Hopsworks provides both a Python SDK and a web UI for feature management. Here is the Python approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import hopsworks

# Connect to the Hopsworks feature store
project = hopsworks.login()
fs = project.get_feature_store()

# Create a feature group
fg = fs.create_feature_group(
    name="driver_features",
    description="Driver behavior features for fraud detection",
    version=1,
    primary_key=["driver_id"],
    event_time="timestamp",
    online_enabled=True,
)

# Ingest data from a pandas DataFrame
fg.insert(driver_df)

# Retrieve features for training
feature_view = fs.get_feature_view(
    name="driver_fv",
    version=1,
)
training_data = feature_view.get_training_data()

The web UI provides visual feature management, including:

Feature browsing and search across all projects
Feature lineage visualization (source → transformation → feature group → training dataset)
Feature statistics and distribution analysis
Model deployment monitoring and A/B testing dashboards

Strengths

Complete platform — feature store, model registry, training, and serving in one package
Web UI included — no need for separate dashboards; full management interface out of the box
Enterprise-grade — built-in multi-tenancy, RBAC, and audit logging
Apache Hops integration — scalable Spark-based feature computation
Iceberg support — modern table format for the offline store with ACID guarantees

Limitations

Heavy deployment — requires significant resources (minimum 16 GB RAM, multi-core CPU)
Java/Scala stack — less Python-native compared to Feast; SDK is good but core is JVM-based
Opinionated architecture — designed around the Hops ecosystem; harder to integrate with non-Hops tools
Slower development — last major commit was in early 2025; community is smaller than Feast
Complex initial setup — the installer requires root access and configures many services

Comparison Table

Feature	Feast	Featureform	Hopsworks
Stars	6,974	1,969	1,292
Language	Python	Go	Java
Architecture	Dedicated store	Virtual layer	Full platform
Online Stores	Redis, SQLite, Datastore, DynamoDB, Snowflake	Redis, MySQL, Cassandra	Redis, MySQL
Offline Stores	PostgreSQL, BigQuery, Snowflake, Redshift, DuckDB, Parquet	PostgreSQL, BigQuery, Snowflake, Spark	Hive, Iceberg
Compute Engine	Python (local), Dataflow, Spark	Spark, Ray, local	Apache Hops (Spark)
Web UI	No	No (CLI/REST only)	Yes (built-in)
Kubernetes	Helm charts available	Manual setup	Built-in (Hopsworks K8s)
Docker Support	Yes (SDK only)	Yes (full server)	Yes (limited local dev)
Feature Monitoring	External integration	External integration	Built-in
Model Registry	External (MLflow, etc.)	External	Built-in
Multi-tenancy	No	Limited	Yes (projects + RBAC)
Point-in-Time	Yes	Yes	Yes
Best For	Teams wanting flexibility	Teams with existing data infra	Teams wanting an all-in-one platform

Deployment Comparison

Feast on Kubernetes (Production)

Feast is designed for cloud-native deployment. The recommended production setup uses Helm:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
# Add the Feast Helm repository
helm repo add feast https://feast-dev.github.io/feast/charts
helm repo update

# Create a namespace
kubectl create namespace feast

# Install Feast Core
helm install feast-core feast/feast-core \
  --namespace feast \
  --set registry.type=s3 \
  --set registry.bucket=my-feast-registry \
  --set online.type=redis \
  --set online.host=redis-master.default.svc.cluster.local

# Install Feast Online Serving
helm install feast-online feast/feast-online \
  --namespace feast \
  --set redis.host=redis-master.default.svc.cluster.local

Featureform with Docker Compose (Development)

Featureform is the easiest to get running locally:

1
2
3
4
docker compose up -d

# Verify the API is running
curl http://localhost:8080/api/health

Hopsworks on a VM (All-in-One)

Hopsworks ships as a bundled installer:

1
2
3
4
5
# Minimum requirements: 4 vCPUs, 16 GB RAM, 50 GB disk
sudo ./install.sh --action install

# After installation, access the UI
# http://<server-ip>:8080

Feature Engineering Workflow Comparison

Here is how the end-to-end workflow differs across the three platforms:

Feast Workflow

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# 1. Define features in Python
feast apply

# 2. Materialize features to online store
feast materialize 2025-01-01T00:00:00 2026-04-19T00:00:00

# 3. Retrieve features in Python for training
from feast import FeatureStore
store = FeatureStore(repo_path=".")
training_df = store.get_historical_features(
    features=["driver_stats:conv_rate", "driver_stats:avg_daily_trips"],
    entity_df=entity_df,
).to_df()

# 4. Train model and save to external registry
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier().fit(X, y)
# Save to MLflow (see our ML experiment tracking guide for details)

Featureform Workflow

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# 1. Connect and define sources via decorators
# 2. Create training sets and features via decorators
ff.apply()  # Registers everything with the orchestrator

# 3. Retrieve features at inference time
import featureform as ff
client = ff.Client(host="localhost:6565")
features = client.get_features(
    feature_names=["user_avg_spend.v1"],
    entity_rows=[{"user_id": "u123"}],
)

# 4. Get training data
training_data = client.get_training_set(
    "fraud_detection_training",
    resource_type="TrainingSet",
)

Hopsworks Workflow

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
import hopsworks

# 1. Connect to project
project = hopsworks.login()
fs = project.get_feature_store()

# 2. Create and populate feature groups
fg = fs.create_feature_group(name="transactions", version=1)
fg.insert(transaction_df)

# 3. Create feature view and get training data
fv = fs.get_feature_view("fraud_detection", version=1)
X_train, y_train = fv.get_training_data()

# 4. Train and register model in built-in registry
mr = project.get_model_registry()
model = mr.python.create_model(name="fraud_model", version=1)
model.save(model_binary)

Choosing the Right Feature Store

Choose Feast if:

You want the most mature and actively developed open source feature store
Your team is Python-centric and uses pandas/scikit-learn/TensorFlow
You need maximum flexibility in backend choices (multiple offline/online store combinations)
You already have a Kubernetes cluster and are comfortable with Helm
You want LF AI & Data governance and long-term project sustainability

Choose Featureform if:

You want to avoid data duplication and keep features in existing databases
You prefer SQL-based feature definitions over Python SDKs
You need a lightweight, single-binary deployment (no Kubernetes required)
Your data infrastructure already includes PostgreSQL, Snowflake, or BigQuery
You are a smaller team that values simplicity over ecosystem breadth

Choose Hopsworks if:

You want a complete ML platform (feature store + model registry + serving + UI)
You need multi-tenancy and role-based access control
You value a web-based management interface for non-technical team members
You are building on Apache Spark and want deep Hops ecosystem integration
You need built-in feature monitoring and model deployment capabilities

For teams building a comprehensive MLOps stack, consider how a feature store integrates with other components. Our guides on self-hosted data orchestration with Apache Airflow, Prefect, and Dagster and ML experiment tracking with MLflow, ClearML, and Aim cover the adjacent pieces of the pipeline.

FAQ

What is a feature store and why do I need one?

A feature store is a centralized system for managing machine learning features. It stores both historical features (for training) and real-time features (for inference), ensuring consistency between your training and production environments. Without a feature store, teams often rebuild feature computation logic for every model, leading to training-serving skew, duplicated effort, and inconsistent predictions across models.

Can I use a feature store without Kubernetes?

Yes. Featureform runs as a single binary and works with Docker Compose — no Kubernetes required. Feast can run locally using the Python SDK for development and testing, though production deployments typically use Helm on Kubernetes. Hopsworks provides a bare-metal installer that works on any Linux VM.

Which feature store has the best community support?

Feast has the largest community by a significant margin, with nearly 7,000 GitHub stars, regular releases, and backing from the LF AI & Data foundation. It has the most documentation, tutorials, and third-party integrations. Featureform and Hopsworks have smaller but active communities.

Do I need a feature store for batch (offline) ML models?

Feature stores are most valuable when you have real-time inference needs, multiple models sharing features, or a large team where feature discovery and reuse matters. For a single batch model with a simple feature set, a feature store may be overkill. But as your ML footprint grows — more models, more teams, more features — the organizational benefits compound quickly.

How does a feature store prevent data leakage?

Feature stores enforce point-in-time correctness: when retrieving historical features for training, they return only the feature values that were available at the time of each training event. This prevents future data from leaking into training features, which would artificially inflate model performance and cause failures in production.

Can I migrate from one feature store to another?

Migration between feature stores requires re-defining feature schemas and re-materializing data. Feast and Featureform use different abstraction layers, so there is no direct migration path. Hopsworks has its own internal format. Plan for migration effort proportional to your feature count — expect to rebuild feature definitions and re-run historical feature computation.

Why Self-Host a Feature Store?

What Is a Feature Store?

Feast: The Open Source Feature Store Pioneer

Architecture

Installation and Setup

Configuration

Defining Features

Strengths

Limitations

Featureform: The Virtual Feature Store

Architecture

Installation

Defining Features

Strengths

Limitations

Hopsworks: The Full-Stack AI Platform

Architecture

Installation

Using the Feature Store

Strengths

Limitations

Comparison Table

Deployment Comparison

Feast on Kubernetes (Production)

Featureform with Docker Compose (Development)

Hopsworks on a VM (All-in-One)

Feature Engineering Workflow Comparison

Feast Workflow

Featureform Workflow

Hopsworks Workflow

Choosing the Right Feature Store

Choose Feast if:

Choose Featureform if:

Choose Hopsworks if:

FAQ

What is a feature store and why do I need one?

Can I use a feature store without Kubernetes?

Which feature store has the best community support?

Do I need a feature store for batch (offline) ML models?

How does a feature store prevent data leakage?

Can I migrate from one feature store to another?

Related Posts

Best Self-Hosted ML Experiment Tracking Tools in 2026: MLflow vs ClearML vs Aim

AIDE vs OSSEC vs Tripwire: Self-Hosted File Integrity Monitoring Guide 2026

Audiobookshelf vs Kavita vs Calibre-Web: Best Self-Hosted Book Server 2026