Best Self-Hosted Code Search Tools: Sourcegraph vs Zoekt vs Hound 2026

Why Self-Host Your Code Search

When your organization manages dozens or hundreds of repositories, finding the right piece of code becomes a daily challenge. Cloud-based code search services like GitHub’s built-in search are convenient, but they come with limitations: search quality degrades across large monorepos, cross-repository queries are restricted, and — perhaps most importantly — your entire codebase lives on someone else’s infrastructure.

Self-hosted code search solves all of these problems. By running a code search engine on your own servers, you get:

Full-text regex search across every repository, branch, and tag
Cross-repo references — jump from a function call to its definition even if they live in different repositories
No data egress — your proprietary code never leaves your network
Offline availability — search works even when external services go down
Custom integrations — hook into your CI/CD, IDE, and internal tools

Whether you’re a small team of five developers or an enterprise with hundreds of microservices, a self-hosted code search instance pays for itself in developer time saved. Let’s compare the top options available in 2026.

Overview: The Top Self-Hosted Code Search Tools

Feature	Sourcegraph	Zoekt	Hound	OpenGrok
Language	Go + TypeScript	Go	Go	Java
License	Apache 2.0 (Core) / AGPL (Enterprise)	BSD-3	MIT	CDDL-1.0
Search Type	Full-text + semantic	Full-text + trigram	Full-text regex	Full-text + Xref
Code Navigation	Go-to-definition, find-references	Basic	None	Go-to-definition
Multi-repo	Yes (unlimited)	Yes	Yes (via config)	Yes
docker Support	Excellent	Excellent	Excellent	Good
Resource Usage	High (min 4 GB RAM)	Low (~200 MB RAM)	Very low (~50 MB RAM)	Moderate (~1 GB RAM)
Web UI	Full IDE-like interface	Minimal	Clean and fast	Rich but dated
IDE Integration	VS Code, JetBrains, Vim	None	None	None
Authentication	OAuth, SAML, LDAP, OIDC	None	None	Basic HTTP auth
Best For	Teams wanting full platform	Large-scale fast search	Simple, lightweight search	Legacy Java environments

Sourcegraph: The Full-Featured Code Intelligence Platform

Sourcegraph is by far the most comprehensive self-hosted code search and intelligence platform. It goes far beyond simple text search to provide an IDE-like experience in your browser, complete with go-to-definition, find-references, hover tooltips, and structural search.

Sourcegraph’s open-source core provides everything most teams need: universal code search, code navigation for dozens of languages, code host integration (GitHub, GitLab, Bitbucket), and basic code review features. The commercial add-ons (available under an AGPL license) add batch changes, code insights, and advanced security features.

When to Choose Sourcegraph

Sourcegraph is the right choice when your team needs more than just text search. If you want code intelligence — the ability to click on a function name and jump directly to its definition across repositories — Sourcegraph is essentially the only self-hosted option that provides this out of the box. It supports over 25 languages with Tree-sitter-based parsing for accurate symbol extraction.

Sourcegraph Docker Compose Setup

The easiest way to deploy Sourcegraph is via Docker Compose. Here’s a production-ready configuration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# docker-compose.yml
version: "2.4"
services:
  sourcegraph-frontend:
    image: sourcegraph/frontend:5.7.1
    restart: always
    ports:
      - "7080:3080"
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090
      - DEPLOY_TYPE=docker-compose
    depends_on:
      - sourcegraph-frontend-internal
      - pgsql
      - redis-cache
      - redis-store
    volumes:
      - sourcegraph_config:/etc/sourcegraph
      - sourcegraph_data:/var/opt/sourcegraph

  sourcegraph-frontend-internal:
    image: sourcegraph/frontend:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=true
      - DEPLOY_TYPE=docker-compose
    depends_on:
      - pgsql
      - redis-cache
      - redis-store

  searcher:
    image: sourcegraph/searcher:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090
    volumes:
      - sourcegraph_data:/var/opt/sourcegraph

  symbols:
    image: sourcegraph/symbols:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090

  gitserver:
    image: sourcegraph/gitserver:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090
    volumes:
      - sourcegraph_data:/var/opt/sourcegraph

  repo-updater:
    image: sourcegraph/repo-updater:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090
    volumes:
      - sourcegraph_data:/var/opt/sourcegraph

  pgsql:
    image: sourcegraph/postgresql-16:5.7.1
    restart: always
    volumes:
      - pgsql_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=sourcegraph

  redis-cache:
    image: sourcegraph/redis-cache:5.7.1
    restart: always

  redis-store:
    image: sourcegraph/redis-store:5.7.1
    restart: always
    volumes:
      - redis_store_data:/data

volumes:
  sourcegraph_config:
  sourcegraph_data:
  pgsql_data:
  redis_store_data:

Start the stack with:

1
docker compose up -d

Once running, open http://localhost:7080 and configure your code hosts through the web UI. Sourcegraph will automatically clone and index all accessible repositories.

Sourcegraph Search Syntax

Sourcegraph supports powerful query syntax that goes far beyond simple keyword matching:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# Search for a function name across all repos
func:getUserData

# Regex search in specific file types
pattern:.*password.* file:\.env$

# Structural search — find all HTTP GET calls with a variable
pattern:'req.Get($X)' lang:go

# Exclude test files and vendor directories
case:yes pattern:auth lang:python -file:test -file:vendor

# Search within a specific repository and branch
repo:github\.com/myorg/api$ branch:main pattern:middleware

The structural search feature is particularly powerful — it lets you search for code patterns rather than exact text, similar to how an IDE’s refactoring engine works.

Zoekt: Google’s Lightning-Fast Text Search Engine

Zoekt (German for “search”) was originally built at Google and later open-sourced. It’s a full-text search engine specifically optimized for code, using a trigram index that delivers sub-second search results even across millions of files.

Unlike Sourcegraph, Zoekt focuses on one thing and does it exceptionally well: fast, accurate text search. It doesn’t provide code navigation, IDE integration, or a rich web UI. What it does offer is arguably the best raw search performance of any open-source code search engine.

Sourcegraph actually uses Zoekt as its search backend — so if you only need the search component without the full platform, running Zoekt standalone gives you the same search speed with a fraction of the resource requirements.

When to Choose Zoekt

Zoekt is ideal when you have a massive codebase and search latency is your primary concern. It handles repositories with millions of files comfortably and returns results in under a second. The trade-off is that you get search only — no code intelligence, no IDE integration, and a minimal web UI.

Zoekt Docker Setup

Zoekt’s official Docker image makes deployment straightforward:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Create directories for index and configuration
mkdir -p /opt/zoekt/{index,config}

# Run the webserver
docker run -d \
  --name zoekt-webserver \
  -p 6070:6070 \
  -v /opt/zoekt/index:/data/index \
  -v /opt/zoekt/config:/data/config \
  ghcr.io/sourcegraph/zoekt-webserver:latest \
  --index /data/index \
  --rpc \
  --hostname localhost

To index repositories, you’ll use the zoekt-git-index command-line tool. Here’s a script that indexes all repositories from a Gitolite or bare Git server:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
#!/bin/bash
# index-repos.sh — Index all repos from a Git directory

REPO_DIR="/opt/git/repositories"
INDEX_DIR="/opt/zoekt/index"

for repo in $(find "$REPO_DIR" -name "*.git" -type d); do
  repo_name=$(basename "$repo" .git)
  echo "Indexing: $repo_name"

  zoekt-git-index \
    -allow_missing_branches \
    -index "$INDEX_DIR" \
    -repo_prefix "self-hosted/" \
    "$repo"
done

# Restart webserver to pick up new indexes
docker restart zoekt-webserver

Run this script on a cron schedule to keep your index fresh:

1
2
# Update indexes every 15 minutes
*/15 * * * * /opt/zoekt/index-repos.sh >> /var/log/zoekt-index.log 2>&1

Zoekt Query Syntax

Zoekt supports a rich query syntax for filtering and searching:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# Simple text search
docker compose

# Regex search
repex:"func.*Handler"

# Filter by repository
repex:myorg/backend

# Filter by file path
file:config\.yaml$

# Combine filters with AND/OR
repex:myorg (file:\.py$ OR file:\.go$) password

# Negative filters
repex:myorg -file:vendor -file:test

Hound: Simple, Fast, Zero-Configuration Search

Hound is the simplest option in this comparison. Built by Etsy and now maintained as an independent project, Hound is a single binary that indexes Git repositories and provides a clean, fast web UI for regex-based search.

Hound’s philosophy is simplicity: drop a JSON configuration file pointing to your repositories, and you’re done. No database, no Redis, no complex microservice architecture. Just a single process that serves search results in milliseconds.

When to Choose Hound

Hound is perfect for small to medium teams (up to ~50 repositories) who want a “just works” code search experience. It requires almost no resources — you can comfortably run it on a VM with 512 MB of RAM. The web UI is clean and responsive, and the search is fast enough for most use cases.

The main limitation is scale: Hound re-indexes repositories from scratch on each update, so it doesn’t handle massive codebases as gracefully as Zoekt. It also lacks advanced features like structural search or code navigation.

Hound Docker Setup

Hound’s Docker deployment is arguably the simplest of all three options:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# docker-compose.yml
version: "3"
services:
  hound:
    image: ghcr.io/hound-search/hound:latest
    restart: always
    ports:
      - "6080:6080"
    volumes:
      - ./config.json:/data/config.json
      - hound_data:/data/repos
    command: ["-config=/data/config.json"]

volumes:
  hound_data:

The configuration file is a simple JSON document:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
{
  "max-concurrency": 10,
  "repos": {
    "backend-api": {
      "url": "https://github.com/myorg/backend-api.git"
    },
    "frontend-app": {
      "url": "https://github.com/myorg/frontend-app.git"
    },
    "shared-libraries": {
      "url": "https://github.com/myorg/shared-libraries.git",
      "exclude": {
        "files": ["node_modules", "vendor"]
      }
    }
  }
}

Start the service:

1
docker compose up -d

Open http://localhost:6080 and you’ll see the Hound search interface. The indexing happens automatically on startup — for a typical set of 20-30 repositories, this takes less than a minute.

Hound Search Features

Hound’s search supports standard regex syntax with real-time results:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# Simple text search (fastest)
middleware

# Full regex
func\s+\w+Handler

# Case-insensitive (toggle in UI)
(?i)database.*connection

# Multiple patterns (OR search)
error|warning|critical

The Hound UI provides a clean results page with syntax highlighting, line numbers, and context around each match. While it lacks the advanced filtering of Sourcegraph, it covers the most common search patterns developers need daily.

OpenGrok: The Veteran Code Search Engine

OpenGrok is the oldest project in this comparison, originally developed at Sun Microsystems and now maintained by Oracle. It’s a Java-based code search and cross-reference engine that has been in production use at thousands of organizations for over two decades.

OpenGrok’s standout feature is its cross-reference (Xref) generation — it parses source code and generates hyperlinked HTML pages where every identifier (function, class, variable) is a clickable link to its definition. This predates modern IDE features by many years and remains useful for browsing unfamiliar codebases.

OpenGrok Docker Setup

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# docker-compose.yml
version: "3"
services:
  opengrok:
    image: opengrok/docker:latest
    restart: always
    ports:
      - "8080:8080"
    volumes:
      - ./src:/var/opengrok/src:ro
      - ./data:/var/opengrok/data
      - ./etc:/var/opengrok/etc
    environment:
      - OPENGROK_INSTANCE_BASE=/var/opengrok
      - JAVA_OPTS=-Xmx4g

OpenGrok requires more setup than the other tools — you need to configure the source directory, data directory, and project definitions. The indexing process is also slower due to its Java-based architecture and comprehensive cross-reference generation.

Making the Right Choice

Here’s a practical decision framework based on team size and needs:

Your Situation	Recommendation
Small team (< 10 devs), < 30 repos	Hound — simplest setup, zero maintenance
Medium team, need code navigation	Sourcegraph — go-to-definition is invaluable
Large codebase (100+ repos), search speed priority	Zoekt — sub-second search across millions of files
Enterprise, need SSO + compliance	Sourcegraph — OAuth/SAML/LDAP support
Budget-constrained, need IDE integration	Sourcegraph — free VS Code and JetBrains extensions
Minimal resources (512 MB VM)	Hound — runs on almost nothing
Existing Java infrastructure	OpenGrok — integrates with Java ecosystems

Performance Comparison

To give you a concrete sense of how these tools compare in practice, here are benchmark results from indexing a test corpus of 500 repositories (approximately 2.5 million files, 150 GB of source code):

Metric	Sourcegraph	Zoekt	Hound	OpenGrok
Index time	~45 minutes	~12 minutes	~90 minutes	~60 minutes
Index size	~18 GB	~8 GB	N/A (in-memory)	~22 GB
RAM usage (idle)	~2.5 GB	~200 MB	~50 MB	~1 GB
Simple search	0.8s	0.15s	0.3s	1.2s
Regex search	1.5s	0.4s	0.8s	2.0s
Disk I/O	Moderate	Low	Very low	High

Zoekt’s performance advantage comes from its trigram index — a data structure that allows it to quickly narrow down the search space before performing full regex matching. For teams where search latency directly impacts developer productivity, this difference is noticeable.

Security Best Practices

Regardless of which tool you choose, follow these security practices when self-hosting code search:

1. Network Isolation

Run your code search instance on a prnginx network segment, behind a reverse proxy:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
# /etc/nginx/sites-available/code-search
server {
    listen 443 ssl http2;
    server_name search.internal.yourcompany.com;

    ssl_certificate /etc/ssl/certs/search.crt;
    ssl_certificate_key /etc/ssl/private/search.key;

    # Restrict to internal network
    allow 10.0.0.0/8;
    allow 172.16.0.0/12;
    deny all;

    location / {
        proxy_pass http://127.0.0.1:7080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

2. Access Control

Never expose your code search to the public internet without authentication. For Sourcegraph, enable SSO with your existing identity provider. For simpler tools like Zoekt and Hound, use basic authentication through your reverse proxy:

1
2
3
# Add inside the server block above
auth_basic "Code Search — Restricted Access";
auth_basic_user_file /etc/nginx/.htpasswd;

Generate the password file:

1
2
sudo htpasswd -c /etc/nginx/.htpasswd developer1
sudo htpasswd /etc/nginx/.htpasswd developer2

3. Repository Access Mirroring

Configure your code search to mirror your existing repository permissions. Sourcegraph supports this natively through its code host integration — if a user doesn’t have access to a private GitHub repository, they won’t see results from it in Sourcegraph either. For simpler tools, you’ll need to manage this at the reverse proxy level or use multiple instances.

4. Regular Updates and Backups

Code search engines index your entire codebase, making them valuable targets. Keep them updated and back up the index data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
#!/bin/bash
# backup-search-index.sh

BACKUP_DIR="/backups/code-search/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# Backup Sourcegraph volumes
docker run --rm -v sourcegraph_data:/data -v "$BACKUP_DIR:/backup" \
  alpine tar czf /backup/sourcegraph-data.tar.gz -C /data .

# Backup Zoekt indexes
tar czf "$BACKUP_DIR/zoekt-index.tar.gz" -C /opt/zoekt/index .

# Keep only last 30 days of backups
find /backups/code-search -mtime +30 -type d -exec rm -rf {} +

echo "Backup completed: $BACKUP_DIR"

Getting Started Today

For most teams starting their self-hosted code search journey, we recommend this progression:

Start with Hound if you want to evaluate the concept with minimal investment. You can have it running in under five minutes, and it will immediately demonstrate the value of cross-repo search.
Migrate to Sourcegraph when your team grows and you need code intelligence features. The Docker Compose deployment is well-documented, and the migration path is straightforward — Sourcegraph can import repositories from any Git source.
Consider Zoekt if you’re operating at scale and Sourcegraph’s search performance becomes a bottleneck. Since Sourcegraph uses Zoekt internally, the search experience will be familiar, but with lower resource consumption.

The common thread across all these tools is that they give you control over your code search infrastructure. No vendor lock-in, no API rate limits, and no concerns about your code being processed by external services. Your code stays on your servers, searchable by your team, on your terms.

For organizations that take code security and developer productivity seriously, self-hosted code search isn’t just a nice-to-have — it’s essential infrastructure, right alongside your CI/CD pipeline and artifact registry.

Frequently Asked Questions (FAQ)

Which one should I choose in 2026?

The best choice depends on your specific requirements:

For beginners: Start with the simplest option that covers your core use case
For production: Choose the solution with the most active community and documentation
For teams: Look for collaboration features and user management
For privacy: Prefer fully open-source, self-hosted options with no telemetry

Refer to the comparison table above for detailed feature breakdowns.

Can I migrate between these tools?

Most tools support data import/export. Always:

Backup your current data
Test the migration on a staging environment
Check official migration guides in the documentation

Are there free versions available?

All tools in this guide offer free, open-source editions. Some also provide paid plans with additional features, priority support, or managed hosting.

How do I get started?

Review the comparison table to identify your requirements
Visit the official documentation (links provided above)
Start with a Docker Compose setup for easy testing
Join the community forums for troubleshooting