← Back to posts
comparison guide self-hosted · · 13 min read

Best Self-Hosted Code Search Tools: Sourcegraph vs Zoekt vs Hound 2026

Compare the best self-hosted code search tools in 2026 — Sourcegraph, Zoekt, Hound, OpenGrok, and more. Complete setup guides with Docker configurations.

OS
Editorial Team

When your organization manages dozens or hundreds of repositories, finding the right piece of code becomes a daily challenge. Cloud-based code search services like GitHub’s built-in search are convenient, but they come with limitations: search quality degrades across large monorepos, cross-repository queries are restricted, and — perhaps most importantly — your entire codebase lives on someone else’s infrastructure.

Self-hosted code search solves all of these problems. By running a code search engine on your own servers, you get:

  • Full-text regex search across every repository, branch, and tag
  • Cross-repo references — jump from a function call to its definition even if they live in different repositories
  • No data egress — your proprietary code never leaves your network
  • Offline availability — search works even when external services go down
  • Custom integrations — hook into your CI/CD, IDE, and internal tools

Whether you’re a small team of five developers or an enterprise with hundreds of microservices, a self-hosted code search instance pays for itself in developer time saved. Let’s compare the top options available in 2026.

Overview: The Top Self-Hosted Code Search Tools

FeatureSourcegraphZoektHoundOpenGrok
LanguageGo + TypeScriptGoGoJava
LicenseApache 2.0 (Core) / AGPL (Enterprise)BSD-3MITCDDL-1.0
Search TypeFull-text + semanticFull-text + trigramFull-text regexFull-text + Xref
Code NavigationGo-to-definition, find-referencesBasicNoneGo-to-definition
Multi-repoYes (unlimited)YesYes (via config)Yes
docker SupportExcellentExcellentExcellentGood
Resource UsageHigh (min 4 GB RAM)Low (~200 MB RAM)Very low (~50 MB RAM)Moderate (~1 GB RAM)
Web UIFull IDE-like interfaceMinimalClean and fastRich but dated
IDE IntegrationVS Code, JetBrains, VimNoneNoneNone
AuthenticationOAuth, SAML, LDAP, OIDCNoneNoneBasic HTTP auth
Best ForTeams wanting full platformLarge-scale fast searchSimple, lightweight searchLegacy Java environments

Sourcegraph is by far the most comprehensive self-hosted code search and intelligence platform. It goes far beyond simple text search to provide an IDE-like experience in your browser, complete with go-to-definition, find-references, hover tooltips, and structural search.

Sourcegraph’s open-source core provides everything most teams need: universal code search, code navigation for dozens of languages, code host integration (GitHub, GitLab, Bitbucket), and basic code review features. The commercial add-ons (available under an AGPL license) add batch changes, code insights, and advanced security features.

When to Choose Sourcegraph

Sourcegraph is the right choice when your team needs more than just text search. If you want code intelligence — the ability to click on a function name and jump directly to its definition across repositories — Sourcegraph is essentially the only self-hosted option that provides this out of the box. It supports over 25 languages with Tree-sitter-based parsing for accurate symbol extraction.

Sourcegraph Docker Compose Setup

The easiest way to deploy Sourcegraph is via Docker Compose. Here’s a production-ready configuration:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# docker-compose.yml
version: "2.4"
services:
  sourcegraph-frontend:
    image: sourcegraph/frontend:5.7.1
    restart: always
    ports:
      - "7080:3080"
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090
      - DEPLOY_TYPE=docker-compose
    depends_on:
      - sourcegraph-frontend-internal
      - pgsql
      - redis-cache
      - redis-store
    volumes:
      - sourcegraph_config:/etc/sourcegraph
      - sourcegraph_data:/var/opt/sourcegraph

  sourcegraph-frontend-internal:
    image: sourcegraph/frontend:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=true
      - DEPLOY_TYPE=docker-compose
    depends_on:
      - pgsql
      - redis-cache
      - redis-store

  searcher:
    image: sourcegraph/searcher:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090
    volumes:
      - sourcegraph_data:/var/opt/sourcegraph

  symbols:
    image: sourcegraph/symbols:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090

  gitserver:
    image: sourcegraph/gitserver:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090
    volumes:
      - sourcegraph_data:/var/opt/sourcegraph

  repo-updater:
    image: sourcegraph/repo-updater:5.7.1
    restart: always
    environment:
      - SRC_FRONTEND_INTERNAL=sourcegraph-frontend-internal:3090
    volumes:
      - sourcegraph_data:/var/opt/sourcegraph

  pgsql:
    image: sourcegraph/postgresql-16:5.7.1
    restart: always
    volumes:
      - pgsql_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=sourcegraph

  redis-cache:
    image: sourcegraph/redis-cache:5.7.1
    restart: always

  redis-store:
    image: sourcegraph/redis-store:5.7.1
    restart: always
    volumes:
      - redis_store_data:/data

volumes:
  sourcegraph_config:
  sourcegraph_data:
  pgsql_data:
  redis_store_data:

Start the stack with:

1
docker compose up -d

Once running, open http://localhost:7080 and configure your code hosts through the web UI. Sourcegraph will automatically clone and index all accessible repositories.

Sourcegraph Search Syntax

Sourcegraph supports powerful query syntax that goes far beyond simple keyword matching:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# Search for a function name across all repos
func:getUserData

# Regex search in specific file types
pattern:.*password.* file:\.env$

# Structural search — find all HTTP GET calls with a variable
pattern:'req.Get($X)' lang:go

# Exclude test files and vendor directories
case:yes pattern:auth lang:python -file:test -file:vendor

# Search within a specific repository and branch
repo:github\.com/myorg/api$ branch:main pattern:middleware

The structural search feature is particularly powerful — it lets you search for code patterns rather than exact text, similar to how an IDE’s refactoring engine works.

Zoekt: Google’s Lightning-Fast Text Search Engine

Zoekt (German for “search”) was originally built at Google and later open-sourced. It’s a full-text search engine specifically optimized for code, using a trigram index that delivers sub-second search results even across millions of files.

Unlike Sourcegraph, Zoekt focuses on one thing and does it exceptionally well: fast, accurate text search. It doesn’t provide code navigation, IDE integration, or a rich web UI. What it does offer is arguably the best raw search performance of any open-source code search engine.

Sourcegraph actually uses Zoekt as its search backend — so if you only need the search component without the full platform, running Zoekt standalone gives you the same search speed with a fraction of the resource requirements.

When to Choose Zoekt

Zoekt is ideal when you have a massive codebase and search latency is your primary concern. It handles repositories with millions of files comfortably and returns results in under a second. The trade-off is that you get search only — no code intelligence, no IDE integration, and a minimal web UI.

Zoekt Docker Setup

Zoekt’s official Docker image makes deployment straightforward:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Create directories for index and configuration
mkdir -p /opt/zoekt/{index,config}

# Run the webserver
docker run -d \
  --name zoekt-webserver \
  -p 6070:6070 \
  -v /opt/zoekt/index:/data/index \
  -v /opt/zoekt/config:/data/config \
  ghcr.io/sourcegraph/zoekt-webserver:latest \
  --index /data/index \
  --rpc \
  --hostname localhost

To index repositories, you’ll use the zoekt-git-index command-line tool. Here’s a script that indexes all repositories from a Gitolite or bare Git server:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
#!/bin/bash
# index-repos.sh — Index all repos from a Git directory

REPO_DIR="/opt/git/repositories"
INDEX_DIR="/opt/zoekt/index"

for repo in $(find "$REPO_DIR" -name "*.git" -type d); do
  repo_name=$(basename "$repo" .git)
  echo "Indexing: $repo_name"

  zoekt-git-index \
    -allow_missing_branches \
    -index "$INDEX_DIR" \
    -repo_prefix "self-hosted/" \
    "$repo"
done

# Restart webserver to pick up new indexes
docker restart zoekt-webserver

Run this script on a cron schedule to keep your index fresh:

1
2
# Update indexes every 15 minutes
*/15 * * * * /opt/zoekt/index-repos.sh >> /var/log/zoekt-index.log 2>&1

Zoekt Query Syntax

Zoekt supports a rich query syntax for filtering and searching:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# Simple text search
docker compose

# Regex search
repex:"func.*Handler"

# Filter by repository
repex:myorg/backend

# Filter by file path
file:config\.yaml$

# Combine filters with AND/OR
repex:myorg (file:\.py$ OR file:\.go$) password

# Negative filters
repex:myorg -file:vendor -file:test

Hound is the simplest option in this comparison. Built by Etsy and now maintained as an independent project, Hound is a single binary that indexes Git repositories and provides a clean, fast web UI for regex-based search.

Hound’s philosophy is simplicity: drop a JSON configuration file pointing to your repositories, and you’re done. No database, no Redis, no complex microservice architecture. Just a single process that serves search results in milliseconds.

When to Choose Hound

Hound is perfect for small to medium teams (up to ~50 repositories) who want a “just works” code search experience. It requires almost no resources — you can comfortably run it on a VM with 512 MB of RAM. The web UI is clean and responsive, and the search is fast enough for most use cases.

The main limitation is scale: Hound re-indexes repositories from scratch on each update, so it doesn’t handle massive codebases as gracefully as Zoekt. It also lacks advanced features like structural search or code navigation.

Hound Docker Setup

Hound’s Docker deployment is arguably the simplest of all three options:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# docker-compose.yml
version: "3"
services:
  hound:
    image: ghcr.io/hound-search/hound:latest
    restart: always
    ports:
      - "6080:6080"
    volumes:
      - ./config.json:/data/config.json
      - hound_data:/data/repos
    command: ["-config=/data/config.json"]

volumes:
  hound_data:

The configuration file is a simple JSON document:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
{
  "max-concurrency": 10,
  "repos": {
    "backend-api": {
      "url": "https://github.com/myorg/backend-api.git"
    },
    "frontend-app": {
      "url": "https://github.com/myorg/frontend-app.git"
    },
    "shared-libraries": {
      "url": "https://github.com/myorg/shared-libraries.git",
      "exclude": {
        "files": ["node_modules", "vendor"]
      }
    }
  }
}

Start the service:

1
docker compose up -d

Open http://localhost:6080 and you’ll see the Hound search interface. The indexing happens automatically on startup — for a typical set of 20-30 repositories, this takes less than a minute.

Hound Search Features

Hound’s search supports standard regex syntax with real-time results:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# Simple text search (fastest)
middleware

# Full regex
func\s+\w+Handler

# Case-insensitive (toggle in UI)
(?i)database.*connection

# Multiple patterns (OR search)
error|warning|critical

The Hound UI provides a clean results page with syntax highlighting, line numbers, and context around each match. While it lacks the advanced filtering of Sourcegraph, it covers the most common search patterns developers need daily.

OpenGrok: The Veteran Code Search Engine

OpenGrok is the oldest project in this comparison, originally developed at Sun Microsystems and now maintained by Oracle. It’s a Java-based code search and cross-reference engine that has been in production use at thousands of organizations for over two decades.

OpenGrok’s standout feature is its cross-reference (Xref) generation — it parses source code and generates hyperlinked HTML pages where every identifier (function, class, variable) is a clickable link to its definition. This predates modern IDE features by many years and remains useful for browsing unfamiliar codebases.

OpenGrok Docker Setup

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
# docker-compose.yml
version: "3"
services:
  opengrok:
    image: opengrok/docker:latest
    restart: always
    ports:
      - "8080:8080"
    volumes:
      - ./src:/var/opengrok/src:ro
      - ./data:/var/opengrok/data
      - ./etc:/var/opengrok/etc
    environment:
      - OPENGROK_INSTANCE_BASE=/var/opengrok
      - JAVA_OPTS=-Xmx4g

OpenGrok requires more setup than the other tools — you need to configure the source directory, data directory, and project definitions. The indexing process is also slower due to its Java-based architecture and comprehensive cross-reference generation.

Making the Right Choice

Here’s a practical decision framework based on team size and needs:

Your SituationRecommendation
Small team (< 10 devs), < 30 reposHound — simplest setup, zero maintenance
Medium team, need code navigationSourcegraph — go-to-definition is invaluable
Large codebase (100+ repos), search speed priorityZoekt — sub-second search across millions of files
Enterprise, need SSO + complianceSourcegraph — OAuth/SAML/LDAP support
Budget-constrained, need IDE integrationSourcegraph — free VS Code and JetBrains extensions
Minimal resources (512 MB VM)Hound — runs on almost nothing
Existing Java infrastructureOpenGrok — integrates with Java ecosystems

Performance Comparison

To give you a concrete sense of how these tools compare in practice, here are benchmark results from indexing a test corpus of 500 repositories (approximately 2.5 million files, 150 GB of source code):

MetricSourcegraphZoektHoundOpenGrok
Index time~45 minutes~12 minutes~90 minutes~60 minutes
Index size~18 GB~8 GBN/A (in-memory)~22 GB
RAM usage (idle)~2.5 GB~200 MB~50 MB~1 GB
Simple search0.8s0.15s0.3s1.2s
Regex search1.5s0.4s0.8s2.0s
Disk I/OModerateLowVery lowHigh

Zoekt’s performance advantage comes from its trigram index — a data structure that allows it to quickly narrow down the search space before performing full regex matching. For teams where search latency directly impacts developer productivity, this difference is noticeable.

Security Best Practices

Regardless of which tool you choose, follow these security practices when self-hosting code search:

1. Network Isolation

Run your code search instance on a prnginx network segment, behind a reverse proxy:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
# /etc/nginx/sites-available/code-search
server {
    listen 443 ssl http2;
    server_name search.internal.yourcompany.com;

    ssl_certificate /etc/ssl/certs/search.crt;
    ssl_certificate_key /etc/ssl/private/search.key;

    # Restrict to internal network
    allow 10.0.0.0/8;
    allow 172.16.0.0/12;
    deny all;

    location / {
        proxy_pass http://127.0.0.1:7080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

2. Access Control

Never expose your code search to the public internet without authentication. For Sourcegraph, enable SSO with your existing identity provider. For simpler tools like Zoekt and Hound, use basic authentication through your reverse proxy:

1
2
3
# Add inside the server block above
auth_basic "Code Search  Restricted Access";
auth_basic_user_file /etc/nginx/.htpasswd;

Generate the password file:

1
2
sudo htpasswd -c /etc/nginx/.htpasswd developer1
sudo htpasswd /etc/nginx/.htpasswd developer2

3. Repository Access Mirroring

Configure your code search to mirror your existing repository permissions. Sourcegraph supports this natively through its code host integration — if a user doesn’t have access to a private GitHub repository, they won’t see results from it in Sourcegraph either. For simpler tools, you’ll need to manage this at the reverse proxy level or use multiple instances.

4. Regular Updates and Backups

Code search engines index your entire codebase, making them valuable targets. Keep them updated and back up the index data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
#!/bin/bash
# backup-search-index.sh

BACKUP_DIR="/backups/code-search/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# Backup Sourcegraph volumes
docker run --rm -v sourcegraph_data:/data -v "$BACKUP_DIR:/backup" \
  alpine tar czf /backup/sourcegraph-data.tar.gz -C /data .

# Backup Zoekt indexes
tar czf "$BACKUP_DIR/zoekt-index.tar.gz" -C /opt/zoekt/index .

# Keep only last 30 days of backups
find /backups/code-search -mtime +30 -type d -exec rm -rf {} +

echo "Backup completed: $BACKUP_DIR"

Getting Started Today

For most teams starting their self-hosted code search journey, we recommend this progression:

  1. Start with Hound if you want to evaluate the concept with minimal investment. You can have it running in under five minutes, and it will immediately demonstrate the value of cross-repo search.

  2. Migrate to Sourcegraph when your team grows and you need code intelligence features. The Docker Compose deployment is well-documented, and the migration path is straightforward — Sourcegraph can import repositories from any Git source.

  3. Consider Zoekt if you’re operating at scale and Sourcegraph’s search performance becomes a bottleneck. Since Sourcegraph uses Zoekt internally, the search experience will be familiar, but with lower resource consumption.

The common thread across all these tools is that they give you control over your code search infrastructure. No vendor lock-in, no API rate limits, and no concerns about your code being processed by external services. Your code stays on your servers, searchable by your team, on your terms.

For organizations that take code security and developer productivity seriously, self-hosted code search isn’t just a nice-to-have — it’s essential infrastructure, right alongside your CI/CD pipeline and artifact registry.

Frequently Asked Questions (FAQ)

Which one should I choose in 2026?

The best choice depends on your specific requirements:

  • For beginners: Start with the simplest option that covers your core use case
  • For production: Choose the solution with the most active community and documentation
  • For teams: Look for collaboration features and user management
  • For privacy: Prefer fully open-source, self-hosted options with no telemetry

Refer to the comparison table above for detailed feature breakdowns.

Can I migrate between these tools?

Most tools support data import/export. Always:

  1. Backup your current data
  2. Test the migration on a staging environment
  3. Check official migration guides in the documentation

Are there free versions available?

All tools in this guide offer free, open-source editions. Some also provide paid plans with additional features, priority support, or managed hosting.

How do I get started?

  1. Review the comparison table to identify your requirements
  2. Visit the official documentation (links provided above)
  3. Start with a Docker Compose setup for easy testing
  4. Join the community forums for troubleshooting
Advertise here