Every household and small business drowns in paper: utility bills, bank statements, insurance policies, tax receipts, contracts, medical records. Commercial solutions like DocuWare or SharePoint charge recurring license fees and often keep your sensitive documents inside someone else’s infrastructure. Paperless-ngx takes a different approach. It is a free, open-source document management system that ingests your scanned documents, runs OCR on them, and indexes and organizes them, all on your own hardware and under your complete control.
In this guide, you will learn what Paperless-ngx is, why self-hosting your documents matters, how to deploy it with Docker, and how to configure it for production use. Whether you want to archive a decade of tax records or build a paperless office from scratch, this guide covers the essentials.
Why Self-Host Your Document Management
The average person receives over 200 pieces of mail per year. Most of it ends up in a filing cabinet or a shoebox. Digitizing those documents and storing them in Google Drive or Dropbox might feel like progress, but it introduces real risks:
- Privacy exposure — Cloud providers can scan stored files, for example for policy enforcement, and their terms of service can change over time. Financial statements, medical records, and contracts deserve better.
- Vendor lock-in — Migrate away from a cloud service and you discover that export is painful, search is limited, and metadata is lost.
- Subscription creep — Storage needs grow. What starts as a free tier becomes $10, then $20, then $100 per month as your archive expands.
- Limited OCR control — Consumer cloud storage gives you little say over whether and how text is extracted from scanned PDFs, so searching scanned documents is often unreliable.
- Compliance requirements — GDPR, HIPAA, and financial regulations can make it necessary, or at least simpler, to keep data on-premises or under your direct control.
Paperless-ngx addresses each of these problems. It runs on a Raspberry Pi or any spare computer, performs optical character recognition automatically, indexes every word for instant search, and stores everything in an open, portable format. You own the data, you own the infrastructure, and you own the search.
What Is Paperless-ngx
Paperless-ngx is the community-maintained successor to the original Paperless project and its fork, Paperless-ng. It is a Django-based web application that manages your digitized documents with these core capabilities:
- Automatic OCR — Every uploaded or scanned document is processed with Tesseract OCR, making all text searchable instantly.
- Smart tagging — Correspondents, tags, document types, and storage paths are automatically assigned using machine learning trained on your filing habits.
- Full-text search — Search across every word in every document, with highlighting and context snippets.
- Email consumption — Configure an email inbox; Paperless-ngx automatically downloads attachments and files them.
- Multi-format support — PDFs, images (PNG, JPG, TIFF), and plain text are handled natively; Office documents and email files are supported once the optional Apache Tika and Gotenberg services are enabled.
- REST API — Full API access enables integration with scanners, mobile apps, and automation workflows.
- Multi-user support — Role-based access control with per-document ownership and sharing.
- Workflows — Automate complex filing rules based on document content, source, or metadata.
Paperless-ngx is licensed under the GPL-3.0 and has an active community of contributors. The name builds on Paperless-ng (“next generation”): Paperless-ngx continues that fork under a team of maintainers and has since added modern tooling, performance improvements, and interface refinements.
Comparing Document Management Options
Before diving into installation, it helps to understand where Paperless-ngx sits in the landscape of document management tools.
| Feature | Paperless-ngx | DocuWare | SharePoint | Nextcloud Files |
|---|---|---|---|---|
| License | GPL-3.0 (Free) | Commercial | Commercial | AGPL-3.0 (Free) |
| Self-hosted | Yes | On-prem option | On-prem option | Yes |
| Automatic OCR | Built-in (Tesseract) | Add-on | Add-on | Requires app |
| Auto-tagging | ML-based | Manual | Manual | Manual |
| Full-text search | Native | Yes | Yes | Limited |
| Email consumption | Built-in | Yes | Requires flow | Requires app |
| REST API | Yes | Yes | Yes | Yes |
| Workflow engine | Built-in | Yes | Power Automate | Flow |
| Mobile app | Community | Official | Official | Official |
| Multi-user | Yes | Yes | Yes | Yes |
| Storage format | Open (PDF + metadata) | Proprietary DB | Proprietary | Open files |
Paperless-ngx stands out because it combines OCR, automatic classification, and search into a single package with zero licensing cost. Nextcloud Files is a close alternative for basic file storage, but it lacks the document-specific features like automatic OCR processing, correspondent tracking, and ML-based tagging that make Paperless-ngx purpose-built for document management.
System Requirements
Paperless-ngx is lightweight and runs on minimal hardware:
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 2 GB | 4+ GB |
| Storage | 20 GB | 100+ GB SSD |
| OS | Linux (Debian/Ubuntu) | Debian 12 / Ubuntu 24.04 |
A Raspberry Pi 4 with 4 GB RAM handles a personal archive comfortably. For small business deployments with thousands of documents and multiple concurrent users, a small VPS or dedicated server with 4+ cores and an SSD is ideal.
Installation with Docker Compose
The recommended deployment method is Docker Compose. It runs PostgreSQL, Redis, and the Paperless-ngx application (whose image already bundles Tesseract and the other OCR tools) in isolated, reproducible containers.
Step 1: Install Docker and Docker Compose
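Install Docker Engine and the Docker Compose plugin by following the official instructions for your distribution on docs.docker.com. Afterwards, `docker --version` and `docker compose version` should both print a version number. This guide uses the plugin syntax (`docker compose`) rather than the legacy standalone `docker-compose` command.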
Step 2: Create the Project Directory
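A minimal sketch, assuming the project lives in `~/paperless-ngx` (a hypothetical location; set `PAPERLESS_DIR` to use another). The subdirectories become bind mounts: `data`, `media`, `export`, and `consume` for Paperless-ngx itself, and `pgdata` and `redisdata` for PostgreSQL and Redis, a layout chosen for this sketch rather than required by the project.

```shell
# Project root (hypothetical default; override PAPERLESS_DIR to change it).
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"

# data: search index and classifier; media: documents; export: backups;
# consume: inbox for new files; pgdata/redisdata: database and broker state.
for d in data media export consume pgdata redisdata; do
  mkdir -p "$PAPERLESS_DIR/$d"
done

cd "$PAPERLESS_DIR"
```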
Step 3: Create docker-compose.yml
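A minimal sketch of `docker-compose.yml`, assuming the three-service layout (`broker`, `db`, `webserver`) used in the upstream Paperless-ngx example files and the directories from Step 2. The image tags, the admin username `admin`, and the `docker-compose.env` file for optional settings are choices made for this sketch, not requirements. Note that the database password appears twice and must be identical in both places.

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
mkdir -p "$PAPERLESS_DIR"

# Quoted 'EOF': the shell writes the YAML verbatim without expanding anything.
cat > "$PAPERLESS_DIR/docker-compose.yml" <<'EOF'
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - ./redisdata:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: your-secure-db-password

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file:
      - docker-compose.env   # optional extra settings, one KEY=value per line
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: your-secure-db-password
      PAPERLESS_SECRET_KEY: change-this-to-a-long-random-string
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: your-admin-password
EOF

# Files listed under env_file must exist, so start with an empty one.
touch "$PAPERLESS_DIR/docker-compose.env"
```

The `latest` tag keeps the example short; pinning a specific release tag makes upgrades a deliberate step.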
Important: Replace your-secure-db-password, change-this-to-a-long-random-string, and your-admin-password with strong, unique values. The secret key in particular should be a long random string generated from a cryptographically secure source.
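One way to generate such a key, sketched under the assumption of a Linux host with `/dev/urandom`: keep only letters and digits from the kernel's random byte stream, so the result can be pasted into YAML without quoting. The optional `sed` step assumes you run it from the project directory and uses GNU `sed -i` syntax.

```shell
# Read random bytes from the kernel CSPRNG and keep 64 letters/digits.
# LC_ALL=C makes tr treat the input as raw bytes rather than text.
SECRET_KEY="$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 64)"
echo "$SECRET_KEY"

# Optionally swap the placeholder for the new key (GNU sed; on macOS use sed -i '').
if [ -f docker-compose.yml ]; then
  sed -i "s/change-this-to-a-long-random-string/$SECRET_KEY/" docker-compose.yml
fi
```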
Step 4: Launch the Stack
From the project directory, start all containers in the background with `docker compose up -d`.
The first startup takes a minute as PostgreSQL initializes and Paperless-ngx runs database migrations. Check the logs:
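Run `docker compose logs -f` from the project directory to follow the output, and press Ctrl+C to stop following (the containers keep running). Once the database migrations have finished and the web server reports that it is listening on port 8000, everything is ready.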
Step 5: Access the Web Interface
Open http://your-server-ip:8000 in your browser and log in with the admin credentials you set in the environment variables. You will see a clean, modern dashboard ready to receive documents.
Configuration and Optimization
Enable Multiple OCR Languages
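If you receive documents in multiple languages, tell Tesseract which ones to expect with `PAPERLESS_OCR_LANGUAGE`, which joins Tesseract language codes with a plus sign (for example, `eng+deu`).
The Docker image ships with English, German, French, Italian, and Spanish language data. For any other language, list its code in `PAPERLESS_OCR_LANGUAGES` (space-separated); the container installs the matching Tesseract package when it starts, so there is no need to build a custom image.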
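A sketch of both settings for English, German, and Dutch mail, assuming the `webserver` service loads a `docker-compose.env` file through `env_file` (as the upstream example files do); otherwise put the same keys under its `environment:` section. Dutch (`nld`) is only an example of a language outside the bundled set.

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
mkdir -p "$PAPERLESS_DIR"

# Append OCR settings to the env file read by the webserver service.
cat >> "$PAPERLESS_DIR/docker-compose.env" <<'EOF'
# Languages Tesseract should use, joined with "+".
PAPERLESS_OCR_LANGUAGE=eng+deu+nld
# Extra language packs installed when the container starts (space-separated).
PAPERLESS_OCR_LANGUAGES=nld
EOF
```

Run `docker compose up -d` afterwards so the webserver container is recreated with the new environment.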
Configure Email Consumption
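Paperless-ngx can automatically process email attachments. Set up a dedicated email account, then add it as a mail account (IMAP server, port, security, and credentials) in the mail settings of the web interface.
Next, create one or more mail rules for that account. Each rule selects which messages to process (by folder, sender, subject, or body text), what to consume (attachments only or the whole message), and which tags, correspondent, and document type to assign. Accounts and rules are stored in the database rather than in environment variables; the polling interval, however, is set through the environment.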
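The polling schedule is controlled by `PAPERLESS_EMAIL_TASK_CRON`, a cron expression that defaults to every 10 minutes. A sketch that checks mailboxes every 5 minutes instead, again assuming the `docker-compose.env` file loaded through `env_file`:

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
mkdir -p "$PAPERLESS_DIR"

# Poll all configured mail accounts every 5 minutes.
cat >> "$PAPERLESS_DIR/docker-compose.env" <<'EOF'
PAPERLESS_EMAIL_TASK_CRON=*/5 * * * *
EOF
```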
Set Up Automatic Document Classification
Paperless-ngx includes a machine learning classifier that learns from your filing behavior. After you manually tag 20-30 documents, the system begins suggesting correspondents, tags, and document types automatically.
To fine-tune classification:
- Go to Settings > Correspondents and create entries for regular senders (banks, utility companies, government agencies).
- Go to Settings > Tags and create a taxonomy (e.g., `finance/taxes`, `finance/insurance`, `medical`, `legal`).
- Go to Settings > Document Types and define categories (Invoice, Contract, Statement, Receipt, Letter).
- Upload a batch of 30+ documents and manually classify them.
- The classifier will then start auto-assigning metadata to new documents for every correspondent, tag, and document type whose matching algorithm is set to Auto.
Configure Workflows for Automation
Workflows are the most powerful feature in Paperless-ngx. They let you define rules like “if a PDF contains the word ‘invoice’ and comes from a specific email address, assign tag finance/invoice and correspondent ‘Acme Corp’.”
Example workflow configuration via the web interface:
- Navigate to Settings > Workflows
- Create a new workflow with a Trigger (e.g., “Document added from consume folder”)
- Add Conditions (e.g., “Content contains ‘tax return’”)
- Add Actions (e.g., “Assign tag `finance/tax-2025`”, “Set document type to ‘Tax Document’”)
Workflows can also:
- Move documents to specific storage paths
- Send email notifications
- Assign documents to specific users
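- Remove tags, correspondents, or other previously assigned metadata
- Call webhooks to notify external systems (custom scripts run separately, through the pre- and post-consume script hooks)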
Adding Documents
Upload via Web Interface
Drag and drop files directly onto the dashboard. Paperless-ngx queues them immediately and runs OCR in the background; a document typically appears in your library within seconds to a few minutes, depending on its page count and your hardware.
Watch the Consume Folder
Any file placed in the consume directory is processed automatically. This is ideal for scanner integration: point a network scanner or a script at the folder, and Paperless-ngx picks up each new file.
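Scripts that copy large files straight into the consume folder risk the consumer seeing a half-written file. The sketch below uses a hypothetical helper, `paperless_drop`, that copies each file into a staging folder on the same filesystem and then renames it into `consume/`; a rename within one filesystem is atomic. The commented-out `PAPERLESS_CONSUMER_POLLING` line is the documented fallback when the consume folder sits on a network share (SMB, NFS) where filesystem change events may not arrive.

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"

# Hypothetical helper: hand files to Paperless-ngx via copy-then-rename.
# The staging folder is outside consume/, so the consumer does not watch it.
paperless_drop() {
  staging="$PAPERLESS_DIR/.staging"
  mkdir -p "$staging" "$PAPERLESS_DIR/consume"
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "skipping $f: not a regular file" >&2
      continue
    fi
    name="$(basename "$f")"
    cp "$f" "$staging/$name" && mv "$staging/$name" "$PAPERLESS_DIR/consume/$name"
  done
}

# For network shares: poll the consume folder every 60 seconds instead.
# echo 'PAPERLESS_CONSUMER_POLLING=60' >> "$PAPERLESS_DIR/docker-compose.env"
```

For example, `paperless_drop ~/Scans/*.pdf` hands over every PDF in a scan folder.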
Bulk Import
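To migrate an existing archive, copy it into the consume folder. Paperless-ngx can also walk subfolders and turn folder names into tags, which preserves an existing folder structure as metadata.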
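A sketch of such a migration, assuming the `docker-compose.env` file from earlier. `PAPERLESS_CONSUMER_RECURSIVE` makes the consumer descend into subfolders, and `PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS` tags each file with its folder names, so `taxes/2023/return.pdf` receives the tags `taxes` and `2023`. The `paperless_bulk_import` function is a hypothetical helper; recreate the container with `docker compose up -d` before calling it so the settings are active.

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
mkdir -p "$PAPERLESS_DIR/consume"

# Consume subfolders too, and turn folder names into tags.
cat >> "$PAPERLESS_DIR/docker-compose.env" <<'EOF'
PAPERLESS_CONSUMER_RECURSIVE=true
PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS=true
EOF

# Hypothetical helper: copy an archive tree (including its folder structure)
# into the consume folder.
paperless_bulk_import() {
  src="$1"
  if [ ! -d "$src" ]; then
    echo "usage: paperless_bulk_import DIR" >&2
    return 1
  fi
  cp -R "$src"/. "$PAPERLESS_DIR/consume/"
}
```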
Use the REST API
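For programmatic access, send files to the `/api/documents/post_document/` endpoint as a multipart form upload, authenticating with an API token in the `Authorization` header.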
Generate API tokens under your user profile in the web interface.
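A minimal sketch of an upload with `curl`, wrapped in a hypothetical helper `paperless_upload`. It assumes the server is reachable at `http://localhost:8000` (override with `PAPERLESS_URL`) and that the token is exported as `PAPERLESS_TOKEN`. The file goes in the `document` form field; `title` is one of the optional metadata fields the endpoint accepts.

```shell
PAPERLESS_URL="${PAPERLESS_URL:-http://localhost:8000}"

# Hypothetical helper: upload FILE, optionally with a TITLE, via the REST API.
paperless_upload() {
  file="$1"
  title="${2:-}"
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  if [ -z "${PAPERLESS_TOKEN:-}" ]; then
    echo "PAPERLESS_TOKEN is not set" >&2
    return 1
  fi
  # Build curl's arguments; the title field is only sent when one is given.
  set -- --silent --show-error --fail \
    --header "Authorization: Token $PAPERLESS_TOKEN" \
    --form "document=@$file"
  if [ -n "$title" ]; then
    set -- "$@" --form "title=$title"
  fi
  # On success the server queues the file and replies with a consumption task ID.
  curl "$@" "$PAPERLESS_URL/api/documents/post_document/"
}
```

For example, after `export PAPERLESS_TOKEN=<your token>`, running `paperless_upload bill.pdf "Electricity bill"` queues the file just like a drag-and-drop upload.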
Backup and Disaster Recovery
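Your document archive is only as good as your backup strategy. Paperless-ngx keeps its state in these locations:
| Location | Contents |
|---|---|
| data/ | Full-text search index, ML classifier model, and logs |
| media/ | Original documents, archived (OCR'd PDF/A) copies, and thumbnails |
| export/ | Output of the document exporter: all documents plus a manifest of their metadata |
| PostgreSQL volume | Database: metadata, tags, users, settings, and extracted OCR text |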
Automated Backup Script
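A sketch of a nightly backup script, assuming the project directory from this guide and a Compose service named `webserver` (as in the upstream example files). It runs the built-in `document_exporter` (whose `../export` argument is the `export/` bind mount, seen from the container's working directory), packs the export into a dated archive, and deletes archives older than 30 days. The script location `backup.sh` and the `BACKUP_DIR` and `KEEP_DAYS` variables are hypothetical choices.

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
mkdir -p "$PAPERLESS_DIR"

# Quoted 'EOF': variables and $(date) are evaluated when the script runs.
cat > "$PAPERLESS_DIR/backup.sh" <<'EOF'
#!/bin/sh
# Nightly Paperless-ngx backup (sketch): export, archive, prune.
set -eu
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
BACKUP_DIR="${BACKUP_DIR:-$PAPERLESS_DIR/backups}"
KEEP_DAYS="${KEEP_DAYS:-30}"

cd "$PAPERLESS_DIR"
mkdir -p "$BACKUP_DIR"

# Export all documents plus a metadata manifest into ./export.
# -T disables the pseudo-TTY, which cron does not provide.
docker compose exec -T webserver document_exporter ../export

# Archive the export folder with a date stamp (YYYY-MM-DD).
tar -czf "$BACKUP_DIR/paperless-$(date +%F).tar.gz" -C "$PAPERLESS_DIR" export

# Remove archives older than KEEP_DAYS days.
find "$BACKUP_DIR" -name 'paperless-*.tar.gz' -mtime +"$KEEP_DAYS" -delete
EOF
chmod +x "$PAPERLESS_DIR/backup.sh"
```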
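Schedule the script with cron for daily backups: run `crontab -e` and add a line such as `0 2 * * * /path/to/backup.sh`, which runs it every night at 02:00. Copy the resulting archives to a second machine as well; a backup on the same disk does not survive a disk failure.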
Restore from Backup
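To restore, set up a fresh installation, extract the most recent archive into the project directory so that export/ is recreated, start the stack, and run `docker compose exec -T webserver document_importer ../export` (assuming the Paperless-ngx service is named webserver, as in the upstream compose files). The importer is meant for a new, empty instance, so restore into a fresh deployment rather than over an existing one.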
Securing Your Deployment
A document management system holds your most sensitive information. Harden it with these steps:
Enable HTTPS with a Reverse Proxy
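A sketch that adds Caddy as a fourth container. It assumes Paperless-ngx runs as the `webserver` service of the same Compose project; `docker compose` automatically merges a `docker-compose.override.yml` next to `docker-compose.yml`, so the original file stays untouched. The `caddy_data` and `caddy_config` directories hold certificates and runtime state. The `Caddyfile` must exist before the stack starts; otherwise Docker creates a directory with that name in its place.

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
mkdir -p "$PAPERLESS_DIR"

# Extra service merged into the main compose file by docker compose.
cat > "$PAPERLESS_DIR/docker-compose.override.yml" <<'EOF'
services:
  caddy:
    image: docker.io/library/caddy:2
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - ./caddy_data:/data
      - ./caddy_config:/config
EOF
```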
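Next, create a Caddyfile in the project directory that forwards HTTPS traffic for your domain to the Paperless-ngx container.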
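A minimal Caddyfile, assuming the hypothetical domain `paperless.example.com` (set `DOMAIN` to your own) and a Caddy container that reaches Paperless-ngx as `webserver:8000` on the Compose network. The sketch also sets `PAPERLESS_URL`, which Paperless-ngx uses to derive its allowed hosts and CSRF trusted origins; without it, logins through the proxy may be rejected.

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
DOMAIN="${DOMAIN:-paperless.example.com}"   # hypothetical domain; use your own
mkdir -p "$PAPERLESS_DIR"

# Unquoted EOF: $DOMAIN is expanded while the file is written.
cat > "$PAPERLESS_DIR/Caddyfile" <<EOF
$DOMAIN {
    reverse_proxy webserver:8000
}
EOF

# Tell Paperless-ngx its public URL so requests via the proxy pass CSRF checks.
echo "PAPERLESS_URL=https://$DOMAIN" >> "$PAPERLESS_DIR/docker-compose.env"
```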
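Caddy automatically obtains and renews TLS certificates from Let’s Encrypt, provided your domain’s DNS record points at the server and ports 80 and 443 are reachable from the internet.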
Restrict Admin Access
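The Django admin interface at /admin/ exposes every object in the database and is rarely needed day to day, so keep it off the public internet, and do your everyday filing with a regular user account rather than the superuser created at install time. One way to enforce the first part is to refuse admin requests at the reverse proxy unless they come from your local network.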
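A sketch of that rule as a complete Caddyfile, assuming the hypothetical domain `paperless.example.com` and a LAN of `192.168.1.0/24` (adjust both). It uses a named matcher combining Caddy's `path` and `remote_ip` matchers and answers matching requests with HTTP 403 before they reach Paperless-ngx. If Caddy runs in Docker with the userland proxy or over IPv6, it may see Docker's gateway address instead of the real client IP, so test from outside your network that the block actually applies.

```shell
PAPERLESS_DIR="${PAPERLESS_DIR:-$HOME/paperless-ngx}"
DOMAIN="${DOMAIN:-paperless.example.com}"   # hypothetical domain
LAN_CIDR="${LAN_CIDR:-192.168.1.0/24}"      # hypothetical LAN range
mkdir -p "$PAPERLESS_DIR"

# Unquoted EOF: $DOMAIN and $LAN_CIDR are expanded into the file.
cat > "$PAPERLESS_DIR/Caddyfile" <<EOF
$DOMAIN {
    # Admin requests that do not originate from the LAN...
    @admin_outside_lan {
        path /admin /admin/*
        not remote_ip $LAN_CIDR
    }
    # ...are refused before reaching Paperless-ngx.
    respond @admin_outside_lan 403

    reverse_proxy webserver:8000
}
EOF
```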
Network Isolation
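Place Paperless-ngx on a separate Docker network with only necessary port exposure. PostgreSQL and Redis need no published ports at all, because only the Paperless-ngx container talks to them. Once the reverse proxy is running, publish the web server on the loopback interface only (`127.0.0.1:8000:8000` instead of `8000:8000`), or drop the port mapping entirely, so that outside traffic has to pass through the proxy. For stricter separation, attach the database and broker to a Compose network declared with `internal: true`, which has no route to the outside world.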
Integrations and Ecosystem
Paperless-ngx integrates with the broader self-hosted ecosystem:
- Paperless-ngx mobile apps — Community apps for iOS and Android let you photograph documents on the go and upload them directly.
- Scanner integration — Most network scanners can be configured to save directly to the consume folder via SMB, NFS, or SCP.
- Authentik / Authelia — Use your existing SSO provider for authentication via reverse proxy headers.
- Immich — Cross-reference document photos with your self-hosted photo library.
- Syncthing — Sync the consume folder across multiple devices for distributed scanning.
- n8n / Node-RED — Trigger workflows from external systems via the REST API.
Conclusion
Paperless-ngx transforms a chaotic pile of paper into a searchable, organized digital archive that you fully control. It has no licensing cost, respects your privacy by design, and fits smoothly into a self-hosted infrastructure. Whether you are a homeowner archiving decades of financial records or a small business managing contracts and compliance documents, Paperless-ngx provides capable document management without the enterprise price tag.
The combination of automatic OCR, machine learning classification, email consumption, and workflow automation means that once set up, your document archive practically manages itself. Add the robust backup strategy and API access, and you have a future-proof solution that grows with your needs.
Start with the Docker Compose deployment above, feed it your first batch of documents, and watch as years of paper chaos transform into an organized, searchable digital library — all running on your own hardware, under your own control.
Frequently Asked Questions (FAQ)
Which option should I choose?
The best choice depends on your specific requirements:
- For beginners: Start with the simplest option that covers your core use case
- For production: Choose the solution with the most active community and documentation
- For teams: Look for collaboration features and user management
- For privacy: Prefer fully open-source, self-hosted options with no telemetry
Refer to the comparison table above for detailed feature breakdowns.
Can I migrate between these tools?
Most tools support data import/export. Always:
- Backup your current data
- Test the migration on a staging environment
- Check official migration guides in the documentation
Are there free versions available?
Paperless-ngx (GPL-3.0) and Nextcloud (AGPL-3.0) are free and open source. DocuWare and SharePoint are commercial products that require paid licenses or subscriptions.
How do I get started?
- Review the comparison table to identify your requirements
- Visit the official Paperless-ngx documentation
- Start with a Docker Compose setup for easy testing
- Join the community forums for troubleshooting