Introduction
Rate limiting is a critical defense mechanism for web applications. Without it, your API endpoints are vulnerable to brute-force attacks, credential stuffing, resource exhaustion, and runaway automation. A well-implemented rate limiter protects your infrastructure while ensuring fair access for legitimate users.
Python’s web ecosystem offers several purpose-built rate limiting libraries, each with different backends, integration patterns, and throttling strategies. This guide compares five leading solutions: limits (the flexible core library), slowapi (FastAPI/Starlette integration), Flask-Limiter (Flask-specific), django-ratelimit (Django decorators), and ratelimit (a lightweight decorator for scripts).
Comparison Table
| Feature | limits | slowapi | Flask-Limiter | django-ratelimit | ratelimit |
|---|---|---|---|---|---|
| Framework | Agnostic | FastAPI/Starlette | Flask | Django | Agnostic |
| Backends | Redis, Memcached, MongoDB, In-Memory | Inherits from limits | Inherits from limits | Django cache framework | In-process memory |
| Strategies | Fixed Window, Moving Window, Sliding Window Counter (recent releases) | Same as limits | Same as limits | Fixed Window (keyed by IP, user, header, etc.) | Fixed Window |
| GitHub Stars | ~4,500 | ~2,000 | ~1,100 | ~1,000 | ~300 |
| Per-Route Config | Manual (call hit() in each handler) | Yes | Yes | Yes | Yes |
| Custom Key Func | Yes (identifiers passed to hit()) | Yes | Yes | Yes | No |
| Headers | Manual (via get_window_stats()) | X-RateLimit-* (opt-in) | X-RateLimit-* (opt-in) | No | No |
| Cost-Based | Yes | Yes | Yes | No | No |
| Async Support | Yes | Yes | Limited | Async via 4.2+ | No |
| Best For | Custom integrations | FastAPI/ASGI apps | Flask APIs | Django projects | Simple scripts |
limits: The Core Rate Limiting Library
limits is the core library underneath slowapi and Flask-Limiter. It is backend-agnostic, supporting Redis, Memcached, MongoDB, and in-memory storage, and it implements fixed window and moving window strategies, plus a sliding window counter in recent releases.
Installation with Redis backend: `pip install "limits[redis]"`
Basic Usage:
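limits supports three strategies. The strategy is chosen by the limiter class you instantiate; the rate string (for example "100 per hour") only sets the quota:
- Fixed Window (FixedWindowRateLimiter): a simple counter that resets when the window expires; cheap, but allows bursts at window boundaries
- Moving Window (MovingWindowRateLimiter): records each hit's timestamp, so it is more accurate and prevents boundary spikes, at the cost of more storage
- Sliding Window Counter (SlidingWindowCounterRateLimiter, recent releases): approximates a moving window from the current and previous window counts, with fixed-window storage costs

Token Bucket (allows bursts while enforcing an average rate) and Leaky Bucket (sets a maximum processing rate, queuing excess requests) are not built into limits or the libraries that wrap it.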
slowapi: FastAPI Integration
slowapi builds on limits to provide FastAPI and Starlette integration. It adds per-endpoint decorators, optional middleware for application-wide default limits, and opt-in rate limit response headers.
Installation: `pip install slowapi`
FastAPI Integration:
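When the limiter is created with headers_enabled=True, slowapi adds X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers (plus Retry-After) to responses, so clients can respect rate limits without parsing API documentation. Endpoints that return plain data rather than a Response object also need a response: Response parameter so slowapi can attach the headers.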
Flask-Limiter: Flask-Specific Solution
Flask-Limiter provides idiomatic Flask integration with decorator-based rate limiting, application-wide default limits, and configurable error responses. Like slowapi, it relies on limits for storage and strategies.
Installation: `pip install Flask-Limiter`
Configuration:
Flask-Limiter also supports blueprints, conditional rate limiting, and shared limits across multiple routes:
django-ratelimit: Django Native
django-ratelimit integrates with Django’s request cycle through a view decorator. It applies directly to function-based views and to class-based views through Django’s method_decorator, and it stores its counters in Django’s cache framework.
Installation: `pip install django-ratelimit`
Configuration (settings.py):
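A settings fragment, assuming Django 4.0 or later (for the built-in Redis cache backend) and django-ratelimit 4.x, whose module is named django_ratelimit. By default an over-limit request raises Ratelimited, a PermissionDenied subclass that Django turns into a 403; the middleware plus RATELIMIT_VIEW shown here route it to a custom view instead, for example one returning 429. The myapp.views.ratelimited path is hypothetical.

```python
# settings.py (fragment)

# django-ratelimit keeps its counters in a Django cache, so that cache should be
# shared by all workers and support atomic increments; Redis fits both.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}

RATELIMIT_USE_CACHE = "default"  # which CACHES alias holds the counters
RATELIMIT_ENABLE = True          # set to False to switch limiting off (e.g. in tests)

MIDDLEWARE = [
    # ... your existing middleware ...
    "django_ratelimit.middleware.RatelimitMiddleware",
]
RATELIMIT_VIEW = "myapp.views.ratelimited"  # hypothetical view that returns HTTP 429
```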
Usage:
ratelimit: Simple Decorator for Scripts
For simple scripts and CLI tools, ratelimit provides a zero-dependency decorator with time-based limits. Installation: `pip install ratelimit`
Because its counters live in the memory of a single process, ratelimit is not suited to multi-worker web servers. It is ideal for data pipeline scripts, web scrapers, and any Python script that needs to respect external API rate limits.
Deployment Architecture
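For production deployments, Redis is the recommended backend for every library here except ratelimit, which only keeps state in process (django-ratelimit reaches Redis through Django’s cache framework). Redis provides atomic operations and optional persistence, and it can be shared across multiple application instances for consistent rate limiting across a cluster.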
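Docker Compose setup with Redis: run Redis as its own service alongside the application containers and point each container’s rate limit storage at it by service name, for example redis://redis:6379 for a service named redis.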
Why Self-Host Your Rate Limiting?
Self-hosting rate limiting gives you complete control over your API’s security posture. Unlike SaaS API management platforms that charge per request and store your traffic patterns on their infrastructure, self-hosted solutions run entirely within your environment. You own the data, control the algorithms, and pay only for the compute you use.
Rate limiting is also a critical complement to other Python development practices. Our guides on Python type checking, Python logging libraries, and Python ORM libraries together form a complete quality and security toolkit for Python web applications.
For profiling and performance optimization, see our Python profiling tools guide. Combined with rate limiting, you can build APIs that are both fast and resilient.
FAQ
Which rate limiting algorithm should I use?
Fixed Window is simplest but suffers from boundary spikes: under a “100/hour” limit, a client can send 100 requests at 11:59 and another 100 at 12:01, getting 200 requests through in two minutes. Sliding Window (Moving Window) eliminates boundary spikes and is the best default. Token Bucket is ideal for APIs where occasional bursts are acceptable as long as the average rate is enforced.
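The boundary spike is easy to reproduce. This standard-library simulation (not code from any of the libraries above) counts how many of the 11:59 and 12:01 requests a clock-aligned fixed window and a sliding log, the idea behind a moving window, admit under a 100-per-hour limit.

```python
from collections import deque
from typing import Deque, Dict, List


def fixed_window_allowed(timestamps: List[float], limit: int, window: float) -> int:
    """Count requests a fixed-window counter admits (windows aligned to multiples of `window`)."""
    counts: Dict[int, int] = {}
    allowed = 0
    for t in timestamps:
        bucket = int(t // window)  # index of the window this request falls in
        if counts.get(bucket, 0) < limit:
            counts[bucket] = counts.get(bucket, 0) + 1
            allowed += 1
    return allowed


def sliding_log_allowed(timestamps: List[float], limit: int, window: float) -> int:
    """Count requests a sliding-log (moving window) limiter admits."""
    log: Deque[float] = deque()  # timestamps of admitted requests, oldest first
    allowed = 0
    for t in timestamps:
        # Forget admitted requests that are at least `window` seconds old.
        while log and log[0] <= t - window:
            log.popleft()
        if len(log) < limit:
            log.append(t)
            allowed += 1
    return allowed


HOUR = 3600.0
# Seconds since 11:00: 100 requests at 11:59, then 100 at 12:01.
# The fixed window boundary falls at 12:00 (t = 3600).
burst = [59 * 60.0] * 100 + [61 * 60.0] * 100

print(fixed_window_allowed(burst, limit=100, window=HOUR))  # 200
print(sliding_log_allowed(burst, limit=100, window=HOUR))   # 100
```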
Why use Redis instead of in-memory storage?
In-memory storage works for single-process applications but fails in multi-worker or multi-server deployments. Two application servers behind a load balancer would each maintain separate counters, effectively doubling the allowed rate. Redis provides a single source of truth accessible by all instances.
How do I handle rate limiting for authenticated vs. anonymous users?
Use different key functions. For authenticated users, use the user ID as the key (more generous limits). For anonymous users, use the IP address (stricter limits). Every library here except ratelimit supports custom key functions.
What headers should my API return?
Return X-RateLimit-Limit (total allowed), X-RateLimit-Remaining (calls left in window), X-RateLimit-Reset (Unix timestamp when window resets), and optionally Retry-After on 429 responses. slowapi and Flask-Limiter emit these once header injection is enabled (headers_enabled=True in both; Flask-Limiter also reads the RATELIMIT_HEADERS_ENABLED config key). For custom implementations with limits, add them manually.
Can I use rate limiting for queue fairness?
Yes. Token Bucket algorithms are particularly good for queue fairness — each consumer gets a consistent token allocation regardless of other consumers’ activity. Combine with per-customer key functions to guarantee each customer gets their fair share of API capacity without being crowded out by heavier users.
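A minimal in-process sketch of per-customer token buckets in plain Python, since none of the five libraries here ships a token bucket. The class names are hypothetical, the code is not thread-safe, and a multi-instance deployment would keep bucket state in shared storage such as Redis.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_sec` tokens per second."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last check, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerCustomerLimiter:
    """One independent bucket per customer, so heavy users cannot drain others' capacity."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, customer_id: str, cost: float = 1.0) -> bool:
        now = self.clock()
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
            self.buckets[customer_id] = bucket
        return bucket.allow(now, cost)


# Usage with a controllable clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = PerCustomerLimiter(capacity=5, refill_per_sec=1.0, clock=lambda: fake_now[0])
heavy = [limiter.allow("heavy-customer") for _ in range(8)]  # burst of 8 requests
light = limiter.allow("light-customer")                      # unaffected by the heavy one
fake_now[0] = 2.0                                            # two seconds later
after_refill = [limiter.allow("heavy-customer") for _ in range(3)]
print(heavy, light, after_refill)
```

The heavy customer gets its burst of five, then waits for refills, while the light customer's first request is admitted immediately from its own full bucket.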