Managing cloud infrastructure across AWS, Azure, and GCP becomes increasingly complex as organizations scale. Without a centralized inventory, security teams struggle to answer basic questions: How many S3 buckets are publicly accessible? Which IAM roles have overly permissive policies? What resources are orphaned after a project ends?
Three open-source tools have emerged to solve this problem by letting you query your entire cloud estate using familiar interfaces: SQL and graph databases. CloudQuery transforms cloud APIs into SQL-queryable tables, Steampipe provides real-time virtual database access, and Cartography builds a Neo4j knowledge graph of infrastructure relationships.
This guide compares all three tools in depth, covering architecture, Docker deployment, performance and cost considerations, and use-case recommendations.
Why Self-Hosted Cloud Infrastructure Querying Matters
Cloud provider consoles give you a resource list, but they do not let you:
- Correlate resources across providers — finding which EC2 instance connects to which RDS database across accounts
- Query historical snapshots — understanding what changed between last week and today
- Build custom compliance checks — detecting unencrypted databases, missing tags, or overly permissive security groups
- Map attack paths — understanding how a compromised resource could lead to privilege escalation
Self-hosted tools keep your infrastructure metadata within your own network, so asset inventory data never has to be shipped to a third-party SaaS platform. This matters for organizations with regulatory requirements (SOC 2, HIPAA, FedRAMP) that restrict where asset inventory data can reside.
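To make the snapshot and compliance bullets above concrete, here is a minimal sketch in plain Python. The resource IDs and field names (`encrypted`, `tags`) are hypothetical and do not correspond to any tool's schema; the point is only that a stored inventory lets you diff two points in time and run custom checks.

```python
# Hypothetical inventory snapshots keyed by resource ID.
# Field names are illustrative only, not any tool's real schema.
last_week = {
    "db-1": {"type": "database", "encrypted": True, "tags": {"owner": "data"}},
    "db-2": {"type": "database", "encrypted": True, "tags": {}},
}
today = {
    "db-2": {"type": "database", "encrypted": False, "tags": {}},
    "db-3": {"type": "database", "encrypted": True, "tags": {"owner": "ml"}},
}


def diff_snapshots(old, new):
    """Return resource IDs added, removed, or changed between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


def unencrypted_databases(snapshot):
    """Custom compliance check: databases without encryption enabled."""
    return sorted(
        rid
        for rid, res in snapshot.items()
        if res["type"] == "database" and not res["encrypted"]
    )


changes = diff_snapshots(last_week, today)
violations = unencrypted_databases(today)
print(changes)
print(violations)
```

In practice the snapshots would come from whichever database the tool writes to rather than from in-memory dictionaries.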
CloudQuery: ETL Pipeline for Cloud Data
CloudQuery (6,387 stars, actively maintained) treats cloud APIs as data sources and syncs them into databases you control. It extracts cloud resource configurations, transforms them into normalized schemas, and loads them into PostgreSQL, DuckDB, or other destinations.
Architecture
CloudQuery operates as an ETL (Extract, Transform, Load) pipeline:
- Source plugins fetch data from AWS, Azure, GCP, GitHub, Okta, and 70+ other APIs
- Transform layer normalizes data into consistent schemas
- Destination plugins write to PostgreSQL, DuckDB, BigQuery, Snowflake, or local files
This architecture means your data is stored locally and queryable at any time, even when offline.
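The extract, transform, and load stages can be sketched in a few lines of Python. This is a toy illustration of the ETL pattern, not CloudQuery's implementation: `extract` stands in for a source plugin with made-up records, and SQLite stands in for the destination database.

```python
import sqlite3


def extract():
    """Stand-in for a source plugin: returns raw, provider-specific records."""
    return [
        {"provider": "aws", "BucketName": "logs", "Region": "us-east-1"},
        {"provider": "gcp", "name": "assets", "location": "EU"},
    ]


def transform(raw):
    """Normalize provider-specific fields into one consistent row shape."""
    if raw["provider"] == "aws":
        return (raw["provider"], raw["BucketName"], raw["Region"].lower())
    return (raw["provider"], raw["name"], raw["location"].lower())


def load(conn, rows):
    """Stand-in for a destination plugin: write normalized rows to a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS buckets (provider TEXT, name TEXT, region TEXT)"
    )
    conn.executemany("INSERT INTO buckets VALUES (?, ?, ?)", rows)
    conn.commit()


def sync(conn):
    """Run the full pipeline once."""
    load(conn, [transform(record) for record in extract()])


conn = sqlite3.connect(":memory:")
sync(conn)

# After the sync, queries run against local data with no further API access.
rows = conn.execute(
    "SELECT provider, name, region FROM buckets ORDER BY name"
).fetchall()
print(rows)
```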
Docker Deployment
CloudQuery provides official Docker images. A minimal setup can use DuckDB as the destination, while a more complete deployment writes to PostgreSQL.
Querying Your Data
Once synced, you run standard SQL against your local database.
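As a rough illustration of the kind of query this enables, the sketch below counts publicly accessible buckets per account. It uses an in-memory SQLite database as a stand-in for the synced PostgreSQL or DuckDB destination, and the `buckets` table and its columns are hypothetical rather than CloudQuery's actual table names.

```python
import sqlite3

# In-memory SQLite stands in for the synced destination database.
# The table and column names below are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE buckets (account_id TEXT, name TEXT, is_public INTEGER);
    INSERT INTO buckets VALUES
        ('111', 'public-site', 1),
        ('111', 'internal-logs', 0),
        ('222', 'shared-assets', 1),
        ('222', 'open-data', 1);
    """
)

# "How many buckets are publicly accessible?" broken down by account.
public_by_account = conn.execute(
    """
    SELECT account_id, COUNT(*) AS public_buckets
    FROM buckets
    WHERE is_public = 1
    GROUP BY account_id
    ORDER BY account_id
    """
).fetchall()
print(public_by_account)
```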
Steampipe: Real-Time Cloud Querying via SQL
Steampipe (7,796 stars, actively maintained) takes a fundamentally different approach. Instead of syncing data to a database, it creates a virtual database using PostgreSQL Foreign Data Wrappers (FDW). When you run a query, Steampipe calls the cloud API in real time and returns results as if they were in a local table.
Architecture
Steampipe’s FDW-based architecture means:
- No data storage — queries hit live APIs, so results are always current
- Zero sync delay — no waiting for ETL pipelines to complete
- Lower disk usage — no local database required
- Higher API usage — each query makes API calls, which may hit rate limits
This makes Steampipe ideal for ad-hoc investigation and interactive exploration, while CloudQuery is better for scheduled compliance checks and historical analysis.
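The trade-off between the two models can be sketched with a fake API that counts its own calls. Every class here (`FakeCloudAPI`, `LiveTable`, `SyncedTable`) is hypothetical and only mimics the behavior described above; none of it reflects how either tool is built internally.

```python
class FakeCloudAPI:
    """Hypothetical cloud API that records how often it is called."""

    def __init__(self):
        self.calls = 0
        self.instances = ["i-1", "i-2"]

    def list_instances(self):
        self.calls += 1
        return list(self.instances)


class LiveTable:
    """Virtual-table model: every query goes to the API."""

    def __init__(self, api):
        self.api = api

    def select(self):
        return self.api.list_instances()


class SyncedTable:
    """ETL model: the API is called at sync time, queries read local rows."""

    def __init__(self, api):
        self.api = api
        self.rows = []

    def sync(self):
        self.rows = self.api.list_instances()

    def select(self):
        return list(self.rows)


api = FakeCloudAPI()
live = LiveTable(api)
synced = SyncedTable(api)

synced.sync()                  # one API call
api.instances.append("i-3")    # infrastructure changes after the sync

live_rows = live.select()      # second API call, sees the new instance
stale_rows = synced.select()   # no API call, returns data as of the last sync
print(live_rows, stale_rows, api.calls)
```

The live table is always current but pays for it in API calls; the synced table is cheap to query but only as fresh as its last sync.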
Docker Deployment
Steampipe runs in Docker using the turbot/steampipe image, either as a minimal container or as a query-ready setup with dashboard support.
Interactive Querying
Steampipe launches an interactive SQL shell connected to your cloud APIs.
Steampipe Mods
Steampipe’s unique “mod” system provides pre-built dashboards and compliance benchmarks.
Cartography: Infrastructure Relationship Graph
Cartography (3,848 stars, CNCF project, actively maintained) takes a third approach: it ingests cloud resource data into a Neo4j graph database, enabling you to explore relationships between resources.
While CloudQuery and Steampipe answer “what resources exist,” Cartography answers “how are resources connected?” This is critical for security analysis, where understanding the path from a public-facing asset to a sensitive database reveals attack vectors.
Architecture
Cartography syncs data from multiple sources (AWS, GCP, Azure, GitHub, Okta, Duo, Jamf) into Neo4j. Each resource becomes a node, and relationships (e.g., INSTANCE → belongs to → VPC → contains → SUBNET) become edges.
The graph model enables queries that are difficult or impossible with flat SQL tables:
- Find the shortest path from a public IP to a database
- Identify all resources accessible from a compromised IAM role
- Map the blast radius of a security group rule change
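The first two queries in this list are classic graph traversals. The sketch below runs them over a small adjacency list in plain Python, so the idea is visible without a Neo4j instance. The node names and edges are invented for illustration and are not Cartography's node labels.

```python
from collections import deque

# Hypothetical infrastructure graph: each key points to the nodes it can reach.
graph = {
    "public-ip": ["load-balancer"],
    "load-balancer": ["web-instance"],
    "web-instance": ["app-role", "cache"],
    "app-role": ["orders-db"],
    "cache": [],
    "orders-db": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: returns the shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def reachable(graph, start):
    """All nodes reachable from start (the start node itself is excluded
    unless a cycle leads back to it)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


path = shortest_path(graph, "public-ip", "orders-db")
exposed = reachable(graph, "app-role")
print(path)
print(exposed)
```

A graph database performs this kind of traversal natively, which is why these questions are awkward to express as joins over flat tables.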
Docker Compose Deployment
Cartography provides an official docker-compose.yml on the master branch that deploys both Neo4j and Cartography.
Querying the Graph
Cartography uses Cypher, Neo4j’s graph query language.
Neo4j Browser Access
Access the Neo4j Browser at http://localhost:7474 to visually explore your infrastructure graph. Nodes are color-coded by resource type, and edges show relationships. This kind of visual exploration is much harder to reproduce with flat SQL tables.
Comparison Table
| Feature | CloudQuery | Steampipe | Cartography |
|---|---|---|---|
| Query Language | SQL | SQL | Cypher (graph) |
| Data Model | Relational tables | Virtual tables (FDW) | Graph (Neo4j) |
| Data Storage | PostgreSQL, DuckDB, etc. | None (real-time API) | Neo4j database |
| Multi-Cloud | AWS, Azure, GCP, 70+ sources | AWS, Azure, GCP, SaaS | AWS, GCP, Azure, GitHub, Okta |
| Sync Required | Yes (ETL pipeline) | No (real-time) | Yes (graph import) |
| Relationship Queries | Via JOINs | Via JOINs | Native (graph traversal) |
| Dashboard | Yes (CloudQuery UI) | Yes (Steampipe dashboards) | Neo4j Browser |
| Compliance Checks | SQL-based | Built-in mods (CIS, NIST) | Custom Cypher queries |
| Docker Image | ghcr.io/cloudquery/cloudquery | turbot/steampipe | ghcr.io/cartography-cncf/cartography |
| GitHub Stars | 6,387 | 7,796 | 3,848 |
| Language | Go | Go | Python |
| Best For | Scheduled audits, historical data | Ad-hoc exploration, live queries | Security analysis, relationship mapping |
Performance and Cost Considerations
Sync Speed
CloudQuery and Cartography must sync all resources before querying. For a typical AWS account with ~5,000 resources:
- CloudQuery: 2-5 minutes for full sync to PostgreSQL
- Cartography: 3-8 minutes for full sync to Neo4j
- Steampipe: No sync step (but each query makes API calls, so response time depends on the cloud APIs)
API Rate Limits
Steampipe’s real-time approach means every query hits cloud APIs. For large organizations, this can trigger rate limiting. CloudQuery and Cartography mitigate this by batching syncs during off-peak hours.
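A back-of-the-envelope comparison shows where the two approaches cross over. All numbers below are hypothetical placeholders, not measurements; substitute figures from your own environment.

```python
def daily_api_calls_live(queries_per_day, calls_per_query):
    """Live querying: API usage grows with how often people query."""
    return queries_per_day * calls_per_query


def daily_api_calls_synced(syncs_per_day, calls_per_sync):
    """Scheduled sync: API usage is fixed by the sync schedule."""
    return syncs_per_day * calls_per_sync


# Hypothetical inputs for illustration only.
CALLS_PER_QUERY = 25     # API calls an average live query triggers
CALLS_PER_SYNC = 6000    # API calls one full sync triggers

live = daily_api_calls_live(queries_per_day=400, calls_per_query=CALLS_PER_QUERY)
synced = daily_api_calls_synced(syncs_per_day=1, calls_per_sync=CALLS_PER_SYNC)

# Number of live queries per day at which both approaches cost the same.
break_even = CALLS_PER_SYNC // CALLS_PER_QUERY

print(live, synced, break_even)
```

Under these assumed numbers, live querying uses fewer API calls up to 240 queries per day and more beyond that.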
Storage Requirements
- CloudQuery + PostgreSQL: ~500MB-2GB per cloud account (depends on resource count)
- Cartography + Neo4j: ~1-5GB per cloud account (graph storage is larger due to relationship edges)
- Steampipe: Near zero (no local storage)
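For capacity planning, the per-account ranges above can be scaled to an organization's account count. This small sketch only multiplies the figures quoted in this section; the dictionary keys and function name are made up for the example, and "near zero" is treated as zero.

```python
# Per-account storage ranges in GB, taken from the estimates in this section.
STORAGE_GB_PER_ACCOUNT = {
    "cloudquery_postgresql": (0.5, 2.0),
    "cartography_neo4j": (1.0, 5.0),
    "steampipe": (0.0, 0.0),  # "near zero" approximated as zero
}


def estimate_storage_gb(tool, accounts):
    """Return a (low, high) storage estimate in GB for the given account count."""
    low, high = STORAGE_GB_PER_ACCOUNT[tool]
    return (low * accounts, high * accounts)


for tool in STORAGE_GB_PER_ACCOUNT:
    low, high = estimate_storage_gb(tool, accounts=20)
    print(f"{tool}: {low:.0f}-{high:.0f} GB for 20 accounts")
```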
Choosing the Right Tool
Use CloudQuery When
- You need scheduled compliance reports that run automatically
- You want historical data for trend analysis and change tracking
- You prefer SQL and already have a PostgreSQL or DuckDB setup
- You need to sync from 70+ sources beyond just cloud providers
Use Steampipe When
- You need instant answers without waiting for sync jobs
- You want interactive exploration of your infrastructure
- You value pre-built compliance mods (CIS, NIST, PCI-DSS)
- Your team is small enough that API rate limits are not a concern
Use Cartography When
- You need to understand relationships between resources
- You are performing security analysis or threat modeling
- You want to visualize your infrastructure as a graph
- You need to answer “what happens if this resource is compromised?”
For most organizations, the ideal setup combines CloudQuery for scheduled audits and Cartography for security analysis. Steampipe serves as a complementary tool for quick, ad-hoc queries.
Installation
CloudQuery and Steampipe both install on Linux and macOS. Cartography is a Python package installed with pip.
FAQ
What is the difference between CloudQuery and Steampipe?
CloudQuery uses an ETL pipeline to sync cloud data into a local database (PostgreSQL or DuckDB), enabling scheduled queries and historical analysis. Steampipe uses PostgreSQL Foreign Data Wrappers to query cloud APIs in real time without storing any data locally. CloudQuery is better for compliance automation; Steampipe is better for interactive exploration.
Can I use Cartography without Neo4j?
No. Cartography is built specifically for Neo4j and uses Cypher queries. The graph database is core to its value proposition — relationship mapping and attack path analysis require a graph data model. However, Neo4j Community Edition is free and open-source, so there is no licensing cost.
Do these tools support Azure and GCP in addition to AWS?
Yes. All three tools support AWS, Azure, and GCP. CloudQuery has the broadest support with 70+ source plugins including GitHub, Okta, Kubernetes, and Terraform Cloud. Steampipe has plugins for AWS, Azure, GCP, Slack, Jira, and more. Cartography supports AWS, GCP, Azure, GitHub, Okta, Duo, and Jamf.
How often should I sync cloud infrastructure data?
For CloudQuery and Cartography, daily syncs are recommended for most organizations. High-security environments may benefit from hourly syncs. Steampipe does not require syncing since it queries APIs in real time, but frequent interactive queries may hit cloud provider rate limits.
Are these tools free to use?
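Yes, all three are open source and free to self-host. Cartography is Apache 2.0 licensed under the CNCF. CloudQuery and Steampipe are not uniformly Apache 2.0: at the time of writing, the main CloudQuery repository is published under MPL-2.0 and the Steampipe CLI under AGPL-3.0, so check each project's LICENSE file before redistributing or embedding them. Beyond licensing, costs are limited to the infrastructure required to run the databases (PostgreSQL, Neo4j) and the cloud API calls themselves.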
Can I run these tools on-premises without cloud access?
CloudQuery and Cartography require initial cloud access to sync data. Once synced, you can query the local database without cloud connectivity. Steampipe requires ongoing cloud API access for every query since it does not store data locally.