Introduction

PCI Express (PCIe) is the backbone of modern server hardware, connecting CPUs to NVMe storage, GPUs, network cards, and storage controllers. Understanding your PCIe topology is essential for optimizing performance, troubleshooting hardware issues, and planning upgrades. This guide compares three essential Linux tools for PCIe management: lspci (from pciutils), lstopo (from hwloc), and setpci (also part of the pciutils toolkit).

Tool Comparison

| Feature | lspci (pciutils) | lstopo (hwloc) | setpci (pciutils) |
| --- | --- | --- | --- |
| Primary Purpose | List PCI devices | Visual topology | PCI configuration |
| Stars | 630+ | 701+ | Bundled with pciutils |
| Output Format | Text tree / verbose | ASCII / Graphical / XML | Register read/write |
| PCIe Speed Info | Via -vvv flags | Via topology view | N/A |
| NUMA Awareness | No | Yes | No |
| GPU Topology | Limited | Full | No |
| Installation | apt install pciutils | apt install hwloc | Bundled |
| Ideal Use Case | Quick device listing | System topology analysis | Low-level debugging |

lspci — The PCI Device Lister

lspci is the standard tool for enumerating PCI devices on Linux. It reads the PCI configuration space and presents device information in a human-readable format.

# Install pciutils
sudo apt install pciutils

# Basic device listing
lspci

# Tree view showing PCIe topology
lspci -tv

# Detailed verbose output, including link speed and width (root is needed to read capabilities)
sudo lspci -vvv -s 0000:01:00.0

# Show numeric IDs for scripting
lspci -nn

# Filter by device class
lspci -d ::0108  # NVMe devices only

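The -vvv flag reveals crucial information: link speed (2.5, 5.0, 8.0, 16.0, or 32.0 GT/s for Gen1 through Gen5), link width (x1, x4, x8, x16), and negotiated vs. maximum capabilities. Run it as root, because without root privileges lspci cannot read the capability list and the link fields are missing. This is invaluable for detecting bandwidth bottlenecks.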

# Check PCIe link status (root is needed to read the capability registers)
sudo lspci -vvv | grep -E "LnkSta:|LnkCap:" | head -4
# Output example (trimmed):
# LnkCap: Port #0, Speed 16GT/s, Width x16
# LnkSta: Speed 8GT/s (downgraded), Width x8 (downgraded)
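
The LnkCap vs. LnkSta comparison can also be scripted. What follows is a minimal sketch rather than anything shipped with pciutils: link_report is a hypothetical helper name, and it assumes the usual lspci -vv layout of an unindented device header line followed by indented LnkCap: and LnkSta: lines.

```shell
# link_report: read `lspci -vv` output on stdin and print one line for each
# device whose negotiated link (LnkSta) differs from its capability (LnkCap).
# Hypothetical helper; assumes the lspci text layout described above.
link_report() {
    awk '
        # Return the token that follows "key", up to the next comma or space.
        function field(line, key,    i, rest) {
            i = index(line, key)
            if (i == 0) return "?"
            rest = substr(line, i + length(key))
            sub(/[, ].*/, "", rest)
            return rest
        }
        /^[0-9a-f:.]+ / { dev = $1 }    # device header, e.g. "01:00.0 VGA ..."
        /LnkCap:/ { cap_s = field($0, "Speed "); cap_w = field($0, "Width ") }
        /LnkSta:/ {
            sta_s = field($0, "Speed "); sta_w = field($0, "Width ")
            if (sta_s != cap_s || sta_w != cap_w)
                printf "%s: capable %s %s, running %s %s\n", dev, cap_s, cap_w, sta_s, sta_w
        }
    '
}
```

Typical use would be `sudo lspci -vv | link_report`. Devices running at their full capability produce no output, so an empty result means no link was flagged.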

lstopo — Hardware Topology Visualizer

lstopo from the hwloc project provides a comprehensive view of your system’s hardware topology, including PCIe interconnects, NUMA nodes, CPU caches, and I/O hubs.

# Install hwloc
sudo apt install hwloc

# ASCII-art topology view
lstopo --of ascii

# Generate PNG topology diagram (requires an lstopo build with Cairo support)
lstopo topology.png

# XML output for programmatic analysis
lstopo topology.xml

# Include all I/O devices and bridges (hwloc 2.x; 1.x releases use --whole-io)
lstopo --of console --filter io:all

# Verbose text output, including PCIe link speed attributes
lstopo --of console --verbose

lstopo’s strength lies in showing the relationship between PCIe devices and the rest of the system. It reveals which NUMA node a PCIe slot connects to, helping optimize workloads for data locality.

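# Find your NVMe drive in the I/O tree (scan upward in the full output for its NUMANode)
lstopo --of console --filter io:all | grep -B2 -i nvme
# Or read the NUMA node directly from sysfs (-1 means no affinity is reported)
cat /sys/class/nvme/nvme0/device/numa_node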

setpci — Low-Level PCI Configuration

setpci allows direct read/write access to PCI configuration registers. It’s a powerful but potentially dangerous tool for advanced debugging and tuning.

# Read the capabilities list pointer (one byte at offset 0x34)
sudo setpci -s 0000:01:00.0 34.b

# Read device and vendor IDs
sudo setpci -s 0000:01:00.0 00.l

# Enable bus mastering: set only bit 2 of the command register (value:mask form)
sudo setpci -s 0000:01:00.0 04.w=0004:0004

Use setpci with caution. Incorrect register writes can destabilize your system or cause data corruption. Always consult the PCI specification and your device documentation before modifying registers.
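
Before writing to a register, it helps to read it and understand what is currently set. The sketch below decodes a few well-known bits of the PCI command register (offset 0x04) from the hex value that setpci prints. decode_command is a hypothetical helper name, and only four of the register's bits are covered.

```shell
# decode_command HEX: list commonly used bits that are set in a PCI command
# register value. Accepts "0407" or "0x0407". Hypothetical helper, read-only.
decode_command() {
    val=$((0x${1#0x}))
    [ $((val & 0x001)) -ne 0 ] && echo "I/O Space Enable"
    [ $((val & 0x002)) -ne 0 ] && echo "Memory Space Enable"
    [ $((val & 0x004)) -ne 0 ] && echo "Bus Master Enable"
    [ $((val & 0x400)) -ne 0 ] && echo "Interrupt Disable"
    return 0
}
```

A possible invocation is `decode_command "$(sudo setpci -s 0000:01:00.0 04.w)"`. Recording the original value this way also gives you something to restore if a later write misbehaves.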

Common PCIe Troubleshooting Scenarios

Scenario 1: GPU Negotiating a Reduced Link Width

A common issue in multi-GPU servers is cards negotiating lower link widths due to slot sharing or PCIe bifurcation issues.

# Check current link status of the first GPU
GPU_BDF=$(lspci | grep -iE "vga|3d" | head -1 | awk '{print $1}')
sudo lspci -vvv -s "$GPU_BDF" | grep -E "LnkSta:|LnkCap:"

# Locate the GPU in the hwloc I/O tree to see which bridges it sits behind
lstopo --of console --filter io:all | grep -B5 "$GPU_BDF"

Scenario 2: NVMe Drive Not at Full Speed

Most NVMe SSDs are x4 devices and should link at their rated generation, typically Gen3 x4 or Gen4 x4. A lower link speed or width indicates a problem.

# Check all NVMe devices (class 0108) and compare capability with negotiated link
for dev in $(lspci -d ::0108 | awk '{print $1}'); do
    echo "=== $dev ==="
    sudo lspci -vvv -s "$dev" | grep -E "LnkCap:|LnkSta:"
done

Scenario 3: Verifying PCIe ACS/IOMMU Groups

For PCIe passthrough (VFIO), checking IOMMU groups is critical.

# List devices with their IOMMU groups
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s: %s\n' "$n" "$(lspci -nns "${d##*/}")"
done
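
For passthrough, the question is usually narrower: which other devices share a group with the one you want to hand to a guest, since the whole group has to be assigned together. A minimal sketch under that assumption follows. iommu_group_peers is a hypothetical helper name, and the optional second argument (an alternate sysfs root) exists only so the function can be tried against a test directory.

```shell
# iommu_group_peers BDF [SYSFS_ROOT]: print every device that shares the
# IOMMU group of the given PCI address (full form, e.g. 0000:01:00.0).
# Hypothetical helper; SYSFS_ROOT defaults to /sys.
iommu_group_peers() {
    root="${2:-/sys}"
    group="$root/bus/pci/devices/$1/iommu_group"
    [ -d "$group/devices" ] || { echo "no IOMMU group for $1" >&2; return 1; }
    ls "$group/devices"
}
```

For example, `iommu_group_peers 0000:01:00.0` on a GPU often lists both the graphics function and its audio function. If unrelated devices appear in the same group, the slot may not be suitable for passthrough without ACS support.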

Why Self-Host Your Hardware Monitoring?

Managing PCIe topology and hardware monitoring on your own servers gives you complete visibility into your infrastructure’s performance characteristics. Unlike cloud providers where hardware details are abstracted away, self-hosting allows you to optimize PCIe bandwidth allocation, detect hardware degradation early, and maximize the return on your hardware investment. When you control the hardware layer, you can diagnose issues that cloud monitoring tools simply cannot reach. For deeper hardware inventory management, see our guide on Linux hardware inventory tools. If you’re dealing with hardware-level error detection, our hardware error monitoring guide covers MCElog, rasdaemon, and EDAC. For storage-specific monitoring, check our disk health monitoring comparison.

PCIe Bandwidth Troubleshooting and Optimization

PCIe bandwidth problems are among the most common performance bottlenecks in self-hosted servers. Understanding how to detect and fix them can dramatically improve storage and network throughput.

Detecting Bandwidth Bottlenecks

The first sign of a PCIe bottleneck is usually a device performing below its rated speed. An NVMe drive rated for 7,000 MB/s that tops out at 1,800 MB/s, or a 40GbE NIC that cannot exceed 20Gbps, strongly suggests a PCIe link issue.

# Check NVMe throughput against PCIe link speed (read-only test on the raw device)
sudo fio --name=test --filename=/dev/nvme0n1 --readonly --direct=1 --rw=read \
    --ioengine=libaio --bs=1M --numjobs=1 --iodepth=32 --runtime=10 --time_based \
    --group_reporting | grep "READ:"

# Compare with theoretical max for your link width
# Gen3 x4 = ~3,940 MB/s, Gen4 x4 = ~7,880 MB/s, Gen3 x1 = ~985 MB/s
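
The theoretical figures above follow from the per-lane transfer rate and the line encoding: Gen1 and Gen2 use 8b/10b encoding, while Gen3 and later use 128b/130b. The sketch below reproduces that arithmetic. pcie_bw_mbps is a hypothetical helper name, and the result ignores packet and protocol overhead, so real devices land somewhat lower.

```shell
# pcie_bw_mbps SPEED_GT WIDTH: approximate one-direction link bandwidth in MB/s.
# SPEED_GT is the per-lane rate in GT/s (2.5, 5, 8, 16, 32); WIDTH is the lane count.
# Hypothetical helper; accounts for line encoding only, not protocol overhead.
pcie_bw_mbps() {
    awk -v gt="$1" -v w="$2" 'BEGIN {
        enc = (gt <= 5) ? 8 / 10 : 128 / 130    # 8b/10b up to Gen2, 128b/130b after
        printf "%.0f\n", gt * 1000 * enc / 8 * w
    }'
}

pcie_bw_mbps 8 4     # Gen3 x4
pcie_bw_mbps 16 4    # Gen4 x4
```

Feeding it the Speed and Width values from LnkSta gives the ceiling for the link as negotiated, which is the number to compare against the fio result.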

When a link negotiates below its capability, the usual causes are:

Physical slot issues: Dust, improper seating, or damaged PCIe fingers can cause negotiation to fall back to lower speeds or widths. Reseating the card often resolves the issue.

PCIe lane sharing: Many motherboards share PCIe lanes between slots, M.2 connectors, and SATA ports. Plugging in an M.2 drive might disable a PCIe slot or halve its bandwidth. Consult your motherboard manual for the PCIe lane allocation table.

Power management: PCIe Active State Power Management (ASPM) can sometimes cause link instability or prevent negotiation at maximum speed. Disable it temporarily to test:

# Check current ASPM policy
cat /sys/module/pcie_aspm/parameters/policy
# Disable for testing
echo performance | sudo tee /sys/module/pcie_aspm/parameters/policy

Firmware bugs: Outdated BIOS/UEFI firmware can cause PCIe enumeration issues, especially on newer platforms with PCIe Gen4 and Gen5 support. Always update to the latest firmware available from your motherboard vendor.

Optimizing PCIe Topology for NUMA

On multi-socket servers, PCIe slots are attached to specific NUMA nodes. Running a workload on CPU 0 while its NVMe drive is attached to CPU 1 forces data to cross the inter-socket link (UPI/Infinity Fabric), adding latency and reducing throughput.

# Check NUMA node for each PCI device (-1 means the platform reports no affinity)
for dev in /sys/bus/pci/devices/*/numa_node; do
    bdf=$(basename "$(dirname "$dev")")
    node=$(cat "$dev" 2>/dev/null)
    name=$(lspci -s "$bdf" | cut -d" " -f2-)
    echo "NUMA $node: $bdf - $name"
done | sort -k2,2n

Use numactl or taskset to pin workloads to the correct NUMA node for best performance. For additional guidance on CPU affinity and NUMA optimization, our hardware inventory guide covers related monitoring techniques.
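
As a rough illustration of the pinning step, the sketch below turns a device's numa_node value into a numactl prefix. numa_prefix is a hypothetical helper name, and the optional second argument (an alternate sysfs root) is there only for testing. When the platform reports no affinity, the function prints nothing, so the workload runs unpinned.

```shell
# numa_prefix BDF [SYSFS_ROOT]: print a numactl prefix that binds CPU and
# memory to the NUMA node of the given PCI device (full address, e.g. 0000:01:00.0).
# Prints nothing if the node is unknown (-1) or the file is missing.
# Hypothetical helper; SYSFS_ROOT defaults to /sys.
numa_prefix() {
    root="${2:-/sys}"
    node=$(cat "$root/bus/pci/devices/$1/numa_node" 2>/dev/null) || return 0
    case "$node" in
        ''|-*) return 0 ;;    # no affinity reported: leave the workload unpinned
    esac
    printf 'numactl --cpunodebind=%s --membind=%s\n' "$node" "$node"
}
```

It can be used as a command prefix, for example `$(numa_prefix 0000:01:00.0) ./my_workload`, where ./my_workload stands in for whatever job reads from that device.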

FAQ

How do I check my PCIe link speed on Linux?

Use sudo lspci -vvv and look for the LnkSta (Link Status) field. It shows the negotiated speed in GT/s and the link width (x1, x4, x8, x16). Compare this with LnkCap (Link Capability) to see if your device is running at full speed. For example, if LnkCap shows Speed 16GT/s, Width x16 but LnkSta shows Speed 8GT/s, Width x8, both the speed and the width are halved, so your device is running at roughly a quarter of the possible bandwidth.

What’s the difference between lspci and lstopo?

lspci focuses exclusively on PCI devices — it shows device IDs, vendor names, kernel drivers, and configuration space details. lstopo provides a holistic hardware topology view that includes PCIe devices in context with CPUs, NUMA nodes, caches, and memory controllers. Use lspci for quick device info checks and lstopo when you need to understand how PCIe devices relate to the rest of the system architecture.

Can lspci detect faulty hardware?

Yes, it can help identify issues. A device that shows !!! Unknown header type, or that reports LnkSta: Speed 2.5GT/s (downgraded) under load when it should run at 16GT/s, points to a hardware or link problem. Some devices, GPUs in particular, lower their link speed at idle to save power, so check while the device is busy. Also check UESta (Uncorrectable Error Status) and CESta (Correctable Error Status) in the Advanced Error Reporting (AER) section of the lspci -vvv output. Correctable errors that keep recurring often precede hardware failure.
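
lspci shows AER status bits rather than counts. On kernels that expose AER statistics in sysfs, the running totals can be read per device. The sketch below assumes the aer_dev_correctable file is present and ends with a TOTAL_ERR_COR line, as described in the kernel's sysfs ABI documentation. aer_correctable_total is a hypothetical helper name, and the optional second argument is an alternate sysfs root for testing.

```shell
# aer_correctable_total BDF [SYSFS_ROOT]: print the total correctable AER
# error count the kernel has recorded for a PCI device.
# Hypothetical helper; assumes the sysfs AER statistics file is available.
aer_correctable_total() {
    f="${2:-/sys}/bus/pci/devices/$1/aer_dev_correctable"
    [ -r "$f" ] || { echo "no AER statistics for $1" >&2; return 1; }
    awk '$1 == "TOTAL_ERR_COR" { print $2 }' "$f"
}
```

Running `aer_correctable_total 0000:01:00.0` a few minutes apart shows whether the count is still climbing, which is more telling than a single nonzero reading.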

How do I map an lspci device to a physical slot?

Use lstopo --filter io:all to see the PCIe topology and which bridges and root ports each bus address sits behind. Alternatively, lspci -tv shows the tree structure of PCI bridges and devices. For server motherboards, you can also check /sys/bus/pci/slots/ for slot-to-address mappings, and lspci -vv reports a Physical Slot field when the firmware provides one.

Is it safe to use setpci on a production server?

Only if you know exactly what you’re doing. Reading PCI registers is generally safe, but writing to registers with setpci can cause system instability, data loss, or hardware damage. Always test changes on non-production hardware first, record the register’s original value before modifying it, and verify the new value against the PCI specification or device datasheet.

