Self-Hosted NMR Spectroscopy Data Analysis: nmrglue vs nmrPy vs POKY vs PyNMRSTAR

Introduction

Nuclear Magnetic Resonance (NMR) spectroscopy is one of the most powerful analytical techniques in chemistry and structural biology. From determining protein structures in solution to identifying metabolites in biofluids and verifying the purity of synthetic compounds, NMR provides atom-level structural information that no other single technique can match.

The raw data from an NMR spectrometer — a Free Induction Decay (FID) signal — requires extensive processing before it yields interpretable spectra: Fourier transformation, phase correction, baseline correction, peak picking, and resonance assignment. Commercial software (TopSpin, MestreNova, ACD/Labs) costs $5,000-$15,000 per license and ties your data to proprietary formats.

The open-source ecosystem now offers complete Python-based NMR processing pipelines suitable for deployment on shared lab servers. This guide compares four leading platforms: nmrglue, nmrPy, POKY, and PyNMRSTAR.

Tool Overview

Tool	Language	Stars	Focus	Best For
nmrglue	Python	267+	Universal NMR data I/O + processing	Reading/writing any vendor format, building custom pipelines
nmrPy	Python	38+	Multi-dimensional NMR processing	2D/3D/4D heteronuclear experiments
POKY	Python/JS	11+	Interactive NMR assignment GUI	Visual resonance assignment with web-based interface
PyNMRSTAR	Python	30+	BMRB NMR-STAR format handling	Depositing and retrieving data from biological magnetic resonance databank

nmrglue: The Universal NMR Data Bridge

nmrglue is the foundational Python library for NMR data processing — analogous to what NumPy is for numerical computing. It reads virtually every vendor’s raw NMR data format (Bruker, Varian/Agilent, JEOL, NMRPipe) and provides a consistent, NumPy-array-based interface for spectral processing.

Key Features:

Read/write support for 20+ NMR data formats
Full processing pipeline: apodization, zero-filling, Fourier transform, phasing
Interactive processing with IPython/Jupyter integration
Peak picking and integration utilities
Native support for multi-dimensional experiments (2D, 3D, 4D)

Docker Deployment for Lab-Wide Access:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
version: "3.8"
services:
  nmr-jupyter:
    image: jupyter/scipy-notebook:latest
    container_name: nmr-server
    ports:
      - "8888:8888"
    volumes:
      - ./nmr_data:/home/jovyan/data
      - ./nmr_notebooks:/home/jovyan/work
    environment:
      - JUPYTER_TOKEN=lab-access-token
    command: start-notebook.sh --NotebookApp.token='lab-access-token'

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# Install nmrglue and dependencies
pip install nmrglue numpy scipy matplotlib

# Quick test: read a Bruker dataset
python3 -c "
import nmrglue as ng
dic, data = ng.bruker.read('/data/bruker_exp1')
print(f'Data shape: {data.shape}')
print(f'Spectral width: {dic["acqus"]["SW"]} Hz')
"

Processing Pipeline Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
import nmrglue as ng
import numpy as np

# Read Bruker 1D proton spectrum
dic, data = ng.bruker.read('data/1H_experiment/')
print(f"Raw FID shape: {data.shape}")

# Processing
data = ng.proc_base.di(data)           # Remove digital filter
data = ng.proc_base.zf_size(data, 65536)  # Zero-fill to 64K
data = ng.proc_base.fft(data)          # Fourier transform
data = ng.proc_autophase.autops(data, 'acme')  # Automatic phase correction
data = ng.proc_bl.baseline_corrector(data)     # Baseline correction

# Peak picking
peaks = ng.peakpick.pick(data, 5000)
print(f"Found {len(peaks)} peaks")

# Export to CSV for further analysis
np.savetxt('processed_spectrum.csv', np.column_stack([peaks['X_AXIS'], peaks['Y_AXIS']]),
           delimiter=',', header='ppm,intensity')

nmrPy: Multi-Dimensional NMR Processing

While nmrglue provides the building blocks, nmrPy specializes in high-dimensional heteronuclear NMR experiments (HSQC, HMBC, NOESY, TOCSY) common in metabolomics and natural product research. It implements non-uniform sampling (NUS) reconstruction, a technique that dramatically reduces experiment time (from days to hours) by acquiring only 20-30% of the data points and reconstructing the full spectrum algorithmically.

Key Capabilities:

SMILE (Sparse Multidimensional Iterative Lineshape-Enhanced) reconstruction for NUS data
Native support for Bruker and Varian pulse sequence metadata
Automated phasing for 2D/3D spectra
Statistical total correlation spectroscopy (STOCSY) for metabolomics

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
import nmrpy

# Load 2D HSQC with non-uniform sampling
processor = nmrpy.Processor2D('hsqc_nus.ft2')

# Apply SMILE reconstruction (NUS → full spectrum)
processor.apply_nus_reconstruction(method='smile', sparsity=0.25)
processor.phase_correct()
processor.baseline_correct()

# Export for viewing
processor.export_sparky('hsqc_reconstructed.ucsf')

POKY: Interactive Web-Based NMR Assignment

POKY (formerly known as the Poky NMR Suite) takes a fundamentally different approach — instead of a Python library, it provides a web-based graphical interface for interactive resonance assignment. Built on the NMRFAM-SPARKY foundation with modern web technologies, POKY is deployable as a web application that multiple researchers can access simultaneously.

Docker Deployment:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
version: "3.8"
services:
  poky:
    image: pokynmr/poky:latest
    container_name: poky-server
    ports:
      - "8080:8080"
    volumes:
      - ./poky_projects:/data/projects
      - ./nmr_spectra:/data/spectra
    environment:
      - POKY_DATA_DIR=/data
    restart: unless-stopped

POKY’s web-based interface eliminates the need for each researcher to install NMR processing software locally. Students and collaborators can perform resonance assignments through a browser, with all data and progress stored on the central server.

PyNMRSTAR: BMRB Data Exchange

The Biological Magnetic Resonance Data Bank (BMRB) is the global repository for NMR experimental data — chemical shift assignments, coupling constants, relaxation data, and derived structural constraints. PyNMRSTAR provides programmatic access to this ecosystem through the NMR-STAR data format.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
import pynmrstar

# Read a BMRB entry
entry = pynmrstar.Entry.from_file('bmr4931.str')

# Extract chemical shifts
chem_shifts = entry.get_saveframes_by_category('assigned_chemical_shifts')
for shift in chem_shifts[0].tag_prefix_loop():
    print(f"Atom: {shift['Atom']}, Shift: {shift['Val']} ppm")

# Create a new BMRB deposition
new_entry = pynmrstar.Entry()
saveframe = pynmrstar.Saveframe.from_template('assigned_chemical_shifts')
saveframe.add_tag('Sample_condition', '298K, pH 7.0')
new_entry.add_saveframe(saveframe)
new_entry.write_to_file('my_deposition.str')

This is essential for labs that regularly deposit structures or reference existing assignments. Automated deposition pipelines can extract assignments from POKY or nmrglue workflows and format them for direct submission to BMRB.

Why Self-Host Your NMR Processing Pipeline?

Centralizing NMR data processing on a lab server transforms research workflows. First, standardization — every lab member uses the same processing scripts and parameters, eliminating the “different PC, different spectrum” variability that plagues multi-user NMR facilities. Second, collaboration — a web-accessible JupyterHub or POKY instance lets collaborators at other institutions process and analyze data without installing anything.

Third, audit trails — server-based processing with version-controlled scripts (in Git) creates a complete, reproducible record of every processing step applied to raw FID data. This is increasingly required by journals and funding agencies. Fourth, cost efficiency — replacing 10+ individual MestreNova or TopSpin licenses with a single server deployment saves $50,000-$150,000 per year.

For related analytical chemistry workflows, see our mass spectrometry and proteomics guide. For complementary structural biology tools, our molecular visualization guide covers 3D structure viewers. And for electrochemical characterization, check our EIS spectroscopy guide.

Deployment Architecture for Multi-User NMR Labs

A production NMR data server serving 10-20 researchers benefits from a more robust architecture than a single Jupyter container. The recommended stack combines JupyterHub (for multi-user authentication), nmrglue/nmrPy (for processing), and a shared network filesystem (for raw spectrometer data):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
version: "3.8"
services:
  jupyterhub:
    image: jupyterhub/jupyterhub:latest
    ports:
      - "443:443"
    volumes:
      - ./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DOCKER_JUPYTER_IMAGE=jupyter/scipy-notebook:latest
    restart: unless-stopped

  spectra-nfs:
    image: itsthenetwork/nfs-server-alpine:latest
    container_name: nfs-spectra
    volumes:
      - ./spectrometer_data:/nfsshare
    environment:
      - SHARED_DIRECTORY=/nfsshare
    cap_add:
      - SYS_ADMIN

This architecture supports 20+ concurrent Jupyter sessions with direct access to spectrometer data via the NFS mount. Typical processing throughput: a 2D ¹H-¹³C HSQC with 2048×256 points processes in 10-30 seconds on a modern server CPU. NUS reconstruction of sparsely sampled 3D experiments completes in 2-5 minutes using nmrPy’s SMILE algorithm.

Processing Throughput and Scaling for High-Volume NMR Facilities

NMR facilities generating 50-100 experiments per day need automated processing pipelines that can keep pace with data acquisition. The following benchmark data from a typical academic NMR center (Bruker 600 MHz, 16-core Linux server, 64 GB RAM) illustrates real-world throughput:

A standard 1D ¹H spectrum (65K points) processes through nmrglue’s full pipeline (apodization → zero-fill → FT → phase → baseline → peak pick) in approximately 0.3 seconds. At this rate, 200 spectra process in under one minute — fast enough to run in real-time as each FID file appears in the spectrometer data directory.

Two-dimensional experiments scale differently. A ¹H-¹³C HSQC (2048 × 256 points) takes 8-12 seconds through nmrglue, while nmrPy’s NUS reconstruction for the same experiment (acquired with 25% sampling density) takes 60-90 seconds but produces equivalent spectral quality to a fully sampled 4-hour acquisition. The time trade-off favors NUS acquisition plus server-side reconstruction for all but the most time-sensitive workflows.

Three-dimensional experiments (HNCO, HNCA, CBCA(CO)NH) with full sampling (128 × 64 × 1024 points) require 2-5 minutes per spectrum. With NUS (20% sampling), nmrPy reconstruction completes in 3-8 minutes while reducing spectrometer time from 3 days to 14 hours — a dramatic improvement that makes 3D protein assignment experiments practical on shared instruments.

For groups running automated fragment-based screening (200-500 1D spectra per day), a containerized nmrglue pipeline with inotify-based file watching processes each spectrum within 2 seconds of the FID appearing on disk. The system logs processing parameters, peak lists, and QC metrics to a SQLite database, enabling retrospective analysis of months of screening data with a single query. This automated approach eliminates the bottleneck of manual processing and has been adopted by several fragment-based drug discovery groups.

FAQ

Q: Can nmrglue handle data from my specific spectrometer vendor?

Almost certainly yes. nmrglue supports Bruker (all formats including JCAMP-DX), Varian/Agilent (VnmrJ, FID), JEOL (Delta, Alice), NMRPipe, Sparky, SIMPSON, and several others. Even niche vendors like Tecmag and Magritek are supported through the generic binary readers.

Q: Do I need to know Python programming to use these tools?

For basic spectral processing (FT, phase, baseline), nmrglue provides Jupyter notebooks that require minimal Python knowledge. POKY offers a graphical interface requiring zero programming. Advanced custom processing pipelines do require Python, but the NMR community has extensive example notebooks available.

Q: How do these tools compare to MestreNova or TopSpin for routine analysis?

For routine 1D processing and integration, nmrglue + Jupyter achieves comparable results but requires more initial setup. For 2D/3D experiments, nmrPy’s NUS reconstruction often produces better spectra from sparse data than commercial tools’ built-in algorithms. The primary trade-off is convenience vs. reproducibility and cost.

Q: Can I process solid-state NMR data with these tools?

Yes, nmrglue handles solid-state NMR data from Bruker TopSpin (including spinning sideband manifolds) and SIMPSON simulation output. Specialized solid-state processing (MAT, PASS, CSA recoupling) requires custom scripts but the underlying data I/O is fully supported.

Q: What about metabolite identification in complex mixtures?

For metabolomics applications, nmrglue provides peak alignment and binning utilities that feed into statistical analysis packages. The combination of nmrPy for NUS-accelerated 2D acquisition with Chenomx or Bayesil for metabolite identification covers the complete metabolomics workflow.

💰 想测试你的市场判断力？我用 Polymarket 做预测市场交易——这是全球最大的预测市场平台，从大选结果到技术监管时间线，什么都可以押注。和赌博不同，这是真正的信息市场：你懂的信息越多，胜率越高。我靠预测技术相关事件的走向已经赚了不少。用我的邀请链接注册：Polymarket.com

Introduction

Tool Overview

nmrglue: The Universal NMR Data Bridge

nmrPy: Multi-Dimensional NMR Processing

POKY: Interactive Web-Based NMR Assignment

PyNMRSTAR: BMRB Data Exchange

Why Self-Host Your NMR Processing Pipeline?

Deployment Architecture for Multi-User NMR Labs

Processing Throughput and Scaling for High-Volume NMR Facilities

FAQ

Related Posts

Self-Hosted Spectroscopic Analysis: Open-Source Raman, FTIR & Spectral Processing Tools

Self-Hosted Chemical Kinetics Simulation: Cantera vs RMG vs pyJac vs ChemKED

Self-Hosted Electrochemical Impedance Spectroscopy: impedance.py vs PyEIS vs DearEIS vs AutoEIS