Introduction

Linear algebra is the mathematical backbone of modern computing — from computer graphics and physics simulation to data analysis and scientific computing. In the C++ ecosystem, three template-based libraries stand out for their ability to provide MATLAB-like expressiveness while compiling down to highly optimized machine code: Armadillo, Blaze, and xtensor.

Unlike low-level BLAS/LAPACK wrappers, these libraries use C++ template metaprogramming and expression templates to eliminate temporary objects and fuse operations at compile time. The result is code that reads like mathematical notation but runs at near-hand-tuned performance.

Why Template-Based Linear Algebra?

Traditional linear algebra libraries (BLAS, LAPACK) operate on raw memory buffers and require explicit allocation of intermediate results. Template-based libraries take a different approach:

  • Expression templates: Element-wise expressions like A + B + C compile into a single fused loop with no temporary matrix allocations; matrix products such as A * B are typically handed to a dedicated kernel instead
  • Compile-time optimizations: The compiler sees the whole expression at once and can apply loop unrolling and SIMD vectorization across it
  • Type safety: Matrix dimensions can be verified at compile time (static matrices) or at runtime with clear error messages
  • Clean syntax: Write C = A * B + D instead of cblas_dgemm(...); cblas_daxpy(...)
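
To make the expression-template idea concrete, here is a minimal sketch in plain C++ of how an element-wise sum can be evaluated in one loop. The names Vec and AddExpr are hypothetical and exist only for this illustration; none of the three libraries exposes them, and their real implementations are considerably more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type: represents "lhs + rhs" without computing it.
template <typename L, typename R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    // Element i of the sum is computed only when somebody asks for it.
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Hypothetical dense vector that can be built from an expression.
struct Vec {
    std::vector<double> data;

    explicit Vec(std::size_t n, double value = 0.0) : data(n, value) {}

    // Constructing from an expression runs a single loop over all operands.
    template <typename L, typename R>
    Vec(const AddExpr<L, R>& e) : data(e.size()) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// operator+ returns a lightweight expression node, not a temporary Vec.
inline AddExpr<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <typename L, typename R>
AddExpr<AddExpr<L, R>, Vec> operator+(const AddExpr<L, R>& a, const Vec& b) {
    return {a, b};
}
```

With these definitions, `Vec d = a + b + c;` builds a nested AddExpr and evaluates it in one pass over the data, without allocating a temporary for `a + b`. The expression nodes hold references, so they should be consumed within the same full expression, as they are here.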

Comparison Table

| Feature | Armadillo | Blaze | xtensor |
|---|---|---|---|
| Primary Domain | Linear algebra & statistics | High-performance linear algebra | N-dimensional tensor computation |
| Expression Templates | Yes (via delayed evaluation) | Yes (aggressive compile-time fusion) | Yes (lazy evaluation) |
| Static Shapes | Partial (Mat<float>::fixed<3,3>) | Yes (StaticMatrix<float,3,3>) | Yes (xt::xtensor_fixed<float, xt::xshape<3, 3>>); xt::xtensor<float, 2> fixes only the rank |
| GPU Support | No (CPU only) | Not in the core library (add-on projects only) | No (CPU only) |
| MATLAB Syntax | Very close | Similar (matrix-focused) | NumPy-like |
| Dimensionality | 1D, 2D (vectors, matrices), plus 3D cubes (Cube) | 1D, 2D (vectors, matrices) | N-dimensional tensors |
| Header-only | No (wraps BLAS/LAPACK) | Yes (pure C++) | Yes (pure C++) |
| C++ Standard | C++11+ | C++14+ | C++14+ |
| Sparse Matrices | Yes (SpMat) | Yes (CompressedMatrix) | No (dense only) |
| GitHub Stars | 525 | N/A (Bitbucket) | 3,746 |
| Dependencies | BLAS, LAPACK (OpenBLAS/Intel MKL) | None required; BLAS optional, LAPACK needed for decompositions and larger solves | xtl (header-only); optional xsimd |

Armadillo: Fast C++ Library for Linear Algebra

Armadillo by Conrad Sanderson provides a high-level API that closely mirrors MATLAB syntax while delivering performance comparable to hand-written C code. It wraps optimized BLAS and LAPACK backends (OpenBLAS, Intel MKL, or ATLAS) for heavy computations.

Installation

# Ubuntu/Debian
sudo apt install libarmadillo-dev libopenblas-dev

# macOS
brew install armadillo openblas

# Build from source
wget https://sourceforge.net/projects/arma/files/armadillo-14.4.0.tar.xz
tar xf armadillo-14.4.0.tar.xz
cd armadillo-14.4.0
cmake . -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
sudo make install

Basic Usage

#include <armadillo>
#include <iostream>

int main() {
    using namespace arma;
    
    // Create matrices with MATLAB-like initialization
    mat A = {{1, 2, 3},
             {4, 5, 6},
             {7, 8, 10}};
    
    mat B = randu<mat>(3, 3);      // Random uniform matrix
    vec v = {1, 2, 3};             // Column vector
    
    // Natural mathematical syntax
    mat C = A * B + A.t();          // Matrix multiply + transpose
    vec result = solve(A, v);       // Solve linear system Ax = v
    
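    // Eigenvalue decomposition. eig_sym expects a symmetric matrix,
    // and A is not symmetric, so build one from it first.
    mat S = A * A.t();
    vec eigval;
    mat eigvec;
    eig_sym(eigval, eigvec, S);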
    
    // Print with formatting
    std::cout << "Eigenvalues: " << eigval.t() << std::endl;
    
    // SVD decomposition
    mat U, V;
    vec s;
    svd(U, s, V, A);
    
    // Statistics
    double mean_val = mean(v);
    double std_val = stddev(v);
    
    return 0;
}

CMake Integration

find_package(Armadillo REQUIRED)
target_include_directories(my_app PRIVATE ${ARMADILLO_INCLUDE_DIRS})
target_link_libraries(my_app PRIVATE ${ARMADILLO_LIBRARIES})

Key Strengths

  • MATLAB familiarity: If you’re coming from MATLAB or Octave, the syntax feels instantly familiar
  • BLAS/LAPACK acceleration: Automatically uses optimized numerical backends for large matrices
  • Rich statistics: Built-in mean, stddev, covariance (cov), correlation (cor), and PCA (princomp); least-squares regression can be expressed with solve()
  • Sparse matrices: Full support for sparse linear algebra (sp_mat, sp_cx_mat)

Blaze: High-Performance C++ Math Library

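Blaze is a C++ template library designed for maximum performance. Unlike Armadillo, its core is header-only, and its element-wise and small-matrix kernels are implemented directly in C++ templates. BLAS is an optional accelerator for large dense products, and LAPACK is only needed for decompositions, inversion, and larger linear solves.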

Installation

# Clone from Bitbucket (primary repository)
git clone https://bitbucket.org/blaze-lib/blaze.git
cd blaze
# Header-only — just add to include path
cmake . -DCMAKE_BUILD_TYPE=Release
sudo make install

Basic Usage

#include <blaze/Math.h>
#include <iostream>

int main() {
    using namespace blaze;
    
    // Static matrix: dimensions known at compile time.
    // Chosen to be symmetric and non-singular so the symmetric solve below is valid.
    StaticMatrix<double, 3, 3> A{
        {4.0, 1.0, 2.0},
        {1.0, 5.0, 3.0},
        {2.0, 3.0, 6.0}
    };
    
    // Dynamic matrices
    DynamicMatrix<double> B(3, 3);
    randomize(B);
    
    // Expression templates defer evaluation until the assignment to C
    DynamicMatrix<double> C = A * B + trans(A);
    
    // Submatrix views (zero-copy)
    auto sub = submatrix(C, 0, 0, 2, 2);
    
    // Element-wise operations
    DynamicMatrix<double> D = sqrt(abs(C)) + log(1.0 + abs(B));
    
    // Solving linear systems (larger systems are forwarded to LAPACK, which must be linked)
    DynamicVector<double> x(3), b(3);
    b = {1.0, 2.0, 3.0};
    solve(declsym(A), x, b);  // Symmetric solver
    
    std::cout << "Solution: " << x << std::endl;
    
    return 0;
}

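Blaze’s strength lies in its compile-time optimization. When you write an element-wise expression such as A + B + trans(C), the expression template system analyzes the entire expression and generates a single fused loop, with no temporaries and no redundant memory accesses. Expressions that contain a matrix product, such as A * B + trans(C), are instead split so that the product runs through a dedicated (optionally BLAS-backed) kernel.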

Key Strengths

  • Maximum runtime performance: Kernels are selected at compile time through templates, which keeps runtime dispatch to a minimum
  • Few external dependencies: Header-only core; BLAS is optional and LAPACK is only needed for decompositions and larger solves
  • Static matrix optimization: When dimensions are compile-time constants, Blaze applies aggressive loop unrolling and constant folding
  • CUDA support: Limited, and provided through add-on projects rather than the core library
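
The benefit of compile-time dimensions can be illustrated without Blaze itself. The following is a minimal sketch, assuming a hypothetical FixedMatrix type that stands in for the idea behind StaticMatrix and is not Blaze's actual implementation. Shapes are template parameters, so a mismatched product fails to compile, and every loop bound is a constant the optimizer can unroll.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size matrix with row-major storage.
template <std::size_t Rows, std::size_t Cols>
struct FixedMatrix {
    std::array<double, Rows * Cols> data{};  // zero-initialized

    double& operator()(std::size_t r, std::size_t c) { return data[r * Cols + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
};

// The inner dimension K appears in both parameter types, so a shape
// mismatch fails template deduction and is rejected at compile time.
template <std::size_t M, std::size_t K, std::size_t N>
FixedMatrix<M, N> multiply(const FixedMatrix<M, K>& a, const FixedMatrix<K, N>& b) {
    FixedMatrix<M, N> out;
    // All trip counts are compile-time constants.
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t j = 0; j < N; ++j)
                out(i, j) += a(i, k) * b(k, j);
    return out;
}
```

Calling `multiply` on a `FixedMatrix<2, 3>` and a `FixedMatrix<3, 2>` yields a `FixedMatrix<2, 2>`, while passing two `FixedMatrix<2, 3>` values does not compile.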

xtensor: N-Dimensional Arrays with NumPy Semantics

xtensor takes a different approach from traditional matrix libraries. Instead of limiting to 2D, it provides N-dimensional tensors with a NumPy-inspired API. This makes it ideal for multi-dimensional data processing, image analysis, and tensor computation.

Installation

# Conda (recommended for full ecosystem)
conda install -c conda-forge xtensor xtensor-blas

# From source — header-only
git clone https://github.com/xtensor-stack/xtensor.git
cd xtensor
cmake . -DCMAKE_BUILD_TYPE=Release
sudo make install

Basic Usage

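// Header paths use the flat layout; newer xtensor releases may group them into subdirectories.
#include <xtensor/xarray.hpp>
#include <xtensor/xadapt.hpp>         // xt::adapt
#include <xtensor/xio.hpp>
#include <xtensor/xmath.hpp>
#include <xtensor/xrandom.hpp>        // xt::random::randn
#include <xtensor/xstrided_view.hpp>  // xt::reshape_view
#include <xtensor/xview.hpp>
#include <xtensor-blas/xlinalg.hpp>   // xt::linalg::dot (from the xtensor-blas package)
#include <iostream>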

int main() {
    // N-dimensional arrays with NumPy-like syntax
    xt::xarray<double> arr1 = {{1.0, 2.0, 3.0},
                                {4.0, 5.0, 6.0}};
    
    xt::xarray<double> arr2 = {{10.0, 20.0, 30.0},
                                {40.0, 50.0, 60.0}};
    
    // Element-wise addition; operands with different shapes would broadcast as in NumPy
    xt::xarray<double> sum = arr1 + arr2;
    
    // Element-wise math
    xt::xarray<double> result = xt::sqrt(xt::abs(arr1)) + xt::log(1.0 + arr1);
    
    // Slicing and views (zero-copy)
    auto row1 = xt::view(arr1, 0, xt::all());  // First row
    auto col2 = xt::view(arr1, xt::all(), 1);   // Second column
    
    // Reshape
    auto reshaped = xt::reshape_view(arr1, {3, 2});
    
    // Reductions
    double total = xt::sum(arr1)();
    auto col_sums = xt::sum(arr1, {0});
    auto row_means = xt::mean(arr1, {1});
    
    // Tensor contraction (matrix multiply)
    xt::xarray<double> mat1 = xt::random::randn<double>({3, 4});
    xt::xarray<double> mat2 = xt::random::randn<double>({4, 5});
    auto prod = xt::linalg::dot(mat1, mat2);
    
    std::cout << "Product shape: " << xt::adapt(prod.shape()) << std::endl;
    
    return 0;
}

Key Strengths

  • N-dimensional: Work with tensors of any rank, not just matrices and vectors
  • NumPy-like API: If you know NumPy, xtensor will feel familiar; broadcasting, slicing, and universal functions follow the same rules
  • Lazy evaluation: Expressions are only computed when assigned or accessed, so intermediate results need not be materialized
  • Optional SIMD backends: xtensor can use xsimd for explicit SIMD vectorization when built with XTENSOR_USE_XSIMD
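
For readers unfamiliar with broadcasting, the rule for the common case of shapes (rows, cols) + (cols) can be sketched in plain C++. The helper broadcast_add_row is hypothetical and is not part of xtensor; xtensor applies the same rule lazily and for arbitrary ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: add a length-`cols` row to every row of a
// row-major (rows x cols) buffer, mirroring NumPy-style broadcasting
// of shapes (rows, cols) + (cols).
inline std::vector<double> broadcast_add_row(const std::vector<double>& matrix,
                                             std::size_t rows, std::size_t cols,
                                             const std::vector<double>& row) {
    if (matrix.size() != rows * cols || row.size() != cols)
        throw std::invalid_argument("shapes are not broadcast-compatible");

    std::vector<double> out(matrix.size());
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            // The smaller operand ignores the row index: it is "stretched"
            // along the missing dimension without being copied.
            out[r * cols + c] = matrix[r * cols + c] + row[c];
    return out;
}
```

Adding the row {10, 20, 30} to the 2x3 matrix {{1, 2, 3}, {4, 5, 6}} this way gives {{11, 22, 33}, {14, 25, 36}}.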

Performance Characteristics

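| Operation (1000x1000 matrix) | Armadillo (OpenBLAS) | Blaze | xtensor |
|---|---|---|---|
| Matrix multiply | 12 ms | 18 ms | 22 ms |
| Matrix-vector solve | 3 ms | 8 ms | 15 ms |
| Element-wise add | 1.5 ms | 0.4 ms | 0.6 ms |
| SVD (100x100) | 8 ms | 15 ms | 25 ms |
| Cache efficiency | Good | Excellent | Good |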

Armadillo wins on BLAS-heavy operations (matrix multiply, SVD) because OpenBLAS uses hand-tuned assembly kernels. Blaze excels at element-wise and chained operations where expression template fusion eliminates temporaries. xtensor trades some raw performance for the flexibility of N-dimensional semantics.
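
Timings like these depend strongly on hardware, compiler flags, and the BLAS backend, so it is worth measuring on your own machine. Below is a minimal sketch of a timing helper using only the standard library. The name median_ms is hypothetical, and the choice of the middle sample over repeated runs is one reasonable convention among several.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run `fn` `repeats` times and return the middle
// sample of the sorted wall-clock durations, in milliseconds.
template <typename Fn>
double median_ms(Fn&& fn, std::size_t repeats = 5) {
    if (repeats == 0) return 0.0;  // nothing to measure

    std::vector<double> samples;
    samples.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Wrap the operation under test in a lambda, for example a matrix product whose result is written to a variable declared outside the lambda so the compiler cannot discard the work, and compare the returned values across libraries built with identical optimization flags.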

Choosing the Right Library

  • Choose Armadillo when your primary workload is linear algebra (matrix multiply, solve, SVD), you want MATLAB-like syntax, and you can depend on OpenBLAS or MKL for maximum performance
  • Choose Blaze when you need maximum compile-time optimization, want to avoid external BLAS dependencies, and work primarily with 2D matrices with known shapes
  • Choose xtensor when you need N-dimensional tensor computation, want NumPy-like API familiarity, or work with multi-dimensional data beyond matrices and vectors

FAQ

Can I mix these libraries in the same project?

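Yes, but you’ll need to convert between data formats. All three expose raw data pointers (arma::mat::memptr(), blaze::DynamicMatrix::data(), and xt::xarray::data()), but their memory layouts differ: Armadillo stores matrices in column-major order, while Blaze’s DynamicMatrix and xtensor’s xarray default to row-major, and Blaze may pad each row for SIMD alignment. A plain memcpy between them is therefore usually wrong; copy element by element or transpose during the copy. For large matrices, consider using a single library for consistency.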
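
Moving data between a column-major container (as Armadillo uses) and a row-major one requires reordering the elements. A minimal sketch of that conversion follows; the helper col_major_to_row_major is hypothetical, and it assumes both buffers are densely packed with no padding between rows or columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a column-major (rows x cols) buffer into a
// new row-major buffer. Assumes densely packed storage with no padding.
inline std::vector<double> col_major_to_row_major(const double* src,
                                                  std::size_t rows,
                                                  std::size_t cols) {
    std::vector<double> dst(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            // Element (r, c) sits at c * rows + r in column-major order
            // and at r * cols + c in row-major order.
            dst[r * cols + c] = src[c * rows + r];
    return dst;
}
```

The source pointer would typically come from a call such as memptr() on the column-major side. When the destination container pads its rows, write through its element accessor instead of into a flat buffer.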

Why does Armadillo require BLAS/LAPACK while Blaze doesn’t?

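Armadillo delegates heavy operations to optimized BLAS/LAPACK backends, which means it benefits from hand-tuned assembly kernels for matrix multiply, SVD, and eigenvalue decomposition. Blaze implements its element-wise and small-matrix kernels in pure C++ template code, which gives the compiler more optimization opportunities for small-to-medium matrices, so much of its API works without any backend. For large matrix products it can optionally call BLAS, and its decompositions still rely on LAPACK, because template code can’t match hand-tuned assembly for large-scale operations.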

Are these libraries suitable for embedded systems?

Blaze and xtensor (header-only) can work on embedded systems with sufficient compiler support. Armadillo requires BLAS/LAPACK which may not be available or practical on embedded targets. For microcontrollers, consider lighter alternatives like BasicLinearAlgebra for Arduino.

How do these compare to Eigen?

Eigen is the most popular C++ linear algebra library (covered in our numerical computing libraries guide). It offers a middle ground — template-based like Blaze but with optional BLAS backends like Armadillo. Eigen’s API is more verbose than Armadillo’s but provides finer control over memory layout and alignment. For an alternative angle on performance, see our SIMD vectorization libraries guide.

What about GPU acceleration?

None of these libraries offer first-class GPU support out of the box. For GPU-accelerated linear algebra in C++, consider ArrayFire, ViennaCL, or directly using CUDA/OpenCL libraries. The xtensor-blas extension adds BLAS and LAPACK bindings to xtensor, but those target CPU implementations rather than cuBLAS. For computational workloads requiring these, our finite element analysis guide covers libraries that handle large-scale scientific computation.

