Parallelizing Pauli Paths (P3)
We will implement a parallelized version of the Pauli path algorithm for quantum circuit simulation, focusing on parallelizing the computation over the number of Pauli words. The goal is to accelerate quantum circuit simulation by performing independent Pauli word computations concurrently on NVIDIA GPUs. We will use CUDA programming to exploit GPU parallelism and optimize the simulation for larger quantum systems.
In quantum computing, a central task is to estimate expectation values for an observable \( \hat{H} \) given a quantum circuit that prepares a quantum state \( \lvert \psi \rangle \). Specifically, for a quantum system that evolves under a quantum circuit \( U \), the expectation value of an observable \( \hat{H} \) is given by:
\[ \langle \hat{H} \rangle = \langle \psi \lvert \hat{H} \lvert \psi \rangle = \langle \psi_0 \lvert U^\dagger \hat{H} U \lvert \psi_0 \rangle \]
This expectation value is fundamental in various quantum algorithms and simulations, such as in quantum chemistry for determining energy levels, or in general quantum information processing tasks.
However, simulating quantum systems and computing such expectation values classically is not easy. The quantum state of a system grows exponentially with the number of qubits, making direct simulation of large systems infeasible. Thus, methods that exploit the structure of quantum circuits and observables are essential for efficient simulation.
The most direct and intuitive classical simulation method for quantum circuits is statevector simulation. In this approach, the quantum state \( \lvert \psi \rangle \) is represented as a vector in a complex vector space, and quantum gates are applied directly to this state. The evolution of the system is tracked by applying the corresponding unitary matrices of the quantum gates to the statevector. This can be mathematically described as:
\[ \lvert \psi' \rangle = U \lvert \psi \rangle \]
where \( U \) is the unitary matrix corresponding to a gate in the circuit.
While statevector simulation works well for small systems, it faces exponential scaling with the number of qubits. For a system of \( n \) qubits, the quantum statevector has \( 2^n \) complex amplitudes. As the number of qubits grows, the computational resources required to store and manipulate the statevector grow exponentially, making it infeasible for larger systems.
In quantum simulations, a more scalable approach for simulating circuits and computing expectation values is to represent the observable \( \hat{H} \) as a sum of Pauli words. Pauli words are tensor products of Pauli matrices \( \{ I, X, Y, Z \} \), where each matrix acts on one or more qubits. As such, an observable can be written as:
\[ \hat{H} = \sum_i c_i P_i \]
Here, \( P_i \) represents a Pauli word (such as \( X \otimes Z \otimes I \)), and \( c_i \) is the corresponding coefficient. Pauli words are particularly useful because they provide a structured and efficient way to represent the observable in quantum simulations. In practice, most quantum systems are modeled using observables that are sums of Pauli words.
In the Schrödinger picture, we track the evolution of the quantum state under the action of quantum gates. However, in the Heisenberg picture, the state remains fixed, and instead, we describe the system's evolution by the time evolution of observables.
In this framework, we focus on how the Pauli words, which represent the observable \( \hat{H} \), evolve as quantum gates are applied. The key difference between the two pictures is:
\[ \lvert \psi(t) \rangle = U(t) \lvert \psi(0) \rangle \]
Expectation value is computed as:
\[ \langle \hat{H} \rangle = \langle \psi(t) \lvert \hat{H} \lvert \psi(t) \rangle \]
\[ \hat{H}(t) = U^\dagger(t) \hat{H} U(t) \]
The expectation value is then calculated as:
\[ \langle \hat{H} \rangle = \langle \psi(0) \lvert \hat{H}(t) \lvert \psi(0) \rangle \]
In this approach, we track the evolution of observables—specifically Pauli words—instead of the quantum state. Pauli propagation focuses on how these Pauli words evolve under quantum gates. One challenge in simulating quantum circuits is the exponential growth of Pauli words as the number of qubits increases. Pauli Paths offer an efficient way to manage this complexity by tracking the evolution of Pauli operators, rather than the full state vector. In the Schrödinger picture, simulating state evolution can be computationally expensive. In contrast, the Heisenberg picture allows us to focus on the evolution of Pauli words, where each gate acts on them deterministically. This significantly reduces the computational cost and avoids the exponential blow-up that comes with tracking the full quantum state.
A major obstacle in simulating quantum circuits is the rapid growth in the number of Pauli words as the number of qubits increases. If we were to track all possible Pauli words during the simulation, the computational cost would become prohibitive. To mitigate this, truncation techniques are often applied to limit the number of Pauli words tracked at any given time.
Truncation involves discarding Pauli words that exceed a certain weight threshold. The weight of a Pauli word refers to the number of qubits on which it has a non-trivial component (such as \( X \), \( Y \), or \( Z \), rather than the identity \( I \)). By setting a maximum weight, we can discard Pauli words that would otherwise require excessive computational resources.
The following pseudocode outlines the key steps involved in estimating expectation values using Pauli propagation. This approach efficiently tracks the evolution of Pauli words in the Heisenberg picture without the need to simulate the full quantum state vector.
# Pseudocode for Pauli Propagation Simulation
# Step 1: Decompose observable into Pauli words
observable = decompose_hamiltonian(H)
# observable is a list of Pauli words {P1, P2, ..., Pn}
# where each Pauli word Pi has a corresponding coefficient ci
# Step 2: Initialize Pauli words and their coefficients
Pauli_words = {P1: c1, P2: c2, ..., Pn: cn}
# Step 3: Iterate over quantum gates applied to the circuit
for each gate in quantum_circuit:
# Step 3a: Apply gate transformation to each Pauli word
for each Pauli_word in Pauli_words:
Pauli_words[Pauli_word] = apply_gate_to_pauli(gate, Pauli_word)
# Step 3b: Truncate Pauli words based on weight threshold
Pauli_words = truncate_pauli_words(Pauli_words, max_weight)
# Step 4: Compute expectation value by contracting evolved observable with initial state
expectation_value = 0
for each Pauli_word, coefficient in Pauli_words:
expectation_value += coefficient * compute_expectation(Pauli_word, initial_state)
return expectation_value
Simulating quantum circuits involves managing large-scale computations, especially when tracking Pauli words, which grow exponentially with the number of qubits. The challenge of simulating these systems can be mitigated by parallelizing the computation across multiple processing units, such as GPUs. Certain aspects of the Pauli path propagation algorithm are well-suited to parallelism, which can significantly speed up the computation and make simulating larger quantum circuits feasible.
One of the key aspects of the Pauli path propagation algorithm is that each Pauli word evolves independently under the action of quantum gates. This independence makes Pauli word computations an ideal candidate for parallelization. By mapping each Pauli word to a separate processing thread, we can perform these operations concurrently, dramatically reducing the overall time required for the simulation.
In our parallelized implementation, each Pauli word is treated as an independent task, and we assign each task to a separate thread on the GPU. The main steps that benefit from parallelism include:
Simulating quantum circuits, especially for large-scale quantum systems, presents significant challenges due to the exponential growth of the number of Pauli words as the number of qubits increases. The core difficulty lies in efficiently managing the evolution of a growing number of Pauli words, while also ensuring that the computation remains scalable and performant when parallelized.
The most significant challenge in parallelizing this workload is the exponential growth of the number of Pauli words with respect to the number of qubits in the quantum system. As more qubits are added, the size of the state space increases exponentially, which results in a proportional increase in the number of Pauli words needed to represent observables. This means that mapping each Pauli word to an independent GPU thread becomes more difficult as the number of Pauli words increases, since we may quickly exceed the number of available threads or run into memory limitations.
For instance, with just 5 qubits, there are \( 4^5 = 1024 \) possible Pauli words, but with 10 qubits, this number grows to \( 4^{10} = 1,048,576 \). As the qubit count rises, it becomes increasingly difficult to allocate enough threads to manage all the Pauli words without significant overhead or memory constraints.
The computational workload associated with evolving Pauli words is inherently parallel, as each Pauli word can evolve independently under the action of quantum gates. However, the following characteristics of the workload make mapping it to a parallel system challenging:
Several system constraints make mapping the workload of Pauli word simulation to a parallel architecture challenging:
By undertaking this project, we aim to understand how to efficiently scale the simulation of quantum circuits by managing Pauli word evolution through parallelism. Key aspects of the problem we hope to address include:
Through experimentation and optimization, we hope to find solutions that balance these challenges, enabling efficient and scalable simulation of large quantum circuits.
The primary starting point for this project will be existing open-source codebases and simulators. These include:
We plan to start by implementing C++ versions of the key functionalities found in the PauliPropagation.jl and Qiskit/pauli-prop libraries. These repositories provide a foundation for Pauli path simulation, and translating the existing implementations into C++ will provide both performance improvements and more flexibility for adapting the algorithms to different use cases. C++ will be chosen for its high-performance capabilities, especially for numerical computations, which will be crucial for handling large-scale quantum systems efficiently.
The primary theoretical framework for this project comes from the paper:
“Simulating Quantum Dynamics with Pauli Paths” by L. J. S. Ferreira, D. E. Welander, and M. P. da Silva (2020).
Additional reference material and technical details will be obtained from the upcoming paper:
“A Practical Guide to using Pauli Path Simulators for Utility-Scale Quantum Experiments” by H. Gharibyan, S. Hariprakash, M. Z. Mullath, and V. P. Su, arXiv:2507.10771 (2025). Available: https://arxiv.org/abs/2507.10771
While the above resources will provide most of the theoretical and practical tools needed for this project, there may be some additional computational resources required. Specifically, access to specialized quantum computing hardware or simulators (such as those available through platforms like IBM Qiskit, AWS Braket, or Microsoft Azure Quantum) will be crucial for benchmarking. By using these platforms, we can generate results to compare against our own simulations, helping us evaluate the performance and accuracy of our Pauli paths implementation. This comparison will be key for identifying areas where our simulator can be optimized or refined.
For hardware we will be using the GHC machines for both their multi-core CPUs and NVIDIA GPUs. We plan to write our entire project in C++ and CUDA