Project Proposal

Parallelizing Pauli Paths (P3)

Creators: Arul Rhik Mazumder (arulm), Daniel Ragazzo (dragazzo)

Summary

We will implement a parallelized version of the Pauli path algorithm for quantum circuit simulation, focusing on parallelizing the computation over the number of Pauli words. The goal is to accelerate quantum circuit simulation by performing independent Pauli word computations concurrently on NVIDIA GPUs. We will use CUDA programming to exploit GPU parallelism and optimize the simulation for larger quantum systems.

Background

Motivation

In quantum computing, a central task is to estimate expectation values for an observable \( \hat{H} \) given a quantum circuit that prepares a quantum state \( \lvert \psi \rangle \). Specifically, for a quantum system that evolves under a quantum circuit \( U \), the expectation value of an observable \( \hat{H} \) is given by:

\[ \langle \hat{H} \rangle = \langle \psi \lvert \hat{H} \lvert \psi \rangle = \langle \psi_0 \lvert U^\dagger \hat{H} U \lvert \psi_0 \rangle \]

This expectation value is fundamental in various quantum algorithms and simulations, such as in quantum chemistry for determining energy levels, or in general quantum information processing tasks.

However, simulating quantum systems and computing such expectation values classically is not easy. The quantum state of a system grows exponentially with the number of qubits, making direct simulation of large systems infeasible. Thus, methods that exploit the structure of quantum circuits and observables are essential for efficient simulation.

Naive Statevector Simulation

The most direct and intuitive classical simulation method for quantum circuits is statevector simulation. In this approach, the quantum state \( \lvert \psi \rangle \) is represented as a vector in a complex vector space, and quantum gates are applied directly to this state. The evolution of the system is tracked by applying the corresponding unitary matrices of the quantum gates to the statevector. This can be mathematically described as:

\[ \lvert \psi' \rangle = U \lvert \psi \rangle \]

where \( U \) is the unitary matrix corresponding to a gate in the circuit.

While statevector simulation works well for small systems, it faces exponential scaling with the number of qubits. For a system of \( n \) qubits, the quantum statevector has \( 2^n \) complex amplitudes. As the number of qubits grows, the computational resources required to store and manipulate the statevector grow exponentially, making it infeasible for larger systems.

Hamiltonian Decomposition into Pauli Words

In quantum simulations, a more scalable approach for simulating circuits and computing expectation values is to represent the observable \( \hat{H} \) as a sum of Pauli words. Pauli words are tensor products of Pauli matrices \( \{ I, X, Y, Z \} \), where each matrix acts on one or more qubits. As such, an observable can be written as:

\[ \hat{H} = \sum_i c_i P_i \]

Here, \( P_i \) represents a Pauli word (such as \( X \otimes Z \otimes I \)), and \( c_i \) is the corresponding coefficient. Pauli words are particularly useful because they provide a structured and efficient way to represent the observable in quantum simulations. In practice, most quantum systems are modeled using observables that are sums of Pauli words.

Schrödinger Picture vs Heisenberg Picture

In the Schrödinger picture, we track the evolution of the quantum state under the action of quantum gates. However, in the Heisenberg picture, the state remains fixed, and instead, we describe the system's evolution by the time evolution of observables.

In this framework, we focus on how the Pauli words, which represent the observable \( \hat{H} \), evolve as quantum gates are applied. The key difference between the two pictures is:

In this approach, we track the evolution of observables—specifically Pauli words—instead of the quantum state. Pauli propagation focuses on how these Pauli words evolve under quantum gates. One challenge in simulating quantum circuits is the exponential growth of Pauli words as the number of qubits increases. Pauli Paths offer an efficient way to manage this complexity by tracking the evolution of Pauli operators, rather than the full state vector. In the Schrödinger picture, simulating state evolution can be computationally expensive. In contrast, the Heisenberg picture allows us to focus on the evolution of Pauli words, where each gate acts on them deterministically. This significantly reduces the computational cost and avoids the exponential blow-up that comes with tracking the full quantum state.

Truncation of Pauli Words for Scalability

A major obstacle in simulating quantum circuits is the rapid growth in the number of Pauli words as the number of qubits increases. If we were to track all possible Pauli words during the simulation, the computational cost would become prohibitive. To mitigate this, truncation techniques are often applied to limit the number of Pauli words tracked at any given time.

Truncation involves discarding Pauli words that exceed a certain weight threshold. The weight of a Pauli word refers to the number of qubits on which it has a non-trivial component (such as \( X \), \( Y \), or \( Z \), rather than the identity \( I \)). By setting a maximum weight, we can discard Pauli words that would otherwise require excessive computational resources.

Algorithm Overview: Pauli Propagation for Expectation Value Estimation

The following pseudocode outlines the key steps involved in estimating expectation values using Pauli propagation. This approach efficiently tracks the evolution of Pauli words in the Heisenberg picture without the need to simulate the full quantum state vector.

                # Pseudocode for Pauli Propagation Simulation

                # Step 1: Decompose observable into Pauli words
                observable = decompose_hamiltonian(H)
                # observable is a list of Pauli words {P1, P2, ..., Pn}
                # where each Pauli word Pi has a corresponding coefficient ci

                # Step 2: Initialize Pauli words and their coefficients
                Pauli_words = {P1: c1, P2: c2, ..., Pn: cn}

                # Step 3: Iterate over quantum gates applied to the circuit
                for each gate in quantum_circuit:
                    # Step 3a: Apply gate transformation to each Pauli word
                    for each Pauli_word in Pauli_words:
                        Pauli_words[Pauli_word] = apply_gate_to_pauli(gate, Pauli_word)
                    
                    # Step 3b: Truncate Pauli words based on weight threshold
                    Pauli_words = truncate_pauli_words(Pauli_words, max_weight)

                # Step 4: Compute expectation value by contracting evolved observable with initial state
                expectation_value = 0
                for each Pauli_word, coefficient in Pauli_words:
                    expectation_value += coefficient * compute_expectation(Pauli_word, initial_state)

                return expectation_value
            

Leveraging Parallelism for Efficient Simulation

Simulating quantum circuits involves managing large-scale computations, especially when tracking Pauli words, which grow exponentially with the number of qubits. The challenge of simulating these systems can be mitigated by parallelizing the computation across multiple processing units, such as GPUs. Certain aspects of the Pauli path propagation algorithm are well-suited to parallelism, which can significantly speed up the computation and make simulating larger quantum circuits feasible.

Parallelism in Pauli Word Computations

One of the key aspects of the Pauli path propagation algorithm is that each Pauli word evolves independently under the action of quantum gates. This independence makes Pauli word computations an ideal candidate for parallelization. By mapping each Pauli word to a separate processing thread, we can perform these operations concurrently, dramatically reducing the overall time required for the simulation.

How GPU Parallelism Works in Pauli Path Propagation

In our parallelized implementation, each Pauli word is treated as an independent task, and we assign each task to a separate thread on the GPU. The main steps that benefit from parallelism include:

The Challenge

Simulating quantum circuits, especially for large-scale quantum systems, presents significant challenges due to the exponential growth of the number of Pauli words as the number of qubits increases. The core difficulty lies in efficiently managing the evolution of a growing number of Pauli words, while also ensuring that the computation remains scalable and performant when parallelized.

Exponential Growth of Pauli Words

The most significant challenge in parallelizing this workload is the exponential growth of the number of Pauli words with respect to the number of qubits in the quantum system. As more qubits are added, the size of the state space increases exponentially, which results in a proportional increase in the number of Pauli words needed to represent observables. This means that mapping each Pauli word to an independent GPU thread becomes more difficult as the number of Pauli words increases, since we may quickly exceed the number of available threads or run into memory limitations.

For instance, with just 5 qubits, there are \( 4^5 = 1024 \) possible Pauli words, but with 10 qubits, this number grows to \( 4^{10} = 1,048,576 \). As the qubit count rises, it becomes increasingly difficult to allocate enough threads to manage all the Pauli words without significant overhead or memory constraints.

Workload Characteristics

The computational workload associated with evolving Pauli words is inherently parallel, as each Pauli word can evolve independently under the action of quantum gates. However, the following characteristics of the workload make mapping it to a parallel system challenging:

System Constraints

Several system constraints make mapping the workload of Pauli word simulation to a parallel architecture challenging:

What We Hope to Learn

By undertaking this project, we aim to understand how to efficiently scale the simulation of quantum circuits by managing Pauli word evolution through parallelism. Key aspects of the problem we hope to address include:

Through experimentation and optimization, we hope to find solutions that balance these challenges, enabling efficient and scalable simulation of large quantum circuits.

Resources

The primary starting point for this project will be existing open-source codebases and simulators. These include:

We plan to start by implementing C++ versions of the key functionalities found in the PauliPropagation.jl and Qiskit/pauli-prop libraries. These repositories provide a foundation for Pauli path simulation, and translating the existing implementations into C++ will provide both performance improvements and more flexibility for adapting the algorithms to different use cases. C++ will be chosen for its high-performance capabilities, especially for numerical computations, which will be crucial for handling large-scale quantum systems efficiently.

Starting Point: Papers and References

The primary theoretical framework for this project comes from the paper:

“Simulating Quantum Dynamics with Pauli Paths” by L. J. S. Ferreira, D. E. Welander, and M. P. da Silva (2020).

Additional reference material and technical details will be obtained from the upcoming paper:

“A Practical Guide to using Pauli Path Simulators for Utility-Scale Quantum Experiments” by H. Gharibyan, S. Hariprakash, M. Z. Mullath, and V. P. Su, arXiv:2507.10771 (2025). Available: https://arxiv.org/abs/2507.10771

Additional Resources

While the above resources will provide most of the theoretical and practical tools needed for this project, there may be some additional computational resources required. Specifically, access to specialized quantum computing hardware or simulators (such as those available through platforms like IBM Qiskit, AWS Braket, or Microsoft Azure Quantum) will be crucial for benchmarking. By using these platforms, we can generate results to compare against our own simulations, helping us evaluate the performance and accuracy of our Pauli paths implementation. This comparison will be key for identifying areas where our simulator can be optimized or refined.

Goals and Deliverables

Plan to Achieve:

Hope to Achieve:

Poster Demo Deliverables:

Schedule

Week 1 (11/17–11/23)
  • Develop a working minimal C++/CUDA Pauli Path simulator for tiny circuits
  • Establish baseline timings for later speedup comparisons
Week 2 (11/24–11/30)
  • Add and test truncation mechanisms
  • Begin major optimization work
Week 3 (12/1–1/8)
  • Finalize GPU optimizations
  • Run full benchmark suite and generate plots
  • Produce final project poster

Platform Choice

For hardware we will be using the GHC machines for both their multi-core CPUs and NVIDIA GPUs. We plan to write our entire project in C++ and CUDA