Stochastic Spintronics for Scalable AI Acceleration

The significant energy consumption and latency of AI processing can be attributed primarily to the von Neumann memory bottleneck in today’s computers. Analog In-Memory Computing (IMC) offers high efficiency and parallelism for the dominant matrix-vector multiplication (MVM) workload. However, large-scale IMC implementations have been hindered by the high hardware overhead (area, energy, and latency) of peripheral Analog-to-Digital Converters (ADCs), which are required to ensure a robust interface between analog MVM and other digital components. Moreover, standard IMC based on grid-like tiling of crossbars lacks the flexibility to emulate the complex connectivity and topology of biological neural systems. Further, operations beyond MVM, such as the attention layers in large language models (LLMs), consume significant computing resources in many top-performing AI models, and the complexity of the arithmetic in attention makes it difficult to accelerate with emerging hardware architectures.

We are exploring how to integrate spintronic stochastic building blocks into next-generation neuro-inspired architectures for AI hardware platforms. We envision a "CMOS+Spintronics" system that combines efficient MVM processing cores with ultra-efficient peripherals, and we propose circuit-system-algorithm co-design to address the well-known latency challenge arising from the sampling lengths of stochastic computation.
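The sampling-length latency trade-off mentioned above can be illustrated with a minimal software sketch of stochastic computing (not the proposed hardware itself): a value in [0, 1] is encoded as a Bernoulli bitstream, a single AND gate multiplies two such streams, and the estimate's error shrinks only as roughly 1/sqrt(N) in the stream length N. The function name and parameters below are illustrative, not part of the proposed design.

```python
import random

def stochastic_multiply(a, b, n_samples, seed=0):
    """Approximate a*b for a, b in [0, 1] by ANDing two Bernoulli bitstreams.

    Stochastic computing encodes a value p as a bitstream whose bits are 1
    with probability p; an AND of two independent streams is then 1 with
    probability a*b, so the mean of the output stream estimates the product.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        bit_a = rng.random() < a      # Bernoulli(a) sample
        bit_b = rng.random() < b      # Bernoulli(b) sample
        hits += bit_a and bit_b       # AND gate output
    return hits / n_samples

# The estimator's standard error scales as ~1/sqrt(n_samples), so halving
# the error quadruples the required stream length -- the latency challenge
# that circuit-system-algorithm co-design aims to mitigate.
for n in (16, 256, 4096):
    est = stochastic_multiply(0.5, 0.5, n)
    print(n, est, abs(est - 0.25))
```

Because error decays only polynomially in stream length, naive stochastic arithmetic pays a steep latency cost for each extra bit of precision, which motivates co-designing the algorithm's precision requirements with the circuit.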