Bit-Sparsity Aware Acceleration With Compact CSD Code on Generic Matrix Multiplication
-
Authors
Zhu, Zixuan; Zhou, Xiaolong; Wang, Chundong; Tian, Li; Huang, Zunkai; Zhu, Yongxin
-
Journal
IEEE TRANSACTIONS ON COMPUTERS
-
Year, Volume, Article Number
2025, 2,
-
Abstract
The ever-increasing demand for matrix multiplication in artificial intelligence (AI) and generic computing emphasizes the necessity of efficient computing power accommodating both floating-point (FP) and quantized integer (QINT). While state-of-the-art bit-sparsity-aware acceleration techniques have demonstrated impressive performance and efficiency in neural networks through software-driven methods such as pruning and quantization, these approaches are not always feasible in typical generic computing scenarios. In this paper, we propose Bit-Cigma, a hardware-centric architecture that leverages bit-sparsity to accelerate generic matrix multiplication. Bit-Cigma features (1) CCSD encoding, an optimized on-chip sparsification technique based on canonical signed digit (CSD) representation; (2) segmented dot product, a multi-stage exponent matching technique for long FP vectors; and (3) the versatility to efficiently process both FP and QINT data types. CCSD encoding halves the cost of CSD encoding while achieving optimal bit-sparsity, and segmented dot product improves both accuracy and throughput. Bit-Cigma cores are implemented using 65 nm technology at 1 GHz, demonstrating substantial gains in performance and efficiency for both FP and QINT configurations. Compared to state-of-the-art Bitlet, Bit-Cigma achieves 3.2x performance, 6.1x area efficiency, and 15.3x energy efficiency when processing FP32 data while ensuring zero computing error.
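For readers unfamiliar with the canonical signed digit (CSD) representation that CCSD encoding builds on, the following is a minimal sketch of the textbook CSD conversion (also known as non-adjacent form). It is not the paper's CCSD encoder, whose details the abstract does not give; the function names `to_csd` and `from_csd` are hypothetical and chosen only for illustration. The sketch shows how rewriting an integer with digits from {-1, 0, +1} can reduce the number of nonzero digits, which is the bit-sparsity a bit-serial multiplier can exploit.

```python
def to_csd(n: int) -> list[int]:
    """Return textbook CSD digits of n (each -1, 0, or +1), least significant first.

    Illustrative only: this is the standard non-adjacent form, not the
    CCSD encoding proposed in the paper.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> emit +1; n mod 4 == 3 -> emit -1.
            # Subtracting the digit makes n divisible by 4, so the
            # next digit is guaranteed to be 0 (no adjacent nonzeros).
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def from_csd(digits: list[int]) -> int:
    """Rebuild the integer from CSD digits (least significant first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


if __name__ == "__main__":
    n = 7                      # binary 111 has three nonzero bits
    csd = to_csd(n)            # 8 - 1 needs only two nonzero digits
    print(csd, from_csd(csd))
```

In this sketch, 7 is encoded as 8 - 1, so a shift-and-add multiplier would process two nonzero digits instead of three. How Bit-Cigma compacts this representation on chip is specific to the paper and is not reproduced here.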