DESIGN AND ANALYSIS OF ENERGY-EFFICIENT SRAM-BASED ON-CHIP IN-MEMORY COMPUTATION FOR MACHINE LEARNING APPLICATIONS

Kumar, Saragada Prasanna

Please use this identifier to cite or link to this item: http://localhost:8081/jspui/handle/123456789/20173

Title:	DESIGN AND ANALYSIS OF ENERGY-EFFICIENT SRAM-BASED ON-CHIP IN-MEMORY COMPUTATION FOR MACHINE LEARNING APPLICATIONS
Authors:	Kumar, Saragada Prasanna
Issue Date:	Oct-2023
Publisher:	IIT Roorkee
Abstract:	The ongoing revolution in machine learning (ML) algorithms plays an important role in data-intensive tasks such as speech detection, image recognition, and image classification. A conventional von Neumann system, with separate memory and processing units, fails to execute these data-intensive ML workloads efficiently. In-memory computation (IMC) using static random-access memory (SRAM) has been proposed to minimize the energy and latency of ML workloads by performing computation at, or near, the place where the data resides. The fundamental operation in ML algorithms is the multiply-and-accumulate (MAC) or XNOR-and-accumulate (XAC) operation. Many SRAM-based IMC architectures have recently been proposed to perform IMC-MAC and IMC-XAC operations with higher throughput and energy efficiency than digital accelerators. However, these IMC architectures are analog in nature and have non-linearity issues that need to be addressed. We propose three SRAM-based IMC architectures, (i) Thermometric Code-based IMC (TC-IMC), (ii) Pulse Amplitude Modulation (PAM)-and-Thermometric-based IMC (PT-IMC), and (iii) Configurable SRAM-based Hardware Accelerator (CS-HA), to improve the linearity, throughput, and energy efficiency of IMC operations. In the TC-IMC architecture, we propose an input-sparsity-aware compact thermometric code to improve the throughput of the IMC-MAC operation. In addition, we propose an optimal sampling time to improve the linearity of the IMC-MAC operation. Test-chip measurements in the TSMC 180-nm CMOS process show that the proposed TC-IMC has 72% better linearity than the traditional IMC. Measurements on MNIST and CIFAR-10 test images show accuracies of 97% and 87%, respectively. In addition, the proposed TC-IMC architecture achieves (i) a MAC compute latency of 25 ns, (ii) 29.8 GOPS/Kb, and (iii) an energy efficiency of 2.3 TOPS/W.
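The relationship between the MAC and XAC operations named above can be illustrated with a minimal software sketch. It is not the thesis's circuit or method: the function names and the example bit vectors are hypothetical, and the only fact relied on is the standard identity that, when bits 0/1 encode the values -1/+1, the signed dot product equals 2 x (number of matching bits) - n.

```python
def mac(inputs, weights):
    """Multiply-and-accumulate: sum of element-wise products."""
    return sum(x * w for x, w in zip(inputs, weights))


def xac(input_bits, weight_bits):
    """XNOR-and-accumulate: count the positions where the two bits agree."""
    return sum(1 - (x ^ w) for x, w in zip(input_bits, weight_bits))


def bits_to_signed(bits):
    """Map bit 0 -> -1 and bit 1 -> +1 (a common binary-network encoding)."""
    return [2 * b - 1 for b in bits]


# Hypothetical example vectors, chosen only for illustration.
input_bits = [1, 0, 1, 1, 0, 0, 1, 0]
weight_bits = [1, 1, 0, 1, 0, 1, 1, 0]
n = len(input_bits)

matches = xac(input_bits, weight_bits)
signed_dot = mac(bits_to_signed(input_bits), bits_to_signed(weight_bits))

# The XAC result recovers the signed MAC result without any multiplications.
print(matches, signed_dot, 2 * matches - n)
```

This is why a binary-weight, binary-input workload can use XAC in place of MAC: the accumulation of XNOR outputs carries the same information as the signed dot product.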
In the PT-IMC architecture, we perform 4b x 4b MAC operations with improved linearity. We propose a configurable current-steering thermometric digital-to-analog converter (CST-DAC) that provides PAM signals with the various dynamic ranges and non-linear gaps required to improve the linearity and signal margin of the IMC operations. The PT-IMC architecture is implemented in the TSMC 180-nm CMOS process, and post-silicon calibration of the design point provides the maximum signal margin, which is close to the ideal simulation results. In addition, the PT-IMC architecture achieves (i) an integral non-linearity (INL) of 0.35 LSB, (ii) a peak throughput of 12.41 GOPS, (iii) a normalized energy efficiency of 30.64 TOPS/W, (iv) a signal-margin loss of 8.3%, and (v) a computational error of 0.41%. Furthermore, the PT-IMC architecture performs MNIST and CIFAR-10 data set classification with accuracies of 98% and 88%, respectively.
In the CS-HA architecture, we propose a configurable 10T SRAM bit cell that can perform both IMC-MAC and IMC-XAC operations. In addition, we propose an optimal scaled-voltage approach to improve the linearity and signal margin of the pulse count modulation (PCM)-based IMC-MAC operation. Furthermore, the IMC-XAC operation is performed using the proposed single-capacitor discharge approach, which offers (i) no deterministic error in the XAC output, (ii) low latency, and (iii) less variation than the traditional charge-sharing-based XAC operation. In IMC-MAC mode, we achieve a throughput of 54.6 GOPS, an energy efficiency of 273 TOPS/W, and classification accuracies of 98.67% and 88.72% on the MNIST and CIFAR-10 data sets, respectively. In IMC-XAC mode, we achieve a throughput of 3276.8 GOPS, an energy efficiency of 1092.2 TOPS/W, and a classification accuracy of 97.12% on the MNIST data set.
In addition, we analyze the impact of the bit-width on the classification accuracy and throughput of the IMC architecture, and conclude that classification accuracy improves at the cost of throughput. Subsequently, we analyze the impact of the sampling time on the ADC resolution and classification accuracy of the IMC architecture.
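For readers unfamiliar with the INL figure quoted in the abstract, the following minimal sketch shows one common way to compute it, the endpoint straight-line method. This is an assumption for illustration only: the abstract does not state which INL definition the thesis uses, the function name is hypothetical, and the sample levels are invented values, not measurements from the thesis.

```python
def endpoint_inl(measured):
    """Return the per-code INL in LSB using an endpoint straight-line fit.

    The ideal transfer line passes through the first and last measured
    levels; one LSB is the average step between those two endpoints.
    """
    n = len(measured)
    if n < 2:
        raise ValueError("need at least two measured levels")
    lsb = (measured[-1] - measured[0]) / (n - 1)
    if lsb == 0:
        raise ValueError("measured levels have zero span")
    return [(v - (measured[0] + k * lsb)) / lsb for k, v in enumerate(measured)]


# Invented output levels for a 5-code converter (illustrative only).
measured_levels = [0.0, 1.1, 1.9, 3.0, 4.0]

inl = endpoint_inl(measured_levels)
worst_case_inl = max(abs(e) for e in inl)  # the single figure usually reported
print(inl, worst_case_inl)
```

With an endpoint fit, the first and last codes have zero INL by construction, so the reported figure reflects only the bowing of the transfer curve between them.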
URI:	http://localhost:8081/jspui/handle/123456789/20173
Research Supervisor/ Guide:	Das, Bishnu Prasad
Type:	Thesis
Appears in Collections:	DOCTORAL THESES (E & C)

Files in This Item:

File	Description	Size	Format
2023_SARAGADA PRASANNA KUMAR.pdf		29.38 MB	Adobe PDF
