The implementation of energy-efficient processors is based on the basic structure of the brain, which comprises biological synapses numbering on the order of 10¹⁵ connected to neurons numbering on the order of 10¹¹.[6] The data, in the form of synaptic weights (w), are transferred from neuron to neuron through the synapses in parallel. When the sum of the weights in a neuron exceeds a certain threshold, the neuron responds by generating signals and passing them to other synapses. Because of this parallel-connected synaptic configuration, high-level cognitive functions in the brain can be performed while consuming only tens of watts.[7] Based on the expectation of such attractive low-power benefits, understanding of the brain's structure and essential roles has initiated the development of neuromorphic algorithms through the building of artificial neural networks.[8-12] The input and output neurons in each layer are linked through hidden-layer neurons in a perceptron structure.[13] The basis of neural network algorithms is to classify specific outputs by multiplying input vector signals with synaptic weights through forward propagation. Each neuron performs a linear (or binary) classification to determine whether to continue processing the signal, based on the sum of these calculations. By inserting more hidden layers to perform additional perceptron operations, a multilayer neural network can solve complex problems and extend its functionality to logical functions such as Boolean logic. Thus, such deep neural network (DNN) algorithms outperform conventional methods, specifically in recognition and classification tasks that determine the desired output from unknown inputs.
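As an illustration of the forward-propagation step described above, the following minimal sketch computes the weighted sums and activations of a small multilayer perceptron. It is a conceptual example only; the layer sizes, random weight initialization, and choice of activation functions are assumptions made for illustration, not a description of any particular implementation discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # Weighted sum of inputs (vector-matrix multiplication) followed by a
    # nonlinear activation, mimicking a neuron's threshold-like response.
    z = x @ w + b
    return np.maximum(z, 0.0)                     # rectified linear unit

# Hypothetical sizes: 4 inputs, one hidden layer of 8 neurons, 3 outputs.
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

x = rng.normal(size=4)                            # input vector signal
hidden = layer(x, w1, b1)                         # hidden-layer perceptron step
logits = hidden @ w2 + b2                         # output-layer weighted sums
probs = np.exp(logits) / np.exp(logits).sum()     # softmax classification
print(probs)
```

Adding further hidden layers simply repeats the `layer` step, which is how the multilayer (deep) network extends the single perceptron to more complex classification problems.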
These algorithms rely substantially on iterative arithmetic calculations such as vector–matrix multiplication (VMM), or multiply–accumulate (MAC) operations, which run on graphics processing unit (GPU)-based platforms[9] suited to parallel processing or on application-specific integrated circuits.[14] The time-consuming computation is architecturally accelerated by the cross-point array architecture, in which synaptic elements are positioned at the intersections of lines carrying input and output signals.[15] Neuromorphic hardware systems can thus be described simply as multiple synaptic arrays serving as weight-matrix blocks, as shown in Figure 2.[16] Neurons located at the edge of each array convey inputs and outputs to communicate with other segments. Voltage inputs applied in parallel via the word lines (WLs) reach the synapses and are multiplied by the stored synaptic weights, encoded in the form of conductance (G), according to Ohm's law. Unlike normal memory operations in the cross-point array, which read the conductance of a single selected cell, the multiplication takes place at every cross-point. The weighted-sum current, obtained by summing the individual outputs along each bit line (BL) according to Kirchhoff's current law, is fed to peripheral circuitry (e.g., analog–digital converters and multibit sense amplifiers) serving as the neuronal element. When the output results differ from the expected values, the signal propagates back to the synaptic array, and the synaptic weights are adjusted by gradient descent to reduce the errors, following the back-propagation algorithm;[8] this is how the neuromorphic system learns newly acquired information and provides accurate inferences. The VMM operations are performed where the weights are physically stored, alleviating the memory-wall problem.[17] Therefore, for an in-memory computing platform based on the cross-point array architecture,[18] selecting the appropriate devices as the fundamental building blocks for the synaptic and neuronal elements is important for implementing neuromorphic systems in hardware.
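To make the analog computation concrete, the sketch below models a cross-point array as a conductance matrix: Ohm's law gives the current at every cross-point and Kirchhoff's current law sums these currents along each bit line, followed by a simple gradient-descent conductance update in the spirit of the back-propagation step described above. The array size, voltage and conductance values, and learning rate are arbitrary assumptions; this is a behavioral model, not a circuit-accurate simulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 4 x 3 cross-point array: G[i, j] is the conductance (siemens)
# of the synapse between word line i and bit line j.
G = rng.uniform(1e-6, 1e-4, size=(4, 3))

# Input voltages applied in parallel on the word lines.
V = np.array([0.2, 0.0, 0.1, 0.3])

# Ohm's law at every cross-point (I_ij = G_ij * V_i) and Kirchhoff's current
# law along each bit line (summing over word lines) yield the weighted sums.
I_bl = V @ G                          # one MAC result per bit line

# Toy learning step: compare with target currents and adjust each conductance
# by gradient descent on the squared error (outer product of input and error).
I_target = np.array([2e-5, 1e-5, 3e-5])
error = I_bl - I_target
eta = 1e-3                            # learning rate (arbitrary)
G -= eta * np.outer(V, error)
G = np.clip(G, 1e-6, 1e-4)            # keep conductance within device limits

print(I_bl, np.abs(error).sum())
```

The key point is that the multiplication and accumulation happen where the conductances (weights) physically reside, which is what alleviates the memory-wall problem in the in-memory computing platform.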
Recently, significant advances in neuromorphic hardware have been reported and demonstrated. Most studies have used static random-access memory (SRAM), with eight transistors arranged as the synaptic device.[19-21] However, SRAM with digital synaptic weights of "0" and "1" is unable to represent the numerous weight values used in the algorithms.[21] Although the single transistor has been scaled down to technology nodes of a few nanometers,[22] the large footprint occupied by multiple transistors creates an area overhead. This problem has drawn significant attention to emerging memory technologies for compact and analog weight storage.[23-29] Notably, the newly available memory options are based on resistance changes, in contrast to the conventional storage of charge in a capacitor or floating gate.[30, 31] Most resistive memories are thus essentially simple metal–insulator–metal systems, which allow the highest memory capacity within the lowest occupied cell area. The specific designation of each resistive memory is determined by how its material system responds to external electrical stimuli. Magnetic random-access memory (MRAM)[32] utilizes the orientation of spins, whereas in ferroelectric memory devices the reorienting entities are electric dipoles.[33] The reversible phase transition between amorphous and crystalline states in chalcogenide materials leads to a difference in resistance, exploited in phase-change memory (PCM).[34] Ion migration in most nonstoichiometric materials, driven locally or globally by an electric field, enables the resistance change in resistive switching RAM (RRAM)[35, 36] or electrochemical RAM (ECRAM).[37] State-of-the-art resistive memory technologies, excluding ECRAM, have been integrated at ≈20 nm nodes.[38-41] For a fair and systematic comparison, the latest SRAM is assumed to scale to nodes of a few tens of nanometers.[42] The accelerator performances are benchmarked while considering end-to-end design options from the device and circuit levels to the algorithm level. Unlike SRAM, the assigned multilevel weights are retained even when the power supply is turned off, thereby minimizing standby leakage power.[43] This implies that with resistive synapses, which function optimally within the cross-point array architecture, the entire system can achieve superior throughput and energy efficiency.
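As a simple illustration of the contrast between binary SRAM-like weights and analog resistive weight storage, the sketch below maps a real-valued weight onto a limited number of programmable conductance states. The conductance range and level counts are arbitrary assumptions chosen for illustration and do not reflect any particular device discussed above.

```python
import numpy as np

def weight_to_conductance(w, g_min=1e-6, g_max=1e-4, levels=32):
    """Map a weight in [-1, 1] to one of `levels` discrete conductance states.

    A binary (SRAM-like) cell corresponds to levels=2, whereas an analog
    resistive synapse offers many intermediate states between g_min and g_max.
    """
    w = np.clip(w, -1.0, 1.0)
    frac = (w + 1.0) / 2.0                        # normalize to [0, 1]
    step = np.round(frac * (levels - 1))          # nearest programmable level
    return g_min + step / (levels - 1) * (g_max - g_min)

weights = np.array([-0.8, -0.1, 0.3, 0.95])
print(weight_to_conductance(weights, levels=2))   # binary storage
print(weight_to_conductance(weights, levels=32))  # analog multilevel storage
```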
The neuron node adjacent to the synaptic array is often neglected in neuromorphic system studies. After the analog computation in the cross-point array, the weighted-sum current at the end of each BL must be processed (e.g., converted into a voltage spike or digital pulse),[44] which is the vital role of the biological neuron: it receives current from the synapses and then decides whether to fire an action potential to the next neurons in the neural network. Typically, silicon complementary metal–oxide–semiconductor (CMOS)-based neuronal circuits comprising tens of transistors and a capacitor are used to implement the integrate-and-fire neuron model.[45] The weighted-sum current is first integrated on the capacitor placed at the end of the BL. When the charged voltage exceeds the threshold, digitized output voltage spikes are generated by the circuitry. By counting the output spikes, whose number is designed to be proportional to the amplitude of the read-out current, the neuron node determines the output firing strength following activation functions such as sigmoid, tanh, softmax, and the rectified linear unit.[46] However, such a complex neuronal circuit with a capacitor occupies a substantially larger footprint than the BL pitch of the cross-point array. This pitch-mismatch problem inevitably forces a single neuron node to be shared among multiple BLs, which implies that the weighted currents computed in parallel in the synaptic arrays have to be processed sequentially.
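The behavior described above can be sketched as follows: a weighted-sum current charges a capacitor until a threshold is crossed, a spike is emitted, and the node resets, so the spike count over a fixed window grows with the current amplitude. All component values (capacitance, threshold, time step) are arbitrary assumptions chosen only to make the proportionality visible, not parameters of any reported circuit.

```python
def integrate_and_fire(i_weighted_sum, c=1e-12, v_th=0.5, dt=1e-9, steps=1000):
    """Count the output spikes produced by a weighted-sum current (amperes).

    The current is integrated on a capacitor of capacitance `c`; whenever the
    membrane voltage exceeds `v_th`, a spike is emitted and the node is reset.
    """
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += i_weighted_sum * dt / c        # charge integration: dV = I*dt/C
        if v >= v_th:                       # threshold crossing -> fire
            spikes += 1
            v = 0.0                         # reset after firing
    return spikes

# Larger weighted-sum currents yield proportionally more spikes per window.
for i in (1e-6, 2e-6, 4e-6):
    print(i, integrate_and_fire(i))
```

Because each such neuron circuit is far wider than one BL pitch, several BLs must time-share a single copy of this process, which is the sequential bottleneck noted above.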
Herein, we first discuss the advances in PCM and RRAM, where significant progress has been achieved, to address the requirements of neuromorphic synaptic devices. We also explore recent strategies that exploit the prominent characteristics of other candidates, such as ECRAM, ferroelectric memory, and MRAM, to overcome the relevant challenges. Next, we introduce studies on compact neuromorphic neuronal devices based on either two-terminal switches or volatile memories, highlighting their advantages from an area and energy perspective, as shown in Figure 3. Finally, we conclude the article by indicating future research directions, based on the current status, to boost neuromorphic system performance.