Novel Design, Architecture, and Optimization of Content Addressable Memory (CAM)
Student thesis: Doctoral Thesis
Content-addressable memory (CAM) is a high-speed lookup memory that searches all of its entries in parallel and returns the address of the entry matching the input search word. In field-programmable gate arrays (FPGAs), CAMs are emulated using static random-access memory (SRAM) blocks, flip-flops (FFs), and lookup tables (LUTs). This research on FPGA-based CAM targets three design parameters with respect to FPGA resources: hardware utilization, power consumption, and speed. Four independent but closely related projects have been completed so far. Zi-CAM and RPE-TCAM improve power consumption, while D-TCAM and MUX-Update improve speed compared with existing CAM architectures.

Zi-CAM: The power consumption of FPGA-based binary CAM is reduced by a novel LUT-based architecture. The architecture is divided into a RAM block and a LUT block, and only one block is activated for a given input search key. This reduces power consumption by 30% compared with state-of-the-art FPGA-based CAMs.

D-TCAM: This work proposes a high-throughput (high-speed) ternary CAM (TCAM) that exploits the LUT-FF pairs of Xilinx FPGAs. One D-CAM block implements a 48-byte TCAM using 64 LUTs; blocks are cascaded horizontally to increase the width of the TCAM and vertically to increase its depth. The proposed TCAM architecture improves throughput by 58.8% without any additional hardware cost.

MUX-Update: Traditional FPGA-based TCAMs have an update latency of N clock cycles, where N is the depth of the TCAM, compared with a lookup latency of one clock cycle. We present two mechanisms for updating FPGA-based TCAMs, both successfully implemented on a Xilinx Virtex-6 FPGA: an accelerated MUX-Update mechanism and a cost-effective LUT-Update mechanism. MUX-Update achieves an update latency of W+1 clock cycles using only three input/output (I/O) pins, where W is the width of the TCAM. For example, a TCAM with N = 512 and W = 36 would need 37 clock cycles per update instead of 512.
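To make the TCAM terminology above concrete, the following is a minimal behavioral sketch in Python, not the D-TCAM hardware or any of the thesis designs. It assumes a generic TCAM in which each entry stores a value and a "care" mask, every entry drives one match line, and a priority encoder reports the lowest matching address. The names `TernaryEntry`, `TernaryCAM`, and `cascaded_search` are hypothetical. `cascaded_search` illustrates horizontal cascading in the generic sense: equal-depth blocks each see one slice of a wider key, and their match lines are ANDed per address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TernaryEntry:
    value: int  # stored bit pattern
    care: int   # mask: 1 = bit must match, 0 = don't care (X)


class TernaryCAM:
    """Hypothetical behavioral model of a width-bit, depth-entry ternary CAM.

    Hardware compares the key with every entry in the same cycle; this model
    loops over the entries and then applies a priority encoder that reports
    the lowest matching address.
    """

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.entries: List[Optional[TernaryEntry]] = [None] * depth

    def write(self, address: int, pattern: str) -> None:
        """Store a pattern such as '1X01' (MSB first) at an address."""
        if len(pattern) != self.width:
            raise ValueError("pattern width does not match the TCAM width")
        value = care = 0
        for symbol in pattern:
            value <<= 1
            care <<= 1
            if symbol == "1":
                value |= 1
                care |= 1
            elif symbol == "0":
                care |= 1
            elif symbol not in "Xx":
                raise ValueError(f"invalid symbol {symbol!r}")
        self.entries[address] = TernaryEntry(value, care)

    def match_lines(self, key: int) -> List[bool]:
        """One match line per address: True if every 'care' bit agrees."""
        return [e is not None and ((key ^ e.value) & e.care) == 0
                for e in self.entries]

    def search(self, key: int) -> Optional[int]:
        """Priority-encode the match lines; None means no entry matched."""
        return next((a for a, hit in enumerate(self.match_lines(key)) if hit), None)


def cascaded_search(blocks: List[TernaryCAM], key: int) -> Optional[int]:
    """Horizontal cascade of equal-depth blocks; blocks[0] holds the MSBs.

    Each block sees its own slice of the key, and an address matches only
    if its match line is high in every block (AND of the match lines).
    """
    combined = [True] * blocks[0].depth
    shift = sum(b.width for b in blocks)
    for b in blocks:
        shift -= b.width
        segment = (key >> shift) & ((1 << b.width) - 1)
        combined = [c and m for c, m in zip(combined, b.match_lines(segment))]
    return next((a for a, hit in enumerate(combined) if hit), None)
```

For instance, after `tcam = TernaryCAM(4, 4)`, `tcam.write(0, "1X01")`, and `tcam.write(2, "XXXX")`, searching for `0b1101` hits address 0, while `0b0000` falls through to the all-don't-care entry at address 2. In this generic model, vertical (depth) cascading would simply concatenate the match lines of stacked blocks before the priority encoder.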
RPE-TCAM: A novel power-aware, reconfigurable FPGA-based TCAM architecture is proposed that enables only a portion of the hardware for each search operation. We performed an extensive design space exploration on Xilinx FPGAs to find the number of banks that maximizes power savings. Moreover, we propose a backup CAM (BUC) to store entries that overflow their banks.

Future work includes exploring these memory architectures to accelerate machine learning and further optimizing their implementation on FPGAs.
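The banking idea can be illustrated with a small Python sketch. The abstract does not specify how RPE-TCAM maps entries to banks or fills the BUC, so everything here is an assumption: the hypothetical `BankedCAM` uses exact-match (binary) entries for brevity, selects a bank from the top `bank_bits` bits of the key, places an entry in the BUC only when its bank is full, and counts entry comparisons as a rough stand-in for search power. A real banked TCAM would also have to deal with don't-care bits in the bank-selection field.

```python
from typing import List, Optional, Tuple


class BankedCAM:
    """Hypothetical banked exact-match CAM with a backup CAM (BUC).

    The top `bank_bits` bits of a key pick one of 2**bank_bits banks; a search
    enables only that bank plus the BUC, so fewer entries are compared than
    in a flat CAM holding the same entries.
    """

    def __init__(self, key_width: int, bank_bits: int,
                 bank_depth: int, buc_depth: int) -> None:
        self.key_width = key_width
        self.bank_bits = bank_bits
        self.bank_depth = bank_depth
        self.buc_depth = buc_depth
        self.banks: List[List[Tuple[int, int]]] = [[] for _ in range(1 << bank_bits)]
        self.buc: List[Tuple[int, int]] = []
        self.comparisons = 0  # running count of entry comparisons (power proxy)

    def _bank_of(self, key: int) -> int:
        if not 0 <= key < (1 << self.key_width):
            raise ValueError("key does not fit in key_width bits")
        return key >> (self.key_width - self.bank_bits)

    def insert(self, key: int, address: int) -> str:
        """Place an entry in its bank, or in the BUC if the bank overflows."""
        bank = self.banks[self._bank_of(key)]
        if len(bank) < self.bank_depth:
            bank.append((key, address))
            return "bank"
        if len(self.buc) < self.buc_depth:
            self.buc.append((key, address))
            return "buc"
        raise OverflowError("bank and backup CAM are both full")

    def search(self, key: int) -> Optional[int]:
        """Search only the selected bank and the BUC."""
        candidates = self.banks[self._bank_of(key)] + self.buc
        self.comparisons += len(candidates)
        for stored, address in candidates:
            if stored == key:
                return address
        return None
```

With `BankedCAM(key_width=8, bank_bits=2, bank_depth=2, buc_depth=2)`, inserting `0x01`, `0x02`, and `0x03` fills bank 0 and pushes `0x03` into the BUC; a search for `0x03` then compares three entries, and a search for `0xC0` (bank 3) compares only two, instead of all four stored entries. Choosing `bank_bits` trades fewer comparisons per search against a higher chance of bank overflow, which is the kind of trade-off a design space exploration over the number of banks would weigh.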
- memory, Content-addressable memory, Field programmable gate arrays, Embedded Systems, Computer architecture, Digital Systems Design