(*Volume3*, *Issue5*) Available online at: <a href="www.ijarnd.com">www.ijarnd.com</a>

# Nonvolatile field-programmable gate array with high-reliability and high intensity using 1D2R RRAM array

Abinaya. S<sup>1</sup>, Gayathri. J<sup>2</sup>, Giridharan. S<sup>3</sup>, Swaminathan. M<sup>4</sup>

<sup>1,2,3</sup>Student, SNS College of Technology, Coimbatore, Tamil Nadu
<sup>4</sup>Lecturer, SNS College of Technology, Coimbatore, Tamil Nadu

## **ABSTRACT**

The large area overhead of the interconnect is one of the serious issues in static random access memory (SRAM)-based field-programmable gate arrays (FPGAs), resulting in high power consumption and slow operating speed. Another major issue is the volatility of the SRAM, which results in high standby leakage current and long power-ON time. Resistive random access memory (RRAM), which has a high resistance ratio and zero standby power, holds great potential for FPGA applications. Conventional RRAM-based nonvolatile FPGAs (NVFPGAs) use either a one-transistor two-RRAM (1T2R) storage element to replace the SRAM or a one-RRAM (1R) cell to replace both the nMOS switch and the SRAM. However, those NVFPGA schemes may suffer from low reliability, high configuration power, and high active leakage power. In this paper, we propose a novel element, the one-diode two-RRAM (1D2R) cell, to replace the nMOS switch and the six-transistor (6T) SRAM. We also propose novel structures for the logic block, connection block, and switch block, and an FPGA architecture based on the 1D2R element. Compared with the conventional 1T2R-based NVFPGA, our structure improves the operating speed by 53% with 40.5% lower operating power. Compared with the conventional 1R-based NVFPGA, the proposed scheme reduces the write error rate by eight orders of magnitude with more than 20 times lower write power.
**Keywords:** Crossbar, Field-programmable gate arrays (FPGAs), High reliability, Low power, Nonvolatile, One-diode two-RRAM (1D2R), Resistive random access memory (RRAM), Write error rate.

# 1. INTRODUCTION

Static random access memory (SRAM)-based field-programmable gate arrays (FPGAs) have developed quickly and have been widely used in system and prototype designs over the past two decades [1]–[4] owing to their post-fabrication reconfigurability, short time to market, and low development risk and cost. SRAMs store the routing and logic configuration that realizes the required functionality. However, the interconnect area of recent FPGAs is still four times the logic area [5], resulting in high dynamic power and slow operating speed. In addition, SRAMs lose their configuration information when powered OFF. Hence, SRAM-based FPGAs have to perform an initialization to load the configuration data from an external nonvolatile memory (NVM) into the internal SRAMs after powering ON. Moreover, high standby power has become a serious issue in SRAM-based FPGAs as the semiconductor industry has entered the 90-nm technology node and beyond [6]–[8]. As a result, SRAM-based FPGAs suffer from long configuration loading time and excessive standby leakage power.

The recent development of resistive NVM technologies, including resistive random access memories (RRAMs) [9]–[13], spin-torque transfer magnetic RAMs [14]–[16], and phase change memories [17]–[20], provides an excellent opportunity to achieve high-speed, high-density, instant power-ON, and highly energy-efficient FPGAs. RRAM has become the front-runner among resistive NVMs due to its high switching speed (<10 ns) [12], small cell size (4F<sup>2</sup>) [21], high resistance ratio [22], low switching voltage [23] and current [24], and compatibility with current CMOS processes.
RRAM is basically a metal-insulator-metal structure that exhibits hysteretic characteristics, that is, reversible resistance switching between a low-resistance state (LRS) and a high-resistance state (HRS). Numerous works [25]–[32] have integrated RRAM cells to achieve low-power and high-performance nonvolatile FPGAs (NVFPGAs). The most direct way to integrate RRAM in FPGAs is to replace the conventional 6T SRAM with an RRAM-based 1T2R configuration element, as described in [26]–[28]. Despite their area efficiency, the designs in [26]–[28] may suffer from poor data retention, since dc-biased RRAM cells can switch their states during FPGA operation. Another problem is the high active leakage power caused by the insufficiently high OFF resistance of the RRAM cell. Moreover, the nMOS switches may be overdriven to reduce the ON resistance, which degrades the reliability and increases the active leakage power. Another method is to directly replace both the 6T SRAM and the nMOS switch with one RRAM (1R) cell in the switch blocks (SBs) and connection blocks (CBs) [28]–[30]. A key challenge in this scheme is the data integrity of the interconnect configuration, which is threatened by the high leakage current in the sneak paths.

Fig. 1: Simple island-style SRAM-based FPGA layout

We propose a novel NVFPGA architecture based on the emerging RRAM technology. To take full advantage of its high resistance ratio, excellent scalability, and high density, the RRAM is organized in a one-diode two-RRAM (1D2R) cell structure. The proposed 1D2R element replaces both the SRAM and the nMOS switch, which addresses the sneak path issue and significantly improves the write reliability. Novel logic block (LB), CB, and SB structures and an NVFPGA architecture are also proposed based on the 1D2R element. In our proposed NVFPGA, the diode of the 1D2R element is used only during configuration. During normal operation, the diode is turned OFF and the interconnect becomes a diodeless crossbar array.
By stacking RRAM cells on top of the CMOS circuitry, our proposed NVFPGA architecture achieves a smaller footprint (70% smaller), higher performance (63% faster), and lower power consumption (43.6% lower) than the SRAM-based FPGA. It also increases the speed of the 1T2R-based NVFPGA by 53% with 40.5% lower operating power. Compared with the 1R-based NVFPGAs, the write reliability is improved by more than 90 million times.

This paper is organized as follows. Section II introduces the background of RRAM, the access device, and the related works on RRAM-based FPGAs. Section III presents the novel 1D2R structure and its implementation in the FPGA. Section IV gives the detailed design of the NVFPGA, including the SB, CB, and lookup table (LUT). Section V proposes the routing implementation and discusses its impact on area. Section VI analyzes the reliability, speed, and power consumption of the different FPGA schemes based on simulation results. Finally, Section VII draws the conclusion.

# 2. BACKGROUND

## A. Baseline 2-D FPGA

A traditional 2-D island-style FPGA architecture taken from [33], shown in Fig. 1, is used as the baseline in this paper. It contains a number of tiles. Each tile contains one SB, two CBs, and one LB, and each LB contains local routing structures (the local interconnect) that route input signals to several basic logic elements (BLEs) and also connect the BLEs' outputs back to their inputs. LBs are coupled to the routing channels through CBs. The architectural parameter Fc describes the number of routing tracks connected to each LB input or output, expressed as a ratio to the channel width W. The global routing structure contains 2-D segmented interconnect channels connected by programmable SBs.

Fig. 2: Possible combinations of set and reset I-V curves. The combinations can be positive set, positive reset; positive set, negative reset; and negative set, negative reset

## B. RRAM

The basic idea of the RRAM switching mechanism is that a dielectric, which is normally insulating, can be made conductive through a filament or conduction path. The RRAM can be reversibly switched between HRS (filament broken) and LRS (filament reformed) by applying a suitable voltage. Several possible combinations of set and reset curves are shown in Fig. 2. For unipolar switching, a low voltage acts as reset and a high voltage in the same direction acts as set, whereas for bipolar switching only negative set, positive reset (eightwise) or positive set, negative reset (counter eightwise) is possible [34], [35]. Because of the write scheme used in our proposed NVFPGA to remove the sneak path, the positive set, positive reset unipolar switching behavior is used in this NVFPGA design.

## C. Access Device

A significant difficulty in integrating RRAM cells as switches in the FPGA is the sneak path problem, which occurs in passive CBs, SBs, and local interconnects. To avoid the sneak path while keeping the density high, a diode is used as the access device because it is back-end-of-line friendly [36]–[39]. Furthermore, it can also offer high drive current and a large ON/OFF ratio. For example, International Business Machines Corporation has demonstrated a novel diode based on Cu-ion motion in Cu-containing mixed ionic-electronic conduction (MIEC) materials, which supports extremely high current densities (>50 MA/cm<sup>2</sup>) and large ON/OFF ratios (≥10<sup>7</sup>) [39]. Stacking the RRAM and diode on top of the FPGA CMOS circuits can significantly reduce the FPGA area and delay, thus greatly improving the FPGA performance.

## D. Related Works

As shown in Fig. 3(a), the conventional SRAM-based FPGA uses an SRAM cell to configure the nMOS switch. The RRAM-based NVFPGA can use a similar structure to configure the nMOS switch, as shown in Fig. 3(b). Another RRAM-based NVFPGA integration scheme shown in Fig.
3(c) completely replaces both the SRAM and the nMOS switch. However, both RRAM integration approaches suffer from weaknesses that limit the feasibility of the NVFPGA implementation.

Fig. 3: (a) Conventional SRAM storage element to configure FPGAs (SRAM). (b) Nonvolatile storage element to configure the switch transistor in FPGAs (1T2R). (c) Nonvolatile storage element to replace the switch transistor and SRAM (2T1R or 1R)

The 1T2R scheme shown in Fig. 3(b) was reported in [26]–[28] to replace the conventional SRAM cell with an RRAM-based storage element, gaining the benefits of instant power-ON and zero standby power. Unfortunately, it has a low-reliability issue that hinders its application in FPGAs. The low reliability is caused by the poor retention of RRAM cells biased at VDD during operation. Another problem is the high active leakage power caused by the insufficiently high OFF resistance of the RRAM cell. Moreover, the V<sub>th</sub> drop of the nMOS switches significantly reduces the speed of the FPGA. The nMOS switches can be overdriven to reduce the ON resistance, but this degrades the reliability and increases the active leakage power of the 1T2R element.

The 2T1R (1R) scheme shown in Fig. 3(c) was proposed in [28]–[30] to replace the nMOS switch and SRAM cell to achieve high speed and density. However, the V<sub>th</sub> drop on the programming transistors greatly reduces the write current. More importantly, the scheme suffers from very low write reliability and high write power, both caused by the high leakage current in the sneak paths. For example, to program the RRAM cell RNW between nodes N and W shown in Fig. 4(a), the potential on N is set to Vset or Vreset (where Vset and Vreset are the RRAM set and reset switching voltages, respectively) and the potential on node W is the ground.
However, if RNW, RSN, and RSW are at HRS, LRS, and LRS, respectively, the majority of the current goes through RSN and RSW, resulting in an extremely high leakage current, since the resistances of RRAM cells in HRS and LRS differ by several orders of magnitude. Therefore, RNW may receive insufficient current to be switched. Write disturbance may further worsen the write reliability. As shown in Fig. 4(b), if RNW, RSN, and RSW are at high, low, and high states, respectively, the potentials on RNW and RSW are the same. As a result, both RNW and RSW can be switched. Although biasing the unselected devices at half (V/2 scheme) or one-third (V/3 scheme) of the programming voltage may limit the write disturbance, the leakage current still severely affects the configuration data integrity [40].

Fig. 4: (a) High leakage current issue and (b) write disturbance issue in the conventional RRAM-based nonvolatile SP. The dashed lines are the paths to program the RRAM cells, and the dashed-dotted lines are the sneak paths

Fig. 5: Equivalent circuit of a diodeless crossbar array. Rcell is the resistance of the RRAM cell under programming, RL is the resistance of RRAM cells in LRS, M is the dimension of the array, Rp0 is the input parasitic resistance of the switch, metal, etc., Rp1 is the paralleled input parasitic resistance, which is Rp0/(M-1) for the V/2 or V/3 write scheme and infinite for the floating scheme, and Vw, Vb0, and Vb1 are the write voltage and the biasing voltages for the unselected word lines and bit lines, respectively.

As shown in the equivalent circuit in Fig. 5, when the unselected RRAM cells are at LRS, the sneak paths can be regarded as equivalent resistors in parallel with the cell under programming. For example, if the V/2 scheme is used, the paralleled resistance between the write voltage Vw and the ground is about 2(RL + Rp0)/(M-1).
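To make the scale of this sneak path leakage concrete, the V/2 approximation above can be evaluated numerically. The following is a minimal sketch, not the authors' simulation: the function names are hypothetical, the sneak paths are lumped into a single resistor in parallel with the selected cell, and the value of Rp0 is an assumed placeholder. RL and RH use the cell resistances quoted later in the simulation section, taken here to be in ohms.

```python
def sneak_path_resistance_v2(r_l, r_p0, m):
    """Approximate parallel sneak-path resistance for the V/2 scheme.

    Implements 2 * (RL + Rp0) / (M - 1) for an M x M array whose
    unselected cells are all in LRS (the worst case for leakage).
    """
    if m < 2:
        return float("inf")  # a 1 x 1 array has no unselected cells
    return 2.0 * (r_l + r_p0) / (m - 1)


def selected_cell_current_share(r_cell, r_sneak):
    """Fraction of the driver current that flows through the selected cell.

    Simple current divider between the selected cell and the lumped
    sneak-path resistance (an illustrative simplification).
    """
    if r_sneak == float("inf"):
        return 1.0
    return r_sneak / (r_sneak + r_cell)


R_L = 1e3    # LRS resistance, assumed to be in ohms
R_H = 1e9    # HRS resistance, assumed to be in ohms
R_P0 = 10.0  # input parasitic resistance: assumed value for illustration only

for m in (2, 16, 128):
    r_sneak = sneak_path_resistance_v2(R_L, R_P0, m)
    share = selected_cell_current_share(R_H, r_sneak)
    print(f"M={m:4d}  R_sneak={r_sneak:10.2f} ohm  share through HRS cell={share:.2e}")
```

Under these assumptions the lumped sneak resistance shrinks roughly as 1/(M-1), so for large arrays almost none of the driver current reaches a selected cell that is still in HRS.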
As a result, most of the current goes to the sneak paths, and the parasitic resistance Rp0 may dominate the total equivalent resistance between Vw and the ground. Increasing Vw to compensate for the drop of the write voltage exposes the RRAM to a high breakdown risk, because the voltage on Rcell can be excessively high if most of the unselected cells are at HRS. Moreover, the unselected cells may suffer from severe write disturbance, because they are biased at half of the write voltage. The 1D1R or 1T1R structure helps to reduce the sneak path leakage current. However, the transistor or diode cannot be embedded in the FPGA routing path, since it would increase the voltage drop and the delay. Using a nonlinear RRAM cell or embedding a nonlinear selector in series may help to reduce the sneak path current and voltage drop. However, during FPGA operation, the voltage across the ON RRAM cell has to be zero. Therefore, the ON resistance could be significantly high due to the nonlinearity, which conflicts with the low ON resistance required to minimize the RC delay of the interconnect in FPGAs.

# 3. PROPOSED STORAGE ELEMENT

In view of the above, the access device is indispensable to minimize the sneak path current and improve the reliability, but it cannot be embedded in the FPGA routing lines.

Fig. 6: (a) Proposed nonvolatile element to replace the FPGA routing switch and 6T SRAM. Adjacent nonvolatile elements connecting to A or B share the same diodes. (b) 3-D schematic of the proposed nonvolatile element. Metal line A or B may be routed at different layers depending on the routing direction

We propose a 1D2R-based nonvolatile element to replace both the 6T SRAM and the FPGA routing switch, as shown in Fig. 6. It consists of two RRAM cells and one diode. The two RRAM cells are always programmed together to the same state, either low or high.
In the FPGA operation mode, the diodes are deactivated and the two RRAM cells work as a routing switch that propagates the signal in the NVFPGA: the switch is turned OFF by the RRAM's high resistance when both cells are at HRS, and it is turned ON when both are at LRS. In the FPGA configuration mode, our proposed 1D2R-based nonvolatile element works as a 1D2R memory cell in a crossbar array. Two additional diodes at nodes A and B are used in place of the CMOS transistors reported in [41]. A diode can supply a larger current density than a CMOS transistor. Most importantly, the diodes can be placed between metal layers, as discussed in Section II-C, to reduce both the area and the routing complexity. These two diodes are used to program the RRAM cells, and they are shared by the adjacent nonvolatile elements that connect to A or B. During programming, node L is pulled down to the ground and node H is pulled up to Vset or Vreset, depending on the FPGA configuration information. As both A and B are pulled to the ground, there is no dc loop to disturb adjacent nonvolatile elements during FPGA configuration. In the FPGA operation mode, the diodes are disabled by pulling L and H to VDD and the ground, respectively.

The proposed 1D2R structure doubles the number of RRAM cells and somewhat increases the propagation delay. However, this is worthwhile because the data integrity of the configuration information in the RRAM cells is improved significantly, which is more important than the speed performance of FPGAs. Moreover, compared with the 1R scheme, our proposed structure significantly reduces the write power in the FPGA configuration mode, based on the simulation results shown in Section VI.

A 3-D implementation of our proposed nonvolatile element is shown in Fig. 6(b). The RRAM cells and diode (MIEC material is used in this example) are inserted between the metal layers above the CMOS circuits.
All RRAM cells are in the same layer, and their pitch can be as small as 2F. Therefore, the area of the diode can be designed to be at least $3F \times 1F$ to provide adequate current. The programming metal is the bit line in the crossbar array. Metal lines A and B may be routed in different metal layers if they have different routing directions.

Fig. 7: (a) Top-view structure of the proposed stacking RRAM-based NVFPGA. (b) Schematic of the memory in our proposed NVFPGA system. The RRAM cells are arranged in a 1D2R crossbar array structure.

# 4. PROPOSED NONVOLATILE FPGA

In our proposed NVFPGA, there is no CMOS circuitry in the SBs and CBs except for buffers. We also propose to stack the RRAM on top of the CMOS circuitry, which significantly reduces the area compared with traditional SRAM-based FPGAs. A similar FPGA architecture is used, as shown in Fig. 7(a). In this scheme, the local interconnect is placed in the center of the tile. Each CB shares the area between two adjacent tiles on the edge, and each SB shares the area among four adjacent tiles at the corner. In our proposed NVFPGA, the area is mainly determined by the CMOS circuitry. The RRAM cells are organized as a 1D2R RRAM crossbar array, as shown in Fig. 7(b). Each diode connects to one bit line (Hi, where *i* is a natural number) and two RRAM cells. The other node of each RRAM cell connects to a word line (Li). To program the RRAM pair attached to one diode, both of its word lines are enabled simultaneously. The RRAM cells are programmed in the FPGA configuration mode.

Our proposed NVFPGA has an FPGA operation mode and an FPGA configuration mode. The FPGA configuration mode is used to program the RRAM cells, that is, to write the configuration information to the RRAM cells. Unlike the SRAM-based FPGA, our proposed NVFPGA requires only a one-time configuration and does not need to be reconfigured after each power-ON. Thus, the power-ON time and energy are significantly reduced.
The routing in our proposed NVFPGA is a diodeless crossbar array in the FPGA operation mode, which enables high speed, and a 1D2R crossbar array, as shown in Fig. 7(b), in the FPGA configuration mode, which minimizes the write error rate. A simplified connection diagram of a tile in the NVFPGA is shown in Fig. 8, where I and N are the numbers of inputs and K-input LUTs (K-LUTs) in one LB, respectively. Each LB has I general inputs, one clock input, and O outputs. Each output corresponds to one flip-flop and a two-to-one multiplexer. The LUT inputs can come either from the inputs to the LB or from the outputs of other LUTs within the same LB via a full crossbar array (the local interconnect). The major difference between our proposed NVFPGA and the architecture in [33] is that a crossbar structure is used for the interconnect in place of the multiplexer structure. The crossbar structure can significantly reduce the delay, since the multiplexer has several transistors in series in the routing path. The details of each block are discussed in the following.

Fig. 8: Schematic of our proposed 1D2R-based NVFPGA. The crossbar structure is used for both CB and local interconnect

Fig. 9: Schematic of 1D2R-based (a) nonvolatile crossbar array structure and (b) nonvolatile SP. The nonvolatile crossbar array is used in the CB and local interconnect

## A. Proposed Crossbar Array and Switch Point

Based on the 1D2R nonvolatile element discussed in Section III, we propose stacking RRAM-based schemes for both the nonvolatile crossbar array and the switch point (SP), as shown in Fig. 9(a) and (b). The CBs connect the channel wires to the pins of the LBs. Two main properties affect the routing flexibility of a design: 1) the flexibility of the CB, Fc, and 2) the CB topology, which is the pattern of switches that make the connections. With the high-density benefit of RRAM cells, the crossbar topology shown in Fig. 9(a) can be used to increase Fc and the routing flexibility.
In such a scheme, each LB pin can be fully connected to the wires in the adjacent channel, and the delay on the switch can also be greatly reduced. The conventional 1R approach has the sneak path issue, which significantly increases the power and degrades the configuration reliability. To address the sneak path limitation, we use a 1D2R design at each cross point to replace the conventional 1R structure. To limit the voltage drop on the FPGA routing, the access devices, i.e., the diodes, are not embedded in the routing wires. Therefore, the routing wires and programming wires are placed in different metal layers. RRAM cells can be removed from some of the cross points to obtain different Fc parameters.

Fig. 10: Our proposed 1D2R-based nonvolatile LUT. It is an example of a two-input LUT, and it can be extended to other LUT sizes

If the channel width is W, the LB cluster size is N, the number of LB inputs is I, the number of LB outputs is O, and the flexibility of the CB is Fc, the number of RRAM cells in the CB is

$$R_{CB} = 2W(I + O)F_c. \qquad (1)$$

The required number of diodes in the CB to program the RRAM cells is

$$D_{CB} = W(I + O)F_c + W + I + O. \qquad (2)$$

To reduce the diode size, only one cross point in the CB is configured at a time. Therefore, two word lines (Li) are pulled to the ground, and only one bit line (Hi) is pulled up to Vset or Vreset. For example, to program the top left cross point, the two RRAM cells R0a and R0b are under programming. Hence, L0 and L1 are at the ground, and H0 is at Vset or Vreset. With the minimized diode size, the leakage current of the diode is also decreased when the NVFPGA is in the normal operation phase. However, to limit the wire area, we connect different Hi to the same bit line. For example, H1 and H3 connect to the same bit line. The details will be discussed in Section V.

The SB has a structure similar to that of the CB. As shown in Fig. 9(b), there are two RRAM cells between every two nodes. Therefore, there are 12 RRAM cells in one SP.
The total number of RRAM cells in one SB is

$$R_{SB} = 12W. \qquad (3)$$

In the same SP, each RRAM cell pair is programmed sequentially to reduce the diode size, as discussed before. The RRAM cells in different SPs may be programmed in parallel to minimize the FPGA configuration time. The required number of diodes in the SB to program the RRAM cells is

$$D_{SB} = 10W. \qquad (4)$$

## B. Proposed Lookup Table

We propose a novel nonvolatile LUT, as shown in Fig. 10. Our proposed 1D2R-based LUT uses a complementary structure in which the left side RRAM cells and their corresponding right side RRAM cells are programmed to opposite RRAM states. For example, when the right side RRAM cells with the address AB are programmed to HRS, the left side RRAM cells with the address AB are programmed to LRS. The output of the LUT is then 0 when the input AB is 2'b11. The LUT in Fig. 10 has only two inputs, but it can be extended to four, six, and other LUT sizes.

Fig. 11: SB and CB structures used in the proposed NVFPGA. The switch box is based on universal architecture. To simplify, the 1D2R storage elements show only two RRAM cells in the dashed line boxes

There are $4 \times 2^K$ RRAM cells and $4 \times 2^K$ diodes in a K-LUT. Each output of the LB may have a two-to-one multiplexer, and a fully connected crossbar local interconnect needs $2KN(I + O)$ RRAM cells. Therefore, the total number of RRAM cells in one LB is

$$R_{LB} = 2KN(I + O) + 4N \times 2^K + 4O. \qquad (5)$$

The required number of diodes in the LB to program the RRAM cells is

$$D_{LB} = KN(I + O) + 4N \times 2^K + 6O + I + KN. \qquad (6)$$

During the normal FPGA operation phase, the top and bottom lines are connected to VDD and the ground, respectively. Both the top and bottom lines are connected to the word lines during the FPGA configuration phase. Only two of the word lines (L0 and VDD, or L1 and the ground) are enabled at the same time. The nodes Hi may share the same bit lines to minimize the wire area. For example, H0 and H1 connect to the same bit line.
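The cell and diode counts in Eqs. (1)-(6) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the LUT term is read as $4N \times 2^K$ as in Table I, and the example parameters (K = 6, N = 10, I = 40, O = 20, W = 100, Fc = 1) are assumptions borrowed from the area estimation section and the fully connected CB layout.

```python
from math import comb


def cb_counts(w, i, o, fc):
    """RRAM cells and diodes in one connection block, Eqs. (1) and (2)."""
    rram = 2 * w * (i + o) * fc
    diodes = w * (i + o) * fc + w + i + o
    return rram, diodes


def sb_counts(w):
    """RRAM cells and diodes in one switch block, Eqs. (3) and (4)."""
    # Each switch point joins 4 nodes; every node pair gets 2 RRAM cells.
    cells_per_sp = 2 * comb(4, 2)  # 12 cells per switch point
    return cells_per_sp * w, 10 * w


def lb_counts(k, n, i, o):
    """RRAM cells and diodes in one logic block, Eqs. (5) and (6)."""
    lut_term = 4 * n * 2 ** k  # N LUTs, each with 4 * 2^K cells (and diodes)
    rram = 2 * k * n * (i + o) + lut_term + 4 * o
    diodes = k * n * (i + o) + lut_term + 6 * o + i + k * n
    return rram, diodes


# Assumed example parameters (see lead-in).
K, N, I, O, W, FC = 6, 10, 40, 20, 100, 1

print("CB (cells, diodes):", cb_counts(W, I, O, FC))
print("SB (cells, diodes):", sb_counts(W))
print("LB (cells, diodes):", lb_counts(K, N, I, O))
```

With these assumed parameters the LB holds 9840 RRAM cells, which is the quantity that sets the LB area of the RRAM layer in the layout discussion.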
Besides the benefits of smaller size and lower leakage power, the propagation delay is also reduced, since there is no V<sub>th</sub> drop from the storage element to the output.

# 5. LAYOUT AND AREA ESTIMATION

## A. Routing of the RRAM Cells in the Proposed NVFPGA

To achieve high density, the layout of our proposed NVFPGA is very different from the conventional SRAM-based FPGA layout. The top-level floor plan of our proposed NVFPGA has been discussed in Section IV. In this section, we provide an RRAM-friendly layout design for both SBs and CBs that fits into the footprint of the CMOS transistors below the RRAM layer. Currently, the most widely used switch box structures are disjoint [42], universal [43], [44], Hyper-Universal Switch Box (HUSB) [45], [46], and Wilton [47]. Disjoint is the classical Xilinx-style SB, which is also called the subset SB [48]. Similar to the layout in [30], the universal type SB is used for the RRAM-friendly layout design in this paper. As shown in Fig. 11, the RRAM cells are placed at the different SB edges. The SB flexibility Fs is three for the universal type SB, thus there are three rows/columns of RRAM cells at each edge of the SB.

Fig. 12: (a) Cross-sectional view of the switch in CB. (b) Our proposed crossbar routing architecture to program the RRAM cells.

The diodes are placed above the routing metals of the SB to select RRAM cells for programming. Attention has to be paid to the connection of the programming wires. As shown in Fig. 11, when line ① is pulled up to the write voltage, the other dashed lines should not be enabled, in order to minimize the access diode and avoid write disturbance. In other words, all dashed lines should be connected to different bit lines. Therefore, there are at least 12 bit lines in one SB.

A fully connected ( $F_c = 1$ ) CB layout is shown in Fig. 11, where each cross point of the CB has two RRAM cells. As can be observed from Fig.
12, one of the RRAM cells is connected to the metal in the *x*-direction, whereas the other one is connected to the metal in the *y*-direction. The cross-sectional layout of one cross point switch is shown in Fig. 12(a), where the metal for channel routing may be placed below the metal that connects to the pins of the LB. Since the metals in both the *x* and *y* directions are used for the word lines (*L*), we use a third direction for the bit lines (*H*), as shown in Fig. 12(b). Therefore, only one cross point switch is selected each time two word lines (one in the *x*-direction and one in the *y*-direction) and one bit line are enabled. To achieve the smallest spacing between two bit lines, the bit lines should be routed alternately in different metal layers. Otherwise, their spacing should be $\sqrt{2}F$.

The tile area of the RRAM layer is determined by the CB channel width *W*, feature size *F*, logic cluster size *N*, LUT inputs *K*, LB inputs *I*, and LB outputs *O*. If the pitch between two metal wires is 2*F*, the width of the CB is $2\sqrt{2}WF$. The minimum length of the CB is determined by the LB inputs *I* and LB outputs *O*. Since the length of each 1D2R cell is 4F, the minimum length of the CB is $L_{CB} = 2(I + O)F$. Therefore, the minimum area of the CB is

$$S_{CB} = 4\sqrt{2}(I+O)WF^2. \qquad (7)$$

As there are three rows/columns of RRAM cells at each edge of the SB, the width of the SB is $2\sqrt{2}(W+6)F$. As a result, the area of the SB is

$$S_{SB} = 8(W+6)^2 F^2. \qquad (8)$$

The SB area is only around 2/9 of the SB area reported in [49]. The minimum area of the RRAM layer of the LB is

$$S_{LB} = 4(2KN(I+O) + 4N \times 2^{K} + 4O)F^{2}. \qquad (9)$$

Therefore, the side length of a square-shaped LB is $L_{LB} = 2(2KN(I + O) + 4N \times 2^K + 4O)^{1/2}F$. The area, the number of diodes, and the number of RRAM cells of each FPGA block are tabulated in Table I. The total tile area of the RRAM layer has two cases.
It is determined by the CB length when the LB is shorter than the CB, and by the LB length when the LB is longer than the CB. The total tile area of the proposed NVFPGA is given by

$$S_{\text{tile}} = \begin{cases} (2\sqrt{2}(W+6)F + L_{CB})^2, & \text{if } L_{LB} \le L_{CB} \\ (2\sqrt{2}(W+6)F + L_{LB})^2, & \text{if } L_{LB} > L_{CB}. \end{cases} \qquad (10)$$

## B. Area Estimation

The tile area has to be estimated before the FPGA performance can be evaluated. Only the tile area is estimated in this paper; other common overheads such as the FPGA programming circuits are not taken into consideration in the comparison. To compare the relative merits of our proposed 1D2R-based FPGA scheme, the conventional NVFPGA schemes, and the CMOS-based FPGA scheme, we perform area calculations with LUT input size K = 6, logic cluster size N = 10, LB inputs I = 40, LB outputs O = 20, a fixed routing channel width W = 100, and separate CB flexibility values for the LB inputs and outputs. There are two flip-flops in one BLE. We estimate the footprint of a baseline CMOS FPGA tile to be 33355 T (minimum width transistor areas) for the above parameters. Using a minimum width transistor area of $60 \times F^2$ and a feature size of 45 nm gives an SRAM-based FPGA tile area of 4052.6 $\mu$m<sup>2</sup>, and the switches and SRAM in the global interconnect occupy around 29.4% of the total tile area.

Table I: Number of RRAM cells and the RRAM area partition of each FPGA block

| Blocks | LB | CB | SB |
|------------|-----------------------------------------|-------------------------|---------------|
| RRAM cells | $2KN(I + O) + 4N \times 2^K + 4O$ | $2W(I + O)F_c$ | $12W$ |
| Diodes | $KN(I+O) + 4N \times 2^K + 6O + I + KN$ | $W(I+O)F_c + W + I + O$ | $10W$ |
| Area | $4(2KN(I + O) + 4N \times 2^K + 4O)F^2$ | $8\sqrt{2}(I+O)WF^2$ | $8(W+6)^2F^2$ |

Table II: Fabrication cost comparison between the proposed NVFPGA and the SRAM-based FPGA. The tile area of the CMOS-based FPGA is normalized to one

| Scheme | Mask | Tile Area | IO Area | Total Cost |
|----------|------|-----------|---------|------------|
| Proposed | 45 | 0.299 | 0.3 | 26.96 |
| SRAM | 40 | 1 | 0.3 | 52 |

Fig. 13: Area consumptions of the SRAM-based FPGA tile, 1R-based FPGA tile, 1T2R-based FPGA tile, and our proposed 1D2R-based FPGA tile. The switch and RRAM area in our proposed 1D2R-based scheme are negligible because they are placed on top of the CMOS circuits.

The size of the access transistors in the 1R-based FPGA affects the tile area. The estimated 1R tile area is 2091.083, 1651.523, 1431.743, and 1211.96 $\mu$m<sup>2</sup> when the access transistor size is 4×, 2×, 1×, and 0×, respectively. We also estimated the 1T2R-based FPGA using the same architecture as the SRAM-based FPGA. The size of the access transistor in the 1T2R element also affects the tile area. The estimated 1T2R tile area is 3292.0425, 2703.9825, 2409.9525, and 2074.6125 $\mu$m<sup>2</sup> when the access transistor size is 4×, 2×, 1×, and 0×, respectively. Even if the 0× access transistor size is used, the area reduction is only around 50%. Since the area reduction reported in [26] is 40%, the 1× size access transistor is used to estimate the tile area of the 1R- and 1T2R-based NVFPGAs.

By stacking RRAM cells and diodes on top of the CMOS circuitry, the area of the tile is greatly reduced. The crossbar architecture is used for both the local interconnect and the global interconnect. Since the complementary LUT structure is used, the input buffer size of the LUT doubles. Therefore, there are 360 minimum width transistors in one LUT. The CMOS area of the proposed 1D2R-based NVFPGA tile is 9975 minimum width transistors (34.8 $\mu$m × 34.8 $\mu$m). Based on the FPGA parameters used in this paper, the lengths of LB and CB in the proposed NVFPGA are 200*F* and 120*F*.
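The area figures quoted here follow from the layout formulas of the previous subsection. The following is a minimal sketch of that arithmetic, assuming K = 6, N = 10, I = 40, O = 20, a channel width W = 100, a feature size of 45 nm, and a minimum width transistor area of 60F²; the function names are hypothetical and the code is not the authors' estimation flow.

```python
import math

F_UM = 0.045          # feature size in micrometres (45 nm)
T_AREA_F2 = 60        # minimum width transistor area in units of F^2


def cmos_area_um2(transistors):
    """CMOS area for a given count of minimum width transistor areas."""
    return transistors * T_AREA_F2 * F_UM ** 2


def lb_side_f(k, n, i, o):
    """Side length of the square LB RRAM layer in units of F, from Eq. (9)."""
    cells = 2 * k * n * (i + o) + 4 * n * 2 ** k + 4 * o
    return 2.0 * math.sqrt(cells)


def cb_length_f(i, o):
    """Minimum CB length in units of F, L_CB = 2(I + O)F."""
    return 2.0 * (i + o)


def rram_tile_side_um(k, n, i, o, w):
    """Side of the RRAM-layer tile, Eq. (10): SB width plus the longer block."""
    sb_width = 2.0 * math.sqrt(2.0) * (w + 6)
    longer = max(lb_side_f(k, n, i, o), cb_length_f(i, o))
    return (sb_width + longer) * F_UM


side = rram_tile_side_um(6, 10, 40, 20, 100)
print(f"L_LB = {lb_side_f(6, 10, 40, 20):.1f} F, L_CB = {cb_length_f(40, 20):.0f} F")
print(f"RRAM tile: {side:.2f} um x {side:.2f} um = {side ** 2:.1f} um^2")
print(f"SRAM tile CMOS area: {cmos_area_um2(33355):.1f} um^2")
print(f"1D2R tile CMOS area: {cmos_area_um2(9975):.2f} um^2")
```

Under these assumptions the LB side evaluates to about 198.4F, which the text rounds to 200F, and the LB side exceeds the CB length of 120F.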
Therefore, $L_{LB} > L_{CB}$, and the tile area of the RRAM layer is 502.6 $\mu$m<sup>2</sup> (22.42 $\mu$m × 22.42 $\mu$m), which is smaller than the CMOS layer. Without stacking RRAM cells on top of the CMOS circuitry, the tile area is 48.3 $\mu$m × 48.3 $\mu$m. In other words, stacking RRAM cells on top of the CMOS circuitry further reduces the tile area by 27.56%. Fig. 13 shows the detailed area breakdown of our proposed NVFPGA tile. The share of the interconnect and SRAM area is reduced from 84.4% in the SRAM-based FPGA tile to 46.1% in our proposed 1D2R-based FPGA tile. The tile area is reduced from 4052.6 to 1211.96 $\mu$m<sup>2</sup> (a 3.34× area reduction). The fabrication of the proposed NVFPGA is the same as that of the SRAM-based FPGA, except for five extra masks for the 1D2R cells.

Fig. 14: Simulation diagram of the diodeless or transistor-free crossbar array with parasitic resistance (Rp) in the word lines and bit lines.

Fig. 15: (a) Normalized write voltage across the selected RRAM cell. (b) Normalized driven current at the bit line or word line. (c) Analysis of the write current in different RRAM array schemes. (d) Normalized total write power. All results are normalized to a single RRAM cell.

The mask cost dominates the initial investment in chip fabrication. Thus, the number of masks can be used to estimate the cost involved. A standard 45-nm CMOS circuit needs 40 masks [50]. Therefore, the mask cost of our proposed NVFPGA increases by 12.5%. We assume that the FPGA peripheral circuits and IOs occupy 30% of the SRAM-based FPGA tile area, so the proposed NVFPGA is still 54% smaller than the SRAM-based FPGA. As a result, our proposed NVFPGA reduces the total cost by around 48.2%. The details of the fabrication cost are tabulated in Table II.

# 6. SIMULATION RESULTS

In this section, we first evaluate the write reliability of the proposed scheme and the conventional 1R-based NVFPGA scheme.
After that, we provide the SPICE simulation results based on the schematic in Fig. 8 and the LUT performance comparison. Finally, the speed and power of four FPGA schemes are evaluated by the Versatile Place and Route (VPR) software [51]. The RRAM parameters are extracted from the measurement results of the RRAM cells fabricated by the process in [52]. The low resistance ($R_L$) and high resistance ($R_H$) are $10^3$ and $10^9$ Ω, respectively.

# A. Write Power and Reliability

A SPICE model with parasitic resistors in both the bit lines (H) and the word lines (L) is used to simulate the write voltage distribution, write power, and write error rate, as shown in Fig. 14. In this simulation, copper is used for the bit lines and word lines, and the thickness of the metal is four times its width. Therefore, the sheet resistance is about 0.1 Ω per square, and the parasitic resistance between two adjacent cells with a 2*F* pitch is 0.2 Ω. All unselected RRAM cells are set to LRS, which is the worst case for the leakage current. The write voltage on the selected RRAM cell, the external driven current, and the write power were evaluated for array sizes from $1\times1$ to $128\times128$, as shown in Fig. 15.

It can be observed from Fig. 15(a) that the write voltage on the selected cell with the V/2, V/3, and floating schemes drops to 25% of its nominal value when the array size is $128 \times 128$ (M = 128) due to the sneak path leakage current. The proposed diode-based scheme has less than 3% voltage drop on the selected RRAM cell, as the leakage current is almost entirely blocked by the OFF-state diodes. The small remaining voltage drop is mainly due to the IR drop in the H lines and L lines. In the V/2, V/3, and floating schemes, if all unselected RRAM cells are at HRS, the normalized write voltage on the selected cell is close to one. As a result, the write voltage on the selected cell has a very wide distribution (0.25 to 1).
Increasing the input driving voltage to raise the write voltage on the selected cell leads to much higher write energy, a higher breakdown risk, and write disturbance in the unselected cells. The normalized write current that must be driven into the selected bit line to switch a cell is shown in Fig. 15(b). Compared with the three diodeless schemes, the proposed diode-based scheme consumes over 100 times less write current when M > 100. The write current requirement of the proposed diode-based scheme is constant across array sizes. Since the write current needed to switch an RRAM cell is fixed, the total current of a diodeless array becomes extremely large. The high write current not only increases the write power but also requires a large area for the write drivers and wires. As shown in Fig. 15(c), the diodeless schemes spend a very large portion of the write current on the unselected cells. The V/3 scheme is even worse, since all unselected cells are biased at one-third of the write voltage. In comparison, almost all of the write current goes to the selected RRAM cell in the diode-based scheme. Fig. 15(d) provides the total power consumption with a fixed input write voltage at the bit line. The results show that the write power of the diode-based scheme is constant across array sizes. However, the write power increases linearly in the V/2 and floating schemes and exponentially in the V/3 scheme as the array size increases.

The diodeless scheme not only requires a large area and high write power but also has an extremely low write reliability. Because the local interconnect is a $60 \times 60$ crossbar array, we choose a $64 \times 64$ array with the V/2 write scheme as an example to explain its low write reliability. All unselected RRAM cells are still set to LRS. As shown in Fig. 16(a), the voltage drop gets worse from bottom left to top right, since the write drivers are located at the left and bottom sides of the array.
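The write-current and write-power trends discussed above for the V/2 and V/3 schemes can be approximated with a first-order count of biased cells. The sketch below is not the SPICE model used in this paper: it assumes linear resistors, every unselected cell in LRS with the same resistance as the selected cell, zero wire resistance, and an ideal diode that blocks all sneak current. The function names are hypothetical, and the floating scheme is omitted because it needs a full network solve.

```python
def unselected_current_ratio(scheme: str, m: int) -> float:
    """Total current through unselected cells of an m x m array,
    normalized to the current through the selected cell at full voltage V."""
    if scheme == "V/2":
        # 2(m - 1) half-selected cells on the selected lines, each at V/2;
        # all other cells see zero bias.
        return 2 * (m - 1) * 0.5
    if scheme == "V/3":
        # Every one of the m^2 - 1 unselected cells sees a magnitude of V/3.
        return (m * m - 1) / 3
    if scheme == "diode":
        # Idealized selector: no current through unselected cells.
        return 0.0
    raise ValueError(f"unknown scheme: {scheme}")


def unselected_power_ratio(scheme: str, m: int) -> float:
    """Total power dissipated in unselected cells, normalized to the
    power in the selected cell (P is proportional to V^2 for a linear resistor)."""
    if scheme == "V/2":
        return 2 * (m - 1) * 0.25
    if scheme == "V/3":
        return (m * m - 1) / 9
    if scheme == "diode":
        return 0.0
    raise ValueError(f"unknown scheme: {scheme}")


for size in (16, 64, 128):
    row = ", ".join(
        f"{s}: I={unselected_current_ratio(s, size):.1f} "
        f"P={unselected_power_ratio(s, size):.1f}"
        for s in ("V/2", "V/3", "diode")
    )
    print(f"M={size:3d} -> {row}")
```

Under these assumptions the V/2 scheme leaks about M − 1 times the selected-cell current, which agrees with the "over 100 times" figure for M > 100. The V/2 power grows linearly with M, while the V/3 terms grow with M², which is polynomial rather than strictly exponential in this idealized model.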
Longer metal lines result in a much lower voltage across the selected cell. The histogram of Fig. 16(a) is illustrated in Fig. 16(b). The normalized write voltage on the selected RRAM cell is spread between 0.6 and 1. Most of the voltage on the selected [...]

Table III: Simulation results of the *RC* delay among our proposed scheme, the conventional 1R, 1T2R, and SRAM schemes based on the schematic in Fig. 8

| Delay (ps) | $A \rightarrow B$ | $B \rightarrow C$ | $C \rightarrow D$ | $D \rightarrow E$ | $E \rightarrow \text{out}$ | $E \rightarrow D$ | $A \rightarrow \text{out}$ |
|------------|--------|--------|--------|--------|-------|--------|--------|
| Proposed | 52.96 | 45 | 43.2 | 272.27 | 28.29 | 41.8 | 441.72 |
| 1R | 51.64 | 42.64 | 41.1 | 257.93 | 25.7 | 41.3 | 419.47 |
| 1T2R | 168.1 | 140.7 | 103.9 | 391 | 35 | 98.3 | 838.7 |
| SRAM | 183.7 | 152.77 | 117.93 | 446 | 39.1 | 106.24 | 939.5 |

Fig. 16: (a) Write voltage distribution in a $64 \times 64$ diodeless crossbar RRAM array due to the parasitic resistance in the word lines and bit lines. (b) Histogram plot of the normalized write voltage distribution in a $64 \times 64$ diodeless crossbar RRAM array. (c) Programming results in the $64 \times 64$ diodeless crossbar RRAM array. Black represents successfully programmed cells and white represents unprogrammed cells. (d) Write error rate comparison between the V/2 write scheme and the proposed scheme using a diode as the selector.

# **B. RC Delay Simulation Results**

The *RC* delay was simulated based on the schematic in Fig. 8. One line is enabled from the input of the SB (*A*) to the output of the LB (out). An *RC* model is inserted at each node, i.e., an *RC* delay for the metal in the SB, CB, local interconnect, and so on. The parasitic resistance and capacitance are estimated based on the area evaluation results in Section V. The spacing and width of the wires between two channels are set to the same value.
We set the space between two metals to 1*F* in the SB, CB, and the local interconnect. The wire lengths in the SB, CB, CB to LB, local interconnect (before RRAM), and local interconnect (after RRAM) are 20, 15, 20, 15, and 7 $\mu$m, respectively. Therefore, the estimated capacitances in the SB, CB, CB to LB, and the local interconnect before RRAM and after RRAM are 4, 3, 1.35, 1, and 1.41 fF, respectively.

#### C. LUT Comparison

We further calculated the area, speed (inverse of the delay), and power of our proposed LUT and of the 1R-, 1T2R-, and SRAM-based LUTs. All LUTs have six inputs. The 1R scheme uses the same LUT structure as shown in Fig. 10, with every 1D2R element replaced by a 1R cell. The simulation results are summarized in Table IV. Compared with the SRAM-based LUT, our proposed LUT improves the speed and leakage power by 39% and 25%, respectively. The speed is improved because there is no $V_{th}$ drop in the LUT. The leakage power is improved by replacing the SRAM cells with RRAM cells. The dynamic power of the proposed LUT is 41.8% higher than that of the SRAM-based LUT because of the higher input loading capacitance. The proposed scheme also reduces the leakage power by 12% relative to the 1R-based scheme. The area of the proposed 1D2R-based LUT is 27% and 38% smaller than that of the 1R- and SRAM-based LUTs, respectively. The 1T2R-based LUT is 28% smaller than the proposed LUT, but it has 29 times higher leakage power and a 43.6% longer delay.

Table IV: Speed, power, and area comparison among different LUT schemes

| Scheme | Delay | Dynamic Power | Leakage Power | Number of Transistors |
|----------|-----------|----------|---------|-----|
| Proposed | 272.27 ps | 10.28 fJ | 2.25 nJ | 360 |
| 1R | 257.93 ps | 10.99 fJ | 2.55 nJ | 490 |
| 1T2R | 391 ps | 7.74 fJ | 65 nJ | 260 |
| SRAM | 446 ps | 7.75 fJ | 2.99 nJ | 580 |

The evaluation of our proposed 1D2R-based FPGA scheme is supported by the VPR software, which is flexible enough to compare a newly developed FPGA architecture with several other FPGA architectures [51]. The *RC* delay simulation results are used in the VPR simulation.

Fig. 17: (a) Delay simulation results. (b) Power simulation results. The three schemes are simulated based on 20 MCNC test benches with VPR.

As shown in Fig. 17(a), compared with the SRAM-based FPGA, the speed improvement of our proposed 1D2R-based FPGA ranges from 1.5× in the ex1010 benchmark to 1.9× in the diffeq benchmark. The average speed is improved by 1.63 times. As shown in Fig. 17(b), the power reduction of our proposed 1D2R-based FPGA ranges from 35.6% in the bigkey benchmark to 53.4% in the frisc benchmark. The average power reduction is estimated to be about 43.6%. Compared with the 1T2R-based NVFPGA, our proposed NVFPGA improves the average speed by 1.53 times with 40.5% lower power. The delay and dynamic power are greatly decreased due to the much shorter routing length and the improved LUT architecture. Compared with the 1R-based NVFPGA, the proposed 1D2R-based NVFPGA shows slightly lower performance in terms of speed and power consumption, i.e., a 6.3% decrease in speed and a 1.8% increase in power, because the switch resistance doubles. However, this loss is worthwhile because the write error rate of the 1D2R scheme is significantly lower, which is the most important factor in FPGA design.

We define the figure of merit (FOM) of the FPGAs by considering not only power and delay but also the write error rate and area, as given in the following:

$$\text{FOM} = \frac{\text{Power} \times \text{Area} \times \text{Write Error Rate}}{\text{Speed}} \qquad (11)$$

A smaller FOM indicates a better FPGA design. The proposed NVFPGA has an FOM as small as $8.9 \times 10^{-10}$, which is 4.1 times and $9.8 \times 10^{7}$ times smaller than those of the 1T2R- and 1R-based NVFPGAs, respectively. The large FOM of the 1R-based NVFPGA is mainly caused by its high write error rate. The speed, power, area, write error rate, and FOM results of the different NVFPGAs are tabulated in Table V.
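As a quick check of Eq. (11), the following sketch recomputes the FOM from the normalized speed, power, tile area, and write error rate listed in Table V. The `Scheme` class and its field names are illustrative assumptions, not part of the original evaluation flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    speed: float             # normalized to the SRAM-based FPGA
    power: float             # normalized to the SRAM-based FPGA
    area: float              # normalized tile area
    write_error_rate: float  # absolute write error rate

    def fom(self) -> float:
        """Figure of merit from Eq. (11); smaller is better."""
        return self.power * self.area * self.write_error_rate / self.speed


# Values copied from Table V.
SCHEMES = {
    "Proposed": Scheme(speed=1.63, power=0.564, area=0.3, write_error_rate=8.6e-9),
    "1R": Scheme(speed=1.74, power=0.554, area=0.35, write_error_rate=0.784),
    "1T2R": Scheme(speed=1.06, power=0.94, area=0.6, write_error_rate=8.6e-9),
}

foms = {name: scheme.fom() for name, scheme in SCHEMES.items()}
for name, value in foms.items():
    ratio = value / foms["Proposed"]
    print(f"{name:8s} FOM = {value:.2e}  ({ratio:.3g}x the proposed scheme)")
```

The recomputed FOMs agree with the tabulated $8.9 \times 10^{-10}$, $8.7 \times 10^{-2}$, and $4.6 \times 10^{-9}$ to two significant figures. The 1R-to-proposed ratio is about $9.8 \times 10^{7}$, while the 1T2R-to-proposed ratio computed this way is roughly 5.1, so the 4.1 quoted in the text may follow a different rounding or convention.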
Table V: Speed, power, area, and write error rate comparison among different NVFPGAs. The results of the speed, area, and power have been normalized to the SRAM-based FPGA.

| Scheme | FPGA Speed | FPGA Power | Tile Area | Write Error Rate | FOM |
|----------|------|-------|------|----------------------|-----------------------|
| Proposed | 1.63 | 0.564 | 0.3 | $8.6 \times 10^{-9}$ | $8.9 \times 10^{-10}$ |
| 1R | 1.74 | 0.554 | 0.35 | 0.784 | $8.7 \times 10^{-2}$ |
| 1T2R | 1.06 | 0.94 | 0.6 | $8.6 \times 10^{-9}$ | $4.6 \times 10^{-9}$ |

# 7. CONCLUSION

In this paper, we have proposed a 1D2R-based nonvolatile storage element and a 1D2R-based NVFPGA architecture. Compared with the SRAM-based FPGA, our proposed 1D2R scheme reduces the area and power by 70% and 43.6%, respectively, and improves the speed by 63%. Compared with the 1T2R-based NVFPGA, our proposed NVFPGA improves the average speed by 53% with 40.5% lower power. Compared with the conventional 1R-based NVFPGA, it significantly improves the write reliability with only a 6.3% performance reduction. The results show that the write error rate is as low as $8.6 \times 10^{-9}$ in the $64 \times 64$ crossbar array. These results suggest that our proposed 1D2R-based scheme is a promising solution for low-power, high-speed, and high-reliability NVFPGAs.

# 8. REFERENCES

- [1] S. Brown, J. Rose, and Z. Vranesic, "A detailed router for field programmable gate arrays," in Proc. IEEE Int. Conf. Comput.-Aided Design, Nov. 1990, pp. 382–385.
- [2] P. Chow, S. O. Seo, J. Rose, K. Chung, G. Paez-Monzon, and I. Rahardja, "The design of an SRAM-based field-programmable gate array. I. Architecture," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 7, no. 2, pp. 191–197, Jun. 1999.
- [3] I. Kuon, R. Tessier, and J. Rose, "FPGA architecture: Survey and challenges," Found. Trends Electron. Design Autom., vol. 2, no. 2, pp. 135–253, 2008.
- [4] T.-J.
Lin, W. Zhang, and N. K. Jha, "SRAM-based NATURE: A dynamically reconfigurable FPGA based on 10T low-power SRAMs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 11, pp. 2151–2156, Nov. 2012.
- [5] C. C. Wang, F.-L. Yuan, T.-H. Yu, and D. Markovic, "A multi-granularity FPGA with hierarchical interconnects for efficient and flexible mobile computing," in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2014, pp. 460–461.
- [6] V. De and S. Borkar, "Technology and design challenges for low power and high performance [microprocessors]," in Proc. Int. Symp. Low Power Electron. Design, 1999, pp. 163–168.
- [7] T. Tuan and B. Lai, "Leakage power analysis of a 90 nm FPGA," in Proc. IEEE Custom Integr. Circuits Conf., Sep. 2003, pp. 57–60.
- [8] T. Tuan, S. Kao, A. Rahman, S. Das, and S. Trimberger, "A 90 nm low-power FPGA for battery-powered applications," in Proc. ACM/SIGDA 14th Int. Symp. Field Program. Gate Arrays, 2006, pp. 3–11.
- [9] Y. Hosoi et al., "High speed unipolar switching resistance RAM (RRAM) technology," in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2006, pp. 1–4.
- [10] H. Y. Lee, "Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM," in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2008, pp. 1–4.
- [11] D. Halupka et al., "Negative-resistance read and write schemes for STT-MRAM in 0.13 $\mu$m CMOS," in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2010, pp. 256–257.
- [12] S.-S. Sheu et al., "A 4 Mb embedded SLC resistive-RAM macro with 7.2 ns read-write random-access time and 160 ns MLC-access capability," in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2011, pp. 200–202.
- [13] X. Yang and I.-W. Chen, "Dynamic-load-enabled ultra-low power multiple-state RRAM devices," Sci. Rep., vol. 2, no. 744, Oct. 2012.
- [14] K. Huang, N. Ning, and Y. Lian, "Optimization scheme to minimize reference resistance distribution of spin-transfer-torque MRAM," IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 5, pp. 1179–1182, May 2014.
- [15] K. Huang and Y. Lian, "A low-power low-VDD nonvolatile latch using spin transfer torque MRAM," IEEE Trans. Nanotechnol., vol. 12, no. 6, pp. 1094–1103, Nov. 2013.
- [16] K. Huang, R. Zhao, N. Ning, and Y. Lian, "A low power localized 2T1R STT-MRAM array with pipelined quad-phase saving scheme for zero sleep power systems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 9, pp. 2614–2623, Sep. 2014.
- [17] H.-S. P. Wong et al., "Phase change memory," Proc. IEEE, vol. 98, no. 12, pp. 2201–2227, Dec. 2010.
- [18] R. Simpson et al., "Interfacial phase-change memory," Nature Nanotechnol., vol. 6, no. 8, pp. 501–505, 2011.
- [19] K. Huang, Y. Ha, R. Zhao, A. Kumar, and Y. Lian, "A low active leakage and high reliability phase change memory (PCM) based non-volatile FPGA storage element," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 9, pp. 2605–2613, Sep. 2014.
- [20] T. H. Lee, D. Loke, K.-J. Huang, W.-J. Wang, and S. R. Elliott, "Tailoring transient-amorphous states: Towards fast and power-efficient phase-change memory and neuromorphic computing," Adv. Mater., vol. 26, no. 44, pp. 7493–7498, 2014.
- [21] Y.-C. Chen et al., "An access-transistor-free (0T/1R) non-volatile resistance random access memory (RRAM) using a novel threshold switching, self-rectifying chalcogenide device," in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2003, pp. 37.4.1–37.4.4.
- [22] Y.-B. Kim et al., "Bi-layered RRAM with unlimited endurance and extremely uniform switching," in Proc. IEEE Symp. VLSI Technol. (VLSI), Jun. 2011, pp. 52–53.
- [23] M. Chang et al., "A 0.5 V 4 Mb logic-process compatible embedded resistive RAM (ReRAM) in 65 nm CMOS using low-voltage current mode sensing scheme with 45 ns random read time," in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2012, pp. 434–436.
- [24] C. H. Cheng, A. Chin, and F. S.
Yeh, "Stacked GeO/SrTiOx resistive memory with ultralow resistance currents," Appl. Phys. Lett., vol. 98, no. 5, pp. 052905-1–052905-3, Jan. 2011.
- [25] M. Liu and W. Wang, "FDA: CMOS-nano hybrid FPGA using RRAM components," in Proc. IEEE Int. Symp. Nanoscale Archit., Jun. 2008, pp. 93–98.
- [26] Y. Y. Liauw, Z. Zhang, W. Kim, A. E. Gamal, and S. S. Wong, "Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory," in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2012, pp. 406–408.
- [27] Y.-C. Chen, W. Wang, H. Li, and W. Zhang, "Non-volatile 3D stacking RRAM-based FPGA," in Proc. 22nd Int. Conf. Field Program. Logic Appl., 2012, pp. 367–372.
- [28] W. Wang, T. T. Jing, and B. Butcher, "FPGA based on integration of memristors and CMOS devices," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May/Jun. 2010, pp. 1963–1966.
- [29] S. Tanachutiwat, M. Liu, and W. Wang, "FPGA based on integration of CMOS and RRAM," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 11, pp. 2023–2032, Nov. 2011.
- [30] J. Cong and B. Xiao, "mrFPGA: A novel FPGA architecture with memristor-based reconfiguration," in Proc. IEEE/ACM Int. Symp. Nanoscale Archit. (NANOARCH), Jun. 2011, pp. 1–8.
- [31] Y.-C. Chen, W. Zhang, and H. Li, "A look up table design with 3D bipolar RRAMs," in Proc. 17th Asia South Pacific Design Autom. Conf. (ASPDAC), Jan./Feb. 2012, pp. 73–78.
- [32] Y.-C. Chen, H. Li, and W. Zhang, "A novel peripheral circuit for RRAM-based LUT," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2012, pp. 1811–1814.
- [33] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Boston, MA, USA: Kluwer, 1999.
- [34] R. Muenstermann, T. Menke, R. Dittmann, and R. Waser, "Coexistence of filamentary and homogeneous resistive switching in Fe-doped SrTiO3 thin-film memristive devices," Adv. Mater., vol. 22, no. 43, pp. 4819–4822, 2010.
- [35] H. Okushi, A. Matsuda, M. Saito, M. Kikuchi, and Y.
Hirai, "Polarized (letter '8') memory effects in hetero-systems and non hetero-systems," Solid State Commun., vol. 11, no. 1, pp. 283–286, 1972.
- [36] M.-J. Lee et al., "2-stack 1D-1R cross-point structure with oxide diodes as switch elements for high density resistance RAM applications," in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2007, pp. 771–774.
- [37] Y. Sasago et al., "Cross-point phase change memory with 4F2 cell size driven by low-contact-resistivity poly-Si diode," in Proc. IEEE Symp. VLSI Technol. (VLSIT), Jun. 2009, pp. 24–25.
- [38] G. Tallarida et al., "Low temperature rectifying junctions for crossbar non-volatile memory devices," in Proc. IEEE Int. Memory Workshop, May 2009, pp. 1–3.
- [39] K. Gopalakrishnan et al., "Highly-scalable novel access device based on mixed ionic electronic conduction (MIEC) materials for high density phase change memory (PCM) arrays," in Proc. IEEE Symp. VLSI Technol. (VLSIT), Jun. 2010, pp. 205–206.
- [40] C. Xu, X. Dong, N. P. Jouppi, and Y. Xie, "Design implications of memristor-based RRAM cross-point structures," in Proc. Design, Autom., Test Eur. Conf. Exhibit. (DATE), Mar. 2011, pp. 1–6.
- [41] J. Cong and B. Xiao, "FPGA-RPI: A novel FPGA architecture with RRAM-based programmable interconnects," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 4, pp. 864–877, Apr. 2014.
- [42] The Programmable Logic Data Book, Xilinx, San Jose, CA, USA, 1994.
- [43] Y.-W. Chang, D.-F. Wong, and C. K. Wong, "Universal switch modules for FPGA design," ACM Trans. Design Autom. Electron. Syst., vol. 1, no. 1, pp. 80–101, 1996.
- [44] M. Shyu, G.-M. Wu, Y.-D. Chang, and Y.-W. Chang, "Generic universal switch blocks," IEEE Trans. Comput., vol. 49, no. 4, pp. 348–359, Apr. 2000.
- [45] H. Fan, J. Liu, Y.-L. Wu, and C.-C. Cheung, "On optimum switch box designs for 2-D FPGAs," in Proc. Design Autom. Conf. (DAC), 2001, pp. 203–208.
- [46] H. Fan, J. Liu, and Y.-L.
Wu, "General models and a reduction design technique for FPGA switch box designs," IEEE Trans. Comput., vol. 52, no. 1, pp. 21–30, Jan. 2003.
- [47] S. J. E. Wilton, "Architectures and algorithms for field-programmable gate arrays with embedded memory," Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, ON, Canada, 1997.
- [48] H. Schmit and V. Chandra, "FPGA switch block layout and evaluation," in Proc. ACM/SIGDA 10th Int. Symp. Field Program. Gate Arrays, 2002, pp. 11–18.
- [49] C. Dong, S. Chilstedt, and D. Chen, "Reconfigurable circuit design with nanomaterials," in Proc. Design, Autom., Test Eur. Conf. Exhibit. (DATE), Apr. 2009, pp. 442–447.
- [50] B. P. Wong, A. Mittal, G. W. Starr, F. Zach, V. Moroz, and A. Kahng, Nano-CMOS Design for Manufacturability: Robust Circuit and Physical Design for Sub-65 nm Technology Nodes. New York, NY, USA: Wiley, 2008.
- [51] J. Luu et al., "VTR 7.0: Next generation architecture and CAD system for FPGAs," ACM Trans. Reconfigurable Technol. Syst., vol. 7, no. 2, Jun. 2014, Art. ID 6.
- [52] W. Guan, S. Long, Q. Liu, M. Liu, and W. Wang, "Nonpolar nonvolatile resistive switching in Cu doped ZrO2," IEEE Electron Device Lett., vol. 29, no. 5, pp. 434–437, May 2008.
- [53] S. Yang, Logic Synthesis and Optimization Benchmarks User Guide Version 3.0. Research Triangle Park, NC, USA: MCNC, 1991.