Modern electronic designs such as mobile phones, laptops, cloud computing, and networking equipment demand very high performance. Apart from processor speed, memory plays a critical role in overall system performance. Double Data Rate (DDR) memories have become a common choice for designers of complex devices due to their low latency, large storage capacity, and low power consumption.
This article discusses DDR memories, covering memory basics, the DDR generations, DDR4 operation, memory modules, and PCB design and routing guidelines.
Introduction to memory
Memories are the data storage devices in electronic products. They store processed information and make it available to the controller whenever requested. At a high level, memories are categorized into primary memory and secondary memory.
Primary memory is further categorized into Random Access Memory (RAM) and Read-Only Memory (ROM). RAM is volatile: its data is lost when power is switched off. ROM retains its data even after power is turned off. As VLSI technology advanced, memory construction, chip density, size, speed, and communication interfaces improved to a great extent.
Difference between SRAM and SDRAM
RAM is further categorized into SRAM and SDRAM. SRAM is Static RAM and SDRAM is Synchronous Dynamic RAM. The architectural difference between the two is that DRAM uses one transistor and one capacitor per memory bit, whereas SRAM uses one flip-flop (about six transistors) per bit. SDRAMs are slower than SRAMs because of their longer access time. Because a one-transistor, one-capacitor cell is much smaller than a flip-flop, SDRAM achieves a higher memory density than SRAM. SDRAM is dynamic storage: its capacitors discharge over time, so unless they are refreshed periodically, SDRAM will not retain the stored data.
SRAM has been the default choice for cache memory because it is extremely fast and has very low access time. Cache SRAM can reside inside the processor or be interfaced externally. The cache acts as a buffer between external RAM and the processor: it stores frequently used data and instructions and makes them immediately available to the processor when requested. In general, cache memory reduces the average time to access data from main memory.
As stated earlier, SDRAM stands for synchronous dynamic RAM: the I/O, internal clock, and bus clock are synchronous. For instance, in PC133 memory the I/O, internal clock, and bus clock all run at 133MHz. Single Data Rate (SDR) SDRAM can read or write only once per clock cycle, and it must wait for the previous command to complete before it can start another read or write operation.
What are the different types of DDR RAMs?
The demand for higher data rates and larger densities led to the evolution of SDR into DDR. In DDR SDRAM, data is clocked on both edges of the clock, positive as well as negative, which doubles the data rate. In this way, DDR achieves greater bandwidth than SDR SDRAM: it doubles the transfer rate without increasing the clock frequency.
Over the last few decades, DDR technology has gone through many improvements. DDR has become extremely popular and is used extensively in notebooks, laptops, servers, and embedded computing systems. Successive generations have increased operating speed, improved storage density, reduced power consumption, added error detection functions such as CRC, and reduced simultaneous switching noise (SSN) through data bus inversion. In the following section, we will discuss the evolution of DDR memories and their benefits.
First generation – DDR SDRAM
The first generation of DDR memory had a 2-bit prefetch buffer, twice that of SDR SDRAM. With an internal clock of 133 to 200MHz, DDR1 achieved a transfer rate of 266 to 400 MT/s (million transfers per second). DDR1 ICs were released in the market in 1998.
Second-generation – DDR2 SDRAM
DDR2 runs its external data bus twice as fast as DDR1 SDRAM, which is achieved through an improved I/O bus signal. The DDR2 prefetch buffer is 4 bits, double that of DDR SDRAM. DDR2 memory has the same internal clock speed (133 to 200MHz) as DDR memory, but its transfer rate increases to 533 to 800 MT/s. DDR2-533 and DDR2-800 memory types were released in the market in 2003.
Third-generation – DDR3 SDRAM
DDR3 operates at double the speed of DDR2, again through further improvements in the bus signal. DDR3's prefetch buffer is 8 bits, double that of DDR2. The transfer rate of DDR3 is 800 to 1600 MT/s. DDR3 operates at a lower voltage of 1.5V compared with DDR2's 1.8V, which results in about 40% less power consumption. DDR3 also adds two functions: ASR (Automatic Self-Refresh) and SRT (Self-Refresh Temperature).
The DDR3 ICs were released in the market in the year 2007.
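As a quick check of the numbers above, each generation's transfer rate is its internal clock multiplied by its prefetch depth. The Python sketch below illustrates this arithmetic. The DDR3 internal clock range (100 to 200MHz) is not stated in the text; it is inferred here by dividing DDR3's 800 to 1600 MT/s by its 8-bit prefetch.

```python
def transfer_rate_mts(internal_clock_mhz: float, prefetch_bits: int) -> float:
    """Peak transfer rate in MT/s: each internal clock cycle fetches
    `prefetch_bits` bits per DQ pin, which are then sent out one per transfer."""
    return internal_clock_mhz * prefetch_bits


# (generation, prefetch depth, internal clock range in MHz)
GENERATIONS = [
    ("DDR1", 2, (400 / 3, 200)),  # 133 to 200 MHz, from the text
    ("DDR2", 4, (400 / 3, 200)),  # same internal clock as DDR1, from the text
    ("DDR3", 8, (100, 200)),      # assumption: inferred from 800 to 1600 MT/s / 8
]

for name, prefetch, (f_lo, f_hi) in GENERATIONS:
    lo = transfer_rate_mts(f_lo, prefetch)
    hi = transfer_rate_mts(f_hi, prefetch)
    # Truncate to whole MT/s, matching speed-grade names such as DDR-266 and DDR2-533.
    print(f"{name}: {int(lo)} to {int(hi)} MT/s")
```

The same relation explains why DDR2 could double the transfer rate of DDR1 without raising the internal clock: only the prefetch depth (and the I/O clock that streams the prefetched bits) changed.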
Fourth-generation – DDR4 SDRAM
DDR4 operates at double the speed of DDR3, with a lower operating voltage (1.2V) and a higher transfer rate of 2133 to 3200 MT/s. DDR4 introduces bank groups (four in x4/x8 devices), and each bank group can operate independently, so accesses to different bank groups can proceed in parallel. This makes DDR4 more efficient than DDR3. DDR4 also adds DBI (Data Bus Inversion), CRC (Cyclic Redundancy Check) on the data bus, and command/address parity. These functions enhance signal integrity and improve the stability of data transmission and access. Independent programming of individual DRAMs on a DIMM allows better control of on-die termination.
The DDR4 ICs were released in the market in the year 2014.
Fifth-generation – DDR5 SDRAM
DDR5 operates at double the speed of DDR4, with a transfer rate of 3200 to 6400 MT/s. The JEDEC DDR5 standard (JESD79-5) was published in July 2020, and ICs are expected to be in the market by 2022.
Enhancements in DDR5 Vs DDR4
We will now discuss the most significant changes in DDR5 versus DDR4.
- Improved clock speed (1.6GHz to 3.2GHz)
- Improved data speed (3.2Gbps to 6.4 Gbps)
- Inclusion of new features such as Decision Feedback Equalization (DFE)
Lower voltage level:
Operating voltage VDD is reduced from 1.2V to 1.1V, which reduces power consumption. On the other hand, a lower VDD means a smaller noise margin.
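To put the 1.2V to 1.1V change in perspective, dynamic (switching) power scales roughly with C x V^2 x f. The Python sketch below is a first-order estimate only: it assumes the switched capacitance C and the frequency f stay the same, and it ignores static power and power-conversion losses, none of which the text quantifies.

```python
def relative_dynamic_power(v_new: float, v_old: float) -> float:
    """Ratio of dynamic power P = C * V^2 * f at two supply voltages,
    assuming C and f are unchanged (first-order estimate only)."""
    return (v_new / v_old) ** 2


ratio = relative_dynamic_power(1.1, 1.2)
print(f"DDR5 VDD 1.1 V vs DDR4 1.2 V: {ratio:.2f}x dynamic power "
      f"(about {(1 - ratio) * 100:.0f}% lower)")
```

In practice DDR5 also runs at much higher frequencies, so total power per module is not expected to fall by this ratio; the sketch isolates only the voltage term.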
Introduction of power architecture for DDR5:
With DDR5 DIMMs, power management moves from the motherboard to the DIMM itself. DDR5 DIMMs carry a 12V power management IC (PMIC), which allows finer granularity of system power loading and helps with signal integrity and noise issues.
Improvement in channel architecture DDR5:
DDR4 DIMMs have a 72-bit bus comprising 64 data bits plus eight ECC (Error Correcting Code) bits. In DDR5, each DIMM has two independent 40-bit channels (32 data bits and 8 ECC bits each). While the total data width is the same (64 bits), two smaller independent channels improve memory access efficiency, so the higher MT/s of DDR5 is amplified by greater efficiency.
In DDR4, the Registered Clock Driver (RCD) provides two output clocks per side. The RCD in DDR5 provides four output clocks per side, giving each lane an independent clock. This improves signal integrity and helps address the lower noise margin that results from reducing VDD.
Enhanced burst length to 16 in DDR5:
DDR4 has a burst length of eight, whereas DDR5 extends the burst length to sixteen to increase the burst payload. A burst length of 16 (BL16) lets a single burst on a 32-bit channel access 64 bytes of data. Together with the two independent channels, this significantly improves concurrency and memory efficiency.
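The burst payload is simply the burst length multiplied by the channel's data width. The Python sketch below compares a DDR4 DIMM (one 64-bit channel, BL8) with one DDR5 subchannel (32 bits, BL16); ECC bits are excluded because they do not carry user data.

```python
def burst_payload_bytes(burst_length: int, data_bits: int) -> int:
    """Bytes delivered by one burst: one `data_bits`-wide word per transfer."""
    return burst_length * data_bits // 8


# DDR4 DIMM: one 64-bit data channel with BL8.
ddr4 = burst_payload_bytes(8, 64)
# DDR5 DIMM: each of the two independent 32-bit subchannels with BL16.
ddr5_per_channel = burst_payload_bytes(16, 32)

print(ddr4, ddr5_per_channel)  # 64 64
```

Both deliver 64 bytes per burst, which matches a typical CPU cache line, but DDR5 does it on half the width, so the two subchannels can serve two independent 64-byte requests at the same time.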
Supports higher capacity DRAM:
DDR5 buffer-chip DIMMs enable system designers to use DRAM densities of up to 64 Gb in a single-die package, compared with 16 Gb for DDR4.
In the following table, we have compared some of the critical features of various generation DDR RAMs for a better understanding.
Table: Comparison of DDR generations (voltage for core and I/O, chip storage densities, page size for x4/x8/x16 devices, internal clock rate, clock DLL, DQ bus voltage, and termination values).
Memory data transfer speed
Memory data transfer speed strongly affects how fast programs execute. Its importance becomes clear when you run multiple software applications simultaneously or an imaging application. The memory transfer rate is determined by three factors: the memory bus clock rate, the type of transfer process, and the number of bits transferred.
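With an I/O bus clock $f_{\text{I/O}}$, a double-data-rate transfer process (two transfers per clock), and an $n$-bit data bus:
$$\text{Data transfer rate (MT/s)} = 2 \times f_{\text{I/O}}\ (\text{MHz})$$
$$\text{Bandwidth (MB/s)} = \text{Data transfer rate (MT/s)} \times \frac{n}{8}\ \text{bytes} = \text{Data transfer rate (MT/s)} \times 8\ \text{bytes} \quad (n = 64)$$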
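The transfer-rate and bandwidth relations above can be expressed directly in code. This Python sketch assumes a 64-bit data bus (n = 64, ECC bits excluded) and uses DDR4-3200, whose I/O clock is 1600MHz, as an illustrative example; the results are peak theoretical values, not sustained throughput.

```python
def ddr_transfer_rate_mts(io_clock_mhz: float) -> float:
    """DDR moves data on both clock edges: two transfers per I/O clock cycle."""
    return 2 * io_clock_mhz


def peak_bandwidth_mbs(io_clock_mhz: float, bus_width_bits: int = 64) -> float:
    """Peak bandwidth in MB/s = transfers per second x bytes per transfer."""
    return ddr_transfer_rate_mts(io_clock_mhz) * bus_width_bits / 8


# Example: DDR4-3200 runs its I/O clock at 1600 MHz on a 64-bit DIMM bus.
print(ddr_transfer_rate_mts(1600))  # 3200 MT/s
print(peak_bandwidth_mbs(1600))     # 25600.0 MB/s, i.e. 25.6 GB/s
```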
Brief working operation of DDR4
The interface between memory and the processor for the DDR4 standard is shown in the following figure. The interface consists of groups of signals: data, address, clock, and control.
The table below lists some of the basic and important signals used in data transfer between the processor and SDRAM memory.
- CS_n: Chip select. Active-low signal that enables the memory IC for read/write operations.
- CKE: Clock enable. HIGH activates the internal clock signals, device input buffers, and output drivers.
- CK_t/CK_c: Differential clock. All address and control signals are sampled at the crossing of the positive edge of CK_t and the negative edge of CK_c.
- DQ (single-ended), DQS_t/DQS_c (differential): The data bus is single-ended, whereas the data strobe is differential. Data is read from or written to memory with respect to the strobe, which acts as a data-valid flag.
- RAS_n/A16, CAS_n/A15, WE_n/A14: Dual-function inputs. When ACT_n and CS_n are LOW, they are interpreted as row address bits. When ACT_n is HIGH, they are interpreted as command pins indicating READ, WRITE, or other commands.
- ACT_n: Activate command input. When ACT_n and CS_n are LOW, the command is ACTIVATE and RAS_n/CAS_n/WE_n carry row address bits; when ACT_n is HIGH, RAS_n/CAS_n/WE_n act as a command.
- BG0-BG1, BA0-BA1: Bank group and bank address inputs.
- DBI_n: Data bus inversion (shares a pin with the data mask, DM_n).
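To illustrate how the dual-function pins are interpreted, the Python sketch below decodes a command from sampled pin levels, following the JEDEC DDR4 command truth table. It is a simplified, hypothetical model: it ignores CKE transitions (power-down and self-refresh entry), bank addresses, and the A10 distinction between single-bank and all-bank precharge, so check the encodings against your DRAM datasheet before relying on them.

```python
def decode_ddr4_command(cs_n: int, act_n: int, ras_n: int, cas_n: int,
                        we_n: int, a10: int = 0) -> str:
    """Decode a DDR4 command from sampled pin levels (1 = HIGH, 0 = LOW)."""
    if cs_n:
        return "DES"  # chip not selected (deselect)
    if not act_n:
        # ACT_n LOW: RAS_n/CAS_n/WE_n carry row address bits A16/A15/A14.
        return "ACT"
    # ACT_n HIGH: RAS_n/CAS_n/WE_n act as command pins.
    table = {
        (0, 0, 0): "MRS",  # mode register set
        (0, 0, 1): "REF",  # refresh
        (0, 1, 0): "PRE",  # precharge
        (0, 1, 1): "RFU",  # reserved for future use
        (1, 0, 0): "WR",   # write
        (1, 0, 1): "RD",   # read
        (1, 1, 0): "ZQC",  # ZQ calibration
        (1, 1, 1): "NOP",  # no operation
    }
    cmd = table[(ras_n, cas_n, we_n)]
    if cmd in ("RD", "WR") and a10:
        cmd += "A"  # A10/AP HIGH requests auto-precharge (RDA/WRA)
    return cmd


print(decode_ddr4_command(cs_n=0, act_n=0, ras_n=1, cas_n=1, we_n=1))          # ACT
print(decode_ddr4_command(cs_n=0, act_n=1, ras_n=1, cas_n=0, we_n=1, a10=1))   # RDA
```

Note how the same three pins mean "row address" or "command" depending only on ACT_n, which is what allows DDR4 to add address bits without adding pins.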
Data Bus configuration
DDR memories are available with data bus widths of x4, x8, and x16 (DQ[3:0], DQ[7:0], and DQ[15:0]). In a DIMM, the total data bus width is either 32 bits or 64 bits, depending on the processor. In DDR4, an additional 8 bits are allocated for error control (ECC), so the total bus width becomes 40 or 72 bits.
Data Bus Inversion (DBI) function:
DDR4 deploys Data Bus Inversion to mitigate simultaneous switching noise (SSN), which improves power noise and reduces I/O power. DBI# is an active-low, bidirectional signal. During a write operation, if DBI# is sampled LOW, the DRAM inverts the write data received on the DQ inputs; if DBI# is HIGH, the DRAM leaves the data non-inverted. During a read operation, the DRAM inverts the read data on its DQ outputs and drives DBI# LOW when the number of '0' data bits within a given byte lane is greater than 4; otherwise, it does not invert the read data and drives DBI# HIGH.
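The read-side DBI rule above can be sketched per byte lane in Python. The encoder applies the "more than four zeros, invert and drive DBI# LOW" rule; the decoder undoes the inversion whenever DBI# is LOW, which is what the DRAM does for write data. We assume here, for illustration, that a controller encoding write data would use the same rule.

```python
from typing import Tuple


def dbi_encode(byte: int) -> Tuple[int, int]:
    """Apply the DDR4 DBI rule to one byte lane.

    Returns (dq_byte, dbi_n). If more than 4 of the 8 bits are '0', the byte
    is inverted and DBI_n is driven LOW (0); otherwise it is sent unchanged
    with DBI_n HIGH (1).
    """
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, 0
    return byte, 1


def dbi_decode(dq_byte: int, dbi_n: int) -> int:
    """Receiver side: undo the inversion when DBI_n is sampled LOW."""
    return (~dq_byte) & 0xFF if dbi_n == 0 else dq_byte & 0xFF


print(dbi_encode(0x07))  # five zeros -> inverted: (248, 0), i.e. 0xF8 with DBI_n LOW
print(dbi_encode(0x0F))  # four zeros -> unchanged: (15, 1)
```

After encoding, no byte lane ever drives more than four '0' bits, which bounds the number of lines pulling current in the same direction at once.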
Write leveling for better DQ timing
In DDR4, memories are routed in fly-by topology rather than tree topology, specifically to reduce reflections during high-speed data transfer. In fly-by routing, the clock and address signals begin at the controller and form a main channel that passes each DRAM in turn. The DRAMs are connected to this main path by very short stubs. However, this creates a problem with respect to the clock-to-DQS requirement at the DRAM: the DRAM closest to the controller receives the clock and address signals before the last DRAM does.
This difference in clock/address arrival times at each DRAM can cause DQS-to-clock timing violations during writes, hence "write leveling" is implemented in DDR4. Write leveling enables the controller to automatically detect the flight-time difference of the clock to each DRAM. The controller then delays each data lane appropriately so that its DQS arrives at the DRAM aligned with the clock arriving there. This process of detecting the required delays is called "training." It is also possible to delay each DQ bit within a lane relative to its strobe to center the strobe precisely in the DQ data eye.
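The training step can be illustrated with a simplified model of the write-leveling search: the controller sweeps the DQS delay one tap at a time, each DRAM samples its CK with the rising DQS edge and returns the sampled level on its DQ pins, and the controller stops at the tap where the feedback changes from 0 to 1. This is a hypothetical Python sketch: the ideal 50% duty-cycle clock, the 10 ps tap size, and the per-DRAM clock arrival times are illustrative assumptions, not values from any datasheet or controller.

```python
from typing import Optional


def ck_sampled_by_dqs(t_ps: float, ck_rise_ps: float, period_ps: float) -> int:
    """Level of CK (1 = HIGH) seen at the DRAM when DQS rises at t_ps.
    CK is modeled as an ideal 50% duty-cycle clock rising at ck_rise_ps + k*period."""
    phase = (t_ps - ck_rise_ps) % period_ps
    return 1 if phase < period_ps / 2 else 0


def write_level(ck_rise_ps: float, dqs_rise_ps: float, period_ps: float,
                tap_ps: float, max_taps: int) -> Optional[int]:
    """Return the first DQS delay tap at which the DRAM feedback changes from
    0 to 1 (DQS edge aligned with a CK rising edge), or None if not found."""
    previous = None
    for tap in range(max_taps):
        sample = ck_sampled_by_dqs(dqs_rise_ps + tap * tap_ps, ck_rise_ps, period_ps)
        if previous == 0 and sample == 1:
            return tap
        previous = sample
    return None


# Hypothetical fly-by chain at 1600 MHz (625 ps period): CK reaches each
# successive DRAM 70 ps later, while DQS leaves every lane at 100 ps.
for dram, ck_arrival in enumerate((400, 470, 540)):
    print(f"DRAM {dram}: DQS delay = {write_level(ck_arrival, 100, 625, 10, 200)} taps")
```

Because CK reaches each DRAM later along the fly-by chain, the search returns a larger delay for each successive byte lane, which is exactly the per-lane compensation described above.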
READ operation with a burst length of 8 (BL8)
- In the beginning, the processor sends an ACT command, the value on the address bus at this time indicates the row address. The ACT command is clocked at the first posedge of the clock along with row address. Remember, command lines are multiplexed with address bus.
- Next, the processor sends an RDA (Read with Auto-Precharge) command. The value on the address bus at this time indicates the column address, which is latched into the memory at the third clock edge.
- The memory then drives the DQS strobe along with the read data. During a read operation, both DQS edges are aligned to the data edges (edge-aligned).
- Because the command was RDA rather than a plain READ, the DRAM automatically PRECHARGEs the bank after the read is complete.
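The READ sequence above can be turned into a simple clock-cycle timeline. In this Python sketch, tRCD (ACT-to-column-command delay) and CL (CAS latency) are parameters not given in the text; the 22-cycle values and the 0.625 ns clock period (1600MHz, as in DDR4-3200) are illustrative assumptions, and additive latency and the auto-precharge timing are ignored.

```python
def bl8_read_timeline(t_rcd: int, cl: int, burst_length: int = 8,
                      act_cycle: int = 0) -> dict:
    """Simplified clock-cycle timeline for ACT followed by RDA."""
    rda_cycle = act_cycle + t_rcd               # column command waits tRCD after ACT
    first_data = rda_cycle + cl                 # first data beat arrives CL cycles later
    data_end = first_data + burst_length // 2   # two beats per clock (both DQS edges)
    return {"ACT": act_cycle, "RDA": rda_cycle,
            "first_data": first_data, "data_end": data_end}


timeline = bl8_read_timeline(t_rcd=22, cl=22)
t_ck_ns = 0.625  # assumed 1600 MHz clock
print(timeline)
print(f"burst finishes {timeline['data_end'] * t_ck_ns} ns after ACT")
```

The eight data beats occupy only four clock cycles because the strobe carries data on both edges, while most of the access time is spent waiting for tRCD and CL.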
Write operation with a burst length of 8 (BL8)
- In the beginning, the processor sends an ACT command, the value on the address bus at this time indicates the row address. The ACT command is clocked at the first posedge of the clock along with row address.
- Next, the processor sends two WRITE commands. The first one indicates the column address (COL) and the second one COL+8. At the third clock edge, the first command is latched into the memory.
- The second write operation does not need an ACT command. This is because the row we intend to write to is already active in the Sense Amps.
- During the write operation, the processor drives the DQS strobe, and data is transferred on both edges of the strobe, with the strobe edges centered in the data eye.
- Also, note that the first command is a plain WR, so this leaves the row active. The second command is a WRA which deactivates the row after the write completes.
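The row-state reasoning in the WRITE sequence (a plain WR keeps the row open, WRA closes it) can be modeled for a single bank. The Python sketch below is a hypothetical helper, not part of any controller API; it lists the commands needed for a sequence of writes given as (row, column, auto-precharge) tuples.

```python
from typing import List, Optional, Tuple


def commands_for_writes(
        accesses: List[Tuple[int, int, bool]]) -> List[Tuple[str, Optional[int]]]:
    """Commands needed for a sequence of writes to one bank.

    accesses: (row, column, auto_precharge) per write.
    WR leaves the row open in the sense amplifiers; WRA closes it afterwards.
    """
    open_row: Optional[int] = None
    cmds: List[Tuple[str, Optional[int]]] = []
    for row, col, auto_precharge in accesses:
        if open_row != row:
            if open_row is not None:
                cmds.append(("PRE", None))  # close the currently open row first
            cmds.append(("ACT", row))
            open_row = row
        cmds.append(("WRA" if auto_precharge else "WR", col))
        if auto_precharge:
            open_row = None  # the bank precharges itself after WRA
    return cmds


# The BL8 example above: WR to COL, then WRA to COL+8 in the same row.
print(commands_for_writes([(5, 0, False), (5, 8, True)]))
# [('ACT', 5), ('WR', 0), ('WRA', 8)]
```

Only one ACT is issued because the second write hits the row already held in the sense amplifiers, and no explicit PRE is needed because WRA precharges the bank automatically.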
To increase the overall memory capacity and bandwidth, DDR memories are combined on a single PCB called a module. A DIMM can carry 4 to 16 ICs on one or both sides of the PCB to create 2GB, 4GB, 8GB, 16GB, or 32GB memory modules. For example, the 64-bit data bus of a DIMM requires eight 8-bit (x8) chips, addressed in parallel. A set of DRAM chips that share a chip select and together drive the full data bus width is called a memory rank. All ranks are connected to the same memory bus, and the chip select signal is used to issue commands to a specific rank.
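The rank arithmetic in this paragraph can be sketched as follows. The 8 Gb die density and two-rank configuration in the usage lines are hypothetical example values; real modules may also differ in device width and die stacking.

```python
def chips_per_rank(bus_width_bits: int, device_width_bits: int) -> int:
    """Number of DRAM devices placed side by side to fill the bus width."""
    if bus_width_bits % device_width_bits:
        raise ValueError("bus width must be a multiple of the device width")
    return bus_width_bits // device_width_bits


def module_capacity_gb(die_density_gbit: int, device_width_bits: int,
                       ranks: int, data_bits: int = 64) -> float:
    """Usable capacity in GB; ECC devices add chips but not usable capacity."""
    chips = chips_per_rank(data_bits, device_width_bits)
    return chips * die_density_gbit * ranks / 8  # Gbit -> GB


print(chips_per_rank(64, 8), chips_per_rank(72, 8))  # 8 9 (the ninth chip holds ECC)
print(module_capacity_gb(8, 8, ranks=2))              # 16.0
```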
Based on size, modules come in three form factors: DIMM (Dual In-line Memory Module), SODIMM (Small Outline DIMM), and MicroDIMM.
Type of DIMM
DDR memory modules of different generations are mutually incompatible. DDR2/3/4 DIMMs can only be used in their respective DDR2/3/4 sockets, since the notches are located differently. The supply voltage also changes with each generation: 2.5V for DDR1 DIMMs, 1.8V for DDR2, 1.5V for DDR3, and 1.2V for DDR4. Hence, check the memory version before ordering a memory module for your computer or device.
DDR modules slot
Applications of DDR memories and selection parameters
Memories are used in the following devices/systems:
- Computers, laptops, supercomputers, servers, etc.
- Mobile devices, tablets
- Gaming devices
The following parameters should be considered when selecting a memory:
- Verify that your processor's memory controller supports the DDR4 interface
- Type of memory required in the design
- Memory size
- Clock frequency requirements
- Interfacing speed
- Rise time of clock and data bus
- Access time
- Write cycle time
- Setup and hold times for command, address, and control signals
- Core and I/O voltage requirements
- VIL/VIH and VOL/VOH of signals
- Type of package
Widely used packages for high-density memories are BGA, FBGA, WFBGA, TFBGA, QFPN, etc.
Memory IC manufacturers include Micron Technology, ISSI, Winbond, Cypress, STMicroelectronics, Alliance Memory, etc.
Memory module manufacturers include Cypress, Kingmax, Micron, Centon Electronics, etc.
Key challenges in PCB routing for DDR memories
High-speed PCB design must meet strict timing requirements to perform properly, and for DDR4 RAM these timing margins are even tighter because of the higher data rates. If timing is not met, the design fails, resulting in a higher bit error rate or data corruption. Designers are therefore expected to carry out signal integrity simulation of the board at every step.
In DDR memories, because data is transferred on both edges of the clock/strobe at clock rates as high as 1.6GHz, the setup and hold windows are short, and timing delays play a significant role. The methods to meet these requirements are discussed below.
The major challenges in routing a DDR4 SDRAM interface at gigabit data rates include:
- Maximizing the timing margin of data transmission by controlling setup and hold times
- Choosing the routing topology and termination scheme for nets with multiple receivers
- Routing to minimize crosstalk
- Mitigating impedance discontinuities due to imperfect vias
- Providing clean supply voltages
- Matching trace lengths
To learn more about PCB routing, read our article 11 Best High-Speed PCB Routing Practices.
General rules for PCB design requirements for DDR memories
Proper setup time and hold time
Adjust the trace length of the clock relative to the data and control lines so that the setup and hold times of the memory IC are met. The clock signal can be delayed by routing it in a serpentine pattern.
Clean and stable reference voltages
DDR4 requires extremely clean and stable voltages since it runs on low voltages (1.2V). The supply voltage is delivered through the power plane to the memory and to the termination resistors on the same side of the board, thus eliminating the impedance due to vias. Capacitors should also be evenly placed, which creates a consistent, clean bypassed reference. The following pictures display the placement of the capacitors in processor/FPGAs.
Placement of capacitors in Processor/FPGA
Placement of capacitors in DDR
Decap and bypass capacitors placements
- The fanout scheme creates a four-quadrant structure that facilitates the placement of decoupling and bulk capacitors on the bottom side of the PCB.
- Usually, 0201 or customized 0402 package capacitors should be mounted as close as possible to the power vias, less than 50 mils away.
- An additional bulk capacitor can be placed near the edge of the BGA via array.
- Placing the decoupling capacitors close to the power balls is critical to minimize inductance. This also ensures that the processor's high-speed transient current demand can be met.
- Choosing a proper via size helps designers preserve adequate routing space.
- The recommended via pad geometry is an 18 mil pad with an 8 mil drill.
- Place the largest capacitance in the smallest package that the budget and manufacturing can support. For high-speed bypassing, select the required capacitance in the smallest package (for example, 0.22uF in a 0201 package).
- Minimize the trace length (inductance) to small capacitors.
- Series inductance cancels out capacitance: above its self-resonant frequency, a capacitor behaves inductively.
- Tie caps to GND plane directly with a via.
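Why series inductance matters can be seen from the self-resonant frequency of a capacitor, f = 1 / (2 pi sqrt(LC)); above this frequency the part acts as an inductor. The Python sketch below uses the 0.22uF value from the list above; the 0.3 nH part inductance and the extra 1 nH of trace and via inductance are illustrative assumptions, not datasheet values.

```python
import math


def self_resonant_freq_hz(capacitance_f: float, inductance_h: float) -> float:
    """Series LC resonance f = 1 / (2*pi*sqrt(L*C)); above it the capacitor
    behaves inductively and stops bypassing effectively."""
    return 1.0 / (2 * math.pi * math.sqrt(inductance_h * capacitance_f))


c = 0.22e-6  # 0.22 uF, as in the 0201 example
for l_nh in (0.3, 1.3):  # assumed: part inductance alone vs. plus ~1 nH of trace/via
    f = self_resonant_freq_hz(c, l_nh * 1e-9)
    print(f"L = {l_nh} nH -> self-resonant frequency = {f / 1e6:.1f} MHz")
```

Adding about 1 nH of mounting inductance roughly halves the useful frequency range of the capacitor in this example, which is why short traces and a direct via to the GND plane are recommended.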
DDR placement and routing rules
- The ultimate purpose of the placement is to limit the maximum trace lengths and allow proper routing space.
- Devices can be mounted on either side of the PCB; the placement rules do not restrict the side.
- Fly-by topology is preferred over tree topology; studies have shown that it improves the data eye diagram. Fly-by topology supports higher-frequency operation and reduces the number and length of stubs, which improves signal integrity and timing on heavily loaded signals.
Trace routing for on-board DDR and DIMMs is shown below.
Bus topologies—On-board two-UDIMM
Before routing, group the DDR signals as follows.
Each data group includes 8 data bits, the differential data strobe, and the data mask/data bus inversion signal, for a total of 11 signals. For example:
- BYTELANE0 DQ [7:0], DM0/DBI0, DQS_P0, DQS_N0
- BYTELANE1 DQ [15:8], DM1/DBI1, DQS_P1, DQS_N1
- The address signals can be grouped separately, together with the control, command, and related clock signals.
- Once the data groups are defined, route them as short as possible to the processor.
- The address, command, and control signals run at half the data rate, so they can tolerate longer routes.
- The data byte and control line from the same channel/group should be routed on the same layer.
- Address/command/control/differential clock should be routed on the same layer, but if space issue arises, they can be routed on the different layers also.
Routing topology for address/command/control signals and differential clocks:
- Daisy-chain (fly-by) topology
- Route from the controller starting with chip 0 through chip n, in order of byte lane number. Chip 0 carries the lowest data byte (Bytelane0), and chip n the highest (Bytelane3 in this example).
Advantages of fly-by topology:
- Supports high-frequency operation
- Reduces the quantity and length of the stubs consequently improves the signal integrity and timing on heavily loaded signals.
- Reduces simultaneous switching noise (SSN): the flight-time skew along the fly-by chain staggers the switching of the point-to-point data groups at each DRAM.
- Fly-by topology has very short stubs, which minimizes reflections.
- By contrast, in T-topology the trace stubs lengthen as the number of memory device loads increases.
- In T-topology, the clock traces need to be routed with a longer delay than the strobe traces of each byte lane.
- In most cases, DDR2 and earlier generations use T-topology routing.
- DDR3 and later generations use fly-by topology routing.
Trace length matching
The length matching is done in groups. Common length matching route groups are:
- Clock-to-Address/Command Group
- Clock-to-Strobes Group
- Strobe-to-Data Group
Match the lengths of signals using serpentine routing, and follow the routing guidelines for data byte lanes in the manufacturer's datasheet. The maximum skew among all signals should be less than ±2.5% of the clock period driven by the memory controller. The clock traces should be slightly longer than the data/strobe/control lines so that the clock arrives at its destination later than the data, strobe, and control signals.
For example, with standard FR4 (dielectric constant of 4) and a 1.2GHz clock (833ps period), 2.5% of the period is about 21ps, so the maximum mismatch among all signals should be roughly ±125mil.
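The conversion from a skew budget in picoseconds to a length budget in mil can be sketched in Python. The sketch assumes a stripline fully embedded in the dielectric, so the propagation delay is sqrt(er)/c (about 169 ps per inch for er = 4); a microstrip has a lower effective dielectric constant and therefore a larger length budget for the same skew.

```python
import math

C_IN_PER_PS = 299_792_458 * 39.3701 / 1e12  # speed of light in inches per picosecond


def delay_ps_per_inch(er: float) -> float:
    """Propagation delay of a stripline fully surrounded by dielectric er."""
    return math.sqrt(er) / C_IN_PER_PS


def max_mismatch_mil(clock_hz: float, er: float, skew_fraction: float = 0.025) -> float:
    """Length mismatch (mil) equivalent to a skew budget of skew_fraction x period."""
    period_ps = 1e12 / clock_hz
    budget_ps = skew_fraction * period_ps
    return budget_ps / delay_ps_per_inch(er) * 1000  # inches -> mil


print(f"{delay_ps_per_inch(4.0):.1f} ps/inch")
print(f"±{max_mismatch_mil(1.2e9, 4.0):.0f} mil at 1.2 GHz")
```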
Relative propagation delay
- 1 to 5 mil between all members of a byte lane
- 100 to 200 mil between the controller and the first memory IC
- Clock to address/command: ±25ps (125 mil)
- 10 to 20 mil between memory ICs
- 1500 to 1750 mil from the controller to the first memory IC
- 650 to 750 mil between memory ICs
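A byte-lane matching rule such as "1 to 5 mil between all members" is easy to check once trace lengths are exported from the layout tool. The Python sketch below is a hypothetical checker; the net names follow the BYTELANE0 grouping described earlier, but the lengths are made-up example values, and a real sign-off check should compare delays (including vias and package lengths) rather than raw trace lengths.

```python
from typing import Dict


def check_byte_lane(lengths_mil: Dict[str, float], tolerance_mil: float = 5.0) -> float:
    """Return the length spread of a byte lane; raise if it exceeds the tolerance."""
    spread = max(lengths_mil.values()) - min(lengths_mil.values())
    if spread > tolerance_mil:
        longest = max(lengths_mil, key=lengths_mil.get)
        shortest = min(lengths_mil, key=lengths_mil.get)
        raise ValueError(f"spread {spread:.1f} mil ({shortest} -> {longest}) "
                         f"exceeds {tolerance_mil} mil")
    return spread


# Hypothetical BYTELANE0 lengths in mil: DQ0..DQ7, DM0/DBI0, DQS_P0, DQS_N0.
lane0 = {f"DQ{i}": 1200.0 + i * 0.5 for i in range(8)}
lane0.update({"DM0": 1201.0, "DQS_P0": 1202.0, "DQS_N0": 1202.5})
print(f"BYTELANE0 spread: {check_byte_lane(lane0)} mil")
```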
Differential phase tolerance
- 1 to 5 mil for all data strobe and clock differential pairs
Data bus termination
- Series resistor termination can be used when the point-to-point connection is in the 2 to 2.5 inch range. The resistors should be located at the center of the transmission line.
- Use DRAM on-die termination (ODT) with direct connection to improve signal quality at a lower cost.
- Place a 100 Ohm differential terminator at the last SDRAM device in the chain for differential signals such as the clock.
Characteristic impedance of traces
Single-ended traces such as data lines have a characteristic impedance of 50Ω, and differential traces such as clock and strobe have 100Ω.
- Single-ended impedance = 50Ω
- Smaller trace widths (5–6 mils) can be used.
- Spacing between like signals should extend to 3x (for 5 mils) or 2.5x (for 6 mils), respectively.
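To see how a 5 to 6 mil trace reaches roughly 50Ω, the widely used IPC-2141 surface-microstrip approximation can be evaluated in Python. The dielectric height (3.5 mil), copper thickness (1.4 mil, about 1 oz), and er = 4 below are assumptions for illustration; the formula is only approximate (roughly valid for 0.1 < w/h < 2), so a field solver with the real stack-up should be used for production.

```python
import math


def microstrip_z0(w_mil: float, h_mil: float, t_mil: float, er: float) -> float:
    """IPC-2141 surface microstrip approximation of characteristic impedance (ohms)."""
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h_mil / (0.8 * w_mil + t_mil))


for w in (5.0, 6.0):
    z0 = microstrip_z0(w, h_mil=3.5, t_mil=1.4, er=4.0)
    print(f"w = {w} mil -> Z0 = {z0:.1f} ohm, like-signal spacing per the rule: "
          f"{(3.0 if w == 5.0 else 2.5) * w:.0f} mil")
```

Wider traces lower the impedance, so the 5 mil and 6 mil options call for slightly different dielectric heights to hit the same 50Ω target; the spacing rule above happens to give 15 mil in both cases.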
The crosstalk issue becomes more severe, especially in HDI PCBs, when traces run at high frequencies with fast edge rates. To minimize coupling from the aggressor to the victim, the spacing between two adjacent signal traces should be at least 2 times, and typically 3 times, the trace width. However, such large trace spacing is difficult to implement due to PCB space constraints.
To obtain the best performance, a layout designer needs to carry out board simulation for signal integrity and crosstalk. Designers should ensure that optimal termination values, signal topology, and trace lengths are determined through simulation for each signal group in the memory implementation. To learn how to get rid of crosstalk issues in HDI PCBs, read our article How to Avoid Crosstalk in HDI Substrate?
General design guidelines for PCB
- Plan the board stack-up.
- Plan the trace width and spacing between traces.
- Plan the component placement carefully, such as memory ICs in daisy-chain topology, and place termination resistors, filter capacitors, etc. as per the guidelines.
- Plan the ground and power planes as per the digital and analog components.
- Measure the lengths of nets/signals of byte lanes.
- Import the correct IBIS (I/O Buffer Information Specification) model provided by the IC manufacturer.
- Place the components and power planes as planned, route the critical single-ended and differential signals (clock/strobe) first, and measure the lengths of the first and last bits of each bus for length matching.
- Add delays in lines wherever necessary using additional serpentine traces.
- Measure the via impedance.
- Designers need to work back and forth between layout and simulation while doing the routing.
- Verify compliance with setup/hold times/skew/delays.
- Check crosstalk between the signals.
- Do not compromise on signal integrity; it is the key to optimum performance.