The FIFO

Posted by: repair  :  Category: Memory
The memory devices discussed thus far are essentially linear arrays of bits surrounded by a minimal quantity of interface logic to move bits between the port(s) and the array. First-in-first-out (FIFO) memories are special-purpose devices that implement a basic queue structure that has broad application in computer and communications architecture. Unlike other memory devices, a typical FIFO has two unidirectional ports without address inputs: one for writing and another for reading. As the name implies, the first data written is the first read, and the last data written is the last read. A FIFO is not a random access memory but a sequential access memory. Therefore, unlike a conventional memory, once a data element has been read once, it cannot be read again, because the next read will return the next data element written to the FIFO.
By their nature, FIFOs are subject to overflow and underflow conditions. Their finite size, often referred to as depth, means that they can fill up if reads do not occur to empty data that has already been written. An overflow occurs when an attempt is made to write new data to a full FIFO. Similarly, an empty FIFO has no data to provide on a read request, which results in an underflow.
A FIFO is created by surrounding a dual-port memory array (generally SRAM, but DRAM could be made to work as well for certain applications) with a write pointer, a read pointer, and control logic, as shown in Fig. 4.18.

FIGURE 4.18 Basic FIFO architecture.

A FIFO is not addressed in a linear fashion; rather, it is made to form a continuous ring of memory that is addressed by the two internal pointers. The fullness of the FIFO is determined not by the absolute values of the pointers but by their relative values. An empty FIFO begins with its read and write pointers set to the same value. As entries are written, the write pointer increments. As entries are read, the read pointer increments as well. If the read pointer ever catches up to the write pointer such that the two match, the FIFO is empty again. If the read pointer fails to advance, the write pointer will eventually wrap around the end of the memory array and become equal to the read pointer. At this point, the FIFO is full and cannot accept any more data until reading resumes. Full and empty flags are generated by the FIFO to provide status to the writing and reading logic. Some FIFOs contain more detailed fullness status, such as signals that represent programmable fullness thresholds.
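The pointer behavior described above can be sketched as a small software model. This is only an illustration, not a description of any particular device: the class name `RingFifo` and its methods are hypothetical. Because equal pointers alone cannot distinguish full from empty, the sketch assumes one common approach, letting each pointer count through twice the depth so that an extra wrap indication is available.

```python
class RingFifo:
    """Software model of a FIFO: a memory array addressed by two pointers."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        # Pointers count modulo 2*depth (assumption): the extra wrap
        # information tells a full FIFO apart from an empty one.
        self.wr = 0
        self.rd = 0

    def fill_level(self):
        # Fullness depends only on the relative values of the pointers.
        return (self.wr - self.rd) % (2 * self.depth)

    @property
    def empty(self):
        return self.wr == self.rd

    @property
    def full(self):
        return self.fill_level() == self.depth

    def write(self, value):
        if self.full:
            raise OverflowError("write to a full FIFO")
        self.mem[self.wr % self.depth] = value
        self.wr = (self.wr + 1) % (2 * self.depth)

    def read(self):
        if self.empty:
            raise IndexError("read from an empty FIFO (underflow)")
        value = self.mem[self.rd % self.depth]
        self.rd = (self.rd + 1) % (2 * self.depth)
        return value
```

Writing `depth` entries without reading sets the full flag; reading them all back returns them in the order written and leaves the FIFO empty.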
The interfaces of a FIFO can be asynchronous (no clock) or synchronous (with a clock). If synchronous, the two ports can be designed to operate with a common clock or different clocks. Although older asynchronous FIFOs are still manufactured, synchronous FIFOs are now more common. Synchronous FIFOs have the advantage of improved interface timing, because flops placed at a device's inputs and outputs reduce timing requirements to the familiar setup, hold, and clock-to-out specifications. Without such a registered interface, timing specifications become a function of the device's internal logic paths.
One common role that a FIFO fills is in clock domain crossing. In such an application, there is a need to communicate a series of data values from a block of logic operating on one clock to another block operating on a different clock. Exchanging data between clock domains requires special attention, because there is normally no way to perform a conventional timing analysis across two different clocks to guarantee adequate setup and hold times at the destination flops. Either an asynchronous FIFO or a dual-clock synchronous FIFO can be used to solve this problem, as shown in Fig. 4.19.
The dual-port memory at the heart of the FIFO is an asynchronous element that can be accessed by the logic operating in either clock domain. A dual-clock synchronous FIFO is designed to handle arbitrary differences in the clocks between the two halves of the device. When one or more bytes are written on clock A, the write-pointer information is carried safely across to the clock B domain within the FIFO via inter-clock domain synchronization logic. This enables the read-control interface to determine that there is data waiting to be read. Logic on clock B can read this data long after it has been safely written into the memory array and allowed to settle to a stable state.
Another common application for a FIFO is rate matching, where a particular data source is bursty and the data consumer accepts data at a more regular rate. One example is a situation where a sequence of data is stored in DRAM and needs to be read out and sent over a communications interface one byte at a time. The DRAM is shared with a CPU that competes with the communications interface for memory bandwidth. It is known that DRAMs are most efficient when operated in a page-mode burst. Therefore, rather than perform a complete read transaction each time a single byte is needed for the communications interface, a burst of data can be read and stored in a FIFO. Each time the interface is ready for a new byte, it reads it from the FIFO. In this case, only a single-clock FIFO is required, because these devices operate on a common clock domain. To keep this process running smoothly, control logic is needed to watch the state of the FIFO and perform a new burst read from DRAM when the FIFO begins to run low on data. This scheme is illustrated in Fig. 4.20.

FIGURE 4.19 Clock domain crossing with synchronous FIFO.

For data-rate matching to work properly, the average bandwidth over time of the input and output ports of the FIFO must be equal, because FIFO capacity is finite. If data is continuously written faster than it can be read, the FIFO will eventually overflow and lose data. Conversely, if data is continuously read faster than it can be written, the FIFO will underflow and cause invalid bytes to be inserted into the outgoing data stream. The depth of a FIFO indicates how large a read/write rate disparity can be tolerated without data loss. This disparity is expressed as the product of rate mismatch and time. A small mismatch can be tolerated for a longer time, and a greater rate disparity can be tolerated for a shorter time.
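The product relationship between rate mismatch and time can be made concrete with a short calculation. The function name, parameters, and the numbers in the usage note are illustrative assumptions, not values from the text.

```python
def time_to_overflow(depth, write_rate, read_rate, initial_fill=0):
    """Seconds until overflow under a sustained mismatch.

    depth and initial_fill are in bytes; rates are in bytes per second.
    Returns infinity when the reader keeps up with the writer.
    """
    mismatch = write_rate - read_rate
    if mismatch <= 0:
        return float("inf")
    # Free space divided by the net fill rate: mismatch * time = free space.
    return (depth - initial_fill) / mismatch
```

For an assumed 1,024-byte FIFO that starts empty, a mismatch of 1,000 bytes per second can be absorbed for about one second, whereas a mismatch of 1,000,000 bytes per second fills the same FIFO in about one millisecond.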
In the rate-matching example, a large rate disparity of brief duration is balanced by a small rate disparity of longer duration. When the DRAM is read, a burst of data is suddenly written into the FIFO, creating a temporarily large rate disparity. Over time, the communications interface reads one byte at a time while no writes are taking place, thereby compensating with a small disparity over time.
DRAM reads to refill the FIFO must be carefully timed to simultaneously prevent overflow and underflow conditions. A threshold of FIFO fullness needs to be established below which a DRAM read is triggered. This threshold must guarantee that there is sufficient space available in the FIFO to accept a full DRAM burst, avoiding an overflow. It must also guarantee that under the worst-case response time of the DRAM, enough data exists in the FIFO to satisfy the communications interface, avoiding an underflow. In most systems, the time between issuing a DRAM read request and actually getting the data is variable. This variability is due to contention with other requesters (e.g., the CPU) and waiting for overhead operations (e.g., refresh) to complete.
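The two guarantees on the refill threshold can be written as a pair of bounds. The sketch below is a simplified model under stated assumptions: the consumer drains at a constant rate, the burst arrives all at once, and the overflow bound conservatively ignores any bytes drained while waiting for the DRAM. The function and parameter names are hypothetical.

```python
import math

def refill_threshold_range(depth, burst, drain_rate, worst_latency):
    """Return (lowest, highest) fill levels at which a refill may be requested.

    depth and burst are in bytes, drain_rate in bytes per microsecond,
    worst_latency in microseconds.
    """
    # Underflow bound: enough data must remain to feed the consumer
    # for the whole worst-case DRAM response time.
    lowest = math.ceil(drain_rate * worst_latency)
    # Overflow bound: the free space must hold one complete burst.
    highest = depth - burst
    if lowest > highest:
        raise ValueError("FIFO too shallow for this burst size and latency")
    return lowest, highest
```

With an assumed 64-byte FIFO, 32-byte bursts, a drain rate of 1 byte per microsecond, and a worst-case latency of 10 microseconds, any threshold from 10 to 32 bytes satisfies both conditions. If no such range exists, the FIFO must be made deeper or the burst shorter.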

FIGURE 4.20 Synchronous FIFO application: data rate matching.

By : E-book Complete_Digital_Design

Multiport Memory

Most memory devices, whether volatile or nonvolatile, contain a single interface through which their contents are accessed. In the context of a basic computer system with a single microprocessor, this single-port architecture is well suited. There are some architectures in which multiple microprocessors or logic blocks require access to the same shared pool of memory. A shared pool of memory can be constructed in a couple of ways. First, conventional DRAM or SRAM can be combined with external logic that takes requests from separate entities (e.g., microprocessors) and arbitrates access to one requester at a time. When the shared memory pool is large, and when simultaneous access by multiple requesters is not required, arbitration can be an efficient mechanism. However, the complexity of arbitration logic may be excessive for small shared-memory pools, and arbitration does not enable simultaneous access. A means of sharing memory without arbitration logic and with simultaneous access capability is to construct a true multiport memory element.
A multiport memory provides simultaneous access to multiple external entities. Each port may be read/write capable, read-only, or write-only depending on the implementation and application. Multiport memories are generally kept relatively small, because their complexity, and hence their cost, increases significantly as additional ports are added, each with its own decode and control logic. Most multiport memories are dual-port elements as shown in Fig. 4.16.
A true dual-port memory places no restrictions on either port's transactions at any given time. It is the responsibility of the engineer to ensure that one requester does not conflict with the other. Conflicts arise when one requester writes a memory location while the other is either reading or writing that same location. If a simultaneous read/write occurs, what data does the reader see? Is it the data before or after the write? Likewise, if two writes proceed at the same time, which one wins? While these riddles could be worked out for specific applications with custom logic, it is safer not to worry about such corner cases. Instead, the system design should avoid such conflicts unless there is a strong reason to the contrary.
One common application of a dual-port memory is sharing information between two microprocessors as shown in Fig. 4.17. A dual-port memory sits between the microprocessors and can be partitioned into a separate message bin, or memory area, for each side. Bin A contains messages written by CPU A and read by CPU B. Bin B contains messages written by CPU B and read by CPU A. Notification of a waiting message is accomplished via a CPU interrupt, thereby releasing the CPUs from having to constantly poll the memory as they wait for messages to arrive. The entire process might work as follows:
1. CPU A writes a message for CPU B into Bin A.
2. CPU A asserts an interrupt to CPU B indicating that a message is waiting in Bin A.
3. CPU B reads the message in Bin A.
4. CPU B acknowledges the interrupt from CPU A.
5. CPU A releases the interrupt to CPU B.
An implementation like this prevents dual-port memory conflicts because one CPU will not read a message before it has been fully written by the other CPU and neither CPU writes to both bins.
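The five-step handshake can be modeled in a few lines to show why reader and writer never touch a bin at the same time. This is a software analogy only: the `Mailbox` class is hypothetical, and a boolean stands in for the interrupt line.

```python
class Mailbox:
    """One direction of the scheme: a message bin plus an interrupt line."""

    def __init__(self):
        self.bin = None
        self.interrupt = False

    def send(self, message):
        # The writer may not reuse the bin until the last message is acknowledged.
        if self.interrupt:
            raise RuntimeError("previous message not yet acknowledged")
        self.bin = message       # step 1: write the message into the bin
        self.interrupt = True    # step 2: assert the interrupt to the reader

    def receive(self):
        if not self.interrupt:
            raise RuntimeError("no message waiting")
        message = self.bin       # step 3: read the message from the bin
        self.interrupt = False   # steps 4 and 5: acknowledge, then release
        return message


bin_a = Mailbox()   # written by CPU A, read by CPU B
bin_b = Mailbox()   # written by CPU B, read by CPU A
bin_a.send("ping")
reply = bin_a.receive().upper()
bin_b.send(reply)
```

Each bin has exactly one writer and one reader, and the interrupt flag orders their accesses, which mirrors the conflict-avoidance argument above.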

FIGURE 4.16 Dual-port memory.


FIGURE 4.17 Dual microprocessor message passing architecture.


Asynchronous DRAM

SRAM may be the easiest volatile memory to use, but it is not the least expensive in significant densities. Each bit of memory requires between four and six transistors. When millions or billions of bits are required, the complexity of all those transistors becomes substantial. Dynamic RAM, or DRAM, takes advantage of a very simple yet fragile storage component: the capacitor. A capacitor holds an electrical charge for a limited amount of time as the charge gradually drains away. As seen from EPROM and flash devices, capacitors can be made to hold charge almost indefinitely, but the penalty for doing so is significant complexity in modifying the storage element. Volatile memory must be both quick to access and not be subject to write-cycle limitations, both of which are restrictions of nonvolatile memory technologies. When a capacitor is designed to have its charge quickly and easily manipulated, the downside of rapid discharge emerges. A very efficient volatile storage element can be created with a capacitor and a single transistor as shown in Fig. 4.9, but that capacitor loses its contents soon after being charged. This is where the term dynamic comes from in DRAM: the memory cell is indeed dynamic under steady-state conditions. The solution to this problem of solid-state amnesia is to periodically refresh, or update, each DRAM bit before it completely loses its charge.

FIGURE 4.9 DRAM bit structure.

As with SRAM, the pass transistor enables both reading and writing the state of the storage element. However, a single capacitor takes the place of a multitransistor latch. This significant reduction in bit complexity enables much higher densities and lower per-bit costs when memory is implemented in DRAM rather than SRAM. This is why main memory in most computers is implemented using DRAM. The trade-off for cheaper DRAM is a degree of increased complexity in the memory control logic. The number one requirement when using DRAM is periodic refresh to maintain the contents of the memory.
DRAM is implemented as an array of bits with rows and columns as shown in Fig. 4.10. Unlike SRAM, EPROM, and flash, DRAM functionality from an external perspective is closely tied to its row and column organization.
SRAM is accessed by presenting the complete address simultaneously. A DRAM address is presented in two parts: a row and a column address. The row and column addresses are multiplexed onto the same set of address pins to reduce package size and cost. First the row address is loaded, or strobed, into the row address latch via row address strobe, or RAS*, followed by the column address with column address strobe, or CAS*. Read data propagates to the output after a specified access time. Write data is presented at the same time as the column address, because it is the column strobe that actually triggers the transaction, whether read or write. It is during the column address phase that WE* and OE* take effect.
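The two-part addressing can be illustrated by splitting a flat address into the row and column values that a controller would drive onto the shared address pins. The geometry below (4,096 rows by 2,048 columns) and the choice to use the upper bits as the row are assumptions made for the example; real devices and controllers may map the bits differently.

```python
ROW_BITS = 12   # 4,096 rows (assumed geometry)
COL_BITS = 11   # 2,048 columns (assumed geometry)

def split_address(addr):
    """Split a flat word address into (row, column) for multiplexed pins."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    row = addr >> COL_BITS               # presented first, latched by RAS*
    col = addr & ((1 << COL_BITS) - 1)   # presented second, latched by CAS*
    return row, col
```

With this mapping, consecutive addresses share a row and differ only in the column, so only the column value has to change between neighboring accesses.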

FIGURE 4.10 DRAM architecture.

Sense amplifiers on the chip are necessary to detect the minute charges that are held in the DRAM's capacitors. These amplifiers are also used to assist in refresh operations. It is the memory controller's responsibility to maintain a refresh timer and initiate refresh operations with sufficient frequency to guarantee data integrity. Rather than refreshing each bit separately, an entire row is refreshed at the same time. An internal refresh counter increments after each refresh so that all rows, and therefore all bits, will be cycled through in order. When a refresh begins, the refresh counter enables a particular memory row. The contents of the row are detected by the sense amplifiers and then driven back into the bit array to recharge all the capacitors in that row. Modern DRAMs typically require a complete refresh every 64 ms. A 64-Mb DRAM organized as 8,388,608 words × 8 bits (8 MB) with an internal array size of 4,096 × 2,048 bytes would require 4,096 refresh cycles every 64 ms. Refresh cycles need not be evenly spaced in time but are often spread uniformly for simplicity.
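When refresh cycles are spread uniformly, the spacing between them follows directly from the figures above. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def refresh_interval_us(rows, period_ms=64):
    """Microseconds between row refreshes when they are evenly spaced."""
    return period_ms * 1000 / rows

interval = refresh_interval_us(4096)   # 64 ms spread over 4,096 rows
```

For the 4,096-row example, the controller's refresh timer would fire every 15.625 microseconds.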
The complexity of performing refresh is well worth the trouble because of the substantial cost and density improvements over SRAM. One downside of DRAM that can only be partially compensated for is its slower access time. A combination of its multiplexed row and column addressing scheme plus its large memory arrays with complex sense and decode logic makes DRAM significantly slower than SRAM. Mainstream computing systems deal with this speed problem by implementing SRAM-based cache mechanisms whereby small chunks of memory are prefetched into fast SRAM so that the microprocessor does not have to wait as long for new data that it requests.
Asynchronous DRAM was the prevailing DRAM technology until the late 1990s, when synchronous DRAM, or SDRAM, emerged as the dominant solution to main memory. At its heart, SDRAM works very much like DRAM but with a synchronous bus interface that enables faster memory transactions. It is useful to explore how older asynchronous DRAM works so as to understand SDRAM. SDRAM will be covered in detail later in the book.
RAS* and CAS* are the two main DRAM control signals. They not only tell the DRAM chip which address is currently being asserted, they also initiate refresh cycles and accelerate sequential transactions to increase performance. A basic DRAM read works as shown in Fig. 4.11. CE* and OE* are both assumed to be held active (low) throughout the transaction.
A transaction begins by asserting RAS* to load the row address. The strobes are falling-edge sensitive, meaning that the address is loaded on the falling edge of the strobe, sometime after which the address may change. Asynchronous DRAMs are known for their myriad detailed timing requirements. Every signal's timing relative to itself and other signals is specified in great detail, and these parameters must be obeyed for reliable operation. RAS* is kept low for the duration of the transaction. Assertion of CAS* loads the column address into the DRAM as well as the read or write status of the transaction. Some time later, the read data is made available on the data bus. After waiting for a sufficient time for the DRAM to return the read data, the memory controller removes RAS* and CAS* to terminate the transaction.

FIGURE 4.11 Basic DRAM read (CE* = 0, OE* = 0).

Basic writes are similar to single reads as shown in Fig. 4.12. Again, CE* is assumed to be held active, and, being a write, OE* is assumed to be held inactive throughout the transaction.
Like a read, the write transaction begins by loading the row address. From this it is apparent that there is no particular link between loading a row address and performing a read or a write. The identity of the transaction is linked to the falling edge of CAS*, when WE* is asserted at about the same time that the column address and write data are asserted. DRAM chips require a certain setup and hold time for these signals around the falling edge of CAS*. Once the timing requirements are met, address can be deasserted prior to the rising edge of CAS*.
A read/write hybrid transaction, called a read-modify-write, is also supported to improve the efficiency of the memory subsystem. In a read-modify-write, the microprocessor fetches a word from memory, performs a quick modification to it, and then writes it back as part of the same original transaction. This is an atomic operation, because it functions as an indivisible unit and cannot be interrupted. Figure 4.13 shows the timing for the read-modify-write. Note that CAS* is held for a longer period of time, during which the microprocessor may process the read data before asserting WE* along with the new data to be written.
Original DRAMs were fairly slow. This was partly because older silicon processes limited the decode time of millions of internal addresses. It was also a result of the fact that accessing a single location required a time-consuming sequence of RAS* followed by CAS*. In comparison, an SRAM is quick and easy: assert the address in one step and grab the data. DRAM went through an architectural evolution that replaced the original devices with fast-page mode (FPM) devices that allow more efficient accesses to sequential memory locations. FPM DRAMs provide a substantial increase in usable memory bandwidth for the most common DRAM application: CPU memory. These devices take advantage of the tendency of a microprocessor's memory transactions to be sequential in nature.

FIGURE 4.12 Basic DRAM write (CE* = 0, OE* = 1).


FIGURE 4.13 Read-modify-write transaction.

Software does occasionally branch back and forth in its memory space. Yet, on the whole, software moves through portions of memory in a linear fashion. FPM devices enable a DRAM controller to load a row address in the normal manner using RAS* and then perform multiple CAS* transactions using the same row address. Therefore, DRAMs end their transaction cycles with the rising edge of RAS*, because they cannot be sure if more reads or writes are coming until RAS* rises, indicating that the current row address can be released.
FPM technology, in turn, gave way to extended-data out (EDO) devices that extend the time read data is held valid. Unlike its predecessors, an EDO DRAM does not disable the read data when CAS* rises. Instead, it waits until either the transaction is complete (RAS* rises), OE* is deasserted, or until CAS* begins a new page-mode access. While FPM and EDO DRAMs are distinct types of devices, EDO combines the page-mode features of FPM and consequently became more attractive to use. The following timing discussion uses EDO functionality as the example.
Page-mode transactions hold RAS* active and cycle CAS* multiple times to perform reads and writes as shown in Figs. 4.14 and 4.15. Each successive CAS* falling edge loads a new column address and causes either a read or write to be performed. In the read case, EDO's benefit can be properly observed. Rather than read data being removed when CAS* rises, it remains asserted until just after the next falling edge of CAS* or the rising edge of RAS* that terminates the page-mode transaction.

FIGURE 4.14 Page-mode reads.


FIGURE 4.15 Page-mode writes.

There are some practical limits to the duration of a page-mode transaction. First, there is an absolute maximum time during which RAS* can remain asserted. The durations of RAS* and CAS* are closely specified to guarantee proper operation of the DRAM. Operating the DRAM with a minimum CAS* cycle time and a maximum RAS* assertion time will yield a practical limitation on the data burst that can be read or written without reloading a new row address. In reality, a common asynchronous DRAM can support over 1,000 back-to-back accesses for a given row address.
DRAM provides its best performance when operated in this manner. The longer the burst, the less overhead is experienced for each byte transferred, because the row-address setup time is amortized across each word in the burst. Cache subsystems on computers help manage the bursty nature of DRAM by swallowing a set of consecutive memory locations into a small SRAM cache where the microprocessor will then have easy access to them later without having to wait for a lengthy DRAM transaction to execute.
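The amortization argument can be checked with simple arithmetic. The timing values used below are placeholders chosen for illustration, not figures from any datasheet, and the function name is hypothetical.

```python
def ns_per_word(burst_len, row_overhead_ns, cas_cycle_ns):
    """Average time per word when one row access serves a whole burst."""
    total = row_overhead_ns + burst_len * cas_cycle_ns
    return total / burst_len

# Assumed timings: 60 ns of row overhead, 25 ns per CAS* cycle.
single = ns_per_word(1, 60, 25)      # one word pays the full row overhead
burst = ns_per_word(1000, 60, 25)    # overhead shared by 1,000 words
```

Under these assumed timings, a single access costs 85 ns per word, while a 1,000-word burst costs roughly 25.06 ns per word, which is close to the bare CAS* cycle time.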
The second practical limitation on page-mode transactions, and all DRAM transactions in general, is refresh overhead. The DRAM controller must be smart enough to execute periodic refresh operations at the required frequency. Even if the microprocessor is requesting more data, refresh must take priority to maintain memory integrity. At any given instant in time, a scheduled refresh operation may be delayed slightly to accommodate a CPU request, but not to the point where the controller falls behind and fails to execute the required number of refresh operations. There are a variety of ways to initiate a refresh operation, but most involve so-called CAS-before-RAS signaling, where the normal sequence of the address strobes is reversed to signal a refresh. Asserting CAS* before RAS* signals the DRAM's internal control logic to perform a row refresh at the specific row indicated by its internal counter. Following this operation, the refresh counter is incremented in preparation for the next refresh event.
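The role of the internal refresh counter can be shown with a tiny model. It captures only the bookkeeping described above, namely which row each CAS-before-RAS cycle refreshes; the class and method names are hypothetical, and no electrical behavior is modeled.

```python
class RefreshCounter:
    """Model of the DRAM's internal row counter used during refresh."""

    def __init__(self, rows):
        self.rows = rows
        self.next_row = 0

    def cas_before_ras(self):
        """Perform one refresh cycle and return the row that was refreshed."""
        row = self.next_row
        # The counter advances after each refresh and wraps at the last row,
        # so every row is visited once per full pass.
        self.next_row = (self.next_row + 1) % self.rows
        return row
```

Because the row is chosen inside the DRAM, the controller in this model supplies no address for a refresh; it only has to issue enough cycles within the refresh period.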
DRAM has numerous advantages over SRAM, but at the price of increased controller complexity and decreased performance in certain applications. DRAMs use multiplexed address buses, which saves pins and enables smaller, less expensive packaging and circuit board wiring. Most DRAMs are manufactured with data bus widths smaller than what is actually used in a computer to save pins. For example, when most computers used 8- or 16-bit data buses, most DRAMs were 1 bit wide. When microprocessors grew to 32- and 64-bit data buses, mainstream DRAMs grew to 4- and then 8-bit widths. This is in contrast to SRAMs, which have generally been offered with wide buses, starting out at 4 bits and then increasing to 72 bits in more modern devices. This width disparity is why most DRAM implementations in computers involve groups of four, eight, or more DRAMs on a single module. In the 1980s, eight 64k × 1 DRAMs created a 64 kB memory array. Today, eight 32M × 8 DRAMs create a 256 MB memory array that is 64 bits wide to suit the high-bandwidth 32- or 64-bit microprocessor in your desktop PC.
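The module arithmetic in the two examples can be reproduced directly. The helper below is a hypothetical illustration that assumes all chips on the module are identical and are placed side by side to widen the data bus.

```python
def module_geometry(chips, words_per_chip, bits_per_chip):
    """Return (bus width in bits, capacity in bytes) for chips used in parallel."""
    bus_width = chips * bits_per_chip
    capacity_bytes = chips * words_per_chip * bits_per_chip // 8
    return bus_width, capacity_bytes

K = 1024
M = 1024 * 1024
old_module = module_geometry(8, 64 * K, 1)   # eight 64k x 1 devices
new_module = module_geometry(8, 32 * M, 8)   # eight 32M x 8 devices
```

The first case yields an 8-bit-wide, 64 kB array and the second a 64-bit-wide, 256 MB array, matching the figures above.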
A key architectural attribute of DRAM is its inherent preference for sequential transactions and, accordingly, its weakness in handling random single transactions. Because of their dense silicon structures and multiplexed address architecture, DRAMs have evolved to provide low-cost bulk memory best suited to burst transactions. The overhead of starting a burst transaction can be negligible when spread across many individual memory words in a burst. However, applications that are not well suited to long bursts may not do very well with DRAM because of the constant startup penalty involved in fetching 1 word versus 1,000 words. Such applications may work better with SRAM. Planning memory architecture involves making these trade-offs between density/cost and performance.