By Margaret Piemonte [PC Magazine]
Memory: We know we have it, and we know we need it. We've heard that the more a PC has, the better its performance will be. This section will explore the different types of main memory (called DRAM, or dynamic random access memory) and processor cache memory (called Level 2 or L2 cache) available for a PC today. By varying the amounts and types of DRAM and L2 cache in your system, you can get different performance returns.
As CPU speeds increase, so does the need for faster system components. Traditionally, the memory bus runs much slower than the CPU. But with Intel's newest PCI chip sets, the 430HX and 430VX, faster memory technologies can be implemented, closing the gap between the speed of the memory bus and the speed of the CPU.
Fast-page-mode (or FPM) DRAM used to be standard issue on mainstream PCs. But the market has recently seen a surge in the availability of newer and faster memory types, which have all but succeeded in displacing FPM DRAM as the memory of choice. In fact, in our last roundup of 101 Pentium-class PCs ("Pentium Classic: Still the One," June 25, 1996), only three PCs used FPM DRAM.
An FPM-memory read access begins with the activation of a row in the DRAM array, followed by the activation of the column containing the memory address that holds the data you want. Each piece of information must be validated, and the data must then be latched back to the system. Once the correct piece of information has been found, the column deactivates and gets ready for the next cycle. This introduces a wait state, because nothing happens while the column is deactivating (the CPU must wait for the memory to complete the cycle). The data output buffer is turned off until the next cycle begins or the next piece of information is requested. In fast page mode, the next column in the row activates in anticipation that the next piece of data you need is in the memory location adjacent to the previous one. This activation of the next column works well only with sequential reads from a given row of memory.
Ideally, a read from 50-nanosecond FPM memory can achieve a burst cycle timing as fast as 6-3-3-3 (6 clocks for the first data element, followed by 3 clocks each for the next three data elements). The first phase includes the overhead created by activating the row and column. Once they have been activated, the memory can transfer the data in as few as three clock cycles per piece of data.
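The burst arithmetic above is easy to check. The sketch below is illustrative only (it is not from the article); it simply totals the quoted 6-3-3-3 timing and converts it to nanoseconds on the 66-MHz bus discussed later in the piece:

```python
# Illustrative arithmetic only: total clocks for a four-element burst
# read at the 6-3-3-3 timing quoted for fast-page-mode DRAM, and the
# time that takes on a 66-MHz memory bus.

def burst_clocks(timing):
    """Sum a burst timing such as (6, 3, 3, 3) into total bus clocks."""
    return sum(timing)

BUS_MHZ = 66
CLOCK_NS = 1000 / BUS_MHZ  # one bus clock in nanoseconds (~15.2 ns)

fpm = (6, 3, 3, 3)
total = burst_clocks(fpm)                  # 15 clocks for the full burst
print(total, round(total * CLOCK_NS, 1))   # 15 clocks, roughly 227.3 ns
```

So a full four-element FPM burst costs 15 bus clocks: the 6-clock row/column setup dominates, and each subsequent element adds 3 more.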
Extended data out (EDO, sometimes called hyper-page-mode) DRAM and burst EDO (BEDO) DRAM are two memory technologies based on the fundamentals of page-mode memory. EDO was introduced into mainstream PCs about a year ago and has since become the main memory of choice for many system vendors. BEDO is relatively new and has not yet caught the market's attention to the extent that EDO has.
EDO works much like FPM DRAM: A row of memory is activated, and then the column is activated. But when the piece of information is found, instead of deactivating the column and turning off the output buffer (which is what FPM DRAM does), EDO memory keeps the output data buffer on until the next column access or next read cycle begins. By keeping the buffer on, EDO eliminates wait states, and burst transfers happen more quickly.
EDO also enjoys a faster ideal burst read cycle timing than FPM DRAM: 6-2-2-2 versus FPM's 6-3-3-3. This ultimately saves three clock cycles in a burst of four data elements from DRAM on a 66-MHz bus. EDO is easy to implement, and with virtually no price difference between fast page mode and EDO, there is no reason not to choose EDO.
BEDO DRAM improves cycle times over FPM much more than EDO does. Since most PC applications access memory in bursts of four data elements to fill a cache line (system memory bursts its data into L2 cache, or to the CPU in the absence of L2 cache), once the first address is known, the DRAM can quickly supply the next three. The essential enhancement BEDO offers is an address counter on the chip that keeps track of those next addresses.
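The idea behind that on-chip counter can be sketched in a few lines. This is a toy software model, not a description of the actual silicon; the class and method names are invented for illustration:

```python
# Toy sketch of BEDO's on-chip burst address counter: the chip set
# supplies only the first column address, and the DRAM generates the
# remaining addresses of the burst internally, so no further address
# decode is needed mid-burst. Purely illustrative, not a hardware model.

class BurstCounter:
    def __init__(self, burst_length=4):
        # PC cache linefills move four data elements per burst.
        self.burst_length = burst_length

    def addresses(self, first_column):
        # Increment internally from the one address the CPU provided.
        return [first_column + i for i in range(self.burst_length)]

counter = BurstCounter()
print(counter.addresses(0x40))  # four consecutive column addresses
```

Because the last three addresses never cross the memory bus, the last three elements of the burst need only a single clock apiece.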
BEDO also adds a pipelined stage that allows the page-access cycle to be divided into two components. For a memory read operation, the first component accesses the data from the memory array to the output stage (second latch), while the second component drives the data bus from this latch at the appropriate logic level. Since the data is already in the output buffer, faster access time is achieved. BEDO can achieve a maximum burst timing of 5-1-1-1 (with 52-ns BEDO and a 66-MHz bus), saving an additional three clocks over optimally designed EDO memory.
Intel's 430VX chip set supports a new type of memory technology called synchronous DRAM (SDRAM). A key feature of SDRAM is its ability to synchronize all operations with the processor clock signal. This makes the implementation of control interfaces easier, and it makes column (but not row) access time quicker. SDRAM includes an on-chip burst counter that can be used to increment column addresses for very fast burst accesses, similar to BEDO's. This means that SDRAM allows new memory accesses to be initiated before the preceding access has been completed.
SDRAM can achieve a burst timing of 5-1-1-1 with a 66-MHz bus in a well-designed and well-tuned PC. The SDRAM's burst length and latency are fully programmable via an on-chip mode register.
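Putting the four DRAM technologies side by side makes the progression clear. The figures below are the ideal burst timings quoted in the text; the script just does the comparison arithmetic:

```python
# Ideal four-element burst timings on a 66-MHz bus, as quoted in the
# article. The percentage-free comparison here is simple arithmetic
# on those published figures.

timings = {
    "FPM":   (6, 3, 3, 3),   # 15 clocks total
    "EDO":   (6, 2, 2, 2),   # 12 clocks total
    "BEDO":  (5, 1, 1, 1),   #  8 clocks total
    "SDRAM": (5, 1, 1, 1),   #  8 clocks total
}

fpm_total = sum(timings["FPM"])
for name, timing in timings.items():
    clocks = sum(timing)
    print(f"{name}: {clocks} clocks ({fpm_total - clocks} fewer than FPM)")
```

This matches the clock savings the article cites: EDO saves 3 clocks per burst over FPM, and BEDO or SDRAM saves a further 4.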
When we read about cache, we are usually reading about Level 2 cache, or external cache. L2 cache has been the domain of a very fast and expensive memory type called SRAM (static RAM) that holds data frequently used by the CPU so that the CPU doesn't have to rely solely on slower DRAM. Since fast types of DRAM are available, some vendors offer cacheless PCs to hit a lower price range. Through our testing, however, we've found that the performance levels achieved by cacheless PCs can't match the performance of a PC with L2 cache.
The simplest form of SRAM uses an asynchronous design, in which the CPU sends an address to the cache and the cache looks up the address, then returns the data. An extra cycle is required at the beginning of each access for the tag lookup. Thus, asynchronous cache's response time can be as fast as 3-2-2-2 on a 66-MHz bus, although 4-2-2-2 is much more common.
Synchronous cache buffers incoming addresses to spread the address-lookup routine over two or more clock cycles. SRAM stores the requested address in a register during the first clock cycle. During the second, it retrieves the data and delivers it. Since the address is stored in the register, synchronous SRAM can then receive the next data address internally while the CPU is reading the data from the previous request. Synchronous SRAM can then "burst" subsequent data elements without receiving and decoding additional addresses from the chip set. Response time can be reduced--optimally--to a 2-1-1-1 timing on a 66-MHz bus.
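The overlap that makes synchronous SRAM's 2-1-1-1 timing possible can be modeled as a simple two-stage pipeline. This is a toy sketch under the timings stated above, with invented function names, not a circuit-level description:

```python
# Toy two-stage pipeline sketch of synchronous SRAM burst behavior:
# while the CPU reads one data element, the SRAM is already latching
# the next address internally, so after the first access one element
# emerges per clock. Illustrative only.

def pipelined_burst(addresses):
    """Return (clock, address) pairs showing when each element arrives."""
    deliveries = []
    clock = 0
    for i, addr in enumerate(addresses):
        if i == 0:
            clock += 2   # first access: latch address, then fetch (the "2")
        else:
            clock += 1   # overlapped accesses: one per clock (the "1-1-1")
        deliveries.append((clock, addr))
    return deliveries

print(pipelined_burst([0, 1, 2, 3]))  # [(2, 0), (3, 1), (4, 2), (5, 3)]
```

The full four-element burst completes in 5 clocks, versus 9 or 10 for asynchronous SRAM's 3-2-2-2 or 4-2-2-2.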
Another type of synchronous SRAM is called pipelined burst. Pipelining essentially adds an output stage that buffers data reads from the memory locations so that subsequent memory reads are accessed quickly, without the latency incurred by traveling all the way into the memory array to get the next data element. Pipelining works most effectively with sequential access patterns, such as cache linefills.
Standardizing our test-bed was the key to consistent results. We used a Pentium/166 PC, from Dell Computer Corp., with an Intel 430FX chip set, a Seagate 2GB Fast ATA-2 hard disk, and a Number Nine Imagine 128 graphics card. To ensure a consistent set of test criteria, all of the memory used for our tests was supplied by a single company, Kingston Technology Corp. (800-337-8410; http://www.kingston.com).
The most significant change in performance in all the testing scenarios was the increase in Winstone 32 scores when a machine's EDO DRAM was raised from 8MB to 16MB. Conversely, there was little or no change in CPUmark32 scores in the same scenario, because CPUmark32 stresses the speed of the CPU, L2 cache, and memory rather than the amount of memory installed.
The numbers also show little performance gain in Winstone 32 scores when comparing fast-page-mode DRAM to EDO with any amount or type of cache. When there is no L2 cache, however, EDO shows its biggest performance gain over FPM: 15 percent with 8MB and 18 percent with 16MB.
For a machine with 16MB of EDO DRAM and no L2 cache, a big performance boost was achieved by adding 256K of pipelined-burst L2 cache: 47 percent on CPUmark32, and 36 percent on Winstone 96. In general, adding cache or changing from asynchronous to synchronous pipelined-burst cache will result in a bigger increase in CPU/memory performance (as measured by CPUmark32) than in overall system performance (as measured in Winstone 32).
Copyright © 1996 Ziff-Davis Publishing Co. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff-Davis Publishing Company is prohibited.