Direct Memory Access

Introduction

Direct Memory Access, or DMA, is an essential part of any modern computing architecture. DMA allows the CPU to offload intensive memory access tasks to other components. This frees the CPU from these menial chores and leaves more cycles for the complex tasks to which it is better suited.

For example, perhaps you would like to save this article on your hard drive for future reference. Once you’ve chosen to do this and you’ve chosen a location to save the article, is the CPU really necessary? Are you performing any calculations? Ignoring any progress indicators or file system updating for now, there’s really not much going on here. The data is received by your network card and is then routed to the desired location on your hard drive. Simple. The capabilities of the CPU would certainly be overkill for this type of operation.

Though the basic idea of, and motivation for, direct memory access is quite simple, operations that access memory directly can be quite complex. As you can imagine, when multiple devices (or peripherals) are all attempting to access memory locations, trouble can quickly ensue. This is why a DMA controller is required. The DMA controller is the device which controls all DMA operations.

To control the DMA operations, the DMA controller first needs to be set up with information related to the upcoming operation. This information includes things like the source and destination addresses, the mode, and the size of the data to be transferred. The DMA controller is then armed with the knowledge of what to transfer, where to transfer it to, how to do it, and how long to do it for. With this knowledge the DMA controller can then request control of the memory bus from the CPU. When the CPU is ready to relinquish control of the memory bus, it sends an acknowledge signal in response to the DMA controller’s request signal.
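The setup and handshake described above can be sketched as a toy simulation. This is only an illustrative model, not the programming interface of any real DMA controller: the names TransferDescriptor, Bus, and DmaController are my own assumptions, memory is a plain bytearray, and the whole transfer is done in one go once the bus is granted.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BURST = "burst"
    SINGLE_CYCLE = "single-cycle"


@dataclass
class TransferDescriptor:
    """Hypothetical record of what the controller is told before a transfer."""
    source: int        # address to start reading from
    destination: int   # address to start writing to
    size: int          # number of bytes to move
    mode: Mode         # how the bus should be held during the transfer


class Bus:
    """Toy memory bus that has exactly one owner at any time."""

    def __init__(self):
        self.owner = "cpu"

    def request(self, requester):
        # The CPU can only acknowledge a request while it holds the bus itself.
        if self.owner == "cpu":
            self.owner = requester
            return True    # acknowledge signal
        return False       # request not granted

    def release(self):
        self.owner = "cpu"


class DmaController:
    def __init__(self, memory, bus):
        self.memory = memory
        self.bus = bus
        self.descriptor = None

    def program(self, descriptor):
        # Step 1: the controller is set up with source, destination, size, mode.
        self.descriptor = descriptor

    def start(self):
        d = self.descriptor
        if d is None:
            raise RuntimeError("controller has not been programmed")
        # Step 2: request the bus and wait for the acknowledge.
        if not self.bus.request("dma"):
            return False
        try:
            # Step 3: move the data without involving the CPU.
            self.memory[d.destination:d.destination + d.size] = \
                self.memory[d.source:d.source + d.size]
        finally:
            # Step 4: hand the bus back to the CPU.
            self.bus.release()
        return True


memory = bytearray(64)
memory[0:4] = b"data"
bus = Bus()
dma = DmaController(memory, bus)
dma.program(TransferDescriptor(source=0, destination=32, size=4, mode=Mode.BURST))
completed = dma.start()
print(completed, bytes(memory[32:36]), bus.owner)
```

Note that the mode field is stored but not acted on here; the difference between the modes is the subject of the next section.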

Figure 1: Direct Memory Access interactions. Courtesy of www.eetimes.com

Burst or Single-cycle

What happens after the DMA controller gains control of the memory bus depends on the mode in which it has been instructed to operate. There are two general modes of operation for DMA controllers. The first of these is referred to as burst. When a DMA controller is operating in burst mode, it retains control of the memory bus for the duration of the memory transfer. The drawback of burst mode is that the CPU cannot access the memory bus until the DMA controller completes the transfer. While the CPU can still access its L1 and L2 caches, it cannot access any other memory; this limits what the CPU can accomplish and may leave it waiting until the DMA controller has completed the transfer and relinquished control of the memory bus.

To avoid the unfortunate situation where the CPU is left waiting for the memory transfer to complete, the DMA controller can operate in another mode called single-cycle. While operating in single-cycle mode, the DMA controller relinquishes control of the memory bus after transferring each block of memory. The size of this block is typically 256 or 512 bytes. This gives the CPU the opportunity to use the memory bus for its own purposes without having to wait for any significant amount of time. Unfortunately, there is a downside to operating in single-cycle mode. Each time the DMA controller relinquishes control of the memory bus, it must send a new request to the CPU and wait for an acknowledge signal before it can regain the bus and transfer the next block. Because this request/acknowledge sequence is repeated many times, the total time taken to complete the memory transfer increases.
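To make the trade-off between the two modes concrete, here is a rough back-of-the-envelope model. The cost figures are assumptions chosen purely for illustration (4 cycles per request/acknowledge exchange, 1 cycle per byte moved); real hardware will differ, and the function names are my own.

```python
import math


def handshakes(total_bytes, mode, block_size=256):
    """Number of request/acknowledge exchanges needed for one transfer."""
    if mode == "burst":
        return 1  # the bus is requested once and held until the end
    if mode == "single-cycle":
        return math.ceil(total_bytes / block_size)  # once per block
    raise ValueError(f"unknown mode: {mode}")


def transfer_time(total_bytes, mode, block_size=256,
                  handshake_cost=4, cycles_per_byte=1):
    """Total cycles: handshake overhead plus the cost of moving the data."""
    overhead = handshakes(total_bytes, mode, block_size) * handshake_cost
    return overhead + total_bytes * cycles_per_byte


def longest_cpu_wait(total_bytes, mode, block_size=256, cycles_per_byte=1):
    """Longest stretch during which the CPU is locked out of the bus."""
    if mode == "burst":
        return total_bytes * cycles_per_byte
    return min(block_size, total_bytes) * cycles_per_byte


size = 64 * 1024  # a 64 KiB transfer
for mode in ("burst", "single-cycle"):
    print(mode,
          handshakes(size, mode),
          transfer_time(size, mode),
          longest_cpu_wait(size, mode))
```

Under these assumptions single-cycle mode pays for many more handshakes, so the transfer takes longer overall, but the CPU is never locked out for more than one block at a time.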

The single-cycle operating mode is the most commonly used mode, though most DMA controllers are capable of both modes. Choosing the optimal block length, that is, how much the DMA controller transfers before relinquishing the bus and re-requesting it, is quite a complex matter. One major factor is the observed error rate. When errors are observed in the transfer (quite common when the data is transferred over a communications link), the affected data must be retransmitted. For this reason, when the error rate is high, smaller block lengths are optimal, since less data has to be resent for each error. However, when there are few errors, a small block length results in an increase in request/acknowledge cycles between the DMA controller and the CPU, thus increasing the total memory transfer time. How these decisions are implemented depends on each DMA controller manufacturer (or more precisely the engineer who designed it!) and is not widely published. If you’d like to know how this is determined for a specific DMA controller, you may be able to find it somewhere in the documentation; otherwise it might require a query to the company.
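The tension between error rate and handshake overhead can be illustrated with a deliberately simple model. This is my own sketch, not how any particular manufacturer does it. It assumes each byte is corrupted independently with a fixed probability, that a block containing any error is resent in full, and that every attempt costs one handshake (16 cycles, an arbitrary figure) plus one cycle per byte.

```python
import math


def expected_cycles(total_bytes, block_size, byte_error_rate, handshake_cost=16):
    """Expected cycles to move total_bytes, counting retransmitted blocks."""
    blocks = math.ceil(total_bytes / block_size)
    # Probability that a whole block arrives without a single bad byte.
    p_block_ok = (1.0 - byte_error_rate) ** block_size
    # A block is resent until it arrives intact, so the expected number of
    # attempts follows a geometric distribution: 1 / p_block_ok.
    attempts_per_block = 1.0 / p_block_ok
    return blocks * attempts_per_block * (handshake_cost + block_size)


def best_block_size(total_bytes, byte_error_rate, candidates):
    """Pick the candidate block size with the lowest expected cost."""
    return min(candidates,
               key=lambda size: expected_cycles(total_bytes, size, byte_error_rate))


CANDIDATES = [16, 32, 64, 128, 256, 512, 1024, 2048, 4096]
TOTAL = 1024 * 1024  # a 1 MiB transfer

noisy_link = best_block_size(TOTAL, 1e-3, CANDIDATES)  # one bad byte in a thousand
clean_link = best_block_size(TOTAL, 1e-6, CANDIDATES)  # one bad byte in a million
print("noisy link:", noisy_link, "bytes per block")
print("clean link:", clean_link, "bytes per block")
```

In this model the noisier link ends up with the smaller block, which matches the reasoning above: large blocks waste more on retransmission, while small blocks waste more on handshakes.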

When the DMA controller has completed the transfer of a block and relinquished control of the memory bus, the CPU can then access the bus for its own purposes. In the example I used above, and in many others, this gives the CPU an opportunity to update any progress indicators and to update the file system with any new information relevant to the DMA operation being performed (the existence of a new file, for instance).

Cache Coherency

Another issue which arises during DMA operations is cache coherency. When the CPU accesses a memory location, the value of that location is stored in the CPU’s cache. If a DMA operation then involves this memory location, the value in the CPU’s cache may no longer match the value at the true memory location. For further reading on cache coherency, view my previous article here.

There are two possible solutions to this problem. Systems that are fully cache coherent implement a hardware solution where the DMA controller sends a signal to the cache controller when it wishes to access a memory location. If the DMA controller wants to write to that location, the cache controller will invalidate the CPU’s cached value. If the DMA controller wishes to read the memory location, the cache controller will flush the CPU’s cache to ensure that the memory location contains the most up-to-date value (which would be the value in the CPU’s cache). This method of operation requires some overhead but guarantees the coherency of the CPU’s cache.
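The hardware approach can be sketched as a toy write-back cache that the DMA controller notifies before every access. The classes and method names here (WriteBackCache, SnoopingDma, snoop_dma_write, snoop_dma_read) are hypothetical and track single addresses rather than real cache lines; the point is only to show the invalidate-on-write and flush-on-read behaviour described above.

```python
class WriteBackCache:
    """Toy CPU cache: writes stay in the cache until they are flushed."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses changed in the cache but not in memory

    def cpu_read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # cache miss: fetch from memory
        return self.lines[addr]

    def cpu_write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def snoop_dma_write(self, addr):
        # DMA is about to overwrite memory: the cached copy becomes stale.
        self.lines.pop(addr, None)
        self.dirty.discard(addr)

    def snoop_dma_read(self, addr):
        # DMA is about to read memory: push the newest value out first.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)


class SnoopingDma:
    """Toy DMA controller that signals the cache before touching memory."""

    def __init__(self, memory, cache):
        self.memory = memory
        self.cache = cache

    def write(self, addr, value):
        self.cache.snoop_dma_write(addr)
        self.memory[addr] = value

    def read(self, addr):
        self.cache.snoop_dma_read(addr)
        return self.memory[addr]


memory = [0] * 8
cache = WriteBackCache(memory)
dma = SnoopingDma(memory, cache)

cache.cpu_read(3)                  # the CPU caches the old value at address 3
dma.write(3, 7)                    # the DMA write invalidates that cached copy
after_dma_write = cache.cpu_read(3)

cache.cpu_write(5, 9)              # the new value lives only in the cache
seen_by_dma = dma.read(5)          # the snoop flushes it before the DMA read
print(after_dma_write, seen_by_dma)
```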

In systems that are not fully cache coherent, the job of maintaining cache coherency is left to the operating system. The operating system must decide whether the cache should be flushed prior to a DMA operation or invalidated afterwards. Which method is better? I’m not sure there’s a definitive answer; both methods are adequate. However, I personally prefer the elegance of the hardware solution implemented in fully cache coherent systems.
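For comparison, here is a minimal sketch of the software approach, where the operating system does the cache maintenance itself around each DMA operation. The names are again hypothetical and this is not the API of any real kernel. The DMA side is reduced to plain list operations on memory, and the two helper functions play the part of the operating system.

```python
class Cache:
    """Toy write-back cache with explicit maintenance operations."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # address -> cached value
        self.dirty = set()  # addresses whose newest value is only in the cache

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self, start, length):
        # Write any dirty values in the range back to memory.
        for addr in range(start, start + length):
            if addr in self.dirty:
                self.memory[addr] = self.lines[addr]
                self.dirty.discard(addr)

    def invalidate(self, start, length):
        # Forget cached values in the range so the next read goes to memory.
        for addr in range(start, start + length):
            self.lines.pop(addr, None)
            self.dirty.discard(addr)


def dma_to_device(memory, cache, start, length):
    """A device reads a buffer from memory: flush the cache BEFORE the DMA."""
    cache.flush(start, length)
    return list(memory[start:start + length])


def dma_from_device(memory, cache, start, data):
    """A device writes a buffer into memory: invalidate the cache AFTER the DMA."""
    memory[start:start + len(data)] = data
    cache.invalidate(start, len(data))


memory = [0] * 8
cache = Cache(memory)

cache.write(0, 42)                 # memory[0] is still 0 at this point
sent = dma_to_device(memory, cache, 0, 2)

cache.read(4)                      # the CPU caches the old value at address 4
dma_from_device(memory, cache, 4, [5, 6])
received = cache.read(4)           # re-fetched from memory after the invalidate
print(sent, received)
```

If either maintenance call were left out, the device would send the stale 0 in the first case and the CPU would keep reading the stale 0 in the second.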