NetBSD Documentation: How SCSI DMA works

How SCSI DMA works

Preface
Notes on SCSI DMA
Case study 1: DECstation IOASIC
Case study 2: DEC3000 TCDS
Case study 3: Magnum 3000 RAMBO ASIC
Case study 4: SPARCstation LSI64854 ASIC
Case study 5: DEC3000 TC SG DMA
Case study 6: ARCS ASIC

Preface

Last night I compiled a short story about SCSI DMA machinery. It was originally written for the guy in the Southern Hemisphere at 41S 175E who is working with the R3000 Magnum, but I found it valuable for the general public as well. It's slightly incomplete and may be extended in the future. Enjoy.

Tohru Nishimura

Nara Institute of Science and Technology

Notes on SCSI DMA

Any DMA transfer imposes address constraints as well as minimum and maximum transfer sizes. A DMA transfer is done in blocks, and the transfer size must be a multiple of the block size. A DMA transaction might not be allowed to run across a particular address boundary, because some designs are not flexible enough to cover an arbitrary range of the entire address space with a given combination of block base address and block counter. Care must also be taken with fractional data smaller than the transfer block size: either the starting address or the ending address is quite likely not aligned to a block boundary. The fractional transfer is managed by a DMA pointer and special register(s) which hold the residue for the alignment fixup task.
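
The alignment arithmetic above can be sketched in C. This is only an illustration, assuming a power-of-two block size; the names `dma_split` and `dma_split_range` are hypothetical and do not come from any NetBSD driver. It splits a transfer range into the fractional head, the run of whole blocks, and the fractional tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how one transfer splits around block boundaries. */
struct dma_split {
	uintptr_t body_start;	/* first block-aligned address at or after start */
	size_t head;		/* fractional bytes before body_start */
	size_t body;		/* bytes covered by whole blocks */
	size_t tail;		/* fractional bytes after the last whole block */
};

static struct dma_split
dma_split_range(uintptr_t start, size_t len, size_t blksz)
{
	struct dma_split s;
	uintptr_t end = start + len;
	uintptr_t up;

	/* Assumption: the block size is a non-zero power of two. */
	assert(blksz != 0 && (blksz & (blksz - 1)) == 0);

	/* Round the starting address up to the next block boundary. */
	up = (start + blksz - 1) & ~(uintptr_t)(blksz - 1);
	if (up > end)
		up = end;	/* the transfer never reaches a boundary */

	s.body_start = up;
	s.head = (size_t)(up - start);
	s.body = (size_t)(end - up) & ~(blksz - 1);
	s.tail = (size_t)(end - up) - s.body;
	return s;
}
```

For example, a 30B transfer starting at 0x1006 with 8B blocks splits into a 2B head, 24B of whole blocks starting at 0x1008, and a 4B tail. The head and tail are the parts the special registers have to deal with.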

Push down to SCSI device

The starting address must be truncated and aligned to the nearest block boundary. The DMA pointer must be adjusted to indicate the correct starting address of the transfer. On the other hand, the transfer tail needs no special care, because the transfer counter in the SCSI controller chip stops the transaction when all of the outgoing data has been sent to the SCSI device.

Pull up from SCSI device

The starting address must be truncated and aligned to the nearest block boundary. The DMA pointer must be adjusted to indicate the correct starting address of the memory to be overwritten by SCSI data. It's impossible to know in advance how much SCSI data will be transferred from the SCSI device. Fractional data smaller than the block size is left unwritten to memory. The DMA channel's buffering store holds the residue, and the DMA driver must pick it up and write it to the correct memory address to complete the entire transaction.
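
The pull-up residue handling can be modelled in plain C. This is a software model, not driver code: `pullup_state`, `channel_receive` and `driver_fixup` are hypothetical names, the 8B block size is an arbitrary choice, and the destination is assumed to be block aligned. The "channel" writes whole blocks only and keeps the remainder in its buffering store; the "driver" then completes the transaction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSZ 8	/* hypothetical block size of the DMA channel */

struct pullup_state {
	uint8_t hold[BLKSZ];	/* channel buffering store */
	size_t nheld;		/* residue bytes left in hold[] */
	uint8_t *next;		/* memory address the residue belongs to */
};

/* Model of the DMA channel: whole blocks reach memory, the rest is held. */
static void
channel_receive(struct pullup_state *st, uint8_t *dst,
    const uint8_t *data, size_t len)
{
	size_t whole = len - (len % BLKSZ);

	memcpy(dst, data, whole);
	st->nheld = len - whole;
	memcpy(st->hold, data + whole, st->nheld);
	st->next = dst + whole;
}

/* Model of the driver fixup: write the held residue to its destination. */
static void
driver_fixup(struct pullup_state *st)
{
	memcpy(st->next, st->hold, st->nheld);
	st->next += st->nheld;
	st->nheld = 0;
}
```

With 13B arriving from the device, `channel_receive()` leaves 8B in memory and 5B held; only after `driver_fixup()` does memory contain all 13B.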

Case study 1: DECstation IOASIC

The DECstation IOASIC DMA channel imposes an 8B constraint on starting and ending addresses. Fractional data is managed with a pair of 32bit registers, SDR0 and SDR1, which are treated as concatenated to hold 8B of fractional data to be transferred. Fractions are counted in 2B quantities and indicated in the SCR register.

Push down to SCSI device

If the starting address is not aligned to an 8B boundary, SDR0 and SDR1 must hold the entire 8B block in question. SCR works as a DMA pointer to indicate which 2B chunk in the SDR0/SDR1 pair is the first data to be transferred. The starting address of the blocked DMA transfer is rounded up to the nearest 8B boundary and set in the DMAPTR register. An unaligned ending address doesn't matter, because the SCSI controller chip counts the total size of the transfer and stops the transaction when it completes.
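
A rough sketch of the push-down setup follows. It assumes a flat byte array standing in for memory, a 2B-aligned starting offset, and plain offsets instead of the real DMAPTR register encoding; `ioasic_setup` and `ioasic_pushdown_setup` are hypothetical names. It only shows which values the description above asks for.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register values for one push-down transfer. */
struct ioasic_setup {
	uint32_t sdr0, sdr1;	/* the 8B block containing the transfer head */
	uint32_t scr;		/* index (0..3) of the first 2B chunk to send */
	uint32_t dmaptr;	/* offset where blocked DMA starts */
	int use_sdr;		/* non-zero if SDR0/SDR1/SCR must be loaded */
};

static struct ioasic_setup
ioasic_pushdown_setup(const uint8_t *mem, uint32_t off)
{
	struct ioasic_setup s = { 0, 0, 0, 0, 0 };
	uint32_t blk;

	/* Assumption: fractions are 2B chunks, so the offset is even. */
	assert((off & 1) == 0);

	if ((off & 7) == 0) {
		s.dmaptr = off;		/* already aligned, no fixup */
		return s;
	}

	blk = off & ~7u;			/* block holding the head */
	memcpy(&s.sdr0, mem + blk, 4);		/* entire 8B block goes */
	memcpy(&s.sdr1, mem + blk + 4, 4);	/* into SDR0/SDR1 */
	s.scr = (off >> 1) & 3;			/* first 2B chunk to send */
	s.dmaptr = blk + 8;			/* rounded up to next 8B */
	s.use_sdr = 1;
	return s;
}
```

For a transfer starting at offset 4, the sketch yields SCR = 2 and a blocked DMA start at offset 8, with bytes 0..7 loaded into the SDR0/SDR1 pair.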

Pull up from SCSI device

If the starting address is not aligned to an 8B boundary, SDR0 and SDR1 must hold the entire 8B block in question. SCR works as a DMA pointer to indicate which 2B chunk in the SDR0/SDR1 pair is to hold the head portion of the transferred data. The starting address of the DMA transfer is truncated down to the nearest 8B boundary and set in DMAPTR. The first block to be written to memory consists of two portions: the SDR0/SDR1 data placed unchanged, and the head data of the SCSI transfer. A fractional transfer tail smaller than the 8B block size is left unwritten to memory and stored in the SDR0/SDR1 pair instead. SCR indicates how many 2B chunks in the pair are subject to fixup. In this case DMAPTR points to the address of the 8B block that has yet to receive the residue. The DMA driver must fix up the transfer tail by writing the 2B chunks in sequence, at most 3 times, to the destination address.
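
The tail fixup can be sketched as below. The function name `ioasic_tail_fixup` is hypothetical, and the sketch assumes that the residue chunks are packed from the start of SDR0 in memory byte order, which the text does not state; a real driver must follow the chip documentation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy nchunks 2B chunks held in the SDR0/SDR1 pair to dst, the block
 * that DMAPTR points to. Assumption: chunks are packed from the start
 * of SDR0 in memory byte order.
 */
static void
ioasic_tail_fixup(uint8_t *dst, uint32_t sdr0, uint32_t sdr1,
    unsigned nchunks)
{
	uint8_t pair[8];
	unsigned i;

	/* A full block of 4 chunks would have been written by DMA. */
	assert(nchunks <= 3);

	memcpy(pair, &sdr0, 4);		/* treat the pair as concatenated */
	memcpy(pair + 4, &sdr1, 4);

	for (i = 0; i < nchunks; i++)
		memcpy(dst + 2 * i, pair + 2 * i, 2);
}
```

The loop runs at most three times, matching the "at most 3 times" above; bytes of the destination block beyond the residue are left untouched.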

Case study 2: DEC3000 TCDS

The DEC3000 TCDS DMA channel imposes a 4B constraint. Because the Alpha processor enforces 4B alignment on memory references, it's mostly unnecessary to worry about an unaligned DMA starting address; the address is likely to be comfortably aligned for DMA. The hard case that matters is a pull-up transfer from a SCSI device that starts at an unaligned address or ends leaving a fractional residue smaller than 4B.

Two registers, DUD0 and DUD1, each hold a 4B quantity consisting of a 1B size indication and 3B worth of fractional data. DUD0 is for an unaligned starting address, while DUD1 is for an unaligned ending address. DUD0 may have an indication in its least significant byte telling which byte of the remaining 3B holds fractional residue to be written to memory. As the DMA driver knows the starting address of the DMA transfer, it's easy to synthesize the destination with the fractions in DUD0. DUD1 may have an indication in its most significant byte telling which byte of the remaining 3B holds the fractional residue left unwritten to memory when the DMA transaction stopped. In this case the SDA register indicates the address of the 4B block that has yet to receive the residue. The DMA driver must fix up the transfer tail by synthesizing the destination 4B with the fractional residue in DUD1.
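
"Synthesizing the destination 4B" amounts to a masked merge of the residue into the word already in memory. The sketch below assumes a simplified, hypothetical encoding in which the indication has been decoded into one valid bit per byte lane and the residue sits in the same byte lanes as in memory; the real DUD0/DUD1 bit layout is not modelled here, and `lane_mask` and `tcds_merge_word` are made-up names.

```c
#include <stdint.h>

/* Build a byte-lane mask: bit n of valid selects byte lane n. */
static uint32_t
lane_mask(unsigned valid)
{
	uint32_t mask = 0;
	unsigned lane;

	for (lane = 0; lane < 4; lane++)
		if (valid & (1u << lane))
			mask |= (uint32_t)0xff << (8 * lane);
	return mask;
}

/*
 * Merge the valid residue bytes from a DUD register into the 4B word
 * currently in memory. Assumption: valid has already been decoded
 * from the indication byte of the register.
 */
static uint32_t
tcds_merge_word(uint32_t memword, uint32_t dud, unsigned valid)
{
	uint32_t mask = lane_mask(valid);

	return (memword & ~mask) | (dud & mask);
}
```

The driver would read the 4B word at the address given by SDA (or at the truncated starting address for DUD0), merge it this way, and write it back.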

Case study 3: Magnum 3000 RAMBO ASIC

The RAMBO DMA channel imposes a 64B constraint. The DMA takes a block base address and a block count as a pair to start a DMA transfer. RAMBO DMA manages a physically contiguous section of memory. A 64B DMA FIFO buffer is designed to handle unaligned DMA starting and ending addresses.

Pull up from SCSI device

If the starting address is not aligned to a 64B boundary, the transfer block base address is truncated and aligned to the nearest boundary. The correct starting address is conveyed through the DMA pointer: pushing 16bit quantities down to the DMA FIFO bumps the DMA pointer in 2B increments. The DMA channel then starts filling the DMA FIFO with transferred data from the location the DMA pointer indicates. The first 64B block consists of two portions: the pushed-down data placed unchanged, and the head data of the SCSI transfer. A fractional transfer tail smaller than a 64B block is left unwritten to memory. The DMA FIFO holds the residue, and the DMA pointer indicates how many 2B chunks are to be written to the destination, whose address is available in another DMA register.
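
The numbers involved in setting up such a pull-up transfer can be sketched as follows. The names `rambo_setup` and `rambo_pullup_setup` are hypothetical, the starting address is assumed to be 2B aligned, and rounding the block count up to cover the tail is an assumption, not something the text confirms.

```c
#include <assert.h>
#include <stdint.h>

#define RAMBO_BLKSZ 64u	/* block size of the RAMBO DMA channel */

/* Hypothetical values derived for one pull-up transfer. */
struct rambo_setup {
	uint32_t base;		/* block base address, 64B aligned */
	uint32_t nblocks;	/* blocks covering the transfer (assumed rounded up) */
	uint32_t npush;		/* 16bit FIFO pushes to advance the DMA pointer */
};

static struct rambo_setup
rambo_pullup_setup(uint32_t addr, uint32_t len)
{
	struct rambo_setup s;
	uint32_t off = addr & (RAMBO_BLKSZ - 1);

	/* Assumption: the DMA pointer moves in 2B steps only. */
	assert((addr & 1) == 0);

	s.base = addr - off;		/* truncate to the 64B boundary */
	s.npush = off / 2;		/* each push bumps the pointer by 2B */
	s.nblocks = (off + len + RAMBO_BLKSZ - 1) / RAMBO_BLKSZ;
	return s;
}
```

For a 100B transfer to address 0x1050, this gives a block base of 0x1040, 8 pushes of 16bit data to move the DMA pointer to offset 16, and 2 blocks.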

Push down to SCSI device

If the starting address is not aligned to a 64B boundary, the transfer block base address is truncated and aligned to the nearest boundary.

XXX unable to figure out how the initial block is moved to DMA FIFO XXX

Reading a 16bit quantity from the DMA FIFO bumps the DMA pointer by a 2B increment. The DMA channel then starts draining the DMA FIFO contents down to the SCSI device from the location the DMA pointer indicates. An unaligned ending address doesn't matter, because the SCSI controller chip counts the total size of the transfer and stops the transaction when it completes.

Case study 4: SPARCstation LSI64854 ASIC

XXX

Case study 5: DEC3000 TC SG DMA

High-end models of the DEC3000 have a DMA channel that can handle transfers of data that is not contiguous in physical address space. Such a design is commonly called “scatter-gather DMA”. The TC SGMAP is the array store that holds the physical addresses, or rather the page frame numbers, of a given memory object. The DMA driver must fill in the SGMAP array appropriately for the virtually addressed DMA transfer range prior to DMA operations. The DMA channel then starts and continues the transfer by looking at the SGMAP array entries in sequence. The SGMAP design was inherited by later generations of Digital models.
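
Filling such an array can be sketched as below. Everything here is illustrative: `sgmap_fill` and `demo_xlate` are hypothetical names, the 8KB page size is an assumption, entries are bare page frame numbers without the valid or control bits a real SGMAP entry may carry, and `demo_xlate` is only a stand-in for the kernel's virtual-to-physical lookup.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 13	/* assumption: 8KB pages */

typedef uint64_t (*va_to_pfn_t)(uintptr_t va);

/* Stand-in translation producing scattered, made-up page frame numbers. */
static uint64_t
demo_xlate(uintptr_t va)
{
	return (uint64_t)(va >> PAGE_SHIFT) * 7 + 3;
}

/*
 * Fill sgmap[] with one page frame number per page touched by the
 * virtual range [va, va + len). Returns the number of entries used,
 * or 0 if the range is empty or does not fit.
 */
static size_t
sgmap_fill(uint64_t *sgmap, size_t nslots, uintptr_t va, size_t len,
    va_to_pfn_t xlate)
{
	uintptr_t first, last;
	size_t i, n;

	if (len == 0)
		return 0;
	first = va >> PAGE_SHIFT;
	last = (va + len - 1) >> PAGE_SHIFT;
	n = (size_t)(last - first) + 1;
	if (n > nslots)
		return 0;

	for (i = 0; i < n; i++)
		sgmap[i] = xlate((first + i) << PAGE_SHIFT);
	return n;
}
```

A 32B transfer starting at 0x3ff0 straddles a page boundary, so it needs two entries even though it is tiny; the DMA channel would then walk those entries in sequence.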

Case study 6: ARCS ASIC

This design is unique because the DMA channel can manage virtually addressed DMA transfers. A traditional DMA design can work only with physically addressed memory objects, because it has no knowledge of the address translation scheme for virtually addressed memory objects. In that case the DMA driver is in charge of resolving the virtual transfer address into a physical address prior to DMA operations.

The ARCS DMA channel runs virtually addressed DMA transfers by looking at a copy of the TLB entries, held in an array that describes the transfer range, to resolve the corresponding physical addresses.
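
The lookup the channel performs can be modelled in software like this. The entry layout (`tlb_copy` with a bare virtual page number and page frame number), the 4KB page size, and the function name `arcs_resolve` are all assumptions made for illustration; the real entry format is not described here.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumption: 4KB pages */
#define PAGE_MASK ((1u << PAGE_SHIFT) - 1)

/* Hypothetical simplified copy of one TLB entry. */
struct tlb_copy {
	uint32_t vpn;	/* virtual page number */
	uint32_t pfn;	/* physical page frame number */
};

/*
 * Resolve va through the array of TLB entry copies. Returns 1 and
 * stores the physical address in *pa on a hit, 0 on a miss.
 */
static int
arcs_resolve(const struct tlb_copy *tab, size_t n, uint32_t va,
    uint32_t *pa)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].vpn == (va >> PAGE_SHIFT)) {
			*pa = (tab[i].pfn << PAGE_SHIFT) | (va & PAGE_MASK);
			return 1;
		}
	}
	return 0;
}
```

The driver's job is then reduced to preparing the array for the transfer range; the translation itself happens on the channel side.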
