The following was posted in comp.os.cpm by Fred Scipione in January 2005.
Fred is referring to Bruce's floppy disk routines in this package, which
is distributed by me, Herb Johnson. If you send me questions or comments,
I will refer them to Fred or Bruce. For more information, please check
my Web site:

http://retrotechnology.com/herbs_stuff/s_drives.html

Herb Johnson

---------------------------------------------------------------

Reading Bruce's code has inspired me to share the details of using loop
un-rolling to accommodate 2 MHz Z80s. This allows 'wait-on-read'
programmed I/O floppy controllers to be run with the proper timing
margins to accommodate the speed variations when different drives are
used for writing and reading. No code changes are needed for faster
clocks :-). The sibling code for write transfers and formatting is left
as an exercise for the reader. Similar arguments apply to 1 MHz CPUs and
single density or other 250k bps arrangements.

The following assembly code is best viewed with a fixed-width font -

; Suppose you want the same (ROM) code for both 8080 and Z80 CPUs.
; The necessary tight loop timing for DD disk reads on a 2.0 MHz
; Z80 (w/ 1 clock min. wait for input) can be accomplished through
; loop un-rolling and careful attention to timing.
;
; Using a +/-3% speed tolerance, the read rate for a disk written
; on a slow drive and read on a fast drive will be 16uS/byte * 94%
; = 15.04uS/byte.  Any loop that averages 15.0 uS/byte or better
; will stay synched through the input read 'wait' signal.
;
; The floppy controller buffers the read bytes, allowing nearly
; one full byte interval for read margin before under-runs.  It
; is prudent to subtract 2 floppy bit times (or 4uS) and 8 CPU
; clocks from the minimum byte interval to allow for the
; controller overhead and the CPU bus read state duration w/
; wait variations.
; Thus, with 4x loop un-rolling, the maximum
; allowed interval from a synched read to a delayed read over
; 'n' bytes is n*15uS - 4uS - 8/clock_rate.
;
; To accommodate 4x loop un-rolling, sector size must be a
; multiple of 4 (which is always the case).  Thus, register 'B'
; is loaded w/ 128, or with 0 for 256 bytes per sector, if the
; 'Normal:' entry is used.  For 512 or 1024 bytes per sector,
; set B to 127 or 255 and enter at 'Special:'.
;
; For 128 bytes per sector, this code can be used to bypass a
; BIOS buffer and transfer directly to a user DMA target at any
; address.
;
Normal:                 ; enter here for 128 or 256 bytes/sector
; Adjust B for 4 pre-loop reads and loop unroll to 4x -
        MOV     A,B     ;128 (80h), or 0 for 256
        RAR             ;divide by 4; RAR rotates through
        RAR             ; carry, so the top bits may hold junk
        DCR     A       ;one pass less for the 4 pre-loop reads
        ANI     03Fh    ;clear junk bits: 31 or 63 passes
        MOV     B,A
Special:                ; enter here w/ special values for B
; 4 pre-loop reads to ensure CPU synched to floppy on loop entry -
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INX     H       ;next loc.
Strange3:               ; enter here w/ B set for 4x + 3 bytes
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INX     H       ;next loc.
Strange2:               ; enter here w/ B set for 4x + 2 bytes
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INX     H       ;next loc.
Strange1:               ; enter here w/ B set for 4x + 1 bytes
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
;Timing w/ 1 clock min. wait on input reads -
Floppy$byte:            ;             clocks       @2mhz     @4mhz
        IN      fdcport ;FDC port         11 D   5.50 uS   2.75 uS
        INX     H       ;next loc.         7 E   3.50 uS   1.75 uS
        MOV     M,A     ;sector buffer     7 F   3.50 uS   1.75 uS
        INX     H       ;next loc.         7 G   3.50 uS   1.75 uS
        IN      fdcport ;FDC port         11 H   5.50 uS   2.75 uS
        MOV     M,A     ;sector buffer     7     3.50 uS   1.75 uS
        INX     H       ;next loc.         7     3.50 uS   1.75 uS
        IN      fdcport ;FDC port       11+w     5.50 uS   2.75 uS
        MOV     M,A     ;sector buffer     7     3.50 uS   1.75 uS
        INX     H       ;next loc.         7     3.50 uS   1.75 uS
        IN      fdcport ;FDC port       11+w     5.50 uS   2.75 uS
        MOV     M,A     ;sector buffer     7 A   3.50 uS   1.75 uS
        DCR     B       ;byte counter      5 B   2.50 uS   1.25 uS
        JNZ     floppy$byte               10 C   5.00 uS   2.50 uS
;
;Loop Total                              115    57.50 uS  28.75 uS
;Loop per-byte avg.                    28.75    14.38 uS   7.19 uS
;Max delay of read (A..H/A..D)         65/33    32.50 uS   8.25 uS
;Max allowed @ 15uS/byte-4uS-8clks              37.00 uS      N.A.
;Min under-run margin of delayed reads            4.50 uS      N.A.
;
; end/tail of read routine goes here
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Note that 2x loop un-rolling can be used for 128 and 256
; byte sectors, if the BIOS sector buffer is aligned to end on
; a page boundary and 'INR L' is used to replace both 'INX H'
; and 'DCR B' (to decrease the loop overhead).  Sectors of
; 512 and 1024 bytes can be handled by a series of 2 or 4 such
; loops with a little 'glue' code between loops.  The net code
; size is about 1/2 for 128+256 byte sectors, slightly larger if
; 512 bytes is added, and about 2x larger w/ 1024 bytes included.
; Hardware with weird sector sizes can be accommodated through
; special entry values for HL at the proper entry points.
;
; A 4x loop un-roll with this method can be used for a 1/2
; color-crystal CPU clock speed of 1.79MHz.
;
; At 23 clocks per byte, total loop un-rolling would support a
; 1.64MHz clock rate, but require 1024 bytes of code for 128
; byte sectors.
;
Rd1024:                 ; enter here for 1024 byte sectors
; 2 pre-loop reads to ensure CPU synched to floppy on loop entry -
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
Weird3:                 ; entry for odd sizes 771..1023
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
Loop8x:
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
Weird3a:                ; entry for odd size 769?
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;byte counter
        JNZ     Loop8x
; use 2 reads in-line to limit delay while advancing H -
Weird2a:                ; entry for even sizes 516..768?
        IN      fdcport ;FDC port (33 clock delay here)
        INR     H       ;next page
        MOV     M,A     ;sector buffer
Weird2:                 ; entry for odd sizes 515..767
        IN      fdcport ;FDC port (33 + 23 ==> re-synched)
        INR     L       ;next loc.
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
Loop6x:
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
Weird2b:                ; entry for odd size 513?
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;byte counter
        JNZ     Loop6x
; use 2 reads in-line to limit delay while advancing H -
        IN      fdcport ;FDC port (33 clock delay here)
        INR     H       ;next page
        MOV     M,A     ;sector buffer
        IN      fdcport ;FDC port (33 + 23 ==> re-synched)
        INR     L       ;next loc.
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
; fall through to 512 byte entry -
;
Rd512:                  ; enter here for 512 byte sectors
; 2 pre-loop reads to ensure CPU synched to floppy on loop entry -
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
Weird1:                 ; entry for odd sizes 259..511
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
Loop4x:
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
Weird1a:                ; entry for odd size 257?
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;byte counter
        JNZ     Loop4x
; use 2 reads in-line to limit delay while advancing H -
        IN      fdcport ;FDC port (33 clock delay here)
        INR     H       ;next page
        MOV     M,A     ;sector buffer
        IN      fdcport ;FDC port (33 + 23 ==> re-synched)
        INR     L       ;next loc.
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
; fall through to 256 byte entry -
;
Read2x:                 ; enter here for 128 and 256 byte sectors
; 2 pre-loop reads to ensure CPU synched to floppy on loop entry -
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
OddSize:                ; enter here with HL set for odd sizes 3..255
        IN      fdcport ;FDC port
        MOV     M,A     ;sector buffer
        INR     L       ;next loc.
;Timing w/ 1 clock min. wait on input reads -
Loop2x:                 ;             clocks       @2mhz     @4mhz
        IN      fdcport ;FDC port         11 D   5.50 uS   2.75 uS
        MOV     M,A     ;sector buffer     7     3.50 uS   1.75 uS
        INR     L       ;next loc.         5     2.50 uS   1.25 uS
        IN      fdcport ;FDC port       11+w     5.50 uS   2.75 uS
        MOV     M,A     ;sector buffer     7 A   3.50 uS   1.75 uS
        INR     L       ;byte counter      5 B   2.50 uS   1.25 uS
        JNZ     Loop2x                    10 C   5.00 uS   2.50 uS
;
;Loop Total                               56    28.00 uS  14.00 uS
;Loop per-byte avg.                       28    14.00 uS   7.00 uS
;Max delay of read (A..D)                 33    16.50 uS   8.25 uS
;Max allowed @ 15uS/byte-4uS-8clks              22.00 uS      N.A.
;Min under-run margin of delayed reads            5.50 uS      N.A.
;
; end/tail of read routine goes here
;
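The margin figures in the two timing tables follow from the formula n*15uS - 4uS - 8/clock_rate given in the comments. The Python below is only a checking sketch, not part of the original code. The constant and function names are invented for illustration, and n is assumed to be 3 for the 4x loop's A..H span and 2 for the 2x loop's A..D span, because those are the values that reproduce the 37.00 uS and 22.00 uS rows.

```python
# Sketch of the read-margin arithmetic from the listing's comments.
# Assumed inputs (taken from the text): 16 uS/byte nominal DD byte time,
# a 94% worst-case speed factor, a 15 uS/byte budget, and a
# 4 uS + 8 CPU clock allowance. Constant names are hypothetical.

NOMINAL_US_PER_BYTE = 16.0  # 8 bits at 2 uS per bit
SPEED_FACTOR = 0.94         # slow writing drive (-3%) read on fast drive (+3%)
BUDGET_US_PER_BYTE = 15.0   # rounded-down per-byte budget used in the tables
CONTROLLER_US = 4.0         # 2 floppy bit times of controller overhead
BUS_CLOCKS = 8              # CPU bus read state w/ wait variations


def worst_case_byte_us():
    """Shortest byte interval the CPU may see, in uS."""
    return NOMINAL_US_PER_BYTE * SPEED_FACTOR


def max_allowed_us(n, clock_mhz):
    """n*15uS - 4uS - 8/clock_rate: longest safe synched-to-delayed read interval."""
    return n * BUDGET_US_PER_BYTE - CONTROLLER_US - BUS_CLOCKS / clock_mhz


def margin_us(delay_clocks, n, clock_mhz):
    """Under-run margin left when the delayed read ends delay_clocks after a synched read."""
    return max_allowed_us(n, clock_mhz) - delay_clocks / clock_mhz


if __name__ == "__main__":
    print(f"worst-case byte interval: {worst_case_byte_us():.2f} uS")
    # 4x loop at 2 MHz: A..H = 65 clocks, n = 3
    print(f"4x: allowed {max_allowed_us(3, 2.0):.2f} uS, "
          f"margin {margin_us(65, 3, 2.0):.2f} uS")
    # 2x loop at 2 MHz: A..D = 33 clocks, n = 2
    print(f"2x: allowed {max_allowed_us(2, 2.0):.2f} uS, "
          f"margin {margin_us(33, 2, 2.0):.2f} uS")
```

Run as a script, it prints the 15.04 uS worst-case byte interval, 37.00 and 4.50 uS for the 4x loop, and 22.00 and 5.50 uS for the 2x loop at 2 MHz. Changing clock_mhz gives a rough estimate for other CPU clocks under the same assumptions.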
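The loop totals and the A..D / A..H delays in the tables can be recounted mechanically. This is a hypothetical Python sketch: it takes the per-instruction clock counts straight from the post's timing columns (including 7 clocks for INX H and 11 for IN with one wait clock) rather than from any data sheet, and it treats a loop body as a plain list of mnemonics in program order.

```python
# Clock counts per instruction as listed in the post's timing columns
# (1 clock minimum wait on IN). These are the post's figures, not
# values taken from a data sheet.
CLOCKS = {"IN": 11, "INX H": 7, "MOV M,A": 7, "INR L": 5, "DCR B": 5, "JNZ": 10}

# Loop bodies in program order, copied from the Floppy$byte and Loop2x listings.
LOOP_4X = ["IN", "INX H", "MOV M,A", "INX H",
           "IN", "MOV M,A", "INX H",
           "IN", "MOV M,A", "INX H",
           "IN", "MOV M,A", "DCR B", "JNZ"]
LOOP_2X = ["IN", "MOV M,A", "INR L",
           "IN", "MOV M,A", "INR L", "JNZ"]


def loop_stats(body, clock_mhz):
    """Return (total clocks, clocks/byte, total uS, uS/byte) for one pass."""
    total = sum(CLOCKS[op] for op in body)
    per_byte = total / body.count("IN")
    return total, per_byte, total / clock_mhz, per_byte / clock_mhz


def max_read_delay(body, reads_ahead):
    """Clocks from the end of a pass's last IN, around the JNZ, to the end
    of the reads_ahead-th IN of the next pass (A..D or A..H in the tables)."""
    last_in = len(body) - 1 - body[::-1].index("IN")
    clocks = sum(CLOCKS[op] for op in body[last_in + 1:])
    seen = 0
    for op in body:
        clocks += CLOCKS[op]
        if op == "IN":
            seen += 1
            if seen == reads_ahead:
                return clocks
    raise ValueError("loop body has fewer reads than reads_ahead")


if __name__ == "__main__":
    for name, body, ahead in (("4x", LOOP_4X, 2), ("2x", LOOP_2X, 1)):
        total, per_byte, _, us_per_byte = loop_stats(body, 2.0)
        print(f"{name}: {total} clocks, {per_byte} clocks/byte, "
              f"{us_per_byte:.2f} uS/byte @2MHz, "
              f"max delay {max_read_delay(body, ahead)} clocks")
```

With these counts, loop_stats(LOOP_4X, 2.0) gives 115 clocks and 14.375 uS/byte, loop_stats(LOOP_2X, 2.0) gives 56 clocks and 14.0 uS/byte, and max_read_delay returns 65 clocks for the 4x loop's A..H span and 33 clocks for an A..D span.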
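The 'Normal:' prologue turns B = 128 (or 0 for 256) into a pass count for the 4x loop. Because RAR rotates through the carry flag, whatever carry happens to be set leaks into the upper bits, and the ANI 03Fh then clears them. The Python below is a minimal model of just those instructions, written for illustration; the function names and the PRE_LOOP_READS table are assumptions, with the pre-loop read counts taken from the Special/Strange3/Strange2/Strange1 entry points.

```python
def rar(a, cy):
    """8080 RAR: rotate A right through carry; returns (new A, new carry)."""
    return ((a >> 1) | (cy << 7)) & 0xFF, a & 1


def normal_entry_b(b, cy):
    """B after the 'Normal:' prologue (MOV A,B / RAR / RAR / DCR A /
    ANI 03Fh / MOV B,A), starting with carry flag cy (0 or 1)."""
    a, cy = rar(b, cy)
    a, cy = rar(a, cy)
    a = (a - 1) & 0xFF      # DCR A; 00h wraps to 0FFh
    return a & 0x3F         # ANI 03Fh clears bits 7-6 (carry junk)


# Hypothetical table: reads done before Floppy$byte for each entry label.
PRE_LOOP_READS = {"Special": 4, "Strange3": 3, "Strange2": 2, "Strange1": 1}


def bytes_read(b, entry="Special"):
    """Pre-loop reads plus 4 bytes per pass; DCR B / JNZ runs B passes,
    or 256 passes when B is 0."""
    passes = b if b else 256
    return PRE_LOOP_READS[entry] + 4 * passes


if __name__ == "__main__":
    for size, b_in in ((128, 128), (256, 0)):
        for cy in (0, 1):
            b = normal_entry_b(b_in, cy)
            print(f"{size}-byte sector, carry={cy}: B={b}, "
                  f"{bytes_read(b)} bytes read")
```

For either carry value, B ends up as 31 for 128-byte sectors and 63 for 256-byte sectors, so 4 pre-loop reads plus 4 per pass give exactly 128 or 256 bytes; the 'Special:' values 127 and 255 give 512 and 1024.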
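The page-boundary trick in the 2x routines, where INR L doubles as the byte counter and the in-line glue advances H, can also be checked without hardware. The sketch below is a hypothetical Python model, not an emulator: it only tracks H, L and the Z flag left by INR, treats IN as a no-op, and leaves out the odd-size entry labels (Weird3, Weird2a, OddSize and so on). It walks the fall-through path from Rd1024 through Rd512 and Read2x and records each address written by MOV M,A.

```python
# Op tuples: ("LABEL", name), ("IN",), ("MOV",), ("INR", reg), ("JNZ", label).
# IN only matters for timing in the real code, so it is a no-op here.

def pair_loop(label):
    """Two IN / MOV M,A / INR L reads, then JNZ back to label."""
    step = [("IN",), ("MOV",), ("INR", "L")]
    return [("LABEL", label)] + step + step + [("JNZ", label)]


PRE2 = [("IN",), ("MOV",), ("INR", "L")] * 2       # 2 pre-loop reads
GLUE = [("IN",), ("INR", "H"), ("MOV",),            # advance to next page
        ("IN",), ("INR", "L"), ("MOV",), ("INR", "L")]

PROGRAM = ([("LABEL", "Rd1024")] + PRE2 + pair_loop("Loop8x") + GLUE
           + pair_loop("Loop6x") + GLUE
           + [("LABEL", "Rd512")] + PRE2 + pair_loop("Loop4x") + GLUE
           + [("LABEL", "Read2x")] + PRE2 + pair_loop("Loop2x"))


def run(program, entry, h, l):
    """Execute from entry with HL = h:l; return addresses written by MOV M,A."""
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "LABEL"}
    pc, zero, stored = labels[entry], False, []
    while pc < len(program):
        op = program[pc]
        pc += 1
        if op[0] == "MOV":
            stored.append((h << 8) | l)
        elif op[0] == "INR" and op[1] == "L":
            l = (l + 1) & 0xFF
            zero = l == 0
        elif op[0] == "INR":
            h = (h + 1) & 0xFF
            zero = h == 0
        elif op[0] == "JNZ" and not zero:
            pc = labels[op[1]]
    return stored


if __name__ == "__main__":
    addrs = run(PROGRAM, "Rd1024", 0x80, 0x00)
    print(len(addrs), hex(addrs[0]), hex(addrs[-1]))
```

With the buffer ending on a page boundary, run(PROGRAM, "Rd1024", 0x80, 0x00) should return the 1024 consecutive addresses 8000h..83FFh, and run(PROGRAM, "Read2x", 0x90, 0x80) the 128 addresses 9080h..90FFh for a 128-byte sector.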