Optimizing 8080 FDC code for double density, Submitted by Chuck Guzis ---------------------------------------------------------------------- During June 2006 I had some discussion with Chuck Guzis of Sydex, about optimizing some 8080 code to access a floppy controller chip. The code was provided to me by Bruce Jones, base on his original work with the SD Systems Versafloppy II S-100 card for that company many years ago. Links to Bruce's work and discussion of the Versafloppy II are on the SD Systems page of my S-100 Web site: http://www.retrotechnology.com/herbs_stuff/s_sd.html These notes and other similar notes are linked on that page. Herb Johnson From Chuck Guzis, June 1 2006: ------------------------------ Hi Herb, I was reading your page on PIO floppy transfers on the Versafloppy at http://www.retrotechnology.com/herbs_stuff/sdfdc.html and I noted this item for 8080 data transfers, with a note that DD data transfers with a 2MHz 8080 using programmed I/O wasn't possible. floppy$byte: clocks 2mhz IN fdcport ;FDC port 10 5 uS MOV M,A ;buffer 7 3.5 uS INX H ;next loc. 7 3.5 uS JMP floppy$byte 10 5 uS Total 34 17 uS I'm surprised that you didn't carry this forward with a bit of loop unrolling for this result: floppy$byte: clocks 2mhz IN fdcport ;FDC port 10 5 uS MOV M,A ;buffer 7 3.5 uS INX H ;next loc. 7 3.5 uS IN fdcport ;FDC port 10 5 uS MOV M,A ;buffer 7 3.5 uS INX H ;next loc. 7 3.5 uS IN fdcport ;FDC port 10 5 uS MOV M,A ;buffer 7 3.5 uS INX H ;next loc. 7 3.5 uS IN fdcport ;FDC port 10 5 uS MOV M,A ;buffer 7 3.5 uS INX H ;next loc. 7 3.5 uS JMP floppy$byte 10 5 uS Total 106 53 uS ...or about 13 uS per byte--and just under the window for DD data transfers. Unrolling the transfer loop more produces rapidly decreasing returns. The cost is a modest 12 additional bytes. Yes, I know that I'm beating a dead horse 30 years after the fact, but some basic optimization tricks never change. From Herb Johnson: ------------------ But customers still download this code, and they still buy Versafloppy II controllers, and some comp.os.cpm members still want to build systems from old chips! Bruce wrote only that his particular code would not work. I'll edit his description so that it is specific to his code; and I'll include your code and comments above. Note, however, that every four reads a 10us delay due to the jump occurs, as described by Bruce. That delay can be avoided by unrolling the entire sector read, if you have the code space to do so. Loop unrolling is a useful strategy, but sometimes you have to unroll the WHOLE loop, seems to me. In other code and documents provided by Bruce, he also unrolls loops. And, he goes on to say there are other solutions like using DMA. I suppose you could also change the crystal from 4MHz to say 5MHz, run at 2.5MHz clock, and gain back enough time to make the 17us window? Reply from Chuck: ------------------ Hi Herb, Let's try another variation of the 8080 code by getting rid of that ugly 10-cycle JMP: LXI D,sector$buffer LXI H,floppy$byte floppy$byte: clocks 2mhz IN fdcport ;FDC port 10 5 uS STAX D ;buffer 7 3.5 uS INX D ;next loc. 7 3.5 uS PCHL 5 2.5 uS 29 14.5 uS Can we do any better? Well, believe it or not, the cat's still not bald yet. But we have to resort to a little subterfuge. Still, when one's desperate, anything goes. Let's get rid of the INX instruction and save 6 cycles: LXI H,floppy$byte floppy$byte: clocks 2mhz IN fdcport ;FDC port 10 5 uS PUSH PSW ;store 2 bytes 11 5.5 uS PCHL ;loop 5 2.5 uS 26 13 uS So what does this do? Well, it stores the sector data on the stack in reverse order in every other byte. The price of this is twofold. We need enough stack space to store twice the number of bytes in a sector and we need to follow this up with a loop to clean things up, but that's not timing critical. Something like this would do: LXI H,Buffer+Sector$Length LXI B,Sector$Length Clean$up: DCX H ; Begin at the end of buffer POP PSW ; Get a sector byte MOV M,A ; store it DCX B ; keep count MOV A,B ORA C JNZ Clean$Up ; loop But why waste half the bytes in memory? LXI H,floppy$byte floppy$byte: clocks 2mhz IN fdcport ;FDC port 10 5 uS MOV B,A 5 2.5 uS IN fdcport ;FDC port 10 5 uS MOV C,A 5 2.5 uS PUSH B ;store 2 bytes 11 5.5 uS PCHL ;loop 5 2.5 uS 46 23.0 uS Or, 11.5 uS per byte. If we unroll the loop to transfer a complete 256 byte DD sector, the time per byte drops to 10.25 uS per byte. I don't think that it's possible to do better than that with programmed I/O, but I'm open to a challenge! This being said, the floppy controller on my S-100 box is the one from Don Tarbell, which, while it uses PIO to do data transfer, does not support double density. DMA is definitely the way to go, if you have that capability. Take whatever you want from this little exercise. Enjoy! Chuck Herb replies: ------------- Some of these considerations were also discussed by Fred Scipione, and I've posted his notes at: http://www.retrotechnology.com/herbs_stuff/scipione.txt Also, Bruce reviews similar considerations in text files associated with his Versafloppy II code. Some of that code is on my site in a Zip file at: http://www.retrotechnology.com/herbs_stuff/sdbios.zip So with your permission, I'll post your methods as another note! OK? Herb Johnson