Short:        Update to memory speed tester (OS2.0 required)
Author:       mlelstv@serpens.swb.de
Uploader:     mlelstv serpens swb de
Type:         util/moni
Architecture: m68k-amigaos

bustest is a small utility to measure data transfer speed to and from
memory. It requires AmigaOS 2.0 or higher.

Method:

Determine a chunk of memory and time read and write operations with the
fairly precise CIA timers. The overhead of the measurement is then
compensated. During the measurement other tasks are locked out, but
_interrupts_ are not. However, interrupt activity shouldn't influence
the measurement unless you send lots of data to the serial port or sit
on a heavily loaded network.

What you get:

The performance of a very large sequence of MOVE instructions. The test
uses word moves, longword moves, or bursts of 8 longword moves with the
MOVEM instruction.

Caveats:

The 68040 and 68060 usually use a _copyback_ cache. This means that
_writing_ causes the cache to _read_ the data first (and later _write_
it), thus halving write performance when writing large blocks. Small
blocks like local variables usually fit into the cache and are read just
once, which is the reason for the good performance of the copyback
cache. But block writes are the worst case. The 68040 has a special
MOVE16 instruction to circumvent that problem, but it is not used in
AmigaOS (nor in this test program).

The 68040 and 68060 have rather large caches, and especially the 68060
uses an effective pseudo-random cache replacement strategy which biases
the results if the tested memory area isn't much larger than the cache.
This is the reason why previous versions of bustest yielded transfer
rates that were too high for cached memory. Their 16k test area still
generated more than 50% cache hits on the 68060.

How to use:

bustest ADDR/K,SIZE/K,CHIP/S,FAST/S,ROM/S,MEGA/S

ADDR/K  hexadecimal start address of a memory region to test.
        This address can specify memory that does not appear in the
        system memory list (like a graphics frame buffer) or memory in
        the free list. bustest tries to check the address and will
        refuse to test it when it believes that the memory is used by
        anything else.
        BUT: it is possible to specify addresses that are not mapped to
        anything, resulting in thousands of bus errors. So be careful
        when using this option.

SIZE/K  size of the test region in bytes. You can use the suffix 'k' to
        specify units of 2^10 bytes and 'm' to specify units of 2^20
        bytes. The size is rounded up to a multiple of 128, which is
        the granularity of the test loop. The default is 256 kilobytes.

CHIP/S  test an AllocMem'd region in chip memory.

FAST/S  test an AllocMem'd region in fast memory.

ROM/S   test the region starting at address $00F80000, i.e. the
        Kickstart ROM (or the RAM that is mapped there with the MMU if
        you use something like CPU FASTROM).

MEGA/S  show data rates in units of 2^20 byte/s instead of the normal
        10^6 byte/s.

Examples:

This test was done on an A3000 with a CyberStorm Mk2 accelerator board.
The ROM is actually a copy of the Kickstart in RAM, remapped with the
rom2fast program.

1> bustest fast chip rom
BusSpeedTest 0.19 (mlelstv)     Buffer: 262144 Bytes, Alignment: 32768
========================================================================
memtype  addr       op      cycle      calib   bandwidth
fast     $086F8000  readw     71.5 ns  normal   28.0 * 10^6 byte/s
fast     $086F8000  readl    121.7 ns  normal   32.9 * 10^6 byte/s
fast     $086F8000  readm    121.4 ns  normal   33.0 * 10^6 byte/s
fast     $086F8000  writew    86.9 ns  normal   23.0 * 10^6 byte/s
fast     $086F8000  writel   174.7 ns  normal   22.9 * 10^6 byte/s
fast     $086F8000  writem   174.3 ns  normal   22.9 * 10^6 byte/s
chip     $000B0000  readw   1052.4 ns  normal    1.9 * 10^6 byte/s
chip     $000B0000  readl   1051.8 ns  normal    3.8 * 10^6 byte/s
chip     $000B0000  readm   1052.2 ns  normal    3.8 * 10^6 byte/s
chip     $000B0000  writew   568.4 ns  normal    3.5 * 10^6 byte/s
chip     $000B0000  writel   569.0 ns  normal    7.0 * 10^6 byte/s
chip     $000B0000  writem   569.0 ns  normal    7.0 * 10^6 byte/s
rom      $00F80000  readw     71.5 ns  normal   28.0 * 10^6 byte/s
rom      $00F80000  readl    122.6 ns  normal   32.6 * 10^6 byte/s
rom      $00F80000  readm    121.9 ns  normal   32.8 * 10^6 byte/s

If you reduce the test area you see the speed of the cache. The 68060
can access the cache on every clock for a maximum speed of
50 MHz * 4 byte = 200 MByte/s.

1> bustest fast size=2k
BusSpeedTest 0.19 (mlelstv)     Buffer: 2048 Bytes, Alignment: 32768
========================================================================
memtype  addr       op      cycle      calib   bandwidth
fast     $08640000  readw     20.2 ns  normal   98.9 * 10^6 byte/s
fast     $08640000  readl     20.1 ns  normal  198.7 * 10^6 byte/s
fast     $08640000  readm     20.1 ns  normal  199.0 * 10^6 byte/s
fast     $08640000  writew    20.0 ns  normal  100.1 * 10^6 byte/s
fast     $08640000  writel    20.1 ns  normal  198.7 * 10^6 byte/s
fast     $08640000  writem    20.1 ns  normal  198.7 * 10^6 byte/s

If you increase the size you see partial cache thrashing caused by
interrupts, which reduces the effective bandwidth.
Also, in this case bustest cannot effectively compensate for the
overhead and shows a less precise estimate (calib == "biased"):

1> bustest fast size=8k
BusSpeedTest 0.19 (mlelstv)     Buffer: 8192 Bytes, Alignment: 32768
========================================================================
memtype  addr       op      cycle      calib   bandwidth
fast     $08640000  readw     20.7 ns  biased   96.6 * 10^6 byte/s
fast     $08640000  readl     21.3 ns  biased  188.2 * 10^6 byte/s
fast     $08640000  readm     21.7 ns  biased  184.6 * 10^6 byte/s
fast     $08640000  writew    20.9 ns  biased   95.6 * 10^6 byte/s
fast     $08640000  writel    21.3 ns  biased  187.4 * 10^6 byte/s
fast     $08640000  writem    21.9 ns  biased  182.3 * 10^6 byte/s

If you increase the size further the cache becomes less effective. Here
is the result that compares to the old bustest (version 0.07):

1> bustest fast size=16k
BusSpeedTest 0.19 (mlelstv)     Buffer: 16384 Bytes, Alignment: 32768
========================================================================
memtype  addr       op      cycle      calib   bandwidth
fast     $08640000  readw     54.2 ns  biased   36.9 * 10^6 byte/s
fast     $08640000  readl     87.2 ns  biased   45.9 * 10^6 byte/s
fast     $08640000  readm     88.5 ns  biased   45.2 * 10^6 byte/s
fast     $08640000  writew    64.8 ns  biased   30.9 * 10^6 byte/s
fast     $08640000  writel   122.8 ns  biased   32.6 * 10^6 byte/s
fast     $08640000  writem   124.9 ns  biased   32.0 * 10^6 byte/s

Here is the result for a test of the motherboard memory. The address is
given explicitly because FAST would AllocMem() the test area, and the
SIMMs on the CyberStorm have a higher priority.

1> bustest addr=07010000
BusSpeedTest 0.19 (mlelstv)     Buffer: 262144 Bytes, Alignment: 32768
========================================================================
memtype  addr       op      cycle      calib   bandwidth
user     $07010000  readw    159.3 ns  normal   12.6 * 10^6 byte/s
user     $07010000  readl    299.0 ns  normal   13.4 * 10^6 byte/s
user     $07010000  readm    298.8 ns  normal   13.4 * 10^6 byte/s
user     $07010000  writew   254.5 ns  normal    7.9 * 10^6 byte/s
user     $07010000  writel   511.8 ns  normal    7.8 * 10^6 byte/s
user     $07010000  writem   507.5 ns  normal    7.9 * 10^6 byte/s

This test couldn't start at $07000000 because the first few bytes are
used up by the memory header. If you try, you just get an error message:

1> bustest addr=07000000
Address $07000000 is mapped. Test aborted.

Problems:

Blitter activity can produce pretty bogus results.

The test takes more time than the one in bustest 0.07 because of the
larger test area.

Michael van Elst
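
Appendix: reading the numbers

The cycle and bandwidth columns in the listings above are related by
simple arithmetic, and the SIZE rounding can be checked the same way.
The following is only a sketch in Python, not code from bustest. The
function names are hypothetical. It assumes that "cycle" is the time
per word (2 bytes) or longword (4 bytes) transferred, that the MOVEM
rows are reported per longword (this matches the listed values, but it
is not stated in the text), and that the case of the size suffix is
ignored.

```python
# Hypothetical helpers, not taken from the bustest source.

# Assumption: bytes moved per reported "cycle" for each op column value.
BYTES_PER_OP = {
    "readw": 2, "readl": 4, "readm": 4,
    "writew": 2, "writel": 4, "writem": 4,
}

GRANULARITY = 128  # test loop granularity given in the SIZE/K description


def parse_size(text):
    """Parse a SIZE value such as '2k' or '1m', rounded up to 128 bytes."""
    text = text.strip().lower()  # assumption: suffix case is ignored
    factor = 1
    if text.endswith("k"):
        factor, text = 2 ** 10, text[:-1]
    elif text.endswith("m"):
        factor, text = 2 ** 20, text[:-1]
    size = int(text) * factor
    # Round up to the next multiple of the loop granularity.
    return (size + GRANULARITY - 1) // GRANULARITY * GRANULARITY


def bandwidth(op, cycle_ns, mega=False):
    """Return bandwidth in 10^6 byte/s, or in 2^20 byte/s if mega is set."""
    bytes_per_second = BYTES_PER_OP[op] / (cycle_ns * 1e-9)
    unit = 2 ** 20 if mega else 10 ** 6
    return bytes_per_second / unit


if __name__ == "__main__":
    print(parse_size("2k"))                        # buffer size in bytes
    print(round(bandwidth("readl", 121.7), 1))     # fast memory, readl row
    print(round(bandwidth("readl", 121.7, mega=True), 1))
```

With the fast memory readl row, 4 bytes / 121.7 ns comes out at about
32.9 * 10^6 byte/s, as listed. Because the printed cycle time is itself
rounded, recomputing from it can differ from the listed bandwidth in the
last digit for very short cycles.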