Short:        MuLib based math speedup patch for 40/60
Author:       thomas.richter@alumni.tu-berlin.de (Thomas Richter)
Uploader:     thomas richter alumni tu-berlin de (Thomas Richter)
Type:         util/boot
Version:      41.3
Requires:     util/libs/MMULib.lha Kickstart V37, MuLib 68040/68060 lib
Architecture: m68k-amigaos >= 2.0.0

------------------------------------------------------------------------------

MuRedox is a MuLib based "on the fly" speedup patch for 68040 and 68060 based
Amiga boards.

The 68040 and 68060 do not implement all instructions of the MC 68K family.
The unimplemented instructions - mainly FPU instructions - generate an
exception and need to be emulated by the 68040.library or 68060.library. This
is the job of the so-called "FPSP routines" (floating point support package)
within the CPU libraries.

MuRedox detects these instructions as soon as they generate the emulator
exceptions, runs a "just-in-time" compiler that generates a "stub replacement
routine" for this specific instruction and patches the replacement routine
into the running program. Hence, MuRedox avoids the overhead of the emulator
trap on the next use of the same instruction sequence.

Therefore, MuRedox requires:

- At least a 68040 or a 68060, or a 68LC040 or 68LC060. Emulation of an FPU
  on a 68030 or below is currently not supported by MuRedox, only through
  SoftIEEE. The 68881/882 does not require software support through MuRedox
  anyhow.

- The mmu.library, which is required to set up the special memory mapping.

- For the 68040 or 68060, the fpsp.resource is required. This resource
  contains the program code required for most unimplemented math functions.
  This resource is made available by the mmu.library-based 68040 and
  68060.library. Therefore, installation of the "MuLib" libraries is
  *required*. Please see the "MMULib.lha" package on Aminet for how to
  install them.

- For the 68LC040 or 68LC060, the softieee.resource is required. This
  resource is created by SoftIEEE, the FPU emulator.
  It is therefore necessary to start SoftIEEE before MuRedox can be launched.

------------------------------------------------------------------------------

History:

Release 41.3:
-------------

The release 41.2 forgot to also enable patches for unimplemented integer
instructions of the 68LC060. Fixed.

If fmovem is used with an unusual register order as indicated by bit 12 of
the extension word, MuRedox built the wrong code. It now leaves the emulation
of such instructions to SoftIEEE. Note that the Motorola manual discourages
such instructions.

Release 41.2:
-------------

This release fixes two bugs: First, for word-sized FPU initiated branches,
the branch distance was not included correctly as part of the MuRedox cache
tag, thus erroneously substituting branches with the wrong distance. Second,
the jitter created a register conflict on the fmove fp, instruction family
such that the destination could not be reached correctly under all
circumstances.

Release 41.1:
-------------

This release of MuRedox now also supports SoftIEEE, and will replace all FPU
instructions on a 68LC040 and 68LC060 by corresponding calls into the
softieee.library. The old functionality, i.e. bypassing the FPSP emulation on
a full 68040 or 68060, continues to work as it used to. The expected speedup
from MuRedox, due to its "jitter", is approximately 3-fold.

To run MuRedox on an LC processor, you first need to run SoftIEEE to get the
FPU emulation functions loaded, and then MuRedox to start the jitter.

You should be aware of the following restrictions:

- MuRedox currently requires a 68040, 68060, 68LC040 or 68LC060 to operate,
  even though SoftIEEE also supports FPU emulation on all other processors.
  FPU emulation for 68020 and 68030 processors may follow later.

- MuRedox cannot replace 3 instructions: fsave, frestore and ftrapcc.
  fsave and frestore take only 2 bytes, unlike all other FPU instructions.
  Unfortunately, the call to the replacement function takes 4 bytes, so they
  cannot be patched. However, these instructions are used quite rarely, and
  the only common point of use is the exec scheduler, which is already
  patched up by SoftIEEE. ftrapcc cannot be replaced because it can issue an
  exception, but this instruction is also quite rarely used.

- Unlike SoftIEEE, the replacement functions do not generate exceptions if
  FPU exceptions are enabled. That is, they will not call overflow/underflow/
  branch-on-unordered and related FPU exception vectors if such exceptions
  are enabled. SoftIEEE includes a full FPU emulation that includes
  generating such exceptions. Exception generation would require an
  additional lengthy test at the end of each floating point instruction,
  which would slow down the execution.

- MuRedox currently does not correctly emulate the generation of the INEX1
  flag when converting from packed decimal. Instead, if the conversion from
  packed decimal is inaccurate, it sets the INEX2 flag. This flag is rarely
  used and it quite likely makes no observable difference, but the issue
  might be fixed in a later release. Note that SoftIEEE itself emulates INEX1
  and INEX2 correctly.

Release 40.7:
-------------

The emulation function for the 64-bit division instruction missing in the
68060 received a very minor improvement in case the divisor is larger than
16 bits.

Release 40.6:
-------------

MuRedox now also patches the fmovem.l #immed, instruction which is
unsupported on the 68060 - but probably so rarely used that it hardly makes
any difference. Though complete is complete....

------------------------------------------------------------------------------

Top reasons why not to use this program:

- It is a hack. MuRedox replaces program code on the fly, hoping that all
  will go well.
  This need not be the case - especially commercial programs may keep a
  checksum over their code and may fail if their code gets altered. MuRedox
  will perform such code modifications.

- MuRedox will therefore not work for all programs - some incompatibilities
  should be expected.

If you need faster programs, you should rather:

- Ask the vendor for a 68060 or 68040 specific release of the program that
  does not require the software emulated instructions of the 68040 or 68060.
  Typically, these versions will run faster than a 68020/68030 version with
  MuRedox, anyhow.

- Remember: Programs are made fast by fast and smart algorithms, not by your
  favourite speedup-patch. MuRedox will give some speed improvement, in
  realistic situations in the range of about 10%. Specific benchmarks may
  show more dramatic improvements, but they typically test situations that
  are atypical of real-life use. Motorola chose less frequently used
  instructions for software emulation in the first place, hence improvements
  are typically marginal.

------------------------------------------------------------------------------

                The THOR-Software Licence (v3, January 2nd 2021)

This License applies to the computer programs known as "mmu.library",
"MuRedox", "FPSPSnoop" and the corresponding documentation, known as
".readme" files. The "Program", below, refers to such program. The "Archive"
refers to the package of distribution, as prepared by the author of the
Program, Thomas Richter. Each licensee is addressed as "you".

The Program and the data in the archive are freely distributable under the
restrictions stated below, but are also Copyright (c) Thomas Richter.

Distribution of the Program, the Archive and the data in the Archive by a
commercial organization without written permission from the author to any
third party is prohibited if any payment is made in connection with such
distribution, whether directly (as in payment for a copy of the Program) or
indirectly (as in payment for some service related to the Program, or payment
for some product or service that includes a copy of the Program "without
charge"; these are only examples, and not an exhaustive enumeration of
prohibited activities).

However, the following methods of distribution involving payment shall not in
and of themselves be a violation of this restriction:

(i) Distributing the Program on a physical data carrier (e.g. CD-ROM, DVD,
USB-Stick, Disk...) provided that:

a) the Archive is reproduced entirely and verbatim on such data carrier,
   including especially this licence agreement;

b) the data carrier is made available to the public for a nominal fee only,
   i.e. for a fee that covers the costs of the data carrier, and shipment of
   the data carrier;

c) a data carrier with the Program installed is made available to the author
   for free except for shipment costs, and

d) provided further that all information on said data carrier is
   redistributable for non-commercial purposes without charge.

Redistribution of a modified version of the Archive, the Program or the
contents of the Archive is prohibited in any way, by any organization,
regardless whether commercial or non-commercial. Everything must be kept
together, in original and unmodified form.

Limitations.

THE PROGRAM IS PROVIDED TO YOU "AS IS", WITHOUT WARRANTY. THERE IS NO
WARRANTY FOR THE PROGRAM, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. THE ENTIRE RISK
AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.
SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
SERVICING, REPAIR OR CORRECTION.

IF YOU DO NOT ACCEPT THIS LICENCE, YOU MUST DELETE THE PROGRAM, THE ARCHIVE
AND ALL DATA OF THIS ARCHIVE FROM YOUR STORAGE SYSTEM.

YOU ACCEPT THIS LICENCE BY USING OR REDISTRIBUTING THE PROGRAM.

______________________________________________________________________________

So long,
        Thomas Richter
                        (January 2023)