Imploder file formats

From ExoticA

These are the file formats used by The Imploder and related utilities.

In this document, all multi-byte values are stored in the big-endian format.

The Imploder

This section has yet to be written. The Imploder creates compressed executable files that unpack themselves when run. There are several variants: normal Imploder (sub-variants 3.1, 3.1 pure and 4.0), library Imploder, which uses the external "explode.library" (sub-variants 3.1 and 4.0), and overlayed Imploder, which loads the executable while decrunching it. The compression code is the same as that used by the Disk Imploder and File Imploder, but the Amiga executable file structure has to be reconstituted as well. MC680x0 code to do this can be found here.

DImp

The purpose of the Disk Imploder is to compress the raw disk structure of standard Amiga disks with the Imploder compression algorithm. The file extension ".DMP" is used for a standard Disk Imploder file, and the extension ".DEX" for a self-extracting Disk Imploder file.

Overall DImp file format

The regular Disk Imploder format is given below. The self-extracting format is simply the same data, preceded by an Amiga executable that will extract the data. An Amiga executable always begins with the 32-bit value 0x3F3 (1011 in decimal). DImp 1.00 self-extracting files have the DImp data at offset 3856 decimal. DImp 2.27 self-extracting files have the DImp data at offset 5796 decimal. For other versions, you should search the entire executable for the "DIMP" identifier of the header.
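A minimal sketch of locating the DImp data in a file already loaded into memory. The function name `find_dimp` is hypothetical. A plain .DMP file starts directly with "DIMP"; otherwise the file must be an Amiga executable, and the whole buffer is scanned. As an extra assumption (not stated by the format), a candidate is only accepted if the following 32-bit table length is in the valid range 4 to 404, to reduce false matches on stray "DIMP" bytes inside the extractor code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit value. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return the offset of the "DIMP" header in buf, or -1 if none is found.
 * For DImp 1.00 and 2.27 .DEX files this should be 3856 or 5796. */
long find_dimp(const unsigned char *buf, size_t len)
{
    /* Plain .DMP file: the header is at offset 0. */
    if (len >= 8 && memcmp(buf, "DIMP", 4) == 0)
        return 0;

    /* Otherwise expect an Amiga executable, which begins with 0x3F3. */
    if (len < 4 || be32(buf) != 0x3F3)
        return -1;

    for (size_t i = 4; i + 8 <= len; i++) {
        if (memcmp(buf + i, "DIMP", 4) == 0) {
            /* Assumption: require a plausible table length as well. */
            uint32_t table_len = be32(buf + i + 4);
            if (table_len >= 4 && table_len <= 404)
                return (long)i;
        }
    }
    return -1;
}
```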

The regular DImp format comprises the following sections, stored consecutively without gaps, and in the order given.

DImp header

The DImp header has two 32-bit values. First, the identifying value 0x44494D50, or "DIMP" in ASCII. Second, the length of the information table that follows, in bytes. It must be between 4 and 404.
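The header check can be sketched as follows. The function name `dimp_read_header` is hypothetical, and treating both 4 and 404 as valid (inclusive bounds) is an assumption.

```c
#include <stdint.h>

/* Read a big-endian 32-bit value. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Validate the 8-byte DImp header at p. On success, store the
 * information table length in *table_len and return 1; else return 0. */
int dimp_read_header(const unsigned char *p, uint32_t *table_len)
{
    if (be32(p) != 0x44494D50UL)       /* "DIMP" */
        return 0;

    uint32_t n = be32(p + 4);
    if (n < 4 || n > 404)              /* assumed inclusive bounds */
        return 0;

    *table_len = n;
    return 1;
}
```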

DImp information table

The DImp information table holds all metadata about the compression and the disk structure. The full table is 404 (0x194) bytes long. If the table length given in the header is less than 404, only that many bytes should be read from the DImp file; the remaining bytes must be filled with zeroes. The table format is as follows:

Offset Length Description
0x00 4 This is a checksum of all data in the information table, except this checksum field itself. In other words, the checksum of 400 (0x190) bytes of data from offset 0x004 to offset 0x193, inclusive. See the checksum format for more information.
0x04 2 This is the level of compression used. As all levels can be unpacked with the same code, it is not needed for decompression.
0x06 10 This is an 80-bit bitfield, one bit for each possible cylinder on the compressed disk. The most significant bit of byte 0 represents cylinder 0. The least significant bit of byte 0 represents cylinder 7. The least significant bit of byte 9 represents cylinder 79. If a bit is set, the corresponding cylinder is stored in the DImp file. If a bit is not set, the cylinder is not stored in the DImp file.
0x10 28 This is an explosion table. It stores the state required for the decompressor to unpack the text message, if present. The actual structure comprises 8 16-bit values used as "base offsets" and 12 8-bit values used as "number of extra bits to read".
0x2C 28 This is another explosion table, which stores the state required for the decompressor to unpack a cylinder. All cylinders use the same explosion table.
0x48 4 This is the compressed length of the text message. If it is 0, there is no text message present.
0x4C 4 This is the uncompressed length of the text message.
0x50 4 This is a checksum of the text message, when uncompressed. See the checksum format for more information.
0x54 320 This is an array of 80 32-bit values, one for each cylinder. See the cylinder section for more information.
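A sketch of loading the table with zero-fill and decoding the fields listed above. The struct `dimp_info` and the function `dimp_decode_table` are hypothetical names; `src` is assumed to point just after the 8-byte header, and `table_len` is the value from that header. The raw 404-byte copy is kept because the checksum is computed over it. The two text-length fields are read in the order given by the table above (compressed length at 0x48, uncompressed length at 0x4C).

```c
#include <stdint.h>
#include <string.h>

#define DIMP_TABLE_SIZE 404

struct dimp_info {
    uint32_t checksum;              /* 0x00 */
    uint16_t level;                 /* 0x04 */
    unsigned char bitmap[10];       /* 0x06 */
    unsigned char text_table[28];   /* 0x10: explosion table for the text */
    unsigned char cyl_table[28];    /* 0x2C: explosion table for cylinders */
    uint32_t text_packed;           /* 0x48 */
    uint32_t text_unpacked;         /* 0x4C */
    uint32_t text_checksum;         /* 0x50 */
    uint32_t cyl_info[80];          /* 0x54 */
};

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

void dimp_decode_table(const unsigned char *src, uint32_t table_len,
                       unsigned char raw[DIMP_TABLE_SIZE],
                       struct dimp_info *info)
{
    if (table_len > DIMP_TABLE_SIZE)
        table_len = DIMP_TABLE_SIZE;
    memset(raw, 0, DIMP_TABLE_SIZE);   /* bytes not in the file read as 0 */
    memcpy(raw, src, table_len);

    info->checksum = be32(raw + 0x00);
    info->level = be16(raw + 0x04);
    memcpy(info->bitmap, raw + 0x06, 10);
    memcpy(info->text_table, raw + 0x10, 28);
    memcpy(info->cyl_table, raw + 0x2C, 28);
    info->text_packed = be32(raw + 0x48);
    info->text_unpacked = be32(raw + 0x4C);
    info->text_checksum = be32(raw + 0x50);
    for (int i = 0; i < 80; i++)
        info->cyl_info[i] = be32(raw + 0x54 + 4 * i);
}
```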

DImp text message

The text message, if present in the DImp file, is simply a stream of Imploder compressed data. The length of this compressed stream is given in the information table at offset 0x48. The length of the stream when uncompressed is given at offset 0x4C. If either of these two values is zero, there is no text message present. The stream should be decompressed with the explosion algorithm, using the explosion table at offset 0x10 in the information table. The resulting uncompressed stream is expected to be printable ISO-8859-1 text, but may contain ANSI codes and Amiga console.device-specific escape codes.

DImp cylinder

At this point in the DImp file, between 0 and 80 cylinder data streams are present. Each stream is individually sized and represents one cylinder of an Amiga disk. They are stored in order from cylinder 0 to cylinder 79. A cylinder that is not present in the DImp file occupies no bytes in this section.

An Amiga double-density disk has two sides, 80 tracks per side, and 11 sectors per track. Each sector is 512 bytes long, so a cylinder (one track on each side) comprises 22 512-byte sectors, or exactly 11264 bytes of data. The track number is the same for all sectors in a cylinder, and the uncompressed cylinder data is divided into 512-byte sectors in this order: sector 0 on side 0, sector 1 on side 0, and so on until sector 10 on side 0, followed by sector 0 on side 1, sector 1 on side 1, and so on until sector 10 on side 1.
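The offset arithmetic can be sketched as below, assuming the standard trackdisk.device layout in which a cylinder holds all 11 sectors of side 0 followed by all 11 sectors of side 1. The function names are hypothetical. Writing the 80 unpacked cylinders back to back gives a flat 901120-byte (880 KiB) image, the same size as an ADF file.

```c
#include <stddef.h>

#define DIMP_SECTOR_SIZE   512
#define DIMP_SECTORS       11    /* sectors per track */
#define DIMP_CYLINDER_SIZE (2 * DIMP_SECTORS * DIMP_SECTOR_SIZE)  /* 11264 */

/* Byte offset of (side, sector) within one uncompressed cylinder:
 * side 0's sectors come first, then side 1's. */
size_t dimp_sector_offset(unsigned side, unsigned sector)
{
    return ((size_t)side * DIMP_SECTORS + sector) * DIMP_SECTOR_SIZE;
}

/* Byte offset of a cylinder within a flat image of the whole disk. */
size_t dimp_cylinder_offset(unsigned cylinder)
{
    return (size_t)cylinder * DIMP_CYLINDER_SIZE;
}
```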

To determine whether a cylinder is present in the DImp file, first check the disk bitmap at offset 0x06 in the information table. If the appropriate bit is 0, that cylinder is not present. If the bit is set, take the appropriate 32-bit entry from the cylinder information array at offset 0x54 in the information table, and interpret it as follows:

  1. If the entry is 0x00000000, then the cylinder is not present in the file, despite what the disk bitmap said. This happens when an error occurs while reading the disk at compression time.
  2. If the entry is 0xFFFFFFFF, then the cylinder comprises nothing but zeros. Assume the cylinder expands to 11264 zero bytes, and does not use any bytes from this part of the DImp file for its definition.
  3. In all other cases, the entry must be broken into the most significant 16 bits and the least significant 16 bits.
    • The most significant 16 bits are the compressed size of this cylinder, in bytes. If this value is more than the uncompressed length of a cylinder, 11264, then something is wrong. The DImp utility exits with the message "wierd info-table entry" in this scenario. If this value is exactly 11264, the cylinder is stored uncompressed. Otherwise, the cylinder data is a stream of Imploder compressed data. The uncompressed length of this data is 11264 bytes. The stream should be decompressed with the explosion algorithm, using the explosion table at offset 0x2C in the information table.
    • The least significant 16 bits are the least significant 16 bits of the checksum computed over the cylinder's bytes as stored in the file. See the checksum format for more information.
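The rules above can be sketched as a single classification function. The enum and function names are hypothetical; `bitmap` and `cyl_info` are assumed to hold the decoded bitmap (offset 0x06) and cylinder array (offset 0x54) from the information table.

```c
#include <stdint.h>

#define DIMP_CYLINDER_SIZE 11264u

enum dimp_cyl_kind {
    DIMP_CYL_ABSENT,    /* bitmap bit clear, or entry is 0x00000000 */
    DIMP_CYL_ZERO,      /* entry is 0xFFFFFFFF: 11264 zero bytes, no data */
    DIMP_CYL_STORED,    /* packed size == 11264: stored uncompressed */
    DIMP_CYL_IMPLODED,  /* packed size < 11264: explode with table 0x2C */
    DIMP_CYL_BAD        /* packed size > 11264: "wierd info-table entry" */
};

/* Bitmap test: the most significant bit of byte 0 is cylinder 0. */
int dimp_cyl_in_bitmap(const unsigned char bitmap[10], unsigned cyl)
{
    return (bitmap[cyl >> 3] >> (7 - (cyl & 7))) & 1;
}

enum dimp_cyl_kind dimp_classify(const unsigned char bitmap[10],
                                 const uint32_t cyl_info[80], unsigned cyl,
                                 uint16_t *packed_len, uint16_t *csum_low)
{
    if (!dimp_cyl_in_bitmap(bitmap, cyl))
        return DIMP_CYL_ABSENT;

    uint32_t e = cyl_info[cyl];
    if (e == 0x00000000u)
        return DIMP_CYL_ABSENT;       /* read error at compression time */
    if (e == 0xFFFFFFFFu)
        return DIMP_CYL_ZERO;

    *packed_len = (uint16_t)(e >> 16);     /* bytes used in the file */
    *csum_low = (uint16_t)(e & 0xFFFFu);   /* low 16 bits of checksum */
    if (*packed_len > DIMP_CYLINDER_SIZE)
        return DIMP_CYL_BAD;
    if (*packed_len == DIMP_CYLINDER_SIZE)
        return DIMP_CYL_STORED;
    return DIMP_CYL_IMPLODED;
}
```

Because absent and all-zero cylinders take no space, the file offset of a stored or imploded cylinder is the sum of the packed sizes of the earlier stored or imploded cylinders, counted from the start of the cylinder section.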

DImp checksum format

If the length of the data to be checksummed is odd, treat it as one byte longer, with an extra byte of value 0 appended at the very end of the data.

To derive the checksum, interpret the data to be checksummed as a contiguous array of 16-bit, unsigned, big-endian values. Compute the sum of all these values, then add 7. The least significant 32 bits of the result are the checksum value.
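The checksum rule can be sketched directly; the function name `dimp_checksum` is hypothetical. Unsigned 32-bit arithmetic wraps modulo 2^32, which keeps exactly the least significant 32 bits, and a zero pad byte only affects the low half of the last word.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of big-endian 16-bit words, plus 7, truncated to 32 bits.
 * An odd final byte is treated as the high half of a word whose
 * low half is the zero pad byte. */
uint32_t dimp_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 7;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    return sum;
}
```

For example, with the zero-filled 404-byte information table in a buffer `table`, the table is intact if `dimp_checksum(table + 4, 400)` equals the big-endian value in its first four bytes; for a cylinder, compare `dimp_checksum(packed, packed_len) & 0xFFFF` with the low 16 bits of its cylinder entry.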

FImp

FImp compresses a single file into the following format:

Offset Length Description
0x00 4 The identifying value 0x494D5021, or "IMP!" in ASCII. Clones of the FImp format use the IDs "ATN!", "BDPI", "CHFI", "Dupa", "EDAM", "FLT!", "M.H.", "PARA" and "RDC9". [1] [2]
0x04 4 The uncompressed length of the file, in bytes.
0x08 4 The offset at which the compressed data section ends and the trailer below begins: endoffset. Always even.
0x0C endoffset - 0x0C The compressed data section.
endoffset 4 Compressed data longword 3.
endoffset + 0x04 4 Compressed data longword 2.
endoffset + 0x08 4 Compressed data longword 1.
endoffset + 0x0C 4 The initial literal run length.
endoffset + 0x10 2 Bit 15 indicates whether the compressed data length is odd; bits 7-0 are the first byte of the compressed data ("initial bit-buffer").
endoffset + 0x12 28 The explosion table (8 16-bit values and 12 8-bit values)
endoffset + 0x2E 4 Unknown; appears to be a checksum of the preceding bytes, but slightly off.
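A sketch of reading the header and trailer described in the table above. The struct `fimp_info` and function `fimp_parse` are hypothetical names. Only the "IMP!" ID is accepted here; whether each clone ID uses an identical layout is not covered by this sketch. The check that 0x32 bytes (0x2E plus the final 4-byte field) fit after endoffset follows from the table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fimp_info {
    uint32_t raw_len;             /* 0x04: uncompressed length */
    uint32_t endoffset;           /* 0x08 */
    uint32_t longword[3];         /* compressed data longwords 1, 2, 3 */
    uint32_t literal_run;         /* endoffset + 0x0C */
    uint16_t bitbuf_word;         /* endoffset + 0x10 */
    const unsigned char *table;   /* endoffset + 0x12, 28 bytes */
};

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 1 if f[0..len) looks like a valid FImp file, else 0. */
int fimp_parse(const unsigned char *f, size_t len, struct fimp_info *fi)
{
    if (len < 0x0C || memcmp(f, "IMP!", 4) != 0)
        return 0;

    uint32_t e = be32(f + 8);
    /* endoffset must be even and the 0x32-byte trailer must fit. */
    if ((e & 1) || e < 0x0C || e > len || len - e < 0x32)
        return 0;

    fi->raw_len = be32(f + 4);
    fi->endoffset = e;
    fi->longword[2] = be32(f + e);          /* longword 3 is stored first */
    fi->longword[1] = be32(f + e + 0x04);
    fi->longword[0] = be32(f + e + 0x08);
    fi->literal_run = be32(f + e + 0x0C);
    fi->bitbuf_word = (uint16_t)((f[e + 0x10] << 8) | f[e + 0x11]);
    fi->table = f + e + 0x12;
    return 1;
}
```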

Re-ordering the data for decompression

The compressed data cannot be decompressed directly as stored. The format is designed so that you can load the whole file, including headers, into a single decompression buffer and decompress it in place. Because of this, the 12 bytes of header information (three longwords of 4 bytes each) are reused to hold compressed data rather than being "wasted".

To reconstitute the data so it can be decompressed with the explosion algorithm, order the data as follows:

Offset Length Contents
0x00 4 Compressed data longword 1
0x04 4 Compressed data longword 2
0x08 4 Compressed data longword 3
0x0C endoffset - 0x0C Compressed data section (maybe includes initial bit-buffer)
endoffset 4 Initial literal run length
endoffset + 0x04 1 initial bit-buffer, if not in compressed data section

In a "normal" compressed stream, the first five bytes (at the end of the stream; the stream is read backwards) are the first literal run length and the initial byte for the bit buffer. If the length of the input data is odd, then the 1-byte "initial bit-buffer" is placed after the 4-byte "first literal run" in memory. This way, the 4-byte run is at an even memory address, so it can be read directly by the MC680x0. If the length of the input data is even, then the "initial bit-buffer" comes before the 4-byte "first literal run", so the 4-byte run is still at an even memory address.

In FImp, endoffset is always even; however, the length of the compressed data is not always even, so this information is stored in the bit-buffer word. Check the top bit (bit 15) of the bit-buffer word. If it is set, the length of the compressed data is odd: place the lower 8 bits of the bit-buffer word as a byte after the initial literal run length, and decompress the data with an input length of endoffset + 5. If bit 15 is not set, the length of the compressed data is even, and the final byte of the compressed data section is padding: write the lower 8 bits of the bit-buffer word into the final byte of the compressed data section (endoffset - 1), and decompress the data with an input length of endoffset + 4.
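The reordering can be sketched as an in-place operation on a buffer holding the whole FImp file. The function name `fimp_reorder` is hypothetical, and the explosion routine that would consume the result is not shown. Every trailer field is read before anything is written, because the rewrite overwrites the header (so the uncompressed length at offset 0x04 must be saved beforehand) and the start of the trailer.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Rearrange buf (the whole FImp file, at least endoffset + 0x32 bytes)
 * into an explosion input stream. Returns the input length to use. */
size_t fimp_reorder(unsigned char *buf, uint32_t endoffset)
{
    /* Read every trailer field first; the writes below overlap them. */
    uint32_t lw3 = be32(buf + endoffset);
    uint32_t lw2 = be32(buf + endoffset + 0x04);
    uint32_t lw1 = be32(buf + endoffset + 0x08);
    uint32_t run = be32(buf + endoffset + 0x0C);
    unsigned flags = buf[endoffset + 0x10];          /* bits 15-8 */
    unsigned char bitbuf = buf[endoffset + 0x11];    /* bits 7-0 */

    put_be32(buf + 0x00, lw1);   /* replaces the "IMP!" ID */
    put_be32(buf + 0x04, lw2);   /* replaces the uncompressed length */
    put_be32(buf + 0x08, lw3);   /* replaces endoffset */
    put_be32(buf + endoffset, run);

    if (flags & 0x80) {          /* bit 15 set: odd compressed length */
        buf[endoffset + 4] = bitbuf;
        return (size_t)endoffset + 5;
    }
    buf[endoffset - 1] = bitbuf; /* even: overwrite the padding byte */
    return (size_t)endoffset + 4;
}
```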

Example code

The following standard C program will decompress FImp files. It requires the C code of the explosion algorithm, listed below, to be included in a file called "explode.c".

Explosion

The "implosion" (compression) algorithm, common to all three formats, is an LZ77-family compressor with static Huffman coding. It creates Imploder compressed data. The "explosion" algorithm is the decompressor for Imploder compressed data. It will be described in full in a later version of this document. For now, only C source code is available.

Example code