Imploder file formats
These are the file formats used by The Imploder and related utilities.
In this document, all multi-byte values are stored in the big-endian format.
The Imploder
This section has still to be written. The Imploder creates compressed executable files that self-unpack when run. There are several variations: normal Imploder (sub-variants: 3.1, 3.1 pure and 4.0), library Imploder which uses the external "explode.library" (sub-variants: 3.1 and 4.0) and overlayed Imploder which loads the executable at the same time as decrunching it. The compression code is the same as the Disk Imploder and File Imploder, but the Amiga executable file structure has to be reconstituted as well. MC680x0 code to do this can be found here.
DImp
The purpose of the Disk Imploder is to compress the raw disk structure of standard Amiga disks with the Imploder compression algorithm. The file extension ".DMP" is used for a standard Disk Imploder file, the extension ".DEX" is used for a self-extracting Disk Imploder file.
Overall DImp file format
The regular Disk Imploder format is given below. The self-extracting format is simply the same data, preceded by an Amiga executable that will extract the data. An Amiga executable always begins with the 32-bit value 0x3F3 (1011 in decimal). DImp 1.00 self-extracting files have the DImp data at offset 3856 decimal. DImp 2.27 self-extracting files have the DImp data at offset 5796 decimal. For other versions, you should search the entire executable for the "DIMP" identifier of the header.
The regular DImp format comprises the following sections, stored consecutively without gaps, and in the order given.
- DImp header
- DImp information table
- An optional DImp text message
- Zero or more DImp cylinders
DImp header
The DImp header has two 32-bit values. First, the identifing value 0x44494D50, or "DIMP" in ASCII. Secondly, the length of the information table to follow, in bytes. It must be between 4 and 404.
DImp information table
The DImp information table has all metadata regarding the compression and disk structure. The overall length of the table is 404 (0x194) bytes. If the table length given in the header section is less than 404, then only that number of bytes should be retrieved from the DImp file, the remaining bytes must be filled in with zeroes. The table format is as follows:
Offset | Length | Description |
---|---|---|
0x00 | 4 | This is a checksum of all data in the information table, except this checksum field itself. In other words, the checksum of 400 (0x190) bytes of data from offset 0x004 to offset 0x193, inclusive. See the checksum format for more information. |
0x04 | 2 | This is the level of compression used. As all levels can be unpacked with the same code, it is not needed. |
0x06 | 10 | This is an 80 bit bitfield, one bit for each possible cylinder on the compressed disk. The most significant bit of byte 0 represents cylinder 0. The least significant bit of byte 0 represents cylinder 7. The least significant bit of byte 9 represents cylinder 79. If a bit is set, the corresponding cylinder is stored in the DImp file. If a bit is not set, the cylinder is not stored in the DImp file. |
0x10 | 28 | This is an explosion table. It stores the state required for the decompressor to unpack the text message, if present. The actual structure comprises 8 16-bit values used as "base offsets" and 12 8-bit values used as "number of extra bits to read". |
0x2C | 28 | This is another explosion table, which stores the state required for the decompressor to unpack a cylinder. All cylinders use the same explosion table. |
0x48 | 4 | This is the compressed length of the text message. If it is 0, there is no text message present. |
0x4C | 4 | This is the uncompressed length of the text message. |
0x50 | 4 | This is a checksum of the text message, when uncompressed. See the checksum format for more information. |
0x54 | 320 | This is an array of 80 32-bit values, one for each cylinder. See the cylinder section for more information. |
DImp text message
The text message, if present in the DImp file, is simply a stream of Imploder compressed data. The length of this compressed stream is given in the information table at offset 0x4C. The length of the stream when uncompressed is given at offset 0x48. If either of these two values are zero, there is no text message present. The stream should be decompressed with the explosion algorithm, using the explosion table at offset 0x10 in the information table. The resulting uncompressed stream is expected to be printable ISO-8859-1 text, but may feature ANSI codes and Amiga console.device specific escape codes.
DImp cylinder
At this point in the DImp file, anything between 0 and 80 compressed streams are present. Each compressed stream is individually sized and represents one cylinder of an Amiga disk. They are ordered from cylinder 0 to cylinder 79. If a cylinder is not present in the DImp file, it uses no bytes in this section of the DImp file.
An Amiga disk has two sides, 80 tracks, and 11 sectors per track. Each sector is 512 bytes in length. So, a cylinder comprises 22 512-byte sectors, or exactly 11264 bytes of data. The track number is the same for all sectors in a cylinder, and the uncompressed cylinder data is broken into 512 byte sectors in this order: sector 0 on side 0, sector 0 on side 1, sector 1 on side 0, sector 1 on side 1, sector 2 on side 0, sector 2 on side 1, and so on until sector 10 on side 1.
To determine if a cylinder is present in the DImp file, first check the disk bitmap at offset 0x06 in the information table. If the appropriate bit is 0, that cylinder is not present. If the bit is set, then take the appropriate 32 bit entry from the cylinder information array at offset 0x54 in the information table, and interpret it as follows:
- If the entry is 0x00000000, then the cylinder is not present in the file, despite what the disk bitmap said. This happens when an error occurs while reading the disk at compression time.
- If the entry is 0xFFFFFFFF, then the cylinder comprises nothing but zeros. Assume the cylinder expands to 11264 zero bytes, and does not use any bytes from this part of the DImp file for its definition.
- In all other cases, the entry must be broken into the most significant 16 bits and the least significant 16 bits.
- The most significant 16 bits are the compressed size of this cylinder, in bytes. If this value is more than the uncompressed length of a cylinder, 11264, then something is wrong. The DImp utility exits with the message "wierd info-table entry" in this scenario. If this value is exactly 11264, the cylinder is stored uncompressed. Otherwise, the cylinder data is a stream of Imploder compressed data. The uncompressed length of this data is 11264 bytes. The stream should be decompressed with the explosion algorithm, using the explosion table at offset 0x2C in the information table.
- The least significant 16 bits are the least significant 16 bits of the checksum on the cylinder's bytes stored in the file. See the checksum format for more information.
DImp checksum format
If the length of the data to be checksummed in bytes is not a multiple of 2, assume that the length is one byte longer, and that byte's value is 0. The byte's location is at the very end of the checksummed data.
To derive the checksum, interpret the data to be checksummed as a contiguous array of 16-bit, unsigned, big-endian values. Compute the sum of all these values, then add 7. The least significant 32 bits of the result are the checksum value.
FImp
FImp compresses a single file into the following format:
Offset | Length | Description |
---|---|---|
0x00 | 4 | The identifying value 0x494D5021, or "IMP!" in ASCII. Clones of the FImp format use the IDs "ATN!", "BDPI", "CHFI", "Dupa", "EDAM", "FLT!", "M.H.", "PARA" and "RDC9". [1] [2] |
0x04 | 4 | The uncompressed length of the file, in bytes. |
0x08 | 4 | The offset of the following compressed data section: endoffset. Always even. |
0x0C | endoffset - 0x0C | The compressed data section. |
endoffset | 4 | Compressed data longword 3. |
endoffset + 0x04 | 4 | Compressed data longword 2. |
endoffset + 0x08 | 4 | Compressed data longword 1. |
endoffset + 0x0C | 4 | The initial literal run length. |
endoffset + 0x10 | 2 | Bit 15 is an indicator of compressed data length; bits 7-0 are the first byte of the compressed data ("initial bit-buffer"). |
endoffset + 0x12 | 28 | The explosion table (8 16-bit values and 12 8-bit values) |
endoffset + 0x2E | 4 | Unknown; appears to be a checksum of the preceding bytes, but out by a little. |
Re-ordering the data for decompression
The compressed data is not immediately decompressable. The format is designed such that you can load the file, including headers, into a single decompression buffer and decompress it in-place. Because of this, it uses the three longwords (4 bytes each) of the header information as a place to put compressed data, rather than "wasting" 12 bytes.
To reconstitute the data so it can be decompressed with the explosion algorithm, order the data as follows:
Offset | Length | Contents |
---|---|---|
0x00 | 4 | Compressed data longword 1 |
0x04 | 4 | Compressed data longword 2 |
0x08 | 4 | Compressed data longword 3 |
0x0C | endoffset - 0x0C | Compressed data section (maybe includes initial bit-buffer) |
endoffset | 4 | Initial literal run length |
endoffset + 0x04 | 1 | initial bit-buffer, if not in compressed data section |
In a "normal" compressed stream, the first five bytes (at the end of the stream; the stream is read backwards) are the first literal run length and the initial byte for the bit buffer. If the length of the input data is odd, then the 1-byte "initial bit-buffer" is placed after the 4-byte "first literal run" in memory. This way, the 4-byte run is at an even memory address, so it can be read directly by the MC680x0. If the length of the input data is even, then the "initial bit-buffer" comes before the 4-byte "first literal run", so the 4-byte run is still at an even memory address.
In FImp, endoffset is always even, however the length of the compressed data is not always even. So this information is stored in the bit-buffer word. Check the bit-buffer's top bit (bit 15). If it is set, then the length of the compressed data is odd. Place the lower 8 bits of the bit-buffer word as a byte after the initial-literal-run-length and decompress the data with an input length of endoffset + 5. However, if bit 15 is not set, then the length of the compressed data is even, and the final byte of the compressed data section is padding. Write the initial bit-buffer's lower 8 bits into the final byte of the compressed data section (endoffset - 1) and decompress the data with an input length of endoffset + 4.
Example code
The following standard C program will decompress FImp files. It requires the C code of the explosion algorithm, listed below, to be included in a file called "explode.c".
defimp.c |
---|
/* This code is placed in the Public Domain by its author, Stuart Caie */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "explode.c"
char *defimp(unsigned char *in, unsigned int in_len,
unsigned char **out_ptr, unsigned int *out_len_ptr)
{
unsigned int id, end_off, out_len, ok;
unsigned char *out;
*out_ptr = NULL;
if (in_len < 0x30) return "too short to be a FImp file";
id = (in[0x00] << 24) | (in[0x01] << 16) | (in[0x02] << 8) | in[0x03];
out_len = (in[0x04] << 24) | (in[0x05] << 16) | (in[0x06] << 8) | in[0x07];
end_off = (in[0x08] << 24) | (in[0x09] << 16) | (in[0x0A] << 8) | in[0x0B];
/* check for magic ID 'IMP!', or one of the other IDs used by Imploder
clones; ATN!, BDPI, CHFI, Dupa, EDAM, FLT!, M.H., PARA and RDC9 */
if (id != 0x494d5021 && id != 0x41544e21 && id != 0x42445049 &&
id != 0x43484649 && id != 0x44757061 && id != 0x4544414d &&
id != 0x464c5421 && id != 0x4d2e482e && id != 0x50415241 &&
id != 0x52444339)
return "unknown file format";
/* sanity checks */
if (end_off & 1) return "compressed size not even";
if (end_off < 14) return "compressed size too short";
if ((end_off + 0x2E) > in_len) return "not enough input data";
if ((end_off + 0x26) > out_len) return "output length less than input";
/* allocate memory */
if (!(out = malloc(out_len))) return "can't allocate memory";
*out_ptr = out;
*out_len_ptr = out_len;
/* copy input data into output buffer */
memcpy(&out[0x00], &in[end_off + 0x08], 4);
memcpy(&out[0x04], &in[end_off + 0x04], 4);
memcpy(&out[0x08], &in[end_off + 0x00], 4);
memcpy(&out[0x0C], &in[0x0C], end_off - 0x0C);
memcpy(&out[end_off], &in[end_off + 0x0C], 4);
if (in[end_off + 0x10] & 0x80) {
out[end_off + 4] = in[end_off + 0x11];
ok = explode(out, &in[end_off + 0x12], end_off + 5, out_len);
}
else {
out[end_off - 1] = in[end_off + 0x11];
ok = explode(out, &in[end_off + 0x12], end_off + 4, out_len);
}
return ok ? NULL : "can't deplode";
}
size_t filelen;
void *loadfile(char *name) {
void *mem = NULL; FILE *fd;
if ((fd = fopen(name, "rb"))) {
if ((fseek(fd, 0, SEEK_END) == 0) && (filelen = ftell(fd))
&& (fseek(fd, 0, SEEK_SET) == 0) && (mem = malloc(filelen))) {
if (fread(mem, 1, filelen, fd) < filelen) { free(mem); mem = NULL; }
}
fclose(fd);
}
return mem;
}
int savefile(char *name, void *mem, size_t length) {
FILE *fd = fopen(name, "wb");
int ok = fd && (fwrite(mem, 1, length, fd) == length);
if (fd) fclose(fd);
return ok;
}
int main(int argc, char *argv[]) {
unsigned char *in = NULL, *out = NULL;
unsigned int out_len;
char *msg;
if (argc != 3) {
fprintf(stderr, "Usage: %s <input file> <output file>\n", argv[0]);
return 1;
}
/* load file */
if (!(in = loadfile(argv[1]))) {
perror(argv[1]);
return 1;
}
/* deplode file */
if ((msg = defimp(in, filelen, &out, &out_len))) {
fprintf(stderr, "%s: %s\n", argv[1], msg);
free(in); free(out);
return 1;
}
/* save output */
if (!savefile(argv[2], out, out_len)) {
free(in); free(out);
perror(argv[1]);
return 1;
}
free(in); free(out);
return 0;
}
|
Explosion
The "implosion" (compression) algorithm, common to all three formats, is a LZ77-family compressor with static Huffman coding. It creates Imploder compressed data. The "explosion" algorithm is the decompressor for Imploder compressed data. It will be described in full in a later version of this document. For now, only C source code is available.
Example code
explode.c |
---|
/* This code is placed in the Public Domain by its author, Stuart Caie */
/* macro which obtains the next bit from the input bitstream, from MSB to
* LSB, and sets the "bit" variable with the result. If 8 bits have been
* read, fetches another byte from the input bytestream. Equivalent to the
* following M680x0 code:
*
* add.b d3,d3
* bne.b gotbit
* move.b -(a3),d3
* addx.b d3,d3
* gotbit:
*/
#define EXPLODE_GETBIT do { \
bit = bit_buffer & 0x80; \
bit_buffer <<= 1; \
if (!bit_buffer) { \
bit2 = bit; \
bit_buffer = *--i; \
bit = bit_buffer & 0x80; \
bit_buffer <<= 1; \
if (bit2) bit_buffer++; \
} \
} while (0)
static unsigned char explode_literal_base[4] = {
6, 10, 10, 18
};
static unsigned char explode_literal_extrabits[12] = {
1, 1, 1, 1,
2, 3, 3, 4,
4, 5, 7, 14
};
/**
* Decompresses a stream of Imploder-compressed data.
*
* @param buffer a buffer that is large enough to contain all the
* decompressed data. On entry, the buffer should
* contain the entire Imploder-compressed stream at
* offset 0. On successful exit, the buffer will
* contain the decompressed data at offset 0. The
* original buffer contents will be overwritten.
*
* @param table an explosion table, consisting of 8 16-bit
* big-endian "base offset" values and 12 8-bit
* "extra bits" values.
* @param comp_len the compressed length of the data
* @param uncomp_len the decompressed length of the data
*
* @return zero on error, non-zero on success. If successful, the
* buffer contains the decompressed data.
*/
int explode(unsigned char *buffer,
unsigned char *table,
unsigned int comp_len,
unsigned int uncomp_len)
{
unsigned char *i = buffer + comp_len - 5; /* input pointer */
unsigned char *o = buffer + uncomp_len; /* output pointer */
unsigned char *match; /* match pointer */
unsigned char bit_buffer, bit, bit2;
unsigned int literal_len, match_len, selector, x, y;
unsigned int match_base[8];
/* read the 'base' part of the explosion table into native byte order,
* for speed */
for (x = 0; x < 8; x++) {
match_base[x] = (table[x*2] << 8) | table[x*2 + 1];
}
/* get initial bit buffer contents, and first literal length */
if (comp_len & 1) {
bit_buffer = i[4];
literal_len = (i[0] << 24) | (i[1] << 16) | (i[2] << 8) | i[3];
}
else {
bit_buffer = i[0];
literal_len = (i[1] << 24) | (i[2] << 16) | (i[3] << 8) | i[4];
}
while (1) {
/* copy literal run */
if ((o - buffer) < literal_len) return 0; /* enough space? */
while (literal_len--) *--o = *--i;
/* main exit point - after the literal copy */
if (o <= buffer) break;
/* static Huffman encoding of the match length and selector:
*
* 0 -> selector = 0, match_len = 1
* 10 -> selector = 1, match_len = 2
* 110 -> selector = 2, match_len = 3
* 1110 -> selector = 3, match_len = 4
* 11110 -> selector = 3, match_len = 5 + next three bits (5-12)
* 11111 -> selector = 3, match_len = (next input byte)-1 (0-254)
*
*/
EXPLODE_GETBIT;
if (bit) {
EXPLODE_GETBIT;
if (bit) {
EXPLODE_GETBIT;
if (bit) {
selector = 3;
EXPLODE_GETBIT;
if (bit) {
EXPLODE_GETBIT;
if (bit) {
match_len = *--i;
if (match_len == 0) return 0; /* bad input */
match_len--;
}
else {
match_len = 0; EXPLODE_GETBIT; if (bit) match_len++;
match_len <<= 1; EXPLODE_GETBIT; if (bit) match_len++;
match_len <<= 1; EXPLODE_GETBIT; if (bit) match_len++;
match_len += 5;
}
}
else {
match_len = 4;
}
}
else {
selector = 2;
match_len = 3;
}
}
else {
selector = 1;
match_len = 2;
}
}
else {
selector = 0;
match_len = 1;
}
/* another Huffman tuple, for deciding the base value (y) and number
* of extra bits required from the input stream (x) to create the
* length of the next literal run. Selector is 0-3, as previously
* obtained.
*
* 0 -> base = 0, extra = {1,1,1,1}[selector]
* 10 -> base = 2, extra = {2,3,3,4}[selector]
* 11 -> base = {6,10,10,18}[selector] extra = {4,5,7,14}[selector]
*/
y = 0;
x = selector;
EXPLODE_GETBIT;
if (bit) {
EXPLODE_GETBIT;
if (bit) {
y = explode_literal_base[x];
x += 8;
}
else {
y = 2;
x += 4;
}
}
x = explode_literal_extrabits[x];
/* next literal run length: read [x] bits and add [y] */
literal_len = 0;
while (x--) {
literal_len <<= 1;
EXPLODE_GETBIT;
if (bit) literal_len++;
}
literal_len += y;
/* another Huffman tuple, for deciding the match distance: _base and
* _extra are from the explosion table, as passed into the explode
* function.
*
* 0 -> base = 1 extra = _extra[selector + 0]
* 10 -> base = 1 + _base[selector + 0] extra = _extra[selector + 4]
* 11 -> base = 1 + _base[selector + 4] extra = _extra[selector + 8]
*/
match = o + 1;
x = selector;
EXPLODE_GETBIT;
if (bit) {
EXPLODE_GETBIT;
if (bit) {
match += match_base[selector + 4];
x += 8;
}
else {
match += match_base[selector];
x += 4;
}
}
x = table[x + 16];
/* obtain the value of the next [x] extra bits and
* add it to the match offset */
y = 0;
while (x--) {
y <<= 1; EXPLODE_GETBIT; if (bit) y++;
}
match += y;
/* copy match */
if ((o - buffer) < match_len) return 0; /* enough space? */
do { *--o = *--match; } while (match_len--);
}
/* return 1 if we used up all input bytes (as we should) */
return (i == buffer);
}
|