The Secret Archives of Tunguska
In the game "The Secret Files: Tunguska", there are archives with SPR extension.
They are uncompressed archives, and each one has a header of 0x40 (64) bytes. The header begins with the 4 bytes 'SPCD'.
You can download the demo of the game and see for yourself.
The first question is: how can you tell where each file in the archive begins and ends?
Some of the archives contain files that begin with 'SLZX', which are compressed.
So the second question is: how do you decompress them and extract their contents?
There is already a topic about it in the "game request" forum, but I thought the conversation should move here.
-
Strobe
- Moderator
- Posts: 411
- Joined: Mon Oct 24, 2005 8:52 am
- Location: Sweden
- Been thanked: 16 times
....
I highly doubt they are in the SLZX amiga packing format.
why would a pc developer use an old amiga packer for their data?
currently makes no sense from my point of view..........
but it would be fun if they really is. =)
-
Mr.Mouse
- Site Admin
- Posts: 4073
- Joined: Wed Jan 15, 2003 6:45 pm
- Location: Dungeons of Doom
- Has thanked: 450 times
- Been thanked: 680 times
Re: ....
Strobe wrote:
I highly doubt they are in the SLZX amiga packing format.
why would a pc developer use an old amiga packer for their data?
currently makes no sense from my point of view..........
but it would be fun if they really is. =)

That wouldn't be the first time. In the past, developers have used packing methods from other platforms, even old ones. Amiga crunching methods are as valid as today's methods, and most platforms use similar crunching methods. Sometimes a developer tasked with packing the files uses an old (and therefore FREE) method for fun, and because it's old, not many people will know how it works.
Hi there,
the LZX compression algorithm is not as uncommon today as you might think. Compiled HTML Help files (*.chm) use LZX compression to keep file size down. The brand-new Windows Imaging Format (WIM), the form in which Windows Vista will be deployed, can use an LZX algorithm for compression (amongst others).
Therefore I believe it is quite possible that a game developer might utilise this form of compression today, although the file format will most certainly not be compatible with the one used on the Amiga.
However, I'm off to analyze the header of these SLZX chunks... let's see what we can find out.
Greetings,
reima
-
Strobe
- Moderator
- Posts: 411
- Joined: Mon Oct 24, 2005 8:52 am
- Location: Sweden
- Been thanked: 16 times
..........
Someone kick me in the head.
I didnt read the LZX out of the SLZX. i took it as the whole word.
i need a vacation........
EDIT: World changed to word. =/ ....jesus.
Last edited by Strobe on Mon Sep 11, 2006 9:38 pm, edited 1 time in total.
-
Mr.Mouse
- Site Admin
- Posts: 4073
- Joined: Wed Jan 15, 2003 6:45 pm
- Location: Dungeons of Doom
- Has thanked: 450 times
- Been thanked: 680 times
reima wrote:
Hi there,
the LZX compression algorithm is not as uncommon today as you might think. Compiled HTML Help files (*.chm) use LZX compression to keep file size down. The brand-new Windows Imaging Format (WIM), the form in which Windows Vista will be deployed, can use an LZX algorithm for compression (amongst others).
Therefore I believe it is quite possible that a game developer might utilise this form of compression today, although the file format will most certainly not be compatible with the one used on the Amiga.
However, I'm off to analyze the header of these SLZX chunks... let's see what we can find out.
Greetings,
reima

Exactly. LZX, among other compression formats of today, was actually born on yesterday's machines. Back then the need to compress files was greater than it is today: internal memory was often the limiting factor, and coders had to squeeze the maximum out of the memory that was there. So they created sophisticated compression algorithms that were also fast enough to decompress at run-time without interfering with the screen synchronization (there will be those among you who know what I mean).
Hi there!
I finally had some time to take a thorough look at those SLZX files. Here's what I've found out:
The actual data is compressed with an LZSS-ish algorithm, which is a variation of the classical LZ77. LZSS is a dictionary encoding technique. This means that the encoder replaces parts of the data with references to matching data in the dictionary. The dictionary is a history of a fixed number of bytes which have already passed through the encoder. A dictionary reference is just an offset/length pair.
This leads to the following, quite simple decoding algorithm:
1. Get the next chunk from the encoded stream.
2. Determine if this chunk is a literal symbol or a reference.
3. If it is a literal symbol, just write it straight to the output stream.
4. If it is a reference, extract offset O and length L. Then copy L bytes from the dictionary to the output stream, starting at position O.
5. If there are still unread bytes in the input stream, return to 1.
Legend:

              +- Sliding dictionary window
             /
           ########
O: 0123456789abcdef -- Output stream
D:         89abcdef
            \
             +-- Dictionary content (? means undetermined)

Now watch what happens while decoding:

   ########
O:
D: ????????

   ########
O: 0123
D: 0123????

   ######## -- dictionary is full
O: 01234567
D: 01234567

    ######## -- window beginning to slide
O: 012345678
D:  12345678

        ########
O: 0123456789abc
D:      56789abc

           ########
O: 0123456789abcdef
D:         89abcdef

etc.

So much for the basics. But now let's get down to the nitty-gritty details (the fun part!). I'll describe them in a FAQ-like manner (because I frequently asked these questions to my dear self :))
Q1: What does a reference in the encoded stream look like?
A1: A reference consists of two bytes (R[0] being the first and R[1] being the second one from now on). To extract offset O and length L of the corresponding part of the dictionary, you must split up R[1] into two fields. The five most significant bits are L, the three least significant bits put in front of the eight bits of R[0] are O. You can use the following algorithm to do that:
O = (R[0] + (R[1] << 8)) & 0x7ff
L = R[1] >> 3
Update: Thinking about it, it would be much simpler if you just interpreted these two bytes as a single 16 bit word (in little endian/Intel byte order). Then the leftmost 5 bits are the length while the rest (the rightmost 11 bits) are the offset.
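To make A1 concrete, here is a minimal Python sketch of both readings: the byte-wise formula and the single 16-bit little-endian word from the update. The function names are made up for illustration; they don't come from the game or from any existing tool.

```python
def split_reference_bytes(r0: int, r1: int) -> tuple[int, int]:
    """Byte-wise formula from A1: returns (offset, length)."""
    offset = (r0 + (r1 << 8)) & 0x7FF  # low 3 bits of R[1] in front of R[0]
    length = r1 >> 3                   # high 5 bits of R[1]
    return offset, length


def split_reference_word(r0: int, r1: int) -> tuple[int, int]:
    """Same split, reading both bytes as one little-endian 16-bit word."""
    word = int.from_bytes(bytes([r0, r1]), "little")
    return word & 0x7FF, word >> 11    # rightmost 11 bits, leftmost 5 bits


print(split_reference_bytes(0x34, 0x12))  # (564, 2), i.e. offset 0x234, length 2
```

Both functions should agree for every possible byte pair, which is an easy way to check the "Update" against the original formula.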
Q2: What size does the dictionary have?
A2: Looking at Q1 one can see that the offset part of a reference is an 11-bit value ranging between 0 and 2,047. This indicates that the dictionary might be 2,048 bytes (or 2 KB) wide--which it actually is :)
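The sliding window can be pictured with a bounded deque. This is only a sketch of the idea: `window_after` is a hypothetical helper name, and a real decoder would more likely index straight into its output buffer. With a size of 8 it reproduces the last step of the walk-through earlier in this post.

```python
from collections import deque


def window_after(data: bytes, size: int = 2048) -> bytes:
    """Return the dictionary content once all of `data` has passed through."""
    window = deque(maxlen=size)  # oldest bytes drop out once the window is full
    for byte in data:
        window.append(byte)
    return bytes(window)


print(1 << 11)                                    # 2048 possible offsets
print(window_after(b"0123456789abcdef", size=8))  # b'89abcdef'
```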
Q3: How do you determine if a chunk is a literal or a reference?
A3: The first byte of the encoded stream is always a flag byte. It describes the layout of the octuplet immediately following it. An octuplet consists of eight parts (hence the name), each of which can be one of the following:
a. A single literal: L (= 1 byte)
b. A reference followed by a single literal: RRL (= 3 bytes) (two Rs due to the fact that a reference is two bytes wide (see Q1))
Therefore an octuplet has a size ranging between 8 bytes (only parts of type a) and 24 bytes (only parts of type b).
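A small Python sketch that expands a flag byte into its octuplet layout. The bit order is an assumption inferred from the flag byte examples in this post (the least significant bit describes the first part), and `octuplet_layout` is a hypothetical name.

```python
def octuplet_layout(flag: int) -> str:
    """Expand a flag byte into the byte layout of the octuplet that follows it."""
    parts = []
    for bit in range(8):  # assumption: bit 0 (LSB) describes the first part
        parts.append("RRL" if (flag >> bit) & 1 else "L")
    return "".join(parts)


for flag in (0b00000000, 0b00000010, 0b10011010, 0b11111111):
    layout = octuplet_layout(flag)
    print(f"{flag:08b}b -> {layout} ({len(layout)} bytes)")
```

The shortest layout is 8 bytes and the longest 24 bytes, matching the range given above.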
Some examples for flag bytes and corresponding octuplet byte layouts:
F: 00000000b
O: LLLLLLLL
F: 00000001b
O: RRLLLLLLLL
F: 00000010b
O: LRRLLLLLLL
F: 00100000b
O: LLLLLRRLLL
F: 10011010b
O: LRRLLRRLRRLLLRRL
F: 11111111b
O: RRLRRLRRLRRLRRLRRLRRLRRL
Q4: Where does the encoded data start?
A4: The encoded data begins immediately after the file header.
Q5: How is the file header structured?
A5: Each SLZX encoded file starts with the 4-byte sequence 0x53 0x4C 0x5A 0x58 ('SLZX' in ASCII). After that there are two identical DWORDs, each of which stands for the size of the original data (i.e. the decoded data). I don't really know why it's there twice. Anyway, this totals to a file header size of 12 bytes, in case you didn't notice.
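Reading that header could look like the following Python sketch. The byte order of the two DWORDs is not stated in this post; little-endian is assumed here because it is a PC game and the reference word in Q1 is little-endian. `parse_slzx_header` is a hypothetical name.

```python
import struct


def parse_slzx_header(data: bytes) -> tuple[int, int]:
    """Return the two size DWORDs from a 12-byte SLZX header."""
    if len(data) < 12 or data[:4] != b"SLZX":
        raise ValueError("not an SLZX file")
    # Assumption: both DWORDs are unsigned 32-bit little-endian values.
    size_a, size_b = struct.unpack_from("<II", data, 4)
    return size_a, size_b


header = b"SLZX" + struct.pack("<II", 1000, 1000)
print(parse_slzx_header(header))  # (1000, 1000)
```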
Q6: What if the encoder hits the end of the input stream while being in the middle of an octuplet?
A6: In that case the remaining parts of the octuplet are marked as literals in the flag byte and can be filled with whatever you like. The decoder knows these padding bytes are to be ignored because of the original data size stored in the header.
Well... I think that's it! Now you should be able to implement a simple decoder for SLZX files. Please feel free to ask any questions which might come up.
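Putting Q1 to Q6 together, a decoder might be sketched in Python as follows. Treat it as an untested-against-real-files sketch under several assumptions that this post does not settle: the header DWORDs are little-endian, flag bits are read from the least significant bit upwards, and the offset O counts from the oldest byte currently in the 2 KB window (as drawn in the walk-through). `decode_slzx` is a hypothetical name, and there is no handling of truncated or corrupt input.

```python
import struct

HEADER_SIZE = 12    # 'SLZX' + two size DWORDs (Q5)
WINDOW_SIZE = 2048  # dictionary size (Q2)


def decode_slzx(data: bytes) -> bytes:
    """Decode one SLZX stream into the original bytes (sketch, see assumptions)."""
    if data[:4] != b"SLZX":
        raise ValueError("missing SLZX magic")
    size, _size_again = struct.unpack_from("<II", data, 4)  # assumed little-endian
    out = bytearray()
    pos = HEADER_SIZE
    while len(out) < size:
        flags = data[pos]
        pos += 1
        for bit in range(8):  # assumed order: bit 0 describes the first part
            if len(out) >= size:
                break  # the rest of the octuplet is padding (Q6)
            if (flags >> bit) & 1:
                # Part of type b: a two-byte reference comes first (Q1).
                word = data[pos] | (data[pos + 1] << 8)
                pos += 2
                offset, length = word & 0x7FF, word >> 11
                # Assumption: O is an index from the oldest byte in the window.
                start = max(0, len(out) - WINDOW_SIZE) + offset
                for i in range(length):
                    out.append(out[start + i])
            if len(out) < size:
                out.append(data[pos])  # both part types end with one literal
                pos += 1
    return bytes(out[:size])


# Hand-made stream: literals 'a', 'b', 'c', then a reference (O=0, L=3) plus
# literal 'X', then four padding literals.
stream = (b"SLZX" + struct.pack("<II", 7, 7) + bytes([0b00001000])
          + b"abc" + bytes([0x00, 0x18]) + b"X" + bytes(4))
print(decode_slzx(stream))  # b'abcabcX'
```

If real files decode to garbage, the offset interpretation is the first thing to revisit: a distance counted backwards from the end of the output, or an index into a fixed ring buffer, are the usual alternatives in LZSS-style formats.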
As for the format of the SPR container files... didn't have a closer look at them yet. But I will.
Greetings,
reima
Last edited by reima on Tue Sep 19, 2006 1:34 pm, edited 1 time in total.
- Dinoguy1000
- Site Admin
- Posts: 786
- Joined: Mon Sep 13, 2004 1:55 am
- Has thanked: 154 times
- Been thanked: 163 times
-
john_doe
Not bad 
I really should have taken the time to write the specs for this format.
I've figured out most of the format and made a tool for it some time ago. (You can download it from http://gamefileformats.the-underdogs.in ... orer11.zip - it's in German, but it's not that complicated, I think.)
I'll try to write down the specs when I have some free time.
To Q5: The size is there twice because the first one belongs to the archive reader, which calls the correct decompression function according to the ID (SLZX etc.). The second size DWORD is then used by the decompression code itself.


