rev3rsix wrote: No help? No advice? Nothing?
Here's some information:
# Valkyria Chronicles 3 - File Formats
## DATA.BIN datafile
This is a PGD-encrypted CPK file. The file can be decrypted by using JPCSP's
CryptoEngine code, or by simply running the game with certain settings
enabled.
### CPK file
The actual CPK has a general format, and can be worked with using CRIWARE's
crifilesystem utilities. The game appears to accept CPKs generated by older
versions of CRIWARE's software.
The general format seems to consist of:
* 2048 byte general header.
* TOC (file records, filenames.)
* file data (in blocks of 2048 bytes.)
### CPK TOC
The actual structure of the TOC was not really deciphered, except that it has
24-byte records (starting 91 bytes in?) for each file within the CPK. These
files are identified by filename, and the ID is not important (so they can be
in any order.)
The 24 bytes are 6 big-endian DWORDs, and include the filename pointer, ID,
length, and position of the file data.
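As a minimal sketch of the record layout described above, the following Python unpacks one 24-byte record as six big-endian DWORDs. The order of the fields inside the record is not established by these notes, so the values come back as a plain tuple with no names attached. `read_toc_record` and its arguments are hypothetical names, not part of any CRIWARE tool.

```python
import struct

TOC_RECORD_SIZE = 24  # six DWORDs of 4 bytes each


def read_toc_record(toc: bytes, offset: int) -> tuple:
    """Unpack one TOC record as six big-endian uint32 values.

    Which value is the filename pointer, ID, length, or data position
    is not known from the notes, so no field names are assigned here.
    """
    return struct.unpack_from(">6I", toc, offset)
```

Comparing the six values against a known file's name offset and length should be enough to work out which slot is which.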
## Files inside the CPK
There are several types of files in the CPK, but they all use a common
overall format. High level, it looks something like this:
* packet header
* packet data
* packet header
* packet data
* packet header
* packet data
The NAD files don't exactly conform to this format, though, for some reason.
### Packet headers
Each packet has a 4-byte identifier, like MTPA. The header appears to use
the following layout (little-endian):
* char[4] magic;
* uint32 packet_size; // not including header, round up to 16.
* uint32 header_size;
* uint32 flags;
If the header_size is 32 or greater (except for MSCR packets), the next
16 bytes are as follows:
* ubyte[4] unknown;
* uint32 data_size; // not including header, round up to 16.
* ubyte[4] unknown;
* ubyte[4] unknown;
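The header layout above could be read with something like the following sketch. `PacketHeader` and `parse_packet_header` are made-up names for illustration; the field order and the MSCR exception are taken straight from the notes, and the unknown bytes are simply skipped.

```python
import struct
from typing import NamedTuple, Optional


class PacketHeader(NamedTuple):
    magic: bytes
    packet_size: int
    header_size: int
    flags: int
    data_size: Optional[int]  # only read from extended (32+ byte) headers


def parse_packet_header(buf: bytes, offset: int = 0) -> PacketHeader:
    """Parse the 16-byte base header, plus data_size when it is present."""
    magic, packet_size, header_size, flags = struct.unpack_from("<4s3I", buf, offset)
    data_size = None
    if header_size >= 32 and magic != b"MSCR":
        # Extended part: unknown[4], data_size, unknown[4], unknown[4].
        (data_size,) = struct.unpack_from("<I", buf, offset + 20)
    return PacketHeader(magic, packet_size, header_size, flags, data_size)
```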
### Packet data and sub-packets
For packets with headers of 32 bytes or more, there may be "sub-packets."
These generally look like:
+------------------------------+
| Containing Packet Header |
+------------------------------+
| Containing Packet Data |
| ... |
| |
| +--------------------------+ |
| | Sub Packet #1 Header | |
| +--------------------------+ |
| | Sub Packet #1 Data | |
| | | |
| +--------------------------+ |
| | Sub Packet #2 Header | |
| +--------------------------+ |
| | Sub Packet #2 Data | |
| | | |
| +--------------------------+ |
+------------------------------+
Everything is aligned to 16 bytes, which is very convenient for hex editors.
The data size or packet size can be 0, which means there's no data.
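A rough sketch of walking this structure, under two assumptions that the notes do not confirm: that the next packet starts at header_size plus packet_size (rounded up to 16) past the current one, and that sub-packets begin right after the containing packet's own data_size bytes (also rounded up to 16). `walk_packets` and `align16` are hypothetical helper names.

```python
import struct
from typing import Iterator, Optional, Tuple


def align16(n: int) -> int:
    """Round n up to the next multiple of 16."""
    return (n + 15) & ~15


def walk_packets(buf: bytes, start: int = 0, end: Optional[int] = None,
                 depth: int = 0) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (depth, offset, magic) for each packet found in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 16 <= end:
        magic, packet_size, header_size, _flags = struct.unpack_from("<4s3I", buf, pos)
        if header_size < 16:
            break  # not a plausible header; stop instead of looping forever
        yield depth, pos, magic
        body_start = pos + header_size
        body_end = body_start + align16(packet_size)
        if header_size >= 32 and magic != b"MSCR":
            (data_size,) = struct.unpack_from("<I", buf, pos + 20)
            # Assumption: sub-packets follow the packet's own data.
            sub_start = body_start + align16(data_size)
            yield from walk_packets(buf, sub_start, min(body_end, end), depth + 1)
        pos = body_end
```

Printing `magic` indented by `depth` gives a quick tree view of a file, which is handy for checking whether the two assumptions hold on real data.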
### Sub-structures
Even though the files already have a nesting capability for the headers,
sometimes there will be a data packet that is opaque, but itself is just
another file formatted in this same way (with headers and nesting all over
again.)
For example, MLX files (which contain graphics) have an IZCA packet that
works exactly this way.
### XOR encryption
Some files have their data segments encrypted using a rolling XOR. How
the first byte is determined is not understood, but this is generally not
needed because the files follow a consistent format.
If the "flags" uint32 in the header has its 19th bit set (0x40000), then
this encryption is being used.
You can simply XOR each byte by the previous byte (pre-encrypted.) This is
easy to decrypt and re-encrypt.
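Here is one possible reading of that scheme as a sketch. It assumes "previous byte" means the previous byte in its encrypted form, and it treats the unknown first-byte value as a `seed` parameter defaulting to 0. If the game actually chains on the previous plaintext byte instead, the two functions simply swap roles. All names here are hypothetical.

```python
XOR_FLAG = 0x40000  # flag bit from the packet header notes


def is_xor_encrypted(flags: int) -> bool:
    """True when the header flags mark the data segment as XOR-encrypted."""
    return bool(flags & XOR_FLAG)


def xor_decrypt(cipher: bytes, seed: int = 0) -> bytes:
    """XOR each byte with the previous encrypted byte (seed for the first)."""
    out = bytearray(len(cipher))
    prev = seed
    for i, c in enumerate(cipher):
        out[i] = c ^ prev
        prev = c  # chain on the encrypted byte, not the decrypted one
    return bytes(out)


def xor_encrypt(plain: bytes, seed: int = 0) -> bytes:
    """Inverse of xor_decrypt: chain on the byte that was just written."""
    out = bytearray(len(plain))
    prev = seed
    for i, p in enumerate(plain):
        out[i] = p ^ prev
        prev = out[i]
    return bytes(out)
```

With this reading, a wrong `seed` only corrupts the first byte, which would fit the remark that the first byte is generally not needed.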
### MTP files (MTPA packets)
MTPA packets are fairly simple and just have the Shift-JIS text with
each byte incremented by one for no apparent reason.
Note that pointers within the data are generally relative to the header.
That is, if the header is 32 bytes, then 0x20 would point to the
beginning of the data.
struct info_header (16 bytes)
DWORD unknown5 always 0x4000000f
DWORD pointer_count number of pointer records
DWORD data_size number of DWORDs each data record is
DWORD data_count number of data records
struct unknown6[] repeats data_size times
DWORD unknown7 always <= 2
<pointer segment>
struct pointer_record[] repeats pointer_count times
DWORD data_pos pointer into data record segment
<data_segment>
struct data_record[] repeats data_count times
if data_size = 2
DWORD id id of voice data within OD_VOICE.AFS
DWORD text_pos position of text within text segment
if data_size = 4
DWORD flags1? unknown meaning, varies wildly
DWORD id id of voice data within OD_VOICE.AFS
DWORD text_pos position of text within text segment
DWORD flags3? 0x00 or 0x01 with 4 mysterious unique exceptions
struct unknown8[] always once?
DWORD unknown9 unknown meaning
EACH BYTE INCREMENTED
<text_segment>
struct text_record[] undetermined length?
UBYTE* shiftjis text in shift jis, null terminated
EACH BYTE INCREMENTED
EACH BYTE INCREMENTED section always multiple of 4 bytes
struct footer_padding
DWORD padding always 0x00 (padding to align ENRS)
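The "each byte incremented" text encoding could be undone with a sketch like this. It assumes the caller has already cut out a single string without its terminator, since the notes do not say whether the null terminator is incremented too. `decode_mtpa_text` and `encode_mtpa_text` are hypothetical names.

```python
def decode_mtpa_text(raw: bytes) -> str:
    """Subtract one from every byte, then decode as Shift-JIS."""
    return bytes((b - 1) & 0xFF for b in raw).decode("shift_jis")


def encode_mtpa_text(text: str) -> bytes:
    """Encode as Shift-JIS, then add one to every byte."""
    return bytes((b + 1) & 0xFF for b in text.encode("shift_jis"))
```

For an insertion script, `encode_mtpa_text` would run on each translated string just before it is written back into the text segment.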
### MXE files (MXEC packets)
MXEC packets are quite complicated, but consistent.
Note that pointers within the data are generally relative to the header.
That is, if the header is 32 bytes, then 0x20 would point to the
beginning of the data.
DWORD unknown varies, xor, doesn't seem important
DWORD unknown always 0x60
DWORD something4_header_ptr 0x00 or pointer to something4 header.
DWORD something2_header_ptr 0x00 or pointer to something2 header.
DWORD unknown meaning unknown, 0x00/0x01.
DWORD unknown always 0x00 (ends at 24)
DWORD unknown sometimes 0x00 or 0x01? MAYBE something6_count??
DWORD something6_ptr pointer to something6 data.
DWORD[9] unknown always 0x00 (ends at 68)
DWORD something1_count number of something1 records
DWORD unknown always 0xA0
DWORD[13] unknown always 0x00
something1[] always something1_count of them.
DWORD id seems like an id, counts up...
DWORD type_ptr points to ascii identifier in file.
DWORD length length of data.
DWORD data_ptr points to beginning of data.
<something1 data> variable in size, ends at last data_ptr + last length.
DWORD[] varies varies per record type.
(padded to a multiple of 16 bytes.)
<something4 header> (optional)
DWORD unknown always 0x00
DWORD something4_count number of something4 records.
DWORD something4_ptr pointer to something4 records.
DWORD unknown always 0x00
DWORD[12] unknown always 0x00
something4[] always something4_count of them.
DWORD unknown increasing, appears pointer like? unknown meaning.
DWORD unknown optional text pointer, sometimes 0x00.
DWORD something5_count count of sub-something5's inside the something4.
DWORD something5_ptr pointer to the something5 records.
DWORD[6] unknown always 0x00
DWORD unknown usually 0x00, sometimes 0x01?
DWORD weird_ptr 0x00 or pointer after text segment.
DWORD[4] unknown always 0x00
something5[] always something5_count of them PER something4.
DWORD text_ptr pointer to an ascii identifier.
DWORD unknown seems like a number? maybe value for text_ptr.
DWORD data_ptr points to some extra data.
DWORD unknown always 0x00?
<something5 data> variable size?
DWORD unknown one per pointer?
(padded to a multiple of 16 bytes after ALL the something5s.)
<something2 header> (optional)
DWORD unknown always 0x00
DWORD something2_count count of something2 records.
DWORD something2_ptr pointer to something2 records.
DWORD something3_count count of something3 records.
DWORD something3_ptr pointer to something3 records.
DWORD[3] unknown always 0x00
something2[] always something2_count of them.
DWORD[2] unknown unknown meaning.
DWORD path_ptr pointer to path string.
DWORD filename_ptr pointer to filename string.
DWORD[6] unknown unknown meaning.
something3[] always something3_count of them.
DWORD unknown increases, seems like a pointer? doesn't seem to match file?
(padded to a multiple of 16 bytes.)
<something6 data> (optional)
DWORD unknown unknown meaning.
(padded to a multiple of 16 bytes.)
<text starts here>
(padded to a multiple of 16 bytes.)
<weird data>
(padded to a multiple of 16 bytes.)
Not everything is understood. Some of the records have varying structure
defined by their identifier pointer, and may have string pointers embedded
within those structures.
The text itself is in Shift-JIS, null terminated.
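As a sketch of the fixed part of this layout: the leading fields add up to 32 DWORDs (128 bytes), with something1_count at byte offset 68, followed by the 16-byte something1 records. The code below assumes little-endian values (like the packet headers) and that `data` starts at the beginning of the MXEC packet data, right after the packet header. `parse_mxec_something1` is a hypothetical name, and pointers are returned raw without being followed.

```python
import struct

MXEC_FIXED_DWORDS = 32  # 6 + 2 + 9 + 1 + 1 + 13 DWORDs = 128 bytes
SOMETHING1_SIZE = 16    # id, type_ptr, length, data_ptr


def parse_mxec_something1(data: bytes) -> dict:
    """Read the fixed MXEC fields and the something1 record table."""
    header = struct.unpack_from("<%dI" % MXEC_FIXED_DWORDS, data, 0)
    count = header[17]  # something1_count, byte offset 68
    base = MXEC_FIXED_DWORDS * 4
    records = []
    for i in range(count):
        rec_id, type_ptr, length, data_ptr = struct.unpack_from(
            "<4I", data, base + SOMETHING1_SIZE * i)
        records.append({"id": rec_id, "type_ptr": type_ptr,
                        "length": length, "data_ptr": data_ptr})
    return {
        "something4_header_ptr": header[2],
        "something2_header_ptr": header[3],
        "something6_ptr": header[7],
        "something1": records,
    }
```

Remember that the pointers are relative to the packet header, so with a 32-byte header a pointer of 0xA0 lands at offset 0x80 inside `data`.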
## Comparison with Valkyria Chronicles 2
Valkyria Chronicles 2 seems to use the same data files and format, in general.
The primary difference is that rather than PGD-encrypting the DATA.BIN file,
they encrypt each file within the CPK.
Each file within the CPK has a 16-byte header which serves as a key.
The file is treated as a series of sets of 4 DWORDs, and uses the following
basic algorithm:
uint32[4] key;
uint32[] data;
for (int i = 0; i < data.length; i++)
{
int key_i = i % 4;
key[key_i] = key[key_i] * 3 + 1;
data[i] ^= key[key_i];
}
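The same loop as a runnable Python sketch, for experimenting on extracted files. It assumes the DWORDs are little-endian and that the multiply wraps at 32 bits, neither of which is stated above; any trailing bytes that do not fill a whole DWORD are passed through untouched. `vc2_xor` is a hypothetical name.

```python
import struct


def vc2_xor(key_bytes: bytes, payload: bytes) -> bytes:
    """Apply the key-stream XOR above; running it twice restores the input."""
    key = list(struct.unpack("<4I", key_bytes))  # the 16-byte file header
    count = len(payload) // 4
    words = list(struct.unpack_from("<%dI" % count, payload))
    for i in range(count):
        k = i % 4
        key[k] = (key[k] * 3 + 1) & 0xFFFFFFFF  # wrap like a uint32
        words[i] ^= key[k]
    return struct.pack("<%dI" % count, *words) + payload[count * 4:]
```

Because the key stream does not depend on the data, the same call both encrypts and decrypts as long as it starts from the original 16-byte key.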
However, when it hits EOFC packets or other boundaries, it appears to do
something different, so this is not a complete description of the format.
And also, some information on text within the files:
## Formatting codes
The text supports a few formatting codes which are sometimes used. For the
most part, it's best to just use the same codes the Japanese text used in
roughly the same places.
Not all codes are supported everywhere.
The codes are:
* **@lt0**: Left justifies the current and following lines. Not actually
used within the game. This resets e.g. @lt1.
* **@lt1**: Centers the current and following lines. Interacts badly with
pound (#) signs.
* **@lt2**: Right aligns the current and following lines. Not actually
used within the game, and interacts badly with pound (#) signs.
* **@db#**: # is a number like 1. Defines the spread for a text shadow,
the game only uses @db2.
* **@dt#**: # is a number like 1. Affects the number of shadows the text
has drawn (combined with @dc and @db.)
* **@fcFF000000**: Make text black (FF is alpha, 000000 is color.)
* **@fci**: Resets the text color back to what it was before.
* **@dcFF000000**: Show a text shadow (FF is alpha, 000000 is color.)
* **\n**: The same as an enter. Sometimes seems to show extra space, so
as a rule of thumb use it when the Japanese does.
* **#**: In some dialogs (like tutorials), pound signs wait for X to be
pressed and start a new screenful of text. Sometimes there are two in
a row for this function, for unknown reasons.
Note: the @ codes (except @fci) may need to be terminated by a colon (:).
They keep going as long as there are matching characters (0-9, or in the
case of colors, A-F and a-f as well.)
There's also the font in a file like .BF1 or something. You may need to change it to get your character set (we parsed it to generate an accurate preview to work with text width issues... no translator, but hey, it's nice to feel like we've torn apart the majority of the game.) I would just replace Japanese characters and make your insertion script translate your text to those.
Note that there are generally no text-size limits, but there are a LOT of visual width limits (most are reasonable.) Also, they did a good job of not reusing strings. Unfortunately, there are a few strings in the ELF itself as well. Most of it is in UTF-8, iirc. I think the tank name is in UTF-16LE, though (annoyingly.)
I would ignore the script list, CNK files, MSB files, etc. They're just red herrings. No text there.
What language are you planning to translate to, anyhow?
-[Unknown]
