Valkyria Chronicles 3 - Senjou no Valkyria 3 [PSP]

[Unknown]
ultra-n00b
Posts: 2
Joined: Mon Jun 04, 2012 8:12 am

Re: Valkyria Chronicles 3 - Senjou no Valkyria 3 [PSP]

Post by [Unknown] » Sat Aug 25, 2012 7:53 pm

rev3rsix wrote:No help? No advice? Nothing?
Here's some information:

Code: Select all

# Valkyria Chronicles 3 - File Formats

## DATA.BIN datafile

This is a PGD-encrypted CPK file.  The file can be decrypted by using JPCSP's
CryptoEngine code, or by simply running the game with certain settings
enabled.

### CPK file

The actual CPK has a general format, and can be worked with using CRIWARE's
crifilesystem utilities.  The game appears to accept CPKs generated by older
versions of CRIWARE's software.

The general format seems to consist of:

 * 2048 byte general header.
 * TOC (file records, filenames.)
 * file data (in blocks of 2048 bytes.)

### CPK TOC

The actual structure of the TOC was not really deciphered, except that it has
24-byte records (starting 91 bytes in?) for each file within the CPK.  These
files are identified by filename, and the ID is not important (so they can be
in any order.)

The 24 bytes are 6 big-endian DWORDs, and include the filename pointer, ID,
length, and position of the file data.
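
A minimal Python sketch of reading one such record.  Since the field order
inside the record was not worked out, it only splits the 24 bytes into six
big-endian DWORDs.  `read_toc_record` and the sample bytes are made up for
illustration and do not come from a real CPK.

```python
import struct

TOC_RECORD_SIZE = 24  # six 4-byte DWORDs per file record

def read_toc_record(toc: bytes, offset: int) -> tuple:
    """Return the six big-endian DWORDs of the record at `offset`.

    Which DWORD is the filename pointer, ID, length, or data position is
    NOT established here; compare the values against a known file.
    """
    return struct.unpack_from(">6I", toc, offset)

# Made-up sample record for illustration only.
sample = bytes.fromhex(
    "00000010" "00000001" "00000800" "00000800" "00000000" "00001000"
)
print([hex(v) for v in read_toc_record(sample, 0)])
```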

## Files inside the CPK

There are several types of files in the CPK, but they all use a common
overall format.  High level, it looks something like this:

 * packet header
   * packet data
   * packet header
     * packet data
 * packet header
   * packet data

The NAD files don't exactly conform to this format, though, for some reason.

### Packet headers

Each packet has a 4-byte identifier, like MTPA.  The header appears to use
the following layout (little-endian):

 * char[4] magic;
 * uint32 packet_size; // not including header, round up to 16.
 * uint32 header_size;
 * uint32 flags;

If the header_size is 32 or greater (except for MSCR packets), the next
16 bytes are as follows:

 * ubyte[4] unknown;
 * uint32 data_size; // not including header, round up to 16.
 * ubyte[4] unknown;
 * ubyte[4] unknown;
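
The two layouts above could be read roughly like this in Python.  This is
only a sketch: the `PacketHeader` type and `parse_packet_header` name are
invented here, the unknown fields are skipped, and the sample header is
made up.

```python
import struct
from typing import NamedTuple, Optional

class PacketHeader(NamedTuple):
    magic: bytes
    packet_size: int
    header_size: int
    flags: int
    data_size: Optional[int]  # only read from extended (>= 32 byte) headers

def parse_packet_header(buf: bytes, offset: int = 0) -> PacketHeader:
    magic, packet_size, header_size, flags = struct.unpack_from("<4sIII", buf, offset)
    data_size = None
    # Extended 16 bytes: unknown, data_size, unknown, unknown.
    # MSCR packets are excluded, as noted above.
    if header_size >= 32 and magic != b"MSCR":
        (data_size,) = struct.unpack_from("<I", buf, offset + 20)
    return PacketHeader(magic, packet_size, header_size, flags, data_size)

# Made-up header for illustration only.
sample = struct.pack("<4sIII", b"MTPA", 0x40, 0x20, 0) + struct.pack("<4I", 0, 0x30, 0, 0)
print(parse_packet_header(sample))
```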

### Packet data and sub-packets

For packets with headers of 32 bytes or more, there may be "sub-packets."
These generally look like:

	+------------------------------+
	| Containing Packet Header     |
	+------------------------------+
	| Containing Packet Data       |
	| ...                          |
	|                              |
	| +--------------------------+ |
	| | Sub Packet #1 Header     | |
	| +--------------------------+ |
	| | Sub Packet #1 Data       | |
	| |                          | |
	| +--------------------------+ |
	| | Sub Packet #2 Header     | |
	| +--------------------------+ |
	| | Sub Packet #2 Data       | |
	| |                          | |
	| +--------------------------+ |
	+------------------------------+

Everything is aligned to 16 bytes, which is very convenient for hex editors.
The data size or packet size can be 0, which means there's no data.
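
A rough Python sketch of walking the top-level packets in a file.  It
assumes the next packet starts at header_size plus packet_size (rounded up
to 16) past the current one, which is my reading of the notes above rather
than something confirmed.  It does not descend into sub-packets, and the
`iter_packets` name and sample data are made up.

```python
import struct

def align16(n: int) -> int:
    """Round n up to the next multiple of 16."""
    return (n + 15) & ~15

def iter_packets(buf: bytes):
    """Yield (offset, magic, header_size, packet_size) for top-level packets."""
    offset = 0
    while offset + 16 <= len(buf):
        magic, packet_size, header_size, _flags = struct.unpack_from("<4sIII", buf, offset)
        if header_size < 16:
            break  # not a plausible header; stop rather than loop forever
        yield offset, magic, header_size, packet_size
        offset += header_size + align16(packet_size)

# Two made-up packets with 16-byte headers: 5 bytes of data, then none.
first = struct.pack("<4sIII", b"AAAA", 5, 16, 0) + b"hello".ljust(16, b"\0")
second = struct.pack("<4sIII", b"EOFC", 0, 16, 0)
for pkt in iter_packets(first + second):
    print(pkt)
```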

### Sub-structures

Even though the files already have a nesting capability for the headers,
sometimes there will be a data packet that is opaque, but itself is just
another file formatted in this same way (with headers and nesting all over
again.)

For example, MLX files (which contain graphics) have an IZCA packet that
works exactly this way.

### XOR encryption

Some files have their data segments encrypted using a rolling XOR.  How the
game determines the first byte is not understood, but this is generally not
needed because the files follow a consistent format.

If the "flags" uint32 in the header has its 19th bit set (0x40000), then
this encryption is being used.

You can simply XOR each byte by the previous byte (pre-encrypted.) This is
easy to decrypt and re-encrypt.
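
Here is one way the rolling XOR might look in Python.  It assumes each
decrypted byte is the encrypted byte XORed with the encrypted byte before
it, and it passes the first byte through untouched since its key is not
understood.  That interpretation, and the function names, are assumptions.

```python
XOR_FLAG = 0x40000  # the "19th bit" of the header flags

def is_xor_encrypted(flags: int) -> bool:
    return bool(flags & XOR_FLAG)

def xor_decrypt(data: bytes) -> bytes:
    """Assumed scheme: plain[i] = cipher[i] ^ cipher[i - 1] for i >= 1.

    The first byte is copied through unchanged.
    """
    out = bytearray(data[:1])
    for i in range(1, len(data)):
        out.append(data[i] ^ data[i - 1])
    return bytes(out)

def xor_encrypt(plain: bytes, first_byte: int) -> bytes:
    """Inverse of xor_decrypt for non-empty input.

    `first_byte` is the encrypted first byte, carried over from the
    original file.
    """
    out = bytearray([first_byte])
    for i in range(1, len(plain)):
        out.append(plain[i] ^ out[i - 1])
    return bytes(out)
```

When re-inserting edited data, the first encrypted byte from the original
file would be reused as `first_byte`.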

### MTP files (MTPA packets)

MTPA packets are fairly simple and just have the Shift-JIS text with
each byte incremented by one for no apparent reason.

Note that pointers within the data are generally relative to the header.
That is, if the header is 32 bytes, then 0x20 would point to the
beginning of the data.

	struct info_header (16 bytes)
		DWORD unknown5		always 0x4000000f
		DWORD pointer_count	number of pointer records
		DWORD data_size		number of DWORDs each data record is
		DWORD data_count	number of data records

	struct unknown6[]		repeats data_size times
		DWORD unknown7		always <= 2

	<pointer segment>
	struct pointer_record[]	repeats pointer_count times
		DWORD data_pos		pointer into data record segment

	<data_segment>
	struct data_record[]	repeats data_count times
	if data_size = 2
		DWORD id			id of voice data within OD_VOICE.AFS
		DWORD text_pos		position of text within text segment
	if data_size = 4
		DWORD flags1?		unknown meaning, varies wildly
		DWORD id			id of voice data within OD_VOICE.AFS
		DWORD text_pos		position of text within text segment
		DWORD flags3?		0x00 or 0x01 with 4 mysterious unique exceptions

	struct unknown8[]		always once?
		DWORD unknown9		unknown meaning
					EACH BYTE INCREMENTED

	<text_segment>
	struct text_record[]	undetermined length?
		UBYTE* shiftjis		text in shift jis, null terminated
					EACH BYTE INCREMENTED

	EACH BYTE INCREMENTED section always multiple of 4 bytes
					
	struct footer_padding
		DWORD padding		always 0x00 (padding to align ENRS)
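
The byte increment on the text can be undone with a few lines of Python.
This is a sketch: the function names are made up, and the caller is
expected to pass only the bytes of one string, since whether the null
terminator is also incremented is not settled above.

```python
def decode_mtpa_text(raw: bytes) -> str:
    """Undo the +1 on every byte, then decode as Shift-JIS."""
    plain = bytes((b - 1) & 0xFF for b in raw)
    return plain.decode("shift_jis")

def encode_mtpa_text(text: str) -> bytes:
    """Encode as Shift-JIS, then add 1 to every byte."""
    return bytes((b + 1) & 0xFF for b in text.encode("shift_jis"))

encoded = encode_mtpa_text("テスト")
print(encoded.hex(), decode_mtpa_text(encoded))
```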

### MXE files (MXEC packets)

MXEC packets are quite complicated, but consistent.

Note that pointers within the data are generally relative to the header.
That is, if the header is 32 bytes, then 0x20 would point to the
beginning of the data.

		DWORD unknown				varies, xor, doesn't seem important
		DWORD unknown				always 0x60
		DWORD something4_header_ptr	0x00 or pointer to something4 header.
		DWORD something2_header_ptr	0x00 or pointer to something2 header.
		DWORD unknown				meaning unknown, 0x00/0x01.
		DWORD unknown				always 0x00 (ends at 24)
		DWORD unknown				sometimes 0x00 or 0x01? MAYBE something6_count??
		DWORD something6_ptr		pointer to something6 data.
		DWORD[9] unknown			always 0x00 (ends at 68)
		DWORD something1_count		number of something1 records
		DWORD unknown				always 0xA0
		DWORD[13] unknown			always 0x00
	
	something1[]					always something1_count of them.
		DWORD id					seems like an id, counts up...
		DWORD type_ptr				points to ascii identifier in file.
		DWORD length				length of data.
		DWORD data_ptr				points to beginning of data.

	<something1 data>				variable in size, ends at last data_ptr + last length.
		DWORD[] varies				varies per record type.
	(padded to a multiple of 16 bytes.)

	<something4 header> (optional)
		DWORD unknown				always 0x00
		DWORD something4_count		number of something4 records.
		DWORD something4_ptr		pointer to something4 records.
		DWORD unknown				always 0x00
		DWORD[12] unknown			always 0x00

	something4[]					always something4_count of them.
		DWORD unknown				increasing, appears pointer like?  unknown meaning.
		DWORD unknown				optional text pointer, sometimes 0x00.
		DWORD something5_count		count of sub-something5's inside the something4.
		DWORD something5_ptr		pointer to the something5 records.
		DWORD[6] unknown			always 0x00
		DWORD unknown				usually 0x00, sometimes 0x01?
		DWORD weird_ptr				0x00 or pointer after text segment.
		DWORD[4] unknown			always 0x00

	something5[]					always something5_count of them PER something4.
		DWORD text_ptr				pointer to an ascii identifier.
		DWORD unknown				seems like a number?  maybe value for text_ptr.
		DWORD data_ptr				points to some extra data.
		DWORD unknown				always 0x00?

	<something5 data>				variable size?
		DWORD unknown				one per pointer?

	(padded to a multiple of 16 bytes after ALL the something5s.)

	<something2 header> (optional)
		DWORD unknown				always 0x00
		DWORD something2_count		count of something2 records.
		DWORD something2_ptr		pointer to something2 records.
		DWORD something3_count		count of something3 records.
		DWORD something3_ptr		pointer to something3 records.
		DWORD[3] unknown			always 0x00

	something2[]					always something2_count of them.
		DWORD[2] unknown			unknown meaning.
		DWORD path_ptr				pointer to path string.
		DWORD filename_ptr			pointer to filename string.
		DWORD[6] unknown			unknown meaning.

	something3[]					always something3_count of them.
		DWORD unknown				increases, seems like a pointer?  doesn't seem to match file?
	(padded to a multiple of 16 bytes.)

	<something6 data> (optional)
		DWORD unknown				unknown meaning.
	(padded to a multiple of 16 bytes.)

	<text starts here>
	(padded to a multiple of 16 bytes.)

	<weird data>
	(padded to a multiple of 16 bytes.)

Not everything is understood.  Some of the records have varying structure
defined by their identifier pointer, and may have string pointers embedded
within those structures.

The text itself is in Shift-JIS, null terminated.
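
As a starting point, the something1 records could be pulled out like this
in Python.  The sketch assumes little-endian DWORDs and that the records
sit right after the 128-byte fixed header (which would line up with the
constant 0xA0 when the packet header is 32 bytes, though that reading is a
guess).  The function names are invented.

```python
import struct

MXEC_HEADER_SIZE = 128  # 32 DWORDs in the fixed header listed above

def read_cstring(packet: bytes, ptr: int) -> bytes:
    """Read a null-terminated string at `ptr` (relative to the packet header)."""
    end = packet.index(b"\0", ptr)
    return packet[ptr:end]

def parse_something1(packet: bytes, header_size: int = 32):
    """Return (id, type_name, data) for each something1 record.

    `packet` starts at the MXEC packet header, so pointers index into it
    directly.  Assumes little-endian DWORDs and that the records
    immediately follow the 128-byte fixed header.
    """
    (count,) = struct.unpack_from("<I", packet, header_size + 68)
    base = header_size + MXEC_HEADER_SIZE
    records = []
    for i in range(count):
        rec_id, type_ptr, length, data_ptr = struct.unpack_from(
            "<4I", packet, base + 16 * i
        )
        type_name = read_cstring(packet, type_ptr)
        records.append((rec_id, type_name, packet[data_ptr:data_ptr + length]))
    return records
```

Grouping the records by type name seems like a reasonable way to start
working out the per-type layouts.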

## Comparison with Valkyria Chronicles 2

Valkyria Chronicles 2 seems to use the same data files and format, in general.
The primary difference is that rather than PGD-encrypting the DATA.BIN file,
instead they encrypt each file within the CPK.

Each file within the CPK has a 16-byte header which serves as a key.

The file is treated as a series of sets of 4 DWORDs, and uses the following
basic algorithm:

	uint32[4] key;
	uint32[] data;
	
	for (int i = 0; i < data.length; i++)
	{
		int key_i = i % 4;

		key[key_i] = key[key_i] * 3 + 1;
		data[i] ^= key[key_i];
	}

However, when it hits EOFC packets or other boundaries, it appears to do
something different, so this is not a complete description of the format.
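
For reference, the same basic algorithm as a runnable Python sketch.  It
assumes the key and data are little-endian uint32s and that the data length
is a multiple of 4, and it does not handle the EOFC or boundary behaviour
mentioned above.  `vc2_crypt` is a made-up name.

```python
import struct

def vc2_crypt(blob: bytes) -> bytes:
    """Apply the keystream described above to one file from the CPK.

    Assumes the 16-byte key and the data are little-endian uint32s and
    that the data length is a multiple of 4.  XOR is symmetric, so the
    same routine encrypts and decrypts when the key header is unchanged.
    """
    key = list(struct.unpack_from("<4I", blob, 0))
    body = blob[16:]
    words = struct.unpack(f"<{len(body) // 4}I", body)
    out = []
    for i, word in enumerate(words):
        k = i % 4
        key[k] = (key[k] * 3 + 1) & 0xFFFFFFFF  # emulate uint32 wraparound
        out.append(word ^ key[k])
    return blob[:16] + struct.pack(f"<{len(out)}I", *out)
```

Running it twice on the same file gives back the original bytes, which is
a quick sanity check.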

Sorry if that's a bit rough, typed it up as documentation for myself (it's in markdown format, by the way.)

And also, some information on text within the files:

Code: Select all

## Formatting codes

The text supports a few formatting codes which are sometimes used.  For the
most part, it's best to just use the same codes the Japanese text used in
roughly the same places.

Not all codes are supported everywhere.

The codes are:

 * **@lt0**: Left justifies the current and following lines.  Not actually
   used within the game.  This resets e.g. @lt1.
 * **@lt1**: Centers the current and following lines.  Interacts badly with
   pound (#) signs.
 * **@lt2**: Right aligns the current and following lines.  Not actually
   used within the game, and interacts badly with pound (#) signs.
 * **@db#**: # is a number like 1.  Defines the spread for a text shadow,
   the game only uses @db2.
 * **@dt#**: # is a number like 1.  Affects the number of shadows the text
   has drawn (combined with @dc and @db.)
 * **@fcFF000000**: Make text black (FF is alpha, 000000 is color.)
 * **@fci**: Resets the text color back to what it was before.
 * **@dcFF000000**: Show a text shadow (FF is alpha, 000000 is color.)
 * **\n**: The same as an enter.  Sometimes seems to show extra space, so
   as a rule of thumb use it when the Japanese does.
 * **#**: In some dialogs (like tutorials), pound signs wait for X to be
   pressed and start a new screenful of text.  Sometimes there are two in
   a row for this function, for unknown reasons.

Note: the @ codes (except @fci) may need to be terminated by a colon (:).
They keep going as long as there are matching characters (0-9, or in the
case of colors, A-F and a-f as well.)
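
A small Python sketch of stripping these codes out of a string, for example
to measure the visible text.  The regex follows the matching rule in the
note above as I understand it; the names are made up, and # and \n are left
alone.

```python
import re

# @fci takes no argument; color codes take hex digits; the rest take decimal
# digits.  An optional colon ends the argument.
CODE_RE = re.compile(
    r"@(?:fci"
    r"|(?:fc|dc)[0-9A-Fa-f]*:?"
    r"|(?:lt|db|dt)[0-9]*:?)"
)

def strip_codes(text: str) -> str:
    """Remove @ codes, leaving only the text that would be drawn."""
    return CODE_RE.sub("", text)

print(strip_codes("@fcFF000000:Bad@fci news"))  # colon ends the color argument
print(strip_codes("@fcFF000000Bad@fci news"))   # "Bad" is swallowed as hex digits
```

The second call shows why the colon matters: without it, text that starts
with 0-9 or A-F gets read as part of the code.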

Well, I'm not sure how that other group's Japanese -> Chinese -> English is gonna turn out, but they seem enthusiastic, and hey, they actually have translators. Me and a friend had fun pulling these things apart, anyway. They probably know all this already, not sure why they weren't willing to share.

There's also the font in a file like .BF1 or something. You may need to change it to get your character set (we parsed it to generate an accurate preview to work with text width issues... no translator, but hey, it's nice to feel like we've torn apart the majority of the game.) I would just replace Japanese characters and make your insertion script translate your text to those.
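
Something like this Python sketch is what I mean by having the insertion script do the translation. The mapping is entirely hypothetical: which Japanese characters you give up depends on which glyphs you redraw in the font, and the kana picked here are just placeholders.

```python
# Hypothetical mapping: accented Italian letters -> Japanese characters whose
# glyphs would be redrawn in the font.  These particular kana are placeholders.
GLYPH_MAP = str.maketrans({
    "à": "ぁ", "è": "ぃ", "é": "ぅ", "ì": "ぇ", "ò": "ぉ", "ù": "ゃ",
})

def to_game_bytes(text: str) -> bytes:
    """Swap remapped letters for their stand-ins, then encode as Shift-JIS."""
    return text.translate(GLYPH_MAP).encode("shift_jis")

print(to_game_bytes("perché"))
```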

Note that there are generally no text-size limits, but there are a LOT of visual width limits (most are reasonable.) Also, they did a good job of not reusing strings. Unfortunately, there are a few strings in the ELF itself as well. Most of it is in UTF-8, iirc. I think the tank name is in UTF-16LE, though (annoyingly.)

I would ignore the script list, CNK files, MSB files, etc. They're just red herrings. No text there.

What language are you planning to translate to, anyhow?

-[Unknown]

rev3rsix
n00b
Posts: 12
Joined: Mon Jul 23, 2012 10:40 am
Has thanked: 4 times

Re: Valkyria Chronicles 3 - Senjou no Valkyria 3 [PSP]

Post by rev3rsix » Wed Sep 05, 2012 12:05 am

Thanks Unknown for your help.

I'd like to translate it into Italian.

[Unknown]
ultra-n00b
Posts: 2
Joined: Mon Jun 04, 2012 8:12 am

Re: Valkyria Chronicles 3 - Senjou no Valkyria 3 [PSP]

Post by [Unknown] » Wed Sep 05, 2012 6:49 am

Cool. I dunno a lick of Italian, but at least it's all Latin characters so it shouldn't be super hard.

Here's a version with some minor cleanup (and being formatted automatically as HTML):
http://wrttn.in/04fb3f

Note that there are several images which contain text in them, some of it weird English, some of it Japanese. You'll want pngquant and gimconv for those if you plan to do them.

And, at least for me, the game doesn't work properly in JPCSP - but even so, it's definitely the easiest way to test things. Things just don't render properly after you go past the first action in a battle. Would love to know if there was some workaround for that, aside from saving and loading.

-[Unknown]

rev3rsix
n00b
Posts: 12
Joined: Mon Jul 23, 2012 10:40 am
Has thanked: 4 times

Re: Valkyria Chronicles 3 - Senjou no Valkyria 3 [PSP]

Post by rev3rsix » Fri Sep 07, 2012 5:39 pm

Well, I don't have your knowledge of hex editing, so a little help on how to extract the text and images would be welcome!!

ohnhai
ultra-n00b
Posts: 1
Joined: Tue Jan 08, 2013 1:16 pm

Re: Valkyria Chronicles 3 - Senjou no Valkyria 3 [PSP]

Post by ohnhai » Wed Jan 09, 2013 11:58 am

rev3rsix wrote:Thanks Unknown for your help.

I'd like to translate in italian language..
http://vc3translationproject.wordpress.com/

The guys here are translating it into English. Maybe you could help them and get an Italian version on the go too.
