The Division SDF Archive Format

sgtfrankieboy
ultra-n00b
Posts: 1
Joined: Tue Jan 26, 2016 2:11 am
Been thanked: 2 times

The Division SDF Archive Format

Post by sgtfrankieboy » Tue Jan 26, 2016 12:51 pm

The Division beta is coming up and I've been looking into the archive format. This is what I've
discovered so far. I'm still looking into it, but I'm not very skilled at figuring out
archive formats.

There are three types of files in the data folder.

SDFVER

sdfver files are text-based files. They contain numeric key/value pairs separated by spaces.
Each pair is for a different section in the sdfdata files (0 = A, 1 = B, 2 = C, etc.).
The key and value are separated by a colon. Each value consists of 5 numbers separated by spaces.

I currently do not know what the numbers mean; I think it has something to do with patching / version info.
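A minimal sketch of reading such a file, assuming the layout is literally "key:n1 n2 n3 n4 n5" repeated with whitespace in between and that all numbers are plain decimal integers. The function name and the sample string are made up for illustration; the sample is not taken from a real sdfver file.

```python
import re

# Assumed layout: "<key>:<n1> <n2> <n3> <n4> <n5>", repeated, whitespace-separated.
PAIR = re.compile(r"(\d+)\s*:\s*(\d+(?:\s+\d+){4})")

def parse_sdfver(text):
    """Map each section letter (0 -> 'A', 1 -> 'B', ...) to its five numbers."""
    sections = {}
    for key, value in PAIR.findall(text):
        letter = chr(ord("A") + int(key))
        sections[letter] = [int(n) for n in value.split()]
    return sections

# Made-up sample purely to exercise the parser.
sample = "0:1 2 3 4 5 1:6 7 8 9 10"
print(parse_sdfver(sample))
```

The meaning of the five numbers is still unknown, so the sketch only keeps them as a list per section.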

SDFTOC

sdftoc files are binary files with the header magic "WEST". All sdftoc files have a separator(?).
The separator starts with "massive " and ends with "ubisoft " (the spaces being 0x00); in between there are 20
random characters. This separator is repeated multiple times in a file, and is always present near the beginning
of the file at offset 0x1C and at the very end of the file.
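The observations above can be checked with a short script. This is only a sketch: it assumes the separator is exactly "massive\0", 20 arbitrary bytes, then "ubisoft\0" (36 bytes in total), and the names `SEPARATOR` and `inspect_sdftoc` are hypothetical. The buffer at the bottom is synthetic, not real game data.

```python
import re

# "massive\0" + 20 arbitrary bytes + "ubisoft\0", as described in the post.
SEPARATOR = re.compile(rb"massive\x00(.{20})ubisoft\x00", re.DOTALL)

def inspect_sdftoc(data):
    """Return (magic_ok, separator_offsets) for the raw bytes of an sdftoc file."""
    magic_ok = data[:4] == b"WEST"
    offsets = [m.start() for m in SEPARATOR.finditer(data)]
    return magic_ok, offsets

# Synthetic buffer built only to exercise the function.
sep = b"massive\x00" + b"x" * 20 + b"ubisoft\x00"
fake = b"WEST" + b"\x00" * 0x18 + sep + b"payload" + sep
magic_ok, offsets = inspect_sdftoc(fake)
print(magic_ok, [hex(o) for o in offsets])
```

On a real file, the first offset would be expected at 0x1C and the last separator would be expected to end exactly at the end of the file.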

Looking at the name, I'm assuming the sdftoc files are table-of-contents files; the actual content isn't
readable directly.

SDFDATA

sdfdata files are also in a binary format and have the header magic "BERG". They have the same separator
as sdftoc files, but the starting separator is at offset 0x08, and the files also end with it.
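A sketch of validating an sdfdata file against these three observations (magic, separator at 0x08, separator at the end). The 36-byte separator length is inferred from the description, and the helper names are hypothetical; the test buffer is synthetic.

```python
SEP_START = b"massive\x00"
SEP_END = b"ubisoft\x00"
SEP_LEN = len(SEP_START) + 20 + len(SEP_END)  # 36 bytes in total

def is_separator(chunk):
    """True if chunk has the separator's length, prefix and suffix."""
    return (len(chunk) == SEP_LEN
            and chunk.startswith(SEP_START)
            and chunk.endswith(SEP_END))

def check_sdfdata(data):
    """Check BERG magic, a separator at offset 0x08, and a separator at EOF."""
    return {
        "magic": data[:4] == b"BERG",
        "leading_separator": is_separator(data[0x08:0x08 + SEP_LEN]),
        "trailing_separator": is_separator(data[-SEP_LEN:]),
    }

# Synthetic buffer, not real game data; the 4 bytes after the magic are unknown.
sep = SEP_START + b"r" * 20 + SEP_END
fake = b"BERG" + b"\x00" * 4 + sep + b"dummy" + sep
print(check_sdfdata(fake))
```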

I have come across sdfdata files that have plain text in them and bnk files.

A lot of the sdfdata files only contain the word "dummy".

Other Observations

Looking at the data folders, there are two: "sdf" and "sdf_streaming".

The sdf folder contains most of the game data and is about 27GB in size for the PC Beta.
In the folder there is a single sdftoc file and a single sdfver file, both named "sdf". There are 2700 sdfdata files,
named "sdf-{0}-{1}.sdfdata". {0} = A, B, or C; each section is 1000 files large. {1} = 0000-2699.
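The naming scheme can be sketched as follows, assuming {1} is a global, zero-padded four-digit index and the section letter simply advances every 1000 files (so C would hold 2000-2699). The function name and its parameters are hypothetical.

```python
def sdfdata_name(index, prefix="sdf", section_size=1000):
    """Build the expected file name for the index-th sdfdata file."""
    letter = chr(ord("A") + index // section_size)
    return f"{prefix}-{letter}-{index:04d}.sdfdata"

# List the first and last expected names of each section.
for i in (0, 999, 1000, 1999, 2000, 2699):
    print(sdfdata_name(i))
```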

The sdf_streaming folder contains a folder for nyc_manhatten with a lot
of tile folders and a global folder. Each of these folders contains a data folder with the previously mentioned
sdf data structure. There is also a "tileconfig.txt" file, which contains a JSON-like structure with the following
properties: "hasGlobal", "tileSize", and "tiles", which is an array with three numeric values per item.

I have a strong feeling this data structure is meant to be mounted, similar to a hard drive.

d875j
advanced
Posts: 40
Joined: Sun Jun 17, 2012 3:32 am
Has thanked: 3 times
Been thanked: 3 times

Re: The Division SDF Archive Format

Post by d875j » Wed Feb 03, 2016 2:31 pm

Anything new on this?

bulihack
ultra-n00b
Posts: 2
Joined: Wed Feb 03, 2016 10:06 pm
Been thanked: 2 times

Re: The Division SDF Archive Format

Post by bulihack » Wed Feb 03, 2016 11:01 pm

I hope I can extend sgtfrankieboy's great investigative post with some further notes.

My primary interest in this content was the audio/music, so I experimented with the resources, focusing on this particular type of content.
About 260+ of the sdfdata resource files seem to contain both Audiokinetic Wwise RIFF Vorbis and Wwise_SoundBank files as embedded content.

The Wwise audio files with RIFF/WAVE headers are easy enough to split out into individual files using regular splitter tools with header management (vgmtoolbox did a great job on this). The wave headers were "usual", but a very few of them had illegal lengths, causing the splitter to extract multiple wave headers into one file (this affected only a few files, compared to the tens of thousands of audio files it found and saved properly). Extracting the actual bnk files and using bnkextr might help to get the individual files properly; I have not tested that so far.
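A rough sketch of what header-based splitting does, not how vgmtoolbox itself works: a RIFF file starts with "RIFF" followed by a 4-byte little-endian size, and the whole file is that size plus 8 bytes. The function name is hypothetical and the test data is synthetic.

```python
import struct

def split_riff(data):
    """Yield (offset, chunk_bytes) for each RIFF file found in data."""
    pos = 0
    while True:
        pos = data.find(b"RIFF", pos)
        if pos < 0 or pos + 8 > len(data):
            return
        (size,) = struct.unpack_from("<I", data, pos + 4)
        end = pos + 8 + size  # the size field excludes the 8-byte header
        if end > len(data):
            # Declared length runs past the buffer: skip this header.
            pos += 4
            continue
        yield pos, data[pos:end]
        pos = end

def make_riff(payload):
    """Build a tiny synthetic RIFF/WAVE blob for testing."""
    body = b"WAVE" + payload
    return b"RIFF" + struct.pack("<I", len(body)) + body

a, b = make_riff(b"first"), make_riff(b"second!")
blob = b"junk" + a + b"\x00\x00" + b
print([(off, len(chunk)) for off, chunk in split_riff(blob)])
```

A header whose length is too large but still inside the buffer would swallow the following files, which matches the merged extractions described above; this sketch does not try to detect that case.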

Regardless, ww2ogg has done a nice job using the --pcb packed_codebooks_aoTuV_603.bin switch, followed by revorb to do the polishing. The ogg files come in mono/stereo/surround with VBR and a 48 kHz sample rate.

The music content is just perfect for my taste; I made up some playlists for personal enjoyment. All the music is in stereo format, and I found no evidence of a 5.1 version of any of it. Ambiences are mostly 6-channel ogg files, but stereo and mono occur sometimes too. Looped music content is very common, which most likely helps the cinematic experience in the gameplay.

Fun facts: some of the dialogue (which was never actually part of the beta gameplay) is already "here" (spoiler alert, heh), and a few lines are text-to-read voice recordings for now. I also found one particular speech line that exists in both t2r and real voice actor versions, which (I speculate) indicates that the packages are maybe not entirely polished, or are simply rolling forward without cleaning out old/unused content, and that no replacement will occur in later patches either. However, about 25% of the contents happen to be binary duplicates (sometimes more than one duplicate in different sdfdata files), and sometimes alternative versions of a given piece of content (music/speech etc.) also occur. That strengthens my suspicions about the largely unpolished packages, and may mean duplicates in other content (3D models, textures etc.) as well. Such a waste of HDD space.
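One way to count binary duplicates among extracted files is to group them by a content hash. This is a sketch of that general technique, not necessarily how the 25% figure was obtained; the function works on an in-memory mapping of name to bytes, and the file names below are invented.

```python
import hashlib
from collections import defaultdict

def find_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    files: mapping of file name -> file contents (bytes).
    Returns a list of name groups, each with at least two members.
    """
    by_hash = defaultdict(list)
    for name in sorted(files):
        digest = hashlib.sha256(files[name]).hexdigest()
        by_hash[digest].append(name)
    return [names for names in by_hash.values() if len(names) > 1]

# Invented names and contents, just to show the grouping.
extracted = {
    "0001.ogg": b"same audio bytes",
    "0002.ogg": b"different audio bytes",
    "0417.ogg": b"same audio bytes",
}
print(find_duplicates(extracted))
```

For a real dump, the mapping could be filled by reading each extracted file from disk, or the hashing could be done in chunks to avoid holding large files in memory.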

As a side note, the extracted ogg audio files were always complete and I did not hit missing file ends, which indicates (with high probability) that the sdfdata files are always complete. This of course doesn't rule out that any/all of the sdfdata contents could be part of a huge virtual disk file. I also speculate that the sdf_streaming data will most likely be (as the folder structure also indicates) the sliced-up open-world model without the textures, which helps the Snowdrop engine organize and stream far parts of the environment in and out with the given LOD requirements.

I wish the best to anyone up to the challenge of deciphering the resource format, and I sure hope this bit of information will prove useful in the process.
:up:
Last edited by bulihack on Fri Feb 05, 2016 7:01 am, edited 1 time in total.

redspike474
n00b
Posts: 10
Joined: Thu Feb 04, 2016 7:24 pm

Re: The Division SDF Archive Format

Post by redspike474 » Thu Feb 04, 2016 7:27 pm

This page here http://zenhax.com/viewtopic.php?t=2072 details how to dump everything from the archives,
but that method produces thousands and thousands of unnamed files, which makes finding anything specific pretty hard.

The good news is that everything seems to be in pretty regular formats: xml, lua, .dat, dds, and the raw shader sources are there as well.

If someone could make a tool that dumps everything with the correct folder structure and filenames... that would be great!

bulihack
ultra-n00b
Posts: 2
Joined: Wed Feb 03, 2016 10:06 pm
Been thanked: 2 times

Re: The Division SDF Archive Format

Post by bulihack » Fri Feb 05, 2016 7:23 am

offzip does indeed extract some content, but it's hit and miss. I ran through a few audio-filled sdfdata files, but all I got was junk instead of wem/bnk. This is actually the expected result, since it indicates that some parts are zipped while others are uncompressed data streams.

I also tested the sdf/../sdf.sdftoc file with the offzip tool, and it quickly revealed that the table of contents is actually there in a zipped format, but the unpack had a broken result, where only partial filenames are human-readable in the output stream. This indicates that either the zipped stream is malformed, offzip is broken, the inflated/deflated stream is encrypted, or the zip algorithm is altered, which makes it harder to get the proper index file out of the stream. Anyway, this index file seems to contain all filenames embedded in the /sdf resource files, which have to be used to recreate the folder structure. The same applies to the .sdftoc files in the sdf_streaming folders too.
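For readers unfamiliar with this kind of scan, here is a rough sketch of brute-force searching a buffer for zlib streams. It illustrates the general idea only and is not offzip's actual implementation; the function name and the `min_output` threshold are assumptions, and the test data is synthetic.

```python
import zlib

def scan_zlib(data, min_output=16):
    """Yield (offset, decompressed_bytes) for each complete zlib stream in data."""
    pos = 0
    while pos < len(data) - 1:
        # Cheap pre-filter on the 2-byte zlib header: compression method 8
        # (deflate) and the 16-bit header value divisible by 31.
        if data[pos] & 0x0F == 8 and ((data[pos] << 8) | data[pos + 1]) % 31 == 0:
            d = zlib.decompressobj()
            try:
                out = d.decompress(data[pos:])
            except zlib.error:
                out = b""
            if d.eof and len(out) >= min_output:
                yield pos, out
                # Resume right after the bytes this stream consumed.
                pos = len(data) - len(d.unused_data)
                continue
        pos += 1

# Synthetic buffer: two compressed blobs separated by uncompressed junk.
payload = b"path/to/file.mmission\n" * 10
first = zlib.compress(payload)
blob = b"\x00\x01junk" + first + b"tail" + zlib.compress(payload[::-1])
found = list(scan_zlib(blob))
print([(off, len(out)) for off, out in found])
```

Only streams that decompress cleanly to their end are reported, so truncated or altered streams would be skipped rather than returned as partial output.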

wantoosree
ultra-n00b
Posts: 1
Joined: Mon Feb 22, 2016 8:04 am
Has thanked: 2 times

Re: The Division SDF Archive Format

Post by wantoosree » Mon Feb 22, 2016 8:11 am

Has anyone made any progress on recreating the file structures? I've matched a few files up based on their association with other files, but that's about it.

I've also been unable to find or extract the image data, both the PNGs and the sprite sheets. This is my first time working with offzip, so I don't know whether or not it outputs PNG from the extracted data, but none of the files were extracted as PNG. I also haven't been able to identify any by the file signature.
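Checking dumped files by signature can be sketched like this. The table only covers a few well-known magic values for formats mentioned in this thread (PNG, DDS, RIFF, Wwise bank, Ogg), and the names `SIGNATURES` and `sniff` are hypothetical. A miss does not prove the images are absent; they may simply be stored in another container such as DDS.

```python
# Well-known leading bytes for a few formats mentioned in the thread.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"DDS ": "dds",
    b"RIFF": "riff (wav/wem)",
    b"BKHD": "bnk",
    b"OggS": "ogg",
}

def sniff(data):
    """Return a format label based on the leading bytes, or None if unknown."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None

print(sniff(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(sniff(b"DDS " + b"\x7c\x00\x00\x00"))
print(sniff(b"unknown data"))
```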

Gh0stBlade
Moderator
Posts: 678
Joined: Mon Jul 05, 2010 8:55 pm
Has thanked: 20 times
Been thanked: 316 times

Re: The Division SDF Archive Format

Post by Gh0stBlade » Mon Feb 22, 2016 2:02 pm

I took a look at the SDF format a while ago. It is pretty complex at the moment. Right now, the TOC files need to be decrypted; that's most likely why nobody has written an unpacker, because the TOC seems to be obfuscated/encrypted.

Sir Kane
veteran
Posts: 98
Joined: Mon Aug 06, 2012 4:14 am
Been thanked: 73 times

Re: The Division SDF Archive Format

Post by Sir Kane » Thu Feb 25, 2016 10:38 pm

There's no encryption/obfuscation going on in the TOC file.

Gh0stBlade
Moderator
Posts: 678
Joined: Mon Jul 05, 2010 8:55 pm
Has thanked: 20 times
Been thanked: 316 times

Re: The Division SDF Archive Format

Post by Gh0stBlade » Thu Feb 25, 2016 10:43 pm

Sir Kane wrote: There's no encryption/obfuscation going on in the TOC file.
Then what's this block of data? (:

Image

Scrapz
ultra-n00b
Posts: 1
Joined: Sat Feb 27, 2016 2:40 am

Re: The Division SDF Archive Format

Post by Scrapz » Sat Feb 27, 2016 2:44 am

offzip takes care of that

Image

Gh0stBlade
Moderator
Posts: 678
Joined: Mon Jul 05, 2010 8:55 pm
Has thanked: 20 times
Been thanked: 316 times

Re: The Division SDF Archive Format

Post by Gh0stBlade » Sat Feb 27, 2016 3:23 am

Scrapz wrote: offzip takes care of that

Image
Actually, no it doesn't because that block I highlighted is NOT zlib compressed data. :)

The problem here is figuring out how to extract files properly without offzip. It's an interesting file format, I've not seen one like this yet. But I'm fairly certain some information is missing in order to unpack the files properly like the offsets and sizes of the data.

Sir Kane
veteran
Posts: 98
Joined: Mon Aug 06, 2012 4:14 am
Been thanked: 73 times

Re: The Division SDF Archive Format

Post by Sir Kane » Sat Feb 27, 2016 4:38 pm

Names, offsets, sdfdata indices and all that are in the TOC file's zlib compressed chunk. That 0x140 bytes block is probably some signature or something like that.

lunger
ultra-n00b
Posts: 7
Joined: Wed Mar 02, 2016 8:09 pm
Has thanked: 2 times
Been thanked: 1 time

Re: The Division SDF Archive Format

Post by lunger » Wed Mar 02, 2016 8:12 pm

Sir Kane wrote: Names, offsets, sdfdata indices and all that are in the TOC file's zlib compressed chunk. That 0x140 bytes block is probably some signature or something like that.
I wasn't able to get clean enough data to determine that, unless I am not viewing it properly... I used offzip to unzip the sdftoc. Then I opened the single dat file with HxD to view it. I see patterns, but also see data that looks fragmented. Maybe HxD isn't the best viewer for the content.

Example pattern:

 ®..1.. ..á..C......_discover.mmissionB.
 ®..1'.—.=ã..C......eventtemplate.mmissionB.
 ®..1å.š.Ôä..C.....uO-£..o_verticalslice_sÿ,£.p ,£.cw,£..bar.mmissionB. 

lunger
ultra-n00b
Posts: 7
Joined: Wed Mar 02, 2016 8:09 pm
Has thanked: 2 times
Been thanked: 1 time

Re: The Division SDF Archive Format

Post by lunger » Wed Mar 02, 2016 10:59 pm

After a little more digging, it appears that there is a more specific pattern at play.

In hex:

42 __ __ AE 1E 00 __
The 42 is what generates the ASCII 'B' at the end of every visible path. 0x1E is the ASCII Record Separator (RS) control character. Since AE and 1E are always paired, I am assuming that they mark the termination of a record.

So a regex pattern for a record match with two sub matches might be:

(([0-9A-F]+)42([0-9A-F]{4}))AE1E
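The same idea can be tried directly on the raw bytes instead of a hex string, which avoids matches that start in the middle of a byte. This sketch assumes, as guessed above, that 'B' followed by two unknown bytes and then AE 1E 00 ends a record; that is an unconfirmed hypothesis about the format, the names are hypothetical, and the sample buffer is synthetic.

```python
import re

# Byte-level form of the pattern above: 'B', two unknown bytes, then AE 1E 00.
TERMINATOR = re.compile(rb"B..\xae\x1e\x00", re.DOTALL)
# Trailing run of printable ASCII, taken as the visible part of the path.
PATH_TAIL = re.compile(rb"[\x20-\x7e]+$")

def split_records(data):
    """Return the visible path text that precedes each assumed terminator."""
    records = []
    start = 0
    for m in TERMINATOR.finditer(data):
        body = data[start:m.start() + 1]  # keep the 'B' that closes the path
        tail = PATH_TAIL.search(body)
        records.append(tail.group().decode("ascii") if tail else "")
        start = m.end()
    return records

# Synthetic records modelled loosely on the dump shown earlier.
sample = (b"\x01\x02_discover.mmissionB\x00\x20\xae\x1e\x00"
          b"\x31\x00eventtemplate.mmissionB\x05\x06\xae\x1e\x00")
print(split_records(sample))
```

Fragmented paths like the third line of the dump would still come out broken, since the non-printable bytes inside them are not explained by this pattern.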

redspike474
n00b
Posts: 10
Joined: Thu Feb 04, 2016 7:24 pm

Re: The Division SDF Archive Format

Post by redspike474 » Tue Mar 08, 2016 2:04 pm

:)
Last edited by redspike474 on Thu Mar 24, 2016 7:35 pm, edited 1 time in total.
