If you have comments on any of the points feel free to respond. If you disagree with something, that is even more interesting to read.
After talking to some people and reading various posts about it, I think there are a couple of important points about reversing a binary format through hex reading that really don't get across, or aren't given enough emphasis for people to actually keep them in the back of their minds when reading any material about it.
If anything, I think these are what people should know before they even start reading about reversing (eg: guides to file formats, etc).
I would summarize it in three points:
1: Know what you're looking at.
You can't begin to understand something if you don't know where to begin.
2: See the big picture.
If you don't understand something, just skip it. Chances are, you'll either figure it out later, or realize it's not even important. Also, bytes are just bytes. Their values mean very little; you are only interested in what they represent and how they may be used.
3: Start easy.
Some people are ambitious and want to just jump right into their favorite games. Unless it's really easy, you're probably not going to get anywhere, and you'll probably give up quickly. Don't waste your time; get a format out successfully and get a feel for what it's like.
The first point may be obvious. In fact, it is totally obvious. So obvious that there really is no need to even mention it.
But the fact that some literature doesn't bother mentioning it creates a problem, because it is clearly wrong to assume people know what it really means.
If you don't know what you're looking at, how can you begin to try to understand what it is?
It is clear that the contents of an archive aren't the same as a 3D model or a picture.
So to reverse an archive, you should treat it like an archive.
That's obvious, so obviously it's clear that when I say "treat it like an archive," you're thinking "where are the archive-related values that I want?"
It's an archive. What kind of data do you think the values would represent?
If you don't know, then this is a good time to review what an archive is.
Once you know what you're looking for, you should stick to that approach.
A 3D model contains 3D model data. So figure out what "3D model data" could be and start looking for it. (of course, figuring out what it is isn't an easy task)
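To make the archive case a bit more concrete, here is a minimal sketch of the "archive-related values" I mean: a file count, then a name, an offset and a size per file. The layout is completely made up for illustration (the 16-byte names, the little-endian uint32s and the function names build_toy_archive and read_toy_archive are all my assumptions), so don't expect any real game archive to look exactly like this.

```python
import struct

ENTRY = "<16sII"                      # hypothetical entry: 16-byte name, uint32 offset, uint32 size
ENTRY_SIZE = struct.calcsize(ENTRY)   # 24 bytes, no padding because of "<"

def build_toy_archive(files):
    """Pack {name: bytes} as: file count, then the entry table, then the raw file data."""
    header = struct.pack("<I", len(files))
    offset = len(header) + ENTRY_SIZE * len(files)  # data starts right after the table
    table, blob = b"", b""
    for name, data in files.items():
        table += struct.pack(ENTRY, name.encode("ascii"), offset, len(data))
        blob += data
        offset += len(data)
    return header + table + blob

def read_toy_archive(raw):
    """Read the count, then use each entry's offset and size to slice out the file."""
    (count,) = struct.unpack_from("<I", raw, 0)
    files = {}
    for i in range(count):
        name, offset, size = struct.unpack_from(ENTRY, raw, 4 + i * ENTRY_SIZE)
        files[name.rstrip(b"\0").decode("ascii")] = raw[offset:offset + size]
    return files

archive = build_toy_archive({"a.txt": b"hello", "b.bin": b"\x01\x02\x03"})
files = read_toy_archive(archive)
```

When you open an unknown archive in a hex editor, these are the kinds of values to hunt for: a small number near the start that could be a count, readable names, and numbers that happen to line up with positions and lengths in the file.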
The second point is only relevant if you actually know what hex is.
If you don't know, then none of this will mean anything to you.
It typically takes 15 seconds to get an idea of what hex is.
It might take a little longer to understand what data types (ints, floats, chars, etc) are and how they are represented (4-bytes, 2-bytes, etc), but it probably takes at most an hour to realize that there are only so many of them and they only come in so many ways.
So then where is the problem? A binary format is just a bunch of data thrown together in some logical way, and all of it is just those data types and those hex values.
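As a small illustration of "there are only so many of them", here are the same four bytes read as a few of the usual data types. This is just a sketch using Python's struct module; I'm assuming little-endian byte order, which is common but not guaranteed for any given format.

```python
import struct

# Four bytes as they would show up in a hex editor: 00 00 80 3F
raw = bytes.fromhex("0000803f")

as_uint32 = struct.unpack("<I", raw)[0]   # one 4-byte unsigned int
as_float = struct.unpack("<f", raw)[0]    # one 4-byte float
as_shorts = struct.unpack("<HH", raw)     # two 2-byte unsigned shorts
as_bytes = list(raw)                      # four single bytes

print(as_uint32)   # 1065353216
print(as_float)    # 1.0
print(as_shorts)   # (0, 16256)
print(as_bytes)    # [0, 0, 128, 63]
```

The bytes never change. Only your guess about what they represent does, and here the float reading is the one that looks like something a person would actually store.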
My impression is that somewhere along the way, people forget what they are looking at and simply dwell on bytes; I know I do that sometimes when a section makes no sense. It's as if they're fascinated by how many shorts they can count and aren't really thinking about what those shorts might represent.
If you were given a format in a nice, easy-to-read text file, I highly doubt you would get caught up on how many vowels there are and how often a word appears. So why does that happen with binary files?
Is it because you don't really understand the concept (eg: model, archive, etc)?
Or that you don't really know what you're looking for (eg: model data)?
Or that you don't actually know what the data might be in the first place (eg: it's a model?)?
This is quite mysterious to me.
While a 3D format in a nice text format might clearly say "Vertex: 1.0 1.0 1.0", I don't think seeing three floats in a row is that much different, aside from the absence of "this is a VERTEX".
How do I know 3 floats in a row are vertex coords? I don't.
I just guess that it is, and try it out. And if it looks right, then they're vertex coords.
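Here is a rough sketch of that guess-and-check loop. The function looks_like_vertices, the sample blob and the 1e4 limit are all things I made up for illustration. The check only tells you the numbers are plausible as floats; whether it "looks right" still means plotting or loading the points and seeing a model.

```python
import math
import struct

def looks_like_vertices(raw, offset, count, limit=1e4):
    """Guess that `count` xyz float triples start at `offset`.

    Returns the triples if every value is finite and within +/- limit,
    otherwise None. The limit is an arbitrary plausibility threshold.
    """
    if offset < 0 or offset + count * 12 > len(raw):
        return None
    values = struct.unpack_from("<%df" % (count * 3), raw, offset)
    if not all(math.isfinite(v) and abs(v) <= limit for v in values):
        return None
    return [values[i:i + 3] for i in range(0, len(values), 3)]

# Made-up sample: 8 bytes of header-like junk, then three vertices.
expected = [(1.0, 1.0, 1.0), (0.0, 2.5, -1.0), (-3.0, 0.5, 4.0)]
blob = b"MODL\x03\x00\x00\x00" + b"".join(struct.pack("<3f", *v) for v in expected)

bad_guess = looks_like_vertices(blob, 0, 3)    # header bytes read as floats: implausible
good_guess = looks_like_vertices(blob, 8, 3)   # the actual vertex data
```

A wrong offset usually gives you huge or nonsensical floats right away, which is what makes this kind of guessing cheap.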
Don't be afraid to experiment. If you don't like guessing, then you might be better off disassembling routines.
"But I am not good with this sort of thing"
If you're not good at it because
1: you're lazy, or
2: you don't want to start from the basics, or
3: you don't want to learn
Then I would suggest giving up and finding something else to do.
The third point is also an obvious one: you start small, refine your skills, pick up techniques, and then proceed to bigger things.
Tools are available to make things easier, but that doesn't change the fact that you need to get a feel for what it's like before cracking a complex format. You also need to learn how to use the tool, so that just adds to the learning curve.
If you don't want to start easy because
1: you're too proud to bother yourself with trivial matters, or
2: why bother with something that's already been done, or
3: you follow some other philosophy that prevents you from starting from the beginning
Then you should also give up.
The point here is that you should start with the basics, whether it is part of your goal or not.
And your favorite games probably aren't the basics.