Today I contributed my research to the unofficial WIM Format Documentation. It covers the WIM header, the binary ImageListing format, and some of the changes visible in the WIM file of Windows Longhorn build 5048 compared with previous versions.

Header Format
08 bytes: Signature MSWIM\0\0\0
04 bytes: Header length (including signature length)
04 bytes: WIM File Version
02 bytes: Compression On/Off
02 bytes: Compression Type Flag
04 bytes: LZX/LZNT Window Size
20 bytes: Unknown (Image Hash ?)
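As a rough illustration, the fields listed above can be decoded with Python's `struct` module. This is a minimal sketch, assuming every integer field is little-endian and unsigned (as stated for the integer types at the end of this post). `parse_wim_header` and the other names are hypothetical. The listed fields only cover the first 44 bytes while the header length is 0x60 or more, so the remaining bytes are not decoded here.

```python
import struct

# Signature, UINT header length, UINT version, USHORT compression on/off,
# USHORT compression type, UINT window size, 20-byte unknown field.
HEADER_FMT = "<8sIIHHI20s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 44 bytes for the fields listed
WIM_SIGNATURE = b"MSWIM\0\0\0"


def parse_wim_header(data: bytes) -> dict:
    """Decode the leading header fields of a pre-release WIM file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too short for WIM header")
    (sig, header_len, version, comp_on, comp_type,
     window, unknown) = struct.unpack_from(HEADER_FMT, data, 0)
    if sig != WIM_SIGNATURE:
        raise ValueError("not a WIM file: bad signature")
    return {
        "header_length": header_len,
        "version": version,
        "compression_on": comp_on,
        "compression_type": comp_type,
        "window_size": window,
        "unknown": unknown,
    }
```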

The WIM File Version field tells the unpacking application (usually XIMAGE, the tool Microsoft uses to create and deploy Windows Images) whether it can handle the file. The most widespread version is 0x010A00, while 5048’s is 0x010C00, which means the format has changed (described below).
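A reader could check this field before going any further. A minimal sketch, assuming only the two versions mentioned here are known; `can_handle` and `describe_version` are hypothetical helpers, not XIMAGE's actual logic:

```python
VERSION_COMMON = 0x010A00  # the most widespread version
VERSION_5048 = 0x010C00    # Longhorn build 5048
KNOWN_VERSIONS = {VERSION_COMMON, VERSION_5048}


def can_handle(version: int) -> bool:
    """Return True if the version is one this hypothetical reader knows about."""
    return version in KNOWN_VERSIONS


def describe_version(version: int) -> str:
    """Format the version as hex, flagging values outside KNOWN_VERSIONS."""
    return f"0x{version:06X}" + ("" if can_handle(version) else " (unknown)")
```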

The LZX/LZNT Window Size field is the compression level: 0x8000 corresponds to LZX15, i.e. a 2^15-byte buffer used by the decompressor. Images containing only uncompressed data have a value of zero.
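Since the window size is a power of two, the LZX variant number is simply its base-2 logarithm. A small sketch; the helper name `window_exponent` is hypothetical, and rejecting non-power-of-two values is my own assumption:

```python
from typing import Optional


def window_exponent(window_size: int) -> Optional[int]:
    """Return n such that window_size == 2**n, or None for uncompressed images."""
    if window_size == 0:
        return None  # the image holds only uncompressed data
    if window_size & (window_size - 1):
        # Assumption: valid window sizes are always powers of two.
        raise ValueError(f"window size 0x{window_size:X} is not a power of two")
    return window_size.bit_length() - 1
```

For example, `window_exponent(0x8000)` yields 15, matching LZX15.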

In 5048’s WIM file header the header length is 0x74, while the only value seen before was 0x60. The extra 0x14 (20) bytes belong to a new field that follows the Window Size field. It is probably a SHA-1 (160-bit) hash of the whole image data, or something related to NTFS (security).
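To tell the two header layouts apart, a reader can check the header length and, for 5048-style headers, slice out the new 20-byte field right after the Window Size field (offset 0x18, i.e. 8 + 4 + 4 + 2 + 2 + 4 bytes in). A sketch under those assumptions; `new_5048_field` and the constant names are hypothetical:

```python
from typing import Optional

HEADER_LEN_OLD = 0x60    # the only value seen before build 5048
HEADER_LEN_5048 = 0x74   # build 5048
NEW_FIELD_OFFSET = 0x18  # right after the Window Size field: 8+4+4+2+2+4 bytes
NEW_FIELD_LEN = HEADER_LEN_5048 - HEADER_LEN_OLD  # 0x14 = 20 bytes


def new_5048_field(header: bytes, header_len: int) -> Optional[bytes]:
    """Return the extra 20-byte field of a 5048-style header, or None for old headers."""
    if header_len == HEADER_LEN_OLD:
        return None
    if header_len != HEADER_LEN_5048:
        raise ValueError(f"unexpected header length 0x{header_len:X}")
    if len(header) < HEADER_LEN_5048:
        raise ValueError("header buffer is truncated")
    return header[NEW_FIELD_OFFSET:NEW_FIELD_OFFSET + NEW_FIELD_LEN]
```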

Other people researching the WIM format are Rafael Rivera, WinMonKey and ZoRoNaX.

Topic: Windows Image Format Documentation

Another change is that Image Listing entries in the File Lookup Table now carry a full SHA-1 hash instead of the empty one found in previous versions.

File Lookup Table Entry Format

07 bytes: Compressed file length if the file is compressed, otherwise the uncompressed file length
01 byte: Type Flag
0x0: Not compressed
0x2: Image Listing
0x4: Compressed

08 bytes: Offset in image
08 bytes: Uncompressed File Length
04 bytes: File Identifier
4/2 bytes: Unknown (4 bytes for WIM files with version 0x010A00, otherwise 2)
20 bytes: SHA-1 hash of the file data (used to determine whether files differ)
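Putting this layout into code: the 7-byte length and the 1-byte flag share one 8-byte slot, and the size of the unknown field depends on the file version. This is a sketch, assuming little-endian fields and that only version 0x010A00 uses the 4-byte variant. `parse_flt_entry` is a hypothetical name, and the flag is looked up as a plain value rather than treated as a bit mask, since nothing beyond the three listed values is known.

```python
import struct

TYPE_FLAGS = {0x0: "not compressed", 0x2: "image listing", 0x4: "compressed"}


def flt_entry_format(version: int) -> str:
    # 7-byte length, 1-byte type flag, ULONG offset, ULONG uncompressed length,
    # UINT file ID, unknown field (UINT for 0x010A00, USHORT otherwise), SHA-1.
    unknown = "I" if version == 0x010A00 else "H"
    return "<7sBQQI" + unknown + "20s"


def parse_flt_entry(data: bytes, offset: int, version: int):
    """Decode one File Lookup Table entry; returns (fields, entry_size)."""
    fmt = flt_entry_format(version)
    (len7, flag, img_offset, uncompressed, file_id, unknown,
     sha1) = struct.unpack_from(fmt, data, offset)
    fields = {
        "stored_length": int.from_bytes(len7, "little"),
        "type": TYPE_FLAGS.get(flag, f"unknown (0x{flag:X})"),
        "offset": img_offset,
        "uncompressed_length": uncompressed,
        "file_id": file_id,
        "unknown": unknown,
        "sha1": sha1.hex(),
    }
    return fields, struct.calcsize(fmt)
```

The returned size (50 bytes for 5048, 52 for 0x010A00) lets a caller step through the table entry by entry.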

The FLT entry format in 5048 shows one small change: the unknown field is now 2 bytes long, i.e. a USHORT.

Binary Image Listing Format (a few fields are still unknown, but it is already very useful):

UINT: Header length
VAR: the header data

The entries follow, one after another until the end of the ImageListing file; each entry is padded to a multiple of 8 bytes. Entry format:

ULONG: Raw entry data length (including the length field itself, i.e. the 0x8 bytes of this ULONG)
UINT: Attributes (ZoRoNaX explained the flags in his original post)

For File/Folder entries (the attributes should contain 0x10 or 0x20; entries with attributes 0x1 or 0x3 also appear, and I think these are NTFS secondary streams), the entry continues with:

UINT: Unknown
UINT: FileID
UINT: Unknown
ULONG: Creation Time stamp (C/C++ FILETIME structure)
ULONG: Last Accessed Time stamp (C/C++ FILETIME structure)
ULONG: Last Modified Time stamp (C/C++ FILETIME structure)
UINT: Unknown
USHORT: Unknown (probably separator)
USHORT: Short Name (8.3) Length
USHORT: Long Name Length
Unicode STRING: Short Name
USHORT: Separator (?) = 0x0
Unicode STRING: Long Name
[PADDING]
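A sketch of decoding one File/Folder entry, plus a helper that converts FILETIME values (100-nanosecond intervals since 1601-01-01 UTC) to Python datetimes. It assumes the name lengths are byte counts, the names are UTF-16LE, and the separator USHORT is always present; these are guesses, not confirmed by the research above. `parse_file_entry` and `filetime_to_datetime` are hypothetical names.

```python
import struct
from datetime import datetime, timedelta, timezone

# ULONG length, UINT attributes, UINT unknown, UINT FileID, UINT unknown,
# three ULONG FILETIMEs, UINT unknown, USHORT unknown, USHORT short/long name lengths.
ENTRY_FMT = "<QIIIIQQQIHHH"
ENTRY_FIXED = struct.calcsize(ENTRY_FMT)  # 58 bytes before the names

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ft: int) -> datetime:
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


def parse_file_entry(listing: bytes, pos: int) -> dict:
    """Decode a File/Folder entry starting at `pos` (assumed layout, see above)."""
    (raw_len, attrs, _unk1, file_id, _unk2, ctime, atime, mtime,
     _unk3, _unk4, short_len, long_len) = struct.unpack_from(ENTRY_FMT, listing, pos)
    p = pos + ENTRY_FIXED
    # Assumption: both lengths are byte counts of UTF-16LE strings.
    short_name = listing[p:p + short_len].decode("utf-16-le")
    p += short_len + 2  # skip the USHORT separator (assumed always present)
    long_name = listing[p:p + long_len].decode("utf-16-le")
    return {
        "raw_length": raw_len,
        "attributes": attrs,
        "file_id": file_id,
        "created": ctime,
        "accessed": atime,
        "modified": mtime,
        "short_name": short_name,
        "long_name": long_name,
    }
```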

If Short Name Length is 0, the long name is already a valid 8.3 name and can be used as one.
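In code the fallback is a one-liner; `dos_name` is a hypothetical helper that takes the decoded length field and both names:

```python
def dos_name(short_name_length: int, short_name: str, long_name: str) -> str:
    """Return the 8.3 name of an entry."""
    # A zero Short Name Length means the long name is already a valid 8.3 name.
    return long_name if short_name_length == 0 else short_name
```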

UINT is a 4-byte integer.
ULONG is an 8-byte integer.
USHORT is a 2-byte integer. All three integer types are little-endian and unsigned.
VAR means a variable-size data block.
Padding aligns to an 8-byte boundary.
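With these definitions, walking the entries of a binary Image Listing only needs the raw length and the 8-byte padding rule. A sketch, assuming `start` is the offset just past the listing header (the post does not say whether the header's UINT length counts itself, so that computation is left to the caller). Stopping on any length too short to hold the length and attributes fields is my own safety guard, not part of the documented format. All names are hypothetical.

```python
import struct

# Little-endian, unsigned struct codes for the integer types defined above.
TYPE_CODES = {"USHORT": "<H", "UINT": "<I", "ULONG": "<Q"}


def read_int(data: bytes, offset: int, type_name: str) -> int:
    (value,) = struct.unpack_from(TYPE_CODES[type_name], data, offset)
    return value


def padded(n: int) -> int:
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7


def iter_entries(listing: bytes, start: int):
    """Yield (offset, raw_length, attributes) for each Image Listing entry."""
    pos = start
    while pos + 12 <= len(listing):
        raw_len = read_int(listing, pos, "ULONG")  # includes these 8 bytes
        attrs = read_int(listing, pos + 8, "UINT")
        if raw_len < 12:
            break  # assumption: treat a too-short length as the end
        yield pos, raw_len, attrs
        pos += padded(raw_len)  # next entry starts after the padding
```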


3 Responses to “Windows Image Format Documentation Update”  

  1. oZZy

    Excellent finds!
    I hope u (or maybe Z, Rafael) will make WIM unpacker (maybe even packer) soon! Who knows! But LZX format is not so hard to understand as i noticed.

    Wish u luck, stan! ;)

  2. osvald

    hello

  1. Stanimir Stoyanov’s Blog » Blog Archive » Approaching Longhorn Beta 1



About

Stanimir Stoyanov is a programmer, software beta tester, and Windows enthusiast.

Currently, he is administering AeroXperience and coding using Visual Studio 2008 on Windows Vista. He is looking forward to testing Windows 7 soon.