- # SnailShrink File Layout #
- ## Table of Contents ##
- `ehh ill do this eventually`
- ## Headnotes ##
- - SnailShrink files are stored as text-encoded binary data in SmileBASIC TXT files.
- - Unless otherwise noted, each entry is one character (16 bits)
- ## Global Header ##
- This header sits at the very beginning of every SnailShrink file.
- - Magic number: The string "SNS"
- - Global file info bitset
- - `vvvv p*** **** ffff`
- - `v`: version
- - bitset indicating version: `ppss`
- - `p`: primary version
- - `s`: secondary version
- - Ex. if the version is 1.0, this section will be `0100`
- - `p`: password flag
- - indicates a password was used on this file
- - all file contents past the Global Header are encrypted on a per-character basis using a string password as the key
- - password hash is not stored; rather, integrity is checked after decryption using the included CRC-16
- - `f`: number of files in this archive - 1
- - if this field is nonzero, it is a SnailShrink archive (contains multiple files)
- - the "minus 1" means that if the field is 0, the file contains one file, if it's 1 it contains 2, etc.
- - a SnailShrink file can't contain the contents of no files, so this was a logical decision to maximize file limit (16) without using more bits
- - `*`: unused/reserved
- - Stuff might go into these fields in the future, or I may just use more of these bits to increase the archive size limit. Who knows?
- - CRC-16 of remainder of file
- - if a password is used, this is pre-encryption
- - in this way it doubles as an integrity check and makes sure you used the right password
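As a rough illustration of the header layout above, here is a minimal Python sketch that pulls the fields out of the first five characters of a file. It assumes the TXT contents have already been loaded as a string of 16-bit characters, that the leftmost bit of each bitset diagram is the most significant bit, and that the three header entries sit back to back in the order listed. `GlobalHeader` and `parse_global_header` are made-up names for this example, not part of SnailShrink. The CRC is only read, not checked, since the exact CRC-16 variant isn't stated here.

```python
from typing import NamedTuple

MAGIC = "SNS"


class GlobalHeader(NamedTuple):
    primary_version: int
    secondary_version: int
    has_password: bool
    file_count: int
    crc16: int


def parse_global_header(text: str) -> GlobalHeader:
    """Parse the magic number, info bitset and CRC-16 from the start of a file."""
    if len(text) < 5 or text[:3] != MAGIC:
        raise ValueError("not a SnailShrink file")
    info = ord(text[3])            # vvvv p*** **** ffff (assumed MSB first)
    version = (info >> 12) & 0xF   # top four bits, laid out as ppss
    return GlobalHeader(
        primary_version=version >> 2,
        secondary_version=version & 0b11,
        has_password=bool((info >> 11) & 1),
        file_count=(info & 0xF) + 1,   # field stores "number of files - 1"
        crc16=ord(text[4]),            # CRC-16 of the remainder, not verified here
    )
```

For example, a version 1.0 file with no password and a single file inside would have an info character of `0x4000`.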
- ## File Contents ##
- The contents of this section depend on the files that were compressed and are now contained in this file. If the SnailShrink file is an archive (it contains multiple files), it will contain multiple content blocks.
- ### Info Bitset ###
- `ttss ddcc ffff tttt`
- - `t` (first two bits): filetype
- - `00`: TXT
- - `01`: PRG
- - `10`: DAT
- - `11`: GRP
- - `s`: subtype
- - denotes variable type of DAT file
- - `00`: int
- - `01`: real
- - `10`: unsigned short
- - specifies that the input DAT was an integer DAT whose values are in the range of 0 - 65535
- - useful for compressing data from `GSAVE` or `BGSAVE`
- - `11`: unused/reserved
- - subtypes may be added for other file types in a future spec
- - variable-size GRPs are a possibility
- - `c`: encoding method
- - automatically determined based on the complexity of the input data
- - `00`: single-value RLE
- - entire file contents only consist of a single unique value repeated
- - file payload will simply consist of this one value, and the length of the input (sint32) for TXT/PRG files
- - this reduces files of only one unique value repeated down to one fixed size; for larger files this results in extreme size reduction
- - `01`: two unique values
- - the input data only contains two unique values
- - as a result, using a full Huffman coding scheme is unnecessary
- - these two values are stored, and the input data is reduced to a single bitstring
- - 0 bits represent the most common value and 1s represent the least common
- - `10`: Huffman coding
- - the file contains more than two unique values, so Huffman coding is used
- - the Huffman tree is stored in a bitstring form, and after that is the input data converted to a Huffman bitstring
- - `11`: reserved/unused
- - `f`: padding bits on Huffman data
- - only checked if Huffman encoding type is used
- - `t` (last four bits): padding bits on Huffman tree
- - only checked if Huffman encoding type is used
- - really I jammed these here because I didn't want to use 16 bits to encode 8 bits worth of data so they're going in the file header
- - it's not like the file header was gonna use those 8 bits
- - `d`: dimensions on DAT file - 1
- - only checked if source type is DAT
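The info bitset can be unpacked with a few shifts and masks. The Python sketch below assumes the leftmost bit of `ttss ddcc ffff tttt` is the most significant bit of the 16-bit character. `parse_file_info` and the dictionary keys are invented for this example.

```python
FILETYPES = ("TXT", "PRG", "DAT", "GRP")
DAT_SUBTYPES = ("int", "real", "unsigned short", "reserved")
ENCODINGS = ("single-value RLE", "two unique values", "Huffman coding", "reserved")


def parse_file_info(info: int) -> dict:
    """Split a 16-bit info bitset (ttss ddcc ffff tttt) into its fields."""
    return {
        "filetype": FILETYPES[(info >> 14) & 0b11],
        # subtype and dimensions only mean something for DAT files
        "subtype": DAT_SUBTYPES[(info >> 12) & 0b11],
        "dimensions": ((info >> 10) & 0b11) + 1,   # field stores "dimensions - 1"
        "encoding": ENCODINGS[(info >> 8) & 0b11],
        # both padding counts are only checked for Huffman-encoded files
        "data_padding": (info >> 4) & 0xF,
        "tree_padding": info & 0xF,
    }
```

With this layout, `0xA635` (`1010 0110 0011 0101`) would describe a two-dimensional unsigned short DAT using Huffman coding, with 3 padding bits on the data and 5 on the tree.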
- ### Filename Field ###
- This segment is only present on archive files. Because archives are not planned for this version of the spec, it will not be described.
- ### DAT Dimension Sizes ###
- This segment is only present on DAT type files. It contains *n* 32-bit values describing the size of each DAT dimension, where *n* is the number of dimensions.
- ### Payload Block ###
- The structure of the payload block depends on the encoding type used.
- #### Single-value RLE ####
- - The value contained in this file. Encoding depends on filetype.
- - Length of file as 32-bit signed int. Only present on filetypes where the size isn't determined elsewhere (TXT and PRG)
- #### Two-value ####
- - Most common value (0 bit). Encoding depends on filetype.
- - Least common value (1 bit).
- - Bitstring of source file. This is created by substituting each occurrence of the most common value with a 0 bit, and each occurrence of the least common value with a 1 bit.
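The two-value scheme is simple enough to sketch in a few lines of Python. This version keeps the bitstring as a plain string of `"0"` and `"1"` characters and leaves out packing the bits into 16-bit characters, since the exact packing isn't covered here. `encode_two_value` and `decode_two_value` are hypothetical names.

```python
from collections import Counter


def encode_two_value(values):
    """Return (most common value, least common value, bitstring)."""
    counts = Counter(values)
    if len(counts) != 2:
        raise ValueError("input must contain exactly two unique values")
    (most, _), (least, _) = counts.most_common(2)
    # 0 bits stand for the most common value, 1 bits for the least common
    bits = "".join("0" if v == most else "1" for v in values)
    return most, least, bits


def decode_two_value(most, least, bits):
    """Rebuild the original values from the two stored values and the bitstring."""
    return [most if b == "0" else least for b in bits]
```

Each input value costs one bit here, where a TXT character would otherwise take 16.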
- #### Huffman Coding ####
- - Bitstring form of this file's Huffman tree
- - Bitstring of file contents; created by substituting each value in the file with its Huffman code
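To make the Huffman payload a bit more concrete, here is a generic Python sketch of building a code table and turning the input into a bitstring. This is textbook Huffman coding, not necessarily the exact tree construction SnailShrink uses, and it skips the tree's bitstring form entirely because that layout isn't described here. It also assumes the padding count is the number of 0 bits needed to fill out the last 16-bit character, which is a guess based on the 4-bit padding fields. All function names are made up for the example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(values):
    """Return {value: code bitstring}; expects more than one unique value."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {v: ""}) for v, w in Counter(values).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, zero = heapq.heappop(heap)   # lightest subtree gets a leading 0
        w1, _, one = heapq.heappop(heap)    # next lightest gets a leading 1
        merged = {v: "0" + c for v, c in zero.items()}
        merged.update({v: "1" + c for v, c in one.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]


def encode(values):
    """Return (code table, padded bitstring, number of padding bits)."""
    codes = huffman_codes(values)
    bits = "".join(codes[v] for v in values)
    padding = -len(bits) % 16   # assumed: pad up to a whole 16-bit character
    return codes, bits + "0" * padding, padding


def decode(codes, bits, padding):
    """Strip the padding, then walk the bitstring one code at a time."""
    lookup = {c: v for v, c in codes.items()}
    out, current = [], ""
    for b in bits[:len(bits) - padding]:
        current += b
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out
```

Because no Huffman code is a prefix of another, the decoder can emit a value as soon as the bits read so far match a code.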