TheFastFish

snsnsns

Jun 29th, 2016
# SnailShrink File Layout #
## Table of Contents ##
`ehh ill do this eventually`

## Headnotes ##
- SnailShrink files are stored as text-encoded binary data in SmileBASIC TXT files.
- Unless otherwise noted, each entry is one character (16 bits).
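
To make "text-encoded binary data" concrete, here is a rough Python sketch of packing a bitstring into 16-bit characters. The helper names `pack_bits` and `unpack_bits`, the most-significant-bit-first order, and the zero padding at the end are all assumptions for illustration; the spec itself doesn't pin these down.

```python
def pack_bits(bits: str) -> tuple[str, int]:
    """Pack a string of '0'/'1' into 16-bit characters.

    Assumption: bits fill each character from the most significant bit
    down, and the final character is padded with zero bits.
    Returns the packed text and the number of padding bits added.
    """
    padding = -len(bits) % 16
    padded = bits + "0" * padding
    chars = [chr(int(padded[i:i + 16], 2)) for i in range(0, len(padded), 16)]
    return "".join(chars), padding


def unpack_bits(text: str, padding: int) -> str:
    """Reverse pack_bits: expand each character to 16 bits, drop the padding."""
    bits = "".join(format(ord(ch), "016b") for ch in text)
    return bits[:len(bits) - padding]
```

For example, a 19-bit string packs into two characters with 13 padding bits, and `unpack_bits` gives the original 19 bits back.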

## Global Header ##
This header sits at the very beginning of every SnailShrink file.
- Magic number: the string "SNS"
- Global file info bitset
  - `vvvv p*** **** ffff`
  - `v`: version
    - bitset indicating version: `ppss`
      - `p`: primary version
      - `s`: secondary version
      - Ex. if the version is 1.0, this section will be `0100`
  - `p`: password flag
    - indicates a password was used on this file
    - all file contents past the Global Header are encrypted on a per-character basis using a string password as the key
    - the password hash is not stored; rather, integrity is checked after decryption using the included CRC-16
  - `f`: number of files in this archive - 1
    - if this field is nonzero, it is a SnailShrink archive (contains multiple files)
    - the "minus 1" means that if the field is 0, the file contains one file, if it's 1 it contains 2, etc.
    - a SnailShrink file can't contain zero files, so storing the count minus 1 maximizes the file limit (16) without using more bits
  - `*`: unused/reserved
    - Stuff might go into these fields in the future, or I may just use more of these bits to increase the archive size limit. Who knows?
- CRC-16 of remainder of file
  - if a password is used, this is computed pre-encryption
  - in this way it doubles as an integrity check and makes sure you used the right password
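
As a rough illustration of the header described above, here is a Python sketch that splits the first five characters of a file into their fields. It assumes the leftmost symbol of the bitset is the most significant bit, and that the magic number, bitset, and CRC-16 occupy characters 0-2, 3, and 4 in that order. `GlobalInfo`, `parse_global_info`, and `parse_global_header` are made-up names, not part of SnailShrink.

```python
from typing import NamedTuple


class GlobalInfo(NamedTuple):
    primary_version: int
    secondary_version: int
    has_password: bool
    file_count: int


def parse_global_info(word: int) -> GlobalInfo:
    """Split the 16-bit global info value (assumed MSB-first) into fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("global info must fit in 16 bits")
    version = (word >> 12) & 0xF                  # vvvv, itself laid out as ppss
    return GlobalInfo(
        primary_version=(version >> 2) & 0b11,    # pp
        secondary_version=version & 0b11,         # ss
        has_password=bool((word >> 11) & 1),      # password flag
        file_count=(word & 0xF) + 1,              # ffff stores the count minus 1
    )


def parse_global_header(text: str) -> tuple[GlobalInfo, int, str]:
    """Return (info, stored CRC-16, remainder of file) from the raw file text."""
    if len(text) < 5 or text[:3] != "SNS":
        raise ValueError("not a SnailShrink file")
    return parse_global_info(ord(text[3])), ord(text[4]), text[5:]
```

The sketch only returns the stored CRC-16 and does not verify it, because the CRC variant isn't named here.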

## File Contents ##
The contents of this section depend on the files that were compressed and are now contained in this file. If the SnailShrink file is an archive (it contains multiple files), it will contain multiple content blocks.
### Info Bitset ###
`ttss ddcc ffff tttt`
- `t` (first two bits): filetype
  - `00`: TXT
  - `01`: PRG
  - `10`: DAT
  - `11`: GRP
- `s`: subtype
  - denotes variable type of DAT file
    - `00`: int
    - `01`: real
    - `10`: unsigned short
      - specifies that the input DAT was an integer DAT whose values are in the range of 0 - 65535
      - useful for compressing data from `GSAVE` or `BGSAVE`
    - `11`: unused/reserved
  - subtypes may be added for other file types in a future spec
    - variable-size GRPs are a possibility
- `c`: encoding method
  - automatically determined based on the complexity of the input data
  - `00`: single-value RLE
    - the entire file contents consist of a single unique value repeated
    - the file payload will simply consist of this one value, plus the length of the input (sint32) for TXT/PRG files
    - this reduces files of only one unique value repeated down to one fixed size; for larger files this results in extreme size reduction
  - `01`: two unique values
    - the input data only contains two unique values
    - as a result, using a full Huffman coding scheme is unnecessary
    - these two values are stored, and the input data is reduced to a single bitstring
    - 0 bits represent the most common value and 1s represent the least common
  - `10`: Huffman coding
    - the file contains more than two unique values, so Huffman coding is used
    - the Huffman tree is stored in a bitstring form, and after that is the input data converted to a Huffman bitstring
  - `11`: reserved/unused
- `f`: padding bits on Huffman data
  - only checked if Huffman encoding type is used
- `t` (last four bits): padding bits on Huffman tree
  - only checked if Huffman encoding type is used
  - really I jammed these here because I didn't want to use 16 bits to encode 8 bits worth of data so they're going in the file header
  - it's not like the file header was gonna use those 8 bits
- `d`: dimensions on DAT file - 1
  - only checked if source type is DAT
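
The info bitset can be unpacked with a few shifts and masks. The Python sketch below again assumes the leftmost symbol is the most significant bit; `parse_file_info` and the dictionary keys are illustrative names only.

```python
FILETYPES = ("TXT", "PRG", "DAT", "GRP")
DAT_SUBTYPES = ("int", "real", "unsigned short", "reserved")
ENCODINGS = ("single-value RLE", "two-value", "Huffman", "reserved")


def parse_file_info(word: int) -> dict:
    """Split a 16-bit `ttss ddcc ffff tttt` value (assumed MSB-first)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("info bitset must fit in 16 bits")
    return {
        "filetype": FILETYPES[(word >> 14) & 0b11],        # leading tt
        "subtype": DAT_SUBTYPES[(word >> 12) & 0b11],      # ss, meaningful for DAT only
        "dat_dimensions": ((word >> 10) & 0b11) + 1,       # dd stores dimensions minus 1
        "encoding": ENCODINGS[(word >> 8) & 0b11],         # cc
        "data_padding_bits": (word >> 4) & 0xF,            # ffff, Huffman only
        "tree_padding_bits": word & 0xF,                   # trailing tttt, Huffman only
    }
```

For instance, `0b1010_0110_0011_0101` would read as a two-dimensional unsigned short DAT using Huffman coding, with 3 padding bits on the data and 5 on the tree.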

### Filename Field ###
This segment is only present on archive files. Because archives are not planned for this version of the spec, it will not be described.

### DAT Dimension Sizes ###
This segment is only present on DAT type files. It contains *n* 32-bit values describing the size of each DAT dimension, where *n* is the number of dimensions.

### Payload Block ###
The structure of the payload block depends on the encoding type used.
#### Single-value RLE ####
- The value contained in this file. Encoding depends on filetype.
- Length of file as a 32-bit signed int. Only present on filetypes where the size isn't determined elsewhere (TXT and PRG).
  88. #### Two-value ####
  89. - Most common value (0 bit). Encoding depends on filetype.
  90. - Least common value (1 bit).
  91. - Bitstring of source file. This is creating by substituting occurences of 0 bit with the most common value, and 1 bit with the least common.
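
The two-value scheme is simple enough to sketch in a few lines of Python. The bits are kept as a string of `'0'` and `'1'` rather than packed into characters, since the packing isn't described in this section. `encode_two_value` and `decode_two_value` are illustrative names.

```python
from collections import Counter


def encode_two_value(values):
    """Return (most common value, least common value, bitstring)."""
    counts = Counter(values)
    if len(counts) != 2:
        raise ValueError("input must contain exactly two unique values")
    # On a tie, Counter keeps first-seen order, so the first value gets the 0 bit.
    (most, _), (least, _) = counts.most_common(2)
    bits = "".join("0" if v == most else "1" for v in values)
    return most, least, bits


def decode_two_value(most, least, bits):
    """Rebuild the original sequence from the two values and the bitstring."""
    return [most if b == "0" else least for b in bits]
```

Encoding `"aababaa"` this way gives `a` as the 0 value, `b` as the 1 value, and the bitstring `0010100`.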

#### Huffman Coding ####
- Bitstring form of this file's Huffman tree
- Bitstring of file contents; created by substituting each value in the file with its Huffman code
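
For reference, here is a sketch of the content half of this block in Python: a textbook Huffman construction with `heapq`, followed by the value-to-code substitution. The tree's bitstring form is not covered because its layout isn't defined here, and the padding calculation assumes the bitstring is rounded up to a whole number of 16-bit characters. All function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(values):
    """Map each unique value to its Huffman code (a string of '0'/'1')."""
    counts = Counter(values)
    if len(counts) < 2:
        raise ValueError("need at least two unique values")
    tiebreak = count()  # unique counter so heap entries never compare nodes
    heap = [(n, next(tiebreak), ("leaf", v)) for v, n in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), ("node", a, b)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:  # walk the tree, 0 for the left branch and 1 for the right
        node, prefix = stack.pop()
        if node[0] == "leaf":
            codes[node[1]] = prefix
        else:
            stack.append((node[1], prefix + "0"))
            stack.append((node[2], prefix + "1"))
    return codes


def encode_huffman(values):
    """Return (codes, content bitstring, padding bits to a 16-bit boundary)."""
    codes = huffman_codes(values)
    bits = "".join(codes[v] for v in values)
    return codes, bits, -len(bits) % 16


def decode_huffman(codes, bits):
    """Decode a bitstring by matching prefix-free codes left to right."""
    lookup = {code: value for value, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out
```

Under the 16-bit assumption, the padding count is always between 0 and 15, so it fits in one of the 4-bit padding fields of the info bitset.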