SwiftyBinaryFormatter
Table of Content
SwiftyBinaryFormatter
Warning: This post appears a bit dense at first glance. I promise it’s not that bad, but you may have to pay attention a slight bit more than usual. Also a disclaimer: it’s entirely possible I am not conforming to best practices or missing out on crucial elements as most of this post and library are made from my own personal observational learnings.
I have an (as of this writing, unreleased) project where I needed to export to a binary file format to import into a professional app. While investigating how to do this, I found the interface to work with raw bytes in Swift to be slightly obtuse. To get around this, I wrote SwiftyBinaryFormatter.
How to Use
Let’s say we have a simple file spec that goes something like this…
FOO word processor file spec:
File Structure
Byte Count
Content
4 Bytes
Magic number (0x464f4f00
) (FOO
in ascii)
1 Byte
File Version (1)
2 Bytes
total file chunk count as TwoByte
?
Chunks (see definition below)
Chunk Structure
Byte Count
Content
4 Bytes
Chunk Type (see table)
4 Bytes
Chunk Data Byte Size as Word
? Bytes
Chunk Data
Chunk Types
Type
Value
Description
Meta
0x4d455441
(META
in ascii)
contains a 4 Byte key immediately followed by its value, containing metadata
Body
0x424f4459
(BODY
in ascii)
consists of a null terminated string
Valid Meta Keys
Key
HexValue
Description
AUTH
0x41555448
Associated value is an author of this file
DATE
0x44415445
Associated value is the date the file was created, Double value as seconds since epoch
- Chunk: A chunk is defined as a section of data – in this format, a chunk is either metadata (author/date of creation/etc) or the actual document data itself.
Brace yourself. This will be a large section of code. I’ve added marks and will attempt to summarize following the block:
import SwiftyBinaryFormatter
struct FooFormatExport {
// MARK: - (1) Constants and Properties
/// Magic number to identify this file format as the first few bytes of the file.
let magicHeader = Word(0x464f4f00)
/// Identifies the file version
let version = Byte(1)
/// Provides the total count of all the chunks in this file.
var chunkCount: TwoByte {
// there's always 1 body chunk + a variable amount of meta chunks
TwoByte(metaChunks.count) + 1
}
/// Storage for meta data until it's time to compile down to a binary file
private var metaChunks = [MetaType]()
/// Storage for body data until it's time to compile down to a binary file
private var bodyStorage = Data()
/// Storage for compiled data to cache for potential subsequent access
private var _renderedData: Data?
// MARK: - (2) Nested Helper Types
/// Identifies different chunk types with their corresponding constant value
enum ChunkType: Word {
/// Provides the constant value to identify the chunk type in compiled binary
case meta = 0x4d455441
/// Provides the constant value to identify the chunk type in compiled binary
case body = 0x424f4459
}
/// Identifies different types of meta data for use in meta chunks.
enum MetaType {
/// Identifies and contains the author type of meta data.
case author(author: String)
/// Identifies and contains the creation date type of meta data.
case creationDate(secondsSinceEpoch: Double)
/// Provides the constant value to identify the meta type in compiled binary
var hexKey: Word {
switch self {
case .author:
return 0x41555448
case .creationDate:
return 0x44415445
}
}
}
// MARK: - (3) Data accumulation
/// Accumulates and stores metadata for inclusion when compiling.
mutating func addMetaData(ofType type: MetaType) {
metaChunks.append(type)
// if cached data exists, delete it
_renderedData = nil
}
/// Takes the body string and stores it for future compilation. Overwrites any previously stored value.
mutating func setBody(_ body: String) {
// convert string to data and store it
bodyStorage = strToData(body)
// if cached data exists, delete it
_renderedData = nil
}
/// Could really just convert the string to Data, but this tests more internals
private func strToData(_ string: String) -> Data {
let dataSeq = string.compactMap { letter -> UInt8? in
guard let value = try? Byte(character: letter) else { return nil }
return value
}
return Data(dataSeq)
}
// MARK: - (4) Compilation
private func renderMetaType(_ type: MetaType) -> Data {
// structure: 4 bytes for type, 4 bytes for data byte size, then all data bytes
// convert the value portion to Data
let metaValue: Data
switch type {
case .author(let author):
metaValue = strToData(author)
case .creationDate(let dateValue):
metaValue = Data(dateValue.bytes)
}
// calulate size
let hexKeyBytes = type.hexKey.bytes
let byteCount = hexKeyBytes.count + metaValue.count
// compile data in correct order
var mergedData = Data(ChunkType.meta.rawValue)
mergedData.append(Word(byteCount))
mergedData.append(contentsOf: hexKeyBytes)
mergedData.append(metaValue)
return mergedData
}
/// Compiles all the accumualted information into a single binary blob, suitable for writing to disk or otherwise exporting.
mutating func renderData() -> Data {
// structure: 4 byte magic number, 1 byte file version, 2 bytes for chunk count, finally followed by all the chunks
if let renderedData = _renderedData {
return renderedData
}
// compile the file header, including magic, version, chunk count, and meta info
var renderedData = Data([magicHeader, version, chunkCount])
// compile and append all the meta chunks
for metaChunk in metaChunks {
renderedData.append(renderMetaType(metaChunk))
}
// compile and append the body data
let bodyHeader = Data([ChunkType.body.rawValue, Word(bodyStorage.count + 1)])
let bodyData = bodyHeader + bodyStorage + [0]
renderedData.append(bodyData)
// cache the data so subsequent requests are faster
_renderedData = renderedData
return renderedData
}
}
I’ve put a lot of documentation inline in the code, so I’ll do my best to not simply repeat what’s already there.
Section 1
The Foo file spec has some values that are the same for every file, so the magicHeader
and version
are just hard coded, using SwiftyBinaryFormatter types. chunkCount
is easily calculated from the individual components that make up our resulting file, so that’s implemented as a computed property. This section also contains storage to hold all the meta data and primary content of the file.
Section 2
Again, referring to the Foo file spec, you’ll see there are different chunk types and the meta chunk has sub types. I wrote a couple enums to store their magic number and identify each one, respectively.
Pop quiz: Why didn’t I give MetaType
a RawRepresentable type? Take a guess in the comments!
Section 3
These functions are pretty simple and straightforward. Basically, just accumulate data and store it in a way that can later be rectified into binary.
Worthy of noting though is the strToData
function. It’d be pretty easy to refactor (and I originally had it this way) the code to just use String
‘s built in .data(using: .utf8)
to get the whole thing all at once, which would probably have a performance benefit as well. However, the reason I have it this way is twofold – the first is that this is actual code I’m using in unit tests, and this is a practical way to test the initializer that takes in a Character
type, the second is that this solution is more dynamic and granular than the .data...
function. You see, if the .data
function fails for whatever reason, none of the string is retained and we end up in empty data. However, if we check each character individually, we only omit illegal characters, retaining (hopefully) the majority of the data. In the real world, you’d probably want to somehow communicate to the user that there’s some likely data loss if it fails, but this is a sample. Screw em!
Section 4
This is where the real magic happens. The extensions on all the primitives and Data
provided by SwifyBinaryFormatter all work together to make a fairly simple compiling experience.
Basically, the file header is generated by combining the magicHeader
(Word
), version
(Byte
), and chunkCount
(TwoByte
). Then each piece of meta data stored in metaChunks
is compiled through a similar process, just looped for each meta chunk. Finally the body is compiled and appended to the final blob.
If the changes from basic Swift aren’t obvious to you, Data
underneath everything, is simply an array of UInt8
s (or Byte
s). While understanding that a UInt32
(or Word
) is just 4 UInt8
s grouped together, the method to ungroup them is not exactly straightforward. SwiftyBinaryFormatter makes it easy to just append a Word
to a Data
object, magically handling the ungrouping for you.
Caveats
One thing to keep in mind is that this implementation is only designed as an accumulator – basically, you just keep adding information, never removing or modifying existing data. A more robust system might implement those additional details, but my original needs were only for exporting a binary file to another app. Not to mention that SwiftyBinaryFomatter is kind of designed with the same intention of writing and not reading data (though I wouldn’t be opposed to eventually augmenting that ability either myself or through pull requests – but if you want to make a PR, please dm me somehow first to make sure we are on the same page for implementation details or risk my potential rejection of merging code).
It also bears repeating from the start of this post, it’s entirely possible I am not conforming to best practices or missing out on crucial elements as most of this post and library are made from my own personal observational learnings.
My Brain Hurts
That was a long post. I hope you enjoyed it, or at least learned something. If not, feel free to insult me from afar. You can find the project on GitHub
SwiftyBinaryFormatter
Warning: This post appears a bit dense at first glance. I promise it’s not that bad, but you may have to pay attention a slight bit more than usual. Also a disclaimer: it’s entirely possible I am not conforming to best practices or missing out on crucial elements as most of this post and library are made from my own personal observational learnings.
I have an (as of this writing, unreleased) project where I needed to export to a binary file format to import into a professional app. While investigating how to do this, I found the interface to work with raw bytes in Swift to be slightly obtuse. To get around this, I wrote SwiftyBinaryFormatter.
How to Use
Let’s say we have a simple file spec that goes something like this…
FOO word processor file spec:
File Structure
Byte Count | Content |
---|---|
4 Bytes | Magic number (0x464f4f00 ) (FOO in ascii) |
1 Byte | File Version (1) |
2 Bytes | total file chunk count as TwoByte |
? | Chunks (see definition below) |
Chunk Structure
Byte Count | Content |
---|---|
4 Bytes | Chunk Type (see table) |
4 Bytes | Chunk Data Byte Size as Word |
? Bytes | Chunk Data |
Chunk Types
Type | Value | Description |
---|---|---|
Meta | 0x4d455441 (META in ascii) |
contains a 4 Byte key immediately followed by its value, containing metadata |
Body | 0x424f4459 (BODY in ascii) |
consists of a null terminated string |
Valid Meta Keys
Key | HexValue | Description |
---|---|---|
AUTH | 0x41555448 |
Associated value is an author of this file |
DATE | 0x44415445 |
Associated value is the date the file was created, Double value as seconds since epoch |
- Chunk: A chunk is defined as a section of data – in this format, a chunk is either metadata (author/date of creation/etc) or the actual document data itself.
Brace yourself. This will be a large section of code. I’ve added marks and will attempt to summarize following the block:
import SwiftyBinaryFormatter
struct FooFormatExport {
// MARK: - (1) Constants and Properties
/// Magic number to identify this file format as the first few bytes of the file.
let magicHeader = Word(0x464f4f00)
/// Identifies the file version
let version = Byte(1)
/// Provides the total count of all the chunks in this file.
var chunkCount: TwoByte {
// there's always 1 body chunk + a variable amount of meta chunks
TwoByte(metaChunks.count) + 1
}
/// Storage for meta data until it's time to compile down to a binary file
private var metaChunks = [MetaType]()
/// Storage for body data until it's time to compile down to a binary file
private var bodyStorage = Data()
/// Storage for compiled data to cache for potential subsequent access
private var _renderedData: Data?
// MARK: - (2) Nested Helper Types
/// Identifies different chunk types with their corresponding constant value
enum ChunkType: Word {
/// Provides the constant value to identify the chunk type in compiled binary
case meta = 0x4d455441
/// Provides the constant value to identify the chunk type in compiled binary
case body = 0x424f4459
}
/// Identifies different types of meta data for use in meta chunks.
enum MetaType {
/// Identifies and contains the author type of meta data.
case author(author: String)
/// Identifies and contains the creation date type of meta data.
case creationDate(secondsSinceEpoch: Double)
/// Provides the constant value to identify the meta type in compiled binary
var hexKey: Word {
switch self {
case .author:
return 0x41555448
case .creationDate:
return 0x44415445
}
}
}
// MARK: - (3) Data accumulation
/// Accumulates and stores metadata for inclusion when compiling.
mutating func addMetaData(ofType type: MetaType) {
metaChunks.append(type)
// if cached data exists, delete it
_renderedData = nil
}
/// Takes the body string and stores it for future compilation. Overwrites any previously stored value.
mutating func setBody(_ body: String) {
// convert string to data and store it
bodyStorage = strToData(body)
// if cached data exists, delete it
_renderedData = nil
}
/// Could really just convert the string to Data, but this tests more internals
private func strToData(_ string: String) -> Data {
let dataSeq = string.compactMap { letter -> UInt8? in
guard let value = try? Byte(character: letter) else { return nil }
return value
}
return Data(dataSeq)
}
// MARK: - (4) Compilation
private func renderMetaType(_ type: MetaType) -> Data {
// structure: 4 bytes for type, 4 bytes for data byte size, then all data bytes
// convert the value portion to Data
let metaValue: Data
switch type {
case .author(let author):
metaValue = strToData(author)
case .creationDate(let dateValue):
metaValue = Data(dateValue.bytes)
}
// calulate size
let hexKeyBytes = type.hexKey.bytes
let byteCount = hexKeyBytes.count + metaValue.count
// compile data in correct order
var mergedData = Data(ChunkType.meta.rawValue)
mergedData.append(Word(byteCount))
mergedData.append(contentsOf: hexKeyBytes)
mergedData.append(metaValue)
return mergedData
}
/// Compiles all the accumualted information into a single binary blob, suitable for writing to disk or otherwise exporting.
mutating func renderData() -> Data {
// structure: 4 byte magic number, 1 byte file version, 2 bytes for chunk count, finally followed by all the chunks
if let renderedData = _renderedData {
return renderedData
}
// compile the file header, including magic, version, chunk count, and meta info
var renderedData = Data([magicHeader, version, chunkCount])
// compile and append all the meta chunks
for metaChunk in metaChunks {
renderedData.append(renderMetaType(metaChunk))
}
// compile and append the body data
let bodyHeader = Data([ChunkType.body.rawValue, Word(bodyStorage.count + 1)])
let bodyData = bodyHeader + bodyStorage + [0]
renderedData.append(bodyData)
// cache the data so subsequent requests are faster
_renderedData = renderedData
return renderedData
}
}
I’ve put a lot of documentation inline in the code, so I’ll do my best to not simply repeat what’s already there.
Section 1
The Foo file spec has some values that are the same for every file, so the magicHeader
and version
are just hard coded, using SwiftyBinaryFormatter types. chunkCount
is easily calculated from the individual components that make up our resulting file, so that’s implemented as a computed property. This section also contains storage to hold all the meta data and primary content of the file.
Section 2
Again, referring to the Foo file spec, you’ll see there are different chunk types and the meta chunk has sub types. I wrote a couple enums to store their magic number and identify each one, respectively.
Pop quiz: Why didn’t I give MetaType
a RawRepresentable type? Take a guess in the comments!
Section 3
These functions are pretty simple and straightforward. Basically, just accumulate data and store it in a way that can later be rectified into binary.
Worthy of noting though is the strToData
function. It’d be pretty easy to refactor (and I originally had it this way) the code to just use String
‘s built in .data(using: .utf8)
to get the whole thing all at once, which would probably have a performance benefit as well. However, the reason I have it this way is twofold – the first is that this is actual code I’m using in unit tests, and this is a practical way to test the initializer that takes in a Character
type, the second is that this solution is more dynamic and granular than the .data...
function. You see, if the .data
function fails for whatever reason, none of the string is retained and we end up in empty data. However, if we check each character individually, we only omit illegal characters, retaining (hopefully) the majority of the data. In the real world, you’d probably want to somehow communicate to the user that there’s some likely data loss if it fails, but this is a sample. Screw em!
Section 4
This is where the real magic happens. The extensions on all the primitives and Data
provided by SwifyBinaryFormatter all work together to make a fairly simple compiling experience.
Basically, the file header is generated by combining the magicHeader
(Word
), version
(Byte
), and chunkCount
(TwoByte
). Then each piece of meta data stored in metaChunks
is compiled through a similar process, just looped for each meta chunk. Finally the body is compiled and appended to the final blob.
If the changes from basic Swift aren’t obvious to you, Data
underneath everything, is simply an array of UInt8
s (or Byte
s). While understanding that a UInt32
(or Word
) is just 4 UInt8
s grouped together, the method to ungroup them is not exactly straightforward. SwiftyBinaryFormatter makes it easy to just append a Word
to a Data
object, magically handling the ungrouping for you.
Caveats
One thing to keep in mind is that this implementation is only designed as an accumulator – basically, you just keep adding information, never removing or modifying existing data. A more robust system might implement those additional details, but my original needs were only for exporting a binary file to another app. Not to mention that SwiftyBinaryFomatter is kind of designed with the same intention of writing and not reading data (though I wouldn’t be opposed to eventually augmenting that ability either myself or through pull requests – but if you want to make a PR, please dm me somehow first to make sure we are on the same page for implementation details or risk my potential rejection of merging code).
It also bears repeating from the start of this post, it’s entirely possible I am not conforming to best practices or missing out on crucial elements as most of this post and library are made from my own personal observational learnings.
My Brain Hurts
That was a long post. I hope you enjoyed it, or at least learned something. If not, feel free to insult me from afar. You can find the project on GitHub