SwiftyBinaryFormatter

Warning: This post appears a bit dense at first glance. I promise it's not that bad, but you may have to pay slightly more attention than usual. Also, a disclaimer: it's entirely possible that I'm not following best practices or am missing crucial elements, since most of this post and library come from my own observational learning.

I have a project (unreleased as of this writing) that needs to export a binary file format for import into a professional app. While investigating how to do this, I found Swift's interface for working with raw bytes to be slightly obtuse. To get around this, I wrote SwiftyBinaryFormatter.

How to Use

Let’s say we have a simple file spec that goes something like this…

FOO word processor file spec:

File Structure

Byte Count | Content
4 bytes | Magic number (0x464f4f00): "FOO" in ASCII, followed by a null byte
1 byte | File version (1)
2 bytes | Total file chunk count, as a TwoByte
? bytes | Chunks (see definition below)

Chunk Structure

Byte Count | Content
4 bytes | Chunk type (see table)
4 bytes | Chunk data byte size, as a Word
? bytes | Chunk data

Chunk Types

Type | Value | Description
Meta | 0x4d455441 ("META" in ASCII) | Contains metadata: a 4-byte key immediately followed by its value
Body | 0x424f4459 ("BODY" in ASCII) | Consists of a null-terminated string

Valid Meta Keys

Key | Hex Value | Description
AUTH | 0x41555448 | Associated value is an author of this file
DATE | 0x44415445 | Associated value is the date the file was created, as a Double of seconds since the epoch
  • Chunk: A chunk is defined as a section of data – in this format, a chunk is either metadata (author/date of creation/etc) or the actual document data itself.
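To put numbers to the tables above, here's a small Swift sketch that adds up the byte counts for a given file. FooLayout and its functions are hypothetical names for illustration, not part of SwiftyBinaryFormatter; the one reading of the spec it relies on is that a meta chunk's data size covers its 4-byte key plus the value.

```swift
/// Hypothetical helpers (not part of SwiftyBinaryFormatter) that add up the
/// byte counts from the FOO spec tables.
enum FooLayout {
    /// File header: 4-byte magic number + 1-byte version + 2-byte chunk count.
    static let headerSize = 4 + 1 + 2
    /// Every chunk starts with a 4-byte type and a 4-byte data size.
    static let chunkHeaderSize = 4 + 4

    /// A meta chunk's data is a 4-byte key followed by the value bytes.
    static func metaChunkSize(valueByteCount: Int) -> Int {
        chunkHeaderSize + 4 + valueByteCount
    }

    /// The body chunk's data is the string bytes plus one null terminator.
    static func bodyChunkSize(stringByteCount: Int) -> Int {
        chunkHeaderSize + stringByteCount + 1
    }

    /// Total file size for the given meta value lengths and body string length.
    static func fileSize(metaValueByteCounts: [Int], bodyByteCount: Int) -> Int {
        let metaTotal = metaValueByteCounts.reduce(0) { $0 + metaChunkSize(valueByteCount: $1) }
        return headerSize + metaTotal + bodyChunkSize(stringByteCount: bodyByteCount)
    }
}

// AUTH "Bob" (3 bytes), DATE (a Double, 8 bytes), body "Hi" (2 bytes):
// 7 + (8 + 4 + 3) + (8 + 4 + 8) + (8 + 2 + 1)
print(FooLayout.fileSize(metaValueByteCounts: [3, 8], bodyByteCount: 2))  // 53
```

Even an empty file (no metadata, empty body) is 16 bytes: the 7-byte header plus a 9-byte body chunk holding just the null terminator.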

Brace yourself: this is a large section of code. I've added MARK comments and will summarize each section after the block:

import SwiftyBinaryFormatter

struct FooFormatExport {
// MARK: - (1) Constants and Properties
    /// Magic number to identify this file format as the first few bytes of the file.
    let magicHeader = Word(0x464f4f00)
    /// Identifies the file version
    let version = Byte(1)
    /// Provides the total count of all the chunks in this file.
    var chunkCount: TwoByte {
        // there's always 1 body chunk + a variable amount of meta chunks
        TwoByte(metaChunks.count) + 1
    }

    /// Storage for meta data until it's time to compile down to a binary file
    private var metaChunks = [MetaType]()
    /// Storage for body data until it's time to compile down to a binary file
    private var bodyStorage = Data()

    /// Storage for compiled data to cache for potential subsequent access
    private var _renderedData: Data?

// MARK: - (2) Nested Helper Types
    /// Identifies different chunk types with their corresponding constant value
    enum ChunkType: Word {
        /// Provides the constant value to identify the chunk type in compiled binary
        case meta = 0x4d455441
        /// Provides the constant value to identify the chunk type in compiled binary
        case body = 0x424f4459
    }

    /// Identifies different types of meta data for use in meta chunks.
    enum MetaType {
        /// Identifies and contains the author type of meta data.
        case author(author: String)
        /// Identifies and contains the creation date type of meta data.
        case creationDate(secondsSinceEpoch: Double)

        /// Provides the constant value to identify the meta type in compiled binary
        var hexKey: Word {
            switch self {
            case .author:
                return 0x41555448
            case .creationDate:
                return 0x44415445
            }
        }
    }

// MARK: - (3) Data accumulation
    /// Accumulates and stores metadata for inclusion when compiling.
    mutating func addMetaData(ofType type: MetaType) {
        metaChunks.append(type)

        // if cached data exists, delete it
        _renderedData = nil
    }

    /// Takes the body string and stores it for future compilation. Overwrites any previously stored value.
    mutating func setBody(_ body: String) {
        // convert string to data and store it
        bodyStorage = strToData(body)

        // if cached data exists, delete it
        _renderedData = nil
    }

    /// Could really just convert the string to Data, but this tests more internals
    private func strToData(_ string: String) -> Data {
        let dataSeq = string.compactMap { letter -> UInt8? in
            guard let value = try? Byte(character: letter) else { return nil }
            return value
        }
        return Data(dataSeq)
    }

// MARK: - (4) Compilation
    private func renderMetaType(_ type: MetaType) -> Data {
        // structure: 4 bytes for type, 4 bytes for data byte size, then all data bytes
        // convert the value portion to Data
        let metaValue: Data
        switch type {
        case .author(let author):
            metaValue = strToData(author)
        case .creationDate(let dateValue):
            metaValue = Data(dateValue.bytes)
        }

        // calculate the chunk data size (4-byte key + value)
        let hexKeyBytes = type.hexKey.bytes
        let byteCount = hexKeyBytes.count + metaValue.count

        // compile data in correct order
        var mergedData = Data(ChunkType.meta.rawValue)
        mergedData.append(Word(byteCount))
        mergedData.append(contentsOf: hexKeyBytes)
        mergedData.append(metaValue)

        return mergedData
    }

    /// Compiles all the accumulated information into a single binary blob, suitable for writing to disk or otherwise exporting.
    mutating func renderData() -> Data {
        // structure: 4 byte magic number, 1 byte file version, 2 bytes for chunk count, finally followed by all the chunks
        if let renderedData = _renderedData {
            return renderedData
        }

        // compile the file header: magic number, version, and chunk count
        var renderedData = Data([magicHeader, version, chunkCount])

        // compile and append all the meta chunks
        for metaChunk in metaChunks {
            renderedData.append(renderMetaType(metaChunk))
        }

        // compile and append the body data
        let bodyHeader = Data([ChunkType.body.rawValue, Word(bodyStorage.count + 1)])
        let bodyData = bodyHeader + bodyStorage + [0]
        renderedData.append(bodyData)

        // cache the data so subsequent requests are faster
        _renderedData = renderedData
        return renderedData
    }
}

I’ve put a lot of documentation inline in the code, so I’ll do my best to not simply repeat what’s already there.

Section 1

The Foo file spec has some values that are the same for every file, so magicHeader and version are simply hard-coded using SwiftyBinaryFormatter types. chunkCount is easily calculated from the individual components that make up the resulting file, so it's implemented as a computed property. This section also contains storage for all the metadata and the primary content of the file.
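One edge case worth knowing about: the count field is only two bytes, so the format tops out at 65,535 chunks. Assuming TwoByte is UInt16 (as the 2-byte field suggests), TwoByte(metaChunks.count) + 1 traps at runtime past that limit instead of failing gracefully. Here's a sketch of a non-trapping alternative; the function name is hypothetical.

```swift
/// Hypothetical checked version of chunkCount. The count field is two bytes,
/// so the file can hold at most UInt16.max (65,535) chunks in total.
/// UInt16(exactly:) returns nil for out-of-range values instead of trapping.
func checkedChunkCount(metaChunkCount: Int) -> UInt16? {
    // One body chunk is always present, plus however many meta chunks.
    UInt16(exactly: metaChunkCount + 1)
}

print(checkedChunkCount(metaChunkCount: 2) as Any)       // Optional(3)
print(checkedChunkCount(metaChunkCount: 65_535) as Any)  // nil
```

Whether to return nil, throw, or refuse extra metadata up front is a design choice; an optional just keeps the sketch short.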

Section 2

Again, referring to the Foo file spec, you'll see there are different chunk types, and the meta chunk has subtypes. I wrote a couple of enums to identify each one and store its magic number.

Pop quiz: Why didn’t I give MetaType a RawRepresentable type? Take a guess in the comments!
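Every magic number in the spec is just four ASCII characters packed into a Word, so one way to double-check the hard-coded constants (in a unit test, say) is to derive them from their names. fourCharCode is a hypothetical helper. It assumes the first character goes in the most significant byte, which is the order the hex values read in.

```swift
/// Hypothetical helper: packs exactly four ASCII characters into a UInt32,
/// first character in the most significant byte.
func fourCharCode(_ name: String) -> UInt32? {
    let bytes = Array(name.utf8)
    // Anything other than four single-byte ASCII characters can't be a key.
    guard bytes.count == 4, bytes.allSatisfy({ $0 < 0x80 }) else { return nil }
    return bytes.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

// Each constant from the spec, paired with its four-character name.
let expected: [(String, UInt32)] = [
    ("FOO\0", 0x464f4f00),
    ("META", 0x4d455441),
    ("BODY", 0x424f4459),
    ("AUTH", 0x41555448),
    ("DATE", 0x44415445),
]
for (name, value) in expected {
    print(name.debugDescription, fourCharCode(name) == value)  // true for every row
}
```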

Section 3

These functions are pretty simple and straightforward: they just accumulate data and store it in a form that can later be compiled into binary. Each one also clears any cached render, so the next call to renderData() rebuilds it.
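One alternative worth mentioning: instead of clearing the cache in every mutating function, the invalidation can live in a didSet observer on the storage, so a mutating function added later can't forget it. Here's a stripped-down sketch with a hypothetical Accumulator type, where a comma-joined string stands in for the real binary rendering.

```swift
/// Hypothetical, stripped-down accumulator using the same caching idea as
/// FooFormatExport, but invalidating the cache in one didSet observer.
struct Accumulator {
    private var items: [String] = [] {
        didSet { cached = nil }  // any change to the stored items drops the cache
    }
    private var cached: String?
    /// Counts how many times the (pretend) expensive render actually ran.
    private(set) var renderCount = 0

    mutating func add(_ item: String) {
        items.append(item)
    }

    mutating func render() -> String {
        if let cached = cached { return cached }
        renderCount += 1
        let result = items.joined(separator: ",")
        cached = result
        return result
    }
}

var accumulator = Accumulator()
accumulator.add("AUTH")
accumulator.add("BODY")
let first = accumulator.render()
let second = accumulator.render()  // served from the cache
print(first == second, accumulator.renderCount)  // true 1
```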

Worth noting, though, is the strToData function. It'd be pretty easy to refactor the code (and I originally had it this way) to just use String's built-in .data(using: .utf8) and convert the whole thing at once, which would probably perform better as well. However, I have it this way for two reasons. First, this is actual code I'm using in unit tests, and it's a practical way to test the initializer that takes a Character. Second, it's more granular than the .data(using:) approach: if that conversion fails, none of the string is retained and we end up with no data at all, whereas checking each character individually omits only the illegal characters and (hopefully) retains most of the data. (Strictly speaking, .data(using: .utf8) won't fail for a Swift String; the all-or-nothing failure shows up with encodings like .ascii, which return nil as soon as the string contains a character they can't represent.) In the real world, you'd probably want to tell the user there's likely some data loss, but this is a sample. Screw 'em!
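Here's a small illustration of that trade-off using only Swift and Foundation. It uses Character.asciiValue as a stand-in for SwiftyBinaryFormatter's Byte(character:), which is an assumption: the library's exact rules for which characters are legal may differ.

```swift
import Foundation

/// Per-character conversion in the spirit of strToData: keep every character
/// that fits in one ASCII byte and silently drop the rest.
/// Assumption: Character.asciiValue stands in for Byte(character:).
func asciiBytesDroppingIllegal(_ string: String) -> Data {
    Data(string.compactMap { $0.asciiValue })
}

let text = "Caf\u{E9} au lait"  // "Café au lait" with a precomposed é

// All-or-nothing: one non-ASCII character makes the whole conversion return nil.
let wholeString = text.data(using: .ascii)
// Per character: only the é is lost.
let perCharacter = asciiBytesDroppingIllegal(text)
// UTF-8 conversion doesn't fail for a Swift String; é just takes two bytes.
let utf8 = Data(text.utf8)

print(wholeString == nil)                             // true
print(String(decoding: perCharacter, as: UTF8.self))  // Caf au lait
print(utf8.count)                                     // 13
```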

Section 4

This is where the real magic happens. The extensions SwiftyBinaryFormatter provides on the primitive types and on Data all work together to make compiling fairly simple.

Basically, the file header is generated by combining magicHeader (Word), version (Byte), and chunkCount (TwoByte). Then each piece of metadata stored in metaChunks is compiled through a similar process, once per meta chunk. Finally, the body is compiled and appended to the final blob.
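For comparison, here's roughly what those compile steps look like in plain Swift without the library. This is a sketch, not SwiftyBinaryFormatter's implementation: the names are hypothetical, and it assumes big-endian byte order for every multi-byte value (the FOO spec doesn't say) and the IEEE 754 bit pattern for the DATE Double.

```swift
import Foundation

/// Splits an integer into its bytes, most significant byte first (big-endian).
func bigEndianBytes<T: FixedWidthInteger>(_ value: T) -> [UInt8] {
    withUnsafeBytes(of: value.bigEndian) { Array($0) }
}

/// Hypothetical stand-in for FooFormatExport.MetaType.
enum Meta {
    case author(String)
    case creationDate(Double)
}

func renderFoo(meta: [Meta], body: String) -> Data {
    var out = Data()
    // Header: magic number, version, chunk count (meta chunks + 1 body chunk).
    out.append(contentsOf: bigEndianBytes(UInt32(0x464f4f00)))
    out.append(contentsOf: bigEndianBytes(UInt8(1)))
    out.append(contentsOf: bigEndianBytes(UInt16(meta.count + 1)))

    // Meta chunks: chunk type, data size (key + value), 4-byte key, value.
    for item in meta {
        let key: UInt32
        let value: [UInt8]
        switch item {
        case .author(let name):
            key = 0x41555448
            value = name.compactMap { $0.asciiValue }
        case .creationDate(let seconds):
            key = 0x44415445
            value = bigEndianBytes(seconds.bitPattern)
        }
        out.append(contentsOf: bigEndianBytes(UInt32(0x4d455441)))
        out.append(contentsOf: bigEndianBytes(UInt32(4 + value.count)))
        out.append(contentsOf: bigEndianBytes(key))
        out.append(contentsOf: value)
    }

    // Body chunk: chunk type, size including the terminator, string bytes, null.
    let bodyBytes = body.compactMap { $0.asciiValue }
    out.append(contentsOf: bigEndianBytes(UInt32(0x424f4459)))
    out.append(contentsOf: bigEndianBytes(UInt32(bodyBytes.count + 1)))
    out.append(contentsOf: bodyBytes)
    out.append(0)
    return out
}

let file = renderFoo(meta: [.author("Bob")], body: "Hi")
print(file.count)  // 7 + 15 + 11 = 33
```

Every one of those bigEndianBytes calls is the kind of bookkeeping that a Word-aware Data.append hides.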

If the difference from plain Swift isn't obvious to you: underneath everything, Data is simply a collection of UInt8s (or Bytes). And while a UInt32 (or Word) is just 4 UInt8s grouped together, the way to ungroup them in plain Swift isn't exactly straightforward. SwiftyBinaryFormatter makes it easy to just append a Word to a Data object, magically handling the ungrouping for you.
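To see what that saves you from, here are two plain-Swift ways to ungroup a Word into its four Bytes. Both produce big-endian output (most significant byte first, matching how the hex constants read); whether SwiftyBinaryFormatter uses the same byte order is worth confirming in its source rather than assuming.

```swift
let word: UInt32 = 0x464f4f00  // "FOO\0"

// Option 1: shift each byte down and truncate, most significant byte first.
let shifted: [UInt8] = [24, 16, 8, 0].map { UInt8(truncatingIfNeeded: word >> $0) }

// Option 2: copy the integer's raw memory. Convert to big-endian first,
// otherwise the bytes come out in the host's native order.
let copied = withUnsafeBytes(of: word.bigEndian) { Array($0) }

// Without .bigEndian, a little-endian host (all current Apple hardware)
// yields [0x00, 0x4f, 0x4f, 0x46] instead.
let nativeOrder = withUnsafeBytes(of: word) { Array($0) }

print(shifted == copied)  // true
```

The byte-order step is the part that's easy to get wrong: copying an integer's memory directly gives you the host's native order, which may not be what the receiving app expects.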

Caveats

One thing to keep in mind is that this implementation is designed only as an accumulator: you just keep adding information, never removing or modifying existing data. A more robust system might implement those additional details, but my original need was only to export a binary file for another app. Not to mention that SwiftyBinaryFormatter itself is designed for writing data, not reading it. I wouldn't be opposed to eventually adding that ability, either myself or through pull requests, but if you want to make a PR, please DM me first so we can get on the same page about implementation details; otherwise, I may not merge it.
It also bears repeating from the start of this post: it's entirely possible that I'm not following best practices or am missing crucial elements, since most of this post and library come from my own observational learning.

My Brain Hurts

That was a long post. I hope you enjoyed it, or at least learned something. If not, feel free to insult me from afar. You can find the project on GitHub.
