Proposal for a high-performance storage system for OSMIC branching histories

Introduction:

John Ohno
4 min readJan 7, 2019

--

OSMIC is a tree versioning system proposed & used as part of Project Xanadu®. This document proposes an alternative, compacted, high-speed storage system for OSMIC ledgers. For an introduction to OSMIC & the Enfilade, see my introduction to Xanadu technologies. Alternatively, you may look at my proof of concept implementations of OSMIC and the Enfilade.

Abstract:

The storage system consists of a binary ledger, an append-only permascroll, a reverse index of hashes of large chunks of text (organized first by length), and an index & reverse index for translating OSMIC numbers into offsets into the ledger.

Binary ledger:

The binary ledger consists of three fixed-length integers: a length, an insertion offset into the versioned blob (concatext), and a start offset into the permascroll.

The length is a signed 16-bit integer — negative for a delete operation; the other two are unsigned 64-bit integers.

Permascroll:

The permascroll contains all historically-inserted text concatenated together in insertion order.

Reverse hash index:

For long chunks of text (where length is more than some n times the length of the output of some fast hashing algorithm), we avoid double-inserting strings into the permascroll by checking them against a reverse hash index. This index consists of k buckets organized by size. Each bucket associates a hashed string to its position in the concatext. For long strings, the maximum prefix for each bucket size will be filed into those buckets.

The value of n must be calculated based on the cost of hashing & index lookup. I would start with n=3 as a potentially-reasonable value.

The value of k must be calculated based on the likelihood of collision for a given number of hashes. The hashing algorithm should be fast and produce relatively small output, in order to ensure performance, so collision is possible at smaller sizes. (Cryptographic hashes are not reasonable in this context.)

OSMIC number to ledger offset index:

This index should be an enfilade with 32-bit integer indices.

--

--

John Ohno

Resident hypertext crank. Author of Big and Small Computing: Trajectories for the Future of Software. http://www.lord-enki.net