Schema Evolution

Yardl provides support for evolving data models while maintaining compatibility across versions.

When deploying protocol readers and writers generated from the current version of your schema, you may need the ability to read and write older versions of your data. Conversely, you may want to update older versions of your software to read and write the current version.

Yardl supports cross-version compatibility by statically generating conversions to and from other versions of your schema. These conversions are produced at yardl codegen time, not at runtime.

In the future, yardl will support dynamic compatibility between schema versions, enabling backward- and forward-compatibility between any schema version at runtime.

Currently, schema evolution is only supported for the binary encoding format using C++. In the future, yardl will support schema evolution for other encodings and for Python.

Statically-Generated Compatibility

Yardl generates static conversion code for reading and writing data under different versions of your schema. This means, for example, that protocol readers and writers for schema version "X" can read and write data for schema version "Y" if the package file for version "X" references version "Y" at codegen time. To generate code compatible with multiple schema versions, you must reference each version in the versions section of your package file, for example:

yaml

# filename: _package.yml

namespace: MyProject

versions:
  Y: https://github.com/username/myproject/model/versionY

cpp:
  sourcesOutputDir: ../src/versionX/generated
...

Yardl does NOT support dynamic conversions between unknown schema versions. This means the code yardl generates for your model is only compatible with the versions explicitly referenced in your package file.

If you need existing protocol readers/writers to handle new data, you will need to re-generate and re-compile your code with the newer schema definition.

Example: Reading and Writing to other versions

Imagine you've already deployed the first version of your schema, v1, such that software on a system is using the protocol readers and writers generated by yardl for v1.

Now you've made changes to your data model and you're ready to release a new version of the schema.

First, point your schema package file to the version that has already been deployed, as shown above.

Next, finalize updates to your schema and generate the code with yardl generate.

Now you have protocol readers and writers that can communicate with the software that was deployed with the v1 schema!

To write a protocol such that it can be read by v1, instantiate a ProtocolWriter with the version v1, e.g.

cpp

::binary::MyProtocolWriter w(std::cout, my_project::Version::v1);

If you need the deployed software to be able to read and write the new schema version, you must re-generate the code (as demonstrated above), re-compile, and re-deploy it.

Automated Change Detection

For each referenced version of your schema, yardl automatically determines:

How the schema changed, and
Whether the changes are compatible.

Automated change detection is performed by:

Symbolically matching named types across each version of the schema
Comparing semantically equivalent "base" type definitions
Generating compatibility serializers for all type definitions that changed between versions
Comparing protocol sequences step by step to validate each change and generate appropriate type conversions

In the future, yardl will allow you to explicitly define how your schema is meant to evolve, enabling non-trivial type transformations.

Example: Renaming a Record

To rename a record (or any other type definition), introduce a new alias to match the name in the previous version.

Previous Version:

yaml

MyProtocol: !protocol
  sequence:
    people: !stream
      items: MyRecord

MyRecord: !record
  fields:
    firstName: string
    lastName: string

Current Version:

yaml

MyProtocol: !protocol
  sequence:
    people: !stream
      items: Person

Person: !record
  fields:
    firstName: string
    lastName: string
    age: int

MyRecord: Person

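In this example, the record was previously named MyRecord, which is not very descriptive. In the current version, we renamed it to Person and added the new alias MyRecord = Person, which tells yardl that the two records are semantically equivalent. The current version also adds an age field to the record; adding a non-optional field is a partially-compatible change (see Partially-Compatible Changes).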
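One way to picture how an alias lets an old type name match a renamed type is a simple name lookup. The sketch below is only an illustration, not yardl's implementation: `AliasTable` and `ResolveAlias` are hypothetical names, and the table is assumed to contain no alias cycles.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias table for the current schema: alias name -> target name.
using AliasTable = std::map<std::string, std::string>;

// Follows aliases until reaching a name that is not itself an alias.
// Assumes the table has no cycles.
std::string ResolveAlias(const AliasTable& aliases, std::string name) {
  for (auto it = aliases.find(name); it != aliases.end(); it = aliases.find(name)) {
    name = it->second;
  }
  return name;
}
```

With a table holding the single entry MyRecord -> Person, both the old name MyRecord and the new name Person resolve to Person, so the two definitions can be compared as the same type.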

Compatibility

Yardl recursively detects changes to named types, fields, and protocol steps, classifying each granular change into one of three categories:

Compatible
Partially-compatible
Incompatible

Compatible Changes

Compatible changes are fully supported by yardl. Examples include:

Adding streams, vectors, and/or optional steps to a Protocol sequence
Adding or removing optional fields
Adding or removing aliases to types
Reordering Record fields

Partially-Compatible Changes

Partially-compatible changes are valid, but may result in errors at runtime, depending on the data you serialize for older versions of your Protocol. Examples include:

Changing between primitive types, including integers, floating point values, and strings
Making a field optional
Changing an optional field to a union, and vice versa
Adding or removing non-optional fields to/from a Record
Adding or removing types to/from a Union

Yardl emits a warning for each of these types of changes.

Example: Adding a new type to a Protocol stream

Here, we've upgraded our schema to enable streaming many more data types.

Tip: Use an alias (e.g. StreamItem in the example below) to make it easy to add new types in the future.

Previous Version:

yaml

ImageFloat: Image<float>
StreamItem: ImageFloat

MyProtocol: !protocol
  sequence:
    data: !stream
      items: StreamItem

Current Version, with new types added to MyProtocol.data stream:

yaml

Acquisition: !record
  ...
WaveformUint32: Waveform<uint32>
ImageInt16: Image<int16>
ImageFloat: Image<float>
ImageComplexDouble: Image<complexdouble>

StreamItem: [Acquisition, WaveformUint32, ImageInt16, ImageFloat, ImageComplexDouble]

MyProtocol: !protocol
  sequence:
    data: !stream
      items: StreamItem

This change is valid because the old stream type, ImageFloat, is still one of the possible stream types for MyProtocol.data in the new version. Now, using the current version, we can read/write a variety of data types to/from the MyProtocol.data stream.

When writing to the previous schema, however, attempting to write any type other than ImageFloat causes a runtime error.
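The runtime behavior can be pictured with a minimal C++ sketch. This is not the code yardl generates: the structs `Acquisition`, `ImageInt16`, and `ImageFloat` are simplified stand-ins, the helper `ToPreviousStreamItem` is hypothetical, and the union is assumed to be modeled as a `std::variant` with fewer cases than the schema above.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical stand-ins for generated model types (not yardl output).
struct Acquisition { int id = 0; };
struct ImageInt16 { std::vector<std::int16_t> pixels; };
struct ImageFloat { std::vector<float> pixels; };

// Current schema: StreamItem is a union of several types.
using StreamItem = std::variant<Acquisition, ImageInt16, ImageFloat>;

// Previous schema: StreamItem was an alias for ImageFloat only.
using PreviousStreamItem = ImageFloat;

// Converts a current-version item for a previous-version writer.
// Only the ImageFloat case exists in the previous schema; any other
// case cannot be represented and is reported as a runtime error.
PreviousStreamItem ToPreviousStreamItem(const StreamItem& item) {
  if (const auto* image = std::get_if<ImageFloat>(&item)) {
    return *image;
  }
  throw std::runtime_error("union case is not representable in the previous schema");
}
```

Passing an `ImageFloat` returns it unchanged, while passing an `Acquisition` or `ImageInt16` throws, which mirrors the error a writer targeting the previous version would report.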

Default conversion

Some partially-compatible changes use default type conversions, for example:

Converting between numbers and strings in C++ relies on the standard library's numeric parsing utilities
Converting floating point numbers to integers may round to the nearest whole number
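As a rough illustration of what such default conversions can look like, the sketch below uses only C++ standard library calls. The helper names are hypothetical and the exact functions yardl's generated code calls are not specified here; in particular, rounding with `std::lround` is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical helpers illustrating default conversions (not yardl output).

// String -> integer via the standard library parser.
// std::stoi throws std::invalid_argument or std::out_of_range on bad input.
int StringToInt(const std::string& text) { return std::stoi(text); }

// Integer -> string via the standard library formatter.
std::string IntToString(int value) { return std::to_string(value); }

// Floating point -> integer, rounding to the nearest whole number.
// std::lround rounds halfway cases away from zero.
long DoubleToLong(double value) { return std::lround(value); }
```

Note that a string that does not hold a number makes `std::stoi` throw, which is one way a partially-compatible change can surface as a runtime error.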

In the future, yardl will parse user-defined type conversions for each schema version.

Default zero values

In instances where a partially-compatible change may result in invalid values at runtime, yardl defaults to the "zero" value for a type, e.g. 0 for numbers, "" for string, empty vectors, null Optional/Union, etc.

Example: Making it Optional

Here we make a previously "required" description field optional, signaling that MyProtocol doesn't always have a "description".

Previous Version:

yaml

MyProtocol: !protocol
  sequence:
    description: string

Current Version:

yaml

MyProtocol: !protocol
  sequence:
    description: string?

Now, when description has no value, it is simply null, because its type is an optional string. To maintain compatibility with software using the older version of the schema, however, yardl writes the empty string "" in that case.
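The fallback to a zero value can be sketched in C++ as follows. This is an illustration under assumptions, not generated yardl code: the helper `ToPreviousDescription` is hypothetical, and the optional step is assumed to be modeled as `std::optional<std::string>`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper (not yardl output).
// Current schema: description is `string?`; previous schema: `string`.
// When writing for the previous version, a missing value falls back to
// the "zero" value of string, which is the empty string.
std::string ToPreviousDescription(const std::optional<std::string>& description) {
  return description.value_or("");
}
```

A present value passes through unchanged, so only the null case is replaced by "".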

Runtime errors

In instances where a partially-compatible change results in an invalid conversion at runtime, yardl may emit a runtime error, for example:

Numeric overflow when converting between numbers
Failure to convert a number to a string
Incompatible union case detected (e.g. a union case that was removed in the current version of the schema)
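The numeric overflow case, for instance, can be sketched as a checked narrowing conversion. The helper `NarrowToInt32` is hypothetical, and the exception type and message are assumptions rather than what yardl's generated code emits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not yardl output): narrows a 64-bit integer to
// 32 bits, reporting overflow instead of silently truncating.
std::int32_t NarrowToInt32(std::int64_t value) {
  if (value < std::numeric_limits<std::int32_t>::min() ||
      value > std::numeric_limits<std::int32_t>::max()) {
    throw std::runtime_error("numeric overflow converting int64 to int32");
  }
  return static_cast<std::int32_t>(value);
}
```

Values that fit in 32 bits convert unchanged; anything outside that range throws rather than producing a wrapped value.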

Incompatible Changes

Changes for which yardl (currently) cannot generate valid code include:

Removing or reordering any steps in a Protocol sequence
Changing enum/flag definitions
Changing a scalar type to a vector or array
Changing the number of generic type parameters on a type definition
Changing the type arguments to a generic type

When yardl detects any of these changes, it emits one or more errors and stops.

In the future, yardl may support compatibility for a subset of the above changes.
