Schema Evolution

Yardl provides support for evolving data models while maintaining compatibility across versions.

When deploying protocol readers and writers generated from the current version of your schema, you may need the ability to read and write older versions of your data. Conversely, you may want to update older versions of your software to read and write the current version.

Yardl supports cross-version compatibility by generating conversions to and from other versions of your schema statically, i.e. at yardl codegen time.

In the future, yardl will support dynamic compatibility between schema versions, enabling backward- and forward-compatibility between any schema version at runtime.

Currently, schema evolution is only supported for the binary encoding format using C++. In the future, yardl will support schema evolution for other encodings and for Python.

Statically-Generated Compatibility

Yardl generates static conversion code for reading and writing data under different versions of your schema. For example, protocol readers and writers for schema version "X" can read and write data for schema version "Y" if the package file for version "X" references version "Y" at codegen time. To generate code compatible with multiple schema versions, reference each version in the versions section of your package file, e.g.:

yaml
# filename: _package.yml

namespace: MyProject

versions:
  Y: https://github.com/username/myproject/model/versionY

cpp:
  sourcesOutputDir: ../src/versionX/generated
...

Yardl does NOT support dynamic conversions between unknown schema versions. This means the code yardl generates for your model is only compatible with the versions explicitly referenced in your package file.

If you need existing protocol readers/writers to handle new data, you will need to re-generate and re-compile your code with the newer schema definition.

Example: Reading and Writing to other versions

Imagine you've already deployed the first version of your schema, v1, such that software on a system is using the protocol readers and writers generated by yardl for v1.

Now you've made changes to your data model and you're ready to release a new version of the schema.

First, point your schema package file to the version that has already been deployed, as shown above.

Next, finalize updates to your schema and generate the code with yardl generate.

Now you have protocol readers and writers that can communicate with the software that was deployed with the v1 schema!

To write data that software using v1 can read, instantiate a ProtocolWriter with the version v1, e.g.:

cpp
::binary::MyProtocolWriter w(std::cout, my_project::Version::v1);

If you need the deployed software to be able to read and write the new schema version, you must re-generate the code (as demonstrated above), re-compile, and re-deploy it.
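
To make the role of the version argument concrete, here is a minimal, self-contained sketch of a version-aware writer. It is not yardl's generated code: the Person struct, the PersonWriter class, the Version::Current enumerator, and the comma-separated text layout are all assumptions made for illustration (yardl's real output is binary). It only shows the idea that one writer can emit the older layout when asked to.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a generated version enum.
enum class Version { v1, Current };

// Hypothetical record: `age` is assumed to have been added after v1.
struct Person {
  std::string firstName;
  std::string lastName;
  int32_t age;
};

// Toy writer that picks the output layout from the requested schema version.
class PersonWriter {
 public:
  explicit PersonWriter(std::ostream& out, Version version = Version::Current)
      : out_(out), version_(version) {}

  void Write(const Person& person) {
    out_ << person.firstName << ',' << person.lastName;
    if (version_ == Version::Current) {
      // Only the current layout carries the new field; v1 readers never see it.
      out_ << ',' << person.age;
    }
    out_ << '\n';
  }

 private:
  std::ostream& out_;
  Version version_;
};

// Convenience helper that captures the written bytes as a string.
std::string WriteToString(const Person& person, Version version) {
  std::ostringstream stream;
  PersonWriter writer(stream, version);
  writer.Write(person);
  return stream.str();
}
```

Calling WriteToString with Version::v1 drops the age value, while Version::Current includes it. The conversion is fixed when the code is compiled, which mirrors why deployed software must be re-generated and re-compiled to learn about a newer version.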

Automated Change Detection

For each referenced version of your schema, yardl automatically determines:

  1. How the schema changed, and
  2. Whether the changes are compatible.

Automated change detection is performed by:

  1. Symbolically matching named types across each version of the schema
  2. Comparing semantically equivalent "base" type definitions
  3. Generating compatibility serializers for all type definitions that changed between versions
  4. Comparing protocol sequences step by step to validate each change and generate appropriate type conversions
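
The first three steps can be pictured with a small sketch. This is not yardl's implementation; the Schema and Fields types are simplified assumptions that model a schema as a map from type name to field definitions, and ChangedTypes is a hypothetical helper. It matches types by name across two versions and lists those whose definitions differ, which are the ones that would need compatibility serializers.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy schema model (an assumption): type name -> (field name -> field type).
using Fields = std::map<std::string, std::string>;
using Schema = std::map<std::string, Fields>;

// Returns the names of types that exist in both versions but whose
// definitions differ between them.
std::vector<std::string> ChangedTypes(const Schema& previous,
                                      const Schema& current) {
  std::vector<std::string> changed;
  for (const auto& entry : previous) {
    // Step 1: match the named type in the other version.
    auto match = current.find(entry.first);
    if (match == current.end()) {
      continue;  // No type with this name; alias handling is omitted here.
    }
    // Steps 2 and 3: compare definitions and record the ones that changed.
    if (match->second != entry.second) {
      changed.push_back(entry.first);
    }
  }
  return changed;
}
```

Because std::map iterates in key order, the result is sorted by type name. Step 4, walking the protocol sequence step by step, is left out of this sketch.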

In the future, yardl will allow you to explicitly define how your schema is meant to evolve, enabling non-trivial type transformations.

Example: Renaming a Record

To rename a record (or any other type definition), introduce a new alias to match the name in the previous version.

Previous Version:

yaml
MyProtocol: !protocol
  sequence:
    people: !stream
      items: MyRecord

MyRecord: !record
  fields:
    firstName: string
    lastName: string

Current Version:

yaml
MyProtocol: !protocol
  sequence:
    people: !stream
      items: Person

Person: !record
  fields:
    firstName: string
    lastName: string
    age: int

MyRecord: Person

In this example, the record was previously named MyRecord, which is not very descriptive. In the current version, we renamed it to Person and added the alias MyRecord: Person, which tells yardl that the two records are semantically equivalent. The current version also adds a non-optional age field, which is a partially-compatible change (see Partially-Compatible Changes).
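
One way to picture why the alias makes the rename detectable: if every name is resolved through the alias table to its underlying definition, the old name MyRecord and the new name Person end up at the same type. The sketch below assumes a simple alias map and a hypothetical ResolveBaseType helper; it is not how yardl represents aliases internally.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical alias table for one schema version: alias name -> target name.
using Aliases = std::map<std::string, std::string>;

// Follows alias links until it reaches a name that is not itself an alias.
std::string ResolveBaseType(const Aliases& aliases, std::string name) {
  // A chain can have at most aliases.size() links, so more hops than that
  // means the table contains a cycle.
  for (std::size_t hops = 0; hops <= aliases.size(); ++hops) {
    auto link = aliases.find(name);
    if (link == aliases.end()) {
      return name;  // Not an alias: this is the base type.
    }
    name = link->second;
  }
  throw std::runtime_error("alias cycle detected at: " + name);
}
```

With the table {MyRecord -> Person}, both "MyRecord" and "Person" resolve to "Person", so a stream of MyRecord in the previous version can be matched against a stream of Person in the current one.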

Compatibility

Yardl recursively detects changes to named types, fields, and protocol steps, classifying each granular change into one of three categories:

  1. Compatible
  2. Partially-compatible
  3. Incompatible

Compatible Changes

Compatible changes are fully supported by yardl. Examples include:

  1. Adding streams, vectors, and/or optional steps to a Protocol sequence
  2. Adding or removing optional fields
  3. Adding or removing aliases to types
  4. Reordering Record fields

Partially-Compatible Changes

Partially-compatible changes are valid, but may result in errors at runtime, depending on the data you serialize for older versions of your Protocol. Examples include:

  1. Changing between primitive types, including integers, floating point values, and strings
  2. Making a field optional
  3. Changing an optional field to a union, and vice versa
  4. Adding or removing non-optional fields to/from a Record
  5. Adding or removing types to/from a Union

Yardl emits a warning for each of these types of changes.

Example: Adding a new type to a Protocol stream

Here, we've upgraded our schema to enable streaming many more data types.

Tip: Use an alias (e.g. StreamItem in the example below) so that new types are easy to add in the future.

Previous Version:

yaml
ImageFloat: Image<float>
StreamItem: ImageFloat

MyProtocol: !protocol
  sequence:
    data: !stream
      items: StreamItem

Current Version, with new types added to MyProtocol.data stream:

yaml
Acquisition: !record
  ...
WaveformUint32: Waveform<uint32>
ImageInt16: Image<int16>
ImageFloat: Image<float>
ImageComplexDouble: Image<complexdouble>

StreamItem: [Acquisition, WaveformUint32, ImageInt16, ImageFloat, ImageComplexDouble]

MyProtocol: !protocol
  sequence:
    data: !stream
      items: StreamItem

This change is valid because the old stream type, ImageFloat, is still one of the possible stream types for MyProtocol.data in the new version. Now, using the current version, we can read/write a variety of data types to/from the MyProtocol.data stream.

When writing to the previous schema, however, attempting to write any type other than ImageFloat causes a runtime error.
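
The failure mode when writing to the previous schema can be sketched with std::variant (C++17). The types below are simplified stand-ins rather than yardl's generated types, and ToPreviousStreamItem is a hypothetical helper: it narrows a current-version stream item to the single type the previous version understands, and throws for every other case.

```cpp
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Simplified stand-ins for the schema types (assumptions for illustration).
struct Acquisition { int id; };
using ImageInt16 = std::vector<int16_t>;
using ImageFloat = std::vector<float>;

// Current version: the stream item is a union of several types.
using StreamItem = std::variant<Acquisition, ImageInt16, ImageFloat>;

// Previous version: the stream item was always an ImageFloat.
ImageFloat ToPreviousStreamItem(const StreamItem& item) {
  if (const ImageFloat* image = std::get_if<ImageFloat>(&item)) {
    return *image;  // The only case the previous schema can represent.
  }
  throw std::runtime_error(
      "stream item has no representation in the previous schema version");
}
```

Code that must stay writable for the previous version therefore needs to restrict itself to the union cases that version already knew about.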

Default conversion

Some partially-compatible changes use default type conversions, for example:

  • Converting between numbers and strings in C++ relies on the standard library numeric parsing utilities
  • Converting floating point numbers to integers may round to the nearest whole number.
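
As a rough illustration of what such default conversions look like in C++, the sketch below uses std::to_string, std::stoll, and std::llround. Which standard library functions the generated code actually calls is an assumption here, and the helper names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <stdexcept>
#include <string>

// Number -> string via the standard library.
std::string IntToString(int64_t value) {
  return std::to_string(value);
}

// String -> number. std::stoll throws std::invalid_argument when no digits
// can be parsed and std::out_of_range when the value does not fit.
int64_t StringToInt(const std::string& text) {
  return std::stoll(text);
}

// Floating point -> integer, rounding to the nearest whole number
// (std::llround rounds halfway cases away from zero).
int64_t DoubleToInt(double value) {
  return std::llround(value);
}
```

Note that std::stoll stops at the first non-numeric character, so "12abc" parses as 12 rather than failing; whether that is acceptable depends on the data being converted.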

In the future, yardl will parse user-defined type conversions for each schema version.

Default zero values

Where a partially-compatible change may result in invalid values at runtime, yardl defaults to the "zero" value for the type, e.g. 0 for numbers, "" for strings, an empty vector, or null for an Optional/Union.

Example: Making it Optional

Here we make a previously "required" description field optional, signaling that MyProtocol doesn't always have a "description".

Previous Version:

yaml
MyProtocol: !protocol
  sequence:
    description: string

Current Version:

yaml
MyProtocol: !protocol
  sequence:
    description: string?

Now, when description is absent, its value is null, because its type is an optional string (string?). To maintain compatibility with software using the older version of the schema, however, yardl writes the empty string "" when description is null.
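
In C++ terms, this fallback resembles std::optional::value_or with a value-initialized default. The sketch below assumes the optional step maps to std::optional<std::string>; the helper names are hypothetical and the generated code may be structured differently.

```cpp
#include <optional>
#include <string>

// Converts the current `string?` value to the previous required `string`,
// substituting the zero value "" when no description is present.
std::string DescriptionForPreviousVersion(
    const std::optional<std::string>& description) {
  return description.value_or(std::string{});
}

// Generic form of the same idea: T{} is the "zero" value of T
// (0 for numbers, "" for strings, an empty vector, and so on).
template <typename T>
T ValueOrZero(const std::optional<T>& value) {
  return value.value_or(T{});
}
```

One consequence worth keeping in mind is that a reader on the older version cannot tell a missing description apart from one that was explicitly set to the empty string.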

Runtime errors

In instances where a partially-compatible change results in an invalid conversion at runtime, yardl may emit a runtime error, for example:

  • Numeric overflow when converting between numbers
  • Failure to convert a number to a string
  • Incompatible union case detected (e.g. a union case that was removed in the current version of the schema)
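
The numeric overflow case can be sketched as a range-checked narrowing conversion. NarrowToInt32 is a hypothetical helper and the exception type is an assumption; the point is only that an out-of-range value is reported as an error instead of silently wrapping.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Narrows a 64-bit integer to 32 bits, throwing if the value does not fit.
int32_t NarrowToInt32(int64_t value) {
  if (value < std::numeric_limits<int32_t>::min() ||
      value > std::numeric_limits<int32_t>::max()) {
    throw std::overflow_error("value does not fit in a 32-bit integer");
  }
  return static_cast<int32_t>(value);
}
```

A schema change from a wider to a narrower integer type is therefore safe only as long as the values actually written stay within the narrower range.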

Incompatible Changes

Changes for which yardl (currently) cannot generate valid code include:

  1. Removing or reordering any steps in a Protocol sequence
  2. Changing enum/flag definitions
  3. Changing a scalar type to a vector or array
  4. Changing the number of generic type parameters on a type definition
  5. Changing the type arguments to a generic type

When yardl detects any of these changes, it emits one or more errors and stops.
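
Taken together, the three categories map onto three outcomes: compatible changes pass silently, partially-compatible changes produce warnings, and incompatible changes produce errors that stop code generation. The sketch below models that mapping with hypothetical types (Compatibility, Change, Diagnostics); it is not yardl's internal representation.

```cpp
#include <string>
#include <vector>

// The three categories described in this section.
enum class Compatibility { Compatible, PartiallyCompatible, Incompatible };

// A single detected change and its classification (hypothetical type).
struct Change {
  std::string description;
  Compatibility kind;
};

// Collected diagnostics; code generation proceeds only without errors.
struct Diagnostics {
  std::vector<std::string> warnings;
  std::vector<std::string> errors;
  bool CanGenerate() const { return errors.empty(); }
};

// Sorts each change into warnings or errors according to its category.
Diagnostics Summarize(const std::vector<Change>& changes) {
  Diagnostics result;
  for (const Change& change : changes) {
    switch (change.kind) {
      case Compatibility::Compatible:
        break;  // Fully supported: nothing to report.
      case Compatibility::PartiallyCompatible:
        result.warnings.push_back(change.description);
        break;
      case Compatibility::Incompatible:
        result.errors.push_back(change.description);
        break;
    }
  }
  return result;
}
```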

In the future, yardl may support compatibility for a subset of the above changes.