Schema Evolution
Yardl provides support for evolving data models while maintaining compatibility across versions.
When deploying protocol readers and writers generated from the current version of your schema, you may need the ability to read and write older versions of your data. Conversely, you may want to update older versions of your software to read and write the current version.
Yardl supports cross-version compatibility by statically generating conversions to and from other versions of your schema (i.e. at yardl codegen time).
In the future, yardl will support dynamic compatibility between schema versions, enabling backward- and forward-compatibility between any schema version at runtime.
Currently, schema evolution is only supported for the binary encoding format using C++. In the future, yardl will support schema evolution for other encodings and for Python.
Statically-Generated Compatibility
Yardl generates static conversion code for reading and writing data under different versions of your schema. This means, for example, protocol readers and writers for schema version "X" can read and write data for schema version "Y" if the package file for version "X" references version "Y" at codegen time. To generate code compatible with multiple schema versions, you must reference each version in the versions section of your package file, e.g.
# filename: _package.yml
namespace: MyProject
versions:
  Y: https://github.com/username/myproject/model/versionY
cpp:
  sourcesOutputDir: ../src/versionX/generated
...
Yardl does NOT support dynamic conversions between unknown schema versions. This means the code yardl generates for your model is only compatible with the versions explicitly referenced in your package file.
If you need existing protocol readers/writers to handle new data, you will need to re-generate and re-compile your code with the newer schema definition.
Example: Reading and Writing to other versions
Imagine you've already deployed the first version of your schema, v1, such that software on a system is using the protocol readers and writers generated by yardl for v1.
Now you've made changes to your data model and you're ready to release a new version of the schema.
First, point your schema package file to the version that has already been deployed, as shown above.
Next, finalize updates to your schema and generate the code with yardl generate.
Now you have protocol readers and writers that can communicate with the software that was deployed with the v1 schema!
To write a protocol such that it can be read by v1, instantiate a ProtocolWriter with the version v1, e.g.
::binary::MyProtocolWriter w(std::cout, my_project::Version::v1);
If you need the deployed software to be able to read and write the new schema version, you must re-generate the code (as demonstrated above), re-compile, and re-deploy it.
Automated Change Detection
For each referenced version of your schema, yardl automatically determines:
- How the schema changed, and
- Whether the changes are compatible.
Automated change detection is performed by:
- Symbolically matching named types across each version of the schema
- Comparing semantically equivalent "base" type definitions
- Generating compatibility serializers for all type definitions that changed between versions
- Comparing protocol sequences step by step to validate each change and generate appropriate type conversions
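The first and third steps above can be illustrated with a small sketch that matches types by name across two versions and reports which ones were added, removed, or changed. It assumes a deliberately simplified model in which a schema is a map from type name to a textual definition; the Schema, SchemaDiff, and DiffSchemas names are hypothetical and do not reflect yardl's internals.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model: type name -> textual definition.
using Schema = std::map<std::string, std::string>;

struct SchemaDiff {
  std::vector<std::string> added;
  std::vector<std::string> removed;
  std::vector<std::string> changed;
};

// Matches types by name and reports which ones differ between versions.
inline SchemaDiff DiffSchemas(const Schema& previous, const Schema& current) {
  SchemaDiff diff;
  for (const auto& entry : previous) {
    auto it = current.find(entry.first);
    if (it == current.end()) {
      diff.removed.push_back(entry.first);
    } else if (it->second != entry.second) {
      // A changed definition is where a conversion would be needed.
      diff.changed.push_back(entry.first);
    }
  }
  for (const auto& entry : current) {
    if (previous.find(entry.first) == previous.end()) {
      diff.added.push_back(entry.first);
    }
  }
  return diff;
}
```

A real implementation would compare structured definitions rather than strings and would classify each change by compatibility, but the name-based matching is the same idea.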
In the future, yardl will allow you to explicitly define how your schema is meant to evolve, enabling non-trivial type transformations.
Example: Renaming a Record
To rename a record (or any other type definition), introduce a new alias to match the name in the previous version.
Previous Version:
MyProtocol: !protocol
  sequence:
    people: !stream
      items: MyRecord

MyRecord: !record
  fields:
    firstName: string
    lastName: string
Current Version:
MyProtocol: !protocol
  sequence:
    people: !stream
      items: Person

Person: !record
  fields:
    firstName: string
    lastName: string
    age: int

MyRecord: Person
In this example, the record was previously named MyRecord, which is not very descriptive. In the current version, we renamed it to Person and added the alias MyRecord: Person, which tells yardl that the two records are semantically equivalent.
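One way to picture how an alias lets an old name match a new type is a lookup that follows aliases until it reaches a name that is not itself an alias. The sketch below assumes a hypothetical AliasTable and ResolveAlias helper; it illustrates the matching idea only and is not how yardl is implemented.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical alias table: alias name -> target type name.
using AliasTable = std::map<std::string, std::string>;

// Follows aliases until reaching a name that is not itself an alias.
inline std::string ResolveAlias(const AliasTable& aliases, std::string name) {
  // Bound the number of hops so a cyclic table cannot loop forever.
  for (std::size_t hops = 0; hops <= aliases.size(); ++hops) {
    auto it = aliases.find(name);
    if (it == aliases.end()) {
      return name;
    }
    name = it->second;
  }
  return name;
}
```

With a table containing the single entry MyRecord to Person, both the old name and the new name resolve to Person, so the two versions refer to the same underlying record.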
Compatibility
Yardl recursively detects changes to named types, fields, and protocol steps, classifying each granular change into one of three categories:
- Compatible
- Partially-compatible
- Incompatible
Compatible Changes
Compatible changes are fully supported by yardl. Examples include:
- Adding streams, vectors, and/or optional steps to a Protocol sequence
- Adding or removing optional fields
- Adding or removing aliases to types
- Reordering Record fields
Partially-Compatible Changes
Partially-compatible changes are valid, but may result in errors at runtime, depending on the data you serialize for older versions of your Protocol. Examples include:
- Changing between primitive types, including integers, floating point values, and strings
- Making a field optional
- Changing an optional field to a union, and vice versa
- Adding or removing non-optional fields to/from a Record
- Adding or removing types to/from a Union
Yardl emits a warning for each of these types of changes.
Example: Adding a new type to a Protocol stream
Here, we've upgraded our schema to enable streaming many more data types.
Tip: Use an alias (e.g. StreamItem in the example below) to easily add new types in the future.
Previous Version:
ImageFloat: Image<float>

StreamItem: ImageFloat

MyProtocol: !protocol
  sequence:
    data: !stream
      items: StreamItem
Current Version, with new types added to the MyProtocol.data stream:
Acquisition: !record
  ...

WaveformUint32: Waveform<uint32>
ImageInt16: Image<int16>
ImageFloat: Image<float>
ImageComplexDouble: Image<complexdouble>

StreamItem: [Acquisition, WaveformUint32, ImageInt16, ImageFloat, ImageComplexDouble]

MyProtocol: !protocol
  sequence:
    data: !stream
      items: StreamItem
This change is valid because the old stream type, ImageFloat, is still one of the possible stream types for MyProtocol.data in the new version. Now, using the current version, we can read/write a variety of data types to/from the MyProtocol.data stream.
When writing to the previous schema, however, a runtime error will occur if attempting to write a type other than ImageFloat.
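The runtime behavior described here can be sketched with std::variant: a current-version item converts to the previous version only when it holds the one case the previous schema knew about. The type aliases and the ToPreviousStreamItem helper are hypothetical stand-ins (images are reduced to plain vectors and only two union cases are shown); the generated code and the exact exception type may differ.

```cpp
#include <cstdint>
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the schema's image types.
using ImageFloat = std::vector<float>;
using ImageInt16 = std::vector<std::int16_t>;

// Current-version stream item: a union with more than one case.
using StreamItem = std::variant<ImageInt16, ImageFloat>;

// Converts a current-version item to the previous version's only type.
inline ImageFloat ToPreviousStreamItem(const StreamItem& item) {
  if (const auto* image = std::get_if<ImageFloat>(&item)) {
    return *image;
  }
  // Any other case has no representation in the previous schema.
  throw std::runtime_error(
      "stream item has no equivalent in the previous schema");
}
```

This is why the change is only partially compatible: whether the write succeeds depends on the data, not on the schema alone.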
Default conversion
Some partially-compatible changes use default type conversions, for example:
- Converting between numbers and strings in C++ relies on the standard library numeric parsing utilities
- Converting floating point numbers to integers may round to the nearest whole number.
In the future, yardl will parse user-defined type conversions for each schema version.
Default zero values
In instances where a partially-compatible change may result in invalid values at runtime, yardl defaults to the "zero" value for a type, e.g. 0 for numbers, "" for strings, empty vectors, null Optional/Union, etc.
Example: Making it Optional
Here we make a previously "required" description field optional, signaling that MyProtocol doesn't always have a "description".
Previous Version:
MyProtocol: !protocol
  sequence:
    description: string
Current Version:
MyProtocol: !protocol
  sequence:
    description: string?
Now, when description is empty, its value is just null, because its type is an optional string. To maintain compatibility with software using the older version of the schema, however, when description is empty, yardl will write the empty string "".
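In C++ terms, the fallback described above amounts to mapping an empty optional to the string's zero value. The following is a minimal sketch; ToPreviousDescription is a hypothetical helper name, not part of the generated API.

```cpp
#include <optional>
#include <string>

// Maps the current optional description to the previous required string,
// substituting the "zero" value when no description is present.
inline std::string ToPreviousDescription(
    const std::optional<std::string>& description) {
  return description.value_or("");
}
```

A reader of the previous version therefore cannot distinguish a missing description from one that was explicitly set to an empty string.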
Runtime errors
In instances where a partially-compatible change results in an invalid conversion at runtime, yardl may emit a runtime error, for example:
- Numeric overflow when converting between numbers
- Failure to convert a number to a string
- Incompatible union case detected (e.g. a union case that was removed in the current version of the schema)
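The numeric overflow case can be sketched as a range-checked narrowing conversion that throws rather than silently wrapping. The CheckedNarrow helper and the choice of std::range_error are assumptions for illustration; the error type raised by the generated code may differ.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Narrows a 64-bit value to 32 bits, throwing instead of silently wrapping.
inline std::int32_t CheckedNarrow(std::int64_t value) {
  if (value < std::numeric_limits<std::int32_t>::min() ||
      value > std::numeric_limits<std::int32_t>::max()) {
    throw std::range_error("numeric overflow converting int64 to int32");
  }
  return static_cast<std::int32_t>(value);
}
```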
Incompatible Changes
Changes for which yardl (currently) cannot generate valid code include:
- Removing or reordering any steps in a Protocol sequence
- Changing enum/flag definitions
- Changing a scalar type to a vector or array
- Changing the number of generic type parameters on a type definition
- Changing the type arguments to a generic type
Detecting these types of changes will cause yardl to emit one or more errors and stop.
In the future, yardl may support compatibility for a subset of the above changes.