Introduction
To lower the barrier to entry and avoid inconsistent representations of the BICEPS data model in protoSDC, the entire XML Schema for BICEPS is automatically converted into protobuf as well as additional protoSDC target languages. This ensures a high degree of compatibility between protoSDC implementations.
How it works
proto-converter introduces an intermediate layer, which takes care of most of the conversion work needed to go from an inheritance-based model to a composition-based model. Language generators only need to traverse the resulting graph of nodes to generate the types and parameters; they no longer need to know about XML. The only thing that is still related to XML is setting builtin and custom types, as they are using QNames.
Intermediate layer
The intermediate layer is a graph of nodes called BaseNode. Each BaseNode has a name, children, a nodeType and can hold a language-specific information type to specify how a node is generated for a given target language.
NodeTypes are very basic types and all very much look like something from protobuf:
NodeType.Message represents what is essentially an object, or a message in proto-speak.
NodeType.Parameter is a field within a message, which can point to a message, an enum or a builtin type.
NodeType.StringEnumeration is an enum which can only represent strings, no values whatsoever.
NodeType.OneOf is a collection of parameters in which only one can be present, similar to proto.
NodeType.BuiltinType is a builtin type, as the name suggests. It is essentially the type holding everything built into XML Schema, such as string, decimal and friends.
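The node graph described above can be pictured with a minimal Kotlin sketch. The class layout, the `languageType` placeholder and the `walk` helper are assumptions made for illustration; the actual proto-converter classes may look different.

```kotlin
// Hypothetical sketch of the intermediate layer, not the real proto-converter classes.
enum class NodeType { Message, Parameter, StringEnumeration, OneOf, BuiltinType }

// Each node has a name, a node type, children and an optional language-specific type.
class BaseNode(
    val name: String,
    val nodeType: NodeType,
    val children: MutableList<BaseNode> = mutableListOf(),
    var languageType: Any? = null, // placeholder for whatever a generator attaches
)

// Depth-first walk that yields every node together with its depth in the tree.
fun BaseNode.walk(depth: Int = 0): Sequence<Pair<Int, BaseNode>> = sequence {
    yield(depth to this@walk)
    for (child in children) yieldAll(child.walk(depth + 1))
}

fun main() {
    val clock = BaseNode(
        "ClockDescriptor", NodeType.Message,
        mutableListOf(
            BaseNode("AbstractDeviceComponentDescriptor", NodeType.Parameter),
            BaseNode("TimeProtocol", NodeType.Parameter),
            BaseNode("Resolution", NodeType.Parameter),
        ),
    )
    // Print an indented outline of the graph.
    for ((depth, node) in clock.walk()) {
        println("  ".repeat(depth) + "${node.nodeType} ${node.name}")
    }
}
```

A generator for a target language would only need a traversal like `walk` plus the information stored on each node.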
Every element on the first level of an XML Schema is recursively converted into BaseNodes with NodeTypes, simplifying the structure and removing any inheritance.
Breaking up inheritance
Transforming inheritance into a composition-based model follows very simple rules:
if type A extends type B, B will become a field of the message A
if the XML model uses an element which has subtypes, replace it with a OneOf which allows for all subtypes of that element as well as the element itself to be used
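The two rules can be sketched in Kotlin as follows. `XsdType`, `XsdField`, `Message`, `Field` and `OneOf` are simplified, hypothetical stand-ins introduced only for this example and are not the proto-converter API; the `Report` type is likewise made up.

```kotlin
// Hypothetical, simplified view of schema types; not the proto-converter API.
data class XsdField(val name: String, val type: String)
data class XsdType(
    val name: String,
    val base: String? = null,
    val fields: List<XsdField> = emptyList(),
)

data class Field(val name: String, val type: String)
data class Message(val name: String, val fields: List<Field>)
data class OneOf(val name: String, val options: List<String>)

// All direct and indirect subtypes of `name`.
fun subtypesOf(name: String, types: List<XsdType>): List<String> =
    types.filter { it.base == name }
        .flatMap { listOf(it.name) + subtypesOf(it.name, types) }

fun toMessage(type: XsdType, types: List<XsdType>): Message {
    val fields = mutableListOf<Field>()
    // Rule 1: the extended base type becomes a field of the message.
    type.base?.let { fields += Field(it, it) }
    // Rule 2: a field whose type has subtypes points to a OneOf instead.
    for (f in type.fields) {
        val hasSubtypes = types.any { it.base == f.type }
        fields += Field(f.name, if (hasSubtypes) "${f.type}OneOf" else f.type)
    }
    return Message(type.name, fields)
}

// The OneOf allows the base type itself as well as all of its subtypes.
fun toOneOf(name: String, types: List<XsdType>): OneOf =
    OneOf("${name}OneOf", listOf(name) + subtypesOf(name, types))

fun main() {
    val types = listOf(
        XsdType("AbstractMetricState"),
        XsdType("NumericMetricState", base = "AbstractMetricState"),
        XsdType("Report", fields = listOf(XsdField("MetricState", "AbstractMetricState"))),
    )
    types.forEach { println(toMessage(it, types)) }
    println(toOneOf("AbstractMetricState", types))
}
```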
Mapping example
To understand how an XML Schema type is mapped into nodes, let’s take a look at an example.
<xsd:complexType name="ClockDescriptor">
  <xsd:annotation>
    <xsd:documentation>Bla bla.</xsd:documentation>
  </xsd:annotation>
  <xsd:complexContent>
    <xsd:extension base="pm:AbstractDeviceComponentDescriptor">
      <xsd:sequence>
        <xsd:element name="TimeProtocol" type="pm:CodedValue" minOccurs="0" maxOccurs="unbounded">
          <xsd:annotation>
            <xsd:documentation>Bla bla here.</xsd:documentation>
          </xsd:annotation>
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="Resolution" type="xsd:duration">
        <xsd:annotation>
          <xsd:documentation>So much bla bla.</xsd:documentation>
        </xsd:annotation>
      </xsd:attribute>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
ClockDescriptor is an element on the root level of the schema, therefore it will become a message. Below that, a complex content element essentially tells us that ClockDescriptor is an extension of pm:AbstractDeviceComponentDescriptor, but adds a TimeProtocol field and a Resolution attribute.
This then turns into the following tree.
The inheritance is resolved by applying composition and including the extended base type as well as all the extension type parameters as children. Where base types such as AbstractState are used as parameter types within the graph, they will be replaced by an AbstractStateOneOf which can be any of the extension types of the base type, or of course the base type itself. For example, AbstractMetricReport contains a list of AbstractMetricStateOneOf elements.
message AbstractMetricReportMsg {
  ...
  repeated AbstractMetricStateOneOfMsg metric_state = 3;
  ...
}
Once this is applied to every XML Schema element, we end up with a list of messages, sorted in an order that allows the language generator to simply traverse the graph in order and always have every previous type resolved. Cycles in the graph are the notable exception: they can occur and must be handled differently. Types included in such cycles are marked as being part of a cluster; the consequences of such clusters are language specific. In protobuf, this simply means that all nodes which are part of the cluster must be generated into the same .proto file.
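One way to obtain such an order together with the clusters is Tarjan's strongly connected components algorithm. Whether proto-converter uses this particular algorithm is not stated here, so the following is only a sketch of the idea; `generationOrder` and the dependency map are hypothetical names.

```kotlin
// Tarjan's strongly connected components. `deps` maps a type to the types it uses.
// Components are emitted dependencies-first, so the result is a valid generation order.
// A component with more than one member corresponds to a cluster
// (a type that only references itself would need a separate self-loop check).
fun generationOrder(deps: Map<String, List<String>>): List<Set<String>> {
    val index = HashMap<String, Int>()
    val low = HashMap<String, Int>()
    val stack = ArrayDeque<String>()
    val onStack = HashSet<String>()
    val result = mutableListOf<Set<String>>()
    var counter = 0

    fun visit(v: String) {
        index[v] = counter
        low[v] = counter
        counter++
        stack.addLast(v)
        onStack += v
        for (w in deps[v].orEmpty()) {
            if (w !in index) {
                visit(w)
                low[v] = minOf(low.getValue(v), low.getValue(w))
            } else if (w in onStack) {
                low[v] = minOf(low.getValue(v), index.getValue(w))
            }
        }
        // v is the root of a component: pop the whole component off the stack.
        if (low.getValue(v) == index.getValue(v)) {
            val component = mutableSetOf<String>()
            do {
                val w = stack.removeLast()
                onStack -= w
                component += w
            } while (w != v)
            result += component
        }
    }

    for (v in deps.keys) if (v !in index) visit(v)
    return result
}

fun main() {
    val deps = mapOf(
        "A" to listOf("B"),
        "B" to listOf("C"),
        "C" to listOf("B"), // B and C reference each other and form a cluster
    )
    for (component in generationOrder(deps)) {
        val kind = if (component.size > 1) "cluster" else "single"
        println("$kind: $component")
    }
}
```

The recursive formulation is fine for a sketch; a very deep type graph would call for an iterative variant.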
The proto generator ultimately traverses the resulting graph and attaches its language types to each node. Every BaseNode then has a languageType attached in the form of a ProtoType. These are essentially the same as BaseTypes, but they have rules on how to generate protocol buffers schema data attached to them. Finally, once every node has a ProtoType attached, the graph is traversed one last time, writing the output for each child of the root of the graph into a file, thus resulting in a protobuf conversion of the XML Schema.
syntax = "proto3";

package org.somda.protosdc.proto.model.biceps;

option java_multiple_files = true;
option java_outer_classname = "ClockDescriptorProto";

import "org/somda/protosdc/proto/model/biceps/abstractdevicecomponentdescriptor.proto";
import "org/somda/protosdc/proto/model/biceps/codedvalue.proto";
import "google/protobuf/duration.proto";

message ClockDescriptorMsg {
  AbstractDeviceComponentDescriptorMsg abstract_device_component_descriptor = 1;
  repeated CodedValueMsg time_protocol = 2;
  google.protobuf.Duration resolution_attr = 3;
}
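The final writing step can be imagined roughly like the Kotlin sketch below, which renders a single message body. `ProtoField`, `ProtoMessage` and `render` are hypothetical stand-ins and not the real ProtoType classes; assigning field numbers in declaration order is an assumption that happens to match the output shown above.

```kotlin
// Hypothetical stand-ins for nodes with proto information attached.
data class ProtoField(val type: String, val name: String, val repeated: Boolean = false)
data class ProtoMessage(val name: String, val fields: List<ProtoField>)

// Renders one message; field numbers are assigned in declaration order, starting at 1.
fun render(message: ProtoMessage): String = buildString {
    appendLine("message ${message.name} {")
    message.fields.forEachIndexed { i, f ->
        val prefix = if (f.repeated) "repeated " else ""
        appendLine("  $prefix${f.type} ${f.name} = ${i + 1};")
    }
    append("}")
}

fun main() {
    val clock = ProtoMessage(
        "ClockDescriptorMsg",
        listOf(
            ProtoField("AbstractDeviceComponentDescriptorMsg", "abstract_device_component_descriptor"),
            ProtoField("CodedValueMsg", "time_protocol", repeated = true),
            ProtoField("google.protobuf.Duration", "resolution_attr"),
        ),
    )
    println(render(clock))
}
```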
Generating Kotlin/Rust/*
proto-converter provides generators for programming languages as well. The basic principle remains the same as for protobuf, but the output is changed to reflect the needs of the specific target. This includes, for example:
different nesting behavior
introducing smart pointers to break cycles in the data model
language specific builtin types
The ClockDescriptor shown in the protobuf example would look like this in Kotlin:
package org.somda.protosdc.model.biceps

import org.somda.protosdc.model.biceps.AbstractDeviceComponentDescriptor
import org.somda.protosdc.model.biceps.CodedValue
import java.time.Duration

data class ClockDescriptor(
    val abstractDeviceComponentDescriptor: AbstractDeviceComponentDescriptor,
    val timeProtocol: List<CodedValue> = listOf(),
    val resolutionAttr: Duration? = null,
)
Generating mappers
proto-converter can additionally generate mappers for mapping between language-specific representations of data and their protobuf representation. This makes it easier to cleanly separate transport data types from language-specific internal representations.
Since these mappers are automatically generated, they always match the proto and language output currently produced by the generator. Every supported language stores the information needed to generate the output on the nodes in the graph, which allows a mapper generator to determine the full layout of the target language. The task of mapping to and from the protobuf representation is very language specific, as it is necessary to know how protobuf schema files will be represented when compiled for that language. Field names might change from snake_case to PascalCase, nested messages might be in modules named after their parent message, or primitive types might not have an exact representation.
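To make the idea of a mapper concrete, here is a minimal Kotlin sketch of a round trip for a duration field. `DurationMsg`, `ClockInfoMsg` and `ClockInfo` are hand-written, hypothetical stand-ins; real mappers would target the classes generated by protoc and by proto-converter.

```kotlin
import java.time.Duration

// Stand-ins for classes that protoc would generate; hypothetical, hand-written here.
data class DurationMsg(val seconds: Long, val nanos: Int)
data class ClockInfoMsg(val resolutionAttr: DurationMsg?)

// Language-specific model type (simplified).
data class ClockInfo(val resolutionAttr: Duration?)

// Model -> transport: split the duration into seconds and nanoseconds.
fun toProto(src: ClockInfo) = ClockInfoMsg(
    src.resolutionAttr?.let { DurationMsg(it.seconds, it.nano) }
)

// Transport -> model: rebuild the duration from its two parts.
fun fromProto(src: ClockInfoMsg) = ClockInfo(
    src.resolutionAttr?.let { Duration.ofSeconds(it.seconds, it.nanos.toLong()) }
)

fun main() {
    val original = ClockInfo(Duration.ofMillis(1500))
    val roundTripped = fromProto(toProto(original))
    println(roundTripped == original)
}
```

Keeping both directions next to each other makes it easy to test that a value survives the round trip unchanged.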
FAQ
Isn’t everything in proto3 optional? How do you express mandatory fields?
In short: The mappers do that. Non-primitive fields have a presence, which allows the receiver to determine whether a message field was explicitly set by the sender. When generating the protobuf model, optional primitive fields are represented by their *Value counterparts (string -> StringValue), which allows for presence checks as well. Mapping the message into the internal representation then enforces the presence of the mandatory fields as required by BICEPS.
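A mapper enforcing presence could look roughly like this Kotlin sketch. The two classes are simplified, hypothetical stand-ins in which absence is modeled as null, and `mapCodedValue` is an illustrative name rather than generated code.

```kotlin
// Hypothetical stand-ins: a received message where every field may be absent,
// and the internal type where mandatory fields are non-nullable.
data class CodedValueMsg(val code: String?, val codingSystem: String?)
data class CodedValue(val code: String, val codingSystem: String?)

// Fails with IllegalArgumentException when a mandatory field is absent.
fun mapCodedValue(msg: CodedValueMsg): CodedValue = CodedValue(
    code = requireNotNull(msg.code) { "CodedValue.code is mandatory but absent" },
    codingSystem = msg.codingSystem,
)

fun main() {
    println(mapCodedValue(CodedValueMsg("12345", null)))
    // The second call is rejected because the mandatory field is missing.
    println(runCatching { mapCodedValue(CodedValueMsg(null, null)) }.isFailure)
}
```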
BICEPS uses inheritance, protobuf doesn’t support that.
Composition works fine.
What about extensions?
protobuf does allow for extensions by using Any. We plan on supporting converting BICEPS XML extensions using the proto-converter, but due to time constraints, this is currently untested.
Are XML restrictions supported?
Not currently, but just like mandatory fields, validation can be integrated into the mapper.