# Introduction

To lower the barrier to entry and avoid inconsistent representations of the BICEPS data model in protoSDC, the entire XML Schema for BICEPS is automatically converted into protobuf as well as additional protoSDC target languages.
This ensures a high degree of compatibility between protoSDC implementations.

## How it works

`proto-converter` introduces an intermediate layer, which takes care of most of the conversion work needed to go from an inheritance-based model to a composition-based model.
Language generators only need to traverse the resulting graph of nodes to generate the types and parameters; they no longer need to know about XML.
The only thing that is still related to XML is setting builtin and custom types, as they are identified by QNames.

(generating_proto)=
### Intermediate layer

The intermediate layer is a graph of nodes called `BaseNode`.
Each `BaseNode` has a name, children, a `nodeType` and can hold a language-specific information type to specify how a node is generated for a given target language.

`NodeType`s are very basic types that closely resemble protobuf concepts:

- `NodeType.Message` represents what is essentially an object, or a message in proto-speak.
- `NodeType.Parameter` is a field within a message, which can point to a message, an enum or a builtin type.
- `NodeType.StringEnumeration` is an enum which can only represent strings, no values whatsoever.
- `NodeType.OneOf` is a collection of parameters of which only one can be present, similar to proto.
- `NodeType.BuiltinType` is a builtin type, as the name suggests. It is essentially the type holding everything built into XML Schema, such as string, decimal and friends.

Every element on the first level of an XML Schema is recursively converted into `BaseNode`s with `NodeType`s, simplifying the structure and removing any inheritance.

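
The node graph described above can be pictured with a small Kotlin sketch. This is a simplified, hypothetical model and not the actual `proto-converter` classes: the `parameterType` field, the `describe` function and the `Example`/`Label` nodes are assumptions made for illustration only.

```kotlin
// Simplified, hypothetical model of the intermediate layer; the real classes differ.
enum class NodeType { Message, Parameter, StringEnumeration, OneOf, BuiltinType }

class BaseNode(
    val name: String,
    val nodeType: NodeType,
    val children: MutableList<BaseNode> = mutableListOf(),
    // Assumed field: for Parameter nodes, the node describing the parameter's type.
    val parameterType: BaseNode? = null,
    // Language-specific information attached later by a generator (free-form here).
    var languageType: Any? = null,
)

// Depth-first traversal that renders each node as one indented line.
fun describe(node: BaseNode, depth: Int = 0): List<String> {
    val target = node.parameterType?.let { " -> ${it.name}" } ?: ""
    val self = "  ".repeat(depth) + "${node.nodeType}(${node.name})$target"
    return listOf(self) + node.children.flatMap { describe(it, depth + 1) }
}

// A message with a single parameter that points to a builtin type.
fun exampleGraph(): BaseNode {
    val string = BaseNode("xsd:string", NodeType.BuiltinType)
    val label = BaseNode("Label", NodeType.Parameter, parameterType = string)
    return BaseNode("Example", NodeType.Message, mutableListOf(label))
}

fun main() {
    describe(exampleGraph()).forEach(::println)
}
```

A language generator would walk the graph in a similar way, but instead of printing it would attach its own `languageType` to each node and later emit source code from it.
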
### Breaking up inheritance

Transforming inheritance into a composition-based model follows two simple rules:

- If type `A` extends type `B`, `B` becomes a field of the message `A`.
- If the XML model uses an element which has subtypes, it is replaced with a OneOf which allows all subtypes of that element, as well as the element itself, to be used.

### Mapping example

To understand how an XML Schema type is mapped into nodes, let's take a look at an example.

```xml
Bla bla.
Bla bla here.
So much bla bla.
```

`ClockDescriptor` is an element on the root level of the schema, therefore it will become a message.
Below that, a complex content element tells us that `ClockDescriptor` is an extension of `pm:AbstractDeviceComponentDescriptor` that adds a `TimeProtocol` field and a `Resolution` attribute.
This then turns into the following tree.

```{mermaid}
graph TD
    subgraph clockdescriptor["Message(ClockDescriptor)"]
        subgraph children["Children"]
            b1["Parameter(name=AbstractDeviceComponentDescriptor)"]
            b2["Parameter(name=TimeProtocol, isList=true)"]
            b3["Parameter(name=Resolution)"]
        end
    end
    subgraph AbstractDeviceComponentDescriptor["Message(AbstractDeviceComponentDescriptor)"]
    end
    subgraph CodedValue["Message(CodedValue)"]
    end
    subgraph duration["BuiltinType(xsd:duration)"]
    end
    b1 -- parameter type --> AbstractDeviceComponentDescriptor
    b2 -- parameter type --> CodedValue
    b3 -- parameter type --> duration
```

The inheritance is resolved by applying composition: the extended base type and all parameters added by the extension become children of the message.

Where base types such as `AbstractState` are used as parameter types within the graph, they are replaced by an `AbstractStateOneOf`, which can be any of the extension types of the base type or the base type itself.
For example, `AbstractMetricReport` contains a list of `AbstractMetricStateOneOf` elements.

```proto
message AbstractMetricReportMsg {
    ...
    repeated AbstractMetricStateOneOfMsg metric_state = 3;
    ...
}
```

Once this is applied to every XML Schema element, we end up with a list of messages sorted in an order that allows the language generator to simply traverse the graph in order and always have every previous type resolved.
Cycles in the graph are a notable exception: they can occur and must be handled differently.
Types included in such cycles are marked as being part of a cluster. The consequences of such clusters are language specific.
In protobuf, this simply means that all nodes which are part of the cluster must be generated into the same `.proto` file.

The proto generator ultimately traverses the resulting graph and attaches its language types to each node.
Every `BaseNode` then has a `languageType` attached in the form of a `ProtoType`.
These are essentially the same as `BaseType`s, but they have rules attached on how to generate protocol buffers schema data.

Finally, once every node has a `ProtoType` attached, the graph is traversed one last time, writing the output for each child of the root of the graph into a file.
The result is a protobuf conversion of the XML Schema.

```proto
syntax = "proto3";

package org.somda.protosdc.proto.model.biceps;

option java_multiple_files = true;
option java_outer_classname = "ClockDescriptorProto";

import "org/somda/protosdc/proto/model/biceps/abstractdevicecomponentdescriptor.proto";
import "org/somda/protosdc/proto/model/biceps/codedvalue.proto";
import "google/protobuf/duration.proto";

message ClockDescriptorMsg {
    AbstractDeviceComponentDescriptorMsg abstract_device_component_descriptor = 1;
    repeated CodedValueMsg time_protocol = 2;
    google.protobuf.Duration resolution_attr = 3;
}
```

### Generating Kotlin/Rust/*

`proto-converter` provides generators for programming languages as well.
The basic principle remains the same as for {ref}`protobuf <generating_proto>`, but the output is changed to reflect the needs of the specific target.
This includes, e.g.

- different nesting behavior
- introducing smart pointers to break cycles in the data model
- language-specific builtin types

The `ClockDescriptor` shown in the protobuf example would look like this in Kotlin:

```kotlin
package org.somda.protosdc.model.biceps

import org.somda.protosdc.model.biceps.AbstractDeviceComponentDescriptor
import org.somda.protosdc.model.biceps.CodedValue
import java.time.Duration

data class ClockDescriptor (
    val abstractDeviceComponentDescriptor: AbstractDeviceComponentDescriptor,
    val timeProtocol: List<CodedValue> = listOf(),
    val resolutionAttr: Duration? = null,
)
```

## Generating mappers

`proto-converter` can additionally generate mappers for mapping between language-specific representations of data and their protobuf representation.
This also lowers the barrier to entry for a clean separation of transport data types and language-specific internal representations.
Since these mappers are automatically generated, they always match the proto and language output the generator currently produces.

Every supported language stores the information needed to generate the output on the nodes in the graph, which allows a mapper generator to determine the full layout of the target language.
The task of mapping to and from the protobuf representation is very language specific, as it is necessary to know how protobuf schema files will be represented when compiled for that language.
Field names might change from snake_case to PascalCase, nested messages might be in modules named after their parent message, or primitive types might not have an exact representation.

## FAQ

### Isn't everything in proto3 optional? How do you express mandatory fields?

In short: The mappers do that.

Non-primitive fields have a presence, which allows the receiver to determine whether a message field was explicitly set by the sender.
When generating the protobuf model, optional primitive fields are represented by their `*Value` counterparts (`string` -> `StringValue`), which allows for presence checks as well.
Mapping the message into the internal representation then enforces the presence of the mandatory fields as required by BICEPS.

### BICEPS uses inheritance, protobuf doesn't support that.

Composition works fine.

### What about extensions?

Protobuf does allow for extensions by using [Any](https://developers.google.com/protocol-buffers/docs/proto3#any).
We plan to support converting BICEPS XML extensions with `proto-converter`, but due to time constraints, this is currently untested.

### Are XML restrictions supported?

Not currently, but just like mandatory fields, validation can be integrated into the mapper.
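
As a rough illustration of both points, the following Kotlin sketch shows how a mapper might enforce a mandatory field and a simple value check while converting from a wire representation. All class and function names are hypothetical and do not reflect the code that `proto-converter` generates: the `Wire*` classes merely stand in for protobuf-generated types, and the non-negative check is an invented restriction, not one taken from BICEPS.

```kotlin
import java.time.Duration

// Hypothetical stand-ins for protobuf-generated classes: any field may be unset (null here).
class WireDescriptor(val handle: String?)
class WireClock(val descriptor: WireDescriptor?, val resolutionSeconds: Long?)

// Hypothetical internal model: mandatory fields are non-nullable.
data class Descriptor(val handle: String)
data class Clock(val descriptor: Descriptor, val resolution: Duration?)

class MappingException(message: String) : Exception(message)

fun mapDescriptor(wire: WireDescriptor): Descriptor =
    Descriptor(wire.handle ?: throw MappingException("Descriptor.handle is mandatory"))

fun mapClock(wire: WireClock): Clock {
    // Presence check: reject messages in which a mandatory field was never set.
    val descriptor = wire.descriptor ?: throw MappingException("Clock.descriptor is mandatory")
    val resolution = wire.resolutionSeconds?.let { Duration.ofSeconds(it) }
    // Illustrative restriction check (an assumption, not a BICEPS rule).
    if (resolution != null && resolution.isNegative) {
        throw MappingException("Clock.resolution must not be negative")
    }
    return Clock(mapDescriptor(descriptor), resolution)
}

fun main() {
    println(mapClock(WireClock(WireDescriptor("clk0"), 1)))
    println(runCatching { mapClock(WireClock(null, null)) }.exceptionOrNull()?.message)
}
```

Keeping these checks in the mapper means the internal model can use non-nullable types for mandatory fields, while the transport types stay as permissive as proto3 requires.
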