# Introduction
To lower the barrier to entry and avoid inconsistent representations of the BICEPS data model in protoSDC, the entire
XML Schema for BICEPS is automatically converted into protobuf as well as additional protoSDC target languages. Since
every implementation works from the same generated model, this ensures a high degree of compatibility between protoSDC
implementations.
## How it works
`proto-converter` introduces an intermediate layer, which takes care of most of the conversion work needed to go from an
inheritance-based model to a composition-based model. Language generators only need to traverse the resulting graph of
nodes to generate the types and parameters; they no longer need to know about XML. The only remaining XML-related task
is mapping builtin and custom types, as these are identified by QNames.
(generating_proto)=
### Intermediate layer
The intermediate layer is a graph of nodes called `BaseNode`. Each `BaseNode` has a name, children, a `nodeType` and
can hold language-specific type information that specifies how the node is generated for a given target language.
`NodeType`s are deliberately basic and closely resemble protobuf concepts:
- `NodeType.Message` represents what is essentially an object, or a message in proto-speak.
- `NodeType.Parameter` is a field within a message, which can point to a message, an enum or a builtin type.
- `NodeType.StringEnumeration` is an enum whose members are plain strings, with no associated values whatsoever.
- `NodeType.OneOf` is a collection of parameters in which only one can be present, similar to proto.
- `NodeType.BuiltinType` is a builtin type, as the name suggests. It is essentially the type holding everything built
into XML Schema, such as string, decimal and friends.
Every element on the first level of an XML Schema is recursively converted into `BaseNode`s with `NodeType`s,
simplifying the structure and removing any inheritance.
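
The node structure described above can be sketched roughly as follows. This is a minimal illustration, not the actual `proto-converter` code: only `BaseNode` and `NodeType` are names taken from this document, while the properties `parameterType` and `languageType`, the `describe` helper and the `Example` message are assumptions made for the sketch.

```kotlin
// Sketch only: BaseNode and NodeType are named in the text above; all property
// and function names here are assumptions, not proto-converter's real API.
enum class NodeType { Message, Parameter, StringEnumeration, OneOf, BuiltinType }

class BaseNode(
    val name: String,
    val nodeType: NodeType,
    val children: List<BaseNode> = emptyList(),
    // For Parameter nodes: the node that describes the parameter's type.
    val parameterType: BaseNode? = null,
    // Slot for language-specific generation info (hypothetical).
    var languageType: Any? = null,
)

// Depth-first walk that lists each node, indented by its depth in the graph.
fun describe(node: BaseNode, depth: Int = 0): List<String> {
    val target = node.parameterType?.let { " -> ${it.nodeType}(${it.name})" } ?: ""
    val line = "  ".repeat(depth) + "${node.nodeType}(${node.name})$target"
    return listOf(line) + node.children.flatMap { describe(it, depth + 1) }
}

// Made-up example: a message with a single string parameter.
val xsdString = BaseNode("xsd:string", NodeType.BuiltinType)
val example = BaseNode(
    "Example", NodeType.Message,
    children = listOf(BaseNode("Label", NodeType.Parameter, parameterType = xsdString)),
)
```

A language generator would perform a similar traversal, but emit type definitions instead of descriptive strings.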
### Breaking up inheritance
Transforming inheritance into a composition-based model follows two simple rules:
- If type `A` extends type `B`, `B` becomes a field of the message `A`.
- If the XML model uses an element which has subtypes, the element is replaced with a `OneOf` which allows the element
itself as well as all of its subtypes to be used.
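
As a rough sketch of how these two rules could be applied, assume a heavily simplified view of the schema in which each type has a name, an optional base type and a list of fields. The `XmlType`, `Field`, `Message`, `OneOf` and `flatten` names below are hypothetical and exist only for this illustration.

```kotlin
// Hypothetical, simplified view of an XML Schema complex type.
data class XmlType(
    val name: String,
    val base: String? = null,
    val fields: List<Pair<String, String>> = emptyList(), // field name to type name
)

data class Field(val name: String, val type: String)
data class Message(val name: String, val fields: List<Field>)
data class OneOf(val name: String, val options: List<String>)

fun flatten(types: List<XmlType>): Pair<List<Message>, List<OneOf>> {
    // Direct subtypes of every type that is extended at least once.
    val subtypes = types.filter { it.base != null }.groupBy({ it.base!! }, { it.name })

    // Direct and indirect subtypes of a type.
    fun allSubtypes(name: String): List<String> =
        subtypes[name].orEmpty().flatMap { listOf(it) + allSubtypes(it) }

    // Rule 2: a type with subtypes gets a OneOf of itself and all its subtypes.
    val oneOfs = subtypes.keys.map { OneOf("${it}OneOf", listOf(it) + allSubtypes(it)) }

    val messages = types.map { t ->
        // Rule 1: the extended base type becomes a regular field.
        val baseField = listOfNotNull(t.base?.let { Field(it, it) })
        // Rule 2: fields typed with a base type point at the OneOf instead.
        val own = t.fields.map { (n, ty) -> Field(n, if (ty in subtypes) "${ty}OneOf" else ty) }
        Message(t.name, baseField + own)
    }
    return messages to oneOfs
}

// Illustrative input, reduced to the bare minimum.
val sample = listOf(
    XmlType("AbstractState"),
    XmlType("AbstractMetricState", base = "AbstractState"),
    XmlType("NumericMetricState", base = "AbstractMetricState"),
    XmlType("Report", fields = listOf("MetricState" to "AbstractMetricState")),
)
```

Note that the base type field added by the first rule refers to the base message directly, while fields that merely use a base type are redirected to the corresponding `OneOf`.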
### Mapping example
To understand how an XML Schema type is mapped into nodes, let's take a look at an example.
```xml
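<xsd:complexType name="ClockDescriptor">
    <xsd:annotation>
        <xsd:documentation>Bla bla.</xsd:documentation>
    </xsd:annotation>
    <xsd:complexContent>
        <xsd:extension base="pm:AbstractDeviceComponentDescriptor">
            <xsd:sequence>
                <xsd:element name="TimeProtocol" type="pm:CodedValue" minOccurs="0" maxOccurs="unbounded">
                    <xsd:annotation>
                        <xsd:documentation>Bla bla here.</xsd:documentation>
                    </xsd:annotation>
                </xsd:element>
            </xsd:sequence>
            <xsd:attribute name="Resolution" type="xsd:duration">
                <xsd:annotation>
                    <xsd:documentation>So much bla bla.</xsd:documentation>
                </xsd:annotation>
            </xsd:attribute>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>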
```
`ClockDescriptor` is an element on the root level of the schema, therefore it will become a message. Below that,
a complex content element essentially tells us that `ClockDescriptor` is an extension of
`pm:AbstractDeviceComponentDescriptor`, but adds a `TimeProtocol` field and a `Resolution` attribute.
This then turns into the following tree.
```{mermaid}
graph TD
subgraph clockdescriptor["Message(ClockDescriptor)"]
subgraph children["Children"]
b1["Parameter(name=AbstractDeviceComponentDescriptor)"]
b2["Parameter(name=TimeProtocol, isList=true)"]
b3["Parameter(name=Resolution)"]
end
end
subgraph AbstractDeviceComponentDescriptor["Message(AbstractDeviceComponentDescriptor)"]
end
subgraph CodedValue["Message(CodedValue)"]
end
subgraph duration["BuiltinType(xsd:duration)"]
end
b1 -- parameter type --> AbstractDeviceComponentDescriptor
b2 -- parameter type --> CodedValue
b3 -- parameter type --> duration
```
The inheritance is resolved by applying composition and including the extended base type as well as all the extension
type parameters as children. Where base types such as `AbstractState` are used as parameter types within the graph,
they will be replaced by an `AbstractStateOneOf` which can be any of the extension types of the base type, or of
course the base type itself. For example, `AbstractMetricReport` contains a list of `AbstractMetricStateOneOf` elements.
```proto
message AbstractMetricReportMsg {
...
repeated AbstractMetricStateOneOfMsg metric_state = 3;
...
}
```
Once this is applied to every XML Schema element, we end up with a list of messages sorted in an order that allows the
language generator to simply traverse the graph in order and always have every previously referenced type resolved.
Cycles in the graph are the notable exception and must be handled differently: types included in a cycle are marked as
being part of a cluster, and the consequences of such clusters are language specific.
In protobuf, this simply means that all nodes which are part of the cluster must be generated into the same .proto file.
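
One way to obtain such an ordering together with the clusters is Tarjan's strongly connected components algorithm, which emits every group of mutually dependent types only after all types they depend on. The sketch below is an assumption about how this could be done, not a description of the algorithm `proto-converter` actually uses; the function `orderWithClusters` and the type names `A` to `D` are made up.

```kotlin
// Tarjan's SCC algorithm. `deps` maps a type to the types it references.
// Returns components in dependency order; a component with more than one
// member is a cluster of types that form a cycle.
fun orderWithClusters(deps: Map<String, List<String>>): List<List<String>> {
    var counter = 0
    val index = HashMap<String, Int>()
    val low = HashMap<String, Int>()
    val stack = ArrayDeque<String>()
    val onStack = HashSet<String>()
    val result = mutableListOf<List<String>>()

    fun visit(v: String) {
        index[v] = counter
        low[v] = counter
        counter++
        stack.addLast(v)
        onStack.add(v)
        for (w in deps[v].orEmpty()) {
            if (w !in index) {
                visit(w)
                low[v] = minOf(low.getValue(v), low.getValue(w))
            } else if (w in onStack) {
                low[v] = minOf(low.getValue(v), index.getValue(w))
            }
        }
        // v is the root of a component: pop all of its members off the stack.
        if (low[v] == index[v]) {
            val component = mutableListOf<String>()
            do {
                val w = stack.removeLast()
                onStack.remove(w)
                component.add(w)
            } while (w != v)
            result.add(component)
        }
    }

    for (v in deps.keys) if (v !in index) visit(v)
    return result
}

// Made-up dependency graph: B and C reference each other.
val sampleDeps = linkedMapOf(
    "A" to listOf("B"),
    "B" to listOf("C"),
    "C" to listOf("B", "D"),
    "D" to emptyList(),
)
```

For `sampleDeps`, the result lists `D` first, then the cluster containing `C` and `B`, then `A`. A type that references itself forms a single-element component and would need a separate check to be treated as a cluster.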
The proto generator ultimately traverses the resulting graph and attaches its language types to each node.
Every `BaseNode` then has a `languageType` attached in the form of a `ProtoType`. These are essentially the same as
`BaseType`s, but they have rules on how to generate protocol buffers schema data attached to them. Finally, once every
node has a `ProtoType` attached, the graph is traversed one last time, and the output for each child of the root of the
graph is written into a file. The result is a protobuf conversion of the XML Schema.
```proto
syntax = "proto3";
package org.somda.protosdc.proto.model.biceps;
option java_multiple_files = true;
option java_outer_classname = "ClockDescriptorProto";
import "org/somda/protosdc/proto/model/biceps/abstractdevicecomponentdescriptor.proto";
import "org/somda/protosdc/proto/model/biceps/codedvalue.proto";
import "google/protobuf/duration.proto";
message ClockDescriptorMsg {
AbstractDeviceComponentDescriptorMsg abstract_device_component_descriptor = 1;
repeated CodedValueMsg time_protocol = 2;
google.protobuf.Duration resolution_attr = 3;
}
```
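
To give an idea of how a message node might be rendered into such a definition, the following sketch converts parameter names to snake_case and numbers the fields in order. `ProtoField`, `toSnakeCase` and `renderMessage` are hypothetical names for this illustration; the real generator additionally handles imports, options, nesting and clusters.

```kotlin
// Hypothetical, flattened description of a field ready for output.
data class ProtoField(val name: String, val type: String, val repeated: Boolean = false)

// Inserts an underscore at every lower-to-upper case boundary, then lowercases.
fun toSnakeCase(name: String): String =
    name.replace(Regex("(?<=[a-z0-9])(?=[A-Z])"), "_").lowercase()

// Renders a message body; field numbers follow the order of the parameters.
fun renderMessage(name: String, fields: List<ProtoField>): String = buildString {
    appendLine("message ${name}Msg {")
    fields.forEachIndexed { i, f ->
        val prefix = if (f.repeated) "repeated " else ""
        appendLine("  $prefix${f.type} ${toSnakeCase(f.name)} = ${i + 1};")
    }
    append("}")
}

val clockDescriptor = renderMessage(
    "ClockDescriptor",
    listOf(
        ProtoField("AbstractDeviceComponentDescriptor", "AbstractDeviceComponentDescriptorMsg"),
        ProtoField("TimeProtocol", "CodedValueMsg", repeated = true),
        ProtoField("ResolutionAttr", "google.protobuf.Duration"),
    ),
)
```

Apart from indentation, `clockDescriptor` contains the same message definition as the generated file shown above.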
### Generating Kotlin/Rust/*
`proto-converter` provides generators for programming languages as well. The basic principle remains the same as for
{ref}`protobuf <generating_proto>`, but the output is changed to reflect the needs of the specific target.
This includes, for example:
- different nesting behavior
- introducing smart pointers to break cycles in the data model
- language specific builtin types
The ClockDescriptor shown in the protobuf example would look like this in Kotlin:
```kotlin
package org.somda.protosdc.model.biceps
import org.somda.protosdc.model.biceps.AbstractDeviceComponentDescriptor
import org.somda.protosdc.model.biceps.CodedValue
import java.time.Duration
data class ClockDescriptor (
val abstractDeviceComponentDescriptor: AbstractDeviceComponentDescriptor,
val timeProtocol: List<CodedValue> = listOf(),
val resolutionAttr: Duration? = null,
)
```
## Generating mappers
`proto-converter` can additionally generate mappers that convert between language-specific representations of data
and their protobuf representation. This makes it easier to cleanly separate transport data types from language-specific
internal representations.
Since these mappers are generated automatically, they always match the proto and language output the generator currently
produces. Every supported language stores the information needed to generate the output on the nodes in the graph,
which allows a mapper generator to determine the full layout of the target language.
The task of mapping to and from the protobuf representation is very language specific, as it is necessary to know
how protobuf schema files will be represented when compiled for that language. Field names might change from snake_case
to PascalCase, nested messages might be in modules named after their parent message, or primitive types might not have
an exact representation.
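
The following sketch shows what such a mapper pair might look like for a type with an optional duration. The classes `DurationMsg` and `ClockSettingsMsg` are hypothetical stand-ins for protobuf-generated code, which in reality exposes builders and `has...()` accessors rather than nullable properties; `ClockSettings` and the `toProto`/`fromProto` functions are likewise assumptions made for the example.

```kotlin
import java.time.Duration

// Hypothetical stand-ins for protobuf-generated classes. DurationMsg mirrors
// the seconds/nanos layout of google.protobuf.Duration.
data class DurationMsg(val seconds: Long, val nanos: Int)
data class ClockSettingsMsg(val timeProtocol: List<String>, val resolutionAttr: DurationMsg?)

// Hypothetical language-specific representation using idiomatic types.
data class ClockSettings(
    val timeProtocol: List<String> = listOf(),
    val resolutionAttr: Duration? = null,
)

// Builtin type mapping. Sign normalization for negative durations is ignored
// here for brevity.
fun toProto(d: Duration): DurationMsg = DurationMsg(d.seconds, d.nano)
fun fromProto(m: DurationMsg): Duration = Duration.ofSeconds(m.seconds, m.nanos.toLong())

// Message mapping in both directions, built from the builtin mappers.
fun toProto(s: ClockSettings): ClockSettingsMsg =
    ClockSettingsMsg(s.timeProtocol, s.resolutionAttr?.let { toProto(it) })

fun fromProto(m: ClockSettingsMsg): ClockSettings =
    ClockSettings(m.timeProtocol, m.resolutionAttr?.let { fromProto(it) })
```

Because both directions are derived from the same node graph, a value mapped to its protobuf form and back should compare equal to the original.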
## FAQ
### Isn't everything in proto3 optional? How do you express mandatory fields?
In short: The mappers do that. Non-primitive (message) fields track presence, which allows the receiver to determine
whether a message field was explicitly set by the sender. When generating the protobuf model, optional primitive fields
are represented by their `*Value` wrapper counterparts (`string` -> `StringValue`), which allows for presence checks as
well. Mapping the message into the internal representation then enforces the presence of the mandatory fields as
required by BICEPS.
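
A minimal sketch of such a presence check is shown below. The `*Msg` classes are hypothetical stand-ins for protobuf-generated code, where a `null` property plays the role of a field that was not set; `mapClockDescriptor` and `mapBase` are made-up names, not generated mapper functions.

```kotlin
// Hypothetical stand-ins for protobuf-generated classes. A null property here
// represents "field not set" (hasX() returning false in real generated code).
class AbstractDeviceComponentDescriptorMsg
class ClockDescriptorMsg(
    val abstractDeviceComponentDescriptor: AbstractDeviceComponentDescriptorMsg?,
)

// Internal representation: the base type field is mandatory, hence non-null.
class AbstractDeviceComponentDescriptor
class ClockDescriptor(val abstractDeviceComponentDescriptor: AbstractDeviceComponentDescriptor)

fun mapBase(msg: AbstractDeviceComponentDescriptorMsg) = AbstractDeviceComponentDescriptor()

// Rejects messages in which the mandatory field is absent.
fun mapClockDescriptor(msg: ClockDescriptorMsg): ClockDescriptor {
    val base = requireNotNull(msg.abstractDeviceComponentDescriptor) {
        "ClockDescriptor: mandatory field abstract_device_component_descriptor is missing"
    }
    return ClockDescriptor(mapBase(base))
}
```

With this approach, a message that lacks a mandatory field never reaches the internal representation; the mapper fails with an `IllegalArgumentException` instead.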
### BICEPS uses inheritance, protobuf doesn't support that.
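Composition works fine: an extended base type becomes a field of the extending message, and places where a base type is
used are replaced by a `OneOf` covering the base type and all of its subtypes.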
### What about extensions?
protobuf does allow for extensions by using [Any](https://developers.google.com/protocol-buffers/docs/proto3#any). We
plan to support converting BICEPS XML extensions with `proto-converter`, but due to time constraints, this is
currently untested.
### Are XML restrictions supported?
Not currently, but just like mandatory fields, validation can be integrated into the mapper.