BlackBox: binary protocol processor code generator.

Basic ideas of BlackBox.

Writing serialization and deserialization code by hand, especially for exchanges between heterogeneous devices programmed in different languages, is a time-consuming process that often introduces hard-to-eliminate errors. The most efficient solution is to create a DSL for formally describing the protocol, and then a program that generates source code from this description for the various target platforms, in the required programming languages. There are many ready-made solutions of this kind:

Protocol Buffers 
Cap’n Proto 
FlatBuffers 
ZCM 
MAVLink 
Thrift

Having studied these and many other implementations, I decided to create a system that combines and extends their merits while eliminating the shortcomings I found.

The BlackBox protocol description language is based on the constructs of the popular Java programming language, which has many convenient IDEs, such as Android Studio.

In general, the system and the interconnections of its components look like this:

At the highest level of the hierarchy, the devices (hosts) that can receive, process, and send packets are described as Java classes.

The implements clause of a host lists the programming languages in which the source code should be generated.

A host can include several exchange interfaces, each described with the interface construct.

An interface combines a set of packets, each described with the class construct. The extends clause of an interface lets it inherit packets from several other interfaces.

A packet description class contains only fields that describe the data transmitted by the packet.

  • The implements clause of a packet description class selectively inserts the packet into the interfaces listed after implements.
  • The extends clause of a packet description class lets the packet inherit fields from other packets.

Packet names must be unique.
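The hierarchy above can be sketched in plain Java. This is an illustration under assumptions, not the exact BlackBox syntax: the language markers InJava and InC, and every host, interface, and packet name, are hypothetical, and the file compiles without the generator. A small reflection-based main prints the relationships a generator could read from such a description: host languages, interface inheritance, packet insertion via implements, and field inheritance via extends.

```java
import java.util.Arrays;

class StructureDemo {
    public static void main(String[] args) {
        System.out.println(Arrays.toString(LedHost.class.getInterfaces()));         // requested languages
        System.out.println(Arrays.toString(LedHost.Service.class.getInterfaces())); // inherited interfaces
        System.out.println(Arrays.toString(Ping.class.getInterfaces()));            // interfaces the packet joins
        System.out.println(BlinkWithBrightness.class.getSuperclass().getName());    // inherited fields
    }
}

// HYPOTHETICAL stand-ins for the generator's language markers; the real marker
// types are supplied by the BlackBox library.
interface InJava {}
interface InC {}

// Host: a device described as a class; implements lists the target languages.
class LedHost implements InJava, InC {

    // Exchange interface grouping the packets the host handles.
    interface Control {
        // Packet: contains only the fields of the transmitted data.
        class Blink {
            byte times;
            short periodMs;
        }
    }

    interface Diagnostics {
        class Status {
            int uptimeSec;
        }
    }

    // Inherits the packets of both Control and Diagnostics.
    interface Service extends Control, Diagnostics {}
}

// Packet inserted into two interfaces through its implements clause.
class Ping implements LedHost.Control, LedHost.Diagnostics {
    int sequence;
}

// Packet that inherits the fields of Blink and adds its own.
class BlinkWithBrightness extends LedHost.Control.Blink {
    byte brightness;
}
```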

BlackBox allows you to describe constants:
A named set of constants with unique integer values is described by the fields of an enum construct. Unique values (within the enum) are assigned to such constants automatically.
When you need control over the integer values of the constants, describe them as static final fields inside the enum, initialized with the required values.
The @BitFlags annotation on an enum specifies that its constants are bit flags. This affects both the generated values of the constants and the code generated to process them.
At the same time, BlackBox provides a more convenient substitute for bit flags: bit fields.
Enums can be used as the types of packet fields. An enum must be declared at the root of the description file, outside of any other construct.
Ordinary individual constants, including non-integer ones, can be declared as static final fields directly in the packet description class.
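The constant rules above can be sketched in Java syntax. The enum layouts follow the text; StatusPacket and BitFlagDemo are hypothetical names, and the @BitFlags annotation itself is omitted because its definition comes from the BlackBox library. The idea that bit-flag constants receive one bit each (1, 2, 4, ...) instead of sequential numbers is an assumption made only to show why the generated values differ.

```java
import java.util.EnumSet;

class BitFlagDemo {
    // ASSUMPTION: a @BitFlags enum gives each constant its own bit (1, 2, 4, ...)
    // instead of a sequential number (0, 1, 2, ...); ordinal() stands in for the
    // generator's automatic numbering.
    static int flagValue(Enum<?> e) {
        return 1 << e.ordinal();
    }

    // Combines several flags into one integer mask.
    static int combine(EnumSet<LedColor> flags) {
        int mask = 0;
        for (LedColor c : flags) mask |= flagValue(c);
        return mask;
    }

    public static void main(String[] args) {
        System.out.println(LedColor.BLUE.ordinal());                          // sequential value
        System.out.println(combine(EnumSet.of(LedColor.RED, LedColor.BLUE))); // 1 | 4
        System.out.println(ErrorCode.TIMEOUT + " " + StatusPacket.FIRMWARE);
    }
}

// Named set of constants; unique values are assigned automatically.
enum LedColor { RED, GREEN, BLUE }

// Constants with controlled integer values: static final fields inside an enum.
enum ErrorCode {
    ; // no auto-numbered constants, only explicit values
    static final int TIMEOUT = 10;
    static final int CRC_MISMATCH = 20;
}

// HYPOTHETICAL packet class: ordinary constants (including non-integer ones)
// live directly in it, and an enum serves as the type of a packet field.
class StatusPacket {
    static final String FIRMWARE = "1.0";
    static final double VOLTAGE_SCALE = 0.01;
    LedColor color;
}
```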

Attention! When Java source code is generated, named constants are transformed into annotated primitives called SlimEnum. To work with them conveniently, use the SlimEnum plug-in for IntelliJ IDEA: install it from the IntelliJ plugin repository or download SlimEnum.jar directly.

The field annotations @A, @V, @X, and @I store meta information about how a field's values tend to change. Based on this information, the code generator applies the Base 128 Varint algorithm (used, for example, in Protocol Buffers), which noticeably compresses the transmitted data at minimal computational cost. This is achieved by dropping the unused high-order bits on the transmitting side and restoring them on the receiving side.
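Base 128 Varint, as specified for Protocol Buffers, stores 7 data bits per byte, least-significant group first, and sets the high bit of every byte except the last. The Java sketch below is a generic implementation of that encoding, not BlackBox's generated code; the Varint class name is hypothetical. Values up to 127 take one byte, values up to 16 383 take two, and a negative long always takes ten, which illustrates why keeping transmitted numbers small and non-negative matters.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class Varint {
    // Encodes a value as Base 128 Varint: 7 data bits per byte, least-significant
    // group first; a set high bit means "more bytes follow". Leading zero groups
    // (the unused high-order bits) are simply not sent.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift, so negative values terminate after 10 bytes
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decodes a value produced by encode(); missing high-order bits come back as zeros.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, 127, 128, 300, 16_383, 16_384, -1}) {
            byte[] encoded = encode(v);
            System.out.println(v + " -> " + encoded.length + " byte(s) " + Arrays.toString(encoded));
        }
    }
}
```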

Figure: number of transmitted bytes as a function of the transferred value.

The data sent between nodes can be divided into two types.

  1. Random values, uniformly distributed over the whole range; practically noise.

    Such data is transmitted as is: compressing or otherwise encoding it wastes computing resources. Fields with this kind of values have no annotation, or the @I annotation.
  2. Data whose values change according to some pattern or gradient.

These fields are annotated with @A, @X, or @V, which denote three variants of value distribution relative to the most probable value:

  • @A: rare fluctuations are possible only toward values larger than the most probable value val.
  • @X: rare fluctuations are possible in both directions relative to the most probable value val.
  • @V: fluctuations are possible only toward values smaller than the most probable value val.

The most probable value, val, is passed to the annotation as an argument.
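The text does not spell out how val enters the encoding. One plausible scheme, borrowed from Protocol Buffers and labeled here as an assumption, is to subtract val so that the most probable value becomes 0, then map the delta to a non-negative number before the varint step: unchanged for @A, negated for @V, and ZigZag-encoded for @X (Protocol Buffers uses ZigZag for its sint types). The sketch only computes varint lengths; DistributionDemo and its methods are hypothetical names.

```java
class DistributionDemo {
    // Length in bytes of a non-negative value encoded as Base 128 Varint.
    static int varintLength(long v) {
        int n = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            n++;
        }
        return n;
    }

    // ZigZag maps signed deltas to unsigned ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
    static long zigzag(long d) {
        return (d << 1) ^ (d >> 63);
    }

    // ASSUMED transforms (not confirmed by the text) from a field value to the
    // non-negative number fed into the varint encoder.
    static long forA(long value, long val) { return value - val; }         // fluctuations upward
    static long forV(long value, long val) { return val - value; }         // fluctuations downward
    static long forX(long value, long val) { return zigzag(value - val); } // both directions

    public static void main(String[] args) {
        long val = 1000; // the most probable value passed to the annotation
        System.out.println(varintLength(forA(1005, val)));
        System.out.println(varintLength(forV(990, val)));
        System.out.println(varintLength(forX(940, val)));
        System.out.println(varintLength(1005)); // the same value without the val offset
    }
}
```

With val = 1000, the values 1005 (@A), 990 (@V), and 940 (@X) each fit in one byte under this scheme, while 1005 sent without the offset needs two.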

Examples of using data distribution annotations (language construct – description):

  • @I byte field – mandatory field. The data is not encoded before sending (it compresses poorly); the field takes values from -128 to 127.
  • @A byte field – mandatory field. The data is compressed; the field takes values from 0 to 255. It is effectively the analog of the C type uint8_t.
  • @I(-1000) byte field – mandatory field, not compressed; the field takes values from -1128 to -873.
  • @X_ short field – nullable (optional) field; takes values from -32 768 to 32 767; compressed on sending.
  • @A(1000) short field – nullable (optional) field; takes values from -65 535 to 0; compressed on sending.
  • @V_ short field – nullable field; takes values from -65 535 to 0; compressed on sending.
  • @I(-11 / 75) short field – mandatory field with values uniformly distributed in the specified range.
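The @I ranges above follow from simple offset arithmetic. Assuming (this is not stated above) that an @I(val) byte field is transmitted as the raw byte value - val, the byte's own range -128..127 shifts to val - 128 .. val + 127, which for val = -1000 gives -1128..-873; reading the same byte as unsigned matches the 0..255 range quoted for @A byte (C's uint8_t). OffsetByteDemo and its methods are hypothetical names.

```java
class OffsetByteDemo {
    // ASSUMPTION: an @I(val) byte field is sent as the raw byte (value - val),
    // which shifts the byte range -128..127 to val-128 .. val+127.
    static byte encode(int value, int val) {
        int delta = value - val;
        if (delta < Byte.MIN_VALUE || delta > Byte.MAX_VALUE) {
            throw new IllegalArgumentException(
                value + " is outside " + (val + Byte.MIN_VALUE) + ".." + (val + Byte.MAX_VALUE));
        }
        return (byte) delta;
    }

    // Receiving side: add the offset back.
    static int decode(byte raw, int val) {
        return val + raw;
    }

    // The same byte read as unsigned covers 0..255, like C's uint8_t.
    static int asUnsigned(byte raw) {
        return Byte.toUnsignedInt(raw);
    }

    public static void main(String[] args) {
        int val = -1000;
        System.out.println(decode(encode(-1128, val), val));
        System.out.println(decode(encode(-873, val), val));
        System.out.println(asUnsigned((byte) 0xFF));
    }
}
```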

 

Description of fields with arrays

The following annotations are used to describe arrays:

@D denotes an array with predefined dimensions whose data space is allocated entirely in advance. Use it when the array is likely to be completely filled. Even if the data is not set, the space for it is allocated, but no resources are spent on tracking how full the array is.
@D_ denotes an array with predefined dimensions whose data space, within the predefined limits, is allocated only when data is inserted. Use it for sparse arrays that are likely to be poorly filled. Tracking how full the array is incurs additional costs.
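The trade-off between @D and @D_ can be illustrated with two hypothetical containers. DenseArray and SparseArray are not BlackBox classes, and the generated code may store data differently: the dense one allocates every slot up front and does no bookkeeping, while the sparse one grows only when a value is inserted and pays for a BitSet that tracks which indices are filled.

```java
import java.util.BitSet;

class AllocationDemo {
    public static void main(String[] args) {
        DenseArray dense = new DenseArray(1000);
        SparseArray sparse = new SparseArray(1000);
        sparse.set(700, 7);
        sparse.set(3, 3);
        System.out.println(dense.data.length + " slots vs " + sparse.allocated());
        System.out.println(sparse.get(700) + " " + sparse.get(4));
    }
}

// HYPOTHETICAL @D-style container: all space allocated up front, no fill tracking.
class DenseArray {
    final int[] data;
    DenseArray(int capacity) { data = new int[capacity]; }
    void set(int i, int v) { data[i] = v; }
    int get(int i) { return data[i]; }
}

// HYPOTHETICAL @D_-style container: space allocated on insertion, fill tracked.
class SparseArray {
    private final int capacity;
    private final BitSet present = new BitSet(); // which indices hold data
    private int[] values = new int[0];           // filled values, packed in index order

    SparseArray(int capacity) { this.capacity = capacity; }

    void set(int i, int v) {
        if (i < 0 || i >= capacity) throw new IndexOutOfBoundsException("index " + i);
        int slot = present.get(0, i).cardinality(); // rank of i among filled indices
        if (!present.get(i)) {                      // grow only when new data arrives
            int[] grown = new int[values.length + 1];
            System.arraycopy(values, 0, grown, 0, slot);
            System.arraycopy(values, slot, grown, slot + 1, values.length - slot);
            values = grown;
            present.set(i);
        }
        values[slot] = v;
    }

    Integer get(int i) {
        return present.get(i) ? values[present.get(0, i).cardinality()] : null;
    }

    int allocated() { return values.length; }
}
```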

 

 

Examples of field declarations (language construct – description):

  • @D(1 | 2 | 3) int field1; – mandatory multidimensional array with predefined dimensions 1 x 2 x 3. Returns primitives.
  • @D(1 | 2 | 3) int[] field1; – multidimensional array with predefined dimensions 1 x 2. Returns arrays of predefined length 3.
  • @D(1 | 2 | -3) int[] field1; – multidimensional array with predefined dimensions 1 x 2. Returns arrays of equal, variable length (1 to 3).
  • @D(1 | 2 | ~3) int[] field1; – multidimensional array with predefined dimensions 1 x 2. Returns arrays of different, variable lengths (1 to 3).
  • @A @D(1 | 2 | 3) byte field – mandatory multidimensional array with predefined dimensions 1 x 2 x 3. Returns primitives whose values are unevenly distributed upward.
  • @A_ @D(1 | 2 | 3) byte field – optional multidimensional array with predefined dimensions 1 x 2 x 3. When the array is created, all the necessary space is allocated. Returns primitives whose values are unevenly distributed upward.
  • @A(337) String field – returns a string with a maximum length of 337 two-byte characters. (The @V annotation denotes a string of one-byte characters.)
  • @V String[] field – returns strings of single-byte characters; each string is up to 127 characters long.
  • @X_(3 / 45) @D(12) byte[] field – optional field; returns an array of predefined length 12. The array values lie in the given range, unevenly distributed in both directions relative to the middle of the range.
  • @D(-45) int[] field – optional field; returns an array with a length from 1 to 45.
  • @B(3) byte field – mandatory bit field, 3 bits long.
  • @B_(12 | 67) byte field – optional bit field. Its length in bits is calculated from the given range of allowed values.
  • @D_(1 | -2 | -3) int field1; – multidimensional array with a predefined first dimension; the other dimensions are variable. Space for the data, within the maximum dimensions, is allocated only as data is added to the array. Returns primitives.
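Two pieces of arithmetic sit behind these declarations. For fixed dimensions such as @D(1 | 2 | 3), the element count is the product of the dimensions and, assuming a row-major layout (the storage order is not documented here), element [i][j][k] sits at flat position (i * 2 + j) * 3 + k. For a ranged bit field such as @B_(12 | 67), the 56 distinct values need ceil(log2 56) = 6 bits, presumably sent as value - 12. LayoutMath and its methods are hypothetical names.

```java
class LayoutMath {
    // ASSUMPTION: a fixed-size @D(d1 | d2 | ...) array is stored row-major in one flat block.
    static int flatIndex(int[] dims, int[] idx) {
        int flat = 0;
        for (int k = 0; k < dims.length; k++) {
            if (idx[k] < 0 || idx[k] >= dims[k]) {
                throw new IndexOutOfBoundsException("index " + idx[k] + " in dimension " + k);
            }
            flat = flat * dims[k] + idx[k];
        }
        return flat;
    }

    // Number of elements the dense array allocates in advance.
    static int totalSize(int[] dims) {
        int size = 1;
        for (int d : dims) size *= d;
        return size;
    }

    // Minimal number of bits that can hold every value of min..max,
    // as needed for a bit field declared with a range such as @B_(12 | 67).
    static int bitsForRange(long min, long max) {
        long count = max - min + 1; // number of distinct values
        return 64 - Long.numberOfLeadingZeros(count - 1);
    }

    public static void main(String[] args) {
        int[] dims = {1, 2, 3}; // @D(1 | 2 | 3)
        System.out.println(totalSize(dims) + " elements, [0][1][2] -> " + flatIndex(dims, new int[] {0, 1, 2}));
        System.out.println(bitsForRange(12, 67) + " bits");
    }
}
```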

In addition to optimizing traffic, BlackBox takes into account the topology and quality of the data transmission channels and generates the appropriate source code. The communication channels between hosts are described with top-level class declarations. Each declaration contains:

  1. the type of exchange protocol;
  2. the two connected interfaces through which the devices interact over this channel.

  • The SimpleProtocol type means that packet data is transmitted directly, without any transformation. Use it to transfer data over transport protocols with built-in error protection, such as TCP/IP, over local, low-noise, high-quality channels such as synchronous SPI / I2C buses, or simply to dump data from memory to a file for later restoration.
  • For unreliable, noisy channels with a high probability of errors, such as a UART over a radio link, the improved, noise-protected AdvancedProtocol type is used. It uses CRC16 and byte (0x55) stuffing framing for fast recovery after an error.
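The integrity check of AdvancedProtocol can be sketched with CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), one common CRC16 variant. Which variant BlackBox actually uses is an assumption here, and the 0x55 byte-stuffing framing is left out because its rules are not described above. Crc16, seal, and verify are hypothetical names: the sender appends the checksum, and the receiver recomputes it to reject damaged packets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Crc16 {
    // CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection,
    // no final XOR. ASSUMPTION: the text only says "CRC16", not which variant.
    static int compute(byte[] data, int offset, int length) {
        int crc = 0xFFFF;
        for (int i = offset; i < offset + length; i++) {
            crc ^= (data[i] & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Sender side: append the CRC (big-endian) to the payload.
    static byte[] seal(byte[] payload) {
        byte[] framed = Arrays.copyOf(payload, payload.length + 2);
        int crc = compute(payload, 0, payload.length);
        framed[payload.length] = (byte) (crc >>> 8);
        framed[payload.length + 1] = (byte) crc;
        return framed;
    }

    // Receiver side: true when the trailing CRC matches the payload before it.
    static boolean verify(byte[] framed) {
        if (framed.length < 2) return false;
        int n = framed.length - 2;
        int expected = ((framed[n] & 0xFF) << 8) | (framed[n + 1] & 0xFF);
        return compute(framed, 0, n) == expected;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("0x%04X%n", compute(check, 0, check.length)); // prints 0x29B1, the standard check value
        byte[] packet = seal(new byte[] {1, 2, 3});
        packet[1] ^= 0x10; // simulate a bit error on the channel
        System.out.println(verify(packet));
    }
}
```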

Put together, a complete description file looks like the following:

You can download LedBlinkProject.java and use it as a template.