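Avro is a schema-based serialization utility, so it takes schemas as input. Although many schema formats exist, Avro follows its own conventions for defining schemas. A schema describes the type of the data (typically a record), the namespace in which the record lives, the name of the record, and the fields of the record with their data types.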
Using these schemas, you can store serialized values in a compact binary format. The values are stored without any per-value metadata such as field names or type tags, because the schema already supplies that information.
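To make the space saving concrete, here is a minimal sketch that hand-encodes one record the way Avro's binary encoding does: integers as zig-zag variable-length values, strings as a length followed by UTF-8 bytes, and record fields simply concatenated in schema order. The record layout (a string field followed by an int field) and the helper names are assumptions chosen for illustration; a real application would use an Avro library instead.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag variable-length value."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(text: str) -> bytes:
    """Encode a string as its byte length followed by its UTF-8 bytes."""
    data = text.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_employee(name: str, age: int) -> bytes:
    """Hypothetical record: a string field followed by an int field."""
    # No field names or type tags are written; the schema supplies them.
    return encode_string(name) + zigzag_varint(age)


encoded = encode_employee("Ann", 30)
as_json = json.dumps({"name": "Ann", "age": 30}).encode("utf-8")
print(len(encoded), len(as_json))
```

The binary form is shorter than the JSON text because the field names live in the schema rather than in every value.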
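An Avro schema is written in JavaScript Object Notation (JSON), a lightweight text-based data interchange format. A schema takes one of the following forms: a JSON string that names a type, a JSON object that defines a type along with its attributes, or a JSON array that represents a union of types.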
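The Avro specification allows a schema to be a JSON string, a JSON object, or a JSON array. As a rough sketch, the following snippet parses a schema text with Python's standard json module and reports which of the three forms it uses. The function name and the labels it returns are hypothetical.

```python
import json


def schema_form(schema_text: str) -> str:
    """Classify a schema by the JSON form it uses (labels are illustrative)."""
    parsed = json.loads(schema_text)
    if isinstance(parsed, str):
        return "named type"       # e.g. "string"
    if isinstance(parsed, dict):
        return "type definition"  # e.g. {"type": "array", ...}
    if isinstance(parsed, list):
        return "union"            # e.g. ["int", "null"]
    raise ValueError("not a valid schema form")


for text in ['"string"', '{"type": "array", "items": "int"}', '["int", "null"]']:
    print(text, "is a", schema_form(text))
```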
Example − The following example shows a schema that defines a record in the namespace Tutorialspoint, with the name Employee and the fields name and age.
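As a sketch, the schema described above could be written as follows and inspected with Python's standard json module. The field types (string and int) are assumptions, since the text names the fields but not their types.

```python
import json

# Schema text matching the description above. The field types
# ("string" and "int") are assumed for illustration.
EMPLOYEE_SCHEMA = """
{
  "type": "record",
  "namespace": "Tutorialspoint",
  "name": "Employee",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
"""

schema = json.loads(EMPLOYEE_SCHEMA)

# The full name of a record is its namespace joined to its name.
full_name = schema["namespace"] + "." + schema["name"]
field_names = [field["name"] for field in schema["fields"]]

print(full_name)
print(field_names)
```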
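In this example, the record schema has four attributes: type (record), namespace (Tutorialspoint), name (Employee), and fields (the list of fields, each with its own name and type).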
Avro schemas support primitive data types as well as complex data types. The following table describes the primitive data types of Avro −
Data type | Description |
---|---|
null | Null is a type having no value. |
boolean | A binary value (true or false). |
int | 32-bit signed integer. |
long | 64-bit signed integer. |
float | Single precision (32-bit) IEEE 754 floating-point number. |
double | Double precision (64-bit) IEEE 754 floating-point number. |
bytes | Sequence of 8-bit unsigned bytes. |
string | Unicode character sequence. |
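
The ranges in the table can be illustrated with a small check of Python values against primitive type names. This is only a sketch: the function name is hypothetical, it also covers boolean (which the Avro specification lists as a primitive type), and it does not verify that a float fits in single precision.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1    # 32-bit signed range
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1  # 64-bit signed range


def matches_primitive(avro_type: str, value) -> bool:
    """Return True if a Python value fits the named Avro primitive type."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    if avro_type == "int":
        return is_integer and INT_MIN <= value <= INT_MAX
    if avro_type == "long":
        return is_integer and LONG_MIN <= value <= LONG_MAX
    if avro_type in ("float", "double"):
        return isinstance(value, float)  # precision is not checked here
    if avro_type == "bytes":
        return isinstance(value, bytes)
    if avro_type == "string":
        return isinstance(value, str)
    raise ValueError(f"unknown primitive type: {avro_type}")


print(matches_primitive("int", 2**31))   # too large for a 32-bit int
print(matches_primitive("long", 2**31))  # fits in a 64-bit long
```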
Along with primitive data types, Avro provides six complex data types: records, enums, arrays, maps, unions, and fixed.
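A record in Avro is a collection of named fields. It supports the following attributes: name (the name of the record), namespace (a string that qualifies the name), type (always record), and fields (a JSON array listing the fields, each with its own name and type).
Given below is an example of a record.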
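A minimal sketch of a record schema, together with a helper that reports which declared fields a datum lacks. The Student record, its namespace, its fields, and the helper name are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical record schema with two fields.
RECORD_SCHEMA = json.loads("""
{
  "type": "record",
  "namespace": "example",
  "name": "Student",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "marks", "type": "int"}
  ]
}
""")


def missing_fields(schema: dict, datum: dict) -> list:
    """Return the declared field names absent from datum, in schema order."""
    return [f["name"] for f in schema["fields"] if f["name"] not in datum]


print(missing_fields(RECORD_SCHEMA, {"name": "Asha"}))
print(missing_fields(RECORD_SCHEMA, {"name": "Asha", "marks": 91}))
```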
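An enumeration is a fixed list of named symbols. An Avro enumeration supports the following attributes: name (the name of the enumeration), namespace (a string that qualifies the name), and symbols (a JSON array of the allowed names).
Given below is an example of an enumeration.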
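A minimal sketch of an enumeration schema. The name Numbers, the namespace data, and the symbols are hypothetical. In Avro's binary encoding an enum value is written as the zero-based position of its symbol in the schema, which the helper below looks up.

```python
import json

# Hypothetical enumeration schema with four symbols.
ENUM_SCHEMA = json.loads("""
{
  "type": "enum",
  "name": "Numbers",
  "namespace": "data",
  "symbols": ["ONE", "TWO", "THREE", "FOUR"]
}
""")


def symbol_index(schema: dict, symbol: str) -> int:
    """Return the zero-based position of a symbol; raise ValueError if unknown."""
    return schema["symbols"].index(symbol)


print(symbol_index(ENUM_SCHEMA, "THREE"))
```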
The array data type defines a field with a single attribute, items, which specifies the type of the elements in the array.
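For illustration, here is a sketch of an array schema whose items are strings, with a hypothetical helper that checks a Python list against it. The helper handles only string items; other item types are out of scope for this sketch.

```python
import json

# An array schema whose elements are strings.
ARRAY_SCHEMA = json.loads('{"type": "array", "items": "string"}')


def is_valid_array(schema: dict, value) -> bool:
    """Check a Python list against an array schema with string items."""
    if schema["items"] != "string":
        raise NotImplementedError("this sketch only handles string items")
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


print(is_valid_array(ARRAY_SCHEMA, ["red", "green"]))
print(is_valid_array(ARRAY_SCHEMA, ["red", 7]))
```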
The map data type is a collection of key-value pairs. The keys of an Avro map must be strings. The values attribute of a map schema specifies the data type of the map's values.
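A sketch of a map schema whose values are ints, with a hypothetical helper that checks a Python dict against it. Keys must be strings, and each value must fit in a 32-bit signed integer; other value types are out of scope for this sketch.

```python
import json

# A map schema whose values are 32-bit ints; keys are always strings.
MAP_SCHEMA = json.loads('{"type": "map", "values": "int"}')


def is_valid_map(schema: dict, value) -> bool:
    """Check a Python dict against a map schema with int values."""
    if schema["values"] != "int":
        raise NotImplementedError("this sketch only handles int values")
    if not isinstance(value, dict):
        return False
    for key, item in value.items():
        if not isinstance(key, str):
            return False
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(item, int) or isinstance(item, bool):
            return False
        if not -2**31 <= item <= 2**31 - 1:
            return False
    return True


print(is_valid_map(MAP_SCHEMA, {"apples": 3, "pears": 5}))
print(is_valid_map(MAP_SCHEMA, {1: 3}))
```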
A union data type is used whenever a field can hold a value of more than one data type. Unions are represented as JSON arrays. For example, a field that can be either an int or null is represented by the union ["int", "null"].
Given below is an example document using unions −
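A minimal sketch of a record that uses a union. The field name experience is hypothetical. The helper returns the zero-based index of the union branch that matches a value, and it handles only the int and null branches used here.

```python
import json

# Record schema whose second field may be an int or null.
# The field name "experience" is assumed for illustration.
SCHEMA = json.loads("""
{
  "type": "record",
  "namespace": "Tutorialspoint",
  "name": "Employee",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "experience", "type": ["int", "null"]}
  ]
}
""")


def union_branch(union: list, value) -> int:
    """Return the index of the first union branch that matches value."""
    for index, branch in enumerate(union):
        if branch == "null" and value is None:
            return index
        # bool is a subclass of int in Python, so exclude it explicitly.
        if branch == "int" and isinstance(value, int) and not isinstance(value, bool):
            return index
    raise ValueError("value matches no branch of the union")


experience_type = SCHEMA["fields"][1]["type"]
print(union_branch(experience_type, 5))
print(union_branch(experience_type, None))
```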
The fixed data type declares a fixed-size field that can be used for storing binary data. It has name and size as attributes: name holds the name of the field, and size holds its size in bytes.
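A sketch of a fixed schema for a 16-byte value, with a hypothetical helper that checks whether a bytes object has exactly the declared size. The name md5 and the size 16 are illustrative choices.

```python
import json

# A fixed schema: every value is exactly 16 bytes long.
FIXED_SCHEMA = json.loads('{"type": "fixed", "name": "md5", "size": 16}')


def fits_fixed(schema: dict, value) -> bool:
    """Return True if value is a bytes object of exactly the declared size."""
    return isinstance(value, bytes) and len(value) == schema["size"]


print(fits_fixed(FIXED_SCHEMA, bytes(16)))  # sixteen zero bytes
print(fits_fixed(FIXED_SCHEMA, b"abc"))     # too short
```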