A bit of explanation … protobufs are heavily used at Google for both transmitting and storing data, and the encoding is about as lightweight as a serialization format can be. For example, if you wanted to send data like this (in JSON):
{"person_name": {"first": "Freddy", "last": "Flintstone"}, "number": 1000}
there would be a description of this message (in a .proto file):
syntax = "proto3";
message PersonAndNumber {
  message PersonName {
    string last = 3;
    string first = 4;
  }
  PersonName person_name = 1;
  int32 number = 2;
}
and the message would be serialized somewhat like this:
1(20)4(6)Freddy3(10)Flintstone2(1000)
The 1, 2, 3, 4 are the field numbers (*), and the items in parentheses are either lengths (for strings and nested messages) or numeric values.
The actual encoding is a bit different (it’s more compact and not human-readable), but follows the same idea.
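To make "a bit different" concrete, here is a rough sketch of the real wire format for this one message, written by hand in plain Python rather than with the protobuf library. The helper names (encode_varint, tag, encode_len_field, encode_int_field) are my own, and the sketch only covers non-negative integers and length-delimited fields, which is all this example needs.

```python
VARINT, LEN = 0, 2  # the two wire types this example uses


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a varint: 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """A field's key is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Strings and nested messages: key, then byte length, then the bytes."""
    return tag(field_number, LEN) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Integers: key, then the value as a varint."""
    return tag(field_number, VARINT) + encode_varint(value)


# Field numbers follow the .proto in this post: first=4, last=3,
# person_name=1, number=2.
name = encode_len_field(4, b"Freddy") + encode_len_field(3, b"Flintstone")
message = encode_len_field(1, name) + encode_int_field(2, 1000)

print(len(name))         # 20, the "(20)" in the readable form
print(message.hex(" "))
# 0a 14 22 06 46 72 65 64 64 79 1a 0a 46 6c 69 6e 74 73 74 6f 6e 65 10 e8 07
```

The whole message comes to 25 bytes, against roughly 75 characters for the JSON version.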
The ordering of fields is undefined; this is also a correct serialization:
2(1000)1(20)3(10)Flintstone4(6)Freddy
Notice that there’s no metadata in the message: both sender and receiver have to agree in advance that they’re communicating via a PersonAndNumber message.
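Both points (field order doesn't matter, and the bytes carry no names or types) can be seen with a small hand-written decoder. This is only a sketch: read_varint, decode_fields and decode_person_and_number are names I made up, and it handles just the two wire types used in this example. Note that the decoder has to be told that field 1 is a nested message; nothing in the bytes says so.

```python
def read_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def decode_fields(buf: bytes) -> dict:
    """Return {field_number: raw value} for varint and length-delimited fields."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string, bytes or nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[field_number] = value
    return fields


def decode_person_and_number(buf: bytes) -> dict:
    """Apply what the .proto tells us: 1 is a nested message, 3/4 are strings."""
    outer = decode_fields(buf)
    inner = decode_fields(outer[1])
    return {"first": inner[4].decode(), "last": inner[3].decode(), "number": outer[2]}


first = b"\x22\x06" + b"Freddy"      # field 4, length 6
last = b"\x1a\x0a" + b"Flintstone"   # field 3, length 10
number = b"\x10\xe8\x07"             # field 2, value 1000

order_a = b"\x0a\x14" + first + last + number
order_b = number + b"\x0a\x14" + last + first

print(decode_fields(order_a))  # only field numbers and raw values, no names
print(decode_person_and_number(order_a) == decode_person_and_number(order_b))
```

The first print shows what a receiver without the .proto can recover: numbers and bytes, but no field names and no way to tell a string from a nested message.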
The protobuf compiler (protoc) generates code so that, for example, in Python you can write:
import PersonAndNumber_pb2  # generated by protoc from the .proto file
p_a_n = PersonAndNumber_pb2.PersonAndNumber()
p_a_n.ParseFromString(receive_serialized_msg())  # fill the message from the received bytes
print(f"{p_a_n.person_name.first} {p_a_n.person_name.last}'s number is {p_a_n.number}")
and similarly, the message can be created something like this (the details are slightly different):
p_a_n = PersonAndNumber_pb2.PersonAndNumber()
name = p_a_n.person_name  # the nested message; setting its fields also sets it in p_a_n
name.first = "Freddy"
name.last = "Flintstone"
p_a_n.number = 1000
send_serialized_msg(p_a_n.SerializeToString())
So, in C++, Java, or Python, the protobuf contents can be handled almost like a native data type, yet the data is transmitted in a very compact form, and the two ends of the communication can be written in whatever languages you want.
Also, it’s easy to add new fields in a message definition (.proto file) - older programs will just ignore the new fields. (That’s why the field numbers are included in the message; they also allow leaving out fields with default values.)
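Here is a rough illustration of why field numbers make this work. It is a hand-rolled reader, not the real library, and KNOWN, DEFAULTS and parse_known are names I invented for it. The "old" program knows only fields 1 and 2; a newer sender adds a hypothetical field 5. The old reader can step over field 5 because the key tells it the wire type, and so how many bytes to skip.

```python
def read_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


# What the "old" program was compiled with (field 5 did not exist yet).
KNOWN = {1: "person_name", 2: "number"}
DEFAULTS = {"person_name": b"", "number": 0}


def parse_known(buf: bytes) -> dict:
    """Start from defaults, fill in known fields, skip unknown field numbers."""
    result = dict(DEFAULTS)
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in KNOWN:
            result[KNOWN[field_number]] = value
        # otherwise: a field this program has never heard of, already skipped
    return result


number = b"\x10\xe8\x07"                        # field 2, value 1000
new_field = bytes([0x2A, 7]) + b"Bedrock"       # hypothetical field 5, a string

print(parse_known(number + new_field))  # field 5 is skipped
print(parse_known(b""))                 # nothing sent: every field is its default
```

The second call shows the other half of the parenthetical: a field left out of the message simply reads back as its default value.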
(*) A real .proto file would normally number the fields starting from 1 in each message; I’ve given every field a different number to make it easier to see how the message is serialized.