Does anyone care about protobufs?

A bit of explanation … protobufs are heavily used at Google for both transmitting and storing data, and are about as lightweight as possible. For example, if you wanted to send data like this (in JSON):

{"person_name": {"first": "Freddy", "last": "Flintstone"}, "number": 1000}

there would be a description of this message (in a .proto file):

message PersonAndNumber {
  message PersonName {
    string last = 3;
    string first = 4;
  }
  PersonName person_name = 1;
  int32 number = 2;
}

and the message would be serialized somewhat like this:
1(20)4(6)Freddy3(10)Flintstone2(1000).
The 1,2,3,4 are the field numbers(*) and the items in parentheses are lengths or numbers.
The actual encoding is a bit different (it’s more compact and not human-readable), but follows the same idea.
The ordering of fields is undefined; this is also a correct serialization:
2(1000)1(20)3(10)Flintstone4(6)Freddy.
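
To make the "actual encoding" a bit more concrete, here is a minimal sketch in plain Python (no protobuf library) of how those fields end up as bytes. It assumes only the two wire types this example needs (varint for the integer, length-delimited for strings and nested messages) and non-negative integers; the helper names are mine, not part of any protobuf API.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field's key is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Wire type 2: used for strings, bytes and nested messages."""
    return encode_tag(field_number, 2) + encode_varint(len(payload)) + payload


def encode_string(field_number: int, text: str) -> bytes:
    return encode_length_delimited(field_number, text.encode("utf-8"))


def encode_int(field_number: int, value: int) -> bytes:
    """Wire type 0: a varint value."""
    return encode_tag(field_number, 0) + encode_varint(value)


# Same field order as the first illustration: 1(20)4(6)Freddy3(10)Flintstone2(1000)
person_name = encode_string(4, "Freddy") + encode_string(3, "Flintstone")
message = encode_length_delimited(1, person_name) + encode_int(2, 1000)

print(len(person_name))  # length of the nested PersonName payload
print(message.hex(" "))  # the whole message as hex bytes
```

The nested payload comes out at 20 bytes, which is where the (20) in the illustration comes from: one key byte and one length byte in front of each of the two strings.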

Notice that the message carries no field names or type information, only field numbers and values: both sender and receiver have to agree in advance that they’re communicating via a PersonAndNumber message.
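
Here is a rough sketch of what a receiver sees if it does not have the .proto file. It is plain Python with hypothetical helper names, handles only the varint and length-delimited wire types, and starts from the bytes I would expect for the example message.

```python
def read_varint(buf, pos):
    """Return (value, next_position) for the varint starting at buf[pos]."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # no continuation bit: this was the last byte
            return value, pos
        shift += 7


def decode_fields(buf):
    """Split a serialized message into (field_number, raw_value) pairs."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, value))
    return fields


# Assumed bytes for the example message (field 1 = nested name, field 2 = 1000).
wire = (bytes.fromhex("0a14")
        + bytes.fromhex("2206") + b"Freddy"
        + bytes.fromhex("1a0a") + b"Flintstone"
        + bytes.fromhex("10e807"))

outer = decode_fields(wire)
print(outer)  # field numbers and raw values only, no names

# Only someone who knows field 1 is a nested message would decode it further.
inner = decode_fields(outer[0][1])
print(inner)
```

Without the schema, field 1 is just 20 opaque bytes: nothing on the wire says whether it is a string, raw bytes or a nested message, and nothing says that field 2 is called "number".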

The protobuf compiler (protoc) generates code so that, for example, in Python you can write:

import PersonAndNumber_pb2  # generated by protoc from the .proto file
p_a_n = PersonAndNumber_pb2.PersonAndNumber()
p_a_n.ParseFromString(receive_serialized_msg())  # expects the serialized bytes
print(f"{p_a_n.person_name.first} {p_a_n.person_name.last}'s number is {p_a_n.number}")

and similarly, the message can be created like this:

p_a_n = PersonAndNumber_pb2.PersonAndNumber()
name = p_a_n.person_name  # the nested message; setting its fields fills it in
name.first = "Freddy"
name.last = "Flintstone"
p_a_n.number = 1000
send_serialized_msg(p_a_n.SerializeToString())

So, in C++, Java, or Python, the protobuf contents can be handled almost like a native data type, but the data can be transmitted in a very compact form, and the two ends of the communication can be written in whatever languages you want.

Also, it’s easy to add new fields in a message definition (.proto file) - older programs will just ignore the new fields. (That’s why the field numbers are included in the message; they also allow leaving out fields with default values.)
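
As a rough illustration of why that works, here is a sketch of an "old" reader that knows only field 2 (number) and receives a message from a newer sender that also sets a field 5. Field 5 and all the helper names are made up for this example; this is plain Python, not the protobuf library, and it handles only the varint and length-delimited wire types.

```python
def read_varint(buf, pos):
    """Return (value, next_position) for the varint starting at buf[pos]."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_known(buf, known_fields):
    """Decode buf, keeping only fields listed in known_fields (number -> name)."""
    result = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        name = known_fields.get(field_number)
        if name is not None:  # unknown field numbers are skipped
            result[name] = value
    return result


OLD_SCHEMA = {2: "number"}  # the old program only knows about field 2

# A newer sender adds a hypothetical string field 5 after number = 1000.
new_message = bytes.fromhex("10e807") + bytes.fromhex("2a04") + b"Fred"
print(parse_known(new_message, OLD_SCHEMA))

# A field left at its default is simply absent; the reader supplies the default.
empty = parse_known(b"", OLD_SCHEMA)
print(empty.get("number", 0))
```

The wire type in each key tells the reader how many bytes to skip, so it can step over a field it has never heard of. This sketch just drops the unknown field.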

(*) A real .proto file would label the fields from 1 in each message; I’ve given every field a different number to make it easier to understand how the message is serialized.
