kchristidis revised this gist Jun 15, 2017. 1 changed file with 1 addition and 1 deletion.
There doesn't seem to be a good resource online describing the issues with protocol buffers and deterministic serialization (or lack thereof). This is a collection of links on the subject.
> The deterministic serialization is, however, NOT canonical across languages; it is also unstable across different builds with schema changes due to unknown fields.
> Wire format ordering and map iteration ordering of map values is undefined, so you cannot rely on your map items being in a particular order.
[Encoding & Field Order documentation](https://developers.google.com/protocol-buffers/docs/encoding#order):
> While you can use field numbers in any order in a `.proto`, when a message is serialized its known fields should be written sequentially by field number, as in the provided C++, Java, and Python serialization code. This allows parsing code to use optimizations that rely on field numbers being in sequence. However, protocol buffer parsers must be able to parse fields in any order, as not all messages are created by simply serializing an object – for instance, it's sometimes useful to merge two messages by simply concatenating them.
> The undeterministic comes from unknown fields and a new feature protobuf maps. If you can guarantee there are no such fields in your proto, the protobuf library will always serialize other fields ordered by field number and thus should output the same bytes.
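
The point about field-number ordering can be illustrated with a minimal sketch. It uses a hand-rolled toy encoder that handles only varint fields (wire type 0), not the protobuf library; the names `encode_varint` and `encode_varint_fields` are hypothetical. Writing fields sorted by field number gives the same bytes however the caller supplied them, while writing them in iteration order (roughly what happens with map entries) does not.

```python
def encode_varint(value):
    """Encode a non-negative int as a minimal base-128 varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_varint_fields(fields, sort_by_number=True):
    """Encode (field_number, value) pairs, all assumed to be wire type 0.

    Hypothetical helper for illustration only; a real message has more
    wire types, unknown fields, and maps.
    """
    items = sorted(fields) if sort_by_number else list(fields)
    out = bytearray()
    for number, value in items:
        out += encode_varint(number << 3)  # key = (field_number << 3) | wire type 0
        out += encode_varint(value)
    return bytes(out)


# Sorted by field number: input order does not matter.
sorted_a = encode_varint_fields([(1, 150), (2, 1)])
sorted_b = encode_varint_fields([(2, 1), (1, 150)])

# Written in iteration order: the bytes depend on that order.
unsorted = encode_varint_fields([(2, 1), (1, 150)], sort_by_number=False)
```

Here `sorted_a` and `sorted_b` are byte-for-byte identical, while `unsorted` carries the same data in different bytes.
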
> In general, the same data will serialize in exactly the same way.
>
> However, this is not guaranteed by the protobuf specifications. For example, the following differences in encoding are allowable and must decode to the same result in all conforming libraries:
>
> - Encoding fields in different order than the tag number order.
> - Encoding packed fields as unpacked.
> - Encoding integers as longer varint byte sequences than needed.
> - Encoding same (non-repeated) field multiple times.
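
Three of the differences listed above (field order, overlong varints, repeated occurrences of a non-repeated field) can be sketched with a toy decoder. This is an illustration under assumptions, not the protobuf library: `decode_varint` and `decode_varint_fields` are hypothetical helpers that understand only wire type 0, and the byte strings are built by hand.

```python
def decode_varint(buf, pos):
    """Decode one base-128 varint from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte
            return result, pos
        shift += 7


def decode_varint_fields(buf):
    """Decode a message made only of varint fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles wire type 0 (varint)")
        value, pos = decode_varint(buf, pos)
        fields[field_number] = value  # non-repeated field: last occurrence wins
    return fields


# Field 1 = 150, field 2 = 1, written four different ways.
canonical = bytes.fromhex("0896011001")     # field-number order, minimal varints
reordered = bytes.fromhex("1001089601")     # field 2 before field 1
overlong = bytes.fromhex("089681001001")    # 150 encoded in three bytes instead of two
duplicated = bytes.fromhex("08070896011001")  # field 1 appears twice (7, then 150)

decoded = [decode_varint_fields(b)
           for b in (canonical, reordered, overlong, duplicated)]
```

All four inputs decode to `{1: 150, 2: 1}`, so comparing or hashing the serialized bytes is not a reliable proxy for comparing the messages.
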
> The main concern that the deterministic serialization isn't canonical is due to the unknown fields. As string and message type share the same wire type, when parsing an unknown string/message type, the parser has no idea whether to recursively canonicalize the unknown field. The cross-language inconsistency is mainly due to the string fields comparison performance, i.e. java/objc uses utf16 encodings which has different orderings than utf8 strings due to surrogate pairs.
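
The surrogate-pair remark can be checked with a small sketch. It assumes the ordering in question is applied to string keys (for example when sorting map entries) and compares two orderings of the same pair of strings: by UTF-8 bytes, and by UTF-16 code units (modeled here as big-endian UTF-16 bytes, which compare the same way). The helper names are hypothetical.

```python
bmp_key = "\uff5e"         # U+FF5E: a single UTF-16 code unit (0xFF5E)
astral_key = "\U0001f600"  # U+1F600: a surrogate pair in UTF-16 (0xD83D 0xDE00)


def utf8_sort(keys):
    """Sort strings by their UTF-8 bytes (same as code point order)."""
    return sorted(keys, key=lambda s: s.encode("utf-8"))


def utf16_sort(keys):
    """Sort strings by their UTF-16 code units, as a UTF-16 runtime would."""
    return sorted(keys, key=lambda s: s.encode("utf-16-be"))


by_utf8 = utf8_sort([astral_key, bmp_key])
by_utf16 = utf16_sort([bmp_key, astral_key])
```

Under UTF-8 the BMP character sorts first, because its lead byte 0xEF is below 0xF0. Under UTF-16 the supplementary character sorts first, because its high surrogate 0xD83D is below 0xFF5E. Two implementations that each sort "deterministically" can therefore still emit such keys in different orders.
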