Benchmarks
Note
Benchmarks are hard.
Repeatedly calling the same function in a tight loop keeps the instruction cache hot and branches highly predictable. That’s not representative of real world access patterns. It’s also hard to write an unbiased benchmark: I wrote quickle, so naturally it will perform well in whatever benchmark I publish.
Even so, people like to see benchmarks. I’ve tried to be as unbiased as I can, and the results hopefully indicate a few tradeoffs you make when you choose different serialization formats. I encourage you to write your own benchmarks before making these decisions.
Here we show a simple benchmark serializing some structured data. The data
we’re serializing has the following schema (defined here using quickle.Struct
types):
import quickle
from typing import List, Optional


class Address(quickle.Struct):
    street: str
    state: str
    zip: int


class Person(quickle.Struct):
    first: str
    last: str
    age: int
    addresses: Optional[List[Address]] = None
    telephone: Optional[str] = None
    email: Optional[str] = None
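For context, a round trip with these types looks roughly like the sketch below. It assumes quickle’s Encoder/Decoder take a registry of Struct types (matching on both ends); the sample values are made up for illustration:

# Minimal round-trip sketch (sample values are made up)
enc = quickle.Encoder(registry=[Address, Person])
dec = quickle.Decoder(registry=[Address, Person])

person = Person(
    first="alice",
    last="smith",
    age=30,
    addresses=[Address(street="123 Main St", state="NY", zip=10001)],
)

data = enc.dumps(person)  # serialize to bytes
assert dec.loads(data) == person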
The libraries we’re benchmarking are the following:
msgpack
- msgpack with dict message types
orjson
- orjson with dict message types
pyrobuf
- pyrobuf with protobuf message types
pickle
- pickle with dict message types
pickle tuples
- pickle with collections.namedtuple message types
quickle
- quickle with dict message types
quickle structs
- quickle with quickle.Struct message types
Each benchmark creates one or more instances of a Person message, and serializes/deserializes it in a loop. The full benchmark code can be found here.
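A simplified version of such a loop might look like the following. This is an illustrative sketch, not the actual benchmark code; the iteration count and timing method are assumptions:

import time

import quickle

enc = quickle.Encoder(registry=[Address, Person])
dec = quickle.Decoder(registry=[Address, Person])

msg = Person(first="alice", last="smith", age=30)
n = 10_000

# Time serialization in a tight loop
start = time.perf_counter()
for _ in range(n):
    data = enc.dumps(msg)
dumps_us = (time.perf_counter() - start) / n * 1e6

# Time deserialization in a tight loop
start = time.perf_counter()
for _ in range(n):
    dec.loads(data)
loads_us = (time.perf_counter() - start) / n * 1e6

print(f"dumps: {dumps_us:.2f} µs, loads: {loads_us:.2f} µs")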
Benchmark - 1 Object
Some workflows involve sending around very small messages. Here the overhead per function call dominates (parsing of options, allocating temporary buffers, etc.). Libraries like quickle and msgpack, where internal structures are allocated once and can be reused, will generally perform better here than libraries like pickle, where each call needs to allocate some temporary objects.
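To make the distinction concrete, compare the two calling patterns below. The savings depend on the workload, so this is just a sketch of the shape of each API (msgpack has an analogous reusable msgpack.Packer):

import pickle

import quickle

msg = {"first": "alice", "last": "smith", "age": 30}

# quickle: the Encoder's internal buffers are allocated once and
# reused, so each dumps() call skips most per-call setup.
enc = quickle.Encoder()
data = enc.dumps(msg)

# pickle: every dumps() call builds a fresh Pickler, memo table,
# and output buffer from scratch.
data = pickle.dumps(msg)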
Note
You can use the radio buttons on the bottom to sort by total roundtrip time, dumps (serialization) time, loads (deserialization) time, or serialized message size.
From the chart above, you can see that quickle structs is the fastest method for both serialization and deserialization. It also results in the second smallest message size (behind pyrobuf). This makes sense: struct types don’t need to serialize the fields in each message (things like first, last, …), only the values, so there’s less data to send around. Since Python is dynamic, each object serialized requires a few pointer chases, so serializing fewer objects results in faster and smaller messages.
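One way to see the effect of dropping field names is to compare payload sizes directly. The exact numbers will vary by version and platform; this is an illustrative sketch, not a measured result:

import quickle

enc = quickle.Encoder(registry=[Address, Person])

# The struct payload carries only the values; the dict payload also
# carries the key strings "first", "last", and "age".
as_struct = enc.dumps(Person(first="alice", last="smith", age=30))
as_dict = enc.dumps({"first": "alice", "last": "smith", "age": 30})
print(len(as_struct), len(as_dict))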
I’m actually surprised at how much overhead pyrobuf has (the actual protobuf encoding should be pretty efficient); I suspect there are some optimizations that could still be done there.
That said, all of these methods serialize/deserialize pretty quickly relative to other Python operations, so unless you’re counting every microsecond your choice here probably doesn’t matter that much.
Benchmark - 1000 Objects
Here we serialize a list of 1000 Person objects. There’s a lot more data here, so the per-call overhead will no longer dominate, and we’re now measuring the efficiency of the encoding/decoding.
As before, quickle structs and quickle both perform well here. What’s interesting is that msgpack and orjson have now moved to the back for deserialization time.
The reason for this is memoization. Since each message here is structured (all dicts have the same keys), msgpack and orjson are serializing the same strings multiple times. In contrast, quickle and pickle both support memoization: identical objects in a message will only be serialized once, and then referenced later on. This results in smaller messages and faster deserialization times. For messages without repeat objects, memoization is an added cost you don’t need. But as soon as you get more than a handful of repeat objects, the performance win becomes important.
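You can observe memoization directly with pickle. In the toy comparison below (illustrative only, not part of the benchmark), every row’s dict shares the same interned key string, so pickle stores it once and emits short memo references afterwards, while JSON spells the key out in every row:

import json
import pickle

# 1000 rows whose dicts all reference the same key string object
rows = [{"telephone": str(i)} for i in range(1000)]

memoized = pickle.dumps(rows)         # key pickled once, then memo refs
repeated = json.dumps(rows).encode()  # key repeated in every row

print(len(memoized), len(repeated))   # the memoized payload is smaller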
Note that quickle structs, pickle tuples, and pyrobuf don’t require memoization to be efficient here, as the repeated field names aren’t serialized as part of the message.
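The pickle tuples case illustrates why: a namedtuple pickles as a reference to its class plus a tuple of positional values, so the field names never appear per record. A small sketch, with made-up values:

import pickle
from collections import namedtuple

PersonTuple = namedtuple("PersonTuple", ["first", "last", "age"])
rows = [PersonTuple(str(i), "smith", i) for i in range(1000)]

# Each record serializes as (class reference, values); the field names
# "first", "last", and "age" live on the class, not in the payload.
print(len(pickle.dumps(rows)))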
Benchmark - 10,000 Objects
Here we run the same benchmark as before, but with 10,000 Person objects. Like the 1000 object benchmark, the cost of serializing/deserializing repeated strings dominates for the orjson and msgpack benchmarks.