Benchmarks are hard.

Repeatedly calling the same function in a tight loop will lead to the instruction cache staying hot and branches being highly predictable. That’s not representative of real world access patterns. It’s also hard to write a nonbiased benchmark. I wrote quickle, naturally whatever benchmark I publish it’s going to perform well in.

Even so, people like to see benchmarks. I’ve tried to be as nonbiased as I can be, and the results hopefully indicate a few tradeoffs you make when you choose different serialization formats. I encourage you to write your own benchmarks before making these decisions.

Here we show a simple benchmark serializing some structured data. The data we’re serializing has the following schema (defined here using quickle.Struct types):

import quickle
from typing import List, Optional

class Address(quickle.Struct):
    street: str
    state: str
    zip: int

class Person(quickle.Struct):
    first: str
    last: str
    age: int
    addresses: Optional[List[Address]] = None
    telephone: Optional[str] = None
    email: Optional[str] = None

The libraries we’re benchmarking are the following:

Each benchmark creates one or more instances of a Person message, and serializes it/deserializes it in a loop. The full benchmark code can be found here.

Benchmark - 1 Object

Some workflows involve sending around very small messages. Here the overhead per function call dominates (parsing of options, allocating temporary buffers, etc…). Libraries like quickle and msgpack, where internal structures are allocated once and can be reused will generally perform better here than libraries like pickle, where each call needs to allocate some temporary objects.


You can use the radio buttons on the bottom to sort by total roundtrip time, dumps (serialization) time, loads (deserialization) time, or serialized message size.

From the chart above, you can see that quickle structs is the fastest method for both serialization and deserialization. It also results in the second smallest message size (behind pyrobuf). This makes sense, struct types don’t need to serialize the fields in each message (things like first, last, …), only the values, so there’s less data to send around. Since python is dynamic, each object serialized requires a few pointer chases, so serializing fewer objects results in faster and smaller messages.

I’m actually surprised at how much overhead pyrobuf has (the actual protobuf encoding should be pretty efficient), I suspect there’s some optimizations that could still be done there.

That said, all of these methods serialize/deserialize pretty quickly relative to other python operations, so unless you’re counting every microsecond your choice here probably doesn’t matter that much.

Benchmark - 1000 Objects

Here we serialize a list of 1000 Person objects. There’s a lot more data here, so the per-call overhead will no longer dominate, and we’re now measuring the efficiency of the encoding/decoding.

As with before quickle structs and quickle both perform well here. What’s interesting is that msgpack and orjson have now moved to the back for deserialization time.

The reason for this is memoization. Since each message here is structured (all dicts have the same keys), msgpack and orjson are serializing the same strings multiple times. In contrast, quickle and pickle both support memoization - identical objects in a message will only be serialized once, and then referenced later on. This results in smaller messages and faster deserialization times. For messages without repeat objects, memoization is an added cost you don’t need. But as soon as you get more than a handful of repeat objects, the performance win becomes important.

Note that quickle structs, pickle tuples, and pyrobuf don’t require memoization to be efficient here, as the repeated field names aren’t serialized as part of the message.

Benchmark - 10,000 Objects

Here we run the same benchmark as before, but 10,000 Person objects.

Like the 1000 object benchmark, the cost of serializing/deserializing repeated strings dominate for the orjson and msgpack benchmarks.