msgspec

msgspec is a fast and friendly implementation of the MessagePack protocol for Python 3.8+. In addition to serialization/deserialization, it supports runtime message validation using schemas defined via Python’s type annotations.

from typing import Optional, List
import msgspec

# Define a schema for a `User` type
class User(msgspec.Struct):
    name: str
    groups: List[str] = []
    email: Optional[str] = None

# Create a `User` object
alice = User("alice", groups=["admin", "engineering"])

# Serialize `alice` to `bytes` using the MessagePack protocol
serialized_data = msgspec.encode(alice)

# Deserialize and validate the message as a User type
user = msgspec.Decoder(User).decode(serialized_data)

assert user == alice

Highlights

  • msgspec is fast. Benchmarks show it’s among the fastest serialization methods for Python.

  • msgspec is friendly. Through use of Python’s type annotations, messages can be validated during deserialization in a declarative way. msgspec also works well with type-checking tooling like mypy, providing excellent editor integration.

  • msgspec is flexible. Compared with libraries such as msgpack or json, msgspec natively supports a wider range of Python builtin types.

  • msgspec supports “schema evolution”. Messages can be sent between clients with different schemas without error.

Installation

msgspec can be installed via pip (and soon via conda). Note that Python >= 3.8 is required.

pip

pip install msgspec

Usage

For ease of use, msgspec exposes two functions, encode and decode, for serializing and deserializing objects, respectively.

>>> import msgspec
>>> data = msgspec.encode({"hello": "world"})
>>> msgspec.decode(data)
{'hello': 'world'}
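
For intuition about what the encoded bytes look like, here is a minimal sketch that hand-encodes the same dict using only the standard library, following the fixmap and fixstr formats of the MessagePack specification. The helpers pack_fixstr and pack_fixmap are hypothetical names used for illustration; they are not part of msgspec, and they only handle short strings and small string-to-string maps.

```python
def pack_fixstr(text: str) -> bytes:
    """Encode a short string as a MessagePack fixstr (at most 31 UTF-8 bytes)."""
    raw = text.encode("utf-8")
    if len(raw) > 31:
        raise ValueError("fixstr holds at most 31 bytes")
    # The header byte is 0b101xxxxx, where xxxxx is the byte length.
    return bytes([0xA0 | len(raw)]) + raw


def pack_fixmap(mapping: dict) -> bytes:
    """Encode a small str -> str dict as a MessagePack fixmap (at most 15 entries)."""
    if len(mapping) > 15:
        raise ValueError("fixmap holds at most 15 entries")
    # The header byte is 0b1000xxxx, where xxxx is the number of key/value pairs.
    out = bytearray([0x80 | len(mapping)])
    for key, value in mapping.items():
        out += pack_fixstr(key)
        out += pack_fixstr(value)
    return bytes(out)


payload = pack_fixmap({"hello": "world"})
print(payload)  # b'\x81\xa5hello\xa5world'
```

The result is 13 bytes: one map header, then a one-byte header plus five bytes of text for each of the two strings.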

Note that if you’re making multiple calls to encode or decode, it’s more efficient to create an Encoder and Decoder once, and then use Encoder.encode and Decoder.decode.

>>> enc = msgspec.Encoder()
>>> dec = msgspec.Decoder()
>>> data = enc.encode({"hello": "world"})
>>> dec.decode(data)
{'hello': 'world'}

Supported Types

msgspec currently supports serializing/deserializing the following types:

Typed and Untyped Deserialization

By default, both decode and Decoder.decode will deserialize messages without any validation, performing untyped deserialization. MessagePack types are mapped to Python types as follows:

Messages composed of any combination of these will deserialize successfully without any further validation:

>>> b = msgspec.encode([1, "two", b"three"])  # encode a list with mixed types
>>> msgspec.decode(b)  # decodes using default types and no validation
[1, 'two', b'three']
>>> msgspec.Decoder().decode(b)  # likewise for Decoder.decode
[1, 'two', b'three']

If you want to deserialize a MessagePack type to a different Python type, or perform any type checking, you’ll need to specify a deserialization type. This can be passed to either Decoder (recommended) or decode.

For example, say we wanted to deserialize the above as a set instead of a list (the default). We’d do this by specifying the expected message type as a set:

>>> msgspec.Decoder(set).decode(b)  # deserialize as a set
{1, 'two', b'three'}
>>> msgspec.decode(b, type=set)  # can also pass to msgspec.decode
{1, 'two', b'three'}

Nested type specifications are fully supported, and can be used to validate messages at deserialization time. If a message doesn’t match the specified type, an error with an informative message is raised. For example, say we expect the above message to be a set of ints:

>>> from typing import Set
>>> dec = msgspec.Decoder(Set[int])  # define a decoder for a set of ints
>>> dec
Decoder(Set[int])
>>> dec.decode(b)  # Invalid messages raise a nice error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.DecodingError: Error decoding `Set[int]`: expected `int`, got `str`
>>> b2 = msgspec.encode({1, 2, 3})
>>> dec.decode(b2)  # Valid messages deserialize properly
{1, 2, 3}
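
To make the idea of typed deserialization concrete, the following is a rough, standard-library-only sketch of checking already-decoded data against a nested type. The convert function is a hypothetical helper written for this illustration; it is not msgspec's API, and msgspec's actual validation (which happens during decoding) may differ in details and error wording.

```python
from typing import Any, Set, get_args, get_origin


def convert(obj: Any, tp: Any) -> Any:
    """Coerce `obj` to type `tp`, raising TypeError on a mismatch."""
    origin = get_origin(tp) or tp  # Set[int] -> set; plain types are unchanged
    args = get_args(tp)            # Set[int] -> (int,); plain types -> ()
    if origin in (list, set):
        if not isinstance(obj, (list, set, tuple)):
            raise TypeError(
                f"expected `{origin.__name__}`, got `{type(obj).__name__}`"
            )
        item_type = args[0] if args else Any
        # Validate each element, then build the requested container.
        return origin(convert(item, item_type) for item in obj)
    if tp is Any:
        return obj
    if not isinstance(obj, tp):
        raise TypeError(f"expected `{tp.__name__}`, got `{type(obj).__name__}`")
    return obj


decoded = [1, "two", b"three"]      # what untyped decoding would produce
as_set = convert(decoded, set)      # untyped set: any element is accepted
ints = convert([1, 2, 3], Set[int])
print(ints)  # {1, 2, 3}

try:
    convert(decoded, Set[int])
except TypeError as exc:
    print(exc)  # expected `int`, got `str`
```

The recursion over get_origin and get_args is what lets nested specifications such as a set of ints be checked element by element.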

Typed deserializers are most commonly used along with Struct messages to provide a message schema, but any combination of the following types is acceptable:

Structs

msgspec can serialize many builtin types, but unlike protocols like pickle/quickle, it can’t serialize arbitrary user classes. Two user-defined types are supported:

Structs are useful for defining structured messages. Fields are defined using Python type annotations. Default values can also be specified for any optional fields.

Here we define a struct representing a person, with two required fields and two optional fields.

>>> from typing import Optional
>>> class Person(msgspec.Struct):
...     """A struct describing a person"""
...     first: str
...     last: str
...     address: str = ""
...     phone: Optional[str] = None

Struct types automatically generate a few methods based on the provided type annotations:

  • __init__

  • __repr__

  • __copy__

  • __eq__ & __ne__

>>> harry = Person("Harry", "Potter", address="4 Privet Drive")
>>> harry
Person(first='Harry', last='Potter', address='4 Privet Drive', phone=None)
>>> harry.first
'Harry'
>>> ron = Person("Ron", "Weasley", address="The Burrow")
>>> ron == harry
False

Overriding __init__ or __new__ in a struct definition is forbidden, but other methods can be overridden or added as needed. The struct fields are available via the __struct_fields__ attribute (a tuple of the field names in argument order) if you need them. Here we add a method for converting a struct to a dict.

>>> class Point(msgspec.Struct):
...     """A point in 2D space"""
...     x: float
...     y: float
...
...     def to_dict(self):
...         return {f: getattr(self, f) for f in self.__struct_fields__}
...
>>> p = Point(1.0, 2.0)
>>> p.to_dict()
{'x': 1.0, 'y': 2.0}

Struct types are written in C and are quite speedy and lightweight. They’re great for defining structured messages both for serialization and for use in an application.

As with builtin types, to deserialize a message as a struct you need to provide the expected deserialization type to Decoder.

>>> dec = msgspec.Decoder(Person)  # Create a decoder that expects a Person
>>> dec
Decoder(Person)
>>> data = msgspec.encode(harry)
>>> dec.decode(data)
Person(first='Harry', last='Potter', address='4 Privet Drive', phone=None)

Using structs for message schemas not only adds validation during deserialization, it can also improve performance. Depending on the schema, deserializing a message into a Struct can be roughly twice as fast as deserializing it into a dict.

Schema Evolution

msgspec includes support for “schema evolution”, meaning that:

  • Messages serialized with an older version of a schema will be deserializable using a newer version of the schema.

  • Messages serialized with a newer version of the schema will be deserializable using an older version of the schema.

This can be useful if, for example, you have clients and servers with mismatched versions.

For schema evolution to work smoothly, you need to follow a few guidelines:

  1. Any new fields on a Struct must specify default values.

  2. Don’t change the type annotations for existing messages or fields.

For example, suppose we wanted to add a new email field to our Person struct. To do so, we add it at the end of the definition, with a default value.

>>> from typing import Optional
>>> class Person2(msgspec.Struct):
...     """A struct describing a person"""
...     first: str
...     last: str
...     address: str = ""
...     phone: Optional[str] = None
...     email: Optional[str] = None  # added at the end, with a default
...
>>> vernon = Person2("Vernon", "Dursley", address="4 Privet Drive", email="vernon@grunnings.com")

Messages serialized using the new and old schemas can still be exchanged without error. If an old message is deserialized using the new schema, the missing fields take their default values. Likewise, if a new message is deserialized with the old schema, the unknown new fields are efficiently skipped without decoding.

>>> old_dec = msgspec.Decoder(Person)
>>> new_dec = msgspec.Decoder(Person2)

>>> new_msg = msgspec.encode(vernon)
>>> old_dec.decode(new_msg)  # deserializing a new msg with an older decoder
Person(first='Vernon', last='Dursley', address='4 Privet Drive', phone=None)

>>> old_msg = msgspec.encode(harry)
>>> new_dec.decode(old_msg)  # deserializing an old msg with a new decoder
Person2(first='Harry', last='Potter', address='4 Privet Drive', phone=None, email=None)
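
The two rules behind this behavior (unknown fields are ignored, missing fields fall back to defaults) can be sketched with plain dataclasses and dicts. This is only an illustration of the observable semantics, assuming messages are mappings from field name to value; PersonV1, PersonV2, and from_message are hypothetical names, and this is not how msgspec implements schema evolution internally.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PersonV1:
    first: str
    last: str
    address: str = ""
    phone: Optional[str] = None


@dataclass
class PersonV2:
    first: str
    last: str
    address: str = ""
    phone: Optional[str] = None
    email: Optional[str] = None  # new field, added at the end with a default


def from_message(cls, message: dict):
    """Build `cls` from a message, dropping keys the schema does not know."""
    known = {f.name for f in fields(cls)}
    # Keys missing from the message are filled in by the dataclass defaults.
    return cls(**{k: v for k, v in message.items() if k in known})


new_message = {
    "first": "Vernon",
    "last": "Dursley",
    "address": "4 Privet Drive",
    "phone": None,
    "email": "vernon@grunnings.com",
}
old_message = {
    "first": "Harry",
    "last": "Potter",
    "address": "4 Privet Drive",
    "phone": None,
}

old_view = from_message(PersonV1, new_message)  # the extra "email" key is ignored
new_view = from_message(PersonV2, old_message)  # the missing "email" becomes None
print(old_view)
print(new_view)
```

The sketch also shows why new fields need defaults: without one, building PersonV2 from an old message would fail because a required argument is missing.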