Converters#

msgspec provides builtin support for several common protocols (json, msgpack, yaml, and toml). Support for additional protocols may be added by combining a serialization library with msgspec’s converter functions: msgspec.to_builtins and msgspec.convert.

  • msgspec.to_builtins: takes an object composed of any supported type and converts it into one composed of only simple builtin types typically supported by Python serialization libraries.

  • msgspec.convert: takes an object composed of any supported type, and converts it to match a specified schema (validating along the way). If the conversion fails due to a schema mismatch, a nice error message is raised.

These functions are designed to be paired with a Python serialization library as pre/post processors for typical dumps and loads functions.

_images/converters-light.svg_images/converters-dark.svg

For example, if msgspec didn’t already provide support for json, you could add support by wrapping the standard library’s json module as follows:

In [1]: import json
   ...: from typing import Any
   ...:
   ...: import msgspec

In [2]: def encode(obj):
   ...:     return json.dumps(msgspec.to_builtins(obj))

In [3]: def decode(msg, type=Any):
   ...:     return msgspec.convert(json.loads(msg), type=type)

In [4]: class Point(msgspec.Struct):
   ...:     x: int
   ...:     y: int

In [5]: x = Point(1, 2)

In [6]: msg = encode(x)  # Encoding a high-level type works

In [7]: msg
'{"x": 1, "y": 2}'

In [8]: decode(msg, type=Point)  # Decoding a high-level type works
Point(x=1, y=2)

In [9]: decode('{"x": "oops", "y": 2}', type=Point)  # Schema mismatches error
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[9], line 1
----> 1 decode('{"x": "oops", "y": 2}', type=Point)  # Schema mismatches error

Cell In[3], line 2, in decode(msg, type)
     1 def decode(msg, type=Any):
---> 2     return msgspec.convert(json.loads(msg), type=type)

ValidationError: Expected `int`, got `str` - at `$.x`

Since all protocols are different, to_builtins and convert have a few configuration options:

  • builtin_types: an iterable of additional types to treat as builtin types, beyond the standard dict, list, tuple, set, frozenset, str, int, float, bool, and None.

  • str_keys: whether the wrapped protocol only supports strings for object keys, rather than any hashable type.

  • strict: convert only. Whether type coercion rules should be strict. Defaults is True, setting to False enables a wider set of coercion rules from string to non-string types for all values. Among other uses, this may be used to handle completely untyped protocols like URL querystrings, where only string values exist. See “Strict” vs “Lax” Mode for more information.

  • from_attributes: convert only. If True, input objects may be coerced to Struct/dataclass/attrs types by extracting attributes from the input matching fields in the output type. One use case is converting database query results (ORM or otherwise) to msgspec structured types. The default is False.

  • enc_hook/dec_hook: the standard keyword arguments used for Extending msgspec to support additional types.


Taking a look at another protocol - TOML. This protocol

If msgspec didn’t already provide support for toml, you could add support by wrapping the standard library’s tomllib module as follows:

import datetime
import tomllib
from typing import Any

import msgspec

def decode(msg, *, type=Any, dec_hook=None):
    return msgspec.convert(
        toml.loads(msg),
        type,
        builtin_types=(datetime.datetime, datetime.date, datetime.time),
        str_keys=True,
        dec_hook=dec_hook,
    )

msgspec uses these APIs to implement toml and yaml support, wrapping external serialization libraries:

  • msgspec.toml (code)

  • msgspec.yaml (code)

The implementation in msgspec.toml is almost identical to the one above, with some additional code for error handling.