GeoJSON

GeoJSON is a popular format for encoding geographic data. Its specification describes nine different types a message may take (seven “geometry” types, plus two “feature” types). Here we provide one way of implementing that specification using msgspec to handle the parsing and validation.

The loads and dumps functions defined below work similarly to the standard library’s json.loads/json.dumps, but:

  • Will result in high-level msgspec.Struct objects representing GeoJSON types

  • Will error nicely if a field is missing or the wrong type

  • Will fill in default values for optional fields

  • Will decode and encode significantly faster than the json module (and most other JSON implementations in Python)

This example makes use of msgspec.Struct types to define the different GeoJSON types, and Tagged Unions to differentiate between them. See the relevant docs for more information.

The full example source can be found here.

from __future__ import annotations

import msgspec

Position = tuple[float, float]


# Define the 7 standard Geometry types.
# All types set `tag=True`, meaning that they'll make use of a `type` field to
# disambiguate between types when decoding.
class Point(msgspec.Struct, tag=True):
    coordinates: Position


class MultiPoint(msgspec.Struct, tag=True):
    coordinates: list[Position]


class LineString(msgspec.Struct, tag=True):
    coordinates: list[Position]


class MultiLineString(msgspec.Struct, tag=True):
    coordinates: list[list[Position]]


class Polygon(msgspec.Struct, tag=True):
    coordinates: list[list[Position]]


class MultiPolygon(msgspec.Struct, tag=True):
    coordinates: list[list[list[Position]]]


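# `Geometry` is defined further down; the forward reference works because
# `from __future__ import annotations` postpones annotation evaluation.
class GeometryCollection(msgspec.Struct, tag=True):
    geometries: list[Geometry]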


# A union of the 7 geometry types (`|` between classes requires Python 3.10+)
Geometry = (
    Point
    | MultiPoint
    | LineString
    | MultiLineString
    | Polygon
    | MultiPolygon
    | GeometryCollection
)


# Define the two Feature types
class Feature(msgspec.Struct, tag=True):
    geometry: Geometry | None = None
    properties: dict | None = None
    id: str | int | None = None


class FeatureCollection(msgspec.Struct, tag=True):
    features: list[Feature]


# A union of all 9 GeoJSON types
GeoJSON = Geometry | Feature | FeatureCollection


# Create a decoder and an encoder to use for decoding & encoding GeoJSON types
loads = msgspec.json.Decoder(GeoJSON).decode
dumps = msgspec.json.Encoder().encode

Here we use the loads method defined above to read some example GeoJSON.

In [1]: import msgspec_geojson

In [2]: with open("canada.json", "rb") as f:
   ...:     data = f.read()

In [3]: canada = msgspec_geojson.loads(data)

In [4]: type(canada)  # loaded as high-level, validated object
Out[4]: msgspec_geojson.FeatureCollection

In [5]: canada.features[0].properties
Out[5]: {'name': 'Canada'}

Comparing performance to orjson, the standard library’s json, and geojson:

In [6]: %timeit msgspec_geojson.loads(data)  # benchmark msgspec
6.15 ms ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit orjson.loads(data)  # benchmark orjson
8.67 ms ± 20.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [8]: %timeit json.loads(data)  # benchmark json
27.6 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit geojson.loads(data)  # benchmark geojson
93.9 ms ± 88.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This shows that the readable msgspec implementation above is about 1.4x faster than orjson on this data, while also ensuring that the loaded data is valid GeoJSON. Compared to geojson (another validating GeoJSON library for Python), loading the data with msgspec was about 15.3x faster.