GeoJSON
GeoJSON is a popular format for encoding geographic data. Its specification describes nine different types a message may take (seven “geometry” types, plus two “feature” types). Here we provide one way of implementing that specification using msgspec to handle the parsing and validation.
The loads and dumps methods defined below work similarly to the standard library’s json.loads/json.dumps, but:
- Will result in high-level msgspec.Struct objects representing GeoJSON types
- Will error nicely if a field is missing or the wrong type
- Will fill in default values for optional fields
- Will decode and encode significantly faster than the json module (as well as most other json implementations in Python)
This example makes use of msgspec.Struct types to define the different GeoJSON types, and Tagged Unions to differentiate between them. See the relevant docs for more information.
The full example source can be found here.
from __future__ import annotations

import msgspec

Position = tuple[float, float]


# Define the 7 standard Geometry types.
# All types set `tag=True`, meaning that they'll make use of a `type` field to
# disambiguate between types when decoding.
class Point(msgspec.Struct, tag=True):
    coordinates: Position


class MultiPoint(msgspec.Struct, tag=True):
    coordinates: list[Position]


class LineString(msgspec.Struct, tag=True):
    coordinates: list[Position]


class MultiLineString(msgspec.Struct, tag=True):
    coordinates: list[list[Position]]


class Polygon(msgspec.Struct, tag=True):
    coordinates: list[list[Position]]


class MultiPolygon(msgspec.Struct, tag=True):
    coordinates: list[list[list[Position]]]


class GeometryCollection(msgspec.Struct, tag=True):
    geometries: list[Geometry]


Geometry = (
    Point
    | MultiPoint
    | LineString
    | MultiLineString
    | Polygon
    | MultiPolygon
    | GeometryCollection
)


# Define the two Feature types
class Feature(msgspec.Struct, tag=True):
    geometry: Geometry | None = None
    properties: dict | None = None
    id: str | int | None = None


class FeatureCollection(msgspec.Struct, tag=True):
    features: list[Feature]


# A union of all 9 GeoJSON types
GeoJSON = Geometry | Feature | FeatureCollection

# Create a decoder and an encoder to use for decoding & encoding GeoJSON types
loads = msgspec.json.Decoder(GeoJSON).decode
dumps = msgspec.json.Encoder().encode
Here we use the loads method defined above to read some example GeoJSON.
In [1]: import msgspec_geojson
In [2]: with open("canada.json", "rb") as f:
   ...:     data = f.read()
In [3]: canada = msgspec_geojson.loads(data)
In [4]: type(canada) # loaded as high-level, validated object
Out[4]: msgspec_geojson.FeatureCollection
In [5]: canada.features[0].properties
Out[5]: {'name': 'Canada'}
Comparing performance to orjson, the standard library’s json module, and geojson:
In [6]: %timeit msgspec_geojson.loads(data) # benchmark msgspec
6.15 ms ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: %timeit orjson.loads(data) # benchmark orjson
8.67 ms ± 20.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %timeit json.loads(data) # benchmark json
27.6 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [9]: %timeit geojson.loads(data) # benchmark geojson
93.9 ms ± 88.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This shows that the readable msgspec implementation above is 1.4x faster than orjson (on this data), while also ensuring the loaded data is valid GeoJSON. Compared to geojson (another validating GeoJSON library for Python), loading the data using msgspec was 15.3x faster.
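The quoted ratios follow directly from the mean timings reported above. As a quick check (the numbers are copied from the %timeit output; the variable names are only illustrative):

```python
# Mean load times in milliseconds, copied from the %timeit results above.
timings_ms = {"msgspec": 6.15, "orjson": 8.67, "json": 27.6, "geojson": 93.9}

baseline = timings_ms["msgspec"]

# How many times longer each library took than msgspec, to one decimal place.
speedups = {name: round(t / baseline, 1) for name, t in timings_ms.items()}

print(speedups)
# {'msgspec': 1.0, 'orjson': 1.4, 'json': 4.5, 'geojson': 15.3}
```

These figures apply to this one file on one machine; other data may give different ratios.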