loke.dev

How to Perform Real-Time Spatial Queries Without the PostGIS Overhead

Leveraging Hilbert curves and S2 geometry to build a high-performance geospatial engine directly in your application layer.


You don't need a multi-gigabyte relational database instance just to find the nearest coffee shop or track a delivery driver in real time. In fact, if you’re building a high-throughput system that requires sub-millisecond spatial lookups, PostGIS might actually be the thing holding you back.

While PostGIS is the undisputed king of complex GIS analysis and heavy-duty spatial joins, it carries a heavy tax. Every time you run a query, you’re paying for disk I/O, complex locking mechanisms, and the overhead of the PostgreSQL query planner. If your goal is to handle 50,000 requests per second to find nearby points of interest, the "database-first" approach will likely lead you down a path of expensive RDS instances and frustrating scaling bottlenecks.

The alternative is moving the spatial intelligence into your application layer. By leveraging Space-Filling Curves (SFCs)—specifically Hilbert curves and Google’s S2 geometry—you can treat spatial data as simple integers. This allows you to perform complex proximity queries using nothing more than a fast Key-Value store like Redis or even a sorted array in memory.

The Locality Problem: Why Lat/Lon is Hard to Query

The fundamental problem with geospatial data is that it’s two-dimensional, but our computer memory and most of our indexes (like B-Trees) are one-dimensional.

If you store coordinates as (latitude, longitude), you can easily index one of them. But if you search for everything within a 5km radius, a standard index only helps you filter one dimension. You might find everything in the right latitude range, but you still have to manually scan thousands of rows to see if they fall within the correct longitude range. This is the classic "range scan" problem.
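A toy sketch of that asymmetry (the coordinates are invented for illustration): sorting by latitude lets us binary-search one dimension cheaply, but every surviving row still has to be inspected against the longitude bounds one by one:

```python
import bisect

# Toy dataset of (latitude, longitude) pairs, kept sorted by latitude
# so we can binary-search that one dimension -- the best a B-Tree
# index on a single column gives us.
points = sorted([
    (40.70, -74.20), (40.75, -73.97), (40.78, -73.95),
    (40.79, -10.00), (40.80, 151.20), (40.95, -73.90),
])

lat_lo, lat_hi = 40.74, 40.81    # the band the latitude index can serve
lon_lo, lon_hi = -74.00, -73.90  # the band we still have to scan for

# Cheap: one index range scan on latitude
lo = bisect.bisect_left(points, (lat_lo, float("-inf")))
hi = bisect.bisect_right(points, (lat_hi, float("inf")))
candidates = points[lo:hi]

# Expensive: every candidate gets inspected, including the ones on the
# wrong side of the planet (longitudes -10.0 and 151.2)
matches = [p for p in candidates if lon_lo <= p[1] <= lon_hi]
print(len(candidates), len(matches))  # 4 2
```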

We need a way to map 2D coordinates onto a 1D line while preserving locality. If two points are close together on the map, they should be close together on our 1D line.

Enter the Hilbert Curve

The Hilbert curve is a space-filling curve that snakes through a 2D plane, visiting every point in a grid. Unlike the simpler Morton (Z-order) curve, the Hilbert curve is much better at maintaining locality. It doesn't "jump" across the map as often when it moves from one section of the grid to another.

When we use a Hilbert curve, we convert a (x, y) coordinate into a single index integer. Because the curve preserves locality, a "neighborhood" in 2D space becomes a "range" of integers in 1D space.

Here is a simplified Python implementation (the classic grid algorithm) for mapping between 2D points and Hilbert indices. It uses bit manipulation to determine, one level at a time, which quadrant of the grid a point resides in.

def d2xy(n, d):
    """
    Converts a 1D Hilbert index (d) back to 2D coordinates (x, y)
    n: the side length of the grid (must be a power of 2)
    """
    t = d
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def xy2d(n, x, y):
    """
    Converts 2D coordinates (x, y) to a 1D Hilbert index (d)
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = rot(n, x, y, rx, ry)  # rotate using the full grid size n
        s //= 2
    return d

def rot(n, x, y, rx, ry):
    """
    Rotates and flips the quadrant appropriately
    """
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        return y, x
    return x, y

# Example Usage:
grid_size = 256 # 2^8
index = xy2d(grid_size, 10, 20)
print(f"The Hilbert index for (10, 20) is: {index}")

In a real-world scenario, you wouldn't just use a raw grid. You’d project the entire Earth onto this curve. This is exactly what Google’s S2 library does, but with a few clever mathematical upgrades.

S2 Geometry: The Sphere as a Cube

Google’s S2 library is the gold standard for application-layer spatial indexing. It’s what powers geo-indexing at Google (including Google Maps) and MongoDB’s 2dsphere indexes.

Instead of treating the world as a flat map (which leads to massive distortion at the poles), S2 projects the Earth onto the six faces of a cube. Each face is then subdivided using a Hilbert curve.

This gives us S2 Cell IDs. An S2 Cell ID is a 64-bit integer that represents a specific area on the Earth. The "level" of the cell determines its size. A Level 0 cell covers an entire face of the cube (thousands of kilometers), while a Level 30 cell is about a square centimeter.
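A quick back-of-envelope check of those sizes (assumed round figures; real S2 cells vary somewhat across a face because of the cube projection):

```python
# Each cube face spans roughly a quarter of the Earth's circumference,
# and each level halves a cell's edge length. Order-of-magnitude only.
EARTH_CIRCUMFERENCE_M = 40_075_000
face_edge_m = EARTH_CIRCUMFERENCE_M / 4  # ~10,000 km per face edge

for level in (0, 10, 20, 30):
    edge_m = face_edge_m / (2 ** level)
    print(f"level {level:2d}: cell edge ~ {edge_m:.4g} m")
```

Level 20 comes out around 10 meters per side, which is why it's a reasonable default for tracking individual vehicles, and level 30 lands at roughly a centimeter.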

The beauty of S2 is that the Cell ID is hierarchical. The first few bits tell you which face of the cube you're on, and subsequent bits define smaller and smaller quadrants. This means that all points within a specific geographic area will have Cell IDs that are numerically close to each other.

Example: Generating S2 Cell IDs in Python

Using the s2sphere library (a Python implementation of S2), we can convert a Lat/Lon into a 64-bit integer.

import s2sphere

def get_cell_id(lat, lng, level=20):
    p1 = s2sphere.LatLng.from_degrees(lat, lng)
    cell_id = s2sphere.CellId.from_lat_lng(p1).parent(level)
    return cell_id.id()

# Central Park, NY
lat, lng = 40.785091, -73.968285
cell_id = get_cell_id(lat, lng)
print(f"S2 Cell ID: {cell_id}")
# Output: 8913312217551077376 (a simple 64-bit integer)

How to Build the Spatial Index

Now that we can represent a location as a 64-bit integer, how do we query it?

If you store your data in a system that supports ordered keys (like Redis Sorted Sets, DynamoDB with a Sort Key, or even a standard B-Tree in a relational DB), you can perform a spatial search using range queries.

Let's say you want to find all drivers within a certain area.
1. You convert each driver's Lat/Lon to a Level 20 S2 Cell ID.
2. You store these in a Redis Sorted Set where the score is the Cell ID and the member is the Driver ID.
3. To search, you calculate the "covering" of your search area.
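The steps above can be sketched end to end with a sorted Python list standing in for the Redis sorted set (`bisect` plays the role of ZRANGEBYSCORE; the cell IDs here are made-up integers, not real S2 values):

```python
import bisect

# Sorted list of (cell_id, driver_id) -- a stand-in for a Redis sorted
# set where the score is the cell ID and the member is the driver ID.
index = []

def add_driver(cell_id, driver_id):
    bisect.insort(index, (cell_id, driver_id))

def query_range(min_id, max_id):
    # The ZRANGEBYSCORE equivalent: binary-search both endpoints, then
    # slice out the contiguous run of entries between them.
    lo = bisect.bisect_left(index, (min_id, ""))
    hi = bisect.bisect_right(index, (max_id, "\uffff"))
    return [driver for _, driver in index[lo:hi]]

add_driver(1200, "driver-a")
add_driver(1250, "driver-b")
add_driver(9000, "driver-c")
print(query_range(1000, 2000))  # ['driver-a', 'driver-b']
```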

The "Covering" Concept

If you want to search a circular radius, S2 can provide a list of Cell IDs (at various levels) that completely cover that circle. Instead of one massive, complex polygon query, you end up with a few dozen integer ranges.

def get_search_ranges(lat, lng, radius_meters):
    region = s2sphere.Cap.from_axis_height(
        s2sphere.LatLng.from_degrees(lat, lng).to_point(),
        # Cap height h = 1 - cos(θ) ≈ θ²/2 for small angles,
        # where θ = radius / earth_radius (in radians)
        (radius_meters / 6371010.0) ** 2 / 2.0
    )
    coverer = s2sphere.RegionCoverer()
    coverer.min_level = 10
    coverer.max_level = 20
    coverer.max_cells = 8 # Trade-off between accuracy and number of ranges
    
    cells = coverer.get_covering(region)
    ranges = []
    for cell in cells:
        ranges.append((cell.range_min().id(), cell.range_max().id()))
    return ranges

ranges = get_search_ranges(40.785091, -73.968285, 1000)
for r_min, r_max in ranges:
    print(f"Search range: {r_min} to {r_max}")

You now take these ranges and run parallel lookups against your database:
SELECT driver_id FROM drivers WHERE cell_id BETWEEN range_min AND range_max;

In Redis, this is a simple ZRANGEBYSCORE. Because you are doing integer comparisons on an indexed field, it is blazingly fast.
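Here is the same BETWEEN pattern demonstrated against an in-memory SQLite table (toy cell IDs, not real S2 values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (driver_id TEXT, cell_id INTEGER)")
conn.execute("CREATE INDEX idx_cell ON drivers (cell_id)")  # the B-Tree doing the work
conn.executemany(
    "INSERT INTO drivers VALUES (?, ?)",
    [("d1", 100), ("d2", 205), ("d3", 999)],
)

# One lookup per range returned by the coverer
ranges = [(90, 110), (200, 210)]
found = []
for r_min, r_max in ranges:
    rows = conn.execute(
        "SELECT driver_id FROM drivers WHERE cell_id BETWEEN ? AND ?",
        (r_min, r_max),
    )
    found.extend(row[0] for row in rows)
print(found)  # ['d1', 'd2']
```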

Why This Beats PostGIS for Real-Time Apps

1. Memory over Disk: You can keep your entire spatial index in Redis. PostGIS often requires hitting the disk unless you have a massive amount of RAM.
2. No More "Stale" Stats: PostgreSQL’s query planner relies on table statistics. If your data changes rapidly (like thousands of moving Ubers), the statistics get stale, and the planner might choose a terrible query plan. S2 lookups are deterministic.
3. Horizontal Scaling: It is much easier to shard a Key-Value store based on Cell ID ranges than it is to shard a PostGIS instance. You can distribute different "faces" of the S2 cube to different physical machines.
4. CPU Efficiency: Calculating an S2 cell ID in your app code costs almost nothing. Running an ST_DWithin check in PostGIS involves complex spherical geometry calculations for every candidate row.

The Gotchas: Precision and False Positives

This approach isn't a silver bullet. There are two main hurdles you'll face.

1. The "Filtering" Step

When you get a "covering" from S2, it's rarely a perfect fit for your circle. It’s a collection of squares that *covers* your circle. This means your query will return some points that are technically just outside your radius (False Positives).

You must perform a final distance check in your application code.
1. Query the index for Cell ID ranges.
2. Fetch the actual Lat/Lon for those results.
3. Calculate the Haversine distance between the user and each result.
4. Discard the ones that fall outside the radius.

Since you've already filtered out 99.9% of the world using the Cell ID range, calculating the Haversine distance for the remaining 50 results is trivial.
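That final filtering step is a few lines of standard-library Python (the Haversine formula, using the same mean Earth radius as the coverer example; `filter_by_radius` is just an illustrative helper name):

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    R = 6371010.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def filter_by_radius(center, candidates, radius_m):
    # Discard the false positives the covering let through
    lat, lng = center
    return [c for c in candidates
            if haversine_m(lat, lng, c[0], c[1]) <= radius_m]

# One degree of longitude at the equator is roughly 111 km
print(round(haversine_m(0, 0, 0, 1)))  # 111195
```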

2. Edge Cases at the Poles and Antimeridian

One of the reasons to use S2 over Geohash or raw Hilbert curves is that S2 handles the wrap-around at the 180-degree meridian and the singularities at the poles correctly. If you try to roll your own Hilbert implementation, you’ll likely find that points across the "seam" of your map are numerically far apart despite being physically close. Stick to S2; the math is already solved.

A Practical Architecture

If I were building a real-time tracking system today—something like a "find nearby technicians" feature—here is exactly how I'd structure it:

1. Ingestion: As locations stream in (via WebSockets or MQTT), calculate the S2 Cell ID at Level 20.
2. Storage: Store the technician's full data in a document store (MongoDB/DynamoDB) but store their (CellID, TechID) in a Redis Sorted Set.
3. Querying:
* Client sends (lat, lng, radius).
* App server generates 8-12 S2 ranges using RegionCoverer.
* App server fires ZRANGEBYSCORE requests to Redis in parallel (or using a pipeline).
* App server pulls the Lat/Lon for those TechIDs from a local cache or fast KV.
* App server filters the results using the Haversine formula.
* Return results.

This architecture can easily handle hundreds of thousands of updates and queries per second on relatively modest hardware. You aren't fighting the database; you're using it for what it's best at: fast, sorted lookups.

When Should You Stick to PostGIS?

I’m not saying PostGIS is obsolete. It’s a masterpiece of software. You should stay with PostGIS if:
* You need to do complex polygon-on-polygon intersections (e.g., "Find all voting districts that overlap with this flood zone").
* You are doing offline data analysis where sub-millisecond latency doesn't matter.
* Your dataset is small enough that a single RDS instance handles it without breaking a sweat.
* You don't want to manage the complexity of "covering" logic in your application code.

But if you’re building the next great real-time "nearby" app, stop trying to tune your autovacuum settings and start looking at the geometry of the sphere. The speed gains of moving spatial logic to the application layer are simply too large to ignore.

By treating the Earth as a collection of 64-bit integers, you turn a difficult geographic problem into a simple sorting problem. And computers are very, very good at sorting.