Objects clustering and grouping with Django, PostGIS and Google Maps

Object clustering is a common practice when you need to show many objects on the map. When we began to develop crane-locator.com project one of the leading features we start to implement first was a world map with equipment on it. The original idea was similar to marinetraffic.com where we see a world map with clusters where we see digits which show how many ships are in the particular cluster.

When the map is zooming in then, we see ships as separate objects.

Map on the marinetraffic.com, just an axample

Map with vessels on the marinetraffic.com

Our situation was a bit different. We had a different heavy-lifting equipment instead of ships, like cranes, low loaders, push-pull trucks, barges, or another kind of equipment. However, instead of marine traffic, we had another challenge: very often equipment owners use the same GPS coordinates for a set of equipment. For example, 10 different trucks parked in the same parking can have identical coordinates. Sometimes they could have different coordinates but very close to each other. For such situations, we must use object grouping on the map.

In our case, we use Google maps. The range of zooms could be from 1 to 15. And first thing which we should do to describe a cluster grid for few zoom ranges. For this purposes we use a very simple model which describes a cluster on the map for particular range of zooms:

class GridCluster(models.Model):
    zoom_range = IntegerRangeField(_('Zoom range'), default='(1,4)')
    polygon = models.PolygonField(_('Polygon'))
    objects = models.GeoManager()

We use the next config for describing cluster sizes:

SRID = 4326
# Sizes of the clusters for different ranges of zoom
GRID_CLUSTER_STEPS = [
    {
        'zoom': '[1,3]',
        'height': 4,  # in degrees
        'width': 5,  # in degrees
    },
    {
        'zoom': '(3,5]',
        'height': 2.5,
        'width': 3,
    },
    {
        'zoom': '(5,8]',
        'height': 1.5,
        'width': 2,
    }
]
GRID_GAP = 1e-20

Here we specified different cluster sizes for different ranges of zoom. In the range between 1 and 3, each cluster is a rectangle with sizes 4x5 degrees. For larger zoom ranges we must decrease cluster sizes. After few experiments, it is not so hard to set appropriate sizes. Because we use degrees, we need to specify particular SRID (https://en.wikipedia.org/wiki/SRID) for choosing a world model because there are many different algorithms to convert degrees to kilometers. In our case, we use SRID 4326 which means WGS84 (https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84). Cluster height is significant in our case because due the specific of world model clusters are stretching vertically as more as closer to poles. Moreover, it is a problem when you build a grid. However, for now, we do not think much about it because there is no equipment in our database close to poles. Another thing which we use it is a tiny gap (1e-20 degrees) between clusters for preventing the situation when vertical cluster edge is a part of both neighbor clusters simultaneously.

For building the grids we use the next code:

for step in GRID_CLUSTER_STEPS:
    grids = []
    i = 0
    for y in xfrange(-90, 90+1-step['height'], step['height']):
        for x in xfrange(-180, 180+1-step['width'], step['width']):
            wkt_str = \
                'SRID={srid};POLYGON(({x1} {y1},{x2} {y2},' \
                '{x3} {y3},{x4} {y4},{x1} {y1}))'.format(
                srid=SRID,
                x1=x,
                y1=y,
                x2=x,
                y2=y + step['height'] - GRID_GAP,
                x3=x + step['width'] - GRID_GAP,
                y3=y + step['height'] - GRID_GAP,
                x4=x + step['width'] - GRID_GAP,
                y4=y,
                )
            wkt = fromstr(wkt_str)

            i += 1
            grids.append(GridCluster(
                zoom_range=step['zoom'],
                polygon=wkt,
            ))
    GridCluster.objects.bulk_create(grids)

Here we use custom iterator xfrange because we need float type for step argument:

def xfrange(start, stop, step):
    while start < stop:
        yield start
        start += step

Data of the each equipment on the map is also stored in database. Here is a simplified code for Equipment model (a real code looks a bit more complex):

class Equipment(models.Model):
    name = models.CharField(_('Model name'), max_length=300)
    visible = models.BooleanField(_('Visible'), default=True, blank=True)
    blocked = models.BooleanField(_('Blocked'), default=False, blank=True)

    coordinates = models.PointField(
        _('Coordinates'), default=GEOSGeometry('POINT(0 0)'))
    objects = models.GeoManager()

Look at the field coordinates. Here we use a special geometry type in the same manner like at GridCluster name where we use a special field for storing a polygon coordinates. Now PostGIS gives us the ability to use special operations in queries for implementing a clustering. A sample query to get a list of clusters with equipment counters could be looked like that:

SELECT count(*) AS count, polygon
FROM "equipment_equipment", "equipment_gridcluster"
WHERE coordinates IS NOT NULL
AND equipment_equipment.visible = TRUE
AND equipment_equipment.blocked = FALSE
AND equipment_gridcluster.zoom_range @> 3

AND ST_Intersects(coordinates, polygon)
AND ST_Intersects(
    polygon,
    ST_Envelope('SRID=4326;LINESTRING(0 0, 180 180)'::geometry))
GROUP BY polygon;

How does it work?

  1. ST_Envelope (http://postgis.net/docs/ST_Envelope.html) function returns a geometry object which represents a boundary box for supplied geometry. String 'SRID=4326;LINESTRING(0 0, 180 180)' represents a diagonal line for active viewport. Operator ::geometry converts the string to geometry object.
  2. ST_Intersects (http://postgis.net/docs/ST_Intersects.html) returns TRUE if two specified geometries intersect in 2D.
  3. Function count(*) with GROUP_BY clause (http://www.postgresqltutorial.com/postgresql-count-function/) uses for a group the result on a polygon and calculate the count of objects for each of ones.

Result will be looked like that:

count polygon
14
"0103000020E61000000100000005000000000000000000000000000000000000C
000000000000000000000000000000040000000000000144000000000000000400
00000000000144000000000000000C0000000000000000000000000000000C0"
1
"0103000020E610000001000000050000000000000000002440000000000000494
000000000000024400000000000004B400000000000002E400000000000004B400
000000000002E40000000000000494000000000000024400000000000004940"
1
"0103000020E610000001000000050000000000000000003940000000000000474
0000000000000394000000000000049400000000000003E4000000000000049400
000000000003E40000000000000474000000000000039400000000000004740"
4
"0103000020E610000001000000050000000000000000003E40000000000000474
00000000000003E400000000000004940000000000080414000000000000049400
00000000080414000000000000047400000000000003E400000000000004740"
1
"0103000020E610000001000000050000000000000000804140000000000000434
000000000008041400000000000004540000000000000444000000000000045400
000000000004440000000000000434000000000008041400000000000004340"

The second column is an encoded geometry object in GEOS (https://trac.osgeo.org/geos/) format. For using it on the frontend part it should be converted to GeoJSON (http://geojson.org/). The sample code:

from django.contrib.gis.geos import GEOSGeometry
geos = GEOSGeometry(GEOS_data)
geojson = geos.geojson

Very easy isn’t it?

GEOSGeoJSON

"0103000020E61000000100000005
0000000000000000003E40000000000000474
00000000000003E4000000000000049400000
0000008041400000000000004940000000000
080414000000000000047400000000000003E
400000000000004740"

{
  "coordinates": [[
    [30.0, 46.0], [30.0, 50.0],
    [35.0, 50.0], [35.0, 46.0],
    [30.0, 46.0]]],
  "type": "Polygon"
}

The next screenshots show how clustering works on the frontend side with different zooms:

Small zoom, green blocks

Small zoom, green blocks

Zoom in a bit

Zoom in a bit

Zoom in a bit more

Zoom in a bit more

When user zoom map in more than the specified value we should change behavior. We must use object grouping instead of clustering. The first thing which we could do to query the list of equipment inside the current viewport. For this purposes, we can use simple query via Django ORM. First of all, specify the viewport (sample data):

_wkt = 'SRID=4326;POLYGON((33.84875297546387 49.35249363322452, 33.84875297546387 49.79774507314602, 35.01330375671387 49.79774507314602, 35.01330375671387 49.35249363322452, 33.84875297546387 49.35249363322452))'

Than get the list of objects:

from django.contrib.gis.geos import fromstr
_viewport = fromstr(_wkt)
queryset = Equipment.objects.filter(visible=True, blocked=False)
queryset = queryset.filter(
    coordinates__intersects=_viewport,
).values('latitude', 'longitude', 'id', 'name')
_result = []
for e in queryset:
    _result.append(e)

The result of that query is a list of dictionaries with ids, names, and coordinates:

[
    {'name': 'UT8879', 'longitude': 34.551417, 'latitude': 49.588267, 'id': 28},
    {'name': 'LTM 1040-2.1', 'longitude': 34.560858, 'latitude': 49.583148, 'id': 23}
]

The next challenge is to group equipment which is close to each other on the map. However, what means "close"? After the set of experiments, we found that when a distance between markers’ key points is less than 90 pixels than markers became unusable. Of course, it depends on the marker image size and map size itself. However, in the database, we have coordinates of objects in degrees. A distance between two points describe in degrees (via latitude and longitude) can be calculated in kilometers via particular world model. We use an elementary model where Earth is a simple sphere:

def distance_between_points_in_km(lat1, lng1, lat2, lng2):
    EARTH_RADIUS = 6373.0
    lat1 = radians(lat1)
    lng1 = radians(lng1)
    lat2 = radians(lat2)
    lng2 = radians(lng2)

    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = (sin(dlat/2))**2 + cos(lat1) * cos(lat2) * (sin(dlng/2))**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    distance = EARTH_RADIUS * c
    return distance

Of course, the result is not very precise, but it is enough for our particular purpose (object grouping). It is not a good idea to use more complex world model here because the function must be fast.

OK. But how to convert pixels on the user’s monitor to the kilometers on the map? We found the next solution:

def gmap_pixels_to_kms(pixels, c_lat, zoom):
    scale = 156543.03392 * cos(c_lat * pi / 180) / (2**zoom)
    return pixels * scale / 1000

In that code, c_lat is a latitude of the map viewport center point. Then we need a special data structure which allows us to group the sets of objects. UnionFind is a proper case for it - https://www.ics.uci.edu/~eppstein/PADS/UnionFind.py.

Now we can write a code which calculates a distances between objects and build groups:

def groupTPL(TPL, distance=1):
    """`distance` in kilemetres
    """
    U = UnionFind()

    for (i, x) in enumerate(TPL):
        for j in range(i + 1, len(TPL)):
            y = TPL[j]
            if distance_between_points_in_km(
                    x['latitude'], x['longitude'],
                    y['latitude'], y['longitude']) <= distance:
                U.union(x['id'], y['id'])

    disjSets = {}
    for x in TPL:
        s = disjSets.get(U[x['id']], set())
        s.add(x['id'])
        disjSets[U[x['id']]] = s

    return [list(x) for x in disjSets.values()]

Applying this function to the results which we got earlier could be looked like that:

_distance = gmap_pixels_to_kms(
        pixels=90, c_lat=float(c_lat), zoom=int(zoom))
    _group_by_distance = groupTPL(_result, distance=_distance)

After that variable _group_by_distance is containing something like that:

[[28, 23]]

As you see, here ids 28 and 23 are combined as a sublist inside a larger list. When a user zooms in more and more a moment when the group is breaking into two separate groups is happening:

[[28], [23]]

That is the main algorithm. After few improvements, we can do the next things on the client.

Equipment and group

Equipment and group

When client’s code receives a bunch of data from the server it renders it on the map with different types of markers. Everything is easy with equipment markers. Rendering groups is a bit more complicated. We use a set of images for different equipment count ranges and render the particular image when an actual number of equipment objects is in a particular range.

Map markers with counters

Map markers with counters

Adding markers to map is well described in the official documentation (https://developers.google.com/maps/documentation/javascript/markers#add).

Also, we use different behavior for click on the equipment or group marker.

Preview cluser
Preview equipment

As you see, implementation of clustering and grouping is a bit tricky but not very hard with PostgreSQL/PostGIS and Python/Django. You can use the same scenario could for a wide range of services where the rendering of many objects on the map is required.