Blog

Objects clustering and grouping with Django, PostGIS and Google Maps

- Wednesday, February 24, 2016

Object clustering is a common practice when you need to show a lot of objects on the map. When we began to develop crane-locator.com project one of the main features which should be implemented first was a world map with equipment on it. The original idea was inspired by marinetraffic.com where we see a world map with clusters where we see digits which show how many ships are in the particular cluster.

When the map is zooming in then we see ships as separate objects.

Our situation was a bit different. We had a different heavy-lifting equipment instead of ships, like cranes, low loaders, push-pull trucks, barges etc. But instead of marine traffic we had another challenge: very often equipment owners use the same GPS coordinates for a set of equipment. For example, 10 different trucks parked in the same parking can have identical coordinates. Sometimes they could have different coordinates but very close to each other. For such situations, we must use object grouping on the map.

In our case, we use Google maps. The range of zooms could be from 1 to 15. And first thing which we should do to describe a cluster grid for few zoom ranges. For this purposes we use a very simple model which describes a cluster on the map for particular range of zooms:

class GridCluster(models.Model):
    zoom_range = IntegerRangeField(_('Zoom range'), default='(1,4)')
    polygon = models.PolygonField(_('Polygon'))
    objects = models.GeoManager()

We use the next config for describing cluster sizes:

SRID = 4326
# Sizes of the clusters for different ranges of zoom
GRID_CLUSTER_STEPS = [
    {
        'zoom': '[1,3]',
        'height': 4,  # in degrees
        'width': 5,  # in degrees
    },
    {
        'zoom': '(3,5]',
        'height': 2.5,
        'width': 3,
    },
    {
        'zoom': '(5,8]',
        'height': 1.5,
        'width': 2,
    }
]
GRID_GAP = 1e-20

Here we specified different cluster sizes for different ranges of zoom. In the range between 1 and 3, each cluster is a rectangle with sizes 4x5 degrees. For larger zoom ranges we must decrease cluster sizes. After few experiments, it is not so hard to set particular sizes. Because we use degrees we need to specify particular SRID (https://en.wikipedia.org/wiki/SRID) for choosing a world model because there are many different algorithms to convert degrees to kilometers. In our case, we use SRID 4326 which means WGS84 (https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84). Cluster height is very important in our case because due the specific of world model clusters will be stretched vertically as more as closer to poles. And it is a problem when you build a grid. But for now, we don’t think much about it because there is no equipment in our database close to poles. Yet another thing which we use it is a very small gap (1e-20 degrees) between clusters for preventing the situation when vertical cluster edge is a part of both neighbour clusters simultaneously.

For build the grids we use the next code:

for step in GRID_CLUSTER_STEPS:
    grids = []
    i = 0
    for y in xfrange(-90, 90+1-step['height'], step['height']):
        for x in xfrange(-180, 180+1-step['width'], step['width']):
            wkt_str = \
                'SRID={srid};POLYGON(({x1} {y1},{x2} {y2},' \
                '{x3} {y3},{x4} {y4},{x1} {y1}))'.format(
                srid=SRID,
                x1=x,
                y1=y,
                x2=x,
                y2=y + step['height'] - GRID_GAP,
                x3=x + step['width'] - GRID_GAP,
                y3=y + step['height'] - GRID_GAP,
                x4=x + step['width'] - GRID_GAP,
                y4=y,
                )
            wkt = fromstr(wkt_str)

            i += 1
            grids.append(GridCluster(
                zoom_range=step['zoom'],
                polygon=wkt,
            ))
    GridCluster.objects.bulk_create(grids)

Here we use custom iterator xfrange because we need float type for step argument:

def xfrange(start, stop, step):
    while start < stop:
        yield start
        start += step

Data of the each equipment on the map is also stored in database. Here is a simplified code for Equipment model (a real code looks a bit more complex):

class Equipment(models.Model):
    name = models.CharField(_('Model name'), max_length=300)
    visible = models.BooleanField(_('Visible'), default=True, blank=True)
    blocked = models.BooleanField(_('Blocked'), default=False, blank=True)

    coordinates = models.PointField(
        _('Coordinates'), default=GEOSGeometry('POINT(0 0)'))
    objects = models.GeoManager()

Look at the field coordinates. Here we use a special geometry type in the same manner like at GridCluster name where we use a special field for storing a polygon coordinates. Now PostGIS gives us the ability to use special operations in queries for implementing a clustering. A sample query to get a list of clusters with equipment counters could be looked like that:

SELECT count(*) AS count, polygon
FROM "equipment_equipment", "equipment_gridcluster"
WHERE coordinates IS NOT NULL
AND equipment_equipment.visible = TRUE
AND equipment_equipment.blocked = FALSE
AND equipment_gridcluster.zoom_range @> 3

AND ST_Intersects(coordinates, polygon)
AND ST_Intersects(
    polygon,
    ST_Envelope('SRID=4326;LINESTRING(0 0, 180 180)'::geometry))
GROUP BY polygon;

How does it work?

  1. ST_Envelope (http://postgis.net/docs/ST_Envelope.html) function returns a geometry object which represents a boundary box for supplied geometry. String 'SRID=4326;LINESTRING(0 0, 180 180)' represents a diagonal line for active viewport. Operator ::geometry converts the string to geometry object.
  2. ST_Intersects (http://postgis.net/docs/ST_Intersects.html) returns TRUE if two specified geometries intersect in 2D.
  3. Function count(*) with GROUP_BY clause (http://www.postgresqltutorial.com/postgresql-count-function/) uses for a group the result on a polygon and calculate the count of objects for each of ones.

Result will be looked like that:

count polygon
14
"0103000020E61000000100000005000000000000000000000000000000000000C
000000000000000000000000000000040000000000000144000000000000000400
00000000000144000000000000000C0000000000000000000000000000000C0"
1
"0103000020E610000001000000050000000000000000002440000000000000494
000000000000024400000000000004B400000000000002E400000000000004B400
000000000002E40000000000000494000000000000024400000000000004940"
1
"0103000020E610000001000000050000000000000000003940000000000000474
0000000000000394000000000000049400000000000003E4000000000000049400
000000000003E40000000000000474000000000000039400000000000004740"
4
"0103000020E610000001000000050000000000000000003E40000000000000474
00000000000003E400000000000004940000000000080414000000000000049400
00000000080414000000000000047400000000000003E400000000000004740"
1
"0103000020E610000001000000050000000000000000804140000000000000434
000000000008041400000000000004540000000000000444000000000000045400
000000000004440000000000000434000000000008041400000000000004340"

The second column is an encoded geometry object in GEOS (https://trac.osgeo.org/geos/) format. For using it on the frontend part it should be converted to GeoJSON (http://geojson.org/). The sample code:

from django.contrib.gis.geos import GEOSGeometry
geos = GEOSGeometry(GEOS_data)
geojson = geos.geojson

Very easy isn’t it?

GEOS GeoJSON
"0103000020E61000000100000005
0000000000000000003E40000000000000474
00000000000003E4000000000000049400000
0000008041400000000000004940000000000
080414000000000000047400000000000003E
400000000000004740"
{
  "coordinates": [[
    [30.0, 46.0], [30.0, 50.0], 
    [35.0, 50.0], [35.0, 46.0], 
    [30.0, 46.0]]],
  "type": "Polygon"
}

The next screenshots show how clustering works on the frontend side with different zooms:

When user zoom map in more than specified value we should change a behaviour. We must use object grouping instead a clustering. The first thing which we could do to query the list of equipment inside the current viewport. For this purposes, we can use very simple query via Django ORM. First of all, specify the viewport (sample data):

_wkt = 'SRID=4326;POLYGON((33.84875297546387 49.35249363322452, 33.84875297546387 49.79774507314602, 35.01330375671387 49.79774507314602, 35.01330375671387 49.35249363322452, 33.84875297546387 49.35249363322452))'

Than get the list of objects:

from django.contrib.gis.geos import fromstr
_viewport = fromstr(_wkt)
queryset = Equipment.objects.filter(visible=True, blocked=False)
queryset = queryset.filter(
    coordinates__intersects=_viewport,
).values('latitude', 'longitude', 'id', 'name')
_result = []
for e in queryset:
    _result.append(e)

Result will be looked as a list of dicts with ids and coordinates:

[
    {'name': 'UT8879', 'longitude': 34.551417, 'latitude': 49.588267, 'id': 28},
    {'name': 'LTM 1040-2.1', 'longitude': 34.560858, 'latitude': 49.583148, 'id': 23}
]

The next challenge it is to group equipment which a close to each other on the map. But what means “close”? After the set of experiments, we found that when a distance between markers’ key points is less than 90 pixels than markers became unusable. Of course, it depends on the marker image size, map size etc. But in the database, we have coordinates of objects in degrees. A distance between two points describe in degrees (via latitude and longitude) can be calculated in kilometers via particular world model. We use very simple model where Earth is a simple sphere:

def distance_between_points_in_km(lat1, lng1, lat2, lng2):
    EARTH_RADIUS = 6373.0
    lat1 = radians(lat1)
    lng1 = radians(lng1)
    lat2 = radians(lat2)
    lng2 = radians(lng2)

    dlng = lng2 - lng1
    dlat = lat2 - lat1
    a = (sin(dlat/2))**2 + cos(lat1) * cos(lat2) * (sin(dlng/2))**2
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    distance = EARTH_RADIUS * c
    return distance

Of course, the result will not be very precise but our particular purpose (object grouping) it is enough for. It is not a good idea to use more complex world model here because the function must be fast.

OK. But how to convert pixels on the user’s monitor to the kilometers on the map? We found the next solution:

def gmap_pixels_to_kms(pixels, c_lat, zoom):
    scale = 156543.03392 * cos(c_lat * pi / 180) / (2**zoom)
    return pixels * scale / 1000

Where c_lat is a latitude of the map viewport center point. Then we need a special structure will which allow us to group the sets of objects. UnionFind is a proper case fo it - https://www.ics.uci.edu/~eppstein/PADS/UnionFind.py

Now we can write a code which calculates a distances between objects and build groups:

def groupTPL(TPL, distance=1):
    """`distance` in kilemetres
    """
    U = UnionFind()

    for (i, x) in enumerate(TPL):
        for j in range(i + 1, len(TPL)):
            y = TPL[j]
            if distance_between_points_in_km(
                    x['latitude'], x['longitude'],
                    y['latitude'], y['longitude']) <= distance:
                U.union(x['id'], y['id'])

    disjSets = {}
    for x in TPL:
        s = disjSets.get(U[x['id']], set())
        s.add(x['id'])
        disjSets[U[x['id']]] = s

    return [list(x) for x in disjSets.values()]

Applying this function to the results which we got earlier could be looked like that:

    _distance = gmap_pixels_to_kms(
        pixels=90, c_lat=float(c_lat), zoom=int(zoom))
    _group_by_distance = groupTPL(_result, distance=_distance)

After that variable “_group_by_distance” will contain something like that:

[[28, 23]]

As you see, here ids 28 and 23 are combined as a sublist in a larger list. When user will zoom in more and more a moment when group will break into two separate groups will happen:

[[28], [23]]

This is the main algorithm. After few improvements, we can do the next things on the client.

When client’s code receives a bunch of data from the server it renders it on the map with different types of markers. Everything is easy with equipment markers. Rendering groups is a bit more complex. We use a set of images for different equipment count ranges and render the particular image when an actual number of equipment objects is in a particular range.

Adding markers to map is a trivial operation which is well described in the official documentation (https://developers.google.com/maps/documentation/javascript/markers#add).

Also, we use different behaviour for click on the equipment or group marker.

As you see, implementation of clustering and grouping is not very hard with PostgreSQL/PostGIS and Python/Django. The same scenario could be used for a wide range of services where the rendering of many objects on the map is required.

Oleh Korkh

Get A Free Quote

You will receive quarterly promotions and news. Unsubscribe with one click.

Drop us a line