Data Wrangling with MongoDB – OpenStreetMap Data Case Study

Summary

Map Area: Waterloo, Ontario, Canada
Problems Encountered in the Map:
After initially downloading the Waterloo area and running it against the script below, I noticed the following problems with the data:
• Inconsistent methods of describing street direction (e.g. “N” or “North”)
• Inconsistent street types (e.g. St. instead of Street, Dr instead of Drive)
• Incorrect street types (e.g. AVenue instead of Avenue, Cresent instead of Crescent)
• Inconsistent method of displaying the Postal Code (XXXXXX instead of XXX XXX)

Inconsistent Methods of Describing Street Direction
Some street names end with a direction (North, East, South, West). Some of these directions, however, are entered as abbreviations (N, E, S, W) rather than full words. To keep the directions consistent, each abbreviation was replaced with the full word.

Inconsistent Street Types
Some street types were abbreviated as St. or Dr instead of Street or Drive, respectively. To ensure consistency, all abbreviations were converted to full street types.

Incorrect Street Types
Some street types were misspelled, such as AVenue and Cresent instead of Avenue and Crescent, respectively. These incorrect words were detected and replaced with the correct spelling.

Inconsistent Method of Displaying Postal Code
In Canada, postal codes are 6 characters long with alternating letters and numbers, with an optional space between the first and last 3 characters (i.e. LNL NLN where L = Letter and N = Number). Some of the postal codes in the dataset have a space between the first and last 3 characters and some do not (e.g. N2K4N2 or N2K 4N2).
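
The format described above can be checked with two regular expressions. The snippet below is only a minimal sketch: the helper name postcode_format and the pattern names are my own for illustration and are not part of the wrangling script.

```python
import re

# Canonical form: letter-digit-letter, one space, digit-letter-digit (e.g. "N2K 4N2").
SPACED = re.compile(r'^[A-Z]\d[A-Z] \d[A-Z]\d$')
# The same six characters with the space omitted (e.g. "N2K4N2").
UNSPACED = re.compile(r'^[A-Z]\d[A-Z]\d[A-Z]\d$')

def postcode_format(code):
    """Return 'spaced', 'unspaced', or 'other' for a postal code string."""
    if SPACED.match(code):
        return 'spaced'
    if UNSPACED.match(code):
        return 'unspaced'
    return 'other'

# Lower-case or truncated values fall into the 'other' bucket.
for sample in ['N2K 4N2', 'N2K4N2', 'n2j 2h4', 'N2k  3T']:
    print(sample, '->', postcode_format(sample))
```

Counting how many values land in each bucket gives a quick picture of how inconsistent the postal codes are before any cleaning is attempted.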

Data Overview

Below are some of the basic statistics about the dataset, including the MongoDB queries used to produce the statistics.

File Size

waterloo-region_canada.osm – 155MB
waterloo-region_canada.json – 211MB

Number of Documents

db.waterloo.find().count()
→ 763051

Number of Nodes

db.waterloo.find({"type" : "node"}).count()
→ 670362

Number of Ways

db.waterloo.find({"type" : "way"}).count()
→ 92689

Number of Unique Users

db.waterloo.aggregate([{ "$group": { "_id": "$created.user"}}, {"$group": {"_id": 1, "count": { "$sum": 1}}}])
→ 503
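
To make the two-stage pipeline above easier to follow, here is a plain-Python sketch that mimics what each $group stage does. The sample documents and user names are made up, and the helper functions are illustrative only; they are not MongoDB or PyMongo calls.

```python
# Hypothetical sample documents; only the "created.user" field matters here.
docs = [
    {"created": {"user": "alice"}},
    {"created": {"user": "bob"}},
    {"created": {"user": "alice"}},
    {"created": {"user": "carol"}},
]

def group_by_user(documents):
    """Mimic {"$group": {"_id": "$created.user"}}: one output per distinct user."""
    seen = []
    for doc in documents:
        user = doc["created"]["user"]
        if user not in seen:
            seen.append(user)
    return [{"_id": user} for user in seen]

def count_all(documents):
    """Mimic {"$group": {"_id": 1, "count": {"$sum": 1}}}: a single bucket
    that adds 1 for every input document."""
    if not documents:
        return []
    return [{"_id": 1, "count": len(documents)}]

stage1 = group_by_user(docs)   # three documents, one per distinct user
stage2 = count_all(stage1)     # one document holding the number of users
print(stage2)
```

The first stage collapses the documents to one per user, and the second stage counts those collapsed documents, which is why the final result is a single document with a count field.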

As can be seen, some cities appear under more than one name, such as Waterloo and City of Waterloo. One way to fix this would be to include the type of municipality (e.g. City of, Township) for every city, or for none of them. The problem with this approach is that the title is part of the official name for some cities and not for others, so it would be inappropriate to force the names to be consistent in this case. Sticking to one convention would eliminate the duplicates, but forcing consistency may produce erroneous city names.
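
Rather than rewriting city names automatically, one option is to flag the variants for manual review. The sketch below is an assumption-laden illustration: the list of municipal titles in TITLE_RE is a guess, the helper names are hypothetical, and every sample name other than Waterloo and City of Waterloo is made up.

```python
import re
from collections import defaultdict

# Assumed list of municipal titles; adjust to the titles actually present in the data.
TITLE_RE = re.compile(r'^(city|township|town|village)\s+of\s+', re.IGNORECASE)

def base_name(city):
    """Strip a leading title such as 'City of' and normalise case and whitespace."""
    return TITLE_RE.sub('', city.strip()).lower()

def find_variants(city_names):
    """Group names by base name and keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for name in city_names:
        groups[base_name(name)].add(name)
    return {base: sorted(names) for base, names in groups.items() if len(names) > 1}

# Sample input for illustration only.
sample = ['Waterloo', 'City of Waterloo', 'Kitchener', 'Cambridge']
print(find_variants(sample))
```

This only reports which names collide once the title is removed. Deciding which spelling is the official one still has to be done by hand, for the reason given above.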

Conclusion

Although some of the inconsistent and incorrect data were fixed, some inconsistencies and errors remain. The street directions and types were mostly corrected, whereas the postal codes and city names remain inconsistent. Altering the postal codes to be more consistent is fairly straightforward, whereas correcting the city names would pose a challenge.
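
As a rough illustration of why the postal codes are the easier of the two problems, the sketch below normalises a code to the LNL NLN form. The function name normalize_postcode is hypothetical and is not part of the script in this report; returning None for codes that cannot be repaired is a design choice of this sketch.

```python
import re

# Six characters alternating letter and digit, with no space.
VALID = re.compile(r'^[A-Z]\d[A-Z]\d[A-Z]\d$')

def normalize_postcode(raw):
    """Return the code as 'LNL NLN', or None if it is not a six-character LNLNLN code."""
    # Drop all whitespace and upper-case before validating.
    compact = re.sub(r'\s+', '', raw).upper()
    if not VALID.match(compact):
        return None
    return compact[:3] + ' ' + compact[3:]

for raw in ['N2E0A3', 'n2j 2h4', 'N2k 3T6', 'N2K  3v']:
    print(repr(raw), '->', normalize_postcode(raw))
```

Truncated values such as 'N2K  3v' cannot be recovered this way, so they are returned as None and would need to be checked against the original data.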

The script below wrangles the data and transforms the OpenStreetMap data into dictionaries that look like this:
{
  "id": "2406124091",
  "type": "node",
  "visible": "true",
  "created": {
    "version": "2",
    "changeset": "17206049",
    "timestamp": "2013-08-03T16:43:42Z",
    "user": "linuxUser16",
    "uid": "1219059"
  },
  "pos": [41.9757030, -87.6921867],
  "address": {
    "housenumber": "5157",
    "postcode": "60625",
    "street": "North Lincoln Ave"
  },
  "amenity": "restaurant",
  "cuisine": "mexican",
  "name": "La Cabana De Don Luis",
  "phone": "1 (773)-271-5176"
}

  • Only 2 types of top-level tags are processed: "node" and "way"
  • All attributes of "node" and "way" are turned into regular key/value pairs, except:
    • attributes in the CREATED array should be added under a key "created"
    • attributes for latitude and longitude should be added to a "pos" array, for use in geospatial indexing. The values inside the "pos" array are floats, not strings.
  • if the second level tag "k" value contains problematic characters, it is ignored
  • if the second level tag "k" value starts with "addr:", it is added to a dictionary "address"
  • if the second level tag "k" value does not start with "addr:", but contains ":", it is processed as a regular tag
  • if there is a second ":" that separates the type/direction of a street, the tag is ignored

# Import libraries
# (xml.etree.cElementTree was removed in Python 3.9; ElementTree is the replacement)
import xml.etree.ElementTree as ET
import pprint
import re
import codecs
import json

lower = re.compile(r'^([a-z]|_)*$')
lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')
problemchars = re.compile(r'[=\+/&<>;\'"\?%#$@\,\. \t\r\n]')
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)

CREATED = [ "version", "changeset", "timestamp", "user", "uid"]

# List of expected street types
expected = ["Street", "Avenue", "Boulevard", "Drive", "Court", "Place", "Square", "Lane", "Road", 
            "Trail", "Parkway", "Commons","1","10","11","116","12","124","13","14","15","154","16",
           "17","18","2","20","21","22","24","3","30","32","34","4","45","5","51","6","7","8","86",
           "9","97","Baseline","Boardwalk", "Circle", "Close","Cove", "Crescent", "Crestway", "Cross",
           "Crossing","East","West","South","North","Estates","Gardens","Gate", "Grove", "Heights", "Highway",
           "Hill", "Hollow", "Line", "Walk", "Way"]

# Map of incorrect to correct street types
mapping = { "St": "Street",
            "St.": "Street",
            "Ave" : "Avenue",
            "Rd." : "Road",
            "Rd" : "Road",
            "AVenue" : "Avenue",
            "Cresent" : "Crescent",
            "Dr" : "Drive",
            "Dr." : "Drive",
            "N" : "North",
            "E" : "East",
            "S" : "South",
            "W" : "West",
            "road" : "Road",
            "g" : "",
            "canadatrust.com" : "",
            "45th" : "45"
            }

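# Function to update the street types based on the mapping as defined above
def update_name(name, mapping):
    m = street_type_re.search(name)
    # Names with no trailing token (e.g. an empty string) are returned unchanged
    if m is None:
        return name
    street_type = m.group()
    if street_type not in expected and street_type in mapping:
        # Replace only the matched suffix. Passing the raw token to re.sub()
        # would treat "." as a wildcard and rewrite every occurrence in the name
        # (e.g. every "N" in "Northfield Drive N").
        name = (name[:m.start()] + mapping[street_type]).rstrip()
    return name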

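# Function to add an address tag to the node. Tags whose key does not start
# with "addr:" are not copied to the output by this function.
def update(node, key, tag):
    value = tag.attrib['v']
    if key.startswith('addr:'):
        # Skip keys with a second colon, e.g. "addr:street:type"
        if key.count(':') == 1:
            if 'address' not in node:
                node['address'] = {}
            if key == 'addr:street':
                value = update_name(value, mapping)
            if key == 'addr:postcode':
                # Codes not already in "LNL NLN" form are split after the third
                # character. This assumes an unspaced, upper-case code: a value
                # such as "N2k 3T6" comes out as "N2k  3T".
                if not re.match(r'^[A-Z]\d[A-Z] \d[A-Z]\d$', value):
                    value = value[0:3] + ' ' + value[3:6]
            # Store the value under the key with the "addr:" prefix removed
            node['address'][key[5:]] = value
    return node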


# Function to convert XML input to JSON output
def shape_element(element):
    node = {}
    if element.tag == "node" or element.tag == "way" :
        node['type'] = element.tag
        node['created'] = {}
        if "lat" in element.keys() and "lon" in element.keys():
            pos = [float(element.attrib["lat"]), float(element.attrib["lon"])]
            node["pos"] = pos
        for tag in element.iter():
            for key, value in tag.items():
                if key in CREATED:
                    node['created'][key] = value
                elif key == 'k' and not re.search(problemchars, value):
                    node = update(node, value, tag)
                elif key == 'ref':
                    if 'node_refs' not in node:
                        node['node_refs'] = []
                    node['node_refs'].append(value)
                elif key not in ['v', 'lat', 'lon']:
                    node[key] = value
        element.clear()
        return node
    else:
        return None

# Function that wrangles the data
def process_map(file_in, pretty = False):
    file_out = "{0}.json".format(file_in)
    data = []
    with codecs.open(file_out, "w") as fo:
        for _, element in ET.iterparse(file_in):
            el = shape_element(element)
            if el:
                data.append(el)
                if pretty:
                    fo.write(json.dumps(el, indent=2)+"\n")
                else:
                    fo.write(json.dumps(el) + "\n")
    return data

def test():
    data = process_map('waterloo-region_canada.osm', True)
    pprint.pprint(data[-1])

if __name__ == "__main__":
    test()

Output (the last element processed):
{'created': {'changeset': '38463820',
             'timestamp': '2016-04-11T01:35:40Z',
             'uid': '951370',
             'user': 'fbax',
             'version': '1'},
 'id': '409382133',
 'node_refs': ['4113058876', '4113058877', '4113058878'],
 'type': 'way'}

The following script queries the database:

# Import libraries
import json
import pprint
from pymongo import MongoClient

# Initialize MongoDB client
client = MongoClient("mongodb://localhost:27017")
db = client.examples

# Count postal codes, ordered by descending count
result = db.waterloo.aggregate([
    {"$match": {"address.postcode": {"$exists": 1}}},
    {"$group": {"_id": "$address.postcode", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}}])

# Print postal codes with count
print('Postal Codes with count:')
for item in result:
    pprint.pprint(item)


# Number of documents
# (Cursor.count() was removed in PyMongo 4; count_documents() replaces it)
documents = db.waterloo.count_documents({})
print('Number of documents:')
pprint.pprint(documents)

# Number of nodes
nodes = db.waterloo.count_documents({"type" : "node"})
print('Number of nodes:')
pprint.pprint(nodes)

# Number of ways
ways = db.waterloo.count_documents({"type" : "way"})
print('Number of ways:')
pprint.pprint(ways)

# Number of unique users
unique_users =  db.waterloo.aggregate([{ "$group": { "_id": "$created.user"}},
    {"$group": {"_id": 1, "count": { "$sum": 1}}}])
print('Number of unique users:')
for item in unique_users:
    pprint.pprint(item)

Output:
Postal Codes with count:
{'_id': 'N2K 4L7', 'count': 105}
{'_id': 'N2K 4N3', 'count': 99}
{'_id': 'N2K 4N2', 'count': 75}
{'_id': 'N2B 1A4', 'count': 63}
{'_id': 'N2M 3V1', 'count': 57}
{'_id': 'N1G 4C9', 'count': 45}
{'_id': 'N2T 0A6', 'count': 42}
{'_id': 'N2J 1K7', 'count': 39}
{'_id': 'N2E 0A3', 'count': 36}
{'_id': 'N3H 4R8', 'count': 34}
{'_id': 'N2K 4M3', 'count': 33}
{'_id': 'N2B 1G9', 'count': 30}
{'_id': 'N2B 1C2', 'count': 27}
{'_id': 'N2B 1G8', 'count': 27}
{'_id': 'N3H 4Z7', 'count': 24}
{'_id': 'N2N 0B1', 'count': 24}
{'_id': 'N2T 2Z7', 'count': 24}
{'_id': 'N2B 1C1', 'count': 24}
{'_id': 'N2K 4N6', 'count': 21}
{'_id': 'N1H 4G3', 'count': 21}
{'_id': 'N2V 0A2', 'count': 21}
{'_id': 'N2B 1H7', 'count': 18}
{'_id': 'N2E0A3', 'count': 18}
{'_id': 'N2B 3H2', 'count': 18}
{'_id': 'N3H4R8', 'count': 17}
{'_id': 'N2K 4L9', 'count': 15}
{'_id': 'N1H 4G1', 'count': 15}
{'_id': 'N3H 2W7', 'count': 15}
{'_id': 'N1H 4G2', 'count': 15}
{'_id': 'N2L 5Z4', 'count': 14}
{'_id': 'N1S 1V6', 'count': 12}
{'_id': 'N2B 3N1', 'count': 12}
{'_id': 'N2L 5S7', 'count': 12}
{'_id': 'N2N 3P8', 'count': 12}
{'_id': 'N2B 3H3', 'count': 12}
{'_id': 'N0B 1B0', 'count': 12}
{'_id': 'N2R 0A4', 'count': 12}
{'_id': 'N0B 1N0', 'count': 12}
{'_id': 'N3C 1B7', 'count': 12}
{'_id': 'N2B 1G6', 'count': 12}
{'_id': 'N2B 1H9', 'count': 12}
{'_id': 'N2B 1A2', 'count': 9}
{'_id': 'N2P 2E7', 'count': 9}
{'_id': 'N0B 2R0', 'count': 9}
{'_id': 'N1H 3V3', 'count': 9}
{'_id': 'N1H 3V5', 'count': 9}
{'_id': 'N2G 1A3', 'count': 9}
{'_id': 'N2B 3H1', 'count': 9}
{'_id': 'N2J 3H6', 'count': 9}
{'_id': 'N2G 1H6', 'count': 9}
{'_id': 'N2L 3G5', 'count': 8}
{'_id': 'N2L5Z4', 'count': 7}
{'_id': 'N2C 1K3', 'count': 7}
{'_id': 'N2B 1A5', 'count': 6}
{'_id': 'N2B 1E9', 'count': 6}
{'_id': 'N2K 1W9', 'count': 6}
{'_id': 'N2B 3G4', 'count': 6}
{'_id': 'N1R 7H2', 'count': 6}
{'_id': 'N2M 5C6', 'count': 6}
{'_id': 'N2L 2X2', 'count': 6}
{'_id': 'N2L 3G1', 'count': 6}
{'_id': 'N2G 4X6', 'count': 6}
{'_id': 'N2J 3J3', 'count': 6}
{'_id': 'N2J 1R1', 'count': 6}
{'_id': 'N1R 8P1', 'count': 6}
{'_id': 'N2M 1A1', 'count': 6}
{'_id': 'N2L 2Y5', 'count': 6}
{'_id': 'N1T 1K7', 'count': 6}
{'_id': 'N2L 3V4', 'count': 6}
{'_id': 'N0B 1E0', 'count': 6}
{'_id': 'N2L 3E9', 'count': 6}
{'_id': 'N1G 0C5', 'count': 6}
{'_id': 'N2L 6R5', 'count': 6}
{'_id': 'N2R0A4', 'count': 6}
{'_id': 'N2J 3Z3', 'count': 6}
{'_id': 'N2K 1J9', 'count': 6}
{'_id': 'N2G 1H2', 'count': 6}
{'_id': 'N2J 2G9', 'count': 6}
{'_id': 'N3A 1E7', 'count': 6}
{'_id': 'N2G 1X4', 'count': 6}
{'_id': 'N2E 1B6', 'count': 6}
{'_id': 'N2G 4Z2', 'count': 6}
{'_id': 'N2K 2E1', 'count': 6}
{'_id': 'N0B 1S0', 'count': 6}
{'_id': 'N1G 4Z1', 'count': 6}
{'_id': 'N3A 1K1', 'count': 6}
{'_id': 'N1L 0A6', 'count': 6}
{'_id': 'N2L 6C2', 'count': 6}
{'_id': 'N2L 3L2', 'count': 6}
{'_id': 'N2A 0E4', 'count': 6}
{'_id': 'N1M 3G4', 'count': 6}
{'_id': 'N2C 1X1', 'count': 6}
{'_id': 'N2L 3V3', 'count': 6}
{'_id': 'N2J 1P2', 'count': 6}
{'_id': 'N2B 1B7', 'count': 6}
{'_id': 'N1E 5R3', 'count': 6}
{'_id': 'N2N 2Y2', 'count': 6}
{'_id': 'N2G 1H3', 'count': 5}
{'_id': 'N2J 3H8', 'count': 5}
{'_id': 'N2J 3H4', 'count': 4}
{'_id': 'N0B 1K0', 'count': 4}
{'_id': 'N1H 6J2', 'count': 4}
{'_id': 'N2L 6A1', 'count': 4}
{'_id': 'N2M 3W1', 'count': 4}
{'_id': 'N1G 4G3', 'count': 3}
{'_id': 'N1H 4C6', 'count': 3}
{'_id': 'N1K 1S5', 'count': 3}
{'_id': 'N1G 4S2', 'count': 3}
{'_id': 'N2T 0C1', 'count': 3}
{'_id': 'N1L 1G9', 'count': 3}
{'_id': 'N2A 4H4', 'count': 3}
{'_id': 'N1E 5H5', 'count': 3}
{'_id': 'N1H 5H7', 'count': 3}
{'_id': 'N2K 0A8', 'count': 3}
{'_id': 'N2K 0A7', 'count': 3}
{'_id': 'N2K 0A2', 'count': 3}
{'_id': 'N1H 6J4', 'count': 3}
{'_id': 'N2L 3S2', 'count': 3}
{'_id': 'N2J 1P8', 'count': 3}
{'_id': 'N2H 2H1', 'count': 3}
{'_id': 'N2E 3W7', 'count': 3}
{'_id': 'N2L 3L1', 'count': 3}
{'_id': 'N2H 4V2', 'count': 3}
{'_id': 'N1R 7W6', 'count': 3}
{'_id': 'N2L 5W6', 'count': 3}
{'_id': 'N2N 3J2', 'count': 3}
{'_id': 'N2J 3Z4', 'count': 3}
{'_id': 'N2L 6G6', 'count': 3}
{'_id': 'N1E 2Y9', 'count': 3}
{'_id': 'N2J 3Z6', 'count': 3}
{'_id': 'N2J 4Z2', 'count': 3}
{'_id': 'N2L 6P5', 'count': 3}
{'_id': 'N2V 2J8', 'count': 3}
{'_id': 'N2J 2Z6', 'count': 3}
{'_id': 'N1H 3Y8', 'count': 3}
{'_id': 'N2L 5Z3', 'count': 3}
{'_id': 'N1H 3V7', 'count': 3}
{'_id': 'N2G 1C3', 'count': 3}
{'_id': 'N1L 1C8', 'count': 3}
{'_id': 'N1G 4W2', 'count': 3}
{'_id': 'N2V 1V8', 'count': 3}
{'_id': 'N1G 2X6', 'count': 3}
{'_id': 'N2L 6R2', 'count': 3}
{'_id': 'N2L 3E6', 'count': 3}
{'_id': 'N2L 3E7', 'count': 3}
{'_id': 'N1G 4N4', 'count': 3}
{'_id': 'N2L 1W4', 'count': 3}
{'_id': 'N2L 1W3', 'count': 3}
{'_id': 'N2G 1W2', 'count': 3}
{'_id': 'N2G 2V3', 'count': 3}
{'_id': 'N1E 7C4', 'count': 3}
{'_id': 'N3B 2N0', 'count': 3}
{'_id': 'N3H 3M4', 'count': 3}
{'_id': 'N2G 1T6', 'count': 3}
{'_id': 'N3H 3M6', 'count': 3}
{'_id': 'N2M 2C6', 'count': 3}
{'_id': 'N1L 0J3', 'count': 3}
{'_id': 'N2N 2A8', 'count': 3}
{'_id': 'N2H 1Z4', 'count': 3}
{'_id': 'N2T 2Z4', 'count': 3}
{'_id': 'N2J 5A3', 'count': 3}
{'_id': 'N1G 4M4', 'count': 3}
{'_id': 'N1R 5S2', 'count': 3}
{'_id': 'N2H 2Z6', 'count': 3}
{'_id': 'N2L 3E8', 'count': 3}
{'_id': 'N2L 3L3', 'count': 3}
{'_id': 'N2L 5G7', 'count': 3}
{'_id': 'N2L 4S7', 'count': 3}
{'_id': 'N2H 1A3', 'count': 3}
{'_id': 'N1H 3X3', 'count': 3}
{'_id': 'N2G 1W5', 'count': 3}
{'_id': 'N1T 1Z9', 'count': 3}
{'_id': 'N1G 5L4', 'count': 3}
{'_id': 'N2V 2A9', 'count': 3}
{'_id': 'N2C 1P7', 'count': 3}
{'_id': 'N3B 2Y9', 'count': 3}
{'_id': 'N2L 3B9', 'count': 3}
{'_id': 'N2B 3C9', 'count': 3}
{'_id': 'N1R 8L4', 'count': 3}
{'_id': 'N1P 1C7', 'count': 3}
{'_id': 'N2J 2Y2', 'count': 3}
{'_id': 'N2G 3J5', 'count': 3}
{'_id': 'N1H 7R4', 'count': 3}
{'_id': 'N2K 0A9', 'count': 3}
{'_id': 'N2M 5N9', 'count': 3}
{'_id': 'N2L 1W7', 'count': 3}
{'_id': 'N1H 1A1', 'count': 3}
{'_id': 'N1T 1C0', 'count': 3}
{'_id': 'N0B 2M0', 'count': 3}
{'_id': 'N2V 2G8', 'count': 3}
{'_id': 'N2R 0A3', 'count': 3}
{'_id': 'N1H 2T3', 'count': 3}
{'_id': 'N2N 1C1', 'count': 3}
{'_id': 'N2B 1E3', 'count': 3}
{'_id': 'N3H 3G5', 'count': 3}
{'_id': 'N2K 4A2', 'count': 3}
{'_id': 'N2C 1X3', 'count': 3}
{'_id': 'N1R 6J8', 'count': 3}
{'_id': 'N3A 1E3', 'count': 3}
{'_id': 'N2M 1R9', 'count': 3}
{'_id': 'N1S 2H4', 'count': 3}
{'_id': 'N1T 2K8', 'count': 3}
{'_id': 'N2L 2N6', 'count': 3}
{'_id': 'N3A 1T6', 'count': 3}
{'_id': 'N2L 3T7', 'count': 3}
{'_id': 'N2J 2X4', 'count': 3}
{'_id': 'N2G 1X1', 'count': 3}
{'_id': 'N2L 6J3', 'count': 3}
{'_id': 'N1K 1X3', 'count': 3}
{'_id': 'N2G 2G4', 'count': 3}
{'_id': 'N2J 2N8', 'count': 3}
{'_id': 'N1G 4W3', 'count': 3}
{'_id': 'N1H 7Y5', 'count': 3}
{'_id': 'N1R 5G6', 'count': 3}
{'_id': 'N1S 1P3', 'count': 3}
{'_id': 'N1R 8R3', 'count': 3}
{'_id': 'N2G 2M6', 'count': 3}
{'_id': 'N3A 1R3', 'count': 3}
{'_id': 'N2J 4V2', 'count': 3}
{'_id': 'N2L 4J4', 'count': 3}
{'_id': 'N2M 5J9', 'count': 3}
{'_id': 'N2L 5G4', 'count': 3}
{'_id': 'N2M 3C5', 'count': 3}
{'_id': 'N2B 1E8', 'count': 3}
{'_id': 'N2H 6R4', 'count': 3}
{'_id': 'N2M 1Y4', 'count': 3}
{'_id': 'N2V 2R6', 'count': 3}
{'_id': 'N2J 4Z8', 'count': 3}
{'_id': 'N2G 4X7', 'count': 3}
{'_id': 'N2L 2J3', 'count': 3}
{'_id': 'N1R 3H3', 'count': 3}
{'_id': 'N2V 1B4', 'count': 3}
{'_id': 'N2L 1V4', 'count': 3}
{'_id': 'N2V 1C5', 'count': 3}
{'_id': 'N2M 3C2', 'count': 3}
{'_id': 'N1G 1X8', 'count': 3}
{'_id': 'N2H 3X7', 'count': 3}
{'_id': 'N1L 0Z6', 'count': 3}
{'_id': 'N2A 2Y2', 'count': 3}
{'_id': 'N2V 1C2', 'count': 3}
{'_id': 'N2L 0C6', 'count': 3}
{'_id': 'N2J 2Y9', 'count': 3}
{'_id': 'N1H 3W1', 'count': 3}
{'_id': 'N1G 3P4', 'count': 3}
{'_id': 'N3A 1R1', 'count': 3}
{'_id': 'N2E 2M6', 'count': 3}
{'_id': 'N2K 0B3', 'count': 3}
{'_id': 'N2P 2G2', 'count': 3}
{'_id': 'N2E 4K6', 'count': 3}
{'_id': 'N1L 1G1', 'count': 3}
{'_id': 'N1G 4W1', 'count': 3}
{'_id': 'N2G1H6', 'count': 3}
{'_id': 'N3A 1J3', 'count': 3}
{'_id': 'N2L 5E2', 'count': 3}
{'_id': 'N1H 3A4', 'count': 3}
{'_id': 'N1H 3W3', 'count': 3}
{'_id': 'N2J 1V5', 'count': 3}
{'_id': 'N2L 3V1', 'count': 3}
{'_id': 'N2K 3T6', 'count': 3}
{'_id': 'N2L 1T2', 'count': 3}
{'_id': 'N2T 2W1', 'count': 3}
{'_id': 'N1E 7H1', 'count': 3}
{'_id': 'N2L 6E3', 'count': 3}
{'_id': 'N2L 3Y9', 'count': 3}
{'_id': 'N2J 1N9', 'count': 3}
{'_id': 'N2L 2J2', 'count': 3}
{'_id': 'N2G1H2', 'count': 3}
{'_id': 'N2T 3A3', 'count': 3}
{'_id': 'N1H 3T9', 'count': 3}
{'_id': 'N1G 3W6', 'count': 3}
{'_id': 'N1G 3A2', 'count': 3}
{'_id': 'N2J 2X0', 'count': 3}
{'_id': 'N2G 1R9', 'count': 3}
{'_id': 'N2M 4W9', 'count': 3}
{'_id': 'N3H 0A8', 'count': 2}
{'_id': 'N2L 1H1', 'count': 2}
{'_id': 'N2L 6B5', 'count': 2}
{'_id': 'N2A 3G6', 'count': 2}
{'_id': 'N2L 5P6', 'count': 2}
{'_id': 'N2G 1M8', 'count': 2}
{'_id': 'N2C 2N9', 'count': 2}
{'_id': 'n2j 2h4', 'count': 2}
{'_id': 'N2G 1V6', 'count': 2}
{'_id': 'N2G 1X5', 'count': 2}
{'_id': 'N2L 4M3', 'count': 2}
{'_id': 'N2L 5J4', 'count': 2}
{'_id': 'N1R 3B2', 'count': 2}
{'_id': 'N2L 5V4', 'count': 2}
{'_id': 'N2J 2V9', 'count': 2}
{'_id': 'N2G 1A7', 'count': 2}
{'_id': 'N2k  3T', 'count': 2}
{'_id': 'N2G 3C5', 'count': 2}
{'_id': 'N2J 2W9', 'count': 2}
{'_id': 'N2C1K3', 'count': 2}
{'_id': 'N1S 1H3', 'count': 2}
{'_id': 'N2K  3v', 'count': 2}
{'_id': 'N2G 1H1', 'count': 2}
{'_id': 'N1H6J2', 'count': 2}
{'_id': 'N2M 3W4', 'count': 2}
{'_id': 'N2L 5Z7', 'count': 2}
{'_id': 'N2G 1B7', 'count': 2}
{'_id': 'N2H 6R2', 'count': 2}
{'_id': 'N0B1K0', 'count': 2}
{'_id': 'N2J 2X7', 'count': 2}
{'_id': 'N2R 1S1', 'count': 2}
{'_id': 'N2M3W1', 'count': 2}
{'_id': 'N2G 3N1', 'count': 2}
{'_id': 'N2M 3H5', 'count': 2}
{'_id': 'N2E 4H9', 'count': 2}
{'_id': 'N2C 1A4', 'count': 2}
{'_id': 'N2L6A1', 'count': 2}
{'_id': 'N2L 6C1', 'count': 2}
{'_id': 'N2M 3R3', 'count': 2}
{'_id': 'N1G 0A9', 'count': 2}
{'_id': 'N2G 1A1', 'count': 2}
{'_id': 'N2J3H4', 'count': 2}
{'_id': 'N3H0A8', 'count': 1}
{'_id': 'N2L1H1', 'count': 1}
{'_id': 'N2E4H9', 'count': 1}
{'_id': 'N2L6B5', 'count': 1}
{'_id': 'N2A3G6', 'count': 1}
{'_id': 'N2M3R3', 'count': 1}
{'_id': 'N2L5P6', 'count': 1}
{'_id': 'N2G1H3', 'count': 1}
{'_id': 'N2M3H5', 'count': 1}
{'_id': 'N1S1H3', 'count': 1}
{'_id': 'N2R1S1', 'count': 1}
{'_id': 'N2J3H8', 'count': 1}
{'_id': 'N2L5V4', 'count': 1}
{'_id': 'N2k 3T6', 'count': 1}
{'_id': 'N2H6R2', 'count': 1}
{'_id': 'N2G1X5', 'count': 1}
{'_id': 'N2G1A7', 'count': 1}
{'_id': 'N2G3N1', 'count': 1}
{'_id': 'N2M3W4', 'count': 1}
{'_id': 'N2C2N9', 'count': 1}
{'_id': 'N2L5Z7', 'count': 1}
{'_id': 'N2L4M3', 'count': 1}
{'_id': 'N2G1A1', 'count': 1}
{'_id': 'N2J2W9', 'count': 1}
{'_id': 'N2L5J4', 'count': 1}
{'_id': 'N2L3G5', 'count': 1}
{'_id': 'N2G3C5', 'count': 1}
{'_id': 'N2G1M8', 'count': 1}
{'_id': 'N1R3B2', 'count': 1}
{'_id': 'N2K 3v3', 'count': 1}
{'_id': 'N2L6C1', 'count': 1}
{'_id': 'N2G1V6', 'count': 1}
{'_id': 'N2C1A4', 'count': 1}
{'_id': 'N2G1H1', 'count': 1}
{'_id': 'N2G1B7', 'count': 1}
{'_id': 'N1G0A9', 'count': 1}
{'_id': 'N2J2X7', 'count': 1}
{'_id': 'n2j2h4', 'count': 1}
{'_id': 'N2J2V9', 'count': 1}
Number of documents:
2289153
Number of nodes:
2011086
Number of ways:
278067
Number of unique users:
{'_id': 1, 'count': 503}
