How to Ethically Harvest Online Data with Python for ESRI Integration

GeoJSON data about our city's bridges is available from ramm.com. To make it available to our engineers it needs to be loaded into a local database. There are lots of metrics stored there too, which can change daily, so we need to do this every night. Let's automate it with Python!

Please note that I have obtained permission from RAMM NZ to use their web services. You can do the same by requesting an API key from their customer support.


The Main Steps

To populate a database with web JSON data nightly we need to do a few things:

  • Delete all features in the target dataset – we don't want any duplicates
  • Request the data from the web by building URLs
  • Read the data response
  • Write the data

Simple, right? Conceptually yes, but the devil's in the details. First we want to check that the URL's content is valid. I've done this by hitting the URL to check its content and then displaying it in QGIS.
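Before involving QGIS at all, you can sanity-check that the response body even parses as JSON. A minimal sketch, using a trimmed sample payload standing in for the real RAMM response:

```python
import json

# Hypothetical sample standing in for the real RAMM response body
sample = ('{"type": "FeatureCollection", "features": '
          '[{"type": "Feature", '
          '"geometry": {"type": "Polygon", "coordinates": []}, '
          '"properties": {"bridge_id": 1}}]}')

try:
    data = json.loads(sample)
    print("Valid JSON with %d features" % len(data["features"]))
except ValueError:  # raised when the body is not well-formed JSON
    print("Response is not valid JSON")
```

If the service returns an HTML error page instead of GeoJSON, this fails fast instead of producing confusing errors later in the pipeline.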


To view a geojson file in QGIS follow these steps:

  • Copy and paste the URL contents into a text file
  • Name it with a .json extension, e.g. data.json
  • Drag the .json file into QGIS and make sure it displays
  • Identify a feature to check its attributes are coming through correctly
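The first two steps can be scripted rather than done by hand: write the response body to a file with a .json extension and QGIS will recognise it on drag-and-drop. A small sketch, with a hypothetical payload in place of the real response:

```python
import json

# Hypothetical stand-in for the downloaded RAMM response
geojson = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]]},
        "properties": {"bridge_id": 1},
    }],
}

# Save with a .json extension so QGIS loads it as a vector layer
with open("data.json", "w") as f:
    json.dump(geojson, f)
```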


To read the GeoJSON from the web into a usable format we can use the Python modules urllib and json. Using the command json.loads(response.read()) we can grab the data as a dictionary and then request elements within the JSON structure by name, like this: feature['geometry']['type'] and feature['geometry']['coordinates']. The response for each object looks something like this:

Polygon 
[[[1821228.2874631872, 5531032.432156478], [1821241.408314114, 5531039.600723921], [1821248.6960629625, 5531026.261731579], [1821235.5752120358, 5531019.093164136], [1821228.2874631872, 5531032.432156478]]]
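Pulling those two values out of a parsed feature looks like this. The feature below is a trimmed, hypothetical example with the same shape as the output above:

```python
import json

# One feature from the collection, trimmed to the fields we care about
feature = json.loads("""{
    "type": "Feature",
    "geometry": {"type": "Polygon",
                 "coordinates": [[[1821228.28, 5531032.43],
                                  [1821241.40, 5531039.60],
                                  [1821248.69, 5531026.26],
                                  [1821235.57, 5531019.09],
                                  [1821228.28, 5531032.43]]]},
    "properties": {"bridge_id": 42}
}""")

print(feature['geometry']['type'])           # Polygon
ring = feature['geometry']['coordinates'][0]
print(len(ring))                             # 5: four corners plus the closing vertex
```

Note that in GeoJSON a polygon ring repeats its first vertex at the end, which is why a four-sided bridge footprint has five coordinate pairs.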

To get a better understanding of the data structure we're working with, I like to copy and paste the JSON into a JSON-to-human-readable converter.
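You don't strictly need an online converter for this step; the json module can pretty-print locally. A quick sketch with a hypothetical feature:

```python
import json

feature = {"type": "Feature",
           "geometry": {"type": "Polygon", "coordinates": []},
           "properties": {"bridge_id": 42, "bridge_name": "Example"}}

# indent=2 produces the same human-readable nesting an online converter gives
pretty = json.dumps(feature, indent=2, sort_keys=True)
print(pretty)
```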

I'm working with an ESRI SDE database built on top of Microsoft SQL Server. In this case I make a connection to the database like this:

connection = "C:\\Users\\username\\AppData\\Roaming\\ESRI\\Desktop10.4\\ArcCatalog\\SQLGIS.sde\\"

Then build a variable pointing to the specific dataset by appending the feature class name to the connection:

BRIDGE_FC = "{0}{1}".format(connection, "SDE_DATA.SDEADMIN.RAMM_BRIDGE")

Build a list of the fields in the database we would like to update:

BRIDGE_FIELDS = ["bridge_id", "bridge_type", "age", "length_m", "rail_to_rail", "bridge_name", "bridge_no", "SHAPE@", "sync_date"]
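The order of this field list matters: each row passed to the insert cursor must supply its values in exactly the same order. That mapping can be checked without arcpy at all. A small sketch, with hypothetical property values, where SHAPE@ is stubbed with None since the geometry is handled separately:

```python
import datetime

ATTR_FIELDS = ["bridge_id", "bridge_type", "age", "length_m",
               "rail_to_rail", "bridge_name", "bridge_no"]

# Hypothetical feature properties as they arrive in the GeoJSON
props = {"bridge_id": 42, "bridge_type": "beam", "age": 35,
         "length_m": 18.5, "rail_to_rail": 7.2,
         "bridge_name": "Example Bridge", "bridge_no": "B042"}

# Build the row in field-list order, then append the geometry
# placeholder (None here) and the sync date
row = tuple(props[f] for f in ATTR_FIELDS) + (None, datetime.date.today())
print(row)
```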

Use arcpy.da.InsertCursor to insert the data into the target dataset by passing the list of fields to be updated and the values from the GeoJSON dictionary object. Because of this we can call components of the bridges dataset by name. Here is my function responsible for inserting the GeoJSON into the database:

def insert_ramm(fields, featList):
    """The function to write the data to SDE"""
    cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)

    for feat in featList:
        cursor.insertRow((feat['properties']['bridge_id'],
                          feat['properties']['bridge_type'],
                          feat['properties']['age'],
                          feat['properties']['length_m'],
                          feat['properties']['rail_to_rail'],
                          feat['properties']['bridge_name'],
                          feat['properties']['bridge_no']))

    # Delete the cursor object to release the lock on the feature class
    del cursor

This updated 124 features, which is the total number in our GeoJSON web scrape; however, there is no geometry yet. So all we have so far is a list of bridges in the database, but we don't know where they are spatially. The following code returns a Polygon object which can be added to the insert to give each row its geometry. At first this code block didn't work, until I realised you need to add the spatial reference when instantiating the Polygon object.

sr = arcpy.SpatialReference(2193)  # NZGD2000 / New Zealand Transverse Mercator
cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)
for feat in featList:

    # Find the geometry type
    if feat['geometry']['type'] == "Polygon":
        print "Polygon"
        array = arcpy.Array()
        for coordinates in feat['geometry']['coordinates']:
            for vertex in range(len(coordinates)):
                array.add(arcpy.Point(coordinates[vertex][0], coordinates[vertex][1]))

        geom = arcpy.Polygon(array, sr)
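The vertex iteration itself can be verified without an ArcGIS install, using plain lists in place of arcpy.Array and arcpy.Point. A sketch using the sample ring from the response above:

```python
# The ring from the sample response above
coordinates = [[[1821228.2874631872, 5531032.432156478],
                [1821241.408314114, 5531039.600723921],
                [1821248.6960629625, 5531026.261731579],
                [1821235.5752120358, 5531019.093164136],
                [1821228.2874631872, 5531032.432156478]]]

# Mirror the arcpy loop: walk each ring, collect (x, y) pairs in order
points = []
for ring in coordinates:
    for vertex in range(len(ring)):
        points.append((ring[vertex][0], ring[vertex][1]))

print(len(points))   # 5: four corners plus the closing vertex
```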

The final step is to automate this so that it runs every night. Fortunately there is a helpful blog post here from ESRI about automating tasks with Task Scheduler. If you're working on a Linux system you can use a cron job instead.


I like to run python scripts from a .bat file rather than specifying the actual .py file. That way we have some flexibility and can change the target script in the future without having to change the scheduled job.
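The wrapper is only a couple of lines. A sketch of such a .bat file, where both paths are hypothetical and should be replaced with your own Python interpreter and script locations:

```bat
@echo off
rem Hypothetical paths - point these at your own interpreter and script
"C:\Python27\ArcGIS10.4\python.exe" "C:\scripts\ramm_bridges.py"
```

Task Scheduler then targets this .bat file, so swapping the script later only means editing one line here.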

Here is the complete program if you would like to try the same process. Enjoy:


import os, sys
import arcpy
import urllib, json
import datetime

# SQLGIS CONNECTION
connection = "C:\\Users\\YOUR_ID\\AppData\\Roaming\\ESRI\\Desktop10.4\\ArcCatalog\\SQLGIS.sde\\"

# GLOBAL VARS
BRIDGE_URL = r"YOURURL"
BRIDGE_FC = "{0}{1}".format(connection, "SDE_DATA.SDEADMIN.RAMM_BRIDGE")
BRIDGE_FIELDS = ["bridge_id", "bridge_type", "age", "length_m", "rail_to_rail", "bridge_name", "bridge_no", "SHAPE@", "sync_date"]

# OTHER VARS
today = datetime.date.today()

def delete_all(fc):
    """Delete all features in the target feature class to avoid duplicates"""
    arcpy.DeleteFeatures_management(fc)

def read_ramm(url):
    """Read the url data and return the feature list"""
    response = urllib.urlopen(url)

    # Parse the response into a dictionary
    data = json.loads(response.read())
    featList = data["features"]

    return featList

def ramm_bridge(fields, featList):
    """Write the data to SDE"""
    sr = arcpy.SpatialReference(2193)
    cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)
    for feat in featList:

        # Find the geometry type
        if feat['geometry']['type'] == "Polygon":
            array = arcpy.Array()
            for coordinates in feat['geometry']['coordinates']:
                for vertex in range(len(coordinates)):
                    array.add(arcpy.Point(coordinates[vertex][0], coordinates[vertex][1]))

            geom = arcpy.Polygon(array, sr)
        else:
            print "failed validation"
            continue  # skip features without polygon geometry

        cursor.insertRow((feat['properties']['bridge_id'],
                          feat['properties']['bridge_type'],
                          feat['properties']['age'],
                          feat['properties']['length_m'],
                          feat['properties']['rail_to_rail'],
                          feat['properties']['bridge_name'],
                          feat['properties']['bridge_no'],
                          geom,
                          today))

    # Delete the cursor object to release the lock on the feature class
    del cursor

#### BRIDGES ####
delete_all(BRIDGE_FC)
featList = read_ramm(BRIDGE_URL)
ramm_bridge(BRIDGE_FIELDS, featList)

print "welldone"

Thanks for reading
