Hints and Tips for Python Programmers

This page is a quick reference for you for the course, including quick reference guides for installing the understandinggis Python environment, code presentation, using the forum and a set of Python snippets to demonstrate the functionality that we use in the course. It is intended as an aide memoire to save you having to search through the website to remind yourself how to do something - if you want more detail, then you can always check back on the course materials.

If there is anything that you would like me to add to this page, just let me know!

General Information:

Python Examples (alphabetical):


General Information

Installing the Python Environment

The understandinggis Python environment (including all of the libraries that you need to follow the course) is already installed in the MECD, HBS and ALB computer labs on campus. However, if you would like to install it on your own machine, you need to install it.

There are three approaches that you can use to install the understandinggis Python environment in Anaconda. If one of them doesn’t work for you, you can simply try the next one.

Before you do anything, you must first ensure that you have Anaconda installed on your machine. If not, install it using these instructions for your operating system:Ensure that you have Anaconda installed on your machine. If not, install it using these instructions for your operating system:

Now you can follow one of the three approaches below:

Note: some Windows users are experiencing some problems with their installations - until this is resolved, I recommend using method 3 to install the environment (install via console without a specification file).

1. Install via Anaconda Navigator using a Specification File

One simple approach is to use the Anaconda Navigator software that comes with Anaconda:

  1. Download the Specification File for the understandinggis environment for your operating system. This is a YML file, which is simply is simply a type of structured text file (you can read it if you open it in Notepad, TextEdit or similar), and in this case it simply contains a list of all of the libraries that Anaconda should install to create the understandinggisenvironment.

    • For Windows users, right click on this link, select Save Link As and save the file somewhere on your machine as understandinggis.yml.
    • For Mac users, right click on this link, select Save Link As and save the file somewhere on your machine as understandinggis.yml.
    • For Linux users I have not prepared a file, so you should scroll down to option 3
  2. Open Anaconda Navigator and select Environments in the bar on the left
  3. Select Import from the buttons at the bottom
  4. Set the Name to understandinggis and the Specification File to the above understandinggis.yml file that you downloaded
  5. Press OK

The understandingis Python environment will then install on your machine and appear in the Environments List. This means it is installed

2. Install via Console using a Specification File

Another approach that achieves an identical approach to the above is to install the contents of the specification file using a Console (a command window for your machine).

  1. Download the Specification File for the understandinggis environment for your operating system. This is a YML file, which is simply is simply a type of structured text file (you can read it if you open it in Notepad, TextEdit or similar), and in this case it simply contains a list of all of the libraries that Anaconda should install to create the understandinggisenvironment.

    • For Windows users, right click on this link, select Save Link As and save the file somewhere on your machine as understandinggis.yml.

    • For Mac users, right click on this link, select Save Link As and save the file somewhere on your machine as understandinggis.yml.

  2. Open the Anaconda Command Prompt (on Windows) or the Terminal (on a Mac) and paste in the following command, then press Enter:

conda env create -f understandinggis.yml

Note: if you saved the yml file with a different name, then you will need to update the command to reflect this.

The understandingis Python environment will then install on your machine. You can test if it has worked by running this command:

conda activate understandinggis

If (base) changes to (understandinggis) on the command prompt, then you know it has worked.

3. Install via Console without a Specification File

If you can’t seem to get the above working for any reason, you can achieve the same thing by installing the software manually. This is less preferable, as you won’t get exactly the same versions of the software as you would using the specification file, but this will only cause issues on rare occasions (e.g. if a library has changed relative to the class instructions), and these issues are generally easily resolved.

Open the Anaconda Command Prompt (on Windows) or the Terminal (on a Mac) and paste in the following command, then pressing Enter:

conda create --name understandinggis --channel conda-forge --override-channels --yes python=3 pyogrio pygeos geopandas rasterio pysal networkx osmnx pandana urbanaccess dill mapclassify spyder textdistance scikit-image matplotlib-scalebar flask openpyxl

Depending on your system and internet connection, this may take a while to install, and might spend some time saying Solving Environment... - this is fine, just be patient.

Installation Troubleshooting

Problems with fiona

Some Windows users are reporting the following errors:

ImportError: the 'read_file' function requires the 'fiona' package, but it is not installed or does not import correctly.
Importing fiona resulted in: DLL load failed while importing ogrext: The specified procedure could not be found.

or

AttributeError: partially initialized module 'fiona' has no attribute '_loading' (most likely due to a circular import). 

This is due to an issue with Windows that is preventing the proper operation of the fiona library, which geopandas uses to reand and write vector datasets. This is resolvable by installing pyogrio, which is an alternative to fiona that geopandas can use instead.

To do this, simply follow the below steps:

  1. Close Spyder
  2. Open Anaconda Prompt
  3. Type the following commands, followed by Enter
conda activate understandinggis
conda install --channel conda-forge --override-channels --yes fiona pyogrio

Now you can re-open Spyder (remembering to use the understandinggis version!) and it should work fine.

ModuleNotFoundError (missing libraries)

If you are having problems with your installation, including if it looked like it worked, but you are now getting errors like:

ModuleNotFoundError: No module named 'geopandas'

then it is likely that either:

  1. You have forgotten to activate the Virtual Environment (likely by opening the wrong version of Spyder - see the instructions in the Week 1 practical)
  2. the installation was interrupted or otherwise failed

If you are sure that the issue is #2, then you can simply try again, using the same method, or a different one, from the three options above. If that does not work, there are two more things to try:

1. Try updating the environment

If you are having any problems with your installation, the first thing that you can try is to update it. To do this, open the Anaconda Command Prompt (on Windows) or the Terminal (on a Mac) and paste in the following two commands one by one, pressing Enter each time:

conda activate understandinggis
conda update --update-all

This should update your packages, and add any ones that are missing etc.

Note that you should keep an eye on the output an see if there are any obvious errors. If you get one and you can’t understand / resolve it - feel free to post it on the forum for help!

2. Try reinstalling the libraries

If updating the environment is no help, then it is likely that the installation has not worked properly. In this case, we can try re-installing the libraries.

Open Anaconda Prompt (on Windows) or Terminal (on Mac or Linux) and paste in the following command and press Enter (note that if understandinggis is already the active environment then you can skip that command).

conda activate understandinggis
conda install --channel conda-forge --override-channels --yes pyogrio pygeos geopandas rasterio pysal networkx osmnx pandana urbanaccess dill mapclassify spyder textdistance scikit-image matplotlib-scalebar flask openpyxl

This will force all of the libraries to re-install, which should hopefully fix any previous errors. Note that if your error is with a specific library (e.g. geopandas) then you can just re-install that one, which would be much faster than re-installing the whole lot!

3. Try reinstalling the environment

If reinstalling the libraries is no help, then we might need to reinstall the whole virtual environment.

Open Anaconda Prompt (on Windows) or Terminal (on Mac or Linux) and paste in the following lines, pressing Enter each time.

conda deactivate
conda env remove --name understandinggis --yes
conda create --name understandinggis --channel conda-forge --override-channels --yes python=3 pyogrio pygeos geopandas rasterio pysal networkx osmnx pandana urbanaccess dill mapclassify spyder textdistance scikit-image matplotlib-scalebar flask openpyxl

Note that you should keep an eye on the output an see if there are any obvious errors. If you get one and you can’t understand / resolve it - feel free to post it on the forum for help!


Reading List

Reading is not required in this course (as I expect you to spend your time practicing coding!) - but here is an indicative reading list for those who are interested:

Articles referenced in the lectures and practicals

  • Battersby, S. E., Finn, M. P., Usery, E. L., & Yamamoto, K. H. (2014). Implications of web Mercator and its use in online mapping. Cartographica: The International Journal for Geographic Information and Geovisualization, 49(2), 85-101.
  • Brent, R. P. (1973). An Algorithm with Guaranteed Convergence for Finding a Zero of a Function. In Algorithms for Minimization without Derivatives, Englewood Cliffs, NJ: Prentice-Hall,
  • Bresenham, J. E. (1965). Algorithm for computer control of a digital plotter. IBM Systems journal, 4(1), 25-30.
  • Canters, F., Deknopper, R., & De Genst, W. (2005). A new approach for designing orthophanic world maps. In Proceedings of the 22nd International Cartographic Conference (pp. 9-16).
  • Crameri, F., Shephard, G. E., & Heron, P. J. (2020). The misuse of colour in science communication. Nature communications, 11(1), 1-10.
  • Gosling P., C. & Symeonakis E. (2020). Automated map projection selection for GIS, Cartography and Geographic Information Science, 47:3, 261-276.
  • Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD international conference on Management of data (pp. 47-57).
  • Hart, P. E.; Nilsson, N. J.; Raphael, B. (1968) A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics SSC4 4 (2): 100-107.
  • Huck, J., Whyatt, D., & Coulton, P. (2015). Visualizing patterns in spatially ambiguous point data. Journal of Spatial Information Science, 2015(10), 47-66.
  • Kaučič, B., & Zalik, B. (2002, April). Comparison of viewshed algorithms on regular spaced points. In Proceedings of the 18th spring conference on Computer graphics (pp. 177-183).
  • Mandelbrot, B. (1967). How long is the coast of Britain? Statistical self-similarity and fractional dimension. science, 156(3775), 636-638.
  • Matsumoto, M., & Nishimura, T. (1998). Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation (TOMACS), 8(1), 3-30.
  • Schelling, T. C. (1969). Models of segregation. The American Economic Review, 59(2), 488-493.
  • Schelling, T. C. (1971). Dynamic models of segregation. Journal of mathematical sociology, 1(2), 143-186.
  • Usery, L. E., & Seong, J. C. (2001). All equal-area map projections are created equal, but some are more equal than others. Cartography and Geographic Information Science, 28(3), 183-194.
  • Visvalingam M. & Whyatt J., D. (1993) Line generalisation by repeated elimination of points, The Cartographic Journal, 30:1, 46-51.
  • Wolfram, S. (1983). Statistical mechanics of cellular automata. Reviews of modern physics, 55(3), 601.

Useful Books

  • Canters, F. (2002). Small Scale Map Projection Design. CRC Press, Boca Raton.
  • Crooks, A. et al. (2019) Agent Based Modelling & Geographical Information Systems: A Practical Primer. Sage, London.
  • Iliffe, J. & Lott, R. (2008). Datums and Map Projections for Remote Sensing, GIS and Surveying (2nd ed.). CRC Press, Boca Raton.
  • Lawhead, J. (2013). Learning Geospatial Analysis with Python. Pakt, Birmingham, UK.
  • Van Sickle, J. (2010) Basic GIS Coordinates (2nd ed.). CRC Press, Boca Raton.
  • Westra, E. (2010). Python Geospatial Development. Pakt, Birmingham, UK.
  • Wolfram, S. ()
  • Worboys, M. & Duckham, M. (2004) GIS, a Computing Perspective (2nd ed.). CRC Press, Boca Raton.
  • Xiao, N (2015) GIS Algorithms. Sage, New York.

Useful Websites


Code Presentation

Presenting neat and tidy code is an important part of Python - one of the key ideas that Guido van Rossum had when creating Python was that code is read much more often than it is written, and so it makes sense to prioritise readability over writability.

Good, tidy, readable code demonstrates that It will demonstrate that you are a completent programmer, and will even help to get you good marks in your assessments! Key elements of well presented code include:

  • Follow the Rule of Thirds and make sure that your comments explain exactly what is going on
  • Make sure that variable, function and class names are all descriprive of what they do
  • Use snake_case for variable and function names,
  • Use CamelCase for class names
  • Don’t let lines get too long - use line breaks where appropriate!

To learn all about how best to present your code, check out PEP8, the official Style Guide for Python Code.


Forum Etiquette

One of the reasons that we use a forum is that it is vitally important that modern programmers can use a forum properly (both asking and answering questions). For this reason, the course staff will not answer emails about Python directly - only through the forum. Don’t worry about the efficiency of this, in practice the forum just sends us an email anyway, so responses will be just as quick!

Key elements of a good post include:

  1. Before posting, make sure that you have searched both the forum and the web for an answer - these days it is quite likely that your question has been asked many times before, and finding them will get you an answer more quickly. Most online forums will reject duplicate questions.
  2. Make sure that your title explains the issue clearly - this will help attract people to answer your question, as well as help others benefit from the answer (see point 1).
  3. Explain the problem and ask a question - if you just post some code stating that “this doesn’t work” you are highly unlikely to attract an answer. You need to demonstrate that you have put in the effort (see point 1) to encourage people to put in the effort to help you.
  4. Paste code as text, not a screenshot - people need to be able to copy and paste it to test it and solve the issue, and are highly unlikely to go to the trouble of typing it all out from an image! When you paste into the editor, set the formatting to Preformatted using the drop-down, and the font of your code snippet to somethig appropriate like Courier New. You can also make use of tools such as the highlighter to help illustrate your problem.
  5. Similarly, you should also paste error messages in the same manner as code (see point 4) - someone helping you will likely want to search for information online, and will be unlikely to wish to type out the error message to do so.
  6. When including an error message, you should include the entire error message - not just an excerpt.
  7. Do not just attach your code in a zip file and hope that people will want to read it - you need to identify the sections where the error is and paste it as per point 4. Only attach your code if directly asked for it.
  8. Spelling, grammar and punctuation are important: remember to proof read before posting your question or answer.
  9. Be polite! People are much less likely to help you if you are rude.

You can learn more about how to write a good forum post in the official Stack Overflow guide here.


Creating a Map Projection

Often, when producing a map, you will be able to use a projection that has been created before, such as using the British National Grid when mapping in the UK, for example. IOn other cases, you might need to make one for yourself. For this, we have a useful online tool called the Projection Wizard. The tool is described in detail in Šavrič et al (2017) and is based on a range of studies into the selection of map projections, including: Jenny et al (2017)., Šavrič et al., 2015. and Snyder, 1987.

Basically, all you have to do is tell the wizard which region you want to map (either by typing the coordinates in or dragging the box on the map), and what kind of projection you want (equal area, conformal, equidistant, compromise) - it will then make one or more suggestions for which you should use. Select one from the list and get either the proj4 or WKT String to use in your code. Simple!


Python Examples

Classes

In Object Oriented Programming, perhaps the most important element in a program is the class. In simple terms, an object is used to represent something. A class, is the way in which we define that object.

Consider the below code snippet:

# create a point
p = Point(x, y)

Here, we are creating an object called p, which is an instance of the class Point, which looks like this:

from math import sqrt

# this creates the class
class Point:
  
	def __init__(self, x, y)
  	"""
  	* This function is called when an instance of the class is created
  	* It handles the setup of the object (instance of this class)
  	"""
    # this creates two instance variables to store the coordinates
    self.x = x
    self.y = y
    
    # the coordinates might also be available in combination as a list
    self.coords = [x, y]

  def distance(self, point):
    """
    * This function measures the distance between the current point object
    * and another point object using Pythagoras Theorem
    """
    return sqrt( (self.x - point.x)**2 + (self.y - point.y)**2 )

Note how the __init__(self, x, y) function takes in three arguments, whereas the call to the constructor (Point(x, y)) has only two. This is because, in a class definition, there is always an extra argument called self that is passed first. This is just something to remember, and not something that you need to be concerned about.

Once we have our p variable containing our Point object, we are able to access these instance variables in the way that we are used to:

print(p.coords)

[345678, 456789]

Remember that term: Instance Variables - these are variables that are defined as part of a class, but would be different between two instances of the same class (e.g. the coordinates would be different between two point objects). Instance Variables always start with self. and area available to any part of the class.

This class also contains a function. This function calculates the distance between the point to which it belongs and another instance of a Point object that is passed as an argument, e.g.:

point_a = Point(345678, 456789)
point_b = Point(346678, 456789)
print(f"{point_a.distance(point_b)}km")

1000

Collections

There are four main types of collection in Python, each with slightly different characteristics:

  Ordered Changeable Duplicates Index Brackets
list Y Y Y Y []
tuple Y N Y Y ()
dictionary Y Y N key - value pairs {}
set N add and remove, but not edit N N {}

We look at them each in turn here:

Lists

The list is one of the most versatile datatypes available in Python - it allows you to store multiple values of multiple different types in a single variable. It is written as a list of comma-separated values (items) between square brackets (e.g. ["one", 2, "iii", fish, "potato"]). One of the best things about a list is that items in a list do not need to be of the same type (as you can see in the example, which is comprised of strings, numbers and variables).

Can you see which are strings, variables and numbers? If you aren’t sure, ask!

Values stored in a list can be referenced by their index, which is a number describing their position in the list. The index is 0 for the first value, and then increases by 1 for each subsequent value. Values are accessed using their index by using square brackets to indicate which value you are referring to, e.g.:

myList = ["I", "love", "understanding", "GIS"]

print myList[1]

which returns:

love

Often you will want to initialise an empty list into which you will add items later using .append(). This is achieved like this:

myList = []

List Slicing is a convenient way of taking a copy of all or part of a list

a[start:stop]  			# get all elements from start to stop-1
a[start:]      			# get all elements from start to the end of the array
a[:stop]       			# get all elements from the beginning to stop-1
a[:]           			# get a copy of the whole array
a[start:stop:step] 	# get every step element from start to stop
a[start::step] 			# get every step element from start the end of the array

…for example:

my_list = [1, 2, 3, 4, 5]
print(my_list[])

…which returns…

[3, 4, 5]

Tuples

A tuple is just like a list, except that it cannot be altered once it has been created. It is created with round brackets (()) rather than square ([]). Elements from the tuple are still accessed by an index using square brackets, exactly the same as a list.

# create a tuple representing a coordinate pair
coords = (345678, 456789)

# extract the northing from the coordinate pair
northing = coords[1]

Dictionaries

Dictionaries in Python are a little bit like lists, but instead of just loading elements in a specific order (as you would in a list), you actually label each entry in a dictionary. Dictionaries are defined using curly braces { }, and each element inside them is given in the form of a key-value pair (key: value, e.g. "name": "Jonny") - values can be any type, but keys are aleays strings. Note that values are retrieved using square brackets ([]), not curly braces!

Here is an example of a Python dictionary in action:

# make an object to describe me (give or take the odd decade...)
my_dictionary = {"name": "Jonny", "age": 25}

# retrieve a value
print(my_dictionary["name"])

Sets

A set is like a list, but is does not allow duplicates (it wqill just ignore them if you add them) and the elements do not necessarily maintain the order in which you set them. They are therefore very useful for keeping track of lists, such as a list of cells in a raster that we need to check for a certain value or similar.

Because they have the same curly braces ({}) as Dictionaries, it is often better to create them using the set() function. We can add elements to a set using .add(), and remove one using .pop().

# set for cells to be assessed
to_assess = set()

# add a value to the set
to_assess.add(val)

# get the value out of the set (removing it at the same time)
val = to_assess.pop()

Comments

You should comment your code so that you can remember what it does when you come back to it in the future, and to make it understandable to others. Comments are particularly important in this course as they are how I gauge how well you understand what you have done when I am marking!

To add a comment to a like of code, just use the # symbol - anything that comes after this in a line is a comment, and so will be ignored by the Python interpreter:

 # This doesn't do anything...
 print("Hello World!")	# ...neither does this!

If you want to do a multiline comment, you can do that like this:

'''
Everything between the speechmarks is a comment
You can use as many lines as you like
'''

When defining a function, you should add a multiline comment to the top of it like this:

def myFunction():
  """
  Explain what the function does here
  """

Errors

It is an interesting fact that, however good you are, you will spend more of your time debugging code (identifying problems with it), then you will actually writing code. This is true for all programmers, the better you get, the more complex the problems that you need to solve!

The good thing is that the Python interpreter will do its best to help you fix any code that it cannot run by providing you with a detailed error message.

Consider the following code, which I have saved in a file called error.py and tried to run:

a = 2
c = a + b
print(c)

Clearly, this will not run, because the variable b does not exist. Python thererfore returns this into the console at the bottom of the screen:

.../error.py
Traceback (most recent call last):
  File ".../error.py", line 2, in <module>
    c = a + b
NameError: name 'b' is not defined

Process finished with exit code 1

As you can see, tells us the line number (line 2) where the error was found, the line itself (c = a + b), and a description of the error: NameError: name 'b' is not defined.

These are all helpful bits of information that are here to help you - do not just ignore them and think “it doesn’t work” or “I can’t do it” - this is what programming is - programmers will spend about 20% of their time writing code and 80% of their time fixing it. This doesn’t change as you get better at it - the problems just get more complicated! The good news is that Python is always doing its best to help you to find any problems with your code, so that you can fix them!

Learning to understand and interpret error messages is a vital skill for any programmer. If you have not been able to resolve the problem or you don’t understand the message, you should then Google it (normally by just typing “Python” followed by the error message into Google), e.g.

https://www.google.com/search?q=Python+NameError+name+b+is+not+defined

If you are still unable to resolve it, then this is where you would use the Python Programming Forum on Blackboard (or ask for help if it is during a class).


Functions

We have used a variety of different functions in Python so far, but now it is time to learn how to make them for ourselves! A Function is simply a bit of code that does something (e.g. print() prints something to the console or g.inv() works out the ellipsoidal distance between two locations). Functions are a very useful part of a programmers toolkit, as it means that you never have to repeat a line of code, you can simply write a function once and call it as many times as you like!

Defining a Function

In week 2 I told you about a (relatively) simple calculation to get a spherical distance called the spherical cosine method. Below is the code to create a function that calculates a distance between two points using the spherical cosine method:

def sphericalDistance(x1, y1, x2, y2):
	"""
	* This is a special kind of comment used to describe what a function does
	* Calculate the distance between two locations along a sphere using the 
	*		Spherical Law of Cosines.
	"""
  
	# Average radius of WGS84 spherical Earth model in m
	R = 6371008.771415
	
	# convert to radians
	a = radians(y1)
	b = radians(y2)
	c = radians(x2 - x1)
	
	# calculate spherical distance and return
	return acos(sin(a) * sin(b) + cos(a) * cos(b) * cos(c)) * R

Calling the function

Using a function is referred to as calling it. When you call the above function, you pass it some values called arguments (in this case there are 4 arguments: x1, y1, x2, y2). The function processes those values and returns another value (as specified by the return statement in the function definition - in this case, the distance between the two locations described by the four arguments).

To call this function, you would use:

distance = sphericalDistance(-2.79, 54.04, -2.75, 54.10)

This will calculate the distance between -2.79, 54.04 and -2.75, 54.10 and store the result in a variable called distance.


geopandas

geopandas handles much of the data manipulation for vector datasets (e.g. Shapefiles). Here are some of the operations that you can do with it:

Open a dataset with geopandas:

# import the read_file function from geopandas
from geopandas import read_file

# open a shapefile of countries
world = read_file("../data/natural-earth/naturalearth_lowres.shp")

Test the contents of a dataset with geopandas

# print the first five rows of data
print(world.head())

# print a list of the column names
print(world.columns)

# print the number of rows in the dataset
print(len(world.index))

Project a dataset with geopandas

You can transform a GeoSeries or GeodataFrame in geopandas simply by using the .to_crs() function:

# change the CRS of the dataset to Equal Earth
world = world.to_crs('+proj=eqearth')

Note that you can pass either a Proj string or a WKT string to .to_crs.

Filter a dataset based on attribute values with geopandas

You can query a GeoDataFrame to extract data that meets certain conditions. For example, to extract the row from the world dataset where the ISO3 column contains the value 'USA', you would do the following:

# extract usa and mexico
usa = world.loc[(world.ISO3 == 'USA')]

If you want to access a specific value, such as to get the geometry from that query result, you can do so like this:

# extract geometries
usa_geom = usa['geometry'].iloc[0]

Topological Operations with geopandas

There are lots of topological operations available in geopandas (via shapely), some of the main ones are listed here:

Name Syntax Description
Contains a.contains(b) Returns True or False for whether or not a completely contains b.
Crosses a.crosses(b) Returns True or False for whether or not a crosses b.
Equals a.equals(b) Returns True or False for whether or not a is exactly the same as b.
Intersects a.intersects(b) Returns True or False for whether or not a intersects b.
Touches a.touches(b) Returns True or False for whether or not a touches b.
Within a.within(b) Returns True or False for whether or not a is completely contained by b.
Difference a.difference(b) Returns the geometry from a that does not intersect with b.
Intersection a.intersection(b) Returns the geometry from a that intersects with b.
Union a.union(b) Returns a new geometry made by adding from a to b.
Buffer a.buffer(bufferSize) Returns a new geometry created by drawing a buffer of size bufferSize around a.
Convex Hull a.convex_hull Returns a new geometry created by drawing a convex hull around a.

For a complete list, see the manual here.

For example, to calculate the intersection between usa_geom and mex_geom:

# calculate intersection
border = usa_geom.intersection(mex_geom)

Looping through a GeoDataFrame with geopandas

Because of the complexity of the GeodataFrame data format, you cannot simply loop through it as you could through a list or any other iterable object. Instead, you have to use its .iterrows() function to return a data structure that you can loop through.

When you use this approach, .iterrows() returns two values in each iteration of the loop (as opposed to the one value that we were getting last week). For this reason, we have two variables in our for statement: index and row (remember that these are variables so you can call them whatever you like):

  • index contains the index of the dataset - this is a unique identifier for each line, and by default will simply be a series of numbers (0, 1, 2, 3 etc…). This is the number that .iloc (index location) was using last week.
  • well contains the row of data in the loop, giving you access to all of the columns of data, including the geometry.
# loop through each row in the dataset gdf
for index, row in gdf.iterrows():
  # code goes here

Faster looping through a GeoDataFrame with geopandas

In some cases, where you are looping through large amounts of data, .iterrows() is actually quite a slow way to loop through your data, and much better performance can be achieved using the following:

# convert the dataset to a dictionary, and loop through that
for i, row in enumerate(gdf.to_dict(orient="records")):
	# code goes here

If you do not need the index number, then you can omit the enumerate() function call like this:

# convert the dataset to a dictionary, and loop through that
for row in gdf.to_dict(orient="records"):
	# code goes here

however, it is worth testing this, as in some cases (generally smaller loops) it might turn out to be slower!


if Statements

An if statement is an example of a conditional statement, it allows us to make decisions about whether or not a line of code should run. The components of an if statement are very simple, with the minimum requirement being simply:

if condition:
	# this code only runs if the condition was true

e.g.

if value == 'Jonny':
    print('Hi Jonny!')

Note the difference between the assignment operator = (which assigns a value to a variable), and the equality operator == (which tests whether or not two values are equal)!

You can have as many if statements in a row as you like, and they will all be evaluated each time the code runs, even if one of them turns out to be true:

if name == 'Jonny':
    print('Hi Jonny!')
if name == 'Matt':
    print('Hi Matt!')
if name == 'Kirsty':
    print('Hi Kirsty!')

However, because we know that if name == 'Jonny', then name is definitely not also going to == 'Matt' or == 'Kirsty', we can make the process more efficient by using an elif statement (which means ‘else if’) that will only be evaluated if the previous statement evaluates to False:

name = 'Matt'

if name == 'Jonny':		# this will be tested, and is False
    print('Hi Jonny!') # this will therefore NOT run
elif name == 'Matt':	# this will be tested, and is True
    print('Hi Matt!') # this will therefore run
elif name == 'Kirsty':	# this will NOT be tested, as we already know the answer
    print('Hi Kirsty!')  # this will therefore NOT run

You can then add an else statement to the end of that, which will run only if all of the above conditions evaluate to False:

if name == 'Jonny':
    print('Hi Jonny!')
elif name == 'Matt':
    print('Hi Matt!')
elif name == 'Kirsty':
    print('Hi Kirsty!')
else:
	print('I do not know you')

Loops

for Loops

In simple terms, for statements in Python are a way of making a certain block of code run for every member of a list. To do this, it iterates (moves one item at a time) along the list, in the order that it is provided, and then runs a block of code that can access each member of the list in turn. This saves a lot of coding time in comparison to writing everything out multiple times!

Let’s have a simple example:

# create a variable that holds a list of words
words = ['First', 'Second', 'Third']

# print each of those words in turn
for word in words:
	# everything in the indented block is run for each of the 3 list members
	print(word)

This will print:

 First
 Second
 Third

A common variation on this is where you want to do something a certain number of times, but do not have a list to give it. The simple way around this is to create a list to give it, which you can do with the range() function. range() simply creates a list of consecutive numbers, starting at 0, that is the length that you request with an argument (arguments** are the values that you put between the brackets in a Python **function). For example:

print(list(range(10)))

gives:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

To use range() in a for loop is very easy, simply:

# loop through a list 0-2
for i in range(3):
     print(i)

which gives:

0 
1 
2 

You will notice that, this time, the variable i refers to the member of the list that range() returns (a number), and not to the corresponding member of words. If you want the member of words you need to access it using words[i].

List Comprehension

If you want a loop that does nothing more than convert one list to another, you can use this shorthand form of a for Loop called a List Comprehension:

singles = [1, 2, 3, 4, 5]

doubles = [ x*2 for x in singles ]

print(doubles)
[2, 4, 6, 8, 10]

This is no more efficient than a normal loop, it just looks a little more elegant!

Looping through rows in a geodataframe:

see Looping through a GeoDataFrame

while Loops

You can think of a while loop as a cross between a loop and an if statement, because rather than running a block of code for each member of a list (as with a for loop), it runs until the provided condition evaluates to False

# initialise i
i = 0

# loop while i is less than 10
while i < 10:
  
  # print the current value of i
  print(i)

  # increment i
  i++
  
0 1 2 3 4 5 6 7 8 9

Networks

Creating a MultiGraph with networkx

If you know the structure of the graph that you want to make, you can make it as follows:

from networkx import MultiGraph

# Create a graph object
graph = Graph()

# Add nodes and label them with an unique ID like this:
graph.add_node('A')
graph.add_node('B')

# Add edges between the nodes and give them unique IDs like this:
graph.add_edge('A','B', id=1)	

Creating a MultiGraph with osmnx

If you want to load the graph from OpenStreetMap data, then you can do so using the osmnx library:

from osmnx import graph_from_xml

# create graph from OSM XML dataset
graph = graph_from_xml('../data/kaliningrad/kaliningrad.xml')

Operators

Operator Explanation Symbol(s) Example
add / concatenate Used to add two numbers together,
OR
concatenate two strings together.
+ 6 + 9 (15)

"Hello " + "world!" (Hello World)
arithmetic These do what you’d expect them to do in basic maths. -, *, / 9 - 3 (6)
power Can be used to calculate a number to the power of another number. ** 3 ** 2 (9)
assignment operator You’ve seen this already, it assigns a value to a variable.
They can also be combined with the arithmetic operators to
=

+=, -= etc.
myVariable = 7;

myvariable += 1
equality operator You’ve seen this too, it compares two operands, returning True if they are the same, and False if not. == myVariable = 7;

print

Print enables you to write to the console, and is good to give feedback to the user and test the values of variables:

print("hello world!")
print(2 + 2)

…which gives…

hello world
4

pyplot

We use pyplot to draw and export our maps:

# import subplots and savefig from pyplot
from matplotlib.pyplot import subplots, savefig

# create map axis object
fig, ax = subplots(1, 1, figsize=(15, 8))

# remove axes
ax.axis('off')

# add a title to the plot
ax.set(title="World Map")

# plot a GeoDataFrame stored in the world variable
world.plot(
    ax = ax,
    )

# save the result
savefig('../out/1.png')

# close any plots left in memory
close_plots('all')

Sometimes you will want to control the bounds of the resulting map. In the below snippet, we zoom the map to a 1000m buffer around the bounds GeoSeries:

# set bounds
buffer = 10000
ax.set_xlim([border.geometry.iloc[0].bounds[0] - buffer, border.geometry.iloc[0].bounds[2] + buffer])
ax.set_ylim([border.geometry.iloc[0].bounds[1] - buffer, border.geometry.iloc[0].bounds[3] + buffer])

pyproj

Ellipsoidal Distance with pyproj

This calculates the ellipsoidal distance between two given locations

from pyproj import Geod

# set which ellipsoid you would like to use
g = Geod(ellps='WGS84')

# measure the forward azimuth, back azimuth and distance between two points
azF, azB, distance = g.inv(start_longitude, start_latitude, end_longitide, end_latitude)

Calculate a point offset with pyproj

This calculates the coordinates of a location a given distance and direction (forward azimuth) from a given location across an ellipsoid

from pyproj import Geod

# set which ellipsoid you would like to use
g = Geod(ellps='WGS84')

# calculate a point at a given distance and forward azimuth from another point
longitude, latitude, bAz = g.fwd(start_longitude, start_latitude, forward_azimuths, distance)

Transform coordinates with pyproj

This transforms coordinates from one one CRS to another, where geo_string and proj_string describe the original and desired CRS respectively.

from pyproj import Transformer

# initialise a PyProj Transformer to transform coordinates
transformer = Transformer.from_crs(CRS.from_proj4(geo_string), CRS.from_proj4(proj_string), always_xy=True)

# transform from geo to projected
lon, lat = transformer.transform(x, y, direction='FORWARD')

# transform from projected to geo
x, y = transformer.transform(lon, lat, direction='INVERSE')

rasterio

Opening a dataset with rasterio

When we use rasterio to open raster datasets, it is best to use the with keyword to handle the complexities of opening and closing the file. In the below example, when we are inside the withstatement, we have access to the dem variable (which you could call anything that you like as it is a variable). As soon as you leave the with block, the file is safely closed by Python and the dem variable is destroyed. This is important, because opening and closing files is a great way to crash a script if you don’t do it carefully.

from rasterio import open as rio_open

# open a raster file into a variable called ds
with rio_open('../data/helvellyn/Helvellyn.tif') as dem:

	# everything inside the with block can access ds
  
# once you leave the block, the file automatically closes for you

Note that I have also used a slightly unusual import statement here, where I have re-named the open function from rasterio to rio_open uaing the as keyword. This is simply because open is a very generic name and I could easily end up adding two libraries that both have a function called open. For this reason, it is often handy to re-name imports as I have done here to avoid confusion!

rasterio data structures

rasterio has its own data structure. The structure is simple, but you need to make sure that you understand it in order to be able to access the data that you want correctly.

Raster data are organised into bands, which can be thought of like layers. Each band is the same size, and covers the same area, and the cells of each band align perfectly. A DEM is a simple raster dataset, because it only contains a single band (representing elevation). A satellite image, for example, is a little more complex, because it contains three bands: the first of which contains the red component of the colour for each cell, the second of which contains the green, and the third of which contains the blue. (the same is also true of normal digital photographs). When we open an image file for viewing on our computer, the software combines the red, green and blue components of each cell to give the final colour that we see on our screen (each pixel of which is made up of three tiny LED’s - one red, one green, one blue…). Raster datasets can have many bands, particularly when you are dealing with multispectral datasets like you might have seen in the remote sensing course.

You can see the structure of a dataset like this:

print(dem.profile)

This allows you to see lots of information about the raster dataset that you have opened, including:

Property Explanation Value
driver Which drive was used to open the dataset, so effectively what format the data is in (we will generally use GeoTiff’s, but there are lots of types available) GTiff (GeoTiff)
dtype The type of data held in the raster (normally an Integer (round number) or float (decimal number) float32 (32 bit decimal number)
nodata What values is used to represent the absence of data (NoData) for a given cell None
width The width, in pixels, of the dataset 4000
height The height, in pixels, of the dataset 4000
count The number of bands in the dataset 1
crs The coordinate reference system (CRS) of the dataset (a CRS definition)
transform An Affine Transformation Martrix - this is used alongside the CRS to position the dataset in space. It simply describes the resolution, coordinates of the x-coordinate of the upper-left corner of the upper-left pixel and rotation of the dataset in this order: [pixel width, rotation on x axis, x coordinate, rotation on y axis, pixel height, y coordinate] Affine(5.0, 0.0, 320000.0,0.0, -5.0, 530000.0)
tiled Whether or not the dataset is stored in tiles or in a single big block. Sometimes tiling is useful for very large datasets, but is relatively unusual for most purposes. False
interleave This refers to the way that the data is stored within the dataset (you can pretty much ignore this - it makes little difference to the user). It refers to whether or not the data are stored band by band ('band'), line by line across all bands ('line') or pixel by pixel across all bands ('pixel'). The default for GeoTiff data is to store data band by band ('band'). 'band'

The bands of a raster are ordered in a list, which is accessed using the .read() function. We then access the bands as we would any other list (remembering that this particular list only has a single element in it):

# get the single data band from a dem
elevation_data = dem.read(1)

If we didn’t have a DEM, but had an aerial photograph instead (for example…) then we could have three bands to access like this:

# get the rgb components from a photograph
bands = ds.read()
red_band = bands[0]
green_band = bands[1]
blue_band = bands[2]

Or like this:

# get the rgb components from a photograph
red_band = ds.read(1)
green_band = ds.read(2)
blue_band = ds.read(3)

Managing coordinate space

Once you have the band that you want, you need to deal with one of the fundamental issues of raster data: the fact that computers store cells of data referenced using the x,y coordinates from the top left (which we call image space), whereas coordinate reference systems (and so GIS…) store them using x,y coordinates from the bottom left (which we call coordinate space). See here:

Image Space

Translating projected coordinates (e.g. 345678, 456789) into raster coordinates (e.g. 263, 100) has traditionally been a major source of error in GIS because it is easy to get wrong, but fortunately rasterio makes this very easy for us.

All we have to do is use the .index() and .xy() methods to convert between x,y coordinates in a spatial reference system (coordinate space), and column,row locations in a raster dataset (image space). Under the hood, this is done using the Affine Transformation Matrix that we saw earlier, but these handy functions mean that we no longer have to deal with these directly.

To make this process even easier, I tend to use the two functions below to handle the situation:

# convert from coordinate space to image space
r, c = d.index(x, y)

# convert from image space to coordinate space
x, y = d.xy(row, col)

Spatial Index

Create a spatial index from a geodataframe:

from rtree.index import Index as Spatial_Index

# initialise an rtree index
idx = Spatial_Index()

# loop through each row and construct spatial index on the larger dataset (population)
for id, feature in water_points.iterrows():
	idx.insert(id, feature.geometry.bounds)

Create a spatial index from a networkx Graph or DiGraph:

# create spatial index from graph
idx = Spatial_Index()
for id, data in list(graph.nodes(data=True)):
	idx.insert(int(id), (data['x'], data['y'], data['x'], data['y']))

Get the nearest feature in a Spatial Index to a given location

# use the spatial index to get the index of the closest well
nearest_well_index = list(idx.nearest(house.geometry.bounds, 1))[0]

# extract that well from the geodataframe
water_points.iloc[nearest_well_index]

Filter a dataset to a given bounding box:

# get the indexes of wells that intersect bounds of the district
possible_matches_index = list(idx.intersection(polygon.bounds))

# use those indexes to extract the possible matches
possible_matches = water_points.iloc[possible_matches_index]

try-except Statements

In programming, when something goes wrong, the programming environment (Python in this case) raises an Exception - this is basically a message that something has gone wrong, and it forms the basis of the error messages that you see when you make a mistake in your code.

Normally, Exceptions are unexpeted, and are resolved by editing your code so that, when the final version is ready, there is no need for your users to ever see them. In some cases, however, exceptions can be triggered in ‘normal’ circumstances, and we (the programmer) need to be able to handle tham gracefully to make sure that our users do not see a load of messy error messages that make it look like our software is faulty.

Such a situation arises when calculating routes: If networkx is not able to calculate a route (normally because it is not possible, such as if you wanted to go from one side of a river to another but there was no bridge), then it will raise an Exception called NetworkXNoPath, indicating that no path can be found. We can avoid the user seeing this by using a try-except statement, in which we ask Python to try a block of code, and in the event that an Exception is raised from it, it will stop what it is doing and run a different block of code that is identified by the keyword except.

This sounds complex but is actually very simple, here is how we would apply it to our route calculation:

from sys import exit
# use try statement to catch exceptions
try:
	# calculate the shortest path across the network and extract from graph
	shortestPath = astar_path(graph, source=fromNode, target=toNode, heuristic=ellipsoidalDistance)

# catch exception for no path available
except NetworkXNoPath:
	print("Sorry, there is no path between those locations in the provided network")
	exit()

Variables

Variables are used to store values, you can assign a value to a variable using the assignment operator (=). Remember that the names should be in snake_case:

# assign a string to a variable
my_variable = "some text"

# assign values to some variables
value_1 = 2
value_2 = 4

# use those variables to calculate a value for another variable
value_3 = value_1 + value_2

Variables come in a number of different types; the main ones are described in this table:

Variable Explanation Example
String Some text. To signify that the variable is a string, you should enclose it in either single ('...') or double ("...") quote marks. my_variable= "Bob"
Integer A ‘round’ number with no decimal places. Numbers don’t have quotes around them. my_variable= 10
Float A number with decimal places. Numbers don’t have quotes around them. my_variable= 10.5
Boolean True or False my_variable= True