Maps and Python
Recently I've been working a lot with maps and geographical data. Most of the work has been dome in Stata and I've been using mostly QGIS for visualization. Also I did some simple spatial operations in GDAL and ArcGIS. But when it comes to producing maps, QGIS was my tool of choice. Ideas change quite a lot and QGIS is nice tool for data visualization, but unfortunately I've found it's Python API a bit troublesome to help me automatize production of maps. So I kept doing things manually.
Tools¶
Here I turn to Python and its libraries to help me automatize map production. It turn out not much effort is needed and the maps being produced are visually appealing just after some minor tweaks.
In this short excercise I use Geopandas which enables a simple way to work with Geo data. Except that I also use the standard scientific libraries like Matplotlib, Pandas and Numpy.
Data sources¶
In this example I use administrative boundaries for Poland and connect it with Local Data Bank, BDL data from the Polish national statistics office.
Shapefiles for administrative boundaries for Poland are available for free to download here. Unfortunately only last year is available, but still it's something.
First we need to setup our environment. I always try to use absolute paths, so it's easy for me to switch between computers.
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import geopandas as gpd
import os
# Project directory
pDir = "/home/mateusz/projects/gis/"
Openinig a Shapefiles is extremly easy. Geopandas documentation says that the only thing that you jus need one line of code to do that. Lets try reading the municipalities data...
try:
pl_gminy = gpd.read_file(os.path.join(pDir, 'data', 'prg2018', 'gminy.shp'))
except ValueError as e:
print(e)
Unfortunately I got an error. That's probably because the SHP files are created on Windows machines using Polish localization. The default encoding in use is the CP1250. To be able to read the municipalities file I had to open the original first in QGIS indicating the encoding and then save the file with UTF-8 encoding selected. It worked like a charm.
I also left just the columns with names and municipality codes (TERYT) jpt_kod_je
.
pl_gminy = gpd.read_file(os.path.join(pDir, 'data', 'prg2018', 'gminy_utf.shp'))
pl_gminy.head()
So the geometry is coded in the geometry
column. Nice to know. Let's see what is the projection in use.
pl_gminy.crs
The used units are meters, so we can easily calculate area.
pl_gminy['area'] = pl_gminy['geometry'].area
pl_gminy['area'].describe()
np.sum(pl_gminy['area'])/1e6 # I divide by 1000x1000 m to get km^2
Google says that area of Poland is 312 679 km$^2$, so there is lack of precission here.
Let's try to plot the map.
pl_gminy.plot()
It worked, but the map still needs some minor improvements.
fig, ax = plt.subplots(1, figsize=(14, 14))
pl_gminy.plot(ax=ax, linewidth=0.5, edgecolor='0.4', color='0.8')
ax.axis('off')
Much better! Lets try to add some color! First let's plot the area.
fig, ax = plt.subplots(1, figsize=(14, 14))
pl_gminy.plot(ax=ax, edgecolor='0.4', linewidth=0.5,
column='area', cmap='Greens', legend=True, scheme='quantiles', k=5)
ax.axis('off')