Introduction to Geospatial Data

Malcolm Katzenbach
Apr 25, 2021
Photo by Brett Zeck on Unsplash

Geospatial data is all around us. Streets, cities, states, countries, and even events can all be described as geospatial data, and we see it used in our lives every day. One of the most familiar examples is the weather forecast: the maps shown on the weather channel or alongside a forecast are rendered from this type of data.

Another common application is GPS, the Global Positioning System, which helps us get from point A to point B, or find ourselves on the map when we get lost.

These locations are usually defined by coordinates on the Earth, such as longitude and latitude, and sometimes elevation. Additional information about a location can be stored as attributes. For example, an attribute could describe what the observation represents: is the point a road, or is it a building? Locations can also be static or dynamic. Static locations can be treated as fixed, at least over a short period of time. Examples include streets and cities, or the point where a sudden event such as an earthquake originated. Dynamic locations, by contrast, move. If you are reading this on a mobile device while walking down the street, you are a dynamic location. The spread of an infectious disease is another example of dynamic locations.
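To make "coordinates plus attributes" concrete, here is a minimal sketch that builds two small GeoDataFrames by hand, one for static features and one for a moving (dynamic) location recorded as timestamped points. All names, coordinates, times, and the `kind` column are hypothetical illustration values, not real data. Note that GeoPandas expects x = longitude and y = latitude.

```python
import geopandas
import pandas as pd

# Static features: each row is one location plus descriptive attributes.
# Names and coordinates are hypothetical illustration values.
static = pd.DataFrame(
    {
        "name": ["Main St", "City Hall"],
        "kind": ["road", "building"],  # attribute: what the point describes
        "lon": [-73.99, -73.98],
        "lat": [40.73, 40.71],
    }
)
static_gdf = geopandas.GeoDataFrame(
    static,
    # x = longitude, y = latitude
    geometry=geopandas.points_from_xy(static["lon"], static["lat"]),
    crs="EPSG:4326",  # WGS 84 longitude/latitude
)

# Dynamic location: the same moving object observed at several times.
track = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2021-04-25 09:00", "2021-04-25 09:05", "2021-04-25 09:10"]
        ),
        "lon": [-73.990, -73.989, -73.987],
        "lat": [40.730, 40.732, 40.735],
    }
)
track_gdf = geopandas.GeoDataFrame(
    track,
    geometry=geopandas.points_from_xy(track["lon"], track["lat"]),
    crs="EPSG:4326",
)

print(static_gdf[["name", "kind", "geometry"]])
print(track_gdf[["time", "geometry"]])
```

Storing a dynamic location as one row per observation, rather than overwriting a single point, keeps the history of where it has been.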

There are many ways to use geospatial data. There is excellent software built around it, known as Geographic Information Systems (GIS), along with the files those systems use. But we can also work with this data in Python.

Using Geospatial Data with Python

If you have been using Python for even a short time, you have likely already heard of Pandas as a way of exploring data. For geospatial data there is a dedicated package called GeoPandas, and that is what we will be looking at today.

GeoPandas is a great package for working with geospatial data, but it can sometimes be difficult to install and get working. The reason is that GeoPandas depends on a number of less common packages, several of which wrap compiled C libraries. The documentation lists several key dependencies that should be installed before GeoPandas, including fiona, pyproj, rtree, and shapely.

The easiest way to install GeoPandas and its dependencies is with Anaconda, if you already have it. You can install the latest version of the packages by running the following command in your terminal:

conda install geopandas

If you don’t already have Anaconda, another way of installing the package is to use the pip installer.

pip install geopandas

However, if you are using the pip installer, make sure that you install the required packages before installing GeoPandas or it won’t work.
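Because a missing dependency is the most common installation problem, it can help to confirm that each package is importable before going further. This is a small sketch using only the standard library. The helper name `missing_packages` is hypothetical, and it only checks that a package can be found on the import path, not that its version is compatible.

```python
import importlib.util

# Dependencies named in the article, plus GeoPandas itself.
PACKAGES = ["fiona", "pyproj", "rtree", "shapely", "geopandas"]


def missing_packages(names=PACKAGES):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All packages found.")
```

Once GeoPandas imports successfully, `geopandas.show_versions()` prints the versions of GeoPandas and its dependencies, which is useful when reporting installation problems.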

Now that GeoPandas is installed, we can take a closer look at the kinds of attributes you might find in a geospatial data file. As an example, we can look at the boundary file for the United States from the US Census Bureau. To work with a boundary file, you first read it into a GeoPandas data frame.

import geopandas

country = geopandas.read_file("gz_2010_us_040_00_5m.json")

The file above contains the boundaries of the states of the USA in GeoJSON format. Calling the head() method on the data frame displays its first few rows.

The data frame has a number of columns:

  1. GEO_ID, a geographic identifier
  2. STATE, the state number
  3. NAME, the name of the state
  4. CENSUSAREA, the census area of the state
  5. geometry, the shape of the state
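These columns can also be inspected programmatically. Because the Census file may not be at hand, the sketch below builds a tiny stand-in GeoDataFrame with the same column names. The rectangle geometries and all values are made up, and the helper `describe_layer` is hypothetical. With the real file you would pass `country` instead of `demo`.

```python
import geopandas
from shapely.geometry import MultiPolygon, Polygon


def describe_layer(gdf):
    """Summarize a GeoDataFrame: attribute columns, CRS, and geometry types."""
    return {
        # every column except the active geometry column
        "columns": [c for c in gdf.columns if c != gdf.geometry.name],
        "crs": gdf.crs,
        "geometry_types": gdf.geom_type.value_counts().to_dict(),
    }


# Hypothetical stand-in rows using the same column names as the Census file.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
two_squares = MultiPolygon([square, Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])])
demo = geopandas.GeoDataFrame(
    {
        "GEO_ID": ["demo-01", "demo-02"],
        "STATE": ["01", "02"],
        "NAME": ["Mainland", "Islands"],
        "CENSUSAREA": [1.0, 2.0],
    },
    geometry=[square, two_squares],
    crs="EPSG:4326",
)

print(demo.head())
print(describe_layer(demo))
```

The geometry type counts are a quick way to spot the mix of polygons and multipolygons discussed next.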

The geometries in the first few rows are polygons and multipolygons, which are a type of shapely object. Shapely also provides point and line objects. Each is defined by a finite sequence of coordinates, which together determine the object's interior, boundary, and exterior. Points could represent buildings, lines could represent streets, and, as in this file, polygons can represent larger areas such as states.
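The shapely building blocks described above can be created directly, which makes the interior, boundary, and exterior distinction easier to see. This is a short sketch with made-up coordinates; the variable names are only illustrative.

```python
from shapely.geometry import LineString, Point, Polygon

building = Point(2, 2)                              # a single location
street = LineString([(0, 0), (2, 2), (4, 2)])       # an ordered sequence of points
state = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])   # a closed ring enclosing an area

print(state.area)                    # area enclosed by the boundary
print(state.boundary)                # the outer ring itself, as a line geometry
print(state.contains(building))      # True: the point lies in the interior
print(state.contains(Point(4, 2)))   # False: a point on the boundary is not in the interior
print(state.touches(Point(4, 2)))    # True: it meets the boundary only
print(street.length)                 # sum of the segment lengths
```

The `contains` versus `touches` results show why the boundary is treated as a separate set from the interior.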

There are a number of things we could do with this data, one of which is to map it. The plot() method renders an image of the data's geometry.

country[~country['NAME'].isin(['Alaska', 'Hawaii'])].plot(figsize=(30, 20), color='green');

The result is a map of the USA. Alaska and Hawaii were left out because of their distance from the rest of the country.
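Outside a notebook, the same plot can be saved to a file instead of displayed. This sketch assumes matplotlib is installed (GeoPandas uses it for plot()). The function name `plot_without`, the output filename, and the three-row demo data are hypothetical stand-ins for `country`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt

import geopandas
from shapely.geometry import box


def plot_without(gdf, excluded, path, column="NAME"):
    """Plot every row whose `column` value is not in `excluded` and save it to `path`."""
    subset = gdf[~gdf[column].isin(excluded)]
    ax = subset.plot(figsize=(30, 20), color="green")
    ax.set_axis_off()  # hide the longitude/latitude axes for a cleaner map
    ax.figure.savefig(path, bbox_inches="tight")
    plt.close(ax.figure)  # free the figure's memory
    return subset


# Hypothetical stand-in for the Census states layer.
demo = geopandas.GeoDataFrame(
    {"NAME": ["Alpha", "Alaska", "Hawaii"]},
    geometry=[box(0, 0, 1, 1), box(5, 5, 6, 6), box(9, 0, 10, 1)],
    crs="EPSG:4326",
)
kept = plot_without(demo, ["Alaska", "Hawaii"], "usa_lower.png")
print(kept["NAME"].tolist())
```

Returning the filtered frame makes it easy to check which rows actually ended up on the map.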

Geospatial data is used in many facets of society and is an integral part of how it runs. I hope you enjoyed this quick introduction to geospatial data and using it with Python.

For documentation on GeoPandas:

https://geopandas.org/docs.html
