Extracting Open Street Map (OSM) street data from data files using PyOsmium

Maksym Kozlenko 🇺🇦
Apr 25, 2022

There are a number of ways to extract data from OpenStreetMap. One is OSMnx, as I’ve already shown, which lets you load data for suburbs or cities. Unfortunately, it does not work well for large areas such as a whole country, because the amount of data to load is too large.

But there is another option: download an .osm.pbf file and process it on your computer. The latest OSM data is available for download from GeoFabrik.de. You can get data for a region, a country or even a whole continent. These files can be very large; for example, the file for Europe is currently almost 25 GB.

Data types to process

An .osm.pbf file contains OSM nodes, ways and relations, including geometry and tags. I would like to get information for all streets in Kyiv Region, so let’s first check how that data is stored.

Take, for example, a street in Kyiv, Ukraine, on which I lived for a long time: Kuchmyn Yar Street. Its OSM relation ID is 417092. You can view it at https://www.openstreetmap.org/relation/417092

associatedStreet relation on OSM

As you can see, this relation has a unique ID and a number of tags containing street names in different languages, old names, the relation type, a Wikidata ID and a Wikipedia page name. The Wikidata ID is very useful, since it lets you get information about this street from Wikimedia projects: Wikidata (structured graph data), Wikimedia Commons (images of the street) and Wikipedia articles.

Each relation has a number of members. For an associatedStreet relation, a member can be either a house located on the street or a way representing a section of the street. Why multiple sections? Some streets are long and have sections with different speed limits, numbers of lanes, etc. Each street section can be assigned its own set of tags.
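As a small illustration of how the value of the wikidata tag can be used, here is a minimal sketch that builds the Wikidata page URL and the JSON export URL for an item ID. The function name wikidata_urls is my own, and Q42 is only a placeholder, not the ID of this street.

```python
def wikidata_urls(qid: str) -> dict:
    """Build Wikidata URLs for an item ID such as the value of an OSM 'wikidata' tag."""
    qid = qid.strip().upper()
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return {
        # Human-readable item page
        "page": f"https://www.wikidata.org/wiki/{qid}",
        # Machine-readable JSON export of the same item
        "json": f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json",
    }


# Q42 is a placeholder item ID used only for illustration.
print(wikidata_urls("Q42")["json"])
```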

highway residential way on OSM

The OSM way with ID 182921640 (https://www.openstreetmap.org/way/182921640) has parent relations and a number of member nodes. Each node represents a point on the line of that street section.

To get a list of streets we need to process relations and ways from a data file.

Processing data file

The latest OSM data file for Kyiv region is available for download from:

https://download.openstreetmap.fr/extracts/europe/ukraine/kiev_oblast-latest.osm.pbf

To extract data from the file I will use the PyOsmium library. It can be installed using pip with the following command:

pip install osmium

File processing is done by defining a handler class, StreetsHandler, which can define callback methods for nodes, ways and relations.

Since we are processing ways and relations only, two methods are defined.

The way() method checks whether an entry has both the “highway” and “name” tags. If so, the way ID and its geometry are extracted and appended to a list as a dict.

The relation() method checks whether an entry has a “type” tag equal to “associatedStreet” and a “name” tag. If so, its ID, tags and relation members are stored in lists.
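A minimal sketch of such a handler could look like the code below. The helper names, the list and dict key names, and the choice to count only matching entries are my assumptions for illustration; only osmium.SimpleHandler and osmium.geom.WKTFactory come from PyOsmium. The library is imported inside the factory function so that the two tag checks can be tried even without it installed.

```python
def has_street_tags(tags) -> bool:
    """Way filter: both 'highway' and 'name' must be present."""
    return "highway" in tags and "name" in tags


def is_associated_street(tags) -> bool:
    """Relation filter: type=associatedStreet plus a 'name' tag."""
    return tags.get("type") == "associatedStreet" and "name" in tags


def make_streets_handler():
    """Create the handler; requires `pip install osmium`."""
    import osmium  # imported here so the two filters above work without it

    class StreetsHandler(osmium.SimpleHandler):
        def __init__(self):
            super().__init__()
            self.num_nodes = 0  # stays 0: no node() callback is defined
            self.num_ways = 0
            self.num_relations = 0
            self.street_ways = []
            self.street_relations = []
            self.street_relation_members = []
            self.wkt = osmium.geom.WKTFactory()

        def way(self, w):
            if not has_street_tags(w.tags):
                return
            try:
                # Needs node locations, i.e. apply_file(..., locations=True).
                geo = self.wkt.create_linestring(w)
            except RuntimeError:
                return  # skip ways with missing or invalid node locations
            self.num_ways += 1
            self.street_ways.append(
                {"way_id": w.id, "name": w.tags.get("name"), "geo": geo}
            )

        def relation(self, r):
            if not is_associated_street(r.tags):
                return
            self.num_relations += 1
            # Copy tags into a plain dict: OSM objects are only valid
            # while the callback is running.
            row = {tag.k: tag.v for tag in r.tags}
            row["relation_id"] = r.id
            self.street_relations.append(row)
            for m in r.members:
                self.street_relation_members.append(
                    {"relation_id": r.id, "member_ref": m.ref,
                     "member_type": m.type, "member_role": m.role}
                )

    return StreetsHandler()
```

The geometry is stored as a WKT string, so the collected dicts contain only plain Python values and can be turned into data frames later.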

To start file processing run:
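A minimal sketch of this step is shown below. It assumes a handler object that exposes counters named num_relations, num_ways and num_nodes; the wrapper function process_file is my own name. apply_file() with locations=True is the PyOsmium call that makes node coordinates available for building way geometries.

```python
def process_file(handler, path: str) -> dict:
    """Run a PyOsmium handler over an OSM file and report its counters."""
    # locations=True makes PyOsmium cache node coordinates so that way
    # geometries can be built inside the handler's way() callback.
    handler.apply_file(path, locations=True)
    counts = {
        "num_relations": handler.num_relations,
        "num_ways": handler.num_ways,
        "num_nodes": handler.num_nodes,
    }
    for name, value in counts.items():
        print(f"{name}: {value}")
    return counts


# Usage (assumes a StreetsHandler instance and the downloaded file):
# process_file(StreetsHandler(), "kiev_oblast-latest.osm.pbf")
```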

It may take a while to process the file. You should get the following output:

num_relations: 40361
num_ways: 278849
num_nodes: 0

As a result, three data frames containing ways, relations and relation members will be created:

street_relations_df
street_relation_members_df
street_ways_df
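Creating these data frames from lists of dicts takes one pandas call per list. The sketch below uses tiny made-up lists in place of the real handler output, and the key names are assumptions carried over for illustration.

```python
import pandas as pd

# Tiny stand-ins for the lists filled by the handler (made-up values).
street_ways = [
    {"way_id": 1, "name": "Example Street", "geo": "LINESTRING(30.0 50.0,30.1 50.1)"},
]
street_relations = [
    {"relation_id": 10, "type": "associatedStreet", "name": "Example Street"},
]
street_relation_members = [
    {"relation_id": 10, "member_ref": 1, "member_type": "w", "member_role": "street"},
]

# Each dict becomes a row; dict keys become column names.
street_ways_df = pd.DataFrame(street_ways)
street_relations_df = pd.DataFrame(street_relations)
street_relation_members_df = pd.DataFrame(street_relation_members)

print(street_ways_df.shape, street_relations_df.shape, street_relation_members_df.shape)
```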

Let’s join the way and relation data frames into a single data frame, streets_gdf:
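One way to do this join with pandas is sketched below on made-up sample rows. The column names and the Q_PLACEHOLDER value are assumptions, and the result is a plain DataFrame (named streets_df here) rather than a GeoDataFrame.

```python
import pandas as pd

street_ways_df = pd.DataFrame([
    {"way_id": 1, "name": "Example Street", "geo": "LINESTRING(0 0,1 1)"},
    {"way_id": 2, "name": "Example Street", "geo": "LINESTRING(1 1,2 2)"},
    {"way_id": 3, "name": "Lonely Lane", "geo": "LINESTRING(5 5,6 6)"},
])
street_relations_df = pd.DataFrame([
    {"relation_id": 10, "name": "Example Street", "wikidata": "Q_PLACEHOLDER"},
])
street_relation_members_df = pd.DataFrame([
    {"relation_id": 10, "member_ref": 1, "member_type": "w", "member_role": "street"},
    {"relation_id": 10, "member_ref": 2, "member_type": "w", "member_role": "street"},
    {"relation_id": 10, "member_ref": 99, "member_type": "n", "member_role": "house"},
])

# Keep only way members, then attach the relation's tags to each of them.
way_members = street_relation_members_df[
    street_relation_members_df["member_type"] == "w"
]
members_with_tags = way_members.merge(
    street_relations_df, on="relation_id", how="left"
).rename(columns={"name": "relation_name"})

# A left join keeps ways that belong to no relation (relation_id is NaN there).
streets_df = street_ways_df.merge(
    members_with_tags[["member_ref", "relation_id", "relation_name", "wikidata"]],
    left_on="way_id", right_on="member_ref", how="left",
).drop(columns="member_ref")

print(streets_df)
```

To get a GeoDataFrame from such a result, the WKT column could be parsed with geopandas.GeoSeries.from_wkt and passed as the geometry.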

Merged data frame streets_gdf:

streets_gdf

Since the same street can be represented by multiple ways, as shown in the street example above, we need to join ways and relations and merge the ways belonging to the same relation into one entry, with their geometries combined into a MULTILINESTRING.

The relation_streets_df data frame contains relations only. Its ID column stores the relation ID with an “r” prefix, and its “geo” column contains multiple LINESTRING values (more details about WKT geometries), one for each way.
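The idea of the merge can be sketched in plain Python on WKT strings, as below. The function name, relation IDs and coordinates are made up, and this is only string manipulation; with real data, a geometry library such as Shapely (its MultiLineString class, for example) would be the more robust choice.

```python
from collections import defaultdict


def merge_linestrings(wkts):
    """Combine LINESTRING WKT strings into one MULTILINESTRING WKT string."""
    parts = []
    for wkt in wkts:
        text = wkt.strip()
        if not text.upper().startswith("LINESTRING"):
            raise ValueError(f"not a LINESTRING: {wkt!r}")
        # Keep the "(x y,x y,...)" coordinate list, including its parentheses.
        parts.append(text[text.index("("): text.rindex(")") + 1])
    return "MULTILINESTRING(" + ",".join(parts) + ")"


# (relation_id, way geometry) pairs with made-up IDs and coordinates.
rows = [
    (10, "LINESTRING(0 0,1 1)"),
    (10, "LINESTRING (1 1,2 2)"),
    (11, "LINESTRING(7 7,8 8)"),
]

# Group way geometries by the relation they belong to.
by_relation = defaultdict(list)
for relation_id, geo in rows:
    by_relation[relation_id].append(geo)

# One entry per relation, keyed by the relation ID with an "r" prefix.
relation_streets = {
    f"r{relation_id}": merge_linestrings(geos)
    for relation_id, geos in by_relation.items()
}
print(relation_streets["r10"])  # MULTILINESTRING((0 0,1 1),(1 1,2 2))
```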

relation_streets_df

Now we can concatenate these relation-based streets with the streets that have no relation defined and are represented by a single way entry. This data frame is also saved to disk in CSV format:

Done. The data frame rw_streets_gdf now has street entries with a way or relation ID, name and Wikidata ID.
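A minimal pandas sketch of this step is shown below, using made-up rows. The column names, the “w” prefix for way IDs, the placeholder Wikidata values and the output file name are all assumptions, and the frame is a plain DataFrame (rw_streets_df) rather than a GeoDataFrame.

```python
import os
import tempfile

import pandas as pd

# Streets built from relations (made-up sample row).
relation_streets_df = pd.DataFrame([
    {"id": "r10", "name": "Example Street", "wikidata": "Q_PLACEHOLDER_1",
     "geo": "MULTILINESTRING((0 0,1 1),(1 1,2 2))"},
])
# Ways that are not members of any street relation, with a "w" prefix on the ID.
way_streets_df = pd.DataFrame([
    {"id": "w3", "name": "Lonely Lane", "wikidata": "Q_PLACEHOLDER_2",
     "geo": "LINESTRING(5 5,6 6)"},
])

# Stack both frames and renumber the index from 0.
rw_streets_df = pd.concat([relation_streets_df, way_streets_df], ignore_index=True)

# Hypothetical output location; index=False avoids writing the row index.
csv_path = os.path.join(tempfile.gettempdir(), "rw_streets.csv")
rw_streets_df.to_csv(csv_path, index=False)
print(len(rw_streets_df), "streets written to", csv_path)
```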

rw_streets_gdf

Let’s plot these streets to see the result: all streets in Kyiv region. In total it contains more than 20,000 streets. The X and Y axes represent longitude and latitude.

rw_streets_gdf.plot(figsize=(30,30))
Streets in Kyiv region

We can also plot a single street on top of an OSM basemap to check whether its geometry was correctly merged from multiple ways:

Street with relation ID 417092 extracted into GeoDataFrame
Street shown on OSM website

As you can see, the street geometry matches the shape shown on the OpenStreetMap website and does not include the houses located on this street.

In the next article I would like to show how to process street data.

The complete code example and its output can be viewed and run as a Kaggle notebook.

Check also:
Getting administrative boundaries from Open Street Map (OSM) using PyOsmium
