Pull to refresh
1.55

OpenStreetMap *

OpenStreetMap is a collaborative project to create a free editable map of the world.

Show first
Rating limit
Level of difficulty

«Where, where have you gone», or searching for missing stations on public transport routes in OpenStreetMap

Level of difficultyMedium
Reading time6 min
Views1K

OpenStreetMap (OSM) is a global project formed around a geographic information database which is being filled by all comers — both enthusiasts and interested companies. Anybody can contribute, but the openness has its downside: incorrect edits often get into the database. Hence plenty of validators of OSM data have been written which allow to maintain the data quality at an acceptable level.

Since 2016 there exists an open source subway preprocessor that validates (generates error reports) rapid transit routes in OSM for completeness and logical/topological errors, and converts them into formats that are suitable for routing and rendering, e.g. GTFS. Besides OSM data it takes a list of public transport (PT) networks which contains the checking information about the number of lines, stations etc. per a PT network. The preprocessor has successfully proven itself in the preparation of PT data for applications such as Maps.me and Organic Maps.

In this article, I would like to share an approach to detecting one of the types of errors that occur quite often in OSM data and automatic detection of which is somewhat challenging. It's an accidental loss of a station from a route. The source code of the validator and the described algorithm are open source. But first, let's define the concepts used to represent PT data in OpenStreetMap.

Read more

«Divide and Conquer» for OpenStreetMap world inside PostgreSQL

Level of difficultyMedium
Reading time28 min
Views1.7K

I will continue the story "How to put the whole world into a regular laptop: PostgreSQL and OpenStreetMap" with secrets about OpenStreetMap geodata, on which many companies have built their business, but not everyone shares the details... Well, today we will open crucial details.

The OSM database in PosgreSQL after loading from the dump takes up more than 587 GB. This is already a large database by the standards of a DBMS, and one huge table for each type of object will not work. For manageability, such data must be partitioned, it's good that PostgreSQL supports declarative data partitioning. It remains only to figure out how to split geographical data. After searching and comparing, the H3 hierarchical hexagonal geospatial indexing system came to rescue. All this was implemented in my openstreetmap_h3 project for fast processing and loading of the world dump into the PostGIS database.

I considered following options from geopartitioning systems...

Read more

How to put the whole world into a regular laptop: PostgreSQL and OpenStreetMap

Level of difficultyEasy
Reading time12 min
Views3K

When a person used to say that he controls the whole world, he was usually placed in the next room with Napoleon Bonaparte. I hope that these times are in the past and everyone can analyze the geodata of the entire Earth and get answers to their global questions in minutes and seconds. I published Openstreetmap_h3 - my project, which allows you to perform geoanalytics on data from OpenStreetMap in PostGIS or in any query engine that can work with Apache Arrow / Parquet.

First of all, I say hello to the haters and skeptics. What I developed is really unique and solves the problem of transforming and analyzing geodata using the usual and familiar tools available to every analyst and data science specialist without bigdata, GPGPU, FPGA. What looks easy to use and code now is my personal project where I have invested my vacations, weekends, sleepless nights and a lot of personal time over the past 3 years. Maybe I will share the background of the project and the rake that I went through, but first I will still describe the end result.

Read more

Roads and building density in North America. 100GB geodata processing OSM data in PostgreSQL

Reading time15 min
Views1.5K

Today I will discover America to you based on OpenStreetMap data in PostgreSQL15/PostGIS and my project openstreetmap_h3. Let's run the query and compare its execution time on the Citus column store in PostgreSQL and on the standard 100GB database partitioned by H3 geoindex.

We will find the top15 buildable locations in North America and the total length of roads, as well as their type and surface. I will not overload the publication with program logs, let's focus on the data! You can easily repeat all requests yourself on your laptop/computer.

Read more

The work on the historical development plan case Vladimir city (Russia)

Reading time3 min
Views1.1K
I am a russian cartographer Alexander Pronin and I am going to tell you how we together with my colleague Nikita Slavin, a map of the age of the houses of the city of Vladimir was created.

Data


For the layer with buildings I used a free map from OpenStreetMap. This layer contains the address, which is divided into columns, names of the objects and numbers of storeys. I collect data on the year of construction, address, name of the object and photos from several sources.

The first one is my project владимирдом.рф It’s a plot that I created to show the residents of my hometown how our city was built up. In this case I took the year of construction data from the website of the Ministry of Housing and Communal Services, for kindergartens and schools — from their official websites. A lot of information was sent also by site users. The map only shows the year of construction.

Read more →

How to write the home address right?

Reading time16 min
Views1.4K

How Tax Service, OpenStreetMap, and InterSystems IRIS
could help developers get clean addresses


image
Pieter Brueghel the Younger, Paying the Tax (The Tax Collector), 1640

In my previous article, we just skimmed the surface of objects. Let's continue our reconnaissance. Today's topic is a tough one. It's not quite BIG DATA, but it's still the data not easy to work with: we're talking about fairly large amounts of data. It won't all fit into RAM at once, and some of it won't even fit on the drive (not due to lack of space, but because there's a lot of junk). The name of our subject is FIAS DB: the Federal Information Address System database — the databases of addresses in Russia. The archive is 5.5 GB. And it's a compressed XML file. After extraction, it will be a full 53 GB (set aside 110 GB for extraction). And when you start to parse and convert it, that 110 GB won't be enough. There won't be enough RAM either.
Read more →

Authors' contribution