Opened 7 years ago

Closed 6 years ago

#4651 closed defect (fixed)

import massively drops data in multi-process mode

Reported by: Nop
Owned by: jburgess777@…
Priority: major
Milestone:
Component: osm2pgsql
Version:
Keywords:
Cc:

Description

osm2pgsql works fine when called with --number-processes=1. When called with a value greater than 1, it silently drops large amounts of data: with 4 processes, the resulting database is about 40% smaller than that of a working import. No error is reported, and osm2pgsql commits the corrupted tables. The swapping behaviour shown in Munin looks the same in all cases. This is the latest SVN version, built on 64-bit Debian.

The import was invoked as:

    gzip -d -c update.osm.gz | osm2pgsql/osm2pgsql -c --slim -d topo -p data -C 5000 --number-processes=1 -S topo_import.style /dev/stdin

Change History (5)

comment:1 Changed 7 years ago by amm

Assuming the bug is caused by what I think it is, this only happens if there is not enough memory available to fork the helper processes.

The helper processes independently walk the array of pending ways/relations with a stride equal to the number of processes. However, if not all helper processes start up and work through their share, that fraction of the pending ways is never processed and ends up missing from the rendering tables.

There was a fallback in the code that should have prevented this, but the reduced number of processes was only communicated to the parent process, so the helper processes that did start still strided over the wrong number of ways/relations. The sketch below illustrates the failure mode.
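The following is a minimal C sketch of that failure mode, not osm2pgsql's actual code; PENDING_COUNT, process_pending() and the variable names are invented for illustration:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define PENDING_COUNT 20            /* stand-in for the pending ways array */

    static void process_pending(int idx)
    {
        printf("pid %d processed pending entry %d\n", (int)getpid(), idx);
    }

    int main(void)
    {
        int nproc = 4;                  /* --number-processes */

        for (int rank = 0; rank < nproc; rank++) {
            pid_t pid = fork();
            if (pid == 0) {
                /* Helper: handle every nproc-th entry, starting at rank. */
                for (int i = rank; i < PENDING_COUNT; i += nproc)
                    process_pending(i);
                _exit(0);
            }
            if (pid < 0) {
                /* fork() failed, e.g. under memory pressure. If only the
                 * parent lowers nproc here, the helpers forked earlier keep
                 * striding by the old value of 4, so every entry with
                 * index % 4 == rank is silently skipped: the data loss
                 * reported in this ticket. */
                nproc = rank;
                break;
            }
        }

        while (wait(NULL) > 0)
            ;                           /* reap all helpers */
        return 0;
    }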

This is now fixed in commit r28864.

Given that the ways were in the ways/relations tables of the database, they would have been processed correctly the next time one ran an update. However, since going over pending ways/relations seems orders of magnitude slower in append mode than in the initial import, that would likely have been prohibitively expensive.

comment:2 Changed 7 years ago by Nop

Allowing memory overcommit with "echo 1 > /proc/sys/vm/overcommit_memory" did not help. Maybe I did not apply it properly, or maybe there is a different problem.

comment:3 Changed 7 years ago by Nop

According to Sven, overcommit is enabled by default on Debian (the default mode 0 uses a heuristic), which explains why the setting made no difference.

But this would indicate that the data loss in multi-process mode is not caused by insufficient swap space, as assumed.

comment:4 Changed 7 years ago by amm

So far I don't think I have been able to reproduce this issue.

Could you post the full logs of imports with both --number-processes=1 and > 1? Also, could you run a row count on all of the tables to see where the data is lost?
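A row count per table could look like the following libpq sketch. The table names are an assumption: they would be data_point, data_line, data_polygon and data_roads if the default rendering tables were created with the "-p data" prefix from the import command in the description.

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        const char *tables[] = { "data_point", "data_line",
                                 "data_polygon", "data_roads" };
        PGconn *conn = PQconnectdb("dbname=topo");

        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }
        for (int i = 0; i < 4; i++) {
            char sql[128];
            snprintf(sql, sizeof sql, "SELECT count(*) FROM %s;", tables[i]);
            PGresult *res = PQexec(conn, sql);
            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                printf("%-14s %s rows\n", tables[i], PQgetvalue(res, 0, 0));
            else
                fprintf(stderr, "%s: %s", tables[i], PQerrorMessage(conn));
            PQclear(res);
        }
        PQfinish(conn);
        return 0;
    }

The same numbers can be obtained from the shell with psql, e.g. psql -d topo -c "SELECT count(*) FROM data_line;", once per table.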

comment:5 Changed 6 years ago by Nop

Resolution: fixed
Status: new → closed

I built another version from the latest SVN and ran a series of extended tests; a huge data set was required to provoke the problem. With your fixes it works with 4 processes and is now live on the server. There is a noticeable difference in the Munin graphs: the working version shows a huge peak in committed memory (ca. 35 GB) during import that was missing before the fix, when data was lost (ca. 12 GB). So I assume it is fixed now, though perhaps for slightly different reasons.

Ticket can be closed.
