This repository has been archived by the owner on Jul 24, 2021. It is now read-only.

import massively drops data in multi process mode #4651

Closed
openstreetmap-trac opened this issue Jul 23, 2021 · 5 comments

@openstreetmap-trac

Reporter: Nop
[Submitted to the original trac issue database at 1.49pm, Thursday, 25th October 2012]

osm2pgsql works fine when called with --number-processes=1. If called with a value > 1, it massively drops data. With 4 processes, the DB size is about 40% smaller than for a working import. There is no error; osm2pgsql commits the corrupted tables. Swapping behaviour in Munin looks the same in all cases. Latest SVN version, built on 64-bit Debian.

Call:
gzip -d -c update.osm.gz | osm2pgsql/osm2pgsql -c --slim -d topo -p data -C 5000 --number-processes=1 -S topo_import.style /dev/stdin

@openstreetmap-trac

Author: amm
[Added to the original trac issue at 8.32pm, Saturday, 27th October 2012]

Assuming the bug is caused by what I think it is caused by, this only happens if there is not enough memory available to execute the forks for the helper processes.

The helper processes independently go through the pending ways / relations array with a stride length equal to the number of processes. However, if not all helper processes start up and process their share, then that fraction of the pending ways never gets processed and is missing from the rendering tables.
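To illustrate the strided traversal described above, here is a minimal sketch in Python. It is not osm2pgsql's actual code: the names `helper_share` and `process_pending` and the list of integer IDs are hypothetical stand-ins, chosen only to show how much data goes missing when one helper never starts.

```python
def helper_share(pending, helper_index, stride):
    """Items one helper visits when stepping through `pending` with a fixed stride."""
    return pending[helper_index::stride]


def process_pending(pending, planned_helpers, started_helpers):
    """Return the set of items that actually get processed.

    `planned_helpers` is the stride every helper assumes (hypothetical name);
    `started_helpers` lists the indices of helpers whose fork succeeded.
    """
    processed = set()
    for index in started_helpers:
        processed.update(helper_share(pending, index, planned_helpers))
    return processed


pending_ways = list(range(1000))  # stand-in for pending way IDs

# All four helpers start: every pending way is covered.
all_started = process_pending(pending_ways, 4, [0, 1, 2, 3])

# Helper 3 fails to fork: its share is silently skipped.
one_missing = process_pending(pending_ways, 4, [0, 1, 2])
missing = sorted(set(pending_ways) - one_missing)

print(len(all_started), len(one_missing), missing[:3])
# prints: 1000 750 [3, 7, 11]
```

With four planned helpers and one that never runs, a quarter of the pending items is skipped without any error being raised.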

The code had a fallback that should have prevented this, but the changed number of processes was only communicated to the parent process. So the other helper processes still worked with the old process count and processed the wrong number of ways / relations.
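The idea behind such a fallback can be sketched as follows, under the assumption that every helper learns the number of helpers that really started before it begins iterating. This is an illustration only, not the code from the commit; `partition` and the integer IDs are hypothetical.

```python
def partition(pending, started_count):
    """Split `pending` into one strided share per helper that actually started."""
    return [pending[rank::started_count] for rank in range(started_count)]


pending_ways = list(range(1000))  # stand-in for pending way IDs
planned, started = 4, 3           # one of the four planned helpers failed to start

# Every helper uses the real helper count as its stride, not the planned one.
shares = partition(pending_ways, started)
covered = sorted(way for share in shares for way in share)

print(covered == pending_ways, [len(share) for share in shares])
# prints: True [334, 333, 333]
```

Because the stride matches the number of helpers that are running, the shares cover the whole array even though fewer processes than planned are available.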

This is now fixed in commit r28864.

Given that the ways were in the ways / relations table of the database, they would have been processed correctly the next time one does an update. However, given that going over pending ways / relations seems to be orders of magnitude slower in append mode than in initial mode, that would likely have been prohibitively expensive.

@openstreetmap-trac

Author: Nop
[Added to the original trac issue at 8.02am, Monday, 29th October 2012]

Allowing overcommit of swap space with "echo 1 > /proc/sys/vm/overcommit_memory" did not work. Maybe I did not apply it properly, maybe there is a different problem.
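For reference, the Linux kernel interprets the value in that file as a mode: 0 is heuristic overcommit (the kernel default), 1 is always overcommit, and 2 is strict accounting. A small sketch for checking which mode is active; the function names are hypothetical helpers, and the file is only read when it exists:

```python
from pathlib import Path

# Modes of vm.overcommit_memory as documented for the Linux kernel.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit beyond the configured limit",
}


def describe_overcommit(raw_value):
    """Translate the raw file content (e.g. '1\\n') into a readable mode name."""
    return OVERCOMMIT_MODES.get(int(raw_value.strip()), "unknown mode")


def current_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Return the active mode description, or None if the file does not exist."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return describe_overcommit(proc_file.read_text())


print(current_overcommit())
```

Reading the value back after the `echo` shows whether the setting was actually applied.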

@openstreetmap-trac

Author: Nop
[Added to the original trac issue at 11.17am, Sunday, 18th November 2012]

According to Sven, overcommit is enabled on Debian by default, which explains why there was no change.

But this would indicate that the drop of data in multi-process mode is not caused by insufficient swap space as assumed.

@openstreetmap-trac

Author: amm
[Added to the original trac issue at 4.42pm, Saturday, 12th January 2013]

So far I don't think I have been able to reproduce this issue.

Could you post the full log of imports both with number-processes = 1 and > 1? Also, could you do a count on all of the tables to see where the data is lost?

@openstreetmap-trac

Author: Nop
[Added to the original trac issue at 10.55am, Sunday, 3rd February 2013]

I have built another version from the latest SVN and conducted a series of extended tests. A huge data set was required to provoke the problem. With your fixes, it works with 4 processes and is now live on the server. There's a noticeable difference in the Munin graphs: the working version shows a huge peak in committed memory (ca. 35 GB) during import, which was missing when data was lost before the fix (ca. 12 GB).
So I assume that it is fixed now, though for slightly different reasons.

Ticket can be closed.
