Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interest in addition of spatial index archive? #81

Open
boydjohnson opened this issue Aug 24, 2024 · 6 comments
Open

Interest in addition of spatial index archive? #81

boydjohnson opened this issue Aug 24, 2024 · 6 comments

Comments

@boydjohnson
Copy link

I thought to add a spatial index to osmflat-rs, and have it partially implemented here (master...boydjohnson:osmflat-rs:feature/geospatial-archive). It is implemented for Nodes and Ways, but not relations.

The index is this implementation (https://docs.rs/space-time/latest/space_time/xzorder/xz2_sfc/struct.XZ2SFC.html).

It allows for indexing Nodes, Ways, and Relations with the same curve, based on bounding box.

Aside from asking if you would consider a PR, I was wondering if there would be crates.io releases in the future for osmflat and osmflatc?

Best regards.

@boxdot
Copy link
Owner

boxdot commented Aug 24, 2024

We definitely would consider a PR. Also we can easily release a new version on crates.io.

@VeaaC
Copy link
Collaborator

VeaaC commented Aug 25, 2024

Did you manage to also reorder nodes/way based on the index to increase data locality?

I would assume that an index for nodes is "for free" / implicit as long as their are sorted by (some) space filling curve (z-order, hilbert, etc).

Ways and relations require extra data (but there's less of them to go around).

@boydjohnson
Copy link
Author

I wasn't able to reorder nodes/ways due to the lack of sort functionality on ExternalVector. I keep the ZOrderIndexEntries in an in-memory vector and sort by index at the end. I can do a US state on my laptop, but know users might want to run it on a planet file.

I asked ChatGPT to make an ascii diagram of how the ZOrderIndexEntry works.

+-----------------------------------------------------------+
|                   ZOrderIndexEntry                        |
|-----------------------------------------------------------|
| z_order (64 bits) | tag (8 bits) | idx (64 bits)          |
+-----------------------------------------------------------+
                           |
                           v
      +------------------------------------------------+
      |                GeoType (tag)                   |
      |------------------------------------------------|
      |  Node = 0    |    Way = 1    |  Relation = 2   |
      +------------------------------------------------+
            |                |                |
            v                v                v
  +----------------+  +----------------+  +----------------+
  |   Nodes Slice  |  |    Ways Slice  |  | Relations Slice|
  +----------------+  +----------------+  +----------------+
  | idx -> Node[ ] |  | idx -> Way[ ]  |  | idx -> Relation[ ] |
  +----------------+  +----------------+  +----------------+

Here's a code snippet of using osmflat to query an archive.

let curve = XZ2SFC::wgs84(Z_ORDER_RESOLUTION);

        let ranges = curve.ranges(x_min, y_min, x_max, y_max, None);

        let spatial = self.osm.spatial_index();
        let ids = self.osm.ids().unwrap();

        ranges
            .iter()
            .map(|i| (i.lower(), i.upper()))
            .flat_map(|v| {
                let element = spatial.z_order_index().partition_point(|e| e.z_order() < v.0);

                spatial.z_order_index()[element..]
                    .iter()
                    .take_while(move |e| e.z_order() <= v.1)
                    .filter_map(|entry| match entry.tag() {
                        osmflat::GeoType::Node => {
                            let item = &self.osm.nodes()[entry.value() as usize];
                            let id = &ids.nodes()[entry.value() as usize];

@boydjohnson
Copy link
Author

So I had a thought. Your idea of ordering the nodes, ways, relations each in index order is a good one. It would allow for less data to be stored. To get all the nodes, ways, relations at one spatial index would require 3 binary searches.

The data that wouldn't have to be stored would be (8 bits for the GeoType, 64 bits for the index of the node/way/relation) ~100GB on a planet file.

Then all that needs to be stored would be an optional spatial_index on each Node, Way, and Relation, 64 bits.

To allow for reordering nodes, ways, relations I would use RocksDB during the compilation process, since RocksDB allows iterating all key values by a sorted key. This would allow for the data to never be held in memory, but still be sorted.

@VeaaC
Copy link
Collaborator

VeaaC commented Sep 1, 2024

That sounds like a good idea. You might not even need those extra 64 bit for nodes, as the spatial index of a node is (most likely) just the morton code of its coordinates, and could be computed on-the-fly?

@boydjohnson
Copy link
Author

Yeah, That makes sense. Since we have the latitude and longitude, we can calculate the index on the fly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants