A feed_id for regional analysis path results #920

edasmalchi · 2023-12-20T01:05:22Z

@hhmckay and I are working on a process for using Conveyal/r5 to find common transfer points within a given region in California. We can provide our own origin/destination points and have a working proof of concept using the regional analysis export paths feature for Sacramento, but many trips involve multiple transit operators and thus multiple GTFS feeds.

All feeds are present in the Network Bundles we've updated so they're correctly considered in routing, but we would like to link the path results back to our own GTFS data warehouse. This will be very difficult without a way to tell which GTFS feeds the routes and stops in the path results refer to.

Describe the solution you'd like

Some sort of GTFS feed id present for routes in the path analysis output CSV. routeFeeds?

We can accept an arbitrary feed id and do some work on our end, but it would be even better if they can be assigned in some sort of documented, deterministic way.

@ansoncfit mentioned that feed_info.txt may be used to generate these. It would be great to have that process documented (for example, which fields within feed_info.txt contribute to the ID). If so, we might consider editing feeds on our end before network bundle upload to ensure feed_info.txt contains our preferred identifiers.

Our GTFS data warehouse uses various keys derived from our internal organization records, feed urls, etc. Since that context isn't present within Conveyal, it's not possible to regenerate exact matches.

Describe alternatives you've considered

Nothing compelling. The alternatives would potentially involve significant manual effort or standing up another trip planner/routing engine.

Additional context

https://calsta.ca.gov/subject-areas/sb125-transit-program
cal-itp/data-analyses#888
https://github.com/cal-itp/data-analyses/tree/main/conveyal_update
https://dbt-docs.calitp.org/#!/model/model.calitp_warehouse.dim_gtfs_datasets#details


abyrd · 2023-12-29T10:25:56Z

I just looked into this a bit. Adding some notes that could assist with future work:

On the specific place where CSV path output fetches IDs: Currently PathResult.summarizeIterations() calls RouteSequence.detailsWithGtfsIds() which in turn calls TransitLayer.routeString() and TransitLayer.stopString(). The routeString is from a RouteInfo which does not include the feed ID. On the other hand, the stopString is taken from stopIdForIndex which does include the feed ID (see TransitLayer.loadFromGtfs around L209), but that feed ID is being removed in the method.
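To illustrate the scoping described above, here is a minimal sketch of splitting a feed-scoped ID back into its feed and entity parts. It assumes the scoped form is `feedId:entityId` joined by a colon, which is an assumption about the format rather than something confirmed in this thread. `split_scoped_id` is a hypothetical helper for downstream processing, not an R5 method.

```python
from typing import Optional, Tuple


def split_scoped_id(scoped_id: str, sep: str = ":") -> Tuple[Optional[str], str]:
    """Split 'feedId:entityId' into (feed_id, entity_id).

    ASSUMPTION: the separator is a colon and the feed ID itself contains none.
    Only the first separator is used, so entity IDs that contain the
    separator stay intact. Returns (None, scoped_id) when no separator
    is present, i.e. the ID was never scoped or was already stripped.
    """
    feed_id, found, entity_id = scoped_id.partition(sep)
    if not found:
        return None, scoped_id
    return feed_id, entity_id
```

Splitting on the first separator only is a deliberate choice: GTFS stop and route IDs are free-form strings and may contain colons of their own.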

More generally on which IDs are available within R5: The core internal data model for public transit is rooted at TransitLayer, which retains gtfs-lib Stop objects in the stopForIndex field. These objects have a feed ID field. On the other hand, in TransitLayer.routes we are storing RouteInfo objects (which do not have a feed ID) rather than gtfs-lib Route objects (which do have the feed ID). Around TransitLayer L292 where we construct the TripPattern, the Route is converted to a RouteInfo and discarded. However, the feed ID is included in the route ID in each newly constructed TripPattern (all trips grouped together under a TripPattern come from the same Route).

Considering all that, it would be relatively straightforward to include the feed ID in CSV output, though the question arises of whether this should always happen or it should be configurable, as it changes the existing format and adds some repetitive noise to the output from the perspective of users with only a single feed. The ID is most readily available on the stops (not the routes), but references to stops are always from routes in the same GTFS feed so we need only one feed ID per routeId / boardStopId / alightStopId triple. The structure of the CSV rows as parallel pipe-separated arrays means it should be possible to just add another pipe-separated feedIds array in a supplemental column.
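As a rough sketch of how a consumer might read such a row, the snippet below zips the parallel pipe-separated arrays into one record per transit leg. The column names (`routes`, `boardStops`, `alightStops`, `feedIds`) and the `legs_from_row` helper are assumptions for illustration. The actual header names in the path results CSV may differ.

```python
from typing import Dict, List

# ASSUMPTION: hypothetical column names, one pipe-separated array per column.
LEG_COLUMNS = ["routes", "boardStops", "alightStops", "feedIds"]


def legs_from_row(row: Dict[str, str], sep: str = "|") -> List[Dict[str, str]]:
    """Zip the parallel pipe-separated columns of one path row into per-leg dicts."""
    # A path with no transit legs is assumed to have empty strings in every column.
    if all(row[name] == "" for name in LEG_COLUMNS):
        return []
    arrays = {name: row[name].split(sep) for name in LEG_COLUMNS}
    lengths = {len(values) for values in arrays.values()}
    if len(lengths) != 1:
        raise ValueError(f"parallel arrays differ in length: {arrays}")
    n_legs = lengths.pop()
    return [{name: arrays[name][i] for name in LEG_COLUMNS} for i in range(n_legs)]
```

Each row from `csv.DictReader` could be passed to this function directly. The length check guards against rows where one array was truncated or a value contains the separator.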

As for the nature of the feed IDs and how they are set: It is possible for feeds to specify their own identifier in the feed_id column of feed_info.txt, but Conveyal overrides this ID with a random unique identifier (BundleController.java:198). Conveyal often needs to handle several different versions of the same feed from different times or with modifications applied, and it's useful in many places for the feed IDs within the application to match the unique upload IDs rather than a feed-specified feed ID that could collide across several uploaded feeds.

This is related to issue #909, where the system also depends on entities having unique IDs that won't collide across feeds. Because of that, it can't reliably fall back on user-specified IDs, and other fields (the route short and long name) are under consideration.

So, in the CSV output would it be most useful to see:

1. The random unique GTFS file upload ID used in Conveyal's internal database
2. The ID of the agency operating the route
3. The name of the agency operating the route
4. The feed_publisher_name, feed_publisher_url, or feed_id from feed_info
5. Some other user-specified string injected into the feed file or specified at upload to Conveyal

The first three should be more straightforward to retrieve and add as a new column to the CSV. The last two would be a bit more tricky as we'd need to retain them in the Bundle or TransportNetwork or TransitLayer, but they seem possible in principle.

edasmalchi · 2024-01-04T23:22:23Z

Thanks @abyrd!

From your list, I think "Some other user-specified string injected into the feed file or specified at upload to Conveyal" would be ideal for our needs.

That said, we could totally work with "The random unique GTFS file upload ID used in Conveyal's internal database", especially if that would be a lot faster to get up and running.

edasmalchi · 2024-01-16T17:04:45Z

Hey @abyrd @ansoncfit

Just checking in to see if you have had any more time to think about this. Having this capability would be a huge help for us.

Happy to jump on a call sometime if that would help.

Thanks!

ansoncfit · 2024-01-16T17:46:54Z

Hi @edasmalchi,

Apologies for the delayed reply; I'm still catching up on things after TRB preparations/travel.

We should have a prototype ready for you to try next week, based on the Conveyal-generated UUID. For initial testing, you can grab these ids by using DevTools to inspect network requests to the https://analysis.conveyal.com/api/db/bundles endpoint.
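Once the response body from that request has been saved from DevTools, something like the following could turn it into a lookup from Conveyal feed ID to feed name. This is only a sketch: the response shape (a JSON array of bundles, each with a `feeds` list whose entries carry `feedId` and `name`) is an assumption and should be checked against the actual payload. `feed_names_by_id` is a hypothetical helper.

```python
import json
from typing import Dict


def feed_names_by_id(bundles_json: str) -> Dict[str, str]:
    """Map each feed ID to its feed name across all bundles in the response.

    ASSUMPTION: the payload is a JSON array of bundle objects, each with a
    "feeds" list whose items have "feedId" and "name" keys.
    """
    lookup: Dict[str, str] = {}
    for bundle in json.loads(bundles_json):
        for feed in bundle.get("feeds", []):
            lookup[feed["feedId"]] = feed["name"]
    return lookup
```

The resulting dictionary could then be matched by hand or by name to keys in an external GTFS warehouse, since the Conveyal-generated IDs carry no meaning outside Conveyal.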

More soon!

ansoncfit · 2024-02-29T03:37:49Z

> references to stops are always from routes in the same GTFS feed so we need only one feed ID per routeId / boardStopId / alightStopId triple

One wrinkle: with a reroute modification, a route from feed x can actually reference stops from feed y.

ansoncfit · 2024-04-25T00:55:06Z

Added in #936

abyrd mentioned this issue Dec 29, 2023

Routes added by modifications are not recognizable in paths output #909

Open

ansoncfit mentioned this issue Jan 25, 2024

Add feed id column to path results #927

Closed

ansoncfit mentioned this issue Feb 1, 2024

Human-readable names for destination layers in CSV output #929

Open

ansoncfit closed this as completed Apr 25, 2024
