Making GTFS query more convenient

url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/

I have been spending a lot of time parsing GTFS data. On the surface it's just a set of simple CSV files, but extracting useful information from GTFS is often unexpectedly difficult. For example, finding the stops of a bus line in sequential order might sound like a basic thing to do, but it's actually non-trivial with GTFS.

One reason is that transit service is more complex than it seems. It might seem that a bus service just hits all the stops in sequence, but the actual service has a lot of variables. The schedule on weekends is often different from weekdays, and so is the exact route it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex cases there can be branching, where there is a common main trunk and then the buses split to serve two or more alternative destinations.

This is the reason why, in GTFS, one "route" may be associated with multiple "shapes". To find out which shapes are associated with a route, we have to make a query like this:

SELECT shape_id
FROM route JOIN trips USING (route_id) JOIN shape USING (shape_id)
GROUP BY shape_id;
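As a rough sketch of how this first query behaves, here is a small Python example using the standard sqlite3 module. The sample rows and the `shapes_by_route` helper are made up for illustration and do not come from any real feed. It also assumes the feed fills in the optional shape_id column of trips, so the join to the shape table is left out, and it groups by route as well so each line's alternatives are listed separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE route (route_id TEXT PRIMARY KEY, route_short_name TEXT);
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);

    -- Hypothetical sample data: line 10 runs a full and a short pattern.
    INSERT INTO route VALUES ('R10', '10'), ('R20', '20');
    INSERT INTO trips VALUES
        ('T1', 'R10', 'S10_full'),
        ('T2', 'R10', 'S10_full'),
        ('T3', 'R10', 'S10_short'),
        ('T4', 'R20', 'S20');
""")


def shapes_by_route(conn):
    """Return {route_id: [distinct shape_ids]} (hypothetical helper)."""
    rows = conn.execute("""
        SELECT route_id, shape_id
        FROM route JOIN trips USING (route_id)
        GROUP BY route_id, shape_id
        ORDER BY route_id, shape_id
    """)
    result = {}
    for route_id, shape_id in rows:
        result.setdefault(route_id, []).append(shape_id)
    return result


print(shapes_by_route(conn))
# {'R10': ['S10_full', 'S10_short'], 'R20': ['S20']}
```

Many trips collapse into a handful of shapes, which is why the shape is a usable stand-in for "one alternative route of a line".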

Finding the stops is even more complex. Here we need to join one more table, stop_times. It is the biggest table in GTFS, so this is also the most computationally intensive query to run.

SELECT shape_id, stop_id
FROM route JOIN trips USING (route_id) JOIN stop_times USING (trip_id) JOIN stops USING (stop_id)
GROUP BY shape_id, stop_id;
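The query above returns the set of stops per shape but says nothing about their order. One possible way to get them in sequence is sketched below in Python with sqlite3. It assumes that all trips sharing a shape visit their stops in the same order and that no stop is visited twice on a trip; under those assumptions the smallest stop_sequence per stop gives the ordering. The sample data and the `stops_by_shape` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);

    -- Hypothetical sample data: two full-length trips and one short-turn trip.
    INSERT INTO trips VALUES
        ('T1', 'R10', 'S10_full'),
        ('T2', 'R10', 'S10_full'),
        ('T3', 'R10', 'S10_short');
    INSERT INTO stop_times VALUES
        ('T1', 'A', 1), ('T1', 'B', 2), ('T1', 'C', 3),
        ('T2', 'A', 1), ('T2', 'B', 2), ('T2', 'C', 3),
        ('T3', 'A', 1), ('T3', 'B', 2);
""")


def stops_by_shape(conn):
    """Return {shape_id: [stop_ids in travel order]} (hypothetical helper)."""
    rows = conn.execute("""
        SELECT shape_id, stop_id, MIN(stop_sequence) AS seq
        FROM trips JOIN stop_times USING (trip_id)
        GROUP BY shape_id, stop_id
        ORDER BY shape_id, seq
    """)
    result = {}
    for shape_id, stop_id, _seq in rows:
        result.setdefault(shape_id, []).append(stop_id)
    return result


print(stops_by_shape(conn))
# {'S10_full': ['A', 'B', 'C'], 'S10_short': ['A', 'B']}
```

Feeds with loop routes or inconsistent stop_sequence numbering would need a more careful approach, such as picking one representative trip per shape.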

Still, most people have a clear concept of what a transit line is and where it runs. It shouldn't be such a pain to compute. A more useful structure would look like the one below.

    GTFS               More useful
    structure          structure

    route              line
     |                   |
     |                   v
     |                 route*
     |                   |
     |    shape          |  +-> route_shape
     |     ^             |  |
     |    /              |  +-> route_stops*
     |   /               |
     v  /                v
    trips              trips
     |                   |
     |        stops      |          stops
     |        ^          |
     |       /           |
     v      /            v
    stop_times         stop_times

Here I shift the terminology a bit. The top-level entity is a line (i.e. a GTFS route). This is the service that people know of, like a numbered bus line or a metro line. Below it are routes: the collection of alternative routes a line may run. Routes are not explicitly represented in GTFS. You can find them by querying all unique shape_id values using the first SQL query. Another missing piece is the stops. If we pre-compute all the route_stops once using the second SQL query, then for the most part we don't need the giant stop_times table. For applications that do not deal with scheduled times, this is a huge saving. There is one assumption my structure makes though: different lines do not share the same route. It should be a reasonable assumption. And if lines do indeed share a route and shape, we should just replicate them as separate entities.
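The pre-computation could look roughly like the following Python sketch using sqlite3. The table name `line_route` (standing in for the new route table, renamed here to avoid clashing with the GTFS route), the `seq` column, the helper functions, and the sample data are all assumptions made for illustration. As before, it assumes every trip on a shape visits its stops in the same order.

```python
import sqlite3


def build_route_tables(conn):
    """Materialize the derived tables once from trips and stop_times."""
    conn.executescript("""
        -- One row per alternative route of a line, keyed by shape_id.
        CREATE TABLE line_route AS
            SELECT DISTINCT route_id AS line_id, shape_id FROM trips;

        -- Stops of each alternative route, with a position for ordering.
        CREATE TABLE route_stops AS
            SELECT shape_id, stop_id, MIN(stop_sequence) AS seq
            FROM trips JOIN stop_times USING (trip_id)
            GROUP BY shape_id, stop_id;
    """)


def stops_on_line(conn, line_id):
    """Return {shape_id: [stop_ids in order]} without touching stop_times."""
    rows = conn.execute("""
        SELECT shape_id, stop_id
        FROM line_route JOIN route_stops USING (shape_id)
        WHERE line_id = ?
        ORDER BY shape_id, seq
    """, (line_id,))
    result = {}
    for shape_id, stop_id in rows:
        result.setdefault(shape_id, []).append(stop_id)
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);

    -- Hypothetical sample data.
    INSERT INTO trips VALUES
        ('T1', 'R10', 'S10_full'), ('T2', 'R10', 'S10_short'), ('T3', 'R20', 'S20');
    INSERT INTO stop_times VALUES
        ('T1', 'A', 1), ('T1', 'B', 2), ('T1', 'C', 3),
        ('T2', 'A', 1), ('T2', 'B', 2),
        ('T3', 'X', 1), ('T3', 'Y', 2);
""")

build_route_tables(conn)
conn.execute("DROP TABLE stop_times")  # the derived tables no longer need it
print(stops_on_line(conn, "R10"))
# {'S10_full': ['A', 'B', 'C'], 'S10_short': ['A', 'B']}
```

Dropping stop_times here is only to show that the lookup no longer depends on it; an application that also needs schedules would of course keep the table.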

The original GTFS structure seems to take a transit-operator-centric view. It allows operators maximum flexibility to author and publish their service data. But for application developers, it's not structured for easy traversal. Adding the route and route_stops tables as indicated would greatly facilitate querying and operating on transit information.
