Graphing Transit Systems
I’ve been away from the blogging world for a while! The last few months included a fantastic and inspiring trip to Transportation Camp NYC and loads of (mostly) fun weekend work on transit graphs.
Why transit graphs?
While reviewing algorithms on Jason Park’s algorithm visualizer, I thought, “WE CAN APPLY THESE TO TRANSIT.” It was a moment of pure destiny. To call it multidisciplinary intrigue would be underselling my excitement. Of course, I was not the first person to connect transit and graphs; Google Maps, Open Trip Planner, and Mapzen’s Valhalla are all built on graph representations.
My original goal was to display an animated graph traversal of the New York City subway system. I’ve ended up with a platform to study graph algorithms on transit maps. (I learned that if I’m unsure what I’m building, just call it a platform. The solutions will follow.)
As you can see when using the graph demo, there are three modes. A bit on each…
Dijkstra’s is the classic gateway algorithm to finding shortest paths in graphs. Wikipedia’s explanation is as clear-worded as I’ve read, so I’ll defer to them:
It picks the unvisited vertex with the lowest distance, calculates the distance through it to each unvisited neighbor, and updates the neighbor’s distance if smaller.
Fire up the algorithm visualizer for to help picture this. In my graph, the edge weights are the time between stations. After running Dijkstra’s, we have an ordered sequence of nodes which represent the shortest path between the origin and destination and the time it would take to do so.
Sound familiar? Google Maps transit directions do the exact same time. And much better! Knowing when to switch trains becomes a luxury after using my tool.
A depth-first search, or DFS as the real algorithm geeks call it, is a classic traversal method for both graphs and trees. The idea of a traversal is to visit all the nodes in the graph which can be reached given a starting node. The depth-first variety is contrasted with the breadth-first procedure (up next!) in that, given a starting node, one of its neighbors is visited, then one of it’s neighbors is visited, then one of it’s neighbors is visited, then one of it’s neighbors is visited, then one of it’s neighbors is visited, then one of it’s neighbors is visited, then one of it’s neighbors is visited, then one of it’s neighbors is visited, then one of it’s neighbors is visited, and so on. Was there anything weird about that last sentence? This is a recursive algorithm! When a visited node has no unvisited neighbors, the algorithm pops back up the call stack, testing for unvisited neighbors at each level.
A breadth-first search is another traversal variant whose lofty goal is to identify connected components of a graph while providing zero valuable info to passengers riding transit. (Now would be a good time to say that identifying connected components will play a key role in merging nodes during a later step in this project. Traversals are a necessary part of any graph analyzer’s toolkit!) As you may have guessed, a BFS goes wide before it goes deep. From a given node, all of its neighboring nodes are visited before any of their neighbors are visited. This produces a different exploration pattern, which is illustrated in the following three images.
gtfs-graph (the GitHub project name for now – please help me come up with a better one!) is built to be system-agnostic. I have graph representations for Boston and Paris in addition to New York City. While the GTFS standard allowed me to construct all three graphs in similar ways, there were still a few quirks, resulting mainly from how the different systems represent sub-stops (parent/child or northbound/southbound).
Recently, I have been implementing centrality algorithms to see how the results varied from system to system. Paris’ RATP heavy rail lines certainly look to have higher connectivity than Boston’s hub-centric design, and I’m working to find the numbers to prove this. If I can indeed prove this, I’d like to use a genetic algorithm to efficiently enhance (add lines and stops) a system to match the connected-ness/centrality distribution/equity/whatever-metric-I-end-up-with of a higher quality system.
After implementing Google’s PageRank algorithm, I decided it is a poor model for transit. The rankings currently displayed are a modified version of closeness centrality. I really enjoyed this white paper on a node importance algorithm and plan to implement this soon. It uses random walks to calculate the entropy of a given node after a given number of steps.
I hope to have a much more detailed most on these metrics in the coming weeks! I would love to hear any thoughts or ideas you might have about any or all of this!
Let’s build awesome things to help transit, cities, and, most of all, people.