Alex Wroblewski Capstone

​Abstract

Using openly available data provided by SEPTA, I developed a program to track buses and predict their arrival time at the next scheduled stop. This project was inspired by the lack of a tool that gives passengers accurate data on the actual headway (the time between arrivals) of a bus. How can SEPTA passengers receive more accurate information about their commute? How can they get a more dynamic schedule? In November 2016, an anonymous survey was put out, and the data gathered showed a need for tools that make commuting more accessible. This inspired me to engineer a solution to improve commuting in Philadelphia. Using discrete data points from the SEPTA TransitView API (Application Programming Interface), the average velocity of a vehicle over the tracked interval can be found and used to predict its arrival time. Because the data does not provide 100% coverage, this method cannot be completely accurate, so an implementation of the Monte Carlo method for headways is discussed as a way to attach probabilities to the predictions. The difference between the expected arrival time and the actual arrival time is calculated, and a simulation based on that data is run many times over to generate the probability of each possible arrival window. Applications of this solution include a multi-platform phone app, a prototype of which has been developed. While the final version of the program is not complete, the process of running such a simulation has been thoroughly researched and documented, and will be used to continue development through the coming summer.


Deliverables

Over the past year, I have learned how to develop multi-platform (iPhone, Android, and Web) apps, as well as the basics of predictive data analytics. I have gained proficiency in the Python programming language and in its capabilities for handling large amounts of data.

https://github.com/mediocrelogic/septa-dispatch - This is where the code for my ever-developing simulator, written in Python, lives. It is still non-functional because the implementation ideas and research behind it keep evolving. The next steps for this project are to load the SEPTA data into an SQL database and to develop the trip class to the point where it can track active trips (and disable those that are not active).

https://github.com/mediocrelogic/accusepta-app - This is a multi-platform app I developed at the ExCITe Center at Drexel University during an internship. The primary goal was to investigate how I could make the information gained during data analysis useful to regular commuters. Currently, it is barebones, but it taught me how to present information, build a multi-platform app, and focus on simple, accessible design. (page will be updated with screenshots)

https://goo.gl/photos/4j77SQFz89aGvPqH9 - This link shows documentation of the research and design work I carried out over the year.

Cohen, G., and K. M. Crawford. "A Problem in Estimating Bus Stop Times." Applied Statistics 27.2 (1978): 139. JSTOR. Web. 3 Feb. 2016.

This article, published in a Royal Statistical Society journal, demonstrates a linear regression for the time a bus spends at a stop as a function of the number of passengers boarding and alighting. Other models provide algorithms to determine whether a boarding/alighting event is definite, improbable, or unlikely. The time a bus spends not actually moving is not usually taken into account when tracking models based on actual vehicle location are built. However, a similar linear regression built from typical SEPTA data, or from data gathered by users, would greatly increase the accuracy of the tracking by adding another way to compensate for receiving new data only every 3-5 minutes.
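
As a rough illustration of the regression idea, the sketch below fits dwell time against passenger movements using ordinary least squares. It is a simplification with a single predictor (boardings plus alightings combined) and is not the model from the article. The function names `fit_dwell_time` and `predict_dwell`, and the sample observations, are hypothetical.

```python
from typing import List, Tuple


def fit_dwell_time(passengers: List[float], dwell_seconds: List[float]) -> Tuple[float, float]:
    """Least-squares fit of dwell_seconds = intercept + slope * passengers."""
    n = len(passengers)
    if n < 2 or n != len(dwell_seconds):
        raise ValueError("need at least two paired observations")
    mean_x = sum(passengers) / n
    mean_y = sum(dwell_seconds) / n
    # Spread of the predictor; zero means every stop had the same count.
    sxx = sum((x - mean_x) ** 2 for x in passengers)
    if sxx == 0:
        raise ValueError("passenger counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(passengers, dwell_seconds))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_dwell(intercept: float, slope: float, passengers: float) -> float:
    """Predicted seconds spent at a stop for a given number of passenger movements."""
    return intercept + slope * passengers


# Made-up observations (not SEPTA data): passenger movements and seconds at the stop.
observed_passengers = [0, 1, 2, 3]
observed_dwell = [5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_dwell_time(observed_passengers, observed_dwell)
```

With real observations, the intercept would stand for the fixed time lost at every stop (doors, pulling in and out) and the slope for the extra seconds per passenger.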


Forbes, M. A., J. N. Holt, P. J. Kilby, and A. M. Watts. "BUDI: A Software System for Bus Dispatching." Journal of the Operational Research Society 45.5 (1994): 497-508. JSTOR. Web. 3 Feb. 2016.

A software system known as BUDI is described for the dispatching of buses operated by a public transport organization in Australia. An early focus is the organization of transit terminology, for example defining routes and the difference between a route and a timetabled instance of one. The definitions and rostering of the BCC depots mentioned will provide a theoretical model for constructing a database of similar data received from the SEPTA API cited below. While BUDI’s focus is on sorting and dispatching, the concepts behind its design, especially in terms of the database it relies on, are universally applicable to any analysis of transit data.
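
The distinction between a route and a timetabled instance of it could be modeled along the following lines. This is a minimal sketch, not BUDI's schema or the structure of the septa-dispatch code: the class names, fields, and identifiers such as "R1" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    """A fixed path served by buses, independent of any timetable."""
    route_id: str
    stop_ids: List[str]


@dataclass
class Trip:
    """One timetabled run of a route, optionally tied to a tracked vehicle."""
    trip_id: str
    route: Route
    scheduled_departure: str  # "HH:MM", kept as text for simplicity
    vehicle_id: Optional[str] = None
    active: bool = False

    def assign_vehicle(self, vehicle_id: str) -> None:
        """Mark the trip as running once a vehicle is reported on it."""
        self.vehicle_id = vehicle_id
        self.active = True

    def deactivate(self) -> None:
        """Stop tracking the trip, for example after its last stop."""
        self.vehicle_id = None
        self.active = False


def active_trips(trips: List[Trip]) -> List[Trip]:
    """Return only the trips currently being tracked."""
    return [trip for trip in trips if trip.active]


# Placeholder identifiers, not real SEPTA routes or vehicles.
route = Route(route_id="R1", stop_ids=["stop_a", "stop_b", "stop_c"])
morning = Trip(trip_id="R1-0800", route=route, scheduled_departure="08:00")
midday = Trip(trip_id="R1-1200", route=route, scheduled_departure="12:00")
morning.assign_vehicle("bus_42")
```

Keeping the route separate from its trips means the stop list is stored once, while each trip carries only what changes from run to run.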


Golshani, Forouzan. "System Regularity and Overtaking Rules in Bus Services." Journal of the Operational Research Society 34.7 (1983): 591-97. JSTOR. Web. 3 Feb. 2016.

To passengers, the most important part of a service is its reliability. The author of this study analyzed average waiting time and the circumstances under which it would be appropriate for one bus to overtake another. A simulation was run to determine the average headways and the amount of overtaking during a typical day of bus service. Though the original simulation is unavailable and outdated (written in Fortran for a mainframe), the mathematics are available in this journal article. There are many different models used to simulate service, and each will have to be evaluated for its potential accuracy. A theoretical application to Accusepta would be to create a new simulation every time new data is received, allowing for more accurate estimates of headways over time.


Jansson, Jan Owen. "A Simple Bus Line Model for Optimisation of Service Frequency and Bus Size." Journal of Transport Economics and Policy 14.1 (1980): 53-80. JSTOR. Web. 3 Feb. 2016.

This 1980 analysis of Swedish buses presents a model of total trip time as the time taken to travel the distance plus the total time spent boarding and alighting. The model was originally used for economic reasons, to determine the most profitable frequency of bus service. However, it is relevant here because it can be used to provide a “larger picture” estimate of total travel time for a run of a SEPTA bus.
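
The structure of that model (running time plus boarding and alighting time) can be written out as simple arithmetic. The sketch below assumes a constant running speed and a constant time per passenger, which are simplifications of my own rather than the article's formulation. The function name and parameters are hypothetical.

```python
def estimate_trip_time_minutes(
    distance_km: float,
    running_speed_kmh: float,
    passenger_movements: int,
    seconds_per_passenger: float,
) -> float:
    """Total trip time = time in motion + time spent boarding and alighting."""
    if running_speed_kmh <= 0:
        raise ValueError("running speed must be positive")
    running_minutes = distance_km / running_speed_kmh * 60
    dwell_minutes = passenger_movements * seconds_per_passenger / 60
    return running_minutes + dwell_minutes


# Illustrative numbers only: a 10 km run at 20 km/h with 60 passenger
# movements at 3 seconds each.
example_minutes = estimate_trip_time_minutes(10, 20, 60, 3)
```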


Jennings, Norman H., and Justin H. Dickins. "Computer Simulation of Peak Hour Operations in a Bus Terminal." Management Science 5.1 (1958): 106-20. JSTOR. Web. 3 Feb. 2016.

The Monte Carlo method is a statistical simulation technique built on running thousands of random trials within a set of constraints. It is most often used to replace costly real-world trial and error with a computer simulation. In 1958, Port Authority employees wrote a simulation that produced a histogram of the distribution of bus arrival times, to be used when organizing dispatch for the day. This approach was used in the original Accusepta design, and is applicable to, for example, estimating the probability of making a connection based on the estimated travel time from the vehicle’s last known location.
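
One way to apply this idea to arrival windows is sketched below. It assumes a list of past differences between expected and actual arrival times is available, then repeatedly draws one of those errors at random, adds it to the current estimate, and counts which window the simulated arrival lands in. This is an illustration of the general technique, not the 1958 simulation or the finished Accusepta code. The function name, parameters, and sample errors are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def arrival_window_probabilities(
    predicted_minutes: float,
    past_errors_minutes: List[float],
    window_minutes: int = 2,
    trials: int = 10_000,
    seed: int = 0,
) -> Dict[int, float]:
    """Estimate the probability that the bus arrives in each time window.

    Keys are window start times in minutes from now; values are the
    fraction of simulated trials that landed in that window.
    """
    if not past_errors_minutes:
        raise ValueError("need at least one past error to sample from")
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    counts: Counter = Counter()
    for _ in range(trials):
        # Draw one historical error and apply it to the current estimate.
        error = rng.choice(past_errors_minutes)
        arrival = max(0.0, predicted_minutes + error)
        window_start = int(arrival // window_minutes) * window_minutes
        counts[window_start] += 1
    return {start: count / trials for start, count in sorted(counts.items())}


# Made-up errors in minutes (actual minus expected), not SEPTA data.
probabilities = arrival_window_probabilities(
    predicted_minutes=10.0,
    past_errors_minutes=[-1.0, 0.0, 0.5, 3.0],
    window_minutes=2,
    trials=5000,
    seed=1,
)
```

Summing the probabilities of every window that ends before a connecting departure gives a rough chance of making that connection.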


McLeod, F. "Estimating Bus Passenger Waiting Times from Incomplete Bus Arrivals Data." Journal of the Operational Research Society 58.11 (2006): 1518-25. JSTOR. Web. 3 Feb. 2016.

An operations researcher in Southampton, UK, has built a model to determine average waiting time based on bus headways (the time between buses at a stop), using data from an AVL, an automatic vehicle location system akin to the one used and provided by SEPTA. The main problem with using an AVL is that missing data is almost guaranteed. SEPTA only provides locations every three to five minutes, for example. A lack of total data coverage creates gaps that have to be worked around. The author’s main goal is to contribute to the theory of estimating headway variance, a measure of how much the time between buses varies, from incomplete data. Various methods are tested on different data sets. Previous research on AVL-based models is hard to find, and adapting to the gaps, during which the bus could make multiple stops or travel a significant distance, is hard to manage while striving for accuracy.
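
To show why headway variance matters, the sketch below uses the standard result that, for passengers arriving at a stop at random, the average wait is the mean of the squared headways divided by twice the mean headway. This is the textbook relationship with complete data, not the article's method for handling missing arrivals. The function names and sample times are hypothetical.

```python
from typing import List


def headways_from_arrivals(arrival_minutes: List[float]) -> List[float]:
    """Time gaps between consecutive bus arrivals at one stop."""
    ordered = sorted(arrival_minutes)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def average_wait_minutes(headways: List[float]) -> float:
    """Average wait for randomly arriving passengers: sum(h^2) / (2 * sum(h))."""
    if not headways or any(h <= 0 for h in headways):
        raise ValueError("headways must be a non-empty list of positive values")
    return sum(h * h for h in headways) / (2 * sum(headways))


# Evenly spaced buses: the wait is half the headway.
even_wait = average_wait_minutes(headways_from_arrivals([0, 10, 20, 30]))
# Same average headway of 10 minutes, but bunched service: the wait is longer.
bunched_wait = average_wait_minutes([5, 15])
```

A missed arrival in the data merges two headways into one long one, which inflates the estimated wait. That is the kind of distortion incomplete AVL data introduces.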


Pratelli, A., and F. Schoen. "A Mathematical Programming Model for the Bus Deviation Route Problem." Journal of the Operational Research Society 52.5 (2001): 494-502. JSTOR. Web. 3 Feb. 2016.

Researchers at the University of Pisa and the University of Florence in Italy have created a mathematical model of a deviated bus route. Such a route follows a main corridor while collecting and distributing passengers from neighboring blocks. The increased route flexibility causes an increase in both travel time and wait time. Although SEPTA does not operate deviated bus routes, the modeling is applicable to analyzing theoretical inconvenience to passengers, as well as to tracking buses that do not follow their schedules due to traffic or other factors. A discrepancy between the schedule and real-life conditions can be treated as equivalent to a deviated stop when analyzing inconvenience to passengers. Pratelli has proposed a mixed integer linear programming problem that handles the “many-to-many” case, taking into account that passengers alight and board at every stop on the route. It also proposes the concept of net inconvenience for passengers, based on travel time, waiting time, any increases to them, and delay.

Pratelli and Schoen both created analytical models of the feasibility of a transit system based on deviation, and a core piece of their notation is the use of arcs between two points. Arcs are used in Italy as an efficient and inclusive way to capture nearly every location the bus could have passed since the last location it reported to central command, whether or not it made a deviated stop.


"Public Transport, Walking and Cycling Directions - Citymapper." Citymapper. Web. 04 Feb. 2016. <https://citymapper.com/philadelphia>.

Citymapper is a Web/iOS/Android app that incorporates transit data from cities all over the world, including Philadelphia. It uses the SEPTA API but does not provide real-time tracking. Citymapper does provide a transit-focused view that services such as Google Maps do not present as clearly. This aggregation of all available transit services provides an interesting perspective on user interface design. Citymapper also provides a directions and travel time interface that can be used in one’s own applications.


"SEPTA API Documentation." SEPTA API. SEPTA DEV. Web. 03 Feb. 2016. <http://www3.septa.org/hackathon/>.

SEPTA provides documentation for its API (Application Programming Interface). The API exposes HTTP URLs that accept specific requests for data by route or location. Features include the TransitView API, a system for requesting the location of either a specific bus or all buses currently active in the SEPTA network. TransitView will be the crux of the Accusepta model, as it can provide active tracking of every bus on duty, sending data every 3-5 minutes. Other documentation focuses on interfacing with raw schedule data for other SEPTA services. This framework will be used to build custom route objects for every bus in Philadelphia.
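
Given two position reports for the same bus a few minutes apart, average speed and a rough arrival estimate can be computed as sketched below. The sketch makes no network request and does not assume the TransitView response format: each fix is a hypothetical `(latitude, longitude, unix_seconds)` tuple, and the function names are my own. Distance is measured in a straight line with the haversine formula, so it underestimates the distance actually driven along streets.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0
Fix = Tuple[float, float, float]  # (latitude, longitude, unix_seconds), hypothetical layout


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_speed_kmh(earlier: Fix, later: Fix) -> float:
    """Average speed over the interval between two position reports."""
    lat1, lon1, t1 = earlier
    lat2, lon2, t2 = later
    elapsed_hours = (t2 - t1) / 3600
    if elapsed_hours <= 0:
        raise ValueError("the later fix must have a later timestamp")
    return haversine_km(lat1, lon1, lat2, lon2) / elapsed_hours


def eta_minutes(speed_kmh: float, remaining_km: float) -> Optional[float]:
    """Minutes until the next stop at the given speed, or None if the bus is not moving."""
    if speed_kmh <= 0:
        return None
    return remaining_km / speed_kmh * 60
```

For example, a bus averaging 30 km/h with 5 km left to its next stop would be estimated at about 10 minutes away. A stationary bus yields `None` rather than an infinite estimate.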


"Septadev/SEPTA-Android." GitHub. SEPTA/Github. Web. 03 Feb. 2016. <https://github.com/septadev/SEPTA-Android>.

SEPTA makes the source code for both its iOS and Android apps available online on GitHub. This provides a reference for how access to the SEPTA database is implemented, alongside the typical ways it is accessed. The official SEPTA app offers some useful features; however, its TransitView display is lackluster and only provides the general location that was last received. Little analysis is done on the data, which is the goal of Accusepta. However, the base of real-time tracking is there for both buses and regional rail, providing a theoretical foundation on which to build important Accusepta features based on the official SEPTA implementation of their API.

