Virtual Event

Beyond Static Open Data: A Citi Bike Time Series Case Study

Mar 11 at 9:00 am - 10:00 am

Much open data is static. It often corresponds to either a snapshot in time or some historical summary. While static data is surely useful, looking at how data changes over time opens up new avenues for exploration. Looking backward, we can identify trends and garner insights. Looking forward, we can generate forecasts and try to predict the future. 

Citi Bike is the primary bikeshare in NYC, and they open up a lot of their data. They publish datasets about trips that riders have taken, and they have a real-time API that publishes the current information about all stations, such as the number of bikes and docks available. However, this data is largely static: the trip datasets describe the past, and the API reports only the current state of each station, not how that state has changed over time. If I want to answer questions like, “Will there be a dock available for me by the time I get to my destination station?”, I need to be able to forecast the number of docks available at the destination station. And to build that forecasting model, I need a time series of historical data about the number of docks available at that station.
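
To make the link between a historical time series and a forecast concrete, here is a minimal sketch of one simple baseline: predict the docks available at a future time as the historical average for the same weekday and time-of-day slot. This is an illustrative baseline only, not the model presented in the talk; the function names, the 30-minute slot width, and the (timestamp, docks) input format are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def slot(ts, minutes=30):
    """Map a timestamp to a (weekday, time-of-day bucket) pair."""
    return ts.weekday(), (ts.hour * 60 + ts.minute) // minutes


def fit_seasonal_mean(history, minutes=30):
    """Average docks available per slot.

    `history` is an iterable of (datetime, docks_available) pairs,
    a hypothetical format chosen for this sketch.
    """
    buckets = defaultdict(list)
    for ts, docks in history:
        buckets[slot(ts, minutes)].append(docks)
    return {key: mean(values) for key, values in buckets.items()}


def forecast(model, ts, minutes=30):
    """Return the historical mean for ts's slot, or None if that slot was never observed."""
    return model.get(slot(ts, minutes))
```

With several weeks of observations for one station, `forecast(fit_seasonal_mean(history), arrival_time)` gives a rough expectation of dock availability at the planned arrival time. A baseline like this ignores weather, holidays, and the station's current state, which is where a proper forecasting model would improve on it.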

To answer such questions, I started pinging the Citi Bike API every 2 minutes back in 2016, and I have been collecting this data ever since. Data from August 2016 to December 2021 is publicly available on Kaggle. In this event, I will show how the data collection system works, and how I keep its operations cheap and worry-free. This data collection system can be reused for other open, real-time APIs. I’ll then show how we can analyze and visualize the data in order to learn about different Citi Bike stations in NYC. Finally, I’ll answer my original question by building a model to forecast the number of bikes available at a given station.
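
As a rough illustration of the collection idea described above, the sketch below polls a station-status feed at a fixed interval and flattens each response into one row per station. It is not the speaker's actual system. The endpoint URL and the field names (`last_updated`, `data.stations`, `num_bikes_available`, `num_docks_available`) are assumptions based on the public GBFS feed format, and the `fetch`, `store`, and `sleep` parameters are hypothetical hooks added so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

# Assumed endpoint: Citi Bike's public GBFS station-status feed (not named in the abstract).
STATUS_URL = "https://gbfs.citibikenyc.com/gbfs/en/station_status.json"
POLL_INTERVAL_S = 120  # the abstract's "every 2 minutes"


def fetch_status(url=STATUS_URL, timeout=10):
    """Download and decode one JSON snapshot of the feed."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def flatten_snapshot(payload):
    """Turn one GBFS-style payload into one row per station."""
    fetched_at = payload.get("last_updated")
    rows = []
    for station in payload.get("data", {}).get("stations", []):
        rows.append({
            "fetched_at": fetched_at,
            "station_id": station.get("station_id"),
            "bikes": station.get("num_bikes_available"),
            "docks": station.get("num_docks_available"),
        })
    return rows


def poll(fetch, store, iterations, interval_s=POLL_INTERVAL_S, sleep=time.sleep):
    """Fetch, flatten, and store a snapshot `iterations` times."""
    for i in range(iterations):
        try:
            store(flatten_snapshot(fetch()))
        except (OSError, ValueError) as exc:
            # Skip a failed poll (network or JSON error) rather than stop the collector.
            print(f"poll failed: {exc}")
        if i < iterations - 1:
            sleep(interval_s)
```

A call such as `poll(fetch_status, rows.extend, iterations=3)` would append three snapshots to a list named `rows`. A long-running collector would more likely write each batch to durable storage, and because `fetch` is a parameter, the same loop could be pointed at another open, real-time API.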


Ethan Rosenthal