Cities face the constant challenge of traffic congestion, which is intrinsically linked to our quality of life. Congested streets impact not only our economies but also the environment and our collective well-being. To build smarter cities, we need a quantitative understanding of how traffic behaves, just as Google’s Project Green Light explores how to improve traffic flow.
Central to understanding traffic are congestion functions, which provide a mathematical way to capture congestion at the level of individual roadway segments: as vehicle volume increases, congestion tends to grow, and travel speeds tend to reduce. The challenge of identifying congestion functions — accurately estimating speed based on observed vehicle volume — is key to several applications, such as real-time navigation, traffic flow simulation, and traffic management.
Mathematical models for road network congestion have a long and impactful history. Most prior models are based on physics and are applied to individual road segments. Unfortunately, traffic sensors are typically only installed on major roadways, leading to sparse or non-existent data for many urban streets and thus incomplete model coverage. While solutions for these issues have historically been limited, the recent rise of vehicle telematics and smartphones enables vehicles to act as moving sensors and collect real-time estimates of vehicle speed and volumes over a much wider set of roads. With these new data sources, perhaps a data-driven approach to identify congestion functions could succeed, even at a global scale for any road in a city and any city in the world.
In “Scalable Learning of Segment-Level Traffic Congestion Functions”, we explore this challenge systematically. Our goal is to fuse data across all road segments of a city to yield a single model for the city, enabling more robust inference on roadways with limited data. We assess our framework’s ability to identify congestion functions and predict segment attributes on a large, multi-city dataset. Despite the challenges posed by data sparsity, our approach demonstrated strong performance, particularly in generalizing to unobserved road segments.