What did we learn from the postdoc project?

Author: Dmitry Pavlyuk

The research project “Spatiotemporal urban traffic modelling using big data”[i] is coming to an end, and it is time to summarise the results. The primary objective of the project was methodological: to enhance urban traffic flow forecasts with responsive spatiotemporal models based on big data. “Spatiotemporal” means using the relationships that appear, with time lags, between traffic flows at distant locations: e.g., a traffic incident on one bridge can lead to congestion on nearby bridges shortly afterwards. Three years of intensive research, 14 conference presentations, 7 conference papers and 7 journal publications led us to many interesting observations and ideas about the future development of traffic forecasting methodology.

Four years ago, when we started the project, spatiotemporal models were just one of several promising directions in urban traffic forecasting. Nowadays, they are the mainstream of scientific research. If somebody uses a non-spatiotemporal approach (tries to forecast traffic flows on one road segment without taking nearby traffic into account) and presents it at an international scientific or business conference, they immediately become a target of questions like “Why don’t you use relationships with other road segments?”. When I review a paper on traffic forecasting methodology, I expect the list of state-of-the-art models to include spatiotemporal specifications. Spatiotemporal relationships are everywhere, and we were lucky to join this stream when it was a rivulet.

At the beginning of the research project, we expected a new spatiotemporal model specification to be an important contribution of the project. Soon, after an intensive literature review, the primary focus shifted to the identification of spatiotemporal relationships. Existing forecasting models (for example, multivariate statistical models and artificial neural networks) work well when spatiotemporal relationships are known, but this is not the case for traffic flows. Which road segments will be affected by congestion at a given crossroad? The question is straightforward, but the answer is not. Obviously, the congestion will affect nearby roads, but not only them: well-informed drivers will choose alternative routes to avoid the congestion and will create unusual traffic on other road segments that are not directly connected. Learning these relationships has become one of the primary research directions of the project[ii].
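
To make the idea of a lagged relationship between two road segments concrete, here is a minimal sketch. It is not the method used in the project: it assumes plain Pearson correlation as the measure of a relationship, and the function names (`lagged_correlation`, `best_lag`) and the toy series are hypothetical, chosen only for illustration.

```python
from statistics import mean


def lagged_correlation(source, target, lag):
    """Pearson correlation between source[t] and target[t + lag]."""
    if len(source) != len(target):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(source) - 1:
        raise ValueError("lag must leave at least two observations")
    x = source[:len(source) - lag]
    y = target[lag:]
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no information
    return cov / (var_x * var_y) ** 0.5


def best_lag(source, target, max_lag):
    """Return (lag, correlation) with the largest absolute correlation."""
    scores = [(lag, lagged_correlation(source, target, lag))
              for lag in range(max_lag + 1)]
    return max(scores, key=lambda item: abs(item[1]))


# Toy data (invented): segment B repeats segment A's flow two steps later.
segment_a = [10, 12, 30, 45, 20, 11, 9, 14, 35, 50, 22, 10]
segment_b = [0, 0] + segment_a[:-2]

lag, corr = best_lag(segment_a, segment_b, max_lag=4)
print(lag, round(corr, 3))
```

On this toy data the search recovers the two-step delay. Real traffic relationships are rarely this clean: they change with time of day and incidents, and they may link segments that are not neighbours, which is why identifying them is a research problem in its own right.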

Science encourages diversity, and traffic forecasting models are no exception. For the problem of learning spatiotemporal relationships, a wide range of methods is ready to be used or adapted from similar domains. As a result, we have dozens of good methods that we can use to solve the problem. The question is: which method to choose? One of the modern answers is: use them all, in an ensemble! “Two heads are better than one”, the proverb says, and dozens of heads can be even better. The only problem is to organise the methods into a good ensemble, which exploits the advantages of each method and compensates for its disadvantages. Thus, learning spatiotemporal relationships with an ensemble of modern methods has become a promising contribution of this project[iii].
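
One simple way to combine several relationship-learning methods is majority voting. The sketch below is an illustration under stated assumptions, not the ensemble from the cited paper: it assumes each method returns a set of (source, target) links between road segments, and the function name `vote_ensemble`, the three method names, and the segment labels are all hypothetical.

```python
from collections import Counter


def vote_ensemble(selections, min_votes):
    """Keep the links (source, target) chosen by at least min_votes methods."""
    votes = Counter(link for selected in selections for link in set(selected))
    return {link for link, count in votes.items() if count >= min_votes}


# Hypothetical outputs of three relationship-learning methods (invented data).
by_correlation = {("A", "B"), ("A", "C"), ("B", "D")}
by_graph_distance = {("A", "B"), ("B", "D")}
by_regularised_regression = {("A", "B"), ("A", "C"), ("C", "D")}

consensus = vote_ensemble(
    [by_correlation, by_graph_distance, by_regularised_regression],
    min_votes=2,
)
print(sorted(consensus))
```

Here the link from C to D is dropped because only one method proposed it, while the other three links survive. Voting treats every method as equally reliable; a more careful ensemble would weight the methods by their forecasting performance.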

The next small methodological step forward was made after a short comment from an anonymous reviewer: “Why do you use and compare only methods from the traffic forecasting domain? Why not look into other domains, where spatiotemporal relationships are widely acknowledged?” Really, why? So we started reading many papers on weather forecasting, video stream prediction, the spread of viruses, and so on. We found that the methodologies are very similar, differing mainly in domain-specific assumptions and historical naming. Moreover, the models can be transferred between different domains (e.g., models trained for video prediction can be adapted for traffic forecasting). This simple observation led us to the second important research direction: transferring spatiotemporal models from other domains to traffic forecasting. We successfully transferred models from video prediction[iv],[v] and financial analysis[vi]. This is a nice property of methodological advances: they can be useful in other areas, even in those the researcher was not aware of!

The salt of any model is data. The well-known principle “garbage in, garbage out” applies fully to traffic forecasting models. Many traffic data sources are available: old-school loop detectors, video and satellite surveillance, floating car data, modern sensors of self-driving cars, social networks. An abundance of data sources can play a cruel joke: more data does not mean more information, but it definitely means more garbage (unnecessary data) and noise (incorrect data)[vii]. It sounds strange, but traffic forecasting models suffer from this big data, and much research effort goes into fusing data from different sources and carefully selecting the most important pieces of information. Perhaps this will become the subject of one of our future projects.

The key word of recent methodological studies is reproducibility. Having many different models and algorithms, the researcher needs to compare them. If studies are conducted on different data sets and differently transformed data, it becomes almost impossible to choose the best one. “My method works better than other methods (selected by me) on one data set (also selected by me)” is not a good scientific argument, and papers with such “supporting” experimental studies are rarely published nowadays. All interested researchers should be able to repeat the methodological study (repeatability) and test the efficiency of the proposed solution on other data sets (reproducibility). The principles of open science and open data (and open source code) make this possible, and the next generation of science should be open.

At the end of the project, I would like to express my personal gratitude to the funders (acknowledgements in my publications are not just obligatory words!), to my colleagues at the Transport and Telecommunication Institute, who always supported my ideas, and to my new friends among researchers, transport engineers, geographers and mathematicians, who encouraged my progress on this project, even with a couple of words during a conference coffee break.

[i] Research project No. “Spatiotemporal urban traffic modelling using big data” (Principal investigator: Dmitry Pavlyuk, Transport and Telecommunication Institute) implemented under the specific support objective activity “Post-doctoral Research Aid” (Project id. N. of the Republic of Latvia, funded by the European Regional Development Fund.

[ii] Dmitry Pavlyuk, ‘Feature Selection and Extraction in Spatiotemporal Traffic Forecasting: A Systematic Literature Review’, European Transport Research Review 11, no. 1 (December 2019), https://doi.org/10.1186/s12544-019-0345-9.

[iii] Dmitry Pavlyuk, ‘Towards Ensemble Learning of Traffic Flows’ Spatiotemporal Structure’, Transportation Research Procedia 47 (2020): 361–68, https://doi.org/10.1016/j.trpro.2020.03.110.

[iv] Dmitry Pavlyuk, ‘Make It Flat: Multidimensional Scaling of Citywide Traffic Data’, in Reliability and Statistics in Transportation and Communication, vol. 117, Lecture Notes in Networks and Systems (Cham: Springer International Publishing, 2020), 80–89, https://doi.org/10.1007/978-3-030-44610-9_9.

[v] Dmitry Pavlyuk, ‘Transfer Learning: Video Prediction and Spatiotemporal Urban Traffic Forecasting’, Algorithms 13, no. 2 (13 February 2020): 39, https://doi.org/10.3390/a13020039.

[vi] Dmitry Pavlyuk, ‘Spatiotemporal Forecasting of Urban Traffic Flow Volatility’, in Reliability and Statistics in Transportation and Communication (Accepted), n.d.

[vii] Dmitry Pavlyuk, Maria Karatsoli, and Eftihia Nathanail, ‘Exploring the Potential of Social Media Content for Detecting Transport-Related Activities’, in Reliability and Statistics in Transportation and Communication, vol. 68 (Cham: Springer International Publishing, 2019), 138–49, https://doi.org/10.1007/978-3-030-12450-2_13.