CTF Update 4: Building a Crystal Ball for the Airlines

It’s been a while since we had an update on what the COVID-19 Task Force (CTF) is doing. At the time of this blog's publishing, we have a total of 22 global enterprises participating, including some of the largest airlines in the world.

This is truly unprecedented! Because the airline business is so competitive and margins are so thin, airlines have historically been unwilling to share data. But in times of crisis, we unite! We are proud to see that our global partners can look beyond themselves for the greater good of the entire industry, and we are proud to be a part of this historic moment.

We have done a lot since our last update, so let’s quickly summarize our progress.

Designing Predictive Features and Training Predictive Models

If you remember from the previous updates, one of our goals is to model the chain of causality between the epidemic and consumer demand (e.g. bookings, or other demand signals). Leveraging third-party data sources (e.g. epidemic data from Johns Hopkins and government response data from Oxford Univ.), we have derived several predictive features. These range from a simple normalization of the epidemic rate by population density to the restrictiveness of government policies along a specific flight route. The figure below illustrates the relative importance of the different feature categories we’ve used to predict airline bookings in different regions.
[Figure: Feature importance: regional differences]
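
As a rough illustration of the two ends of that range, the sketch below derives a density-normalized epidemic rate and a route-level restrictiveness score. The field names, the rule of taking the stricter endpoint of a route, and the sample numbers are all assumptions made for illustration; they are not the task force’s actual feature definitions.

```python
from dataclasses import dataclass


@dataclass
class CountryDay:
    """One day of third-party data for one country (hypothetical schema)."""
    new_cases_per_100k: float   # epidemic rate
    population_density: float   # people per square km
    stringency: float           # government response index, 0 (open) to 100 (strict)


def density_normalized_rate(day: CountryDay) -> float:
    """Epidemic rate divided by population density (illustrative feature)."""
    if day.population_density <= 0:
        raise ValueError("population_density must be positive")
    return day.new_cases_per_100k / day.population_density


def route_restrictiveness(origin: CountryDay, destination: CountryDay) -> float:
    """Assumed rule: a route is as restricted as its stricter endpoint."""
    return max(origin.stringency, destination.stringency)


# Made-up sample values, for illustration only.
origin = CountryDay(new_cases_per_100k=12.0, population_density=36.0, stringency=70.0)
destination = CountryDay(new_cases_per_100k=4.0, population_density=400.0, stringency=45.0)

features = {
    "origin_rate_norm": density_normalized_rate(origin),
    "dest_rate_norm": density_normalized_rate(destination),
    "route_restrictiveness": route_restrictiveness(origin, destination),
}
print(features)
```

Taking the maximum over the two endpoints is only one plausible way to summarize a route; an average or a direction-specific score would be equally reasonable starting points.
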
Using these features, we have trained several different kinds of models, from highly interpretable linear models with limited predictive power to a random forest model that is hard to interpret but predicts much better (shown in the figure below). These models were tested on a held-out dataset, and their prediction accuracy was measured using Pearson’s correlation coefficient between predicted and actual values. Due to the significant difference in predictive performance, we will be using the random forest as our model of choice in future investigations.
[Figure: Model test accuracy: linear vs. nonlinear models]
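
To make the evaluation step concrete, here is a minimal sketch of scoring two sets of held-out predictions with Pearson’s correlation coefficient and picking the better one. The booking numbers and the two "models" are made up for illustration; they are not task force results, and the helper name `pearson_r` is our own.

```python
import math


def pearson_r(actual: list[float], predicted: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)


# Made-up held-out bookings and two hypothetical models' predictions.
held_out_bookings = [120.0, 80.0, 45.0, 60.0, 150.0, 95.0]
model_a_predictions = [100.0, 95.0, 70.0, 65.0, 110.0, 100.0]
model_b_predictions = [118.0, 84.0, 50.0, 58.0, 140.0, 97.0]

scores = {
    "model_a": pearson_r(held_out_bookings, model_a_predictions),
    "model_b": pearson_r(held_out_bookings, model_b_predictions),
}
best = max(scores, key=scores.get)  # model with the highest correlation
print(scores, "->", best)
```

Note that Pearson’s r rewards predictions that move with the actual values; it does not penalize a constant offset or scale error, so it is often paired with an error metric in practice.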

If you want to dive deeper into the modeling details, our task force team members have presented this work several times at different venues. Feel free to check out the video recordings of the following events. Rest assured that no carrier-specific results were revealed.

  1. COVID-19 Task Force Webinars should have the most detail (however, these are accessible for COVID-19 Task Force participants only). You should’ve joined!
  2. Virtual Outperform 2020
  3. AGIFORS Symposium 2020
  4. Ken Kennedy Institute Data Science Conference at Rice Univ.

What Can We Do with a Predictive Model?

Now we have a model that can predict bookings. So what? Keep in mind that this model provides a predicted level of bookings based on the input data (i.e. epidemic and government response). So there are really only 2 things we can do:

  1. Wait
  2. Simulate scenarios

The first approach is simply to wait until the epidemic slows down and governments’ travel restrictions are relaxed, then feed our model with real data from that future time. No doubt our model would produce a realistic booking forecast based on the real input of the future. However, this often doesn’t provide businesses with enough lead time to plan, prepare, and respond, because when bookings are recovering and changing, businesses usually need to deal with many moving parts to optimize their operations.

The alternative approach is that we can run some scenario simulations. We can make an educated guess about what the epidemic condition and government response might be some time in the future. Then we feed this potential scenario to our model and let it make a booking forecast based on the hypothetical scenario.

This approach is beneficial for 2 reasons. First, we can simulate multiple scenarios and observe potential recovery trajectories. This gives us a sense of what to expect under each simulated scenario, so we can prepare for the likely ones. Second, we can do this before demand actually returns, while most business operations are still fairly static.
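
The scenario approach can be sketched in a few lines: define several hypothetical futures, feed each one to a model, and compare the resulting trajectories. Everything below is an assumption for illustration. In particular, `toy_booking_model` is a simple stand-in formula, not the task force’s random forest, and the scenario values are made up.

```python
from typing import Callable

# A scenario is a sequence of (epidemic_rate, stringency) pairs, one per future week.
Scenario = list[tuple[float, float]]


def toy_booking_model(epidemic_rate: float, stringency: float) -> float:
    """Stand-in for a trained model (NOT the actual random forest).

    Returns bookings as a percentage of a pre-crisis baseline, falling as
    the epidemic rate and government stringency rise (assumed relationship).
    """
    demand = 100.0 - 0.6 * stringency - 1.5 * epidemic_rate
    return max(0.0, min(100.0, demand))


def simulate(model: Callable[[float, float], float], scenario: Scenario) -> list[float]:
    """Feed each week's hypothetical inputs to the model and collect the forecasts."""
    return [model(rate, stringency) for rate, stringency in scenario]


# Hypothetical scenarios with made-up inputs.
scenarios: dict[str, Scenario] = {
    "fast_recovery": [(20.0, 80.0), (10.0, 50.0), (4.0, 20.0)],
    "slow_recovery": [(20.0, 80.0), (18.0, 75.0), (15.0, 65.0)],
    "second_wave": [(20.0, 80.0), (8.0, 40.0), (25.0, 85.0)],
}

trajectories = {name: simulate(toy_booking_model, s) for name, s in scenarios.items()}
for name, forecast in trajectories.items():
    print(f"{name}: {[round(v, 1) for v in forecast]}")
```

Because the model is passed in as a plain callable, the same loop would work unchanged with any trained model that maps the two inputs to a booking level.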

Due to these practical benefits for an airline, we will be focusing on the second approach. Now the challenge becomes how we can come up with a good guess about the future condition of the epidemic and government response.

Coming Up with Good Scenarios

So how can we generate good scenarios (i.e. those that represent plausible futures, and not just any possible future)? Remember, our random forest model requires 2 sets of inputs (i.e. epidemic data and government response data). Therefore, we need a good estimate of the future epidemic condition and future government response stringency. To do this, we look to another data source for help.
The Institute for Health Metrics and Evaluation (IHME) at UW Medicine publishes future projections of the COVID-19 pandemic at a global scale with a high degree of granularity. IHME provides future projections of COVID-19 cases, deaths, and many other COVID-19 related metrics (e.g. testing, hospital resources, mask usage, etc.) at the country level. These are precisely the epidemic inputs we need for our random forest booking-prediction model.

What about the government response? Although it’s very hard to accurately predict how different governments would respond to future epidemic conditions, we can leverage a proxy. Since the most important government responses to COVID-19 involve restricting citizens’ mobility to curtail the spread of the virus, a country’s mobility data should be highly correlated with its government’s response stringency.

Conveniently, IHME also tracks people’s mobility based on anonymized cell phone movement data as a measure of a country’s social distancing effectiveness. A time series projection of this mobility would serve as a good proxy for the government response stringency, which is the second set of inputs to our booking-forecast model.

Now we have an educated guess of the future epidemic trajectory and a good proxy for the future government response stringency. This means we now have all the ingredients we need to make a reasonable prediction of how bookings will return in the future. The following flow chart illustrates our forecasting pipeline. We are now using this pipeline to forecast booking recovery for our COVID-19 Task Force participants, and next time we should be able to share some aggregated regional-level predictions.
[Figure: COVID-19 Task Force forecast flow chart]
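
In code, the pipeline described above might look roughly like the sketch below: take a projected epidemic series and a projected mobility series, convert mobility into a stringency proxy, join the two by date, and pass the result to a model. The mapping from mobility to a 0 to 100 proxy, the function names, the dates, and the numbers are all assumptions for illustration, and `placeholder_model` is a stand-in formula rather than the trained random forest.

```python
from typing import Callable


def mobility_to_stringency_proxy(mobility_change_pct: float) -> float:
    """Assumed mapping: 0% mobility change -> 0, a -100% change -> 100.

    Mobility at or above baseline is treated as no restriction.
    """
    return max(0.0, min(100.0, -mobility_change_pct))


def build_model_inputs(
    cases_projection: dict[str, float],
    mobility_projection: dict[str, float],
) -> dict[str, tuple[float, float]]:
    """Join the two projections on ISO date, keeping only dates present in both."""
    shared_dates = sorted(cases_projection.keys() & mobility_projection.keys())
    return {
        d: (cases_projection[d], mobility_to_stringency_proxy(mobility_projection[d]))
        for d in shared_dates
    }


def forecast_bookings(
    model: Callable[[float, float], float],
    inputs: dict[str, tuple[float, float]],
) -> dict[str, float]:
    """Run the model on each date's (epidemic_rate, stringency_proxy) pair."""
    return {d: model(rate, proxy) for d, (rate, proxy) in inputs.items()}


def placeholder_model(epidemic_rate: float, stringency_proxy: float) -> float:
    """Stand-in formula (NOT the trained model): bookings as % of baseline."""
    return max(0.0, min(100.0, 100.0 - 0.6 * stringency_proxy - 1.5 * epidemic_rate))


# Made-up projections keyed by ISO date, for illustration only.
cases = {"2020-11-01": 20.0, "2020-11-08": 12.0, "2020-11-15": 6.0}
mobility = {"2020-11-01": -60.0, "2020-11-08": -35.0, "2020-11-15": -10.0, "2020-11-22": -5.0}

inputs = build_model_inputs(cases, mobility)
forecast = forecast_bookings(placeholder_model, inputs)
print(forecast)
```

Joining on shared dates means a date with only one of the two projections is silently dropped; a production pipeline would more likely flag or fill such gaps.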


Together with the participants in our COVID-19 Task Force, we’ve built a crystal ball for the airline industry by leveraging multiple third-party data sources. This crystal ball (i.e. our random forest booking-forecast model) is unique in several ways:

  1. It’s built by experts in the airline industry.
  2. It’s trained with real booking data aggregated from more than a dozen of the world’s largest airlines. This is truly unique and unprecedented.
  3. It’s validated for its ability to predict by testing on a held-out data set.
  4. It’s a data-dependent model. We must provide 2 sets of inputs (i.e. epidemic conditions and government response stringency) that represent realistic or reasonably good guesses about the future state, so we are not just making a forecast out of thin air.

Now that we have a unique crystal ball, let’s see what it will tell us next time. So stay tuned for the next COVID-19 Task Force update, because the future will be revealed. Remember, it’s not too late to join us. Please consult your CSM about your eligibility.

If you missed any of the previous updates, they are all accessible here:

  1. PROS Assembles COVID-19 Task Force (CTF) to Help Customers
  2. CTF Update 1: Managing Your Business Under Crisis
  3. CTF Update 2: Revenue Management Under Lockdown
  4. CTF Update 3: What Are We Doing With Your Data

About the Author

Michael Wu

Dr. Michael Wu is one of the world’s premier authorities on artificial intelligence (AI), machine learning (ML), data science, and behavioral economics. He’s the Chief AI Strategist at PROS (NYSE: PRO), an AI-powered SaaS provider that helps companies monetize more efficiently in the digital economy. He’s been appointed as a Senior Research Fellow at the Ecole des Ponts Business School for his work in Data Science. Prior to PROS, Michael was the Chief Scientist at Lithium for a decade, where he focused on developing predictive and prescriptive algorithms to extract insights from social media big data. His research spans many areas, including customer experience, CRM, online influence, gamification, digital transformation, and AI. His R&D won him recognition as an Influential Leader by CRM Magazine along with Mark Zuckerberg, Marc Benioff, and other industry giants. Michael has served as a DOE fellow at the Los Alamos National Lab conducting research in face recognition and was awarded 4 years of full fellowship under the Computational Science Graduate Fellowship. Prior to industry, Michael received his triple major undergraduate degree in Applied Math, Physics, and Molecular & Cell Biology, and his Ph.D. from UC Berkeley’s Biophysics program, where he used machine learning to model visual processing within the human brain. Michael believes in knowledge dissemination, and speaks internationally at universities, conferences, and enterprises. His insights have inspired many global enterprises and are made accessible through “The Science of Social” and “The Science of Social 2,” two easy-reading e-books.
