blog

Run-way Functions: Predict Reconfigurations at US Airports - Benchmark


by Allen Downey, Robert Gibboni

NASA Airport Configuration Benchmark

Coordinating the national airspace is a fascinating problem affecting individual travelers, the economy, and the environment. One important piece of the puzzle is how airport operators make best use of an airport's many runways. The specific set of runways active for departing and arriving flights is the "airport configuration," and knowing when to use which configuration is an important piece of the air traffic management puzzle.

Thanks to NASA, the FAA, airlines, and other agencies, there is a massive stream of real-time data available to aid in effective decision making. Now it's up to you to learn how to best use it! Below is a diagram of an airport configured in a "north flow" with 2 runways active for departures (orange arrows), 2 for arrivals (blue arrows) and 4 inactive runways (thick black lines).

The Run-way Functions: Predict Reconfigurations at US Airports competition sets you up with historical air traffic, airport configuration, and weather data to learn how runways at 10 airports will be configured in the future. Check out the Problem description for more background.

This blog post is intended to familiarize you with the competition data and to get started thinking about how to approach this challenge. It reads the training data and one of the data files, makes predictions with a relatively simple algorithm, and computes the scoring function.

You can use snippets from this notebook in your solution and you can use this algorithm as a starting place (but you don't have to!).

In [1]:
from pathlib import Path
from typing import Sequence, Tuple

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

mpl.rcParams["figure.dpi"] = 100
sns.set_style("whitegrid")
pd.options.display.width = 200

Get the data

To run this notebook, you'll first need to download the following files from the Data download page (available after you've joined the competition):

  • Training features (train_features.tar): A tar archive containing air traffic, airport configuration, and weather features for the 10 airports from Nov. 2020 to Nov. 2021. Each feature is saved as a bzip2-compressed CSV.

  • Training labels (open_train_labels.csv.bz2): Labels for the open arena training period.

  • Submission format (open_submission_format.csv): A simple submission that predicts a uniform probability across all configurations. Use this an example of a properly formatted submission.

Once you have downloaded the data files, you can unpack the tar file like this:

tar -xf train_features.tar

And then you can delete the tar file.

Make sure that the following path leads to the data files you downloaded. If the files are in the same directory as the notebook, set the path to ".".

In [2]:
DATA_PATH = Path("../data")

The following cell lists the data directories for each airport.

In [3]:
airport_directories = sorted(path for path in DATA_PATH.glob("k*"))
airport_directories
Out[3]:
[PosixPath('../data/katl'),
 PosixPath('../data/kclt'),
 PosixPath('../data/kden'),
 PosixPath('../data/kdfw'),
 PosixPath('../data/kjfk'),
 PosixPath('../data/kmem'),
 PosixPath('../data/kmia'),
 PosixPath('../data/kord'),
 PosixPath('../data/kphx'),
 PosixPath('../data/ksea')]

There are 10 airports spread throughout the continental US. Here's a map showing their locations.

Location of airports

Picking one airport as an example, here are the data files.

In [4]:
airport_directory = airport_directories[0]
data_files = list(airport_directory.glob("*"))
data_files
Out[4]:
[PosixPath('../data/katl/katl_airport_config.csv.bz2'),
 PosixPath('../data/katl/katl_tfm_estimated_runway_arrival_time.csv.bz2'),
 PosixPath('../data/katl/katl_departure_runway.csv.bz2'),
 PosixPath('../data/katl/katl_etd.csv.bz2'),
 PosixPath('../data/katl/katl_mfs_runway_arrival_time.csv.bz2'),
 PosixPath('../data/katl/katl_mfs_stand_arrival_time.csv.bz2'),
 PosixPath('../data/katl/katl_first_position.csv.bz2'),
 PosixPath('../data/katl/katl_tbfm_scheduled_runway_arrival_time.csv.bz2'),
 PosixPath('../data/katl/katl_lamp.csv.bz2'),
 PosixPath('../data/katl/katl_mfs_stand_departure_time.csv.bz2'),
 PosixPath('../data/katl/katl_arrival_runway.csv.bz2'),
 PosixPath('../data/katl/katl_mfs_runway_departure_time.csv.bz2')]

The one we'll use in this notebook is katl_airport_config.csv.bz2, which contains the actual configurations during the training period.

In [5]:
airport_code = airport_directory.name
filename = f"{airport_code}_airport_config.csv.bz2"
filepath = airport_directory / filename
filepath
Out[5]:
PosixPath('../data/katl/katl_airport_config.csv.bz2')

We can read the file like this, and display the first few lines.

In [6]:
airport_config_df = pd.read_csv(filepath, parse_dates=["timestamp"])
print(airport_config_df.shape)
airport_config_df.head()
(12431, 2)
Out[6]:
timestamp airport_config
0 2020-11-01 01:11:00 D_8R_9L_A_10_8L_9R
1 2020-11-01 01:57:00 D_8R_9L_A_10_8L_9R
2 2020-11-01 02:53:00 D_8R_9L_A_10_8L_9R
3 2020-11-01 03:54:00 D_8R_9L_A_10_8L_9R
4 2020-11-01 04:52:00 D_8R_9L_A_10_8L_9R
  • timestamp is the time when the configuration was recorded; they are a roughly one-hour intervals.

  • airport_config is a string that represents the runways in use at the given time. For example, D_8R_9L_A_10_8L_9R means that the runways in use for departure were 8R and 9L, and the runways in use for arrivals were 10, 8L, and 9R.

The following function takes the path to the data directory for a given airport, reads the airport configuration file, and returns the airport code and a Pandas DataFrame containing the data.

In [7]:
def read_airport_configs(airport_directory: Path) -> Tuple[str, pd.DataFrame]:
    """Reads the airport configuration features for a given airport data directory."""
    airport_code = airport_directory.name
    filename = f"{airport_code}_airport_config.csv.bz2"
    filepath = airport_directory / filename
    airport_config_df = pd.read_csv(filepath, parse_dates=["timestamp"])
    return airport_code, airport_config_df

I'll use this function to make a dictionary that maps from each airport code to the airport configuration DataFrame.

In [8]:
airport_config_df_map = {}

for airport_directory in sorted(airport_directories):
    airport_code, airport_config_df = read_airport_configs(airport_directory)
    print(airport_code)
    airport_config_df_map[airport_code] = airport_config_df
katl
kclt
kden
kdfw
kjfk
kmem
kmia
kord
kphx
ksea

To check the contents of this data structure, here are the airport codes and the lengths of the airport configuration DataFrames.

In [9]:
for airport_code, df in airport_config_df_map.items():
    print(f"{airport_code}: {len(df):>8,}")
katl:   12,431
kclt:   11,141
kden:   12,145
kdfw:   10,997
kjfk:   10,894
kmem:   12,958
kmia:   11,400
kord:   14,355
kphx:    9,050
ksea:   10,776

Read the training labels

Let's load the training data.

In [10]:
open_train_labels = pd.read_csv(
    DATA_PATH / "open_train_labels.csv.bz2", parse_dates=["timestamp"]
)
open_train_labels.shape
Out[10]:
(22640544, 5)

There are about 22 million rows. Here's what the first few rows look like.

In [11]:
open_train_labels.head()
Out[11]:
airport timestamp lookahead config active
0 katl 2020-11-06 23:00:00 30 katl:D_10_8L_A_10_8L 0.0
1 katl 2020-11-06 23:00:00 30 katl:D_10_8R_9L_A_10_8L_9R 0.0
2 katl 2020-11-06 23:00:00 30 katl:D_10_8R_A_10_8R 0.0
3 katl 2020-11-06 23:00:00 30 katl:D_26L_27L_A_26R_27L_28 0.0
4 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_28_A_26R_27L_28 0.0

The columns are:

  • airport: string airport code
  • timestamp: pd.Timestamp with the time when we should generate a prediction.
  • lookahead: number of minutes in the future we're trying to predict.
  • config: string representing a possible configuration
  • active: float 0 or 1 indicating whether a particular configuration was actually active during the future we're trying to predict.

In open_train_labels, only one configuration is active (assigned probability 1) and the others are inactive (assigned probability 0).

Our job is to generate predictions in the same format as the training labels, except that the values in active will be probabilities between 0 to 1.

Get a sample frame

One way to think about open_train_labels is as a sequence of "prediction frames", where each frame represents a prediction for a given airport, at a particular time, for a given point in the future.

Each frame contains one row for each of the most common configurations (for that airport), plus a catch-all bucket called other.

To iterate through the frames in predictions, we can make a groupby object.

In [12]:
sample_frame_groups = open_train_labels.groupby(["airport", "timestamp", "lookahead"], sort=False)

As an example, I'll select just the first frame.

In [13]:
for name, pred_frame in sample_frame_groups:
    print(name)
    break
('katl', Timestamp('2020-11-06 23:00:00'), 30)
In [14]:
pred_frame
Out[14]:
airport timestamp lookahead config active
0 katl 2020-11-06 23:00:00 30 katl:D_10_8L_A_10_8L 0.0
1 katl 2020-11-06 23:00:00 30 katl:D_10_8R_9L_A_10_8L_9R 0.0
2 katl 2020-11-06 23:00:00 30 katl:D_10_8R_A_10_8R 0.0
3 katl 2020-11-06 23:00:00 30 katl:D_26L_27L_A_26R_27L_28 0.0
4 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_28_A_26R_27L_28 0.0
5 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_A_26L_27L_28 0.0
6 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_A_26L_27R 0.0
7 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_A_26L_27R_28 0.0
8 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_A_26R_27L 0.0
9 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_A_26R_27L_28 0.0
10 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_A_26R_27R_28 0.0
11 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_A_26R_28 0.0
12 katl 2020-11-06 23:00:00 30 katl:D_26L_27R_A_27L_28 0.0
13 katl 2020-11-06 23:00:00 30 katl:D_26L_28_A_26L_28 0.0
14 katl 2020-11-06 23:00:00 30 katl:D_26L_28_A_26R_27L_28 0.0
15 katl 2020-11-06 23:00:00 30 katl:D_26L_28_A_26R_28 0.0
16 katl 2020-11-06 23:00:00 30 katl:D_26R_27R_A_26R_27L_28 0.0
17 katl 2020-11-06 23:00:00 30 katl:D_26R_28_A_26R_28 0.0
18 katl 2020-11-06 23:00:00 30 katl:D_8L_9L_A_10_8L_9R 0.0
19 katl 2020-11-06 23:00:00 30 katl:D_8R_9L_A_10_8L_9R 1.0
20 katl 2020-11-06 23:00:00 30 katl:D_8R_9L_A_10_8R_9R 0.0
21 katl 2020-11-06 23:00:00 30 katl:D_8R_9L_A_10_9R 0.0
22 katl 2020-11-06 23:00:00 30 katl:D_8R_9L_A_8L_9R 0.0
23 katl 2020-11-06 23:00:00 30 katl:D_8R_9L_A_8R_9L 0.0
24 katl 2020-11-06 23:00:00 30 katl:D_8R_9R_A_10_8L_9R 0.0
25 katl 2020-11-06 23:00:00 30 katl:D_9L_A_9R 0.0
26 katl 2020-11-06 23:00:00 30 katl:other 0.0

This frame represents a prediction for katl at 2020-11-06 23:00:00 for 30 minutes in the future. Our job is to fill the active column with probabilities.

Given a prediction frame, we can get the airport code, timestamp, and lookahead like this:

In [15]:
first = pred_frame.iloc[0]
airport_code = first["airport"]
timestamp = first["timestamp"]
lookahead = first["lookahead"]

print(f"Airport code: {airport_code}")
print(f"Timestamp: {timestamp}")
print(f"Lookahead: {lookahead}")
Airport code: katl
Timestamp: 2020-11-06 23:00:00
Lookahead: 30

Uniform prediction

As a simple example, suppose we have no reason to think one configuration is more likely than another. In that case, we could generate a uniform distribution where each configuration has the same probability.

The following function takes a prediction frame and returns a Pandas Series that represent a predictive distribution.

In [16]:
def make_uniform(pred_frame: pd.DataFrame) -> pd.Series:
    indices = pred_frame["config"].values
    uniform = pd.Series(1, index=indices)
    uniform /= uniform.sum()
    return uniform

Here's a the result using the prediction frame from the previous section as an example.

In [17]:
pred_dist = make_uniform(pred_frame)
pred_dist
Out[17]:
katl:D_10_8L_A_10_8L              0.037037
katl:D_10_8R_9L_A_10_8L_9R        0.037037
katl:D_10_8R_A_10_8R              0.037037
katl:D_26L_27L_A_26R_27L_28       0.037037
katl:D_26L_27R_28_A_26R_27L_28    0.037037
katl:D_26L_27R_A_26L_27L_28       0.037037
katl:D_26L_27R_A_26L_27R          0.037037
katl:D_26L_27R_A_26L_27R_28       0.037037
katl:D_26L_27R_A_26R_27L          0.037037
katl:D_26L_27R_A_26R_27L_28       0.037037
katl:D_26L_27R_A_26R_27R_28       0.037037
katl:D_26L_27R_A_26R_28           0.037037
katl:D_26L_27R_A_27L_28           0.037037
katl:D_26L_28_A_26L_28            0.037037
katl:D_26L_28_A_26R_27L_28        0.037037
katl:D_26L_28_A_26R_28            0.037037
katl:D_26R_27R_A_26R_27L_28       0.037037
katl:D_26R_28_A_26R_28            0.037037
katl:D_8L_9L_A_10_8L_9R           0.037037
katl:D_8R_9L_A_10_8L_9R           0.037037
katl:D_8R_9L_A_10_8R_9R           0.037037
katl:D_8R_9L_A_10_9R              0.037037
katl:D_8R_9L_A_8L_9R              0.037037
katl:D_8R_9L_A_8R_9L              0.037037
katl:D_8R_9R_A_10_8L_9R           0.037037
katl:D_9L_A_9R                    0.037037
katl:other                        0.037037
dtype: float64

Make distribution of configurations

Now let's think about making predictions. In reality, we will have good reasons to think that some configurations are more likely than others; that's the whole point of the competition!

Suppose we know that the configuration has changed, but we don't know what the new configuration is. If that's all the information we have, a reasonable predictive distribution would be the distribution of configurations we've seen in the past.

To demonstrate, I'll select the sequence of configurations from one airport.

In [18]:
airport_code = "katl"
airport_config_df = airport_config_df_map[airport_code]
airport_config_df.head()
Out[18]:
timestamp airport_config
0 2020-11-01 01:11:00 D_8R_9L_A_10_8L_9R
1 2020-11-01 01:57:00 D_8R_9L_A_10_8L_9R
2 2020-11-01 02:53:00 D_8R_9L_A_10_8L_9R
3 2020-11-01 03:54:00 D_8R_9L_A_10_8L_9R
4 2020-11-01 04:52:00 D_8R_9L_A_10_8L_9R

The airport configuration dataset is comprised of reports, and the amount of time between reports is not guaranteed to be equal. So the number of times a configuration appears in the dataset is not the same as the amount of time that configuration is active. To get that, we'll first resample the configuration data to regular intervals; let's say every 15 minutes. Note the use of ffill which propagates non-null values forward to fill in missing values (thus making sure we aren't leaking information from the future).

In [19]:
config_timecourse = (
    airport_config_df.set_index("timestamp")
    .airport_config.resample("15min")
    .ffill()
    .dropna()
)
config_timecourse.head()
Out[19]:
timestamp
2020-11-01 01:15:00    D_8R_9L_A_10_8L_9R
2020-11-01 01:30:00    D_8R_9L_A_10_8L_9R
2020-11-01 01:45:00    D_8R_9L_A_10_8L_9R
2020-11-01 02:00:00    D_8R_9L_A_10_8L_9R
2020-11-01 02:15:00    D_8R_9L_A_10_8L_9R
Freq: 15T, Name: airport_config, dtype: object

Now we can count the number of occurrences with value_counts to know how long each configuration was active.

In [20]:
config_dist = config_timecourse.value_counts().sort_values(ascending=False)
print(f"Number of unique configurations at {airport_code.upper()}: {len(config_dist)}")
config_dist
Number of unique configurations at KATL: 156
Out[20]:
D_26L_27R_A_26R_27L_28       15761
D_8R_9L_A_10_8L_9R           10317
D_26L_27R_28_A_26R_27L_28      464
D_26L_27R_A_26L_27L_28         409
D_26L_28_A_26L_28              403
                             ...  
D_27L_27R_A_27L_28               1
D_10_9R_A_10_8L_9R               1
D_26L_27R_28_A_26R_27R_28        1
D_27R_8R_9L_A_10_9R              1
D_8R_A_10_8L                     1
Name: airport_config, Length: 156, dtype: int64

In the course of a year, KATL uses 156 unique configurations! Other airports use more than 200. Still, airports tend to favor a much smaller subset of configurations the majority of the time. At KATL, just 2 configurations account for 74% of the time. For this competition, the prediction target groups uncommon configurations are grouped into an "other" bin.

In [21]:
ax = ((config_dist / config_dist.sum()) * 100).head(20).plot(kind="bar", figsize=(8, 3))
ax.set_title("Configuration distribution for top 20 configurations")
ax.set_ylabel("Percent of time active")
ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(decimals=0))
plt.show()

One assumption this approach makes is that the distribution of configurations is stationary, i.e., it does not change over time. Let's use the configuration timecourse to visualize how well assumption holds up. We'll breeze by the pandas transformations for now, but feel free to explore the details on your own.

In [22]:
import colorcet as cc
from cycler import cycler

with mpl.rc_context({"axes.prop_cycle": cycler(color=cc.glasbey_bw)}):
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.set_axisbelow(True)
    sns.set_style("whitegrid")
    x = (
        config_timecourse.reset_index()
        .groupby(
            [
                pd.Grouper(key="timestamp", freq="7D", axis=0),
                "airport_config",
            ]
        )
        .size()
        .unstack()
        .fillna(0)
        .astype(int)
    ) / 4
    x.index = x.index.astype(str)
    x[x.sum().sort_values(ascending=False).index].plot(
        kind="bar", stacked=True, width=1, ax=ax
    )
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(handles[:20], labels[:20], bbox_to_anchor=(1, 1), loc="upper left")
    ax.set_title(f"Configuration distribution by week for {airport_code.upper()}")
    ax.set_xlabel("")
    ax.set_ylabel("Hours per week")
    plt.show()

There is definitely some variation week-to-week, with first and second most common swapping places occasionally. Overall, I'd say there aren't any seismic shifts in the distribution, although (spoiler alert) that isn't the case for all of the airports. You'll have to consider how to handle those cases; for example, you might upweight more recent time periods when computing your distribution.

Notice that the labels in this series do not include the airport code. To make them consistent with the config column in open_train_labels, we have to prepend the airport code, which we can do like this:

In [23]:
prefix = pd.Series(f"{airport_code}:", index=config_dist.index)
config_dist.index = prefix.str.cat(config_dist.index)
config_dist
Out[23]:
katl:D_26L_27R_A_26R_27L_28       15761
katl:D_8R_9L_A_10_8L_9R           10317
katl:D_26L_27R_28_A_26R_27L_28      464
katl:D_26L_27R_A_26L_27L_28         409
katl:D_26L_28_A_26L_28              403
                                  ...  
katl:D_27L_27R_A_27L_28               1
katl:D_10_9R_A_10_8L_9R               1
katl:D_26L_27R_28_A_26R_27R_28        1
katl:D_27R_8R_9L_A_10_9R              1
katl:D_8R_A_10_8L                     1
Name: airport_config, Length: 156, dtype: int64

The following function encapsulates these steps. It takes an airport code, a DataFrame containing timestamps and configurations, and a boolean that indicates whether the results should be normalized so they add up to 1 (like valid probabilities should). It returns a Pandas Series that represents the distribution of past configurations.

In [24]:
def make_config_dist(
    airport_code: str, airport_config_df: pd.DataFrame, normalize: bool = False
) -> pd.Series:
    config_timecourse = (
        airport_config_df.set_index("timestamp")
        .airport_config
        .resample("15min")
        .ffill()
        .dropna()
    )
    config_dist = config_timecourse.value_counts()
    if normalize:
        config_dist /= config_dist.sum()

    # prepend the airport code to the configuration strings
    prefix = pd.Series(f"{airport_code}:", index=config_dist.index)
    config_dist.index = prefix.str.cat(config_dist.index)

    return config_dist

Here's an example.

In [25]:
make_config_dist(airport_code, airport_config_df)
Out[25]:
katl:D_26L_27R_A_26R_27L_28       15761
katl:D_8R_9L_A_10_8L_9R           10317
katl:D_26L_27R_28_A_26R_27L_28      464
katl:D_26L_27R_A_26L_27L_28         409
katl:D_26L_28_A_26L_28              403
                                  ...  
katl:D_27L_28_A_26R_27L_28            1
katl:D_8R_9L_A_8L                     1
katl:D_8R_9L_A_8R                     1
katl:D_26R_27R_A_27L_28               1
katl:D_8R_A_10_8L                     1
Name: airport_config, Length: 156, dtype: int64

Censor data

Using the distribution of historic airport configurations is a great first step. However, we have to be careful that we don't use information from the future to predict the future. That would be cheating, since no real-time solution could know which configurations are common or uncommon in the future.

So, before we compute the distribution of configurations, we have to censor the data, cutting off everything that comes after the time when we are making the predictions.

As an example, let's get the actual configurations for the airport in the running example.

In [26]:
airport_config_df = airport_config_df_map[airport_code]
airport_config_df.head(10)
Out[26]:
timestamp airport_config
0 2020-11-01 01:11:00 D_8R_9L_A_10_8L_9R
1 2020-11-01 01:57:00 D_8R_9L_A_10_8L_9R
2 2020-11-01 02:53:00 D_8R_9L_A_10_8L_9R
3 2020-11-01 03:54:00 D_8R_9L_A_10_8L_9R
4 2020-11-01 04:52:00 D_8R_9L_A_10_8L_9R
5 2020-11-01 05:27:00 D_8R_9L_A_10_8L_9R
6 2020-11-01 05:40:00 D_8L_9L_A_10_8L_9R
7 2020-11-01 06:23:00 D_8L_9L_A_10_8L_9R
8 2020-11-01 06:53:00 D_8L_9L_A_10_8L_9R
9 2020-11-01 07:53:00 D_8L_9L_A_10_8L_9R

And let's suppose we are asked to make a prediction at 2020-11-01 05:00:00.

In [27]:
timestamp = pd.Timestamp("2020-11-01 05:00:00")
timestamp
Out[27]:
Timestamp('2020-11-01 05:00:00')

We are only allowed to use data from before the prediction timestamp.

In [28]:
mask = airport_config_df["timestamp"] <= timestamp
mask.sum()
Out[28]:
5
In [29]:
subset = airport_config_df[mask]
subset
Out[29]:
timestamp airport_config
0 2020-11-01 01:11:00 D_8R_9L_A_10_8L_9R
1 2020-11-01 01:57:00 D_8R_9L_A_10_8L_9R
2 2020-11-01 02:53:00 D_8R_9L_A_10_8L_9R
3 2020-11-01 03:54:00 D_8R_9L_A_10_8L_9R
4 2020-11-01 04:52:00 D_8R_9L_A_10_8L_9R

While we're at it, it will be useful to extract the last configuration from the label data, which should be the current configuration (at the point where we make the prediction).

In [30]:
current = subset.iloc[-1]
current["airport_config"]
Out[30]:
'D_8R_9L_A_10_8L_9R'

The following function encapsulates these steps. It takes the complete list of configurations for a particular airport and the timestamp for the current prediction.

It returns the current configuration and the subset of configurations from prior to the timestamp.

In [31]:
def censor_data(
    airport_config_df: pd.DataFrame, timestamp: pd.Timestamp
) -> Tuple[str, pd.DataFrame]:
    mask = airport_config_df["timestamp"] <= timestamp
    subset = airport_config_df[mask]
    current = subset.iloc[-1].airport_config
    return current, subset

Here's an example.

In [32]:
censor_data(airport_config_df, timestamp)
Out[32]:
('D_8R_9L_A_10_8L_9R',
             timestamp      airport_config
 0 2020-11-01 01:11:00  D_8R_9L_A_10_8L_9R
 1 2020-11-01 01:57:00  D_8R_9L_A_10_8L_9R
 2 2020-11-01 02:53:00  D_8R_9L_A_10_8L_9R
 3 2020-11-01 03:54:00  D_8R_9L_A_10_8L_9R
 4 2020-11-01 04:52:00  D_8R_9L_A_10_8L_9R)

Prediction based on data

Now we're ready to make predictions that actually use data. As a baseline strategy, I'll make a predictive distribution that's a weighted mixture of three distributions:

  • A uniform distribution that makes sure we give non-zero probability to all possible configurations,

  • The distribution of past configurations, and

  • A point mass on the current configuration.

Using the same prediction frame we've been using, we'll construct a DataFrame in which each component distribution is in a column. Then we'll sum the columns together and normalize the result to sum to 1. Let's start with the uniform distribution:

In [33]:
predictive_distribution = pd.DataFrame({"uniform": make_uniform(pred_frame)})

Next we have to censor the data so we're only using history, not future information.

In [34]:
first = pred_frame.iloc[0]
airport_code, timestamp, lookahead, _, _ = first
print(f"Airport code: {airport_code}")
print(f"Timestamp: {timestamp}")
print(f"Lookahead: {lookahead}")
Airport code: katl
Timestamp: 2020-11-06 23:00:00
Lookahead: 30
In [35]:
airport_config_df = airport_config_df_map[airport_code]
current, subset = censor_data(airport_config_df, timestamp)
print(f"Current configuration: {current}")

subset.tail()
Current configuration: D_8R_9L_A_10_8L_9R
Out[35]:
timestamp airport_config
199 2020-11-06 18:53:00 D_8R_9L_A_10_8L_9R
200 2020-11-06 19:53:00 D_8R_9L_A_10_8L_9R
201 2020-11-06 20:54:00 D_8R_9L_A_10_8L_9R
202 2020-11-06 21:54:00 D_8R_9L_A_10_8L_9R
203 2020-11-06 22:54:00 D_8R_9L_A_10_8L_9R

Here's the distribution of past configurations.

In [36]:
config_dist = make_config_dist(airport_code, subset, normalize=True)
config_dist
Out[36]:
katl:D_26L_27R_A_26R_27L_28        0.442681
katl:D_8R_9L_A_10_8L_9R            0.301587
katl:D_8L_9L_A_10_8L_9R            0.102293
katl:D_9L_A_10_9R                  0.035273
katl:D_8R_9L_A_10_8R_9R            0.028219
katl:D_26R_27R_A_26R_27L_28        0.021164
katl:D_8L_9L_A_10_8L_9L            0.021164
katl:D_9L_A_10_8L_9R               0.015873
katl:D_26L_26R_27R_A_26R_27L_28    0.014109
katl:D_26R_27R_A_26R_27R_28        0.010582
katl:D_8L_8R_9L_A_10_8L_9R         0.007055
Name: airport_config, dtype: float64

At this point, there might be configurations in uniform that are not present in the censored subset, so they don't appear in config_dist. And there might be configurations in config_dist that don't appear in uniform because they've been grouped into other in the labels.

We can use reindex to look up each of the configurations in uniform in config_dist. At that point, the resulting series will have NaN values for those configurations not present in the censored subset and those that are grouped into other in the labels.

Then we allocate the values from config_dist that didn't get selected to other.

In [37]:
predictive_distribution["config_dist"] = config_dist.reindex(predictive_distribution.index).fillna(0)

other = config_dist.sum() - predictive_distribution.config_dist.sum()
predictive_distribution.loc[f"{airport_code}:other", "config_dist"] += other

Add more probability mass to the current configuration.

In [38]:
predictive_distribution["current"] = 0  # initalize a column of zeros
predictive_distribution.loc[f"{airport_code}:{current}", "current"] = 1  # set the current configuration to 1

Finally, we'll combine the components and normalize the predictive distribution so it adds up to 1.

In [39]:
mixture = predictive_distribution.sum(axis=1)
predictive_distribution["mixture"] =  mixture / mixture.sum()

Now let's look at the values of the three component probabilities and their mixture:

In [40]:
predictive_distribution
Out[40]:
uniform config_dist current mixture
katl:D_10_8L_A_10_8L 0.037037 0.000000 0 0.012346
katl:D_10_8R_9L_A_10_8L_9R 0.037037 0.000000 0 0.012346
katl:D_10_8R_A_10_8R 0.037037 0.000000 0 0.012346
katl:D_26L_27L_A_26R_27L_28 0.037037 0.000000 0 0.012346
katl:D_26L_27R_28_A_26R_27L_28 0.037037 0.000000 0 0.012346
katl:D_26L_27R_A_26L_27L_28 0.037037 0.000000 0 0.012346
katl:D_26L_27R_A_26L_27R 0.037037 0.000000 0 0.012346
katl:D_26L_27R_A_26L_27R_28 0.037037 0.000000 0 0.012346
katl:D_26L_27R_A_26R_27L 0.037037 0.000000 0 0.012346
katl:D_26L_27R_A_26R_27L_28 0.037037 0.442681 0 0.159906
katl:D_26L_27R_A_26R_27R_28 0.037037 0.000000 0 0.012346
katl:D_26L_27R_A_26R_28 0.037037 0.000000 0 0.012346
katl:D_26L_27R_A_27L_28 0.037037 0.000000 0 0.012346
katl:D_26L_28_A_26L_28 0.037037 0.000000 0 0.012346
katl:D_26L_28_A_26R_27L_28 0.037037 0.000000 0 0.012346
katl:D_26L_28_A_26R_28 0.037037 0.000000 0 0.012346
katl:D_26R_27R_A_26R_27L_28 0.037037 0.021164 0 0.019400
katl:D_26R_28_A_26R_28 0.037037 0.000000 0 0.012346
katl:D_8L_9L_A_10_8L_9R 0.037037 0.102293 0 0.046443
katl:D_8R_9L_A_10_8L_9R 0.037037 0.301587 1 0.446208
katl:D_8R_9L_A_10_8R_9R 0.037037 0.028219 0 0.021752
katl:D_8R_9L_A_10_9R 0.037037 0.000000 0 0.012346
katl:D_8R_9L_A_8L_9R 0.037037 0.000000 0 0.012346
katl:D_8R_9L_A_8R_9L 0.037037 0.000000 0 0.012346
katl:D_8R_9R_A_10_8L_9R 0.037037 0.000000 0 0.012346
katl:D_9L_A_9R 0.037037 0.000000 0 0.012346
katl:other 0.037037 0.104056 0 0.047031
In [41]:
predictive_distribution.rename(
    columns={
        "uniform": "Uniform",
        "config_dist": "Past",
        "current": "Current",
        "mixture": "Mixture",
    }
).plot(kind="bar", subplots=(1, 4), figsize=(6, 8), sharex=True, legend=False)

plt.subplots_adjust(hspace=0.3)
plt.show()

If things have gone right, the index of predictive_distribution should be the same as the list of configurations in pred_frame.

In [42]:
np.array_equal(predictive_distribution.index.values, pred_frame["config"].values)
Out[42]:
True

The following function combines all of this logic and adds in three parameters:

  • hedge: the weight assigned to the uniform distribution (which hedges our bets by giving some probability to all configurations).
  • weight: the weight assigned to current configuration.
  • discount_factor: how much to decrease the importance of the current configuration as the lookahead time increases.

The relationship between weight and discount_factor is:

discount = 1 - discount_factor * (lookahead / 360)
pred_dist[current_key] += weight * discount

For example, with weight=6 and discount_factor=0.66, the effective weight for the present configuration is close to 6 for small values of lookahead, and only 2 when lookahead is 360 minutes (6 hours).

Recency-weighted historical forecast

We'll name this model the Recency-weighted historical forecast since it predicts configurations that were historically prevalent, upweighting the current configuration based on how recently that configuration was in effect.

No change forecast

Parameterizing the model in this way makes it easy to generate one variation worth mentioning: the No change forecast. In the case where weight > hedge, both weight and hedge are very large, and discount_factor=0, we effectively ignore the past configuration distribution and predict no change from the current configuration for all lookaheads. We'll compute that version outside of this notebook and add it to the competition leaderboard as a benchmark score for comparison.

In [43]:
def make_prediction(
    pred_frame: pd.DataFrame,
    hedge: float = 1,
    weight: float = 6,
    discount_factor: float = 0.66,
) -> pd.Series:
    # start with a uniform distribution
    uniform = make_uniform(pred_frame) * hedge
    predictive_distribution = pd.DataFrame({"uniform": uniform})

    # select the data we're allowed to use
    first = pred_frame.iloc[0]
    airport_code, timestamp, lookahead, _, _ = first
    airport_config_df = airport_config_df_map[airport_code]
    current, subset = censor_data(airport_config_df, timestamp)

    # make the distribution of past configurations
    config_dist = make_config_dist(airport_code, subset, normalize=True)
    predictive_distribution["config_dist"] = config_dist.reindex(
        predictive_distribution.index
    ).fillna(0)
    other = config_dist.sum() - predictive_distribution.config_dist.sum()
    predictive_distribution.loc[f"{airport_code}:other", "config_dist"] += other

    # put some extra weight on the current configuration (or `other`)
    current_key = f"{airport_code}:{current}"
    if current_key not in pred_frame.config.values:
        current_key = f"{airport_code}:other"
    discount = 1 - discount_factor * (lookahead / 360)
    predictive_distribution["current"] = 0  # initalize a column of zeros
    predictive_distribution.loc[current_key, "current"] = weight * discount

    # combine the components and normalize the result
    mixture = predictive_distribution.sum(axis=1)
    predictive_distribution["mixture"] = mixture / mixture.sum()

    return predictive_distribution.mixture

Here's an example.

In [44]:
pred_dist = make_prediction(pred_frame)
pred_dist
Out[44]:
katl:D_10_8L_A_10_8L              0.004829
katl:D_10_8R_9L_A_10_8L_9R        0.004829
katl:D_10_8R_A_10_8R              0.004829
katl:D_26L_27L_A_26R_27L_28       0.004829
katl:D_26L_27R_28_A_26R_27L_28    0.004829
katl:D_26L_27R_A_26L_27L_28       0.004829
katl:D_26L_27R_A_26L_27R          0.004829
katl:D_26L_27R_A_26L_27R_28       0.004829
katl:D_26L_27R_A_26R_27L          0.004829
katl:D_26L_27R_A_26R_27L_28       0.062545
katl:D_26L_27R_A_26R_27R_28       0.004829
katl:D_26L_27R_A_26R_28           0.004829
katl:D_26L_27R_A_27L_28           0.004829
katl:D_26L_28_A_26L_28            0.004829
katl:D_26L_28_A_26R_27L_28        0.004829
katl:D_26L_28_A_26R_28            0.004829
katl:D_26R_27R_A_26R_27L_28       0.007588
katl:D_26R_28_A_26R_28            0.004829
katl:D_8L_9L_A_10_8L_9R           0.018166
katl:D_8R_9L_A_10_8L_9R           0.783393
katl:D_8R_9L_A_10_8R_9R           0.008508
katl:D_8R_9L_A_10_9R              0.004829
katl:D_8R_9L_A_8L_9R              0.004829
katl:D_8R_9L_A_8R_9L              0.004829
katl:D_8R_9R_A_10_8L_9R           0.004829
katl:D_9L_A_9R                    0.004829
katl:other                        0.018395
Name: mixture, dtype: float64

Choose a training set

For the Open arena, we split one year of data into weeks and held out a subset of those weeks as a test set. The remainder is intended for model training. This train-test split assumes that individual weeks are independent of one another, which is almost certainly not the case; airport configuration is correlated over time. So if a week is in the test period and we train on the prior and subsequent weeks, we are likely to overestimate the model's performance when compared to the final evaluation set, which comes months after the end of the training data time range.

Since our model uses only three parameters (and we're not spending much time optimizing them), we aren't going to be too concerned with overfitting. To keep the airport/lookahead balance of the full dataset, we'll sample 10 prediction frames from each airport/lookahead combination in open_train_labels. Just remember: you should think carefully about how you split the data during model training and validation to maximize the chance your model will generalize to the final evaluation dataset.

In [45]:
training_samples = (
    open_train_labels.groupby(["airport", "lookahead"])
    .sample(100, random_state=17)
    .set_index(["airport", "timestamp", "lookahead"])
    .sort_index()
)

training_samples.head()
Out[45]:
config active
airport timestamp lookahead
katl 2020-11-07 11:00:00 360 katl:D_8R_9L_A_8L_9R 0.0
2020-11-07 15:00:00 360 katl:other 0.0
2020-11-07 19:00:00 360 katl:D_8R_9L_A_8R_9L 0.0
2020-11-08 01:00:00 360 katl:D_26L_27R_A_26R_27L 0.0
2020-11-08 04:00:00 30 katl:D_8L_9L_A_10_8L_9R 0.0

Check that we have 100 prediction frames from each airport/lookahead.

In [46]:
training_samples.groupby(["airport", "lookahead"]).size().unstack("lookahead")
Out[46]:
lookahead 30 60 90 120 150 180 210 240 270 300 330 360
airport
katl 100 100 100 100 100 100 100 100 100 100 100 100
kclt 100 100 100 100 100 100 100 100 100 100 100 100
kden 100 100 100 100 100 100 100 100 100 100 100 100
kdfw 100 100 100 100 100 100 100 100 100 100 100 100
kjfk 100 100 100 100 100 100 100 100 100 100 100 100
kmem 100 100 100 100 100 100 100 100 100 100 100 100
kmia 100 100 100 100 100 100 100 100 100 100 100 100
kord 100 100 100 100 100 100 100 100 100 100 100 100
kphx 100 100 100 100 100 100 100 100 100 100 100 100
ksea 100 100 100 100 100 100 100 100 100 100 100 100
In [47]:
training_labels = (
    open_train_labels.set_index(["airport", "timestamp", "lookahead"])
    .loc[training_samples.index]  # select the sampled timestamp/lookaheads
    .reset_index()
    .sort_values(["airport", "timestamp", "lookahead", "config"])
)

training_labels.head()
Out[47]:
airport timestamp lookahead config active
0 katl 2020-11-07 11:00:00 360 katl:D_10_8L_A_10_8L 0.0
1 katl 2020-11-07 11:00:00 360 katl:D_10_8R_9L_A_10_8L_9R 0.0
2 katl 2020-11-07 11:00:00 360 katl:D_10_8R_A_10_8R 0.0
3 katl 2020-11-07 11:00:00 360 katl:D_26L_27L_A_26R_27L_28 0.0
4 katl 2020-11-07 11:00:00 360 katl:D_26L_27R_28_A_26R_27L_28 0.0

To contain the predictions, I'll make a new DataFrame with the same columns as training_labels, but with a new column for active set to np.nan.

In [48]:
predictions = training_labels.copy().assign(active=np.nan)
predictions.head()
Out[48]:
airport timestamp lookahead config active
0 katl 2020-11-07 11:00:00 360 katl:D_10_8L_A_10_8L NaN
1 katl 2020-11-07 11:00:00 360 katl:D_10_8R_9L_A_10_8L_9R NaN
2 katl 2020-11-07 11:00:00 360 katl:D_10_8R_A_10_8R NaN
3 katl 2020-11-07 11:00:00 360 katl:D_26L_27L_A_26R_27L_28 NaN
4 katl 2020-11-07 11:00:00 360 katl:D_26L_27R_28_A_26R_27L_28 NaN

Make all the predictions

Now I'll generate predictions for each prediction frame in predictions.

The following function iterates through the prediction frames in a given DataFrame and generates a predictive distribution for each. Then it writes the results into the active column.

In [49]:
from tqdm import tqdm

def make_all_predictions(predictions: pd.DataFrame):
    """Predicts airport configuration for all of the prediction frames in a table."""
    all_preds = []
    grouped = predictions.groupby(["airport", "timestamp", "lookahead"], sort=False)
    for key, pred_frame in tqdm(grouped):
        airport, timestamp, lookahead = key
        pred_dist = make_prediction(pred_frame)
        assert np.array_equal(pred_dist.index.values, pred_frame["config"].values)
        all_preds.append(pred_dist.values)

    predictions["active"] = np.concatenate(all_preds)

I can run it with the sample of prediction frames I selected.

In [50]:
make_all_predictions(predictions)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11933/11933 [01:40<00:00, 118.22it/s]

Score it!

To compute the loss function, we can use sklearn.metrics.log_loss from scikit-learn.

First, we'll compute separate losses for each airport.

In [51]:
from sklearn.metrics import log_loss

scores = [
    log_loss(group["active"], predictions.loc[group.index, "active"])
    for airport, group in training_labels.groupby("airport")
]
scores
Out[51]:
[0.06877280265054836,
 0.14641838551713782,
 0.09005864995544491,
 0.08364211373814505,
 0.10607712940230635,
 0.11573413588108261,
 0.08233448987243573,
 0.09541606567757414,
 0.09384024243625573,
 0.09344240973485897]

Some are better than others. The overall score that the competition uses is the mean of the individual airport losses. To see whether the score is consistent from one run to another, you can go back and change the random seed.

In [52]:
np.mean(scores)
Out[52]:
0.09757364248657897

Accuracy

Log loss does not lend it self to a very intuitive interpretation beyond "lower is better." Other metrics help you understand where your model excels or struggles, generating ideas for how to improve. Let's look into a few options, starting with accuracy, i.e., the percent of times the configuration with the highest predicted probability was the actual configuration. We expect the accuracy of our model to depend on airport and lookahead time, so let's compute accuracy separately for those groups. This is one of the ways the NASA research team reports the performance of their model. (Note that the time period of the data from the paper is not the same as the time period of the competition dataset, so you won't be able to compare scores directly.)

In [53]:
accuracy = []
for airport in sorted(training_labels.airport.unique()):
    airport_labels = training_labels.loc[training_labels.airport == airport]
    airport_predictions = predictions.loc[predictions.airport == airport]
    airport_predictions = (
        airport_predictions.pivot_table(
            index=["timestamp", "lookahead"], columns=["config"], values=["active"]
        )["active"]
        .idxmax(axis=1)
        .rename("active")
        .to_frame()
    )
    airport_labels = (
        airport_labels.pivot_table(
            index=["timestamp", "lookahead"], columns=["config"], values=["active"]
        )["active"]
        .idxmax(axis=1)
        .rename("active")
        .to_frame()
    )
    airport_predictions["correct"] = airport_predictions.active == airport_labels.active
    accuracy.append(
        airport_predictions.groupby("lookahead").correct.mean().rename(airport) * 100
    )

accuracy = pd.concat(accuracy, axis=1)
accuracy.index.rename("Lookahead (minutes)", inplace=True)
accuracy
Out[53]:
katl kclt kden kdfw kjfk kmem kmia kord kphx ksea
Lookahead (minutes)
30 95.876289 92.929293 88.888889 96.000000 98.000000 90.000000 94.897959 82.000000 95.000000 96.000000
60 92.000000 85.000000 57.000000 85.000000 90.000000 63.265306 82.653061 58.585859 86.000000 96.000000
90 88.000000 83.838384 61.000000 78.000000 89.000000 64.000000 76.767677 61.000000 89.000000 90.909091
120 85.000000 84.000000 54.081633 73.737374 85.000000 44.897959 73.737374 46.000000 82.000000 86.000000
150 67.676768 71.428571 45.000000 68.000000 86.000000 40.404040 73.000000 39.393939 75.757576 84.848485
180 72.727273 62.626263 32.653061 61.616162 81.000000 40.000000 57.000000 31.000000 66.666667 80.808081
210 68.000000 68.000000 30.303030 65.000000 76.000000 33.000000 67.000000 29.000000 65.656566 82.000000
240 62.000000 55.555556 30.612245 57.575758 72.000000 17.171717 52.525253 32.000000 66.666667 81.818182
270 68.686869 59.183673 24.489796 45.454545 67.676768 27.000000 48.000000 32.000000 59.000000 77.777778
300 60.000000 53.000000 17.000000 41.414141 68.686869 23.000000 45.000000 25.252525 60.000000 78.000000
330 51.000000 36.000000 18.000000 35.353535 67.346939 16.161616 53.000000 18.000000 50.000000 78.000000
360 60.204082 52.525253 16.161616 34.000000 60.000000 10.000000 36.363636 16.161616 50.000000 76.767677
In [54]:
axs = accuracy.plot(
    subplots=True, sharey=True, sharex=True, figsize=(12, 6), layout=(2, 5)
)
axs[-1][0].set_ylabel("Accuracy %")

for ax in axs.flatten():
    ax.annotate(
        f"{ax.lines[0].get_ydata()[-1]:0.0f}%",
        (ax.lines[0].get_xdata()[-1], ax.lines[0].get_ydata()[-1]),
        xytext=(-5, 15),
        textcoords="offset points",
        verticalalignment="center",
        horizontalalignment="center",
    )
    ax.annotate(
        f"{ax.lines[0].get_ydata()[0]:0.0f}%",
        (ax.lines[0].get_xdata()[0], ax.lines[0].get_ydata()[0]),
        xytext=(15, -5),
        textcoords="offset points",
        verticalalignment="center",
        horizontalalignment="center",
    )
axs[0][0].figure.suptitle("Accuracy by lookahead")
plt.show()

As you can see, for most airports it is pretty easy to predict the active configuration an 30 minutes from the present. However at larger lookaheads, accuracy varies a lot by airport. Part of the challenge will be finding a model that works well across all airports and for later lookahead times.

The overall accuracy is the mean. Since we have the same number of samples for each airport/lookahead, we can compute the overall accuracy as the mean of the individual airport/lookahead accuracies.

In [55]:
print(f"Overall accuracy: {accuracy.values.mean():2.2f}%")
Overall accuracy: 61.55%

Evaluating on changes only

Airport reconfigurations are relatively rare; most of the time, the configuration stays the same. This is partly why a model that only knows the distribution of historical configurations and the current configuration can do pretty well at this task. To really put our model to the test, we can look at how well it performs when evaluated only on prediction frames for which the true configuration is not the same as the current configuration. Let's see how model performance for this only change dataset. We'll use a slight variation of accuracy, "top k" accuracy, where a configuration is marked as correct if it is among the first k most probable configurations.

In [56]:
def compute_airport_top_k_accuracy(
    airport_predictions: pd.DataFrame,
    config_timecourse: pd.Series,
    k: int,
    only_change: bool = False,
) -> float:
    """Computes top k accuracy for a single airport"""
    airport = airport_predictions.iloc[0].airport
    airport_configs = airport_predictions.config.unique()
    top_k_hits = []
    for (timestamp, lookahead), group in airport_predictions.groupby(
        ["timestamp", "lookahead"]
    ):
        lookahead_config = config_timecourse.loc[
            timestamp + pd.Timedelta(minutes=lookahead)
        ]

        if lookahead_config not in airport_configs:
            lookahead_config = f"{airport}:other"
        
        if only_change:
            # only include samples where a configuration change occurred
            current_config = config_timecourse.loc[timestamp]
            include = current_config != lookahead_config
        else:
            include = True
        if include:
            top_k_predictions = group.sort_values(
                "active", ascending=False
            ).config.iloc[:k]
            top_k_hits.append(
                {
                    "lookahead": lookahead,
                    "hit": lookahead_config in top_k_predictions.values,
                }
            )

    return pd.DataFrame(top_k_hits).groupby("lookahead").hit.mean()
In [57]:
def compute_top_k_accuracy(ks: Sequence[int], only_change: bool):
    """Iterates over airports to compute overall top k accuracy"""
    top_ks = {}
    for k in ks:
        airport_top_ks = []
        for airport in predictions.airport.unique():
            config_timecourse = (
                airport_config_df_map[airport]
                .set_index("timestamp")
                .airport_config.resample("30min")
                .ffill()
                .dropna()
            )
            config_timecourse = f"{airport}:" + config_timecourse
            airport_predictions = predictions.loc[predictions.airport == airport]
            airport_top_k = compute_airport_top_k_accuracy(
                airport_predictions, config_timecourse, k=k, only_change=only_change
            )
            airport_top_k.rename(airport, inplace=True)
            airport_top_ks.append(airport_top_k)
        top_ks[k] = pd.concat(airport_top_ks, axis=1).values.mean()

    return pd.Series(
        top_ks.values(), name="accuracy", index=pd.Index(top_ks.keys(), name="k")
    )
In [58]:
top_k_all = compute_top_k_accuracy((1, 3, 5), only_change=False)
top_k_only_change = compute_top_k_accuracy((1, 3, 5), only_change=True)
top_k = pd.concat([top_k_all, top_k_only_change], keys=("All", "Only change"), axis=1) * 100
In [59]:
ax = top_k.rename(index={k: f"$k={k}$" for k in top_k.index}).plot(kind="bar")
ax.set_ylabel("Top $\it{k}$ accuracy")
ax.set_xlabel("")
ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(decimals=0))

You can see here how much the performance of our simple model relies on the fact that airport reconfigurations are relatively rare! When we evaluate our model on frames for which the future configuration is not the same as the current configuration, we see that our model does quite poorly.

Generate a submission

Now let's generate a submission for the competition. First we'll read the submission file to see which prediction frames we're supposed to generate.

In [60]:
submission_format = pd.read_csv(
    DATA_PATH / "open_submission_format.csv", parse_dates=["timestamp"]
)
print(f"{len(submission_format):,} rows x {len(submission_format.columns)} columns")
2,468,880 rows x 5 columns
In [61]:
submission = submission_format.copy()
submission["active"] = np.nan
submission.head()
Out[61]:
airport timestamp lookahead config active
0 katl 2021-01-04 06:00:00 30 katl:D_10_8L_A_10_8L NaN
1 katl 2021-01-04 06:00:00 30 katl:D_10_8R_9L_A_10_8L_9R NaN
2 katl 2021-01-04 06:00:00 30 katl:D_10_8R_A_10_8R NaN
3 katl 2021-01-04 06:00:00 30 katl:D_26L_27L_A_26R_27L_28 NaN
4 katl 2021-01-04 06:00:00 30 katl:D_26L_27R_28_A_26R_27L_28 NaN
In [62]:
submission.tail()
Out[62]:
airport timestamp lookahead config active
2468875 ksea 2021-06-20 23:00:00 360 ksea:D_34C_A_34C_34L NaN
2468876 ksea 2021-06-20 23:00:00 360 ksea:D_34R_A_34C NaN
2468877 ksea 2021-06-20 23:00:00 360 ksea:D_34R_A_34C_34R NaN
2468878 ksea 2021-06-20 23:00:00 360 ksea:D_34R_A_34L_34R NaN
2468879 ksea 2021-06-20 23:00:00 360 ksea:other NaN

And generate predictions.

In [63]:
make_all_predictions(submission)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 97200/97200 [13:11<00:00, 122.73it/s]

Before submitting it is a good idea to check that our submission conforms to the submission format.

In [64]:
submission[["airport", "timestamp", "lookahead", "config"]].equals(
    submission_format[["airport", "timestamp", "lookahead", "config"]]
)
Out[64]:
True

And double check that the predictions for each airport, timestamp, and lookahead combination sum to one.

In [65]:
np.allclose(submission.groupby(["airport", "timestamp", "lookahead"]).sum(), 1)
Out[65]:
True

Finally, we can write the predictions to a CSV file ready to submit to the competition. Note the use of the date_format argument to pd.DataFrame.to_csv; that ensures that dates are stored in the correct format, e.g., "2021-01-01T12:00:00". Also consider submitting a zipped CSV, which can result in a much smaller file and much faster uploads. A valid zipped submission contains a single CSV file, which we can achieve with the compression argument to pd.DataFrame.to_csv.

In [66]:
filename = "submission.csv.zip"

submission.to_csv(
    filename,
    date_format="%Y-%m-%dT%H:%M:%S",
    index=False,
    compression={"method": "zip", "archive_name": "submission.csv"},
)
In [67]:
!ls -lh submission.csv.zip
-rw-rw-r-- 1 robert robert 27M Jan 27 20:42 submission.csv.zip

Let's load the actual submission CSV and check the first few lines to make sure the file is in the right format.

In [68]:
pd.read_csv(filename, parse_dates=["timestamp"], nrows=5)
Out[68]:
airport timestamp lookahead config active
0 katl 2021-01-04 06:00:00 30 katl:D_10_8L_A_10_8L 0.004829
1 katl 2021-01-04 06:00:00 30 katl:D_10_8R_9L_A_10_8L_9R 0.004829
2 katl 2021-01-04 06:00:00 30 katl:D_10_8R_A_10_8R 0.004829
3 katl 2021-01-04 06:00:00 30 katl:D_26L_27L_A_26R_27L_28 0.005040
4 katl 2021-01-04 06:00:00 30 katl:D_26L_27R_28_A_26R_27L_28 0.004871

And here is how the submission appears on the competition submissions page:

The test subset score of 0.0980 is pretty close to the training subset score of 0.0976, which is a good sign that the subsets do not have wildly different distributions.

We added this submission (Recency-weighted historical forecast) and the No change forecast as benchmarks on the competition leaderboard so you can compare your models to them throughout the competition.

That concludes our benchmark! There is tons of potential for iterating on this approach and for completely new solutions. We haven't even cracked open the other features, like flight-level air traffic streams and historical weather predictions. Head on over to competition to get started. We're excited to see what you build!

Banner image courtesy of Michael Coghlan.