by Allen Downey and Robert Gibboni
NASA Airport Configuration Benchmark¶
Coordinating the national airspace is a fascinating problem affecting individual travelers, the economy, and the environment. One important piece of the puzzle is how airport operators make best use of an airport's many runways. The specific set of runways active for departing and arriving flights is the "airport configuration," and knowing when to use which configuration is an important piece of the air traffic management puzzle.
Thanks to NASA, the FAA, airlines, and other agencies, there is a massive stream of real-time data available to aid in effective decision making. Now it's up to you to learn how to best use it! Below is a diagram of an airport configured in a "north flow" with 2 runways active for departures (orange arrows), 2 for arrivals (blue arrows) and 4 inactive runways (thick black lines).
The Run-way Functions: Predict Reconfigurations at US Airports competition sets you up with historical air traffic, airport configuration, and weather data to learn how runways at 10 airports will be configured in the future. Check out the Problem description for more background.
This blog post is intended to familiarize you with the competition data and to get you started thinking about how to approach this challenge. It reads the training labels and one of the feature files, makes predictions with a relatively simple algorithm, and computes the scoring function.
You can use snippets from this notebook in your solution and you can use this algorithm as a starting place (but you don't have to!).
from pathlib import Path
from typing import Sequence, Tuple
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
mpl.rcParams["figure.dpi"] = 100
sns.set_style("whitegrid")
pd.options.display.width = 200
Get the data¶
To run this notebook, you'll first need to download the following files from the Data download page (available after you've joined the competition):
- Training features (train_features.tar): A tar archive containing air traffic, airport configuration, and weather features for the 10 airports from Nov. 2020 to Nov. 2021. Each feature is saved as a bzip2-compressed CSV.
- Training labels (open_train_labels.csv.bz2): Labels for the open arena training period.
- Submission format (open_submission_format.csv): A simple submission that predicts a uniform probability across all configurations. Use this as an example of a properly formatted submission.
Once you have downloaded the data files, you can unpack the tar file like this:
tar -xf train_features.tar
And then you can delete the tar file.
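If you prefer to stay in Python, here's a minimal sketch using the standard tarfile module; the archive name and destination directory are assumptions, so point them at wherever you saved the download.
import tarfile

# Extract the feature archive into the data directory used later in this notebook.
with tarfile.open("train_features.tar") as tar:
    tar.extractall(path="../data")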
Make sure that the following path leads to the data files you downloaded. If the files are in the same directory as the notebook, set the path to ".".
DATA_PATH = Path("../data")
The following cell lists the data directories for each airport.
airport_directories = sorted(path for path in DATA_PATH.glob("k*"))
airport_directories
There are 10 airports spread throughout the continental US. Here's a map showing their locations.
Picking one airport as an example, here are the data files.
airport_directory = airport_directories[0]
data_files = list(airport_directory.glob("*"))
data_files
The one we'll use in this notebook is katl_airport_config.csv.bz2, which contains the actual configurations during the training period.
airport_code = airport_directory.name
filename = f"{airport_code}_airport_config.csv.bz2"
filepath = airport_directory / filename
filepath
We can read the file like this, and display the first few lines.
airport_config_df = pd.read_csv(filepath, parse_dates=["timestamp"])
print(airport_config_df.shape)
airport_config_df.head()
- timestamp is the time when the configuration was recorded; the records occur at roughly one-hour intervals.
- airport_config is a string that represents the runways in use at the given time. For example, D_8R_9L_A_10_8L_9R means that the runways in use for departure were 8R and 9L, and the runways in use for arrivals were 10, 8L, and 9R.
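To make that encoding concrete, here's a small helper; it isn't part of the competition data or the benchmark, and it assumes every configuration string starts with "D_" and separates the arrival runways with "_A_".
def parse_config(config: str) -> Tuple[Sequence[str], Sequence[str]]:
    """Splits a configuration string like 'D_8R_9L_A_10_8L_9R' into
    departure and arrival runway lists (illustrative helper only)."""
    departure_part, arrival_part = config.split("_A_")
    departures = departure_part[len("D_"):].split("_")
    arrivals = arrival_part.split("_")
    return departures, arrivals

parse_config("D_8R_9L_A_10_8L_9R")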
The following function takes the path to the data directory for a given airport, reads the airport configuration file, and returns the airport code and a Pandas DataFrame containing the data.
def read_airport_configs(airport_directory: Path) -> Tuple[str, pd.DataFrame]:
"""Reads the airport configuration features for a given airport data directory."""
airport_code = airport_directory.name
filename = f"{airport_code}_airport_config.csv.bz2"
filepath = airport_directory / filename
airport_config_df = pd.read_csv(filepath, parse_dates=["timestamp"])
return airport_code, airport_config_df
I'll use this function to make a dictionary that maps from each airport code to the airport configuration DataFrame.
airport_config_df_map = {}
for airport_directory in sorted(airport_directories):
airport_code, airport_config_df = read_airport_configs(airport_directory)
print(airport_code)
airport_config_df_map[airport_code] = airport_config_df
To check the contents of this data structure, here are the airport codes and the lengths of the airport configuration DataFrames.
for airport_code, df in airport_config_df_map.items():
print(f"{airport_code}: {len(df):>8,}")
Read the training labels¶
Let's load the training labels.
open_train_labels = pd.read_csv(
DATA_PATH / "open_train_labels.csv.bz2", parse_dates=["timestamp"]
)
open_train_labels.shape
There are about 22 million rows. Here's what the first few rows look like.
open_train_labels.head()
The columns are:
- airport: string airport code
- timestamp: pd.Timestamp with the time when we should generate a prediction
- lookahead: number of minutes in the future we're trying to predict
- config: string representing a possible configuration
- active: float 0 or 1 indicating whether a particular configuration was actually active during the future we're trying to predict
In open_train_labels, only one configuration is active (assigned probability 1) and the others are inactive (assigned probability 0).
Our job is to generate predictions in the same format as the training labels, except that the values in active will be probabilities between 0 and 1.
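As a quick optional sanity check, we can confirm that the labels within each airport/timestamp/lookahead group sum to 1, since exactly one row per frame should be active.
# Sum the `active` labels within each prediction frame; every sum should be 1.
label_sums = open_train_labels.groupby(["airport", "timestamp", "lookahead"])["active"].sum()
label_sums.eq(1).all()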
Get a sample frame¶
One way to think about open_train_labels is as a sequence of "prediction frames", where each frame represents a prediction for a given airport, at a particular time, for a given point in the future.
Each frame contains one row for each of the most common configurations (for that airport), plus a catch-all bucket called other.
To iterate through the frames in open_train_labels, we can make a groupby object.
sample_frame_groups = open_train_labels.groupby(["airport", "timestamp", "lookahead"], sort=False)
As an example, I'll select just the first frame.
for name, pred_frame in sample_frame_groups:
print(name)
break
pred_frame
This frame represents a prediction for katl at 2020-11-06 23:00:00 for 30 minutes in the future.
Our job is to fill the active column with probabilities.
Given a prediction frame, we can get the airport code, timestamp, and lookahead like this:
first = pred_frame.iloc[0]
airport_code = first["airport"]
timestamp = first["timestamp"]
lookahead = first["lookahead"]
print(f"Airport code: {airport_code}")
print(f"Timestamp: {timestamp}")
print(f"Lookahead: {lookahead}")
Uniform prediction¶
As a simple example, suppose we have no reason to think one configuration is more likely than another. In that case, we could generate a uniform distribution where each configuration has the same probability.
The following function takes a prediction frame and returns a Pandas Series that represents a predictive distribution.
def make_uniform(pred_frame: pd.DataFrame) -> pd.Series:
indices = pred_frame["config"].values
uniform = pd.Series(1, index=indices)
uniform /= uniform.sum()
return uniform
Here's the result, using the prediction frame from the previous section as an example.
pred_dist = make_uniform(pred_frame)
pred_dist
Make distribution of configurations¶
Now let's think about making predictions. In reality, we will have good reasons to think that some configurations are more likely than others; that's the whole point of the competition!
Suppose we know that the configuration has changed, but we don't know what the new configuration is. If that's all the information we have, a reasonable predictive distribution would be the distribution of configurations we've seen in the past.
To demonstrate, I'll select the sequence of configurations from one airport.
airport_code = "katl"
airport_config_df = airport_config_df_map[airport_code]
airport_config_df.head()
The airport configuration dataset is composed of reports, and the amount of time between reports is not guaranteed to be equal. So the number of times a configuration appears in the dataset is not the same as the amount of time that configuration is active. To get that, we'll first resample the configuration data to regular intervals; let's say every 15 minutes. Note the use of ffill, which propagates non-null values forward to fill in missing values (thus making sure we aren't leaking information from the future).
config_timecourse = (
airport_config_df.set_index("timestamp")
.airport_config.resample("15min")
.ffill()
.dropna()
)
config_timecourse.head()
Now we can count the number of occurrences with value_counts to know how long each configuration was active.
config_dist = config_timecourse.value_counts().sort_values(ascending=False)
print(f"Number of unique configurations at {airport_code.upper()}: {len(config_dist)}")
config_dist
In the course of a year, KATL uses 156 unique configurations! Other airports use more than 200. Still, airports tend to favor a much smaller subset of configurations the majority of the time. At KATL, just 2 configurations account for 74% of the time. For this competition, the prediction target groups uncommon configurations into an "other" bin.
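Here's a quick way to check that figure directly from the counts.
# Fraction of 15-minute intervals covered by the two most common configurations.
top2_share = config_dist.head(2).sum() / config_dist.sum() * 100
print(f"Top 2 configurations at {airport_code.upper()}: {top2_share:.0f}% of the time")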
ax = ((config_dist / config_dist.sum()) * 100).head(20).plot(kind="bar", figsize=(8, 3))
ax.set_title("Configuration distribution for top 20 configurations")
ax.set_ylabel("Percent of time active")
ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(decimals=0))
plt.show()
One assumption this approach makes is that the distribution of configurations is stationary, i.e., it does not change over time. Let's use the configuration timecourse to visualize how well that assumption holds up. We'll breeze by the pandas transformations for now, but feel free to explore the details on your own.
import colorcet as cc
from cycler import cycler
with mpl.rc_context({"axes.prop_cycle": cycler(color=cc.glasbey_bw)}):
fig, ax = plt.subplots(figsize=(10, 6))
ax.set_axisbelow(True)
sns.set_style("whitegrid")
x = (
config_timecourse.reset_index()
.groupby(
[
pd.Grouper(key="timestamp", freq="7D", axis=0),
"airport_config",
]
)
.size()
.unstack()
.fillna(0)
.astype(int)
) / 4
x.index = x.index.astype(str)
x[x.sum().sort_values(ascending=False).index].plot(
kind="bar", stacked=True, width=1, ax=ax
)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[:20], labels[:20], bbox_to_anchor=(1, 1), loc="upper left")
ax.set_title(f"Configuration distribution by week for {airport_code.upper()}")
ax.set_xlabel("")
ax.set_ylabel("Hours per week")
plt.show()
There is definitely some variation week-to-week, with the first and second most common configurations swapping places occasionally. Overall, I'd say there aren't any seismic shifts in the distribution, although (spoiler alert) that isn't the case for all of the airports. You'll have to consider how to handle those cases; for example, you might upweight more recent time periods when computing your distribution.
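Here is a minimal sketch of that idea (it is not part of the benchmark model): each 15-minute observation is discounted exponentially with its age, and the 60-day half-life is an arbitrary assumption you would want to tune.
def make_recency_weighted_dist(
    config_timecourse: pd.Series, half_life: pd.Timedelta = pd.Timedelta(days=60)
) -> pd.Series:
    """Distribution of configurations with exponentially decaying weights
    (illustrative sketch; the half-life default is made up)."""
    # Age of each observation relative to the most recent one.
    age = config_timecourse.index.max() - config_timecourse.index
    weights = 0.5 ** (age / half_life).to_numpy()
    # Sum the weights for each configuration and normalize.
    weighted = pd.Series(weights, index=config_timecourse.values).groupby(level=0).sum()
    return (weighted / weighted.sum()).sort_values(ascending=False)

make_recency_weighted_dist(config_timecourse).head()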
Notice that the labels in this series do not include the airport code. To make them consistent with the config column in open_train_labels, we have to prepend the airport code, which we can do like this:
prefix = pd.Series(f"{airport_code}:", index=config_dist.index)
config_dist.index = prefix.str.cat(config_dist.index)
config_dist
The following function encapsulates these steps. It takes an airport code, a DataFrame containing timestamps and configurations, and a boolean that indicates whether the results should be normalized so they add up to 1 (like valid probabilities should). It returns a Pandas Series that represents the distribution of past configurations.
def make_config_dist(
airport_code: str, airport_config_df: pd.DataFrame, normalize: bool = False
) -> pd.Series:
config_timecourse = (
airport_config_df.set_index("timestamp")
.airport_config
.resample("15min")
.ffill()
.dropna()
)
config_dist = config_timecourse.value_counts()
if normalize:
config_dist /= config_dist.sum()
# prepend the airport code to the configuration strings
prefix = pd.Series(f"{airport_code}:", index=config_dist.index)
config_dist.index = prefix.str.cat(config_dist.index)
return config_dist
Here's an example.
make_config_dist(airport_code, airport_config_df)
Censor data¶
Using the distribution of historic airport configurations is a great first step. However, we have to be careful that we don't use information from the future to predict the future. That would be cheating, since no real-time solution could know which configurations are common or uncommon in the future.
So, before we compute the distribution of configurations, we have to censor the data, cutting off everything that comes after the time when we are making the predictions.
As an example, let's get the actual configurations for the airport in the running example.
airport_config_df = airport_config_df_map[airport_code]
airport_config_df.head(10)
And let's suppose we are asked to make a prediction at 2020-11-01 05:00:00.
timestamp = pd.Timestamp("2020-11-01 05:00:00")
timestamp
We are only allowed to use data from before the prediction timestamp.
mask = airport_config_df["timestamp"] <= timestamp
mask.sum()
subset = airport_config_df[mask]
subset
While we're at it, it will be useful to extract the last configuration from the censored data, which should be the current configuration (at the point where we make the prediction).
current = subset.iloc[-1]
current["airport_config"]
The following function encapsulates these steps. It takes the complete list of configurations for a particular airport and the timestamp for the current prediction.
It returns the current configuration and the subset of configurations from before the timestamp.
def censor_data(
airport_config_df: pd.DataFrame, timestamp: pd.Timestamp
) -> Tuple[str, pd.DataFrame]:
mask = airport_config_df["timestamp"] <= timestamp
subset = airport_config_df[mask]
current = subset.iloc[-1].airport_config
return current, subset
Here's an example.
censor_data(airport_config_df, timestamp)
Prediction based on data¶
Now we're ready to make predictions that actually use data. As a baseline strategy, I'll make a predictive distribution that's a weighted mixture of three distributions:
A uniform distribution that makes sure we give non-zero probability to all possible configurations,
The distribution of past configurations, and
A point mass on the current configuration.
Using the same prediction frame we've been using, we'll construct a DataFrame in which each component distribution is in a column. Then we'll sum the columns together and normalize the result to sum to 1. Let's start with the uniform distribution:
predictive_distribution = pd.DataFrame({"uniform": make_uniform(pred_frame)})
Next we have to censor the data so we're only using history, not future information.
first = pred_frame.iloc[0]
airport_code, timestamp, lookahead, _, _ = first
print(f"Airport code: {airport_code}")
print(f"Timestamp: {timestamp}")
print(f"Lookahead: {lookahead}")
airport_config_df = airport_config_df_map[airport_code]
current, subset = censor_data(airport_config_df, timestamp)
print(f"Current configuration: {current}")
subset.tail()
Here's the distribution of past configurations.
config_dist = make_config_dist(airport_code, subset, normalize=True)
config_dist
At this point, there might be configurations in uniform that are not present in the censored subset, so they don't appear in config_dist. And there might be configurations in config_dist that don't appear in uniform because they've been grouped into other in the labels.
We can use reindex to look up each of the configurations in uniform in config_dist. At that point, the resulting series will have NaN values for those configurations not present in the censored subset and those that are grouped into other in the labels.
Then we allocate the values from config_dist that didn't get selected to other.
predictive_distribution["config_dist"] = config_dist.reindex(predictive_distribution.index).fillna(0)
other = config_dist.sum() - predictive_distribution.config_dist.sum()
predictive_distribution.loc[f"{airport_code}:other", "config_dist"] += other
Add more probability mass to the current configuration.
predictive_distribution["current"] = 0  # initialize a column of zeros
predictive_distribution.loc[f"{airport_code}:{current}", "current"] = 1  # set the current configuration to 1
Finally, we'll combine the components and normalize the predictive distribution so it adds up to 1.
mixture = predictive_distribution.sum(axis=1)
predictive_distribution["mixture"] = mixture / mixture.sum()
Now let's look at the values of the three component probabilities and their mixture:
predictive_distribution
predictive_distribution.rename(
columns={
"uniform": "Uniform",
"config_dist": "Past",
"current": "Current",
"mixture": "Mixture",
}
).plot(kind="bar", subplots=True, figsize=(6, 8), sharex=True, legend=False)
plt.subplots_adjust(hspace=0.3)
plt.show()
If things have gone right, the index of predictive_distribution should be the same as the list of configurations in pred_frame.
np.array_equal(predictive_distribution.index.values, pred_frame["config"].values)
The following function combines all of this logic and adds in three parameters:
- hedge: the weight assigned to the uniform distribution (which hedges our bets by giving some probability to all configurations).
- weight: the weight assigned to the current configuration.
- discount_factor: how much to decrease the importance of the current configuration as the lookahead time increases.
The relationship between weight and discount_factor is:
discount = 1 - discount_factor * (lookahead / 360)
pred_dist[current_key] += weight * discount
For example, with weight=6 and discount_factor=0.66, the effective weight for the present configuration is close to 6 for small values of lookahead, and only 2 when lookahead is 360 minutes (6 hours).
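To make that concrete, here's a quick calculation of the effective weight across lookaheads using those default values (just an illustration, not part of the model code).
# Effective weight on the current configuration for lookaheads of 30-360 minutes.
lookaheads = np.arange(30, 361, 30)
effective_weight = 6 * (1 - 0.66 * lookaheads / 360)
pd.Series(effective_weight, index=lookaheads).round(2)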
Recency-weighted historical forecast¶
We'll name this model the Recency-weighted historical forecast since it predicts configurations that were historically prevalent, upweighting the current configuration based on how recently that configuration was in effect.
No change forecast¶
Parameterizing the model in this way makes it easy to generate one variation worth mentioning: the No change forecast. If weight and hedge are both very large (so the distribution of past configurations, which sums to 1, contributes almost nothing), weight is much larger than hedge, and discount_factor=0, we effectively predict no change from the current configuration for all lookaheads. We'll compute that version outside of this notebook and add it to the competition leaderboard as a benchmark score for comparison.
def make_prediction(
pred_frame: pd.DataFrame,
hedge: float = 1,
weight: float = 6,
discount_factor: float = 0.66,
) -> pd.Series:
# start with a uniform distribution
uniform = make_uniform(pred_frame) * hedge
predictive_distribution = pd.DataFrame({"uniform": uniform})
# select the data we're allowed to use
first = pred_frame.iloc[0]
airport_code, timestamp, lookahead, _, _ = first
airport_config_df = airport_config_df_map[airport_code]
current, subset = censor_data(airport_config_df, timestamp)
# make the distribution of past configurations
config_dist = make_config_dist(airport_code, subset, normalize=True)
predictive_distribution["config_dist"] = config_dist.reindex(
predictive_distribution.index
).fillna(0)
other = config_dist.sum() - predictive_distribution.config_dist.sum()
predictive_distribution.loc[f"{airport_code}:other", "config_dist"] += other
# put some extra weight on the current configuration (or `other`)
current_key = f"{airport_code}:{current}"
if current_key not in pred_frame.config.values:
current_key = f"{airport_code}:other"
discount = 1 - discount_factor * (lookahead / 360)
    predictive_distribution["current"] = 0  # initialize a column of zeros
predictive_distribution.loc[current_key, "current"] = weight * discount
# combine the components and normalize the result
mixture = predictive_distribution.sum(axis=1)
predictive_distribution["mixture"] = mixture / mixture.sum()
return predictive_distribution.mixture
Here's an example.
pred_dist = make_prediction(pred_frame)
pred_dist
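As a rough illustration of the No change forecast described above, we can pass parameter values that make the current configuration dominate; the exact numbers here are arbitrary, as long as weight is much larger than hedge and both swamp the historical distribution (which sums to 1).
# Most of the probability mass lands on the current configuration; the uniform
# component only keeps the other configurations away from zero probability.
make_prediction(pred_frame, hedge=100, weight=10_000, discount_factor=0)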
Choose a training set¶
For the Open arena, we split one year of data into weeks and held out a subset of those weeks as a test set. The remainder is intended for model training. This train-test split assumes that individual weeks are independent of one another, which is almost certainly not the case; airport configuration is correlated over time. So if a week is in the test period and we train on the prior and subsequent weeks, we are likely to overestimate the model's performance when compared to the final evaluation set, which comes months after the end of the training data time range.
Since our model uses only three parameters (and we're not spending much time optimizing them), we aren't going to be too concerned with overfitting. To keep the airport/lookahead balance of the full dataset, we'll sample 100 prediction frames from each airport/lookahead combination in open_train_labels. Just remember: you should think carefully about how you split the data during model training and validation to maximize the chance your model will generalize to the final evaluation dataset.
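As an aside, here is one simple time-aware alternative you might consider (illustrative only; the 4-week cutoff and variable names are arbitrary, and the benchmark continues with random sampling below).
# Hold out the last four weeks of labels so validation always follows training in time.
validation_start = open_train_labels.timestamp.max() - pd.Timedelta(weeks=4)
time_train = open_train_labels[open_train_labels.timestamp < validation_start]
time_valid = open_train_labels[open_train_labels.timestamp >= validation_start]
print(f"Training rows: {len(time_train):,}, validation rows: {len(time_valid):,}")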
training_samples = (
open_train_labels.groupby(["airport", "lookahead"])
.sample(100, random_state=17)
.set_index(["airport", "timestamp", "lookahead"])
.sort_index()
)
training_samples.head()
Check that we have 100 prediction frames from each airport/lookahead.
training_samples.groupby(["airport", "lookahead"]).size().unstack("lookahead")
training_labels = (
open_train_labels.set_index(["airport", "timestamp", "lookahead"])
.loc[training_samples.index] # select the sampled timestamp/lookaheads
.reset_index()
.sort_values(["airport", "timestamp", "lookahead", "config"])
)
training_labels.head()
To contain the predictions, I'll make a new DataFrame with the same columns as training_labels, but with the active column set to np.nan.
predictions = training_labels.copy().assign(active=np.nan)
predictions.head()
Make all the predictions¶
Now I'll generate predictions for each prediction frame in predictions.
The following function iterates through the prediction frames in a given DataFrame and generates a predictive distribution for each. Then it writes the results into the active column.
from tqdm import tqdm
def make_all_predictions(predictions: pd.DataFrame):
"""Predicts airport configuration for all of the prediction frames in a table."""
all_preds = []
grouped = predictions.groupby(["airport", "timestamp", "lookahead"], sort=False)
for key, pred_frame in tqdm(grouped):
airport, timestamp, lookahead = key
pred_dist = make_prediction(pred_frame)
assert np.array_equal(pred_dist.index.values, pred_frame["config"].values)
all_preds.append(pred_dist.values)
predictions["active"] = np.concatenate(all_preds)
I can run it with the sample of prediction frames I selected.
make_all_predictions(predictions)
Score it!¶
To compute the loss function, we can use sklearn.metrics.log_loss from scikit-learn.
First, we'll compute separate losses for each airport.
from sklearn.metrics import log_loss
scores = [
log_loss(group["active"], predictions.loc[group.index, "active"])
for airport, group in training_labels.groupby("airport")
]
scores
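To see which loss belongs to which airport, we can label the list; this relies on groupby iterating over airports in sorted order, which is the pandas default.
# Pair each loss with its airport code (groupby iterates in sorted order by default).
pd.Series(scores, index=sorted(training_labels.airport.unique()))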
Some are better than others. The overall score that the competition uses is the mean of the individual airport losses. To see whether the score is consistent from one run to another, you can go back and change the random seed.
np.mean(scores)
Accuracy¶
Log loss does not lend itself to a very intuitive interpretation beyond "lower is better." Other metrics help you understand where your model excels or struggles, generating ideas for how to improve. Let's look into a few options, starting with accuracy, i.e., the percent of times the configuration with the highest predicted probability was the actual configuration. We expect the accuracy of our model to depend on airport and lookahead time, so let's compute accuracy separately for those groups. This is one of the ways the NASA research team reports the performance of their model. (Note that the time period of the data from the paper is not the same as the time period of the competition dataset, so you won't be able to compare scores directly.)
accuracy = []
for airport in sorted(training_labels.airport.unique()):
airport_labels = training_labels.loc[training_labels.airport == airport]
airport_predictions = predictions.loc[predictions.airport == airport]
airport_predictions = (
airport_predictions.pivot_table(
index=["timestamp", "lookahead"], columns=["config"], values=["active"]
)["active"]
.idxmax(axis=1)
.rename("active")
.to_frame()
)
airport_labels = (
airport_labels.pivot_table(
index=["timestamp", "lookahead"], columns=["config"], values=["active"]
)["active"]
.idxmax(axis=1)
.rename("active")
.to_frame()
)
airport_predictions["correct"] = airport_predictions.active == airport_labels.active
accuracy.append(
airport_predictions.groupby("lookahead").correct.mean().rename(airport) * 100
)
accuracy = pd.concat(accuracy, axis=1)
accuracy.index.rename("Lookahead (minutes)", inplace=True)
accuracy
axs = accuracy.plot(
subplots=True, sharey=True, sharex=True, figsize=(12, 6), layout=(2, 5)
)
axs[-1][0].set_ylabel("Accuracy %")
for ax in axs.flatten():
ax.annotate(
f"{ax.lines[0].get_ydata()[-1]:0.0f}%",
(ax.lines[0].get_xdata()[-1], ax.lines[0].get_ydata()[-1]),
xytext=(-5, 15),
textcoords="offset points",
verticalalignment="center",
horizontalalignment="center",
)
ax.annotate(
f"{ax.lines[0].get_ydata()[0]:0.0f}%",
(ax.lines[0].get_xdata()[0], ax.lines[0].get_ydata()[0]),
xytext=(15, -5),
textcoords="offset points",
verticalalignment="center",
horizontalalignment="center",
)
axs[0][0].figure.suptitle("Accuracy by lookahead")
plt.show()
As you can see, for most airports it is pretty easy to predict the active configuration 30 minutes from the present. However, at larger lookaheads, accuracy varies a lot by airport. Part of the challenge will be finding a model that works well across all airports and for later lookahead times.
Since we have the same number of samples for each airport/lookahead, we can compute the overall accuracy as the mean of the individual airport/lookahead accuracies.
print(f"Overall accuracy: {accuracy.values.mean():2.2f}%")
Evaluating on changes only¶
Airport reconfigurations are relatively rare; most of the time, the configuration stays the same. This is partly why a model that only knows the distribution of historical configurations and the current configuration can do pretty well at this task. To really put our model to the test, we can look at how well it performs when evaluated only on prediction frames for which the true configuration is not the same as the current configuration. Let's see how the model performs on this "only change" subset. We'll use a slight variation of accuracy, "top k" accuracy, where a prediction is marked as correct if the true configuration is among the k most probable configurations.
def compute_airport_top_k_accuracy(
airport_predictions: pd.DataFrame,
config_timecourse: pd.Series,
k: int,
only_change: bool = False,
) -> float:
"""Computes top k accuracy for a single airport"""
airport = airport_predictions.iloc[0].airport
airport_configs = airport_predictions.config.unique()
top_k_hits = []
for (timestamp, lookahead), group in airport_predictions.groupby(
["timestamp", "lookahead"]
):
lookahead_config = config_timecourse.loc[
timestamp + pd.Timedelta(minutes=lookahead)
]
if lookahead_config not in airport_configs:
lookahead_config = f"{airport}:other"
if only_change:
# only include samples where a configuration change occurred
current_config = config_timecourse.loc[timestamp]
include = current_config != lookahead_config
else:
include = True
if include:
top_k_predictions = group.sort_values(
"active", ascending=False
).config.iloc[:k]
top_k_hits.append(
{
"lookahead": lookahead,
"hit": lookahead_config in top_k_predictions.values,
}
)
return pd.DataFrame(top_k_hits).groupby("lookahead").hit.mean()
def compute_top_k_accuracy(ks: Sequence[int], only_change: bool):
"""Iterates over airports to compute overall top k accuracy"""
top_ks = {}
for k in ks:
airport_top_ks = []
for airport in predictions.airport.unique():
config_timecourse = (
airport_config_df_map[airport]
.set_index("timestamp")
.airport_config.resample("30min")
.ffill()
.dropna()
)
config_timecourse = f"{airport}:" + config_timecourse
airport_predictions = predictions.loc[predictions.airport == airport]
airport_top_k = compute_airport_top_k_accuracy(
airport_predictions, config_timecourse, k=k, only_change=only_change
)
airport_top_k.rename(airport, inplace=True)
airport_top_ks.append(airport_top_k)
top_ks[k] = pd.concat(airport_top_ks, axis=1).values.mean()
return pd.Series(
top_ks.values(), name="accuracy", index=pd.Index(top_ks.keys(), name="k")
)
top_k_all = compute_top_k_accuracy((1, 3, 5), only_change=False)
top_k_only_change = compute_top_k_accuracy((1, 3, 5), only_change=True)
top_k = pd.concat([top_k_all, top_k_only_change], keys=("All", "Only change"), axis=1) * 100
ax = top_k.rename(index={k: f"$k={k}$" for k in top_k.index}).plot(kind="bar")
ax.set_ylabel(r"Top $\it{k}$ accuracy")
ax.set_xlabel("")
ax.yaxis.set_major_formatter(mpl.ticker.PercentFormatter(decimals=0))
You can see here how much the performance of our simple model relies on the fact that airport reconfigurations are relatively rare! When we evaluate our model on frames for which the future configuration is not the same as the current configuration, we see that our model does quite poorly.
Generate a submission¶
Now let's generate a submission for the competition. First we'll read the submission file to see which prediction frames we're supposed to generate.
submission_format = pd.read_csv(
DATA_PATH / "open_submission_format.csv", parse_dates=["timestamp"]
)
print(f"{len(submission_format):,} rows x {len(submission_format.columns)} columns")
submission = submission_format.copy()
submission["active"] = np.nan
submission.head()
submission.tail()
And generate predictions.
make_all_predictions(submission)
Before submitting, it is a good idea to check that our submission conforms to the submission format.
submission[["airport", "timestamp", "lookahead", "config"]].equals(
submission_format[["airport", "timestamp", "lookahead", "config"]]
)
And double check that the predictions for each airport, timestamp, and lookahead combination sum to one.
np.allclose(submission.groupby(["airport", "timestamp", "lookahead"]).sum(), 1)
Finally, we can write the predictions to a CSV file ready to submit to the competition. Note the use of the date_format argument to pd.DataFrame.to_csv; that ensures that dates are stored in the correct format, e.g., "2021-01-01T12:00:00". Also consider submitting a zipped CSV, which can result in a much smaller file and much faster uploads. A valid zipped submission contains a single CSV file, which we can achieve with the compression argument to pd.DataFrame.to_csv.
filename = "submission.csv.zip"
submission.to_csv(
filename,
date_format="%Y-%m-%dT%H:%M:%S",
index=False,
compression={"method": "zip", "archive_name": "submission.csv"},
)
!ls -lh submission.csv.zip
Let's load the actual submission CSV and check the first few lines to make sure the file is in the right format.
pd.read_csv(filename, parse_dates=["timestamp"], nrows=5)
And here is how the submission appears on the competition submissions page:
The test subset score of 0.0980 is pretty close to the training subset score of 0.0976, which is a good sign that the subsets do not have wildly different distributions.
We added this submission (Recency-weighted historical forecast) and the No change forecast as benchmarks on the competition leaderboard so you can compare your models to them throughout the competition.
That concludes our benchmark! There is tons of potential for iterating on this approach and for completely new solutions. We haven't even cracked open the other features, like flight-level air traffic streams and historical weather predictions. Head on over to the competition to get started. We're excited to see what you build!
Banner image courtesy of Michael Coghlan.