by Casey Fitzpatrick
Benchmark - Hakuna Ma-data: Identify Wildlife on the Serengeti with AI for Earth¶
Welcome to the benchmark solution tutorial for our new competition in partnership with AI for Earth! In this computer-vision competition, you are tasked with identifying animal species in camera trap footage. The training data consists of over 2.5 million sequences of images collected using camera traps placed in the Serengeti region of Africa. The sequences are one-hot-labeled for 53 different species groups, or as empty. For each sequence (which may contain multiple images), you will generate a submission that consists of probabilities for each possible class.
But the fun doesn't stop there! In this competition, you will not be submitting a csv of predictions. Instead, you will submit the code that performs inference on the test data, and we will execute that code in the cloud to generate and score your submission.
In this benchmark, we'll walk through a first-pass approach to loading, understanding, and preparing the data. We'll use an out-of-the-box transfer learning approach to train a Keras model on a subset of the training data. Then we'll explain how to package up your model and submit a file capable of running in our cloud-based execution environment. With all these pipes connected, you'll be ready to hit the ground running.
We've got a large territory to cover, so let's get started!
Data Exploration¶
Our training set consists of 10 "seasons" of footage. We can get information about each season's sequences and labels using the training metadata, which provides links between image filenames and sequences, as well as the training labels, which tell us what's in a given sequence.
You can download the metadata as well as the actual image data here. Keep in mind, this is a very large training set, so make sure you have a storage solution. The complete set of images for all 10 seasons is nearly 5 TB! But the image data is split by season, so you can download image files one season at a time. For this benchmark, we're only going to work with a couple of seasons.
Let's load the metadata and look at what we have.
import json
from pathlib import Path
import numpy as np
import pandas as pd
pd.set_option('display.max_colwidth', 80)
# This is where our downloaded images and metadata live locally
DATA_PATH = Path.cwd().parent / "data/final/public/"
train_metadata = pd.read_csv(DATA_PATH / "train_metadata.csv")
train_labels = pd.read_csv(DATA_PATH / "train_labels.csv", index_col="seq_id")
train_metadata.head()
Number of images in the train set.
train_metadata.shape[0]
That's a lot of images! However, each image is associated with a sequence, and our predictions will be made at the sequence level.
A sequence is an ordered series, in this case of camera trap images ordered in time. When the camera trap is triggered, it often takes more than one image, yielding an image sequence. In the data, each sequence has a seq_id, which is the index for the labels. The order of images in a given sequence can be inferred from the last four digits in the filename.
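As a quick illustration, we can pull that counter out of each filename and sort within sequences by it. (The frame_number helper below is our own, not part of the provided metadata; it assumes every filename ends with the four-digit counter just before the extension.)
# Extract the four-digit frame counter from the end of the filename stem,
# then sort images within each sequence by that counter
def frame_number(file_name):
    return int(file_name.rsplit(".", 1)[0][-4:])

train_metadata.assign(
    frame=train_metadata.file_name.map(frame_number)
).sort_values(["seq_id", "frame"]).head()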
Since our predictions are made at the sequence level, this means that there is one label for any given sequence. Imagine that a lion walks by and triggers the camera trap, which takes 4 pictures. If the lion keeps walking, it may only appear in the first three pictures in the sequence and be out of frame by the time the 4th image is taken. Despite the 4th frame being empty of any lion, that sequence of 4 images would still be labeled as lion.
Let's confirm that each label in train_labels corresponds to a unique seq_id.
assert train_metadata.seq_id.nunique() == train_labels.index.shape[0]
# number of sequences
train_metadata.seq_id.nunique()
We have different seasons in the training set, and different numbers of images per season. We can see which season an image belongs to by looking at the first few characters of the sequence ID.
train_metadata['season'] = train_metadata.seq_id.map(lambda x: x.split('#')[0])
train_metadata.season.value_counts().sort_index()
Keep in mind that the test set comes from seasons not represented in the training set. So our model needs to generalize to seasons it hasn't seen before.
Location values are not unique between seasons, as we can check below. So while our model will need to generalize across seasons, it might get to "revisit" the same location from season to season.
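Here is one quick way to verify the overlap (a sketch that assumes the metadata includes a location column):
# Count how many different seasons each location appears in; any value
# greater than 1 means that location is shared across seasons
seasons_per_location = train_metadata.groupby("location").season.nunique()
(seasons_per_location > 1).sum()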
Below we see that the number of images per sequence varies, but the vast majority of sequences contain between 1 and 3 images.
train_metadata.groupby('seq_id').size().value_counts().sort_index()
For this benchmark, we're going to simplify the problem by taking only the first image from each sequence. The justifying assumption here is that the period of time immediately after a camera trap is first triggered is the most likely time to see an animal in frame. However, this may not always be the case and ultimately you'll probably want to give your model as much information as you can by using more images per sequence.
# Reduce to the first frame only for all sequences
train_metadata = train_metadata.sort_values('file_name').groupby('seq_id').first()
Now, let's look at the labels. Each sequence label is a one-hot-encoded row vector with a 1 in the column corresponding to a species that is present, and a 0 otherwise. Each row corresponds to a unique sequence ID.
train_labels.head()
Though most sequences have only one animal in them, it is possible to have multiple animals in a single sequence.
train_labels.sum(axis=1).value_counts()
Notice below that not only is empty a category, it's by far the most dominant one! This is because, as useful as camera traps are, they tend to register many false positives. For example, they are often triggered by wind blowing plant life around, or by fluctuations of heat and light in the surrounding area.
train_labels.mean(axis=0).sort_values(ascending=False)
Now that we have a sense of what the data means, let's get access to the images themselves!
Dealing With Large Amounts of Data¶
Since there are millions of high-quality images, unzipping all the data takes a long time. We recommend starting your model development with a single season or two of data while the rest of the data downloads. In this benchmark, we'll work with seasons 1 and 3.
train_metadata = train_metadata[train_metadata.season.isin(['SER_S1', 'SER_S3'])]
train_labels = train_labels[train_labels.index.isin(train_metadata.index)]
Add Full Image Path to the Data¶
Our local data is mounted under the /databig/raw directory in folders that match the name of the zipfile.
IMAGE_DIR = Path("/databig/raw")
We'll convert the file_name column to a Path object with the full path to our data.
train_metadata['file_name'] = train_metadata.apply(
    # Season IDs here are single-digit (SER_S1, SER_S3), so the last
    # character of the season string gives the season number
    lambda x: (IMAGE_DIR / f'SnapshotSerengeti_S0{x.season[-1]}_v2_0' / x.file_name), axis=1
)
train_metadata.head()
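As a quick sanity check, we can confirm that a handful of these constructed paths actually point to files on disk (this assumes the corresponding season archives have been unzipped under IMAGE_DIR):
# Spot-check a few paths to make sure the image files are really there
assert train_metadata.file_name.sample(5, random_state=1).map(lambda p: p.exists()).all()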
Before we get into the weeds of modeling, let's take a quick break to look at some of the animals in this data!
from IPython.display import Image
def look_at_random_animal(name, random_state, width=500):
    # Sample one image from the sequences labeled with the given species
    seq_ids = train_labels[train_labels[name] == 1].index
    file_names = train_metadata.loc[seq_ids].file_name
    filename = file_names.sample(random_state=random_state).values[0]
    return Image(filename=str(filename), width=width)
A particular animal that catches our eye is the ... zorilla? Let's check it out.
look_at_random_animal("zorilla", random_state=111)
Cute. What else?
look_at_random_animal("wildebeest", random_state=101)