Cleaning up a self-driving car dataset
Fixing training data with AI itself
Back in February 2020, Roboflow founder Brad Dwyer wrote a Reddit post raising issues with Udacity’s self-driving car dataset. Commercial self-driving car teams have released data under their own agreements, but fully open datasets like this are less common. Brad fixed the dataset manually:
Udacity updated their README to make it clear that this data is for research and educational purposes, but hasn’t otherwise made changes since 2018.
I wondered if I could come up with a process to patch the Udacity dataset programmatically, training a model on the existing images and labels.
I’m also posting my exploration as a CoLab notebook.
Loading drive data
I torrented one of the initial drive datasets, CH2_001, which was recorded in 2016 and takes up 493MB (uncompressed). There are other, ~10x larger drives in this dataset, but they are stored as ROS bag (rosbag) files, a format that proved difficult to explore on my machine.
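The object labels live in a flat CSV (labels_crowdai.csv, which I query later in this post), and pandas makes per-frame counts easy to get at. Here is a minimal loading sketch; the column names below are my assumption and worth checking against the CSV's header row:

```python
import pandas as pd

# Path under my Colab-mounted Drive; adjust to wherever the CSV lives.
LABELS_CSV = "./drive/My Drive/mlin/driver/objects/labels_crowdai.csv"

labels = pd.read_csv(LABELS_CSV)
print(labels.columns.tolist())  # verify the actual column names and order

# Assuming "Frame" and "Label" columns: how many objects of each class
# are labeled in each frame?
per_frame = (
    labels.groupby(["Frame", "Label"])
    .size()
    .unstack(fill_value=0)
)
print(per_frame.head())
```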
Planning my strategy
Method 1: relabel: use a pretrained model (YOLOv3 or Mask R-CNN) to count objects in frame and compare to label
Method 2: entrances and exits: analyze sequential frames where the number of labeled cars and pedestrians changes (or the bounding boxes indicate one has entered and one has left); see the sketch after this list
Method 3: finetune in parts: finetune a pretrained model on 1/2 or 2/3 of the images and use it to update the remaining portion
Method 4: finetune on human interventions: use some of the manual fixes from Roboflow to finetune a model specifically on the kinds of labels humans had to add; for example, if pedestrians were consistently missing from the original dataset, the model needs to be trained on an improved dataset rather than on part of the original.
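To make Method 2 concrete, here is a rough sketch of the kind of check I have in mind: build per-frame counts of labeled cars and pedestrians, then flag frames where the count jumps between consecutive frames. It assumes the per_frame table from the loading sketch above, that sorting the frame names puts them in capture order, and that the class names are "Car" and "Pedestrian"; all three are assumptions to verify:

```python
# Sketch of Method 2: flag frames where the labeled object count jumps
# between consecutive frames (a possible entrance/exit -- or a labeling gap).
counts = per_frame.reindex(sorted(per_frame.index))

for cls in ["Car", "Pedestrian"]:  # class names are an assumption
    if cls not in counts.columns:
        continue
    deltas = counts[cls].diff().fillna(0)
    suspicious = deltas[deltas.abs() >= 2]  # jump threshold chosen arbitrarily
    print(f"{cls}: {len(suspicious)} frames with a count jump of 2 or more")
```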
Choosing a method depends on the types of mistakes in the original labels:
- In the simplest case, if the labelers failed to label objects in some frames, it could be possible to interpolate bounding boxes without any machine learning (a simple sketch of this follows below). The same approach would be useless if pedestrians enter and exit view without ever being labeled.
- Generic vs. custom, finetuned models: I should try running YOLO and Mask R-CNN on the original dataset, see whether they ‘see’ the objects we care about, and whether those labels are an improvement. If the improvement isn’t enough, I should then consider finetuning.
- Looking at individual frames in isolation is a limitation. There will be frames where a car is temporarily hidden by other cars or objects, but a human reviewer or a video-stream-based model would remember that it is there.
In this case, remember we are NOT labeling everything from scratch, but looking to see if machine learning can fill in missing labels.
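The interpolation idea from the first bullet above is simple enough to sketch without any detector: if a box in frame t-1 lines up with a box in frame t+1 but nothing similar exists in frame t, propose the in-between box. This is a minimal sketch; the IoU threshold and the midpoint interpolation are arbitrary choices, and real matching would need to track object identity more carefully:

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if inter else 0.0

def propose_missing_boxes(boxes_prev, boxes_mid, boxes_next, thresh=0.5):
    """If a box in the previous frame overlaps a box in the next frame,
    but the middle frame has nothing similar, propose the midpoint box."""
    proposals = []
    for bp in boxes_prev:
        for bn in boxes_next:
            if iou(bp, bn) < thresh:
                continue
            guess = tuple((p + n) / 2.0 for p, n in zip(bp, bn))
            if all(iou(guess, bm) < thresh for bm in boxes_mid):
                proposals.append(guess)
    return proposals
```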
YOLO Working Alone
Here we go — I run keras-yolo3 on CoLab (pip install tensorflow==1.15)
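For the comparison I want detections I can inspect per frame, not just boxes drawn on screen. Here is a hedged sketch of how I call the detector: it assumes the qqwweee/keras-yolo3 layout, where yolo.py in the repo root exposes a YOLO class with a detect_image(pil_image) method that returns an annotated copy of the image; forks differ, so treat this as a starting point rather than the library's documented API.

```python
from PIL import Image
from yolo import YOLO  # yolo.py from the keras-yolo3 repo; run from its root

detector = YOLO()  # assumes model_data/yolo.h5 has already been converted

def detect_frame(image_path):
    """Run YOLO on one frame and return the annotated image.

    In the fork I used, detect_image() draws the predicted boxes onto a
    copy of the image; if your fork also returns coordinates and classes,
    capture them here so they can be compared against the Udacity labels.
    """
    return detector.detect_image(Image.open(image_path))

# Substitute a real frame filename from the objects/ directory.
frame_path = "./drive/My Drive/mlin/driver/objects/FRAME.jpg"
detect_frame(frame_path).save("yolo_example.jpg")
```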
We have a strong start:
YOLO puts two overlapping boxes on this truck:
I scan the first 1,000 rows of the dataset labels for one with a pedestrian:

head -n 1000 ./drive/My\ Drive/mlin/driver/objects/labels_crowdai.csv | grep 'Pedestrian'
YOLO didn’t see a person here, but neither did I at first… you can see the figure in shadow, just on the edge of the right-most car bounding box.
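These visual checks all involve looking at two sets of boxes on the same frame, so I draw them together. A quick sketch with PIL; the box lists are placeholders for whatever comes out of the label CSV and the detector:

```python
from PIL import Image, ImageDraw

def draw_comparison(image_path, udacity_boxes, yolo_boxes, out_path):
    """Draw Udacity labels in blue and YOLO detections in red on one frame.

    Boxes are (xmin, ymin, xmax, ymax) tuples; where they come from
    (the CSV, the detector) is up to the caller.
    """
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    for box in udacity_boxes:
        draw.rectangle(box, outline="blue", width=3)
    for box in yolo_boxes:
        draw.rectangle(box, outline="red", width=3)
    image.save(out_path)
```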
The next test works better — the pedestrians are visible, and YOLO (red) and Udacity (blue) both see them:
To measure effectiveness, I decided on three steps:
- Visually review a few frames where YOLO detects a pedestrian missing from Udacity’s originals
- Count the number of frames where Roboflow manually added pedestrians to the originals (taking this as a coverage goal), and measure what % of these were also detected by YOLO (sketched below)
- Search for any pedestrians detected by YOLO but missed by Roboflow
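Steps 2 and 3 reduce to set arithmetic once I have three sets of frame IDs: frames where the original labels include a pedestrian, frames where Roboflow's fixed version has one, and frames where YOLO detects one. The sets themselves are assumed to be built from the two label CSVs and the detection pass; this is just the bookkeeping:

```python
def pedestrian_coverage(original_frames, roboflow_frames, yolo_frames):
    """Measure how much of Roboflow's manual pedestrian fixes YOLO recovers.

    Each argument is a set of frame IDs in which that source contains at
    least one pedestrian.
    """
    added_by_roboflow = roboflow_frames - original_frames
    found_by_yolo = added_by_roboflow & yolo_frames
    coverage = len(found_by_yolo) / len(added_by_roboflow) if added_by_roboflow else 0.0

    # Step 3: frames where YOLO sees a pedestrian that nobody labeled.
    yolo_only = yolo_frames - roboflow_frames - original_frames

    print(f"Roboflow added pedestrians in {len(added_by_roboflow)} frames; "
          f"YOLO found them in {coverage:.0%} of those. "
          f"{len(yolo_only)} frames are YOLO-only candidates to review.")
    return coverage, yolo_only
```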
Results of off-the-shelf model
This was rather disappointing — YOLO found people in very few frames where the original had no pedestrians. Most of the ones I reviewed were false positives. This is an interesting one where YOLO identified the motorcyclist as a person, while Udacity/CrowdAI left them unlabeled:
Notebook link: