ML Prediction, Planning and Simulation for Self-Driving


This repository and the associated datasets constitute a framework for developing learning-based solutions to prediction, planning and simulation problems in self-driving. State-of-the-art solutions to these problems still require significant amounts of hand-engineering and, unlike perception systems for example, have not benefited much from deep learning and the vast amount of driving data available.

The purpose of this framework is to enable engineers and researchers to experiment with data-driven approaches to planning and simulation problems using real-world driving data and to contribute to state-of-the-art solutions.

Modern AV pipeline

This software is developed by the Lyft Level 5 self-driving division and is open to external contributors.


You can use this framework to build systems which:

  • Turn prediction, planning and simulation problems into data problems and train models on real data.

  • Use neural networks to model key components of the Autonomous Vehicle (AV) stack.

  • Use historical observations to predict the future movement of cars around an AV.

  • Plan the behavior of an AV in order to imitate human driving.

  • Study the improvement in performance of these systems as the amount of data increases.

We provide several notebooks with examples and applications.

L5Kit Usage

Our visualisation notebook is the perfect place to start if you want to know more about L5Kit.

Agent Motion Prediction

Related to our 2020 competition, we provide a notebook to train and test our baseline model for predicting the future trajectories of agents.


We provide 3 notebooks for a deep dive into planning for a Self-Driving Vehicle (SDV). Please refer to our README for a full description of what you can achieve using them.

We also provide pre-trained models for this task. Please refer to the training notebook.



The framework consists of three modules:

  1. Datasets - data available for training ML models.

  2. L5Kit - the core library supporting functionality for reading the data and framing planning and simulation problems as ML problems.

  3. Examples - an ever-expanding collection of Jupyter notebooks which demonstrate the use of L5Kit to solve various AV problems.

1. Datasets

To use the framework, you will need to download the Lyft Level 5 Prediction dataset. It consists of the following components:

  • 1000 hours of perception output logged by Lyft AVs operating in Palo Alto. This data is stored in 30-second chunks using the zarr format.

  • A hand-annotated, HD semantic map. This data is stored in protobuf format.

  • A high-definition aerial map of the Palo Alto area. This image has a resolution of 8 cm per pixel and is provided by NearMap.
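Because the aerial map has a fixed ground resolution of 8 cm per pixel, converting between metres and pixels is a single scale factor. The helpers below are a hypothetical sketch (the names `metres_to_pixels` and `pixels_to_metres` are not part of L5Kit) showing, for example, how large a crop of the aerial image is needed to cover a given area.

```python
# Hypothetical helpers, not part of L5Kit: convert between ground distance and
# aerial-map pixels using the stated 8 cm/pixel resolution.
AERIAL_RESOLUTION_M_PER_PX = 0.08  # 8 cm per pixel


def metres_to_pixels(distance_m: float, resolution: float = AERIAL_RESOLUTION_M_PER_PX) -> int:
    """Number of aerial-map pixels spanning distance_m metres, rounded to the nearest pixel."""
    return round(distance_m / resolution)


def pixels_to_metres(pixels: int, resolution: float = AERIAL_RESOLUTION_M_PER_PX) -> float:
    """Ground distance in metres covered by a run of `pixels` pixels."""
    return pixels * resolution


if __name__ == "__main__":
    # A 100 m x 100 m patch around a vehicle needs a 1250 x 1250 pixel crop.
    side_px = metres_to_pixels(100.0)
    print(side_px, side_px)
```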

To read more about the dataset and how it was generated, read the dataset whitepaper.

Note (08-24-20): The new version of the dataset includes dynamic traffic light support. Please update your L5Kit version to v1.0.6 to start using this functionality.

Download the datasets

Register and download the 2020 Lyft prediction dataset. Store all files in a single folder to match this structure:

  +- scenes/
        +- sample.zarr
        +- train.zarr
        +- train_full.zarr
  +- aerial_map/
        +- aerial_map.png
  +- semantic_map/
        +- semantic_map.pb
  +- meta.json
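To catch a misplaced file before running any notebook, you can check the download folder against the layout above. The helper below is a hypothetical convenience sketch, not part of L5Kit; the entry list is copied from the structure shown, and `Path.exists()` is used because the `.zarr` entries are directories rather than single files.

```python
import sys
from pathlib import Path
from typing import List

# Relative paths taken from the expected folder structure above.
EXPECTED_ENTRIES = [
    "scenes/sample.zarr",
    "scenes/train.zarr",
    "scenes/train_full.zarr",
    "aerial_map/aerial_map.png",
    "semantic_map/semantic_map.pb",
    "meta.json",
]


def missing_dataset_entries(data_root: str) -> List[str]:
    """Return the expected entries absent under data_root (an empty list means complete)."""
    root = Path(data_root)
    # exists() is true for both files and directories (zarr stores are directories).
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_dataset_entries(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("OK" if not missing else f"Missing: {missing}")
```

Extra files (for example, additional content under aerial_map) are ignored by this check.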

You may find other downloaded files and folders (mainly from aerial_map), but they are not currently required by L5Kit.

2. L5Kit

L5Kit is a library which lets you:

  • Load driving scenes from zarr files

  • Read semantic maps

  • Read aerial maps

  • Create bird's-eye-view (BEV) images which represent a scene around an AV or another vehicle

  • Sample data

  • Train neural networks

  • Visualize results
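The core of a BEV image is a coordinate change: each nearby object's world position is expressed in the frame of the vehicle of interest and then mapped from metres to pixels. The sketch below shows only that step. It is a hypothetical simplification, not L5Kit's rasterizer; the function name, the 224 x 224 raster, the 0.5 m/pixel resolution and the ego-at-centre convention are all assumptions chosen for illustration.

```python
import math
from typing import Optional, Tuple


def world_to_bev_pixel(
    point_xy: Tuple[float, float],
    ego_xy: Tuple[float, float],
    ego_yaw: float,
    raster_size: Tuple[int, int] = (224, 224),  # assumed (width, height) in pixels
    pixel_size_m: float = 0.5,  # assumed metres per pixel
) -> Optional[Tuple[int, int]]:
    """Map a world-frame (x, y) point to (col, row) in an ego-centred BEV raster.

    Illustrative conventions: the ego sits at the raster centre, its heading points
    toward increasing col, and +y in the ego frame points toward decreasing row.
    Returns None if the point falls outside the raster.
    """
    # Translate so the ego is at the origin.
    dx = point_xy[0] - ego_xy[0]
    dy = point_xy[1] - ego_xy[1]
    # Rotate by -ego_yaw so the ego heading aligns with the local +x axis.
    cos_y, sin_y = math.cos(-ego_yaw), math.sin(-ego_yaw)
    local_x = dx * cos_y - dy * sin_y
    local_y = dx * sin_y + dy * cos_y
    # Convert metres to pixels, placing the ego at the image centre.
    width, height = raster_size
    col = math.floor(width / 2 + local_x / pixel_size_m)
    row = math.floor(height / 2 - local_y / pixel_size_m)
    if 0 <= col < width and 0 <= row < height:
        return col, row
    return None
```

Drawing each agent's footprint and the map layers at these pixel locations, one channel per layer, is what turns this transform into a multi-channel image a neural network can consume.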

3. Examples

The examples folder contains examples in Jupyter notebook format which you can use as a foundation for building your ML planning and simulation solutions. Currently we provide two examples, with more to come soon:

Dataset visualization

A tutorial on how to load and visualize samples from a dataset using L5Kit.

Agent motion prediction

An example of training a neural network to predict the future positions of cars near an AV. This example is a baseline solution for the Lyft 2020 Kaggle Motion Prediction Challenge.
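To make the prediction task concrete: given an agent's past positions, a model outputs its future positions, and one common way to score them is by displacement error against the ground truth. The sketch below uses constant-velocity extrapolation as a deliberately naive reference point. It is not the neural-network baseline from the notebook, and the function names, the fixed time step and the use of average/final displacement error (ADE/FDE) are illustrative assumptions rather than the challenge's official metric.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def constant_velocity_forecast(history: List[Point], num_future: int) -> List[Point]:
    """Extrapolate future positions assuming the last observed velocity stays constant.

    history is ordered oldest to newest and sampled at a fixed time step.
    """
    if len(history) < 2:
        raise ValueError("need at least two past positions to estimate velocity")
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per time step
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, num_future + 1)]


def displacement_errors(pred: List[Point], truth: List[Point]) -> Tuple[float, float]:
    """Return (average displacement error, final displacement error) in input units."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and the same length")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


if __name__ == "__main__":
    forecast = constant_velocity_forecast([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], 3)
    print(forecast)  # [(3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
    print(displacement_errors(forecast, [(3.0, 0.0), (4.0, 1.0), (5.0, 2.0)]))  # (1.0, 2.0)
```

A learned model earns its keep by beating this kind of trivial baseline, especially on turns and stops where constant velocity fails.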


Installing as a User

Follow this workflow if:

  • you’re not interested in developing and/or contributing to L5Kit;

  • you don’t need any features from a specific branch or the latest master, and you’re fine with the latest release.

1. Install the package from PyPI (in your project venv)

pip install l5kit

You should now be able to import from L5Kit (e.g. from import ChunkedDataset should work)

2. Run example

Examples are not shipped with the package, but you can download the zip release from: L5Kit Releases

Please download the zip matching your installed version (you can run pip freeze | grep l5kit to get the right version). Unzip the files and grab the examples folder in the root of the project.
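If you prefer to check the installed version from Python rather than with pip freeze, the standard-library importlib.metadata module can report it (Python 3.8+). The helper below is a small sketch; its name is hypothetical, and it returns None when L5Kit is not installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "l5kit") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("l5kit is not installed; run: pip install l5kit")
    else:
        print(f"l5kit {version}: download the release zip for this version")
```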

jupyter notebook examples/visualisation/visualise_data.ipynb

Installing as a Developer

Follow this workflow if:

  • you want to test latest master or another branch;

  • you want to contribute to L5Kit;

  • you want to test the examples using a non-release version of the code.

1. Clone the repo

git clone
cd l5kit/l5kit

Please note the double l5kit in the path, as we need to cd where file is.

3. Install L5Kit

3.1 Deterministic Build (Suggested)

We support deterministic builds through pipenv.

Once you’ve installed pipenv (or made it available in your env) run:

pipenv sync --dev

This will install all dependencies (--dev includes dev-packages too) from the lock file.

3.2 Latest Build

If you don’t care about deterministic builds or you’re having trouble with package resolution (Windows, Python<3.7, etc.), you can install directly by running:

pip install -e .[dev]

If you run into trouble installing L5Kit on Windows, you may need to

  • install PyTorch and torchvision manually first (select the correct version required by your system, i.e. GPU or CPU-only), then run the L5Kit install (remove the packages torch and torchvision from

  • install Microsoft C++ Build Tools.

4. Generate L5Kit code HTML documentation (optional)

sphinx-apidoc --module-first --separate -o API/ l5kit/l5kit l5kit/l5kit/tests*
sphinx-build . docs

5. Run example

jupyter notebook examples/visualisation/visualise_data.ipynb


We use the Apache 2.0 license for the code in this repo.



The framework was developed at Lyft Level 5 and is maintained by the following authors and contributors:


If you are using L5Kit or the dataset in your work, please cite the following whitepaper:

    title={One Thousand and One Hours: Self-driving Motion Prediction Dataset},
    author={John Houston and Guido Zuidhof and Luca Bergamini and Yawei Ye and Ashesh Jain and Sammy Omari and Vladimir Iglovikov and Peter Ondruska},

Lyft Level 5


If you find a problem or have questions about L5Kit, please feel free to create a GitHub issue or reach out to!