Trajectory Forecasting

Upload Date Method Additional Inputs ADE FDE APD FPD Miss Rate


The goal of the AIODrive trajectory forecasting challenge is to predict the future locations of multiple agents under complex interactions. These agents include four major types of traffic participants: Car, Pedestrian, Cyclist, and Motorcycle. In addition to the standard input of past trajectories, users may use the various sensor data we provide to improve their predictions.


This challenge is joint work between the CMU team (dataset and evaluation setup) and UC Berkeley (Jiachen Li, baseline development).


The evaluation server is open year-round for submissions. We intend to organize a number of challenges at major conferences. Users who submit their results during a challenge period are eligible for awards, which may differ between challenges. To participate, please create an account on EvalAI, then upload your result file following the submission rules and format explained below. After EvalAI has processed your uploaded results, we will export them to the leaderboard shown above. This is the only way to benchmark your methods on the AIODrive test set, as we do not plan to release the test-set ground truth.

To start an EvalAI submission, click here.

Active Events

The first AIODrive trajectory forecasting challenge was introduced at the 3rd Precognition Workshop at CVPR 2021. The submission period will end July 16, 2021. The winner will be announced on this website, and their presentation video will be promoted through social media. The winner will also receive an iPad! Note that the evaluation server can still be used for benchmarking outside the challenge period.

Evaluation Metrics

We primarily use ADE (Average Displacement Error) in the 2-second prediction setting to rank methods. For each N-second prediction setting, ADE is computed by averaging the class-specific ADEs over the four object categories. Specifically, for each object category, the class-specific ADE is the L2 distance between predicted and ground-truth locations (x, z), averaged over frames and over objects. Our evaluation also supports stochastic prediction: users may submit up to 20 trajectory samples for each object, and we compute the best-of-K ADE (K ≤ 20), i.e., the ADE of the sample closest to the ground truth. In addition to the main ADE metric, we compute several other metrics:

  • FDE (Final Displacement Error), which computes the L2 distance between predictions and ground truth only at the last frame;
  • APD (Average Pairwise Distance), which computes the average L2 distance between all pairs of trajectory samples to measure diversity across samples;
  • FPD (Final Pairwise Distance), which computes the average L2 distance between all pairs of trajectory samples, but only at the last frame;
  • Miss Rate, which measures how many object trajectories that are expected to be predicted are missing from the results.
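The metric definitions above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the official evaluation code: it assumes each trajectory is a list of (x, z) points, that best-of-K ADE and FDE each take their own minimum over the K samples (as is common in Social-GAN-style evaluation), and that a class-specific ADE is the mean of per-object ADEs. All function names are made up for this example.

```python
import math
from itertools import combinations

def l2(p, q):
    """Euclidean distance between two (x, z) ground-plane points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ade(traj, gt):
    """Mean L2 distance over frames between one trajectory and ground truth."""
    return sum(l2(p, g) for p, g in zip(traj, gt)) / len(gt)

def best_of_k_ade_fde(samples, gt):
    """Best-of-K ADE and FDE for one object (K = len(samples) <= 20).

    Assumption: each minimum is taken independently over the K samples.
    """
    best_ade = min(ade(traj, gt) for traj in samples)
    best_fde = min(l2(traj[-1], gt[-1]) for traj in samples)
    return best_ade, best_fde

def apd_fpd(samples):
    """Mean pairwise distance over all frames (APD) and at the last frame (FPD)."""
    pairs = list(combinations(samples, 2))
    if not pairs:  # assumption: a single sample is reported as zero diversity
        return 0.0, 0.0
    apd = sum(ade(a, b) for a, b in pairs) / len(pairs)
    fpd = sum(l2(a[-1], b[-1]) for a, b in pairs) / len(pairs)
    return apd, fpd

def class_ade(per_object_ades):
    """Class-specific ADE as the mean of per-object ADEs (an assumption)."""
    return sum(per_object_ades) / len(per_object_ades)

def mean_over_classes(per_class):
    """Average a metric over the four categories; a missing class gives NaN."""
    classes = ("Car", "Ped", "Cyc", "Mot")
    values = [per_class.get(c, float("nan")) for c in classes]
    return sum(values) / len(values)

gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
           [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]]
print(best_of_k_ade_fde(samples, gt))  # both minima come from the second sample
```

The NaN returned by `mean_over_classes` when a category is absent reflects the submission rule that omitting an object class leads to NaN metrics.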

For each ground-truth object, we compute the above metrics only on frames where ground-truth locations are available, i.e., while the object is still in the scene. In other words, for each ground-truth object we mask out the frames after it has left the scene. For example, when using frames 0 to 9 to predict the trajectory of an object in frames 10 to 19, if the object has left the scene in frames 17 to 19, we evaluate ADE/APD only on frames 10 to 16 and FDE/FPD only on frame 16. As a result, the last frame used for FDE/FPD may differ between objects, because some objects may leave the scene at a particular frame while others are still in the scene.
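The masking rule can be sketched as follows. This is a hypothetical illustration rather than the official evaluation code: it assumes the ground truth for one object is stored as a per-frame list in which `None` marks frames where the object is no longer in the scene, and the function name `masked_ade_fde` is made up.

```python
import math

def masked_ade_fde(pred, gt):
    """ADE/FDE for one predicted trajectory, skipping frames without ground truth.

    pred: list of (x, z) predictions, one per future frame
    gt:   list of (x, z) ground-truth points, None where the object is absent
    Returns (ade, fde, index of the last valid frame), or None if no frame is valid.
    """
    valid = [i for i, g in enumerate(gt) if g is not None]
    if not valid:
        return None
    dists = {i: math.hypot(pred[i][0] - gt[i][0], pred[i][1] - gt[i][1]) for i in valid}
    ade = sum(dists.values()) / len(valid)
    last = valid[-1]  # FDE uses this object's own last valid frame
    return ade, dists[last], last

# Frames 10-19 are predicted; the object has left the scene in frames 17-19.
pred = [(float(i), 0.0) for i in range(10)]
gt = [(float(i), 1.0) for i in range(7)] + [None] * 3
ade, fde, last = masked_ade_fde(pred, gt)
print(ade, fde, 10 + last)  # prints 1.0 1.0 16
```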

For more details about evaluation metrics, please refer to our evaluation code here.

Submission Rules

  • Up to 5 seconds of trajectory history can be used for prediction.
  • Users can submit up to 20 trajectory samples for each object. If more than 20 samples are submitted, only the first 20 samples are used.
  • Prediction results must be reported in the ego-vehicle coordinate frame, although users may first convert trajectories to global coordinates for inference.
  • We release full trajectories for the train/val sets, but only partial trajectories for the test set, which are intended to be used as past data for prediction. Specifically, we release frames 0-49, 150-199, 300-349, ..., 900-949 of each test sequence and withhold the other frames. Users should produce predictions only for the windows of frames 50-99, 200-249, ... (if predicting 5s), 50-69, 200-219, ... (if predicting 2s), or 50-59, 200-209, ... (if predicting 1s). Predictions on other windows will be discarded and only make your result file unnecessarily large.
  • To further reduce the size of the result file, we only require users to predict 10 key frames for each window, regardless of the prediction length (e.g., the 2s or 5s setting). For example, for the prediction window of frames 50-99 in the 5s setting, only predictions on frames 50, 55, 60, ..., 95 are needed. Similarly, in the 2s setting, only predictions on frames 50, 52, 54, ..., 68 are needed for the prediction window of frames 50-69.
  • Users must make a prediction for an object if it has past trajectory data in at least one key frame. For example, if an object has past data in frames 0, 5, and 10 but no trajectory data in frames 15 to 45 due to occlusion (it may re-appear in later frames), we still expect a prediction for this object in frames 50, 55, ..., 95 in the 5-second setting. Filtering out such objects will increase the miss rate.
  • Predictions for all four object categories must be included, because the overall ADE is the average of the class-specific ADEs. Failing to include results for all four object classes will produce NaN metric values.
  • Predictions for the 2-second setting must be included, as we use the ADE in this setting to rank methods. We also encourage users to submit results for the 1-second and 5-second settings, which will be exported to the leaderboard for comparison.
  • We release sensor data for the train, val and test sets, which users can leverage to improve prediction performance.
  • Every submission must follow the results format explained below.
  • We encourage users to release their code, but this is not required.
  • Each user or team can have at most one account on the evaluation server.
  • Each user or team can submit ≤ 3 times per day, ≤ 10 times per month and ≤ 30 times in total. Invalid submissions do not count against this total.
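The window, key-frame, history and sample-count rules above reduce to simple frame arithmetic: a 5-second window spans 50 frames, i.e., 10 frames per second. The sketch below only restates those rules; the helper names are hypothetical and not part of any released tool.

```python
FPS = 10              # 5 s window = frames 50-99, i.e. 10 frames per second
NUM_KEY_FRAMES = 10   # key frames required per window, for every setting
MAX_SAMPLES = 20      # only the first 20 samples of an object are used

def window_starts(first=50, last=950, stride=150):
    """First frame of every test prediction window: 50, 200, ..., 950."""
    return list(range(first, last + 1, stride))

def key_frames(start, pred_seconds):
    """The 10 key frames to predict in the window beginning at `start`."""
    pred_len = pred_seconds * FPS        # window length in frames (10, 20 or 50)
    step = pred_len // NUM_KEY_FRAMES    # 1, 2 or 5 frames between key frames
    return [start + step * k for k in range(NUM_KEY_FRAMES)]

def history_frames(start, seconds=5):
    """Past frames usable for the window starting at `start` (at most 5 s)."""
    seconds = min(seconds, 5)
    return list(range(start - seconds * FPS, start))

def keep_first_samples(samples):
    """Drop any samples beyond the first 20, mirroring the evaluation rule."""
    return samples[:MAX_SAMPLES]

for seconds in (1, 2, 5):
    print(seconds, key_frames(50, seconds))
```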

Trajectories for the train/val sets can easily be extracted from the label files here. Partial trajectories for the test set can be found here; this download also includes the train/val trajectories for convenience.

Submission Format

The submission must be a single JSON file containing a hierarchical dictionary. From the outermost level to the innermost, the keys are: 1) prediction length in frames, e.g., '10', '20', '50'; 2) object class, e.g., 'Car', 'Ped', 'Cyc', 'Mot'; 3) sequence name, e.g., 'Town07_seq0000'; 4) first frame of the current prediction window, e.g., '50' when predicting frames 50 to 59 from past frames 40 to 49; 5) sample index, e.g., '0', '1', ..., '19'; 6) ID of the object being predicted.

At the innermost level, each dictionary has two keys: 1) 'state': an N x 2 matrix of the object's predicted ground-plane positions (x, z) in N frames; 2) 'prob': the probability of this particular trajectory sample. In summary, the dictionary should look like the following:

{pred_len1: {obj_class: {seqname: {frame: {sample: {ID: {'state': N x 2, 'prob': 0.83}}}}}}, pred_len2:{...}, pred_len3: {...}}

Note that all keys in the dictionary must be strings. Since users only need to predict 10 key frames regardless of the prediction length, N should always be 10. Because we only expect predictions on a set of predefined windows, the first frame of each window should be one of '50', '200', '350', ..., '950'. To obtain a valid evaluation, please strictly follow the above format.
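A minimal sketch of building the submission dictionary with Python's standard `json` module is shown below. The helper `add_prediction`, the object ID 7 and the positions are hypothetical; we also assume that the N x 2 'state' matrix is serialized as a nested list of 10 [x, z] pairs, which should be checked against the sample submission file.

```python
import json

def add_prediction(results, pred_len, obj_class, seqname, frame, sample, obj_id,
                   state, prob):
    """Insert one trajectory sample into the nested submission dictionary.

    Every key is converted to a string, as the format requires.
    state: 10 (x, z) positions, one per key frame.
    """
    if len(state) != 10 or any(len(p) != 2 for p in state):
        raise ValueError("state must be a 10 x 2 matrix")
    node = results
    # Walk (and create) the levels: length -> class -> sequence -> frame -> sample.
    for key in (pred_len, obj_class, seqname, frame, sample):
        node = node.setdefault(str(key), {})
    node[str(obj_id)] = {"state": [[float(x), float(z)] for x, z in state],
                         "prob": float(prob)}

results = {}
state = [[0.5 * k, 0.0] for k in range(10)]  # hypothetical straight-line motion
add_prediction(results, 20, "Car", "Town07_seq0000", 50, 0, 7, state, 0.83)
payload = json.dumps(results)  # write this string to the single result file
```

Building the nested levels with `setdefault` lets predictions for any class, window or sample be added in arbitrary order.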

For a sample valid submission file on the test set, please see here. For a sample valid submission file on the val set, please see here. If you have additional questions about the submission format, please contact us.

Starter Code

We provide Social-GAN (CVPR 2018) as the starter code for this challenge. Our implementation is built on top of the official Social-GAN PyTorch code here. The main changes are a data loader that handles the AIODrive data format and an output stage that generates a valid JSON file. If you would like to build your own method on top of the starter code, or borrow functions such as the data loader, you can find the starter code here. More baselines and starter code will come soon.