# Learning to Predict Vehicle Trajectories with Model-based Planning

In Submission

Hong Kong University of Science and Technology

### Abstract

Predicting the future trajectories of on-road vehicles is critical for autonomous driving. In this paper, we introduce a novel prediction framework called PRIME, which stands for Prediction with Model-based Planning. Unlike recent prediction works that use neural networks to model scene context and produce unconstrained trajectories, PRIME is designed to generate accurate and feasibility-guaranteed future trajectory predictions. It guarantees trajectory feasibility by exploiting a model-based generator to produce future trajectories under explicit constraints, and it enables accurate multimodal prediction by using a learning-based evaluator to select among those trajectories. We conduct experiments on the large-scale Argoverse Motion Forecasting Benchmark. PRIME outperforms state-of-the-art methods in prediction accuracy, feasibility, and robustness under imperfect tracking. Furthermore, we achieve 1st place on the Argoverse Motion Forecasting Leaderboard.

### Motivation & Key idea

In traffic scenarios, most vehicles operate under their inherent kinematic constraints (e.g., non-holonomic motion constraints) while complying with the road structure (e.g., lane connectivity, static obstacles) and semantic information (e.g., traffic lights, speed limits). All these kinematic and environmental constraints explicitly regularize the trajectory space.
However, most existing prediction approaches model traffic agents as points and produce sequences of future positions without constraints. Such constraint-free predictions may violate kinematic or environmental constraints, which introduces substantial uncertainty into the predicted future states. Consequently, the downstream planning module bears an extra burden and may even run into the "freezing robot problem."
Moreover, recent learning-based prediction models typically generate trajectory predictions by network regression, which relies heavily on long-term tracking results. In dense driving scenarios, however, the target may be momentarily occluded or may suddenly appear within the sensing range, so tracking results are discontinuous or too short. Prediction accuracy degrades under such imperfect tracking.

To overcome these challenges, we propose a novel prediction architecture called PRIME. The key idea is to use a model-based motion planner as the prediction generator, which samples feasible future trajectories under explicit constraints, together with a deep neural network as the prediction evaluator, which models implicit interactions and selects future trajectories by scoring. This architecture yields accurate, feasible, and robust trajectory predictions.
More specifically, the model-based generator (left) samples the target's feasible future trajectories $$\mathcal{T}$$ from its real-time state $$\mathbf{s}_{tar}^0$$ and the map $$\mathcal{M}$$, while explicitly imposing kinematic and environmental constraints to guarantee trajectory feasibility. The learning-based evaluator (right) receives the feasible trajectories $$\mathcal{T}$$ and all observed tracks $$\mathcal{S}$$ to model the implicit interactions among all traffic agents, and selects a final set of feasible trajectories $$\mathcal{T}_{tar}\subset\mathcal{T}$$ as the prediction result.

### Framework Overview

The model-based generator searches reachable paths $$\mathcal{P}$$ through the map with depth-first search and samples a set of feasible future trajectories $$\mathcal{T}$$ with the Frenet planner. This part is detailed in our paper.
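As a rough illustration of the path-search step only, the sketch below enumerates lane sequences reachable from a starting lane by depth-first search. The lane-graph representation (`successors`, `lengths`), the function name `reachable_paths`, and the distance budget `max_distance` are hypothetical choices made for this example; they are not taken from the paper, and the Frenet planner that samples trajectories along each path is not shown.

```python
from typing import Dict, List


def reachable_paths(successors: Dict[str, List[str]],
                    lengths: Dict[str, float],
                    start: str,
                    max_distance: float) -> List[List[str]]:
    """Enumerate lane sequences reachable from `start` by depth-first search.

    A branch ends when its accumulated length reaches `max_distance`
    (an assumed search horizon) or when no unvisited successor remains.
    """
    paths: List[List[str]] = []

    def dfs(lane: str, path: List[str], travelled: float) -> None:
        path = path + [lane]              # copy, so sibling branches stay independent
        travelled += lengths[lane]
        # Skip lanes already on this path to avoid looping around a block.
        nexts = [n for n in successors.get(lane, []) if n not in path]
        if travelled >= max_distance or not nexts:
            paths.append(path)
            return
        for nxt in nexts:
            dfs(nxt, path, travelled)

    dfs(start, [], 0.0)
    return paths


# Toy lane graph (hypothetical): lane "a" splits into "b" and "c".
lane_successors = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
lane_lengths = {"a": 30.0, "b": 40.0, "c": 50.0, "d": 60.0}

paths = reachable_paths(lane_successors, lane_lengths, "a", max_distance=100.0)
print(paths)  # [['a', 'b', 'd'], ['a', 'c']]
```

Each returned lane sequence would play the role of one reference path $$\mathcal{P}_i$$ along which candidate trajectories are then sampled.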
The learning-based evaluator first encodes the scene context given by $$(\mathcal{P}, \mathcal{T}, \mathcal{S})$$, including $$l$$ paths in $$\mathcal{P}$$, $$(m+1)$$ history tracks in $$\mathcal{S}$$ and $$n$$ future trajectories in $$\mathcal{T}$$. The implicit agent-map interactions are learned in the subsequent attention modules: P2T and P2F propagate the spatial information of each reference path $$\mathcal{P}_i$$ into the history tracks and the corresponding future trajectories, and A2A takes track tensors from P2T to capture the multi-agent interactions. Because our dual spatial representation uses the path-based Frenet coordinate, P2T, P2F, and A2A operate on each path separately, while F2F fuses all the future trajectories processed by P2F to obtain a global understanding of the reachable space. Subsequently, each feasible trajectory $$\mathcal{T}_{i,j}$$ queries its track tensor $$\mathbf{X}_i(\mathbf{s}_{tar})$$ from P2T, interaction tensor $$\mathbf{Y}_i(\mathbf{s}_{tar})$$ from A2A and future tensor $$\mathbf{Z}(\mathcal{T}_{i,j})$$ from F2F, and it is scored by feeding the concatenation of these tensors to fully-connected layers. Finally, the evaluator ranks all feasible future trajectories in $$\mathcal{T}$$ by score and outputs a final set of $$K$$ predicted trajectories.
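The final ranking step can be sketched in a few lines. The example below assumes the evaluator has already produced one scalar score per candidate trajectory; it normalizes the scores with a softmax and keeps the $$K$$ highest-scoring candidates. The softmax normalization and the plain top-$$K$$ rule are assumptions made for illustration, and the function name `select_top_k` is hypothetical; the actual selection procedure is described in the paper.

```python
import math
from typing import List, Sequence, Tuple


def select_top_k(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (candidate index, probability) for the k highest-scoring candidates.

    Probabilities come from a softmax over all scores (an assumed
    normalization), so they sum to 1 over the full candidate set.
    """
    if not scores:
        return []
    peak = max(scores)                                # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Hypothetical scores for four candidate trajectories.
candidate_scores = [0.2, 2.5, -1.0, 1.7]
top = select_top_k(candidate_scores, k=2)
print([index for index, _ in top])  # [1, 3]
```

Because every candidate comes from the generator's feasible set, any selection rule of this kind can only return feasible trajectories.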

### Qualitative Results

Qualitative results under various scenarios on the Argoverse validation set. The model-based generator produces the set of feasibility-guaranteed future trajectories $$\mathcal{T}$$ (blue), which effectively regularizes the target vehicle's future trajectory space. The learning-based evaluator selects $$K$$ trajectories from $$\mathcal{T}$$ as multimodal prediction results (red), and the depth of red indicates their probability.

### More Comparisons

Qualitative comparisons between ours (left) and LaneGCN (right) on the Argoverse validation set, with the same coloring scheme. Here, we use the current state-of-the-art method, LaneGCN, as a representative for typical prediction models that generate unconstrained trajectories by neural networks.

We show some common failures, including kinematically and environmentally infeasible predictions. Due to kinematic constraints, vehicles cannot turn suddenly at high speed (1st row) or reverse their moving direction (2nd row). Likewise, predictions that turn across lane boundaries (3rd row) or head into reverse lanes (4th row) do not comply with environmental constraints. Such infeasible predictions place an unnecessary burden on an AV's decision making and motion planning. By contrast, the future trajectory set (blue) produced by our model-based generator is explicitly regularized by kinematic and environmental constraints, which in turn leads to accurate and reasonable future predictions (red).
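To make the two kinematic failure modes concrete, the following sketch flags a predicted trajectory as infeasible when its heading flips by more than 90 degrees between consecutive steps or when its approximate lateral acceleration (speed times yaw rate) exceeds a bound. This is a simplified check written for illustration, not the constraint formulation used in PRIME; the function name `is_kinematically_feasible`, the 90-degree reversal test, and the default limit of 4.0 m/s² are all assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_kinematically_feasible(points: List[Point], dt: float,
                              max_lat_acc: float = 4.0) -> bool:
    """Rough feasibility check on positions sampled every `dt` seconds.

    `max_lat_acc` (m/s^2) is an assumed bound, not a value from the paper.
    """
    headings: List[float] = []
    speeds: List[float] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        dist = math.hypot(dx, dy)
        if dist < 1e-6:
            continue  # standing still: heading undefined (simplification: step is skipped)
        headings.append(math.atan2(dy, dx))
        speeds.append(dist / dt)

    for i in range(1, len(headings)):
        dtheta = headings[i] - headings[i - 1]
        dtheta = math.atan2(math.sin(dtheta), math.cos(dtheta))  # wrap to [-pi, pi]
        if abs(dtheta) > math.pi / 2:
            return False                  # moving direction reversed
        yaw_rate = abs(dtheta) / dt
        if speeds[i] * yaw_rate > max_lat_acc:
            return False                  # turn too sharp for this speed
    return True


straight = [(float(i), 0.0) for i in range(10)]                      # 10 m/s, no turning
sharp_turn = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
reversal = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]

print(is_kinematically_feasible(straight, dt=0.1))    # True
print(is_kinematically_feasible(sharp_turn, dt=0.1))  # False
print(is_kinematically_feasible(reversal, dt=0.1))    # False
```

A check like this is applied after the fact to an unconstrained prediction, whereas a model-based generator avoids producing such trajectories in the first place.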

### BibTeX

@article{song2021learning,
title={Learning to Predict Vehicle Trajectories with Model-based Planning},
author={Song, Haoran and Luan, Di and Ding, Wenchao and Wang, Michael Yu and Chen, Qifeng},
journal={arXiv preprint arXiv:2103.04027},
year={2021}
}