Learning to Predict Vehicle Trajectories with Model-based Planning
In Submission
Abstract
Predicting the future trajectories of on-road vehicles is critical for autonomous driving. In this paper, we introduce a novel prediction framework called PRIME, which stands for PRediction with Model-based PlannIng. Unlike recent prediction works that use neural networks to model scene context and produce unconstrained trajectories, PRIME is designed to generate accurate and feasibility-guaranteed future trajectory predictions. It guarantees trajectory feasibility by exploiting a model-based generator to produce future trajectories under explicit constraints, and it enables accurate multimodal prediction by using a learning-based evaluator to select future trajectories. We conduct experiments on the large-scale Argoverse Motion Forecasting Benchmark. PRIME outperforms state-of-the-art methods in prediction accuracy, feasibility, and robustness under imperfect tracking. Furthermore, it achieved 1st place on the Argoverse Motion Forecasting Leaderboard.
Motivation & Key Idea
In traffic scenarios, most vehicles operate under their inherent kinematic constraints (e.g., nonholonomic motion constraints)
while complying with the road structure (e.g., lane connectivity, static obstacles)
and semantic information (e.g., traffic lights, speed limits).
All these kinematic and environmental constraints explicitly restrict the space of possible trajectories.
However, most existing prediction approaches model traffic agents as points and produce sequences of future positions without constraints.
Such constraint-free predictions may violate kinematic or environmental constraints,
which introduces substantial uncertainty into the predicted future states.
Consequently, the downstream planning module bears an extra burden and may even run into the "freezing robot problem."
Moreover, recent learning-based prediction models follow the typical paradigm of generating trajectory predictions by network regression, which relies heavily on long-term tracking results.
But in dense driving scenarios, where the target may be momentarily occluded or suddenly appear within the sensing range, tracking results are discontinuous or too short.
Prediction accuracy degrades under such imperfect tracking.
To overcome these challenges, we propose a novel prediction architecture called PRIME.
The key idea is to exploit a model-based motion planner as the prediction generator, which samples feasible future trajectories under explicit constraints,
together with a deep neural network as the prediction evaluator, which models implicit interactions and selects future trajectories by scoring them.
This architecture yields accurate, feasible, and robust trajectory predictions.
More specifically, PRIME consists of two components:
the model-based generator (left), which samples the target's feasible future trajectories \(\mathcal{T}\)
from its real-time state \(\mathbf{s}_{tar}^0\) and the map \(\mathcal{M}\)
while explicitly imposing kinematic and environmental constraints to guarantee trajectory feasibility;
and the learning-based evaluator (right), which receives the feasible trajectories \(\mathcal{T}\) and all observed tracks \(\mathcal{S}\),
models the implicit interactions among all traffic agents, and selects a final set of feasible trajectories \(\mathcal{T}_{tar}\subset\mathcal{T}\) as the prediction result.
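What "sampling feasible trajectories under explicit constraints" can mean in practice is illustrated by the minimal sketch below, which uses one common scheme: sampling in a path-aligned Frenet frame, where \(s\) is the arc length along a reference path and \(d\) is the lateral offset from it. The constant-acceleration longitudinal profile, the quintic lateral polynomial, the helper names `quintic_coeffs` and `sample_frenet_candidates`, and the `a_max` bound are illustrative assumptions, not PRIME's actual planner. The 3 s horizon at 10 Hz matches the Argoverse prediction setting.

```python
import numpy as np


def quintic_coeffs(d0, dd0, ddd0, d1, dd1, ddd1, T):
    """Coefficients c0..c5 of d(t) = sum_k c_k t^k that match the lateral
    offset, velocity and acceleration at t = 0 and at t = T."""
    c0, c1, c2 = d0, dd0, ddd0 / 2.0
    A = np.array([[T**3, T**4, T**5],
                  [3 * T**2, 4 * T**3, 5 * T**4],
                  [6 * T, 12 * T**2, 20 * T**3]])
    b = np.array([d1 - (c0 + c1 * T + c2 * T**2),
                  dd1 - (c1 + 2 * c2 * T),
                  ddd1 - 2 * c2])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([c0, c1, c2, c3, c4, c5])


def sample_frenet_candidates(s0, v0, d0, lateral_targets, accels,
                             horizon=3.0, dt=0.1, a_max=3.0):
    """Enumerate (s, d) trajectories along ONE reference path (hypothetical).

    Longitudinal: constant acceleration, clamped so the vehicle stops
    instead of reversing. Lateral: a quintic from d0 to each target offset.
    Accelerations above a_max are rejected, standing in for the explicit
    kinematic constraints.
    """
    t = np.linspace(dt, horizon, int(round(horizon / dt)))
    candidates = []
    for a in accels:
        if abs(a) > a_max:
            continue  # violates the assumed acceleration bound
        t_stop = -v0 / a if a < 0 else np.inf
        tc = np.minimum(t, t_stop)  # freeze progress once speed reaches 0
        s = s0 + v0 * tc + 0.5 * a * tc**2
        for d_target in lateral_targets:
            c = quintic_coeffs(d0, 0.0, 0.0, d_target, 0.0, 0.0, horizon)
            d = (c * t[:, None] ** np.arange(6)).sum(axis=1)
            candidates.append(np.stack([s, d], axis=1))  # (len(t), 2)
    return candidates


cands = sample_frenet_candidates(s0=0.0, v0=10.0, d0=0.5,
                                 lateral_targets=[-3.5, 0.0, 3.5],
                                 accels=[-2.0, 0.0, 2.0, 5.0])
print(len(cands), cands[0].shape)  # 9 (30, 2): a = 5.0 is filtered out
```

Because every candidate is assembled from bounded primitives, motions such as reversing or accelerating beyond `a_max` never enter the candidate set, which is the property a downstream evaluator can rely on.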
Framework Overview
The model-based generator searches for reachable paths \(\mathcal{P}\) through the map with depth-first search (DFS)
and samples a set of feasible future trajectories \(\mathcal{T}\) with a Frenet planner.
This part is detailed in our paper.
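Searching for reachable paths with depth-first search can be sketched over a lane graph given as a successor map. The graph encoding, the lane IDs, and the length cutoff `max_length` are hypothetical; the actual map queries and stopping rule used in PRIME are described in the paper.

```python
from typing import Dict, List


def dfs_reachable_paths(successors: Dict[str, List[str]],
                        lane_length: Dict[str, float],
                        start: str,
                        max_length: float) -> List[List[str]]:
    """Enumerate lane sequences reachable from `start` by depth-first search.

    A branch ends once its accumulated length reaches `max_length` (e.g. a
    distance the target could plausibly cover within the horizon) or when
    the current lane has no unvisited successor.
    """
    paths: List[List[str]] = []

    def visit(lane: str, path: List[str], length: float) -> None:
        path = path + [lane]
        length += lane_length[lane]
        nexts = [n for n in successors.get(lane, []) if n not in path]
        if length >= max_length or not nexts:
            paths.append(path)  # leaf: one candidate reference path
            return
        for n in nexts:
            visit(n, path, length)

    visit(start, [], 0.0)
    return paths


# Toy lane graph: A forks into B and C; B continues into D.
successors = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
lane_length = {lane: 10.0 for lane in successors}
print(dfs_reachable_paths(successors, lane_length, "A", max_length=25.0))
# [['A', 'B', 'D'], ['A', 'C']]
```

Each returned lane sequence would then serve as one reference path \(\mathcal{P}_i\) along which trajectories are sampled.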
The learning-based evaluator first encodes the scene context given by \((\mathcal{P}, \mathcal{T}, \mathcal{S})\),
including \(l\) paths in \(\mathcal{P}\), \((m+1)\) history tracks in \(\mathcal{S}\), and \(n\) future trajectories in \(\mathcal{T}\).
The implicit agent-map interactions are learned in the subsequent attention modules:
P2T and P2F propagate the spatial information of each reference path \(\mathcal{P}_i\) into the history tracks and the corresponding future trajectories,
and A2A takes the track tensors from P2T to capture multi-agent interactions.
Because our dual spatial representation uses a path-based Frenet coordinate, P2T, P2F, and A2A operate separately on each path,
while F2F fuses all the future trajectories processed by P2F to obtain a global understanding of the reachable space.
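This page does not detail the attention modules themselves. As a rough illustration of how one set of tokens (e.g. history tracks in P2T) can absorb information from another (e.g. the segments of one reference path), here is single-head scaled dot-product cross-attention in NumPy. The single head, the residual add, the random weights, and all shapes are assumptions made for illustration; PRIME's actual P2T, P2F, A2A, and F2F layers may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention with a residual add.

    queries: (Nq, D) tokens receiving information (e.g. track features)
    context: (Nc, D) tokens providing it (e.g. path-segment features)
    Wq, Wk:  (D, Dk) projections; Wv: (D, D) so the residual shapes match.
    """
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)  # (Nq, Nc)
    return queries + weights @ V, weights


rng = np.random.default_rng(0)
D, Dk = 32, 16
tracks = rng.normal(size=(5, D))          # e.g. m + 1 = 5 history tracks
path_segments = rng.normal(size=(20, D))  # encoded segments of one path P_i
Wq, Wk = rng.normal(size=(D, Dk)), rng.normal(size=(D, Dk))
Wv = rng.normal(size=(D, D)) / np.sqrt(D)
updated, w = cross_attention(tracks, path_segments, Wq, Wk, Wv)
print(updated.shape, w.shape)  # (5, 32) (5, 20)
```

Running the same function once per reference path mirrors the per-path operation described above; a module like F2F would instead attend across the futures of all paths.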
Subsequently, each feasible trajectory \(\mathcal{T}_{i,j}\) queries its track tensor \(\mathbf{X}_i(\mathbf{s}_{tar})\) from P2T,
its interaction tensor \(\mathbf{Y}_i(\mathbf{s}_{tar})\) from A2A, and its future tensor \(\mathbf{Z}(\mathcal{T}_{i,j})\) from F2F,
and is scored by feeding the concatenation of these tensors into fully-connected layers.
Finally, the evaluator ranks all feasible future trajectories in \(\mathcal{T}\) by score and outputs a final set of \(K\) predicted trajectories.
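The scoring and ranking step can be sketched as follows, assuming a two-layer ReLU MLP as the fully-connected head and a softmax over all candidate scores to produce the probabilities visualized in the figures. The function name `score_and_select`, the layer sizes, the random weights, and the default \(K = 6\) (a common choice on Argoverse) are illustrative assumptions.

```python
import numpy as np


def score_and_select(X, Y, Z, path_index, W1, b1, W2, b2, K=6):
    """Score every candidate with a 2-layer MLP and keep the top K.

    X, Y: (l, D)  target's track / interaction tensors, one row per path
    Z:    (n, Dz) future tensor of each candidate trajectory
    path_index: (n,) index i of the reference path candidate T_{i,j} follows
    """
    # Each candidate gathers X_i and Y_i of its own path plus its own Z.
    feats = np.concatenate([X[path_index], Y[path_index], Z], axis=1)
    hidden = np.maximum(feats @ W1 + b1, 0.0)  # ReLU
    scores = (hidden @ W2 + b2).ravel()        # one scalar per candidate
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                       # softmax over all n candidates
    top = np.argsort(-scores)[:K]              # highest score first
    return top, probs[top]


rng = np.random.default_rng(0)
l, n, D, Dz, H = 3, 40, 16, 16, 32
X, Y = rng.normal(size=(l, D)), rng.normal(size=(l, D))
Z = rng.normal(size=(n, Dz))
path_index = rng.integers(0, l, size=n)
W1, b1 = rng.normal(size=(2 * D + Dz, H)) * 0.1, np.zeros(H)
W2, b2 = rng.normal(size=(H, 1)) * 0.1, np.zeros(1)
top, p = score_and_select(X, Y, Z, path_index, W1, b1, W2, b2)
```

Because the returned indices point into the generated set, the selected trajectories are by construction a subset of \(\mathcal{T}\) and inherit its feasibility; the network only chooses among candidates and never edits them.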
Quantitative Results
Qualitative Results
Qualitative results under various scenarios on the Argoverse validation set. The model-based generator produces the set of future trajectories \(\mathcal{T}\) (blue) with guaranteed feasibility, which effectively regularizes the target vehicle's future trajectory space. The learning-based evaluator selects \(K\) trajectories from \(\mathcal{T}\) as multimodal prediction results (red), and the intensity of the red indicates their probability.
More Comparisons
Qualitative comparisons between ours (left) and LaneGCN (right) on the Argoverse validation set, with the same coloring scheme.
Here, we use the current state-of-the-art method, LaneGCN,
as a representative of typical prediction models that generate unconstrained trajectories with neural networks.
We show some common failures, including kinematically and environmentally infeasible predictions.
Due to kinematic constraints, vehicles cannot turn suddenly at high speed (1st row) or reverse their direction of motion (2nd row).
Likewise, predictions that turn across lane boundaries (3rd row) or head into opposing lanes (4th row) violate environmental constraints.
Such infeasible predictions place an unnecessary burden on an AV's decision-making and motion planning.
By contrast, the future trajectory set (blue) produced by our model-based generator
is explicitly regularized by kinematic and environmental constraints, which enables accurate and reasonable future predictions (red).
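The kinematic failure modes listed above (sharp turns at high speed, reversing) can be detected with simple finite-difference checks on a sampled trajectory. The sketch below is a hypothetical post-hoc filter, not part of PRIME, which instead avoids such trajectories by construction. The thresholds `a_lon_max` and `a_lat_max` are placeholder values, and environmental constraints such as lane boundaries are not checked.

```python
import numpy as np


def is_kinematically_feasible(xy, dt=0.1, a_lon_max=4.0, a_lat_max=4.0,
                              eps=1e-3):
    """Heuristic check of an (N, 2) trajectory sampled every dt seconds.

    Thresholds are hypothetical placeholders, not values from the paper.
    """
    vel = np.diff(xy, axis=0) / dt  # (N-1, 2) finite-difference velocity
    speed = np.linalg.norm(vel, axis=1)

    # 1) Longitudinal acceleration bound.
    if np.any(np.abs(np.diff(speed) / dt) > a_lon_max):
        return False

    # 2) Lateral acceleration a_lat = v * yaw_rate (= v^2 * curvature):
    #    a sudden turn at high speed produces a huge yaw rate.
    heading = np.arctan2(vel[:, 1], vel[:, 0])
    dtheta = np.angle(np.exp(1j * np.diff(heading)))  # wrap to (-pi, pi]
    v_mid = 0.5 * (speed[1:] + speed[:-1])
    moving = (speed[1:] > eps) & (speed[:-1] > eps)   # heading undefined at rest
    if np.any(np.abs(v_mid * dtheta / dt)[moving] > a_lat_max):
        return False

    # 3) No reversing: consecutive non-zero velocities must not point in
    #    opposite directions (stationary steps in between are skipped).
    nonzero = vel[speed > eps]
    if len(nonzero) > 1 and np.any(np.sum(nonzero[1:] * nonzero[:-1], axis=1) < 0):
        return False
    return True


straight = np.c_[np.arange(31) * 1.0, np.zeros(31)]  # 10 m/s along x
print(is_kinematically_feasible(straight))  # True
```

A 90-degree turn taken in a single 0.1 s step at 20 m/s fails check 2, and a trajectory that slows to a stop and then backs up fails check 3.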
BibTeX
@article{song2021learning,
  title={Learning to Predict Vehicle Trajectories with Model-based Planning},
  author={Song, Haoran and Luan, Di and Ding, Wenchao and Wang, Michael Yu and Chen, Qifeng},
  journal={arXiv preprint arXiv:2103.04027},
  year={2021}
}