DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction

Weiyi Lv1*, Yuhang Huang2*, Ning Zhang3, Ruei-Sung Lin3, Mei Han3, Dan Zeng1✉,
1Shanghai University, 2National University of Defense Technology, 3PAII Inc.

overview

Abstract

overview

overview

In Multiple Object Tracking, objects often exhibit non-linear motion of acceleration and deceleration, with irregular direction changes. Tacking-by-detection (TBD) trackers with Kalman Filter motion prediction work well in pedestrian-dominant scenarios but fall short in complex situations when multiple objects perform non-linear and diverse motion simultaneously. To tackle the complex non-linear motion, we propose a real-time diffusion-based MOT approach named DiffMOT. Specifically, for the motion predictor component, we propose a novel Decoupled Diffusion-based Motion Predictor. It models the entire distribution of various motion presented by the data as a whole. It also predicts an individual object's motion conditioning on an individual's historical motion information. Furthermore, it optimizes the diffusion process with much less sampling steps. As a MOT tracker, the DiffMOT is real-time at 22.7FPS, and also outperforms the state-of-the-art on DanceTrack and SportsMOT datasets with 62.3% and 76.2% in HOTA metrics, respectively. To the best of our knowledge, DiffMOT is the first to introduce a diffusion probabilistic model into the MOT to tackle non-linear motion prediction.

DanceTrack Results




SportsMOT Results

BibTeX

@inproceedings{lv2024diffmot,
      title={DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction},
      author={Lv, Weiyi and Huang, Yuhang and Zhang, Ning and Lin, Ruei-Sung and Han, Mei and Zeng, Dan},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages={19321--19330},
      year={2024}
    }