Trajectory-Aware Motion Generation for Enhanced Naturalness in Interactive Applications

Xuan Liu
Shaojun Yuan
Zhiyang Zhang
Xiangyu Qu
Yidian Liu
Chaomurilige
Zheng Liu
Shan Jiang

Abstract

Human motion generation is a pivotal task in data generation, and trajectory-guided methods have become a prominent approach because they offer precise control over motion outcomes. However, balancing motion naturalness against trajectory accuracy remains a significant challenge. In this paper, we present the Trajectory-Aware Motion Generator (TAMG), a novel method that addresses this challenge. TAMG integrates third-order dynamic features, namely position, velocity, and acceleration, to enhance the naturalness of generated motions while maintaining precise trajectory control. We propose a multimodal feature fusion strategy that incorporates biomechanical features to ensure accurate motion representation, together with a sparse sampling strategy based on a motion-importance distribution that focuses on the key phases of joint motion. Extensive experiments validate the effectiveness of TAMG, demonstrating superior trajectory accuracy and motion quality compared with existing methods. The approach offers a simple, effective solution for interactive motion generation tasks and advances the state of the art in trajectory-guided motion generation.
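The abstract's two core ingredients, third-order dynamic features and importance-weighted sparse sampling, can be illustrated with a short sketch. This is a minimal, hypothetical example rather than the authors' implementation: the function names, the assumed 20 fps frame interval, and the velocity-magnitude importance score are all assumptions, since TAMG's actual feature fusion and sampling are only described at a high level here.

```python
import numpy as np

def dynamic_features(positions, dt=1.0 / 20.0):
    """Stack position, velocity, and acceleration for a joint trajectory.

    positions: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    dt: frame interval in seconds (assumed 20 fps; not stated in the abstract).
    Returns an array of shape (T, J, 9) concatenating the three dynamic orders.
    """
    velocity = np.gradient(positions, dt, axis=0)      # first-order finite difference
    acceleration = np.gradient(velocity, dt, axis=0)   # second-order finite difference
    return np.concatenate([positions, velocity, acceleration], axis=-1)

def importance_sample(positions, num_keyframes=16):
    """Pick frames where joint motion is largest, as a stand-in for a
    motion-importance distribution (the scoring rule here is hypothetical)."""
    velocity = np.gradient(positions, axis=0)
    score = np.linalg.norm(velocity, axis=(1, 2))      # per-frame motion magnitude
    prob = score / score.sum()                         # normalize to a distribution
    idx = np.random.choice(len(positions), size=num_keyframes, replace=False, p=prob)
    return np.sort(idx)
```

For a clip `positions` of shape (T, J, 3), `dynamic_features(positions)` yields a (T, J, 9) array that a fusion module could consume alongside other modalities, while `importance_sample(positions, 16)` returns 16 frame indices biased toward high-motion phases, mirroring the idea of concentrating supervision on the key phases of joint motion.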

Article Details

Liu, X., Yuan, S., Zhang, Z., Qu, X., Liu, Y., Chaomurilige, … Jiang, S. (2025). Trajectory-Aware Motion Generation for Enhanced Naturalness in Interactive Applications. Journal of Artificial Intelligence Research and Innovation, 085–093. https://doi.org/10.29328/journal.jairi.1001010
Research Articles

Copyright (c) 2025 Liu X, et al.

This work is licensed under a Creative Commons Attribution 4.0 International License.
