Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion

Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

18 Citations (Scopus)

Abstract

Recent text-to-video diffusion models have achieved impressive progress. In practice, users often want to control object motion and camera movement independently for customized video creation. However, current methods offer little support for controlling object motion and camera movement separately in a decoupled manner, which limits the controllability and flexibility of text-to-video models. In this paper, we introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as the camera's pan and zoom movements, as if directing a video. We propose a simple yet effective strategy for the decoupled control of object motion and camera movement. Object motion is controlled through spatial cross-attention modulation using the model's inherent priors, requiring no additional optimization. For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters. We further employ an augmentation-based approach to train these layers in a self-supervised manner on a small-scale dataset, eliminating the need for explicit motion annotation. Both components operate independently, allowing individual or combined control, and generalize to open-domain scenarios. Extensive experiments demonstrate the superiority and effectiveness of our method. © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
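To make the two mechanisms described in the abstract concrete, the following is a minimal PyTorch sketch of how they might be structured. It is not the authors' released code: all module names, tensor shapes, and the parameterization of camera movement as (pan_x, pan_y, zoom) scalars are assumptions based only on the abstract.

import torch
import torch.nn as nn


class CameraEmbedder(nn.Module):
    """Map quantitative camera parameters (pan_x, pan_y, zoom) to per-frame tokens."""

    def __init__(self, dim: int, num_frames: int):
        super().__init__()
        self.num_frames = num_frames
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, cam_params: torch.Tensor) -> torch.Tensor:
        # cam_params: (B, 3). Scale each parameter by the fraction of the clip
        # elapsed, so frame f carries the movement accumulated up to frame f.
        t = torch.linspace(0.0, 1.0, self.num_frames, device=cam_params.device)
        per_frame = cam_params[:, None, :] * t[None, :, None]  # (B, F, 3)
        return self.mlp(per_frame)                             # (B, F, dim)


class TemporalCameraCrossAttention(nn.Module):
    """Newly added temporal cross-attention: video features (queries, along the
    time axis) attend to the per-frame camera tokens (keys/values)."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, cam_tokens: torch.Tensor) -> torch.Tensor:
        # x: (B*HW, F, dim), features flattened so attention runs over time;
        # cam_tokens: (B, F, dim), repeated to match every spatial position.
        hw = x.shape[0] // cam_tokens.shape[0]
        kv = cam_tokens.repeat_interleave(hw, dim=0)
        out, _ = self.attn(self.norm(x), kv, kv)
        return x + out  # residual; zero-initializing the output projection
                        # would make the new layer a no-op at the start of training


def modulate_object_attention(attn, obj_token_ids, box_mask, boost=2.0):
    """Training-free spatial cross-attention modulation: amplify attention
    between an object's text tokens and the pixels inside its user-drawn box,
    attenuate it outside, then renormalize over the text tokens.
    attn: (B, HW, T) softmaxed cross-attention map; box_mask: (B, HW) in {0, 1}."""
    scale = torch.where(box_mask.bool(), boost, 1.0 / boost)  # (B, HW)
    attn = attn.clone()
    for tid in obj_token_ids:
        attn[:, :, tid] = attn[:, :, tid] * scale
    return attn / attn.sum(dim=-1, keepdim=True)

In a full pipeline these pieces would presumably sit inside the denoising U-Net: the attention modulation is applied to each spatial cross-attention map at sampling time (no optimization), while the temporal cross-attention layers are the only newly trained parameters, matching the abstract's claim that the two controls operate independently.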
Original language: English
Title of host publication: SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers
Publisher: Association for Computing Machinery
ISBN (Print): 9798400705250
DOIs
Publication status: Published - Jul 2024
Event: 51st International Conference & Exhibition on Computer Graphics & Interactive Techniques (SIGGRAPH 2024) - Colorado Convention Center, Denver, United States
Duration: 28 Jul 2024 - 1 Aug 2024
https://s2024.siggraph.org/

Publication series

Name: Proceedings - SIGGRAPH Conference Papers

Conference

Conference: 51st International Conference & Exhibition on Computer Graphics & Interactive Techniques (SIGGRAPH 2024)
Place: United States
City: Denver
Period: 28/07/24 - 1/08/24
Internet address: https://s2024.siggraph.org/

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Funding

This work was supported by a GRF grant (Project No. CityU 11208123) from the Research Grants Council (RGC) of Hong Kong, and research funding from Kuaishou Technology.

Research Keywords

  • diffusion model
  • motion control
  • text-to-video generation

RGC Funding Information

  • RGC-funded
