Abstract
Recent text-to-video diffusion models have achieved impressive progress. In practice, users often desire the ability to control object motion and camera movement independently for customized video creation. However, current methods do not support decoupled control of object motion and camera movement, which limits the controllability and flexibility of text-to-video models. In this paper, we introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as the camera's pan and zoom movements, as if directing a video. We propose a simple yet effective strategy for the decoupled control of object motion and camera movement. Object motion is controlled through spatial cross-attention modulation using the model's inherent priors, requiring no additional optimization. For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters. We further employ an augmentation-based approach to train these layers in a self-supervised manner on a small-scale dataset, eliminating the need for explicit motion annotation. Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios. Extensive experiments demonstrate the superiority and effectiveness of our method. © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
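The object-motion pathway described in the abstract can be caricatured in a few lines: a pre-softmax modulation that amplifies the attention between an object's text token and the query pixels inside its user-drawn box, and attenuates it elsewhere. This is a minimal sketch under assumed shapes and an assumed additive ±strength rule; the function names and details are illustrative, not the paper's actual implementation.

```python
import numpy as np

def modulate_cross_attention(scores, box, token_idx, strength=5.0):
    """Hypothetical sketch of spatial cross-attention modulation.

    Boosts the pre-softmax attention logits between one object's text
    token and the latent pixels inside a user-specified box, and
    suppresses them outside it, so the softmax concentrates the
    object's attention within the box.

    scores:    (H*W, num_text_tokens) pre-softmax attention logits,
               assumed to come from a square latent grid
    box:       (x0, y0, x1, y1) in [0, 1] normalized coordinates
    token_idx: index of the object's token in the text prompt
    """
    hw = scores.shape[0]
    side = int(np.sqrt(hw))  # assume a square H x W latent grid
    mask = np.zeros((side, side), dtype=bool)
    x0, y0, x1, y1 = (int(round(c * side)) for c in box)
    mask[y0:y1, x0:x1] = True
    mask = mask.reshape(-1)

    out = scores.copy()
    out[mask, token_idx] += strength   # amplify inside the box
    out[~mask, token_idx] -= strength  # suppress outside the box
    return out

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)
```

For example, on an 8x8 grid with a box covering the top-left quadrant, the softmaxed attention for the chosen token becomes near 1 inside the box and near 0 outside, which is the qualitative effect the modulation is after. The camera-movement branch (temporal cross-attention over pan/zoom parameters) adds trained layers and is not captured by this training-free sketch.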
| Original language | English |
|---|---|
| Title of host publication | SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers |
| Publisher | Association for Computing Machinery |
| ISBN (Print) | 9798400705250 |
| DOIs | |
| Publication status | Published - Jul 2024 |
| Event | 51st International Conference & Exhibition on Computer Graphics & Interactive Techniques (SIGGRAPH 2024), Colorado Convention Center, Denver, United States, 28 Jul 2024 → 1 Aug 2024, https://s2024.siggraph.org/ |
Publication series
| Name | Proceedings - SIGGRAPH Conference Papers |
|---|
Conference
| Conference | 51st International Conference & Exhibition on Computer Graphics & Interactive Techniques (SIGGRAPH 2024) |
|---|---|
| Place | United States |
| City | Denver |
| Period | 28/07/24 → 1/08/24 |
| Internet address | https://s2024.siggraph.org/ |
Bibliographical note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Funding
This work was supported by a GRF grant (Project No. CityU 11208123) from the Research Grants Council (RGC) of Hong Kong, and research funding from Kuaishou Technology.
Research Keywords
- diffusion model
- motion control
- Text-to-video generation
RGC Funding Information
- RGC-funded
Projects
GRF: Text-to-3D Generation and Manipulation with Neural Radiance Field Representation
LIAO, J. (Principal Investigator / Project Coordinator)
1/01/24 → …
Project: Research