FateZero
This zero-shot video editing framework uses pre-trained diffusion models for text-based modifications, preserving video structure and motion by using intermediate attention maps. It enhances consistency with spatial-temporal attention, offering style and attribute changes in videos. The method allows for shape-aware adjustments, as demonstrated through empirical evaluations.