Model | Video Quality | Temporal Quality | Motion Quality | Text Alignment | Ethical Robustness | Human Preference |
---|---|---|---|---|---|---|
Pre-training LRAs | | | | | | |
Gen2 | 3.33 (1) | 2.63 (1) | 2.03 (1) | 1.57 (1) | 1.36 (1) | 2.87 (1) |
Pika | 1.11 (2) | 1.71 (2) | 1.37 (2) | 1.03 (3) | 1.08 (3) | 1.21 (2) |
Latte | 0.67 (5) | 0.79 (5) | 0.84 (5) | 1.03 (4) | 1.00 (5) | 0.77 (5) |
TF-T2V | 0.76 (3) | 1.09 (3) | 1.01 (4) | 0.90 (5) | 1.06 (4) | 0.87 (4) |
VideoCrafter2 | 0.72 (4) | 0.92 (4) | 1.06 (3) | 1.24 (2) | 1.12 (2) | 0.91 (3) |
Post-training LRAs | | | | | | |
Gen2 | 2.71 (1) | 2.37 (1) | 2.16 (1) | 2.71 (1) | 2.57 (1) | 2.96 (1) |
Pika | 1.16 (2) | 1.34 (2) | 1.24 (2) | 1.12 (3) | 1.18 (3) | 1.24 (2) |
Latte | 0.82 (5) | 0.89 (4) | 0.89 (4) | 1.43 (2) | 1.42 (2) | 0.89 (3) |
TF-T2V | 0.91 (3) | 1.00 (3) | 0.95 (3) | 0.82 (4) | 0.86 (4) | 0.85 (4) |
VideoCrafter2 | 0.82 (4) | 0.83 (5) | 0.89 (5) | 0.68 (5) | 0.73 (5) | 0.76 (5) |
AMT Annotators | | | | | | |
Gen2 | 2.25 (1) | 2.29 (1) | 2.11 (1) | 2.76 (1) | 3.14 (1) | 2.73 (1) |
Pika | 1.09 (2) | 1.21 (2) | 1.23 (2) | 1.00 (3) | 0.82 (3) | 1.04 (2) |
Latte | 0.80 (5) | 0.88 (4) | 0.89 (4) | 1.40 (2) | 1.29 (2) | 0.87 (3) |
TF-T2V | 0.90 (3) | 0.88 (3) | 0.91 (3) | 0.71 (4) | 0.49 (4) | 0.71 (4) |
VideoCrafter2 | 0.86 (4) | 0.76 (5) | 0.87 (5) | 0.51 (5) | 0.29 (5) | 0.56 (5) |
Post-training LRAs (Dyn) | | | | | | |
Gen2 | 2.75 (1) | 2.42 (1) | 2.30 (1) | 2.90 (1) | 2.66 (1) | 2.98 (1) |
Pika | 1.22 (2) | 1.46 (2) | 1.35 (2) | 1.21 (3) | 1.23 (3) | 1.31 (2) |
Latte | 0.86 (5) | 0.97 (4) | 0.92 (4) | 1.62 (2) | 1.53 (2) | 0.98 (3) |
TF-T2V | 0.92 (3) | 1.01 (3) | 1.00 (3) | 0.86 (4) | 0.91 (4) | 0.89 (4) |
VideoCrafter2 | 0.87 (4) | 0.86 (5) | 0.88 (5) | 0.69 (5) | 0.76 (5) | 0.81 (5) |