On Tuesday at Google I/O 2024, Google announced Veo, a new AI video-synthesis model that can create HD videos from text, image, or video prompts, similar to OpenAI’s Sora. It can generate 1080p videos lasting over a minute and edit videos from written instructions, but it has not yet been released for broad use.

  • @mindbleach@sh.itjust.works
    11 months ago

    it can generate 1080p videos lasting over a minute

    Any length limit is a sign you’re doing it wrong. You don’t need every single frame in-memory at the same time to figure out what any specific frame should look like. Local frames matter for fine changes. Further frames matter for continued movement. Distant frames matter for continuity. It should be possible to scan across an arbitrarily long sequence and gradually remove its flaws.

    … though admittedly, once you get to about five minutes, you’ve covered nearly one hundred percent of all shots in film and television. One minute is already long enough for a human editor to work with. (And evidently people hate the idea of a robot churning out a whole finished product, cuts and all.) But if the network only needs a few seconds at a time, it’ll be faster to train and easier to run.
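A minimal sketch of that scanning idea follows. It assumes frames can be fetched one at a time by index, with plain floats standing in for frames, and it uses a hypothetical "blend toward context" step in place of whatever a real model would do. Each output frame reads only a bounded set of local neighbours, strided neighbours, and periodic anchor frames, so the working set stays the same size however long the sequence is. The window sizes, weights, and anchor spacing (`local`, `stride`, `far`, `anchor_every`, `WEIGHTS`) are made-up illustration values, not anything Veo or Sora is known to use.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical per-role weights: nearby frames count most, anchors least.
WEIGHTS = {"local": 0.5, "further": 0.3, "distant": 0.2}


def context_indices(i: int, n: int, local: int = 2, stride: int = 8,
                    far: int = 2, anchor_every: int = 240) -> Dict[str, List[int]]:
    """Pick the frames that inform frame i, grouped by role.

    local:   immediate neighbours (fine changes)
    further: strided neighbours (continued movement)
    distant: nearest periodic anchor frames (long-range continuity)
    """
    local_idx = [j for j in range(i - local, i + local + 1)
                 if j != i and 0 <= j < n]
    further_idx = [i + k * stride for k in range(-far, far + 1)
                   if k != 0 and 0 <= i + k * stride < n]
    # Hypothetical keyframe scheme: the anchors just before and after i.
    prev_anchor = (i // anchor_every) * anchor_every
    next_anchor = prev_anchor + anchor_every
    distant_idx = [a for a in (prev_anchor, next_anchor) if 0 <= a < n and a != i]
    return {"local": local_idx, "further": further_idx, "distant": distant_idx}


def refine_pass(get_frame: Callable[[int], float], n: int,
                strength: float = 0.5, **window) -> Iterator[float]:
    """One refinement pass over an arbitrarily long sequence.

    Frames are read through get_frame, so only the context for the current
    frame is touched; refined frames are yielded one at a time and can be
    written straight back to storage for the next pass.
    """
    for i in range(n):
        ctx = context_indices(i, n, **window)
        target, total = 0.0, 0.0
        for role, idxs in ctx.items():
            for j in idxs:
                w = WEIGHTS[role] / len(idxs)  # split each role's weight evenly
                target += w * get_frame(j)
                total += w
        current = get_frame(i)
        if total == 0.0:
            yield current  # no context available (sequence of length 1)
        else:
            # Nudge the frame part of the way toward its weighted context.
            yield current + strength * (target / total - current)
```

For example, `list(refine_pass(lambda j: frames[j], len(frames)))` halves a one-frame glitch in an otherwise flat sequence; running the pass repeatedly on its own output would keep shrinking flaws a little each time, which is the "gradually remove its flaws" part.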