
Sora Serves as a Foundation for Models that can Understand and Simulate the Real World: OpenAI

“The world is multimodal. If you think about the way we, as humans, process the world and engage with the world, we see things, we hear things, we say things—the world is much bigger than text.”

SMEBR | February 16, 19:31

Photo Credit: OpenAI

OpenAI, which gained prominence last year with the widespread adoption of ChatGPT, is now expanding its artificial intelligence technology into the field of video production.

Introduced on Thursday, Sora represents OpenAI's latest generative AI model, functioning much like the company's image-generation AI tool, DALL-E. Users input their desired scenario, and Sora generates a high-definition video clip in response. Moreover, Sora can create video clips inspired by still images and can extend existing videos or fill in missing frames.

As chatbots and image generators have become commonplace in both consumer and business contexts, video creation emerges as the next frontier for generative AI. While this advancement offers exciting creative possibilities, it also raises concerns about misinformation, particularly with major political elections approaching globally. According to data from Clarity, a machine learning firm, the production of AI-generated deepfakes has increased by 900% year over year.

With Sora, OpenAI aims to compete with video-generation AI tools from companies like Meta and Google, the latter of which unveiled Lumiere in January. Similar tools are also offered by the startup Stability AI, with its Stable Video Diffusion, and by Amazon, whose Create with Alexa specializes in generating prompt-based short-form animated content for children.

At present, Sora is limited to generating videos of one minute or less in duration. Backed by Microsoft, OpenAI is committed to multimodality—the integration of text, image, and video generation—as it expands its suite of AI models.

In the words of OpenAI COO Brad Lightcap, speaking to CNBC in November, “The world is multimodal. If you think about the way we, as humans, process the world and engage with the world, we see things, we hear things, we say things—the world is much bigger than text. So to us, it always felt incomplete for text and code to be the single modalities, the single interfaces that we could have to how powerful these models are and what they can do.”

To date, Sora has only been accessible to a select group of safety testers, known as 'red teamers,' tasked with identifying vulnerabilities such as misinformation and bias. OpenAI has not publicly demonstrated Sora beyond 10 sample clips available on its website, with an accompanying technical paper set to be released later on Thursday.

OpenAI also disclosed plans to develop a 'detection classifier' capable of identifying Sora-generated video clips, along with including specific metadata in its output to aid in identifying AI-generated content. This metadata aligns with Meta's efforts to identify AI-generated images during this election year.

Sora operates as a diffusion model: it starts from what looks like static noise and gradually removes that noise over many steps until a coherent video emerges. Like ChatGPT, it is built on the Transformer architecture, originally introduced by Google researchers in a 2017 paper.
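To make the diffusion idea concrete, here is a deliberately toy sketch of the reverse-diffusion loop: begin with pure noise and repeatedly subtract an estimate of the remaining noise. In a real system such as Sora, that estimate comes from a trained Transformer network; the `predict_noise` function below is a hypothetical stand-in, not OpenAI's actual method.

```python
import numpy as np

def toy_denoise(x_noisy, steps=10):
    """Illustrative reverse-diffusion loop.

    A real diffusion model would call a learned network at each step
    to predict the noise present in the sample; here we fake that
    prediction so the loop structure is visible on its own.
    """
    def predict_noise(x, t):
        # Stand-in for a trained network's noise estimate (assumption:
        # we simply return a scaled copy of the current sample).
        return 0.1 * x

    x = x_noisy
    for t in reversed(range(steps)):
        # Each step removes a little of the estimated noise,
        # moving the sample from static toward a clean output.
        x = x - predict_noise(x, t)
    return x

# A diffusion sampler starts from pure Gaussian noise.
start = np.random.default_rng(0).standard_normal((4, 4))
sample = toy_denoise(start)
```

The loop shape is the point: generation is iterative refinement of noise, which is why diffusion models can also extend existing videos or fill in missing frames by starting the process from partially known content rather than pure noise.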

As stated by OpenAI in its announcement, “Sora serves as a foundation for models that can understand and simulate the real world.”