The pace of artificial intelligence development is accelerating beyond easy comprehension, and OpenAI's text-to-video system is the latest AI technology to shake the tech world, showing that advances in the field are arriving sooner than expected.
OpenAI, best known for its AI chatbot ChatGPT, unveiled its new text-to-video AI tool, called Sora, in recent weeks. The tool is not yet publicly available and only a limited group of testers can use it, but the published demo videos show a significant improvement over other text-to-video tools: the resulting clips look strikingly realistic. That is a prospect both exciting and worrying at the same time.
What is OpenAI Sora?
Like other generative AI tools such as DALL-E and Midjourney, Sora takes a text prompt from you and turns it into imagery. But unlike those image generators, Sora's output is a complete video clip, with motion, varied camera angles, direction, and everything else you'd expect from a traditionally produced video.
Looking at the samples on Sora's website, the end results are often indistinguishable from real, professionally produced video. The examples span everything from expensive-looking drone footage to clips resembling multi-million-dollar productions, complete with AI-generated actors and special effects.
Of course, Sora is not the first technology to produce video from text. Until now the most prominent example was Runway, which offers its services to the public for a fee. Even at their best, however, Runway's videos look more like early-generation Midjourney stills set in motion: there is no image stabilization, the physics often don't make sense, and the longest clip it currently offers is 16 seconds.
Lumiere, released a few weeks earlier, claimed to produce better videos than its predecessors, but Sora appears to be more powerful than Lumiere in at least some cases. Sora can produce videos at resolutions up to 1920 x 1080 pixels and in various aspect ratios.
Sora's best output is quite stable, its physics looks plausible to the human eye, and clips can be up to a minute long. The videos Sora generates are silent, but other AI systems can generate music, sound effects, and speech for you to layer over your AI-generated videos.
As such, the huge leap Sora has made over previous generations of AI-generated video cannot be ignored. Only a year ago, artificial intelligence was producing completely unrealistic videos; now the preview of Sora has sent a shock through professionals in the visual arts. Sora is likely to affect the entire video industry, from solo video makers to big-budget projects like those of Disney and Marvel. This may be the true beginning of the synthetic film industry.
How does Sora work?
We will review how Sora produces video as best we can, but we cannot go into full detail, for two reasons. First, OpenAI does not discuss the inner workings of its technology: it is proprietary, so the secret details that set Sora apart from the competition are not known. Second, the fine details would mainly interest computer scientists, so we describe only how the technology works in general terms.
Fortunately, Mike Young has published a thorough explanation of Sora's technology on Medium, based on OpenAI's technical report, and we review the most important points here.
Sora builds on the lessons OpenAI learned while creating technologies like ChatGPT and DALL-E. It generates videos by breaking them down into small patches that play a role similar to the tokens used to train ChatGPT. Because these patches are all the same size, properties like clip length, aspect ratio, and resolution don't matter to Sora.
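To make the patch idea concrete, here is a toy sketch of how a video tensor might be cut into fixed-size spacetime patches and flattened into token-like rows. All names, shapes, and patch sizes here are illustrative assumptions for the sake of the example, not OpenAI's actual code or parameters:

```python
import numpy as np

def video_to_patches(video, pt=4, ph=16, pw=16):
    """Split a video tensor (frames, height, width, channels) into
    fixed-size spacetime patches, each flattened into one token-like row.
    Patch sizes pt/ph/pw are arbitrary illustrative choices."""
    t, h, w, c = video.shape
    # Trim so the video divides evenly into patches (a simplification).
    video = video[: t - t % pt, : h - h % ph, : w - w % pw]
    t, h, w, c = video.shape
    patches = (
        video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
             .transpose(0, 2, 4, 1, 3, 5, 6)   # group each patch's pixels together
             .reshape(-1, pt * ph * pw * c)    # one flat row per spacetime patch
    )
    return patches

clip = np.zeros((16, 64, 64, 3))   # a tiny dummy clip: 16 frames of 64x64 RGB
tokens = video_to_patches(clip)
print(tokens.shape)                # (64, 3072): 64 patches of 4*16*16*3 values
```

Because every patch has the same flattened size regardless of the source clip's length or resolution, a model consuming these rows never needs to care about aspect ratio or duration, which is the point the paragraph above makes.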
In fact, this text-to-video generator uses the same transformer architecture found in AI language models such as ChatGPT and Google Gemini. Transformers were first introduced by Google in 2017. While they were originally designed to find patterns in tokens representing text, Sora uses tokens that represent small chunks of space and time.
During training, Sora is shown noisy, partially corrupted patch tokens from a video and tries to predict the clean, noise-free versions. By comparing its predictions against the ground truth, the model learns what real video looks like and assembles these patches into a complete clip. It is this training process that makes the examples on Sora's website look so authentic and real.
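The corrupt-then-restore training objective described above can be caricatured in a few lines. This is a deliberately simplified sketch of denoising-style training under assumed names and a stand-in "model"; the real system is far more elaborate:

```python
import numpy as np

rng = np.random.default_rng(0)

def training_step(clean_patches, model, noise_level=0.5):
    """One caricature of a denoising training step: corrupt the clean
    patch tokens with Gaussian noise, let the model try to recover the
    clean tokens, and score the attempt against the ground truth."""
    noise = rng.normal(scale=noise_level, size=clean_patches.shape)
    noisy = clean_patches + noise
    predicted = model(noisy)                           # model's denoising attempt
    loss = np.mean((predicted - clean_patches) ** 2)   # ground-truth comparison
    return loss

# A do-nothing "model" that returns its input unchanged: the loss it
# incurs is simply the energy of the injected noise.
patches = rng.normal(size=(8, 32))
loss = training_step(patches, model=lambda x: x)
```

In real training, minimizing this kind of loss over millions of videos is what pushes the model's predictions toward plausible, noise-free video patches.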
Aside from this remarkable ability, Sora is also trained on videos with very detailed frame-by-frame captions, which is a big part of why the tool can modify generated videos based on text requests.
Sora's ability to accurately simulate the physics in videos appears to be an emergent capability, stemming solely from training on millions of videos that contain motion governed by real-world physics. Object persistence is very good in Sora: even when objects move out of the frame or are temporarily obscured by something else, they persist and reappear consistently when they come back into view.
However, Sora sometimes struggles to model how objects interact: when things in a video touch or connect, it can misjudge the interaction when generating subsequent frames, for instance by duplicating objects. Sora also seems to confuse left and right at times. Even so, what has been shown of Sora's power so far is not just usable right now, it is quite advanced.
When will you have access to Sora?
We're all very excited to use Sora, and we'll definitely be writing more about how useful this technology can be in the future. But when will this happen?
It's not yet clear exactly how long it will take for Sora to become available to the public, or how much it will cost. According to OpenAI, the technology is currently in the hands of a red team: a group of people tasked with trying to get Sora to do all the wrong things it shouldn't, and then helping to put in place protections against the kinds of misuse real customers might attempt. These include the potential to create false information, generate offensive or violent content, and many other conceivable abuses.
In addition, some select content creators already have access to it, which appears to be both for testing purposes and to receive various third-party comments and approvals that could eventually lead to its final release. As a result, the public release time of Sora is not yet known. That's because if it's in the hands of safety testers now, problems may be discovered that take longer to fix than expected, thus delaying its public release.
The fact that OpenAI feels ready to show off Sora's capabilities, and has even taken public video-generation requests through X, suggests the company believes the final product is nearly ready. But until the safety picture is clearer and public feedback has been gathered, no one can say for sure when it will be released. We can probably expect this technology to launch on Sora's website in the coming months rather than years, though likely not next week!
Potential applications of text to video conversion
Currently, video content is produced either by shooting real-world footage or using special effects, both of which can be costly and time-consuming. But if Sora becomes available to the public at a reasonable price, people can use it as a prototyping software to visualize ideas at a much lower cost. Based on what we know of Sora's capabilities, it can even be used to create short videos for some applications in entertainment, advertising, and education.
OpenAI's technical paper on Sora is published under the title “Video generation models as world simulators.” The paper argues that larger versions of video generators like Sora may be “capable simulators of the physical and digital world, and the objects, animals and people that live within them.”
If this is true, future versions may have scientific applications for physical, chemical, and even social experiments. For example, it may be possible to investigate and test the impact of tsunamis of different dimensions on various types of infrastructure and the physical and mental health of people near the affected areas.
However, achieving this level of simulation is extremely challenging, and some experts argue that a system like Sora is fundamentally incapable of doing so. A complete simulator must account for physical and chemical reactions at the most precise levels of the universe. However, approximating the world and making videos realistic to the human eye may be readily available in the coming years.
Ethical risks and concerns
The main concerns about tools like Sora revolve around their social and ethical impact. In a world already plagued by misinformation, tools like Sora may make things even worse.
It's easy to see how the ability to produce real video of any conceivable scene could be used to spread convincing fake news or cast doubt on real footage. It may endanger public health measures, be used to influence society, or even challenge judicial systems with potentially false evidence.
Video generators could also be used to directly target individuals with deepfakes, especially sexually explicit ones. Such abuse can have dire consequences for the lives of the affected individuals and their families.
Beyond these concerns, there are issues related to copyright and intellectual property. AI generators require large amounts of data for training, and OpenAI has not disclosed where Sora's training data came from.
Large language models (LLMs) and image generators have been criticized on the same grounds. In the United States, a group of well-known authors has sued OpenAI over alleged misuse of their content, arguing that large language models, and the companies that use them, are stealing writers' work to create new content.
This is not the first time in social memory that technology moves ahead of the law. For example, the issue of social media platforms' obligations to moderate content has generated heated debates in recent years, most of which revolve around Section 230 of the United States Code.
While these concerns are real and worth investigating, past experience suggests they will not stop the development of video-generation technology. As noted, OpenAI has taken several important safety steps before making Sora publicly available, including working with experts to “prevent misinformation, hateful content, and bias” and building tools to help identify misleading content.
Conclusion
Sora, the new OpenAI product, represents a significant advance in artificial intelligence technology and once again reminds us that the speed of progress in this field is far beyond what we imagine. This AI model can now create videos from textual descriptions that are very difficult and sometimes impossible to distinguish from real footage.
Like it or not, we are standing at the edge of a new era of innovation in the world of technology, and now is the moment to take charge of artificial intelligence and use it positively, instead of confronting and fearing it. Text-to-video conversion, which will only get easier in the future, opens countless creative opportunities for filmmakers, content producers, digital artists, and anyone else who wants to visualize creative ideas in the easiest way possible.
If human society takes up this challenge with wisdom and grace, powerful simulators like Sora can open up unimaginable vistas for visual storytelling, shaping countless diverse voices to tell stories previously unimaginable by humans or machines.
The real excitement of this new technology lies in its ability to empower everyone to share their unique view of the world. By weaving artificial intelligence into the traditional filmmaking process, art's ultimate message can be shared more widely than ever: that despite our differences in what makes us laugh or cry, despite our varied dreams and anxieties, we are all still human.
Sources: The Conversation, How to Geek, Christian Martinez, Light Works
Frequently asked questions about artificial intelligence converting text to video
When will Sora be released to the general public?
The exact time of Sora's public release is not yet known, and OpenAI is currently reviewing and testing it.
Will Sora be free?
Until OpenAI's public release and announcements, nothing can be said definitively, but like ChatGPT and DALL-E, Sora is expected to be offered in a basic free version.
Is Sora the first text-to-video technology?
No. AI technologies for producing video from text already existed, one of the most notable being Runway. But Sora offers impressive image quality that is very difficult to distinguish from real-world video.