Many experts in the field of artificial intelligence, including Ilya Sutskever, a co-founder of OpenAI, say that training AI with the old methods has reached its peak and that more powerful models can no longer be developed that way. Now researchers at Google DeepMind say the outputs of “reasoning” models like o1 can serve as a new source of AI training data.
According to a Business Insider report, nearly all of the useful data available on the internet has already been used to train artificial intelligence models. This process, known as pre-training, produced many of AI’s recent breakthroughs, including ChatGPT, but progress has recently lost momentum, and experts say the pre-training era is nearing its end.
With big tech companies investing trillions of dollars in the technology, a slowdown in the progress of AI models would be daunting, but researchers say there is a new way to train and develop them.
DeepMind researchers’ new method for training artificial intelligence
New models such as OpenAI’s o1 and o3 respond to user requests with a new method known as test-time (or inference-time) compute.
In this method, the model breaks your request into smaller parts and turns each one into a new prompt. Each stage requires a fresh model call, known in AI as an inference step. This creates a chain of reasoning in which each part of the problem is solved in turn: the model does not move to the next step until the current part is solved, and it can ultimately provide a better final answer.
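The decomposition-and-chaining idea described above can be sketched in a few lines. This is a hedged illustration, not DeepMind's or OpenAI's actual implementation: `call_model` and `decompose` are hypothetical stand-ins, stubbed here so the example runs end to end; a real system would query an LLM API and use the model itself to split the request.

```python
def call_model(prompt: str) -> str:
    """Hypothetical model call; a real system would query an LLM API.
    Stubbed to return a canned 'solution' so the sketch is runnable."""
    return f"solution to: {prompt}"

def decompose(request: str) -> list[str]:
    """Hypothetical decomposition step: split a request into sub-problems.
    Here we naively split on ';'; a real system would let the model decide."""
    return [part.strip() for part in request.split(";")]

def answer_with_reasoning_chain(request: str) -> str:
    steps = decompose(request)
    chain = []  # accumulated intermediate results
    for step in steps:
        # Each sub-problem becomes a new prompt that carries the prior
        # results forward, so every stage is a separate inference call.
        prompt = " | ".join(chain + [step])
        chain.append(call_model(prompt))
    # The final answer is produced only after every part has been solved.
    return chain[-1]

print(answer_with_reasoning_chain(
    "parse the question; compute the sum; format the result"))
```

The key point the sketch captures is that cost shifts from training to inference: one user request now triggers several model calls instead of one.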
According to published benchmarks, the new models often produce better outputs than their predecessors, especially on math questions. The researchers say these high-quality outputs can become new training data; in other words, this large body of new information can be fed into the training process of other AI models to create an iterative self-improvement loop.
For example, if the o1 model’s outputs are better than GPT-4’s, those outputs can be used to train future AI models. Or, say o1 scores 90% on a particular AI benchmark: you can aggregate its responses, feed them to GPT-4, and that model may also approach a 90% score.
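The aggregate-and-retrain loop described above is essentially distillation: collect a stronger model's answers as (prompt, target) pairs and train a weaker model on them. The sketch below is a minimal, hypothetical illustration of that data flow only; `teacher` and `train_student` are stubbed placeholders, and a real pipeline would call an actual model and run gradient-based fine-tuning.

```python
def teacher(prompt: str) -> str:
    """Hypothetical stronger 'reasoning' model (an o1-style system).
    Stubbed so the example runs without any API access."""
    return f"high-quality answer for: {prompt}"

def build_synthetic_dataset(prompts):
    # Aggregate the teacher's responses into (prompt, target) training pairs,
    # i.e. the synthetic data the article describes.
    return [(p, teacher(p)) for p in prompts]

def train_student(dataset):
    """Hypothetical fine-tuning step. A real pipeline would update the
    student model's weights; here the 'student' simply memorises the
    pairs to illustrate how the data is consumed."""
    return dict(dataset)

prompts = ["What is 2 + 2?", "Factor x^2 - 1"]
student = train_student(build_synthetic_dataset(prompts))
print(student["What is 2 + 2?"])  # prints "high-quality answer for: What is 2 + 2?"
```

Repeating this loop, where each improved student becomes the next teacher, is what the researchers mean by an iterative self-improvement cycle.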
It appears that some companies are already using this method to develop their models, and researchers believe this synthetic data is better than what is available on the internet.