GPTZero, a startup that detects AI-generated content, has reviewed all 4,841 papers accepted at the prestigious NeurIPS (Conference on Neural Information Processing Systems) held in San Diego last month. According to the company, it found a total of 100 fabricated references across 51 papers.
Getting a paper accepted at NeurIPS is an achievement in the world of artificial intelligence that can easily be highlighted on a resume. Given that the authors of these papers are among the leading researchers in AI, it is perhaps unsurprising that they would turn to large language models for the tedious, exhausting work of compiling sources and references.
However, this finding should be kept in perspective: 100 confirmed fabricated references across 51 papers is a small number. Each paper contains dozens of references, so out of the tens of thousands of references in the accepted papers as a whole, the share of fabricated ones is practically negligible.
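To make the scale concrete, here is a rough back-of-the-envelope check; the average number of references per paper is an assumption for illustration, not a figure from GPTZero's report.

```python
# Rough back-of-the-envelope check of the "practically negligible" claim.
# refs_per_paper is an assumed average, not a figure reported by GPTZero.
papers = 4841          # accepted NeurIPS papers reviewed by GPTZero
refs_per_paper = 40    # assumed average number of references per paper
fake_refs = 100        # fabricated references GPTZero reports finding
flagged_papers = 51    # papers containing at least one fabricated reference

total_refs = papers * refs_per_paper        # roughly 190,000 references
fake_share = fake_refs / total_refs         # roughly 0.05% of all references
flagged_share = flagged_papers / papers     # roughly 1.1% of all papers

print(f"Estimated total references: {total_refs:,}")
print(f"Fabricated share of references: {fake_share:.3%}")
print(f"Papers with at least one fabricated reference: {flagged_share:.1%}")
```

Under that assumption, fabricated entries make up well under a tenth of a percent of all references, while the 1.1 percent figure for affected papers matches the number cited below.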
It is also worth noting that an incorrect reference does not necessarily call a paper's findings into question. As NeurIPS told Fortune magazine, the first outlet to report on GPTZero's research, even if 1.1 percent of papers contain one or more miscitations caused by the use of large language models, the content of those papers is not necessarily invalid.
Despite all these caveats, fabricating references is not a trivial matter. NeurIPS prides itself on its rigorous standards for scientific publishing in machine learning and artificial intelligence. In addition, each paper is peer-reviewed by several reviewers, who are instructed to report instances of hallucinations and similar errors.
Citations also serve as a kind of "unit of value" for researchers: they are used as a career metric of how influential a researcher's work has been among their peers. When an AI fabricates citations, that currency is effectively debased.
Given the sheer volume of submissions, peer reviewers can hardly be blamed for missing a few fake AI-generated references; GPTZero itself is quick to make the same point. The company says the purpose of its review was to provide concrete data on how low-quality AI-generated content, arriving as a huge flood of papers, enters the scientific process, a flood that "has pressed the arbitration infrastructures of these conferences to the point of collapse". GPTZero even points to a May 2025 paper titled "The Crisis of Peer Review at AI Conferences" that examined this problem at top-tier conferences, including NeurIPS.
Still, the main question remains: why couldn't the researchers themselves check and validate the language model's output? After all, they should know which papers they actually used in their research.
Ultimately, it all comes down to one striking, ironic question: if the world's foremost AI experts cannot vouch in detail for the accuracy of their own use of language models, even with their professional credibility at stake, what does that mean for the rest of us?