GPTZero, a startup that detects AI-generated content, has reviewed all 4,841 accepted papers at the prestigious NeurIPS (Neural Information Processing Systems) conference held in San Diego last month. According to the company, a total of 100 fabricated references were found across 51 papers.
Getting a paper accepted at NeurIPS is an achievement in the world of artificial intelligence that can easily be highlighted on a résumé. Given that the authors of these papers are among the leading researchers in AI, it is perhaps unsurprising that they use large language models for the tedious, exhausting work of compiling sources and references.
However, this finding should be viewed with caution: 100 confirmed fabricated references across 51 papers is not statistically significant. Each paper has dozens of references; among the tens of thousands of references across all accepted papers, this number is practically negligible.
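A quick back-of-envelope calculation illustrates why the share is so small. The figure of 30 references per paper below is an assumption for illustration (the article only says "dozens"); the paper and reference counts come from the report.

```python
# Rough estimate of how rare the fabricated citations are.
total_papers = 4841       # accepted NeurIPS papers reviewed by GPTZero
flagged_papers = 51       # papers found to contain fake references
fake_references = 100     # total fabricated references found

# Assumed average; the article says each paper has "dozens" of references.
avg_refs_per_paper = 30

paper_rate = flagged_papers / total_papers
ref_rate = fake_references / (total_papers * avg_refs_per_paper)

print(f"Share of papers with fake citations: {paper_rate:.2%}")
print(f"Share of all references that are fake: {ref_rate:.3%}")
```

Under this assumption, roughly one percent of papers are affected, and well under a tenth of a percent of all references are fake, which is the sense in which the number is "close to zero."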
It is also important to note that the presence of an incorrect reference does not necessarily call the validity of a research paper into question. As NeurIPS told Fortune magazine, which was the first outlet to report on GPTZero's findings, even if 1.1 percent of papers contain one or more miscitations caused by the use of large language models, the content of those papers is not necessarily invalid.
Despite all these caveats, citation fabrication is not a trivial issue. NeurIPS prides itself on rigorous standards for scientific publishing in machine learning and artificial intelligence. Moreover, each paper is peer-reviewed by several reviewers, who are instructed to report instances of hallucinations and similar errors.
Scientific citations also serve as a kind of "unit of value" for researchers. They are used as career indicators showing how influential a researcher's work has been among peers. When an AI fabricates these citations, that value is diluted.
Given the sheer volume of submissions, the peer reviewers can hardly be blamed for missing a few fake AI-generated references; GPTZero itself is quick to point this out. The company says in its report that the purpose of the review was to provide concrete data on how low-quality AI-generated content, arriving as a flood of submissions, enters scientific processes, a flood that "has pushed the review infrastructure of these conferences to the point of collapse." GPTZero even points to a May 2025 paper titled "The Crisis of Peer Review at AI Conferences" that explored this problem at top-tier conferences, including NeurIPS.
Still, the main question remains: why couldn't the researchers themselves check and validate the language model's output? They should, after all, know the actual list of papers they used in their research.
Ultimately, it all boils down to one grand yet ironic conclusion: if the world's foremost AI experts cannot vouch for the details of their own use of language models, even with their professional credibility at stake, what does that mean for the rest of us?



