A new verification protocol developed at Binghamton University could help reduce hallucinations in AI-generated biomedical information. By combining retrieval-augmented generation with majority voting among seven large language models, the workflow produced matched medical terminology without unmatched or fabricated terms in more than 10,000 experiments.
New research from Binghamton University aims to improve the accuracy of generative artificial intelligence in identifying medical conditions.
As chatbots powered by artificial intelligence become more ingrained in our everyday lives, people are increasingly using them to help diagnose their medical concerns.
Should I be worried about this rash? What if this insect bite gets infected? Is this pain the symptom of a larger problem? When dealing with someone’s health, the answers need to be as accurate as possible.
Last year, Binghamton University researchers tested OpenAI’s ChatGPT, and it showed high accuracy in identifying disease terms, drug names, and genetic information. However, the AI bot also generated a high number of “hallucinations.”
A follow-up study funded by a $100,000 grant from New York state’s Empire AI Consortium may have found a way to eliminate that confidently delivered but fake information.
Ahmed Abdeen Hamed — a research fellow for the Thomas J. Watson College of Engineering and Applied Science’s School of Systems Science and Industrial Engineering — collaborated with George J. Klir Professor of Systems Science Luis M. Rocha to develop an innovative verification method, and the journal STAR Protocols recently published their conclusions.
The new protocol harnesses the growing number of open-source AI options, each of which has a different way to arrive at an answer to an inquiry. Hamed and Rocha chose seven of these large language models and forced them to use retrieval-augmented generation (RAG), which required them to reference an authoritative database of medical terminology before giving a response.
Across more than 10,000 experiments, the seven chatbots all received the same plain-language symptoms, and each of them came up with what it thought were the medical terms for them, complete with an official identification number. Then the bots put the answers up for a “vote.”
The result: 76.85% of the answers were supported by at least four LLMs, and the remaining 23.15% were supported by at least two. No unmatched terms — and no hallucinations.
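The voting step described above can be illustrated with a minimal sketch. It is not the published protocol's code: the function names, the tiny terminology table, and the placeholder IDs are all hypothetical, and only the thresholds (at least four of seven models for a majority, at least two for a minority) come from the reported results.

```python
from collections import Counter

# Hypothetical stand-in for an authoritative terminology database.
# The IDs and labels are placeholders, not real vocabulary entries.
TERMINOLOGY = {
    "ID:0001": "skin rash",
    "ID:0002": "insect bite",
}

def tally_votes(answers, terminology):
    """Count how many models proposed each term ID.

    Any ID that is not in the terminology is dropped before counting,
    so a fabricated term can never collect votes.
    """
    grounded = [term_id for term_id in answers if term_id in terminology]
    return Counter(grounded)

def classify_support(votes, n_models, min_minority=2):
    """Label each term by how many of the n_models backed it."""
    majority = n_models // 2 + 1  # 4 when seven models vote
    labels = {}
    for term_id, count in votes.items():
        if count >= majority:
            labels[term_id] = "majority"
        elif count >= min_minority:
            labels[term_id] = "minority"
        else:
            labels[term_id] = "unsupported"
    return labels

# One answer per model; the last ID is absent from the terminology.
answers = ["ID:0001"] * 4 + ["ID:0002"] * 2 + ["ID:9999"]
votes = tally_votes(answers, TERMINOLOGY)
print(classify_support(votes, n_models=7))
```

In this sketch, the check against the terminology table plays the role that retrieval-augmented generation plays in the article: an answer only counts if it matches an entry in the reference source.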
“The new workflow is incredible,” Hamed said, “because it can verify anything from a biomedical point of view — biological knowledge with disease and genetics, translational knowledge from diseases to treatments and clinical trials, and also from a healthcare point of view with symptoms and treatments.”
A big advantage of this new protocol is that it can be reproduced in a near-infinite number of permutations to reinforce its accuracy.
“There can be 100 large language models that are open source, and every time we can perform an experiment with seven LLMs selected at random from that list,” Hamed said. “When we perform the experiment many, many times, we increase the confidence in the voting.”
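The repeated-sampling idea Hamed describes could look something like the sketch below. Everything in it is an assumption made for illustration: the pool of model names, the `ask` callback that returns one term per model, and the choice to report the fraction of rounds in which a term wins a majority.

```python
import random
from collections import Counter

def run_panel(pool, panel_size, ask, rng):
    """Draw a random panel from the pool and count the terms it proposes."""
    panel = rng.sample(pool, panel_size)
    return Counter(ask(model) for model in panel)

def repeated_vote(pool, ask, panel_size=7, rounds=100, seed=0):
    """Return, for each term, the fraction of rounds in which it won a majority."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    majority = panel_size // 2 + 1
    wins = Counter()
    for _ in range(rounds):
        votes = run_panel(pool, panel_size, ask, rng)
        term, count = votes.most_common(1)[0]
        if count >= majority:
            wins[term] += 1
    return {term: n / rounds for term, n in wins.items()}

# Hypothetical pool of 100 models that all return the same placeholder term.
pool = [f"model-{i}" for i in range(100)]
print(repeated_vote(pool, ask=lambda model: "ID:0001"))
```

A term that keeps winning across many randomly drawn panels is less likely to be an artifact of one particular set of models, which is the intuition behind the quote above.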
Rocha said the protocol is an important step toward increasing confidence in large multiscale network models of disease, which is a key topic for his Complex Adaptive Systems and Computational Intelligence Lab at Binghamton.
Among the research is the development of “digital twins” for precision medicine. These dynamic, virtual replicas of physical processes are continuously updated using AI and real-time data to create precise, predictive simulations of human reactions, so that healthcare providers can optimize outcomes before real-world testing.
“For instance, the protocol can extract and provide multi-agent verification of evidence for an adverse drug reaction for a given medication that is available in clinical trials, the scientific literature, pharmacological databases, and even social media discourse,” Rocha said. “And it can assist in the extraction of evidence at multiple scales, from multiomics to epidemiological and behavioral data sources, which we have already started to pilot by building multi-layer models of ER+ breast cancer.”
Hamed hailed the input from his collaborator as essential: “The guidance from Professor Rocha was huge, from securing the grant to helping to decide the direction of where this research would go and coaching us to develop the protocols needed to make it all work.”
Although the study centered on biomedical applications, the Binghamton team’s discovery could be used to curb or eliminate other kinds of LLM hallucinations, such as fabricated legal citations, fake academic citations, or blatant historical errors.
Date: 08.12.2025
“This protocol is a big step toward the democratization of knowledge verification,” Hamed said.
With this research, Hamed wraps up his fellowship at Binghamton University and transitions to a new role as a research associate professor at the University of Nebraska-Lincoln.
“Dr. Hamed’s period in our lab was most productive, not only in the rapid development of AI-driven workflows and publications, but in catalyzing new, creative ideas for all lab members,” Rocha said. “I cannot wait to see the amazing new research he will produce at the University of Nebraska-Lincoln.”
Hamed is grateful for the opportunities he received at Binghamton.
“Watson College provided an exceptional environment where I could fully develop and implement the forward-looking research agenda I began during my time in Europe,” he said. “The direction I envisioned was still emerging there at the time, and the fellowship offered the right setting to advance it. I’m hopeful that the resulting peer-reviewed publications can help shift perspectives and demonstrate how GenAI and LLMs can be used responsibly, constructively, and with genuine innovation.”
Original Article: Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow; STAR Protocols; DOI:10.1016/j.xpro.2026.104533