
HyGen – a Hybrid Automation Testing Approach for Reducing Hallucination in LLM Based Applications

EasyChair Preprint 15699

21 pages. Date: January 10, 2025

Abstract

Subjective manual evaluation of hallucinations in Large Language Models (LLMs) remains challenging because it is time-consuming and labor-intensive. It requires human experts to analyze each LLM response critically to identify cases where the model produces incorrect or misleading information. This manual approach is usually small in scale and limited in scope, and it introduces subjective judgment into the assessment. To address these problems, we propose a novel hybrid test automation framework that integrates rule-based and model-graded evaluation methods. The rule-based evaluation runs at per-commit granularity during the continuous integration (CI) cycle and offers quick feedback on known hallucination patterns, while the model-graded evaluation is carried out at each incremental release using a critique LLM. Because many hallucination behaviors are already known, the framework can encode them as a set of rules and alert on potential issues. A more in-depth evaluation of the LLM is then carried out through model-graded evaluation after the release stage; this approach trains machine learning models to estimate the probability of hallucination from various characteristics of the LLM's responses. Using both methods within our framework allows continuous review of hallucination risks at each stage of LLM development. This makes it possible to identify issues that could affect LLM-based applications early and to take corrective measures before they manifest, enabling LLM-based applications to be deployed with higher confidence that they will work as expected.
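The two evaluation stages described above can be sketched roughly as follows. This is an illustrative sketch only: the rule names, patterns, prompt wording, and function signatures here are assumptions for exposition, not taken from the paper.

```python
import re
from typing import Callable

# Illustrative rule set encoding known hallucination patterns; the actual
# rules used by HyGen are not specified in this abstract.
HALLUCINATION_RULES = [
    # Numbered citations to sources the prompt never supplied
    ("unsupported_citation", re.compile(r"\[\d+\]")),
    # URLs that do not appear in the provided context
    ("fabricated_url", re.compile(r"https?://\S+")),
    # Absolute claims that often accompany fabricated content
    ("overconfident_claim", re.compile(r"\b(always|never|guaranteed)\b", re.IGNORECASE)),
]

def rule_based_check(response: str, context: str) -> list[str]:
    """Fast per-commit CI check: flag rules whose pattern matches a span of
    the response that is absent from the supplied context."""
    flags = []
    for name, pattern in HALLUCINATION_RULES:
        if any(m.group(0) not in context for m in pattern.finditer(response)):
            flags.append(name)
    return flags

def model_graded_check(response: str, reference: str,
                       critique_llm: Callable[[str], str]) -> bool:
    """Slower release-stage check: ask a critique LLM whether the response
    is grounded in the reference answer. Returns True if flagged."""
    prompt = (
        "Reference answer:\n" + reference + "\n\n"
        "Candidate response:\n" + response + "\n\n"
        "Reply with GROUNDED or HALLUCINATION."
    )
    return critique_llm(prompt).strip().upper().startswith("HALLUCINATION")
```

In a CI pipeline, `rule_based_check` would run on every commit for quick feedback, while `model_graded_check` (with `critique_llm` bound to an actual model call) would run at release time for the deeper evaluation.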

Keyphrases: Continuous integration, Hallucination, LLMOps, Large Language Models (LLMs), Model-graded evaluation, Rule-based evaluation, Test automation

BibTeX entry
BibTeX does not have a suitable entry type for preprints; the following is a workaround that produces the correct reference:
@booklet{EasyChair:15699,
  author    = {Amit Chakraborty and Chirantana Mallick and Rajdeep Chakraborty and Saptarshi Das},
  title     = {HyGen – a Hybrid Automation Testing Approach for Reducing Hallucination in LLM Based Applications},
  howpublished = {EasyChair Preprint 15699},
  year      = {EasyChair, 2025}}