
Extracting Semantic Entity Triplets by Leveraging LLMs

EasyChair Preprint 14914

3 pages. Date: September 16, 2024

Abstract

As Large Language Models (LLMs) become increasingly powerful and accessible, concerns about the automatic generation of academic papers are rising. Several instances of undeniable LLM usage in reputable journals have been reported. Significantly more articles were probably written partially or entirely by LLMs without yet being detected, which poses a threat to the veracity of academic journals.

The current consensus among researchers is that methods for detecting LLM-generated text are ineffective or easy to evade in a general setting. Therefore, we explore an alternative approach that targets the stochastic nature of LLMs. Because LLMs are stochastic text generators, hallucinations in long texts are a persistent problem, and the generated output regularly contains counterfactual components. Semantic entity triplets can be used to assess a text's factual accuracy and filter the publication corpus accordingly.

Previous work has built a classical triplet-extraction pipeline based on spaCy. However, this method retrieves relatively few triplets, and those tend to be overly generic, to the point of being domain-agnostic. We overcome these limitations by applying few-shot prompting to the recently released Meta-Llama-3-8B-Instruct. The results show that we can extract more triplets per paragraph than the classical extraction method. Moreover, we show that the triplets are more specific, and we find no evidence of hallucination when comparing the extracted subjects and objects to the original reference text.
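
The extraction and grounding steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the `(subject; predicate; object)` output format, the few-shot example, and the helper names `build_prompt`, `parse_triplets`, and `is_grounded` are all assumptions made for illustration, and the grounding check uses naive case-insensitive substring matching, which may differ from the comparison used in the preprint. The model call itself is omitted; the sketch only covers building the prompt and post-processing a model's text output.

```python
import re
from typing import NamedTuple


class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical few-shot example; the preprint's actual prompt is not reproduced here.
FEW_SHOT_EXAMPLES = [
    (
        "Aspirin inhibits the enzyme cyclooxygenase.",
        [Triplet("Aspirin", "inhibits", "cyclooxygenase")],
    ),
]

# Assumed output format: one "(subject; predicate; object)" group per triplet.
TRIPLET_RE = re.compile(r"\(([^;()]+);([^;()]+);([^;()]+)\)")


def build_prompt(paragraph: str) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, then the new text."""
    parts = ["Extract (subject; predicate; object) triplets from the text."]
    for text, triplets in FEW_SHOT_EXAMPLES:
        parts.append(f"Text: {text}")
        parts.extend(f"({t.subject}; {t.predicate}; {t.obj})" for t in triplets)
    parts.append(f"Text: {paragraph}")
    return "\n".join(parts)


def parse_triplets(model_output: str) -> list[Triplet]:
    """Parse every well-formed triplet group found in the model's raw text output."""
    return [
        Triplet(*(group.strip() for group in match.groups()))
        for match in TRIPLET_RE.finditer(model_output)
    ]


def is_grounded(triplet: Triplet, reference: str) -> bool:
    """Return True if both subject and object occur verbatim in the reference text.

    Assumption: a simple case-insensitive substring test stands in for the
    comparison against the reference text.
    """
    ref = reference.lower()
    return triplet.subject.lower() in ref and triplet.obj.lower() in ref
```

In use, the string returned by `build_prompt` would be sent to an instruction-tuned model, and the returned text passed to `parse_triplets`. Triplets for which `is_grounded` returns `False` contain a subject or object absent from the source paragraph and are candidates for hallucinated extractions.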

Keyphrases: Natural Language Processing, Noun Extraction, entity extraction, large language models, machine learning

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:14914,
  author    = {Alexander Sternfeld and Andrei Kucharavy and Dimitri Percia David and Julian Jang-Jaccard and Alain Mermoud},
  title     = {Extracting Semantic Entity Triplets by Leveraging LLMs},
  howpublished = {EasyChair Preprint 14914},
  year      = {EasyChair, 2024}}