Adaptive Search Engine for Heterogeneous Documents

EasyChair Preprint 4685

2 pages • Date: December 1, 2020

Oussama Ayoub, Christophe Rodrigues and Nicolas Travers

Abstract

Providing an efficient search engine for legal actors who query textual documents is a challenging objective. Most current engines apply semantic analysis on top of text queries to improve relevance. The legal context, however, relies mainly on heterogeneous data: queries and documents vary in length, structural complexity, and query context. This combination makes standard solutions hard to scale or adapt. The proposed solution is an adaptive approach intended to apply to any textual database for which a search engine is being built. Its distinguishing feature is to normalize documents by producing fragments, enriching them with word embeddings, summarizing them, and rebuilding documents through similarity aggregations on enriched content, structure, and context. By integrating our solution into Elasticsearch, we ensure flexibility and fine-tuning of both word embeddings and similarities.
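The fragment, enrich, and aggregate steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-size word windows, the bag-of-words vectors standing in for word embeddings, the max aggregation, and every function name below are assumptions chosen only to keep the example self-contained. The summarization step and the Elasticsearch integration are not shown.

```python
import math
import re
from collections import Counter


def fragment(text, size=8):
    """Split a document into fixed-size word windows (hypothetical fragmenting rule)."""
    words = re.findall(r"\w+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]


def embed(words):
    """Toy stand-in for a word embedding: a bag-of-words count vector."""
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_document(query, text, size=8):
    """Aggregate fragment similarities into one document score.

    Taking the maximum is an assumption; a mean or weighted sum would also fit.
    """
    q = embed(re.findall(r"\w+", query.lower()))
    sims = [cosine(q, embed(f)) for f in fragment(text, size)]
    return max(sims, default=0.0)


def rank(query, docs):
    """Return document names ordered from most to least similar to the query."""
    return sorted(docs, key=lambda name: score_document(query, docs[name]), reverse=True)
```

Scoring per fragment and then aggregating means that a long document with one highly relevant passage is not penalized for its length, which is one plausible reading of why fragment-level normalization helps with heterogeneous document sizes.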

Keyphrases: Information Retrieval, Natural Language Processing, search engine

Links:

https://easychair.org/publications/preprint/wFSS

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:4685,
  author    = {Oussama Ayoub and Christophe Rodrigues and Nicolas Travers},
  title     = {Adaptive Search Engine for Heterogeneous Documents},
  howpublished = {EasyChair Preprint 4685},
  year      = {EasyChair, 2020}}
