Artificial Intelligence in Data Analysis for Open-Source Investigations
EasyChair Preprint 9777
6 pages • Date: February 26, 2023

Abstract

Open-source investigations are challenging because of the vast amount of data that must be verified and the likelihood of encountering incorrect data. To address these issues, an agent needs a tool that can process data in real time and simplify the handling of large volumes of information. This can be achieved by integrating a GPT model that learns from the data it processes. The study introduces an OSINT platform that uses a GPT model to make open-source investigations more efficient. The platform, which we developed, organizes data in a hierarchical graph, and its most notable feature is the integrated GPT model, which lets the user process large volumes of data faster and more easily. To communicate with the model, the user chats with a virtual agent in natural language and issues data processing commands. The study assessed different natural language processing models, including BERT and GPT models, and focused on the benefits of pretraining, fine-tuning, and generative models for open-source investigations. Pretraining allows GPT models to capture complex relationships between words and phrases and makes them customizable for specific tasks, giving investigators a powerful tool for analyzing text data. The generative nature of GPT models is a key advantage for OSINT investigations, as it allows the model to produce human-like text when analyzing data. Fine-tuning is also critical, as it enables investigators to train the model on specific topics and adapt it to their needs. By using natural language processing models in open-source investigations, investigators can obtain more accurate and reliable results while reducing the time and effort required for data analysis.
Overall, this work highlights the importance of incorporating natural language processing models in OSINT investigations and provides a foundation for future research in this field.

Keyphrases: Artificial Intelligence, BERT, GPT, Open Source Intelligence, information