Optical Character Recognition for a Redaction System Using Machine Learning Techniques

EasyChair Preprint 3495
12 pages • Date: May 28, 2020

Abstract

This paper presents the use of optical character recognition (OCR) in an automatic redaction system. A redactor is a system that takes an electronic document as input from the user and identifies sensitive information, mainly nouns, such as person names, country names, gender, credit card information, phone numbers, email IDs, and any other confidential information that should not be shown to the recipient of the document. Initially, the user inputs a document, typically an image. The image is pre-processed and passed to the OCR, which extracts the text from it. Extracting the text is therefore the very first step toward identifying the sensitive information. Redaction is thus a major application of OCR: the information present in documents can be read with the help of an OCR machine.

Keyphrases: Named Entity Recognition, Natural Language Processing, Optical Character Recognition, Machine Learning
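
The identification step described in the abstract can be illustrated with a minimal sketch. It assumes the OCR stage has already produced plain text, and it swaps in simple regular expressions for the named entity recognition that the keyphrases point to, so it covers only pattern-like items (email IDs, card numbers, phone numbers), not person or country names. The pattern set, the labels, and the `redact` function are hypothetical and are not taken from the paper.

```python
import re

# Illustrative patterns only (assumptions, not the paper's method).
# Order matters: card numbers are masked before the looser phone pattern runs.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),  # 13 to 16 digits
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each match of a sensitive pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    # Stand-in for text extracted from an image by the OCR stage.
    ocr_text = (
        "Contact jane.doe@example.com or +1 555 010 9999; "
        "card 4111 1111 1111 1111."
    )
    print(redact(ocr_text))
    # prints: Contact [EMAIL] or [PHONE]; card [CARD].
```

A real redaction system would likely replace the regular expressions with a trained entity recognizer and would map the detected spans back to regions of the original image before masking them.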