Natural language processing (NLP) has advanced significantly in recent years and is now used for many tasks that once required substantial human involvement. In this seminar, we will share our experience applying NLP techniques to a robotic process automation problem.

In particular, we share the challenges we faced in automating the extraction of the Table of Contents from PDF research reports, a task made difficult by the heterogeneity of report formats and by the many arbitrary constraints that were designed for humans to handle.

We used a combination of approaches, ranging from transfer learning with a Single Shot Detector model to rule-based methods, and we highlight situations where such a hybrid approach might prove useful.

We will also cover how to break down and scope a complex set of data science requirements, as well as common pitfalls in handling a dynamic set of requirements that were originally tailored for human operators.

Instructors' Bios

Jia Hui Chow, Data Scientist at Refinitiv

Currently a data scientist at Refinitiv, Jia Hui has experience with NLP tasks such as named entity recognition and relation extraction in both English and Chinese. She built a deep learning model that was released in an existing product, Eikon, and has also released a relation extraction model for a risk intelligence tool.

Melvin Perera, Data Scientist at Refinitiv

Currently a Data Scientist at Refinitiv, Melvin works to improve Refinitiv’s existing product offerings (TRIT, Eikon, World-Check) for information retrieval in the Asian markets using natural language processing. He is currently working on named entity recognition techniques to improve the quality of news document tagging in Chinese.

