2022-09-30 Paper Report: An OCR Post-Correction Approach Using Deep Learning for Processing Medical Reports


Report title: An OCR Post-Correction Approach Using Deep Learning for Processing Medical Reports

Paper source: IEEE Transactions on Circuits and Systems for Video Technology

Authors: Srinidhi Karthikeyan, Alba G. Seco de Herrera, Faiyaz Doctor (Senior Member, IEEE), Asim Mirza

Presenter: Lian Min (连敏)

Time: October 6, 2022, 2:30 PM

Location: Room 624, Boxue Building, North Campus, Guizhou University

Abstract: According to a recent Deloitte study, the COVID-19 pandemic continues to place a huge strain on the global health care sector. COVID-19 has also catalysed digital transformation across the sector for improving operational efficiencies. As a result, the amount of digitally stored patient data such as discharge letters, scan images, test results or free text entries by doctors has grown significantly. In 2020, 2314 exabytes of medical data was generated globally. This medical data does not conform to a generic structure and is mostly in the form of unstructured digitally generated or scanned paper documents stored as part of a patient’s medical reports. This unstructured data is digitised using an Optical Character Recognition (OCR) process. A key challenge here is that the accuracy of the OCR process varies due to the inability of current OCR engines to correctly transcribe scanned or handwritten documents in which text may be skewed, obscured or illegible. This is compounded by the fact that the processed text is comprised of specific medical terminologies that do not necessarily form part of general language lexicons. The proposed work uses a deep neural network based self-supervised pre-training technique, Robustly Optimized Bidirectional Encoder Representations from Transformers (RoBERTa), that can learn to predict hidden (masked) sections of text to fill in the gaps of non-transcribable parts of the documents being processed. Evaluating the proposed method on domain-specific datasets, which include real medical documents, shows a significantly reduced word error rate, demonstrating the effectiveness of the approach.
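The abstract reports its result as a reduction in word error rate (WER). As a reminder of what that metric measures, here is a minimal Python sketch that computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the paper, and the paper's exact evaluation procedure (tokenisation, normalisation) may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; whitespace tokenisation is an
    assumption, not the paper's stated preprocessing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example strings (not from the paper's datasets).
truth = "patient was discharged with amoxicillin"
raw_ocr = "patient was discharged with amoxicilin"    # one misspelled word
corrected = "patient was discharged with amoxicillin"

wer_before = word_error_rate(truth, raw_ocr)
wer_after = word_error_rate(truth, corrected)
```

In this toy case one of the five reference words is wrong in the raw OCR output, so `wer_before` is 0.2, and `wer_after` is 0.0 once the word is restored. A post-correction step is judged by how far it moves the first number towards the second.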
