Efficient Machine Reading Comprehension for Health Care Applications: Algorithm Development and Validation of a Context Extraction Approach.

Publication date: Mar 25, 2024

Extractive methods for machine reading comprehension (MRC) tasks have achieved accuracy comparable to or better than human performance on benchmark data sets. However, such models are not as successful when adapted to complex domains such as health care. One of the main reasons is that the context the MRC model needs to process in a complex domain can be much larger than an average open-domain context. This causes the MRC model to make less accurate and slower predictions. A potential solution to this problem is to reduce the input context of the MRC model by extracting only the necessary parts from the original context.

This study aims to develop a method for extracting useful contexts from long articles as an additional component to the question answering task, enabling the MRC model to work more efficiently and accurately.

Existing approaches to context extraction in MRC are based on sentence selection strategies, in which the models are trained to find the sentences containing the answer. We found that using only the sentences containing the answer was insufficient for the MRC model to predict correctly. We conducted a series of empirical studies and observed a strong relationship between the usefulness of the context and the confidence score output of the MRC model. Our investigation showed that a precise input context can boost the prediction correctness of the MRC model and greatly reduce inference time. We proposed a method to estimate the utility of each sentence in a context in answering the question and then extract a new, shorter context according to these estimations. We generated a data set to train 2 models for estimating sentence utility, based on which we selected more precise contexts that improved the MRC model’s performance.

We demonstrated our approach on the Question Answering Data Set for COVID-19 and the Biomedical Semantic Indexing and Question Answering data set and showed that the approach benefits the downstream MRC model. First, the method reduced the inference time of the entire question answering system by 6 to 7 times. Second, our approach helped the MRC model predict the answer more correctly compared with using the original context (F-score increased from 0.724 to 0.744 for the Question Answering Data Set for COVID-19 and from 0.651 to 0.704 for the Biomedical Semantic Indexing and Question Answering data set). We also found a potential problem: in some cases, extractive transformer MRC models predict poorly despite being given a more precise context.

The proposed context extraction method allows the MRC model to achieve improved prediction correctness and a significantly reduced MRC inference time. This approach works technically with any MRC model and has potential in tasks involving processing long texts.
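
The context extraction step described in the abstract (score each sentence for its utility in answering the question, then keep a shorter context) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the naive sentence splitter, the top_k cutoff, and especially the word-overlap scorer are assumptions made for this example. In the study, sentence utility is estimated by 2 trained models, which the overlap heuristic here only stands in for.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_utility(question: str, sentence: str) -> float:
    """Hypothetical stand-in scorer: fraction of question words found in the sentence.

    The paper trains models to estimate utility; this heuristic is only a placeholder.
    """
    q_words = set(re.findall(r"\w+", question.lower()))
    s_words = set(re.findall(r"\w+", sentence.lower()))
    return len(q_words & s_words) / len(q_words) if q_words else 0.0


def extract_context(
    question: str,
    context: str,
    utility: Callable[[str, str], float] = overlap_utility,
    top_k: int = 2,
) -> str:
    """Keep the top_k highest-utility sentences, in their original order."""
    sentences = split_sentences(context)
    scored = [(utility(question, s), i) for i, s in enumerate(sentences)]
    # Highest utility first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:top_k]
    keep = sorted(i for _, i in best)
    return " ".join(sentences[i] for i in keep)
```

The shortened string returned by extract_context would then be passed to the MRC model in place of the full article. Keeping the selected sentences in document order is a design choice for this sketch, so that the reduced context still reads coherently.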

Concepts Keywords
Biomedical context extraction
Covid covid19
Extractive health care
Slower machine reading comprehension
Train question answering


Type Source Name
disease VO efficient
disease IDO algorithm
drug DRUGBANK Tropicamide
disease IDO process
disease MESH causes
disease VO time
disease VO data set
disease MESH COVID-19
drug DRUGBANK Coenzyme M
disease VO document
disease VO inefficient
disease VO effective
drug DRUGBANK Aspartame
drug DRUGBANK Lactic Acid
disease VO Bacteria
disease MESH influenza
disease VO vaccine
disease VO immunization
disease VO Viruses
disease MESH Fibrosis
disease IDO susceptibility
disease MESH Schistosoma Mansoni Infection
disease MESH cholera
disease IDO quality
drug DRUGBANK Darunavir
drug DRUGBANK Cobicistat
disease VO Tat
disease MESH fowlpox
disease MESH avian influenza
disease VO Fowlpox virus
disease MESH vaccinia
disease MESH anthrax
disease MESH hepatitis
disease MESH measles
pathway KEGG Measles
disease MESH malaria
pathway KEGG Malaria
disease MESH tuberculosis
pathway KEGG Tuberculosis
disease MESH Middle East respiratory syndrome
disease VO Yellow fever virus
disease MESH shock
disease IDO blood
disease VO nose
disease VO efficiency
disease MESH lung diseases
disease MESH infection
disease VO population
disease VO Respiratory syncytial virus
disease IDO contact tracing
