Effective Pharmacovigilance Using NLP

Pharmacovigilance (PV) is the “science and activities relating to the detection, assessment, understanding, and prevention of adverse effects or any other possible drug-related problems” (WHO, 2015). In most cases, PV practice depends on analysing clinical trials, biomedical literature, observational studies, Electronic Health Records (EHRs), social media and Spontaneous Reporting (SR). Pharmacovigilance plays a vital role in monitoring Adverse Drug Reactions (ADRs) caused by a single drug, by combined doses, or by prolonged administration. ADRs have led to a 1.8% increase in the mortality rate worldwide.

In developed countries, ADRs are the fourth leading cause of death, in part because side effects are poorly reported after drugs are taken. Physicians can sometimes unknowingly prescribe excessive doses and unwarranted drug combinations. In a country like India, ADRs are rarely reported. Reporting is also slow because half of all patients depend on local drug stores and self-medication. In addition, physicians who hold alternative medical qualifications sometimes prescribe allopathic drugs that they are neither permitted nor qualified to prescribe.

Establishing and maintaining increasingly complex pharmacovigilance (PV) systems in a globally diverse and evolving regulatory environment is becoming harder by the day. As more drugs receive regulatory approval, public awareness is also growing, driven by social media connectivity and media scrutiny. Pharmaceutical companies therefore need to manage PV activities more diligently and efficiently than ever.

Research has shown that healthcare workers sometimes do not report ADRs because of complacency, insecurity, diffidence, indifference, ignorance, fear of medico-legal consequences and a lack of time to complete the formal diagnosis. Recently, many hospitals have also introduced Electronic Health Records (EHRs).

Despite the availability of electronic healthcare data, there is no consensus on the best methods for identifying adverse reactions from these data sources. However, it is evident that EHRs hold promise for the active monitoring of ADRs. Given the growth of textual data, it is becoming impossible for domain experts to manually curate the information it contains in an efficient and timely way. Potentially vital information may remain hidden in the deluge of results returned when querying these sources. The difficulty of creating and maintaining comprehensive resources is highlighted by a recent survey of several frequently used drug interaction resources, which found discrepancies among them in the scope of reactions covered, the completeness of information about those reactions and the consistency of information between resources. Such inconsistencies could compromise patient care.

To mitigate such issues, text mining (TM) techniques have proven to be a sound basis for more efficient solutions. They have been used to detect information relevant to drug effects in a range of complementary information sources, including scientific literature, electronic health records and social media. This is where an NLP pipeline can play a critical role: the incoming data from different feeds can be processed, and the vital links between diseases, medicines and their effects established.
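As a rough illustration of what such a pipeline does at its simplest, the sketch below counts how often a drug and an effect are mentioned in the same sentence across several feeds. The lexicons, feed names and example sentences are all hypothetical, and plain co-occurrence counting stands in for the trained models a real system would use.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; a real system would rely on curated vocabularies
# or a trained named entity recogniser.
DRUGS = {"epoprostenol", "warfarin"}
EFFECTS = {"pulmonary edema", "bleeding"}


def find_terms(sentence, lexicon):
    """Return the lexicon terms that occur in the sentence as whole words."""
    lowered = sentence.lower()
    return {
        term for term in lexicon
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def link_drugs_to_effects(feeds):
    """Count sentence-level drug/effect co-occurrences across all feeds."""
    counts = Counter()
    for documents in feeds.values():
        for document in documents:
            # Naive sentence splitting on terminal punctuation.
            for sentence in re.split(r"(?<=[.!?])\s+", document):
                for drug in find_terms(sentence, DRUGS):
                    for effect in find_terms(sentence, EFFECTS):
                        counts[(drug, effect)] += 1
    return counts


# Invented example feeds, for illustration only.
feeds = {
    "literature": [
        "Acute epoprostenol infusion led to pulmonary edema. The patient recovered."
    ],
    "ehr": [
        "Pt on warfarin. Reports bleeding after warfarin dose increase."
    ],
}
links = link_drugs_to_effects(feeds)
```

Co-occurrence only suggests a candidate link; it says nothing about causality, negation or severity, which is why annotated corpora and trained models matter.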

The development of such tools is typically based on the availability of annotated corpora, i.e., collections of texts manually marked up by domain experts (with semantic information pertaining to a domain), which can then be used for training and evaluating text mining tools. The levels of semantic annotation in different corpora determine the types of information that can be recognised by TM tools. Named Entities (NEs), i.e., semantically categorised words/phrases, such as drugs and disorders, form the basis for a number of more complex types of annotation. Several efforts have produced corpora annotated with such NEs and demonstrated how such corpora can be used to train machine learning (ML) tools to recognise NEs automatically to high degrees of accuracy.
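To make the evaluation side concrete, here is a minimal sketch of how a tool's predicted entities might be scored against expert annotations. It assumes entities are represented as (start, end, label) character spans and uses exact-match precision, recall and F1; the labels and the example sentence are illustrative, not taken from any particular corpus.

```python
def span(text, phrase, label):
    """Build a (start, end, label) annotation for the first match of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def precision_recall_f1(gold, predicted):
    """Exact-match scores over sets of (start, end, label) spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


text = "Epoprostenol infusion caused pulmonary edema."

# Expert ("gold") annotations for the sentence.
gold = {
    span(text, "Epoprostenol", "Drug"),
    span(text, "pulmonary edema", "Disorder"),
}

# Output of a hypothetical tool that found the drug but clipped the disorder.
predicted = {
    span(text, "Epoprostenol", "Drug"),
    span(text, "edema", "Disorder"),
}

scores = precision_recall_f1(gold, predicted)
```

Under exact matching, the clipped span "edema" counts as both a false positive and a missed entity, so the tool scores 0.5 on all three measures here. Some evaluations use relaxed (overlap-based) matching instead.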

Creating such rich corpora should not be limited to annotating binary relations between diseases and drugs, because such relations are limited in the complexity of the information they can encode. For example, a binary relation representing an adverse drug reaction can only encode the fact that a single drug adversely affects or causes the occurrence of a particular disorder. However, additional information in the text may provide important clues, or even critical details, about the safe usage of a drug. Consider this example: “We describe a life-threatening side effect of acute epoprostenol infusion (pulmonary edema) in a patient with pulmonary hypertension associated with limited scleroderma and discuss its potential etiology.” (Scleroderma is a group of rare diseases that involve the hardening and tightening of the skin and connective tissues.) Here, the phrase “life-threatening” denotes a severe adverse reaction. In contrast, a moderate adverse reaction may be considered acceptable, especially if there are other significant benefits to be gained by taking a particular drug or combination. Binary encoding does not take this variance in the severity of reactions into account.
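The difference between the two encodings can be sketched as data structures. The class and field names below are hypothetical and chosen only for illustration; they are not the schema of any specific corpus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BinaryRelation:
    """A drug/disorder pair: all that a binary relation can express."""
    drug: str
    disorder: str


@dataclass(frozen=True)
class AdverseEffectEvent:
    """A richer record that keeps contextual details from the sentence."""
    drug: str
    effect: str
    severity: Optional[str] = None           # cue phrase such as "life-threatening"
    patient_condition: Optional[str] = None  # underlying condition of the patient


# The epoprostenol sentence encoded both ways.
relation = BinaryRelation(drug="epoprostenol", disorder="pulmonary edema")

event = AdverseEffectEvent(
    drug="epoprostenol",
    effect="pulmonary edema",
    severity="life-threatening",
    patient_condition="pulmonary hypertension associated with limited scleroderma",
)


def needs_urgent_review(candidate: AdverseEffectEvent) -> bool:
    """Flag events whose severity cue marks them as serious (illustrative rule)."""
    return candidate.severity in {"life-threatening", "severe"}
```

With the binary relation alone there is nothing to filter on, whereas the event form lets a curator separate life-threatening reactions from moderate ones.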

To combat this problem, NLP researchers and physicians have developed a corpus called PHAEDRA (Pharmacovigilance Entity Drug Annotation). The corpus includes annotations that go beyond binary relations and encode more complex information in a structured manner. It is intended that PHAEDRA will encourage the development and adaptation of machine learning based text mining tools for extracting PV-related information from text, at a level of complexity that has not previously been possible. Ultimately, it is hoped that such tools will lead to curator-oriented applications that provide sophisticated, efficient and flexible means to explore and pinpoint relevant information in different textual sources, and thus help to increase the coverage, consistency and completeness of information in PV resources.

Moving from pilot to scale in these settings will require addressing several issues and must be grounded in the experience of the beneficiaries of these powerful tools. This means using human-centered design when developing and implementing new AI and ML applications. It also means asking legal and ethical questions through a human rights lens that covers privacy, confidentiality, data security, ownership and informed consent. Effective implementation will also require understanding the local, social, epidemiological, health system and political contexts. Furthermore, wide-scale deployment will need to be guided by a robust research agenda. Although not a panacea, AI is one of several tools that could help in achieving the health-related targets of the UN's Sustainable Development Goals (SDGs), particularly those related to providing universal health coverage.


Ferraro, J. P., Ye, Y., Gesteland, P. H., Haug, P. J., Tsui, F. R., Cooper, G. F., … Wagner, M. (2017). The effects of natural language processing on cross-institutional portability of influenza case detection for disease surveillance. Applied Clinical Informatics, 8(2), 560–580. https://doi.org/10.4338/ACI-2016-12-RA-0211

Joshi, A., Karimi, S., Sparks, R., Paris, C., & Macintyre, R. (2019). Survey of Text-based Epidemic Intelligence: A Computational Linguistic Perspective. Retrieved from https://www.who.int/csr/alertresponse/epidemicintel

Sheikhalishahi, S., Miotto, R., Dudley, J. T., Lavelli, A., Rinaldi, F., & Osmani, V. (2019, May 1). Natural language processing of clinical notes on chronic diseases: Systematic review. Journal of Medical Internet Research. https://doi.org/10.2196/12239

Thompson, P., Daikou, S., Ueno, K., Batista-Navarro, R., Tsujii, J., & Ananiadou, S. (2018). Annotation and detection of drug effects in text for pharmacovigilance. Journal of Cheminformatics, 10(1), 37. https://doi.org/10.1186/s13321-018-0290-y
