Combining AI, NLP and Big Data to Monitor and Control the Spread of Epidemic Diseases

Whether it was Ebola yesterday or the coronavirus today, monitoring and containing epidemic diseases has been an ongoing global challenge. These are moments when global healthcare organizations and professionals actively collaborate to fight the threat to human life.

In the era of artificial intelligence (AI) and big data, mining massive datasets for intelligence is a logical next step. Such intelligence offers a peek into the future based on historical patterns. AI-focused healthcare companies are using private data resources to predict epidemic outbreaks. Recently, BlueDot, an AI company that specializes in disease surveillance, predicted the spread of 2019-nCoV (novel coronavirus). It used data ranging from plant and animal disease networks to global airline ticketing information to identify the location and the direction of spread of the virus.

Other researchers are drawing on unconventional sources such as tweets, news articles and potential cases reported by doctors to make predictions about the spread of the disease. These researchers are focused on understanding the direction and the rate of spread of the coronavirus. The data they use is largely unstructured text, and this is where NLP and AI can be used effectively and efficiently to make data intelligence a reality.
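To make the idea of turning unstructured text into usable signals concrete, here is a minimal sketch of keyword-based symptom extraction. It is only an illustration: the lexicon, the normalized labels and the function name `extract_symptoms` are assumptions made for this example, and it does not handle negation ("no fever"), misspellings or context the way a trained clinical NLP model would.

```python
import re

# Hypothetical mini-lexicon: surface phrase -> normalized symptom label.
# A real system would rely on a curated medical vocabulary instead.
SYMPTOM_LEXICON = {
    "shortness of breath": "dyspnea",
    "difficulty breathing": "dyspnea",
    "fever": "fever",
    "high temperature": "fever",
    "cough": "cough",
}

# Longer phrases go first so they win over shorter overlapping ones.
_PHRASES = sorted(SYMPTOM_LEXICON, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in _PHRASES) + r")\b",
    flags=re.IGNORECASE,
)


def extract_symptoms(text):
    """Return the set of normalized symptom labels mentioned in free text."""
    return {SYMPTOM_LEXICON[m.group(0).lower()] for m in _PATTERN.finditer(text)}


note = "Patient reports Shortness of breath and a mild fever since Monday."
print(sorted(extract_symptoms(note)))
```

Mapping several phrasings to one label is what lets reports from different sources be counted together later on.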

While the work of these researchers can help contain the spread of an epidemic, there could be significant delays between detection and the implementation of appropriate responses. Under these circumstances, using big data, AI and NLP technologies to provide predictive intelligence before the onset, or during the early onset, of epidemic diseases is a step in the right direction.

A good approach would be to implement a centralized platform to collect electronic health records (EHRs). Such a platform would enable us to build an NLP-based predictive analytics dashboard that monitors disease patterns within and across regions, helps detect anomalies and raises early warning alerts when necessary. Flagging these anomalies based on early-onset or pre-onset disease conditions and symptoms can help authorities take preventive or control measures to limit the spread of epidemics.

Consider the recent emergence of the coronavirus in Wuhan, China. Suppose Wuhan has 10 healthcare providers and each registered cases from 1 or 2 senior citizens with symptoms including shortness of breath and mild fever. To an individual healthcare provider, these are just 2 patients, likely to be dismissed as regular cases. But if the exact same symptoms were presented across 8 of the 10 healthcare providers by 20 senior citizens in a single day, an intelligent solution could flag the situation as unusual, thus preparing the local authorities to take appropriate preventive or precautionary measures.
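The scenario above can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of any deployed system: the `CaseReport` record, the `flag_unusual_clusters` function and the thresholds (`min_providers`, `min_cases`) are all assumptions, and the per-provider counts and the date in the sample data are made up so that the totals match the 8 providers and 20 patients in the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CaseReport:
    provider_id: str
    day: date
    age_group: str
    symptoms: frozenset


def flag_unusual_clusters(reports, min_providers=5, min_cases=10):
    """Flag (day, age group, symptom set) groups seen at many providers."""
    providers = defaultdict(set)
    cases = defaultdict(int)
    for r in reports:
        key = (r.day, r.age_group, r.symptoms)
        providers[key].add(r.provider_id)
        cases[key] += 1
    # Each provider sees only a few cases; the signal appears in the pooled view.
    return [
        (key, len(providers[key]), cases[key])
        for key in cases
        if len(providers[key]) >= min_providers and cases[key] >= min_cases
    ]


# Made-up data: 8 providers, 20 senior patients, same symptoms, same day.
counts_per_provider = [3, 3, 3, 3, 2, 2, 2, 2]
symptoms = frozenset({"shortness of breath", "mild fever"})
reports = [
    CaseReport(f"provider-{i}", date(2020, 1, 10), "senior", symptoms)
    for i, n in enumerate(counts_per_provider)
    for _ in range(n)
]

flagged = flag_unusual_clusters(reports)
for (day, age_group, symptom_set), n_providers, n_cases in flagged:
    print(day, age_group, sorted(symptom_set), n_providers, n_cases)
```

Seen from a single provider, the same data stays below both thresholds, which is exactly why the pooled view matters.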

In the healthcare industry, the high volumes of unstructured data contain critical signals and nuggets of information that are vital to detecting potential outbreaks. Applying AI/ML to the available text and numerical data can help us distinguish regular trends from anomalies. Having such systems in place also helps detect the outbreak of epidemic diseases (anomalies) earlier than the prevailing techniques. A simple forecast of patients' disease and symptom trends can also help healthcare providers manage inventory and resources more effectively.
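As a rough illustration of separating regular trends from anomalies, the sketch below compares today's case count against a historical baseline using a z-score, and produces a naive moving-average forecast. The function names, the threshold of 3 standard deviations, the window size and the sample counts are assumptions chosen for the example; a production system would need to account for seasonality, reporting delays and much more.

```python
from statistics import mean, stdev


def is_anomalous(history, today, z_threshold=3.0):
    """Return True if today's count sits far above the historical baseline."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today > baseline
    return (today - baseline) / spread > z_threshold


def forecast_next(history, window=7):
    """Naive forecast: the average of the most recent `window` observations."""
    recent = history[-window:]
    return mean(recent)


# Made-up daily counts of patients presenting with a given symptom.
daily_counts = [2, 3, 2, 4, 3, 2, 3]
print(is_anomalous(daily_counts, today=4))
print(is_anomalous(daily_counts, today=20))
print(forecast_next(daily_counts, window=3))
```

The same forecast that feeds the anomaly check can also inform staffing and inventory planning.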

These kinds of centralized data sources should not be limited to analytics and reporting. Research in healthcare and AI can benefit tremendously from this kind of data. It will help researchers predict future trends in areas such as suicides, accidents and seasonal disease outbreaks. These big data sources can help build robust NLP models, which in turn can help healthcare providers and local governments respond faster by automating healthcare-related logistics.

However, the idea of centralizing patient data comes with its own set of challenges and limitations. Healthcare providers typically have confidentiality clauses to consider when sharing information. Additionally, such data is typically highly unstructured text. It can be extremely time-consuming and expensive to make all healthcare providers, even within a single network, record their patient information in a standardized manner. While centralizing EHRs and building NLP-based predictive analytics models is a great idea, it requires extensive collaboration among public and private healthcare providers, government, and independent bodies to make such platforms successful.

Given the challenges involved in building a centralized data repository, a different approach is to build predictive analytics applications over existing decentralized data sources. Recent research suggests a clear rise in the use of big data platforms by independent healthcare providers. Until now, major applications have focused on text mining using classical NLP techniques. These text-mining applications typically have a different set of objectives, such as simplifying administrative activities.

This is just a glimpse of how AI can help monitor or control epidemic outbreaks before they become crises. Additionally, with the advent of AI, groundbreaking research in areas like drug discovery can help accelerate the development of new drugs and vaccines. In short, we are all working towards better outbreak tracking, reporting, and response management systems.


Knight, W. (2020). How AI Is Tracking the Coronavirus Outbreak. Retrieved from

This Canadian start-up used AI to track coronavirus and raised alarm days before the outbreak. (2020). Retrieved from

Yakobovitch, D. (2020). How to Fight the Coronavirus with AI and Data Science. Retrieved from
