Using big data to control the flu is feasible

Source: Internet
Author: User

Flu has been a deeply troubling topic in recent years. Wave after wave of viral influenza has not only drawn close attention from researchers but has also repeatedly become a heated topic of public discussion. From the H1N1 influenza A virus of 2009, which affected a staggering 200,000 people worldwide, to the H7N9 avian flu that emerged in China at the beginning of this year, influenza viruses keep reshaping themselves and sweeping across the world. As a result, drugs and vaccines are often either not ready in time or unable to prevent infection, and so prove useless. Under these conditions, detecting influenza trends early would not only buy valuable time for preparing antiviral drugs but also help vaccine developers take targeted measures as soon as possible.

Current influenza surveillance relies primarily on the Global Influenza Surveillance Network, which WHO established in 1952. The network consists of 128 national influenza centers in 99 countries, together with the WHO Collaborating Centres for Reference and Research on Influenza.

So far, the smooth operation of this network has played a major role in monitoring, preventing, and controlling influenza. However, that success owes much to the fact that flu outbreaks in recent decades have mostly been regional. Judging from the recent spread of pandemic flu, the risk of a global pandemic is increasing.

This raises the bar for flu surveillance: how can the signs of an epidemic be detected earlier and more accurately?

A group of Google engineers had the same question in mind. Google is the world's largest search engine: every minute of every day, millions of users rely on its search services, and quite a few of them search for health information. These user actions provide a wealth of valuable data for analysis.

As you can imagine, during flu season the number of people searching for flu symptoms soars, and in areas where the flu is spreading, the share of such searches rises accordingly. This means there is a degree of correlation between search trends for flu-related keywords and the prevalence and severity of the flu. Although not everyone who searches for these keywords has flu symptoms or the flu itself, aggregating these searches may yield an accurate and reliable model for monitoring the current flu situation in real time and estimating the course of future outbreaks.
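The "degree of correlation" in the paragraph above can be made concrete with a Pearson correlation coefficient between two weekly series. The sketch below assumes two hypothetical inputs, the share of all searches that are flu-related and the share of doctor visits for influenza-like illness (ILI); the numbers are invented for illustration and are not real data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly figures, not real data:
# flu-related searches as a percentage of all searches, and
# ILI visits as a percentage of all doctor visits.
search_share = [0.8, 1.1, 1.9, 3.2, 4.0, 3.1, 1.7, 1.0]
ili_rate = [1.2, 1.5, 2.6, 4.1, 5.3, 4.0, 2.2, 1.4]

print(f"r = {pearson(search_share, ili_rate):.3f}")
```

A value near 1 means the two curves rise and fall together; it says nothing about whether any individual searcher is actually ill, which is why the article speaks only of aggregate trends.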

The engineers' first task was to choose the flu-related keywords. This step may seem simple, but it is very tricky. Keywords containing the word "flu" would obviously be included, but people phrase their searches in so many different ways that it remains difficult to determine exactly which keywords belong in the set.

The engineers took a blunt approach and simply handed keyword selection over to the machine. They fed each of as many as 50 million Google search queries into a pre-built model, fitted the curve the model produced for each query against the influenza epidemic curve of the US Centers for Disease Control and Prevention (hereinafter the CDC), and then kept at most the 100 best-fitting keywords.
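The screening step described above can be sketched as: score every candidate query by how well its weekly curve tracks the CDC curve, then keep the top scorers. This is a simplified sketch under stated assumptions: it uses a single national curve and plain Pearson correlation as the score, and the function name `screen_queries` and the sample queries are hypothetical, not Google's actual pipeline.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def screen_queries(candidates: dict[str, list[float]],
                   cdc_ili: list[float], top_k: int = 100) -> list[str]:
    """Hypothetical screener: rank queries by correlation with the CDC curve, keep top_k."""
    scores: dict[str, float] = {}
    for query, series in candidates.items():
        try:
            scores[query] = pearson(series, cdc_ili)
        except ValueError:
            continue  # skip constant or mismatched series instead of failing
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]


# Hypothetical weekly query shares and CDC ILI rates (invented numbers).
cdc = [1.2, 1.5, 2.6, 4.1, 5.3, 4.0, 2.2, 1.4]
candidates = {
    "flu symptoms": [0.8, 1.1, 1.9, 3.2, 4.0, 3.1, 1.7, 1.0],
    "fever remedy": [0.5, 0.6, 0.9, 1.2, 1.5, 1.3, 0.8, 0.6],
    "beach hotels": [2.0, 1.8, 1.5, 1.0, 0.7, 1.1, 1.6, 2.1],
}
print(screen_queries(candidates, cdc, top_k=2))
```

Because each query is scored independently, the loop parallelizes trivially, which is what makes brute-force screening of tens of millions of candidates plausible at all.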

At this point the job was half done. The next step was to determine which of the 100 most predictive keywords were genuinely related to the flu and to combine them into a predictive model. In the end, the engineers settled on 45 keywords.
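The article does not say how the top keywords were combined. One plausible reading, shown here purely as an assumption, is to add the ranked queries one at a time, sum their weekly shares into a single aggregate signal, and keep the prefix whose aggregate tracks the CDC curve best. The function name `choose_query_set` is hypothetical, and unweighted summing is a simplification.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def choose_query_set(ranked: list[str], candidates: dict[str, list[float]],
                     cdc_ili: list[float]) -> list[str]:
    """Hypothetical greedy combiner: grow the query set in rank order and
    return the prefix whose summed weekly share correlates best with the CDC curve."""
    best_n, best_r = 0, -math.inf
    combined = [0.0] * len(cdc_ili)
    for n, query in enumerate(ranked, start=1):
        # Aggregate signal = sum of the shares of the first n ranked queries.
        combined = [c + s for c, s in zip(combined, candidates[query])]
        try:
            r = pearson(combined, cdc_ili)
        except ValueError:
            continue  # a flat aggregate carries no signal; try adding more queries
        if r > best_r:
            best_n, best_r = n, r
    return ranked[:best_n]


# Invented example: neither query alone is perfect, but together they are.
cdc = [1.0, 2.0, 3.0, 4.0, 5.0]
cands = {"a": [1.0, 2.0, 3.0, 5.0, 4.0], "b": [1.0, 2.0, 3.0, 3.0, 6.0]}
print(choose_query_set(["a", "b"], cands, cdc))
```

Stopping at the best-scoring prefix mirrors the idea in the text: beyond some number of keywords (45 in Google's case), adding more starts to dilute the signal with unrelated searches.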

Practice is the sole criterion of truth. The best way to judge whether a predictive model is good or bad is to see whether it holds up in real-world use. In retrospective validation, Google engineers compared seasonal flu data for New York City from 2003 to 2007 with the model's estimates and found a correlation coefficient of 0.90. Even more telling for this model was its "forward" validation. Encouraged by the retrospective results, beginning in early 2008 the engineers compared the model's estimates with the corresponding data from the US CDC, which were published about two weeks later. The results were again encouraging, with the correlation also reaching 0.90. The engineers eventually published a paper describing how the model was built in the journal Nature.
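Forward validation as described above means fitting the model on one period and then checking its estimates against data it never saw. The sketch below assumes a simple model form that the article does not specify: ordinary least squares on the log-odds scale, logit(p) = b0 + b1 * logit(q), where q is the flu-query fraction of all searches and p is the ILI fraction of doctor visits. All numbers are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a fraction strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit_linear(q: list[float], p: list[float]) -> tuple[float, float]:
    """Least-squares fit of logit(p) = b0 + b1 * logit(q) (assumed model form)."""
    xs = [logit(v) for v in q]
    ys = [logit(v) for v in p]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def predict(b0: float, b1: float, q: list[float]) -> list[float]:
    return [inv_logit(b0 + b1 * logit(v)) for v in q]


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical weekly fractions: a fitting period, then a later check period.
train_q = [0.010, 0.014, 0.022, 0.035, 0.041, 0.030]
train_p = [0.012, 0.016, 0.027, 0.043, 0.052, 0.038]
test_q = [0.012, 0.020, 0.033, 0.025]
test_p = [0.014, 0.024, 0.040, 0.031]

b0, b1 = fit_logit_linear(train_q, train_p)
held_out_r = pearson(predict(b0, b1, test_q), test_p)
print(f"held-out correlation: {held_out_r:.2f}")
```

The key design point is that the check period's figures play no part in fitting b0 and b1, so the held-out correlation measures genuine predictive power rather than how well the curve was tuned to known data.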
