The relationship between big data and data mining

Source: Internet
Author: User

While Edward Snowden was still looking for asylum, his revelations about the US National Security Agency's (NSA) wholesale collection of telephone and e-mail records sparked unease and outrage.

The Obama administration claims that the surveillance data has made the country safer, while critics on both the left and the right condemn the spying as a violation of privacy.

Data is not information, but a raw material that has to be understood. One thing is certain: as the NSA spends billions of dollars on new tools to "mine" information from its massive data holdings, it is benefiting from the steep fall in the price of computer storage and processing.

MIT researchers John Guttag and Collin Stultz created a computer model to analyze previously discarded ECG data from heart attack patients. Using data mining and machine learning to sift through massive amounts of data, they found that patients whose electrocardiograms showed three particular abnormalities were twice as likely to die of a second heart attack within a year as patients without them. The new approach can identify more of the high-risk patients who are missed by existing risk screening.
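
To make "twice as likely" concrete, the comparison can be expressed as a relative risk: the death rate among patients who show the abnormalities divided by the rate among those who do not. The sketch below is only an illustration. The function name and all patient counts are hypothetical and are not taken from the MIT study.

```python
def relative_risk(events_flagged, total_flagged, events_unflagged, total_unflagged):
    """Event rate in the flagged group divided by the rate in the unflagged group."""
    risk_flagged = events_flagged / total_flagged
    risk_unflagged = events_unflagged / total_unflagged
    return risk_flagged / risk_unflagged


# Hypothetical counts, invented for illustration only (not data from the study):
# 20 deaths among 200 patients with the abnormalities,
# 40 deaths among 800 patients without them.
rr = relative_risk(events_flagged=20, total_flagged=200,
                   events_unflagged=40, total_unflagged=800)
print(f"relative risk: {rr:.1f}")
```

A relative risk near 2 is what "twice as likely" means here; a value near 1 would mean the abnormalities carry no extra risk.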

Data mining is a broad term for a number of mechanisms, usually implemented in software, that extract information from huge amounts of data. These mechanisms are often called algorithms.
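
As one small illustration of such an algorithm, the sketch below counts which pairs of items appear together in shopping baskets and keeps the pairs seen at least a given number of times. This is a deliberately simplified form of frequent-itemset mining, not a method described in this article. The function name `frequent_pairs`, the `min_support` threshold, and the basket contents are all assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```

Real data mining systems apply the same idea to millions of records and prune the search space far more aggressively; the point here is only that the raw baskets become a short list of patterns.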

David Krakauer, director of the Wisconsin Institute for Discovery, said the growth of data, and of the ability to extract information from it, is also affecting science. "Computer processing power and storage capacity are growing exponentially, and costs are falling exponentially. In this sense, much of science now follows Moore's law."

In 2005, a 1 TB hard drive cost about $1,000, "but now a USB drive costing less than $100 has that much capacity," said Krakauer, who studies the evolution of intelligence. The current discussion of big data and data mining "is happening because we are in the midst of an earthshaking transformation, and we are perceiving it in ways we never have before," Krakauer said.

As we leave more traces of our lives through phone calls, credit cards, e-commerce, the Internet and e-mail, the growing commercial impact of big data shows up in moments like these:

You search for a flight to Tuscaloosa, and then you see discounts for Tuscaloosa hotels on the sites you visit.

The movie you watch uses computer graphics and imaging technology built on hundreds of thousands of gigabytes of data.

The stores you patronize maximize their profits by mining data on customer behavior.

Airlines use algorithms to predict demand for tickets and adjust prices in unpredictable ways.

Smartphone apps recognize your location, so you receive offers from nearby restaurants.

Is big data watching you?

Beyond security and commerce, big data and data mining are also surging in scientific research. More and more devices, carrying ever more sophisticated sensors, are sending back data streams that are increasingly hard to manage, so more and more analytical power is needed. In meteorology, oil exploration and astronomy, the explosive growth of data has both supported and demanded higher levels of analysis and insight.

Visualization of ocean surface currents from June 2005 to December 2007. Data sources: sea level data from the altimeters on NASA's TOPEX/Poseidon, Jason-1, and Ocean Surface Topography Mission/Jason-2 satellites; gravity data from the NASA/German Aerospace Center Gravity Recovery and Climate Experiment (GRACE) mission; surface wind stress data from NASA's QuikSCAT mission; sea surface temperature data from the NASA/Japan Aerospace Exploration Agency Advanced Microwave Scanning Radiometer-Earth Observing System; sea ice concentration and velocity data from passive microwave radiometers; temperature and salinity profiles from shipboard and moored instruments and from the international Argo ocean observing system.

The visualization integrates satellite data with a numerical model. Eddies and narrow currents transport heat and carbon in the ocean. The Estimating the Circulation and Climate of the Ocean (ECCO) project provides ocean currents at all depths, but only the surface currents are used here. Visualizations like this are used to measure the ocean's role in the global carbon cycle and to monitor the exchange of heat, water and chemicals within and between different parts of the Earth system.

In medicine, 2003 was a milestone in the emergence of big data. That year the first human genome was sequenced. Since that breakthrough, thousands of human, primate, mouse and bacterial genomes have been added to the data on hand. Each genome contains billions of "letters", and the danger of errors in handling them computationally has spawned bioinformatics. This discipline supports new kinds of science by harnessing software, hardware, and complex algorithms.

Mental disorders are usually treated as separate conditions, but a study of 1.5 million patient records shows that a significant number of patients suffer from more than one disorder. The Silvio O. Conte Center at the University of Chicago uses data mining to understand the causes of neurological disorders and the relationships among them. "Several (research) teams are working to solve this problem," said Andrey Rzhetsky, director of the center. "We are trying to incorporate all of them into a model that unifies the analysis of those data types, and to look for possible environmental factors."

Source: Andrey Rzhetsky, University of Chicago

Another example of bioinformatics in use comes from the National Cancer Institute. The institute's Susan Holbeck tested 5,000 pairs of FDA-approved anticancer drugs on 60 cell lines. After 300,000 tests, Holbeck said: "We know the RNA expression level of each gene in each cell line. We have sequence data, protein data, and microRNA expression data. We can use all of this data for data mining to see why one cell line responds well to a combination and another does not. We can take an observation about a particular pair and develop suitable targeted drugs to be tested clinically."

(Responsible editor: Mengyishan)
