Big data: Scientific planning matters in the boom, and herd behavior should be avoided

"By mining big data, we rate a user's 'reliability' on the completeness of their profile, the trustworthiness of their friends, and the security of their account." A well-known matchmaking website recently advertised that it could use "big data" technology to crack down on dating scammers.

Today "big data" is a fashionable term, and many businesses have launched "big data" services. Institutions around the world that research and develop big data technology are attracting plenty of venture capital and attention.

Big data dazzles people, but it also leaves them in the dark. In September, Gartner, a well-known information technology research firm, published a report on the hype behind big data's popularity in 2013. It noted that 30% of companies had already started working on big data in 2013, while another 34% planned to start within two years. But most of these companies told the researchers that they did not know what they were doing, or why they were doing big data work at all.

According to the report, more than half of the companies do not know how to derive value from their data; one-third lack the capability to process big data; and more than one-fifth do not even know what big data is.

If even the most alert entrepreneurs do not really understand big data, it is harder still for everyone else to see it clearly. The big data age is still in embryonic form, and no one can say what it will become.

A commercial buzzword that originated in the scientific community

While business relishes the commercial opportunities of big data, it was scientists who first started talking about the big data age. "Life sciences and medicine, particle physics, weather forecasting, genetics, earthquake prediction and so on are already data-intensive applications," said Shodan, a professor of automation at Tsinghua University. "A typical example is weather forecasting in the United States, which produces 30 PB of data a year (1 PB is about 1 million GB) from more than 3.5 billion observations a day. DNA sequence analysis uses large-scale network data analysis tools to run billions of analyses of short DNA strands and to produce DNA-based molecular materials. Scientists have also unveiled a large-scale data management architecture and visualization methods that allow the decoding of the human genome, which previously took 10 years, to be completed within a week."

Fee Sharp, a professor at Shanghai University, explained: "Take BGI, for example, where the volume of data analyzed runs to hundreds of PB. They gathered 25 cultivated and 24 wild rice varieties from around the world and, thanks to their big data analysis capabilities, scanned the entire rice genome and found 162 genes that determine rice yield."

The Large Hadron Collider (LHC) is often cited as well: it generates 1 PB of data per second, archived using 45,000 tape drives, and is currently the world's largest producer of data.

Against this background, the journal Nature published a special issue on big data in September 2008 to explore the changes it brings to scientific research. Science followed with its own big data issue in 2011, treating in-depth analysis of big data as a breakthrough point for future research.

"The real start of the discussion of big data is generally recognized to be Nature's 2008 special issue," said Wang Jipeng, a researcher at the Institute of Electronic Science. "In bibliometric terms, papers on big data surged in 2011. Over the past few years we have been talking about big data, but mainly about applications, not theoretical research."

Scientists worry that big data is hard to handle. Shodan said: "Computers have passed the petaflop level. 'Tianhe-2' performs 2,000 trillion operations per second, and speeds may rise by orders of magnitude over the next 10 years. But software development is slow. This is true in the United States too, which considers its high-performance computing software to be lagging. It is also the reason why the utilization rate of our country's high-performance computers is not high."

A typical data management dilemma was described by one information technology worker in China: "I once worked on a satellite information resource management system. The satellite generated hundreds of gigabytes of data per day. After processing, the files were stored on the corresponding disks, tapes and so on and presented through a number of systems. Over a year the volume of data was quite large, yet it did not show the characteristics of big data."

Industry has raised a similar problem. Shodan cited an example: "A well-known engine company came up with the idea of uploading its aero-engine data to headquarters in real time and combining it with historical data to detect and predict failures in real time. The volume of data is very large, and it is difficult to detect, compute and forecast all at once." Shodan summed up this difficulty as "how the intelligence of scientific research can catch up with the ability to sense."

After the scientific community's discussion of the challenges of big data, Internet businesses saw a "gold mine" in it. This is the big data topic now familiar to everyone. Yi Huan, deputy director of the Hongyuan Securities research institute, gave the most typical example: "Alibaba, which started with business-to-business and moved on to business-to-consumer, has gathered large numbers of small and medium-sized enterprises and built up 540 million registered users. What makes it so formidable? What lies behind these data? Sales data, product data, accounts receivable, inventory, cash flow, property information and a whole range of comprehensive information, all in real time and far more accurate than bank statements. It has your consumption preferences, your home address, the card number you repay with and so on. This is what is called big data."

A new report by M&M, an IT industry research firm, says the global big data market will see a compound annual growth rate of 26% over the next 5 years, from $14.87 billion this year to $46.34 billion in 2018. Such rapid expansion has fired the market's enthusiasm for the big data concept.
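As a quick sanity check on the growth figures quoted above, the compound annual growth rate implied by the report's two endpoints can be recomputed. The following is only a minimal sketch in Python: the function name `cagr` and the variable names are illustrative assumptions, and the only inputs taken from the article are the two market sizes (14.87 and 46.34, in the same currency unit) and the 5-year span.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.26 means 26%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article; the unit cancels out in the ratio.
start_market = 14.87   # market size this year
end_market = 46.34     # forecast market size for 2018
years = 5

implied_rate = cagr(start_market, end_market, years)

# Growing the starting value at the implied rate should reproduce the endpoint.
projected_end = start_market * (1 + implied_rate) ** years

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"Projected end value: {projected_end:.2f}")
```

The implied rate comes out at roughly 25.5% a year, which rounds to the 26% cited in the report, so the two endpoints and the headline growth rate are consistent with each other.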

Different industries have recognized the challenges and opportunities of the data explosion from different perspectives. As a result, big data appears frequently in the media and has become a buzzword at international summits such as Davos, yet, as the report cited above shows, entrepreneurs are still unsure what the concept means.

Open data gives America a head start

After the opportunities in big data emerged, the Obama administration launched the Big Data Research and Development Initiative in March 2012 and formed the "Big Data Senior Steering Group", marking the elevation of big data to the level of national strategy in the United States.

"I think Obama is trying to use the big data development plan to repeat the Internet hegemony that the information superhighway plan brought about," said Xiaofeng, a professor at the National Defense University. "The Americans have focused on the future of big data, and I think they are laying the foundation for future big data hegemony."

"The root of the US government's big data plan is that it has spent more than 10 years making data public," said Ding Gangyi, a professor at Beijing Institute of Technology. He said the Data.gov website in the United States reflects the government's efforts to open up data. Data.gov holds a very large amount of data, comparable to that of the World Bank and the United Nations. Much of it is sensitive, but the government dares to publish it, because only by publishing such data can there be better international cooperation in dealing with crises. The European Union, the UK, and developing countries including Brazil have also joined the Data.gov effort.

Ding Gangyi said that United Nations organizations and some US research institutions have been working hard on open data for 10 years, with various activities every year to promote data disclosure.

Open data in the United States has allowed many services built on government data to create enormous benefits. Silicon Valley, for example, has a "climate company" that uses decades of weather data from the US meteorological bureau's database to study the correlation between rainfall, temperature, soil conditions and crop yields over the years, predicts farm output for the following year, and sells insurance on that basis. The company, thanks to its bright prospects, was recently acquired by the agricultural giant Monsanto.

There are also services that combine meteorological information with flight delay records to predict the probability of flight delays, which can push airlines to improve their punctuality. Government data on things like urban congestion has come in handy too: the United States and Britain were the first to use big data to manage traffic, issue traffic forecasts, and help public and private vehicles travel at the right times.

"The real stumbling block for governments in achieving their goals is not just collecting data, but translating it into usable information products and developing knowledge," Hickman, chief information officer at the US Department of Commerce, said at an IT meeting.

"After all, many of the people capable of good ideas are scattered across the private sector, and they may come up with excellent proposals for using data. Sharing our data is not just about pursuing so-called government transparency. The data we generate and disseminate really can create a different kind of power in completely new ways, something that cannot be achieved with our existing planning and limited resources."

Ding Gangyi said he had approached some of the country's leading Internet companies about sharing their data, and the companies replied: "We can give you a slice of it, hundreds of TB or a few PB, but continuous data is absolutely out of the question." He believes that, for researchers, only data that can be accessed over the long term, anytime and anywhere, counts as big data.

"Data sharing between government and industry should be the foundation of big data. Without a sharing policy, there is no big data," Ding Gangyi said.

The big data boom needs sharing plus legislation

China was not late in turning its attention to big data. In July 2012, China's "Twelfth Five-Year" National Strategic Emerging Industries Development Plan explicitly called for "strengthening the development of basic software, as represented by massive data processing software." In December 2012, the founding of the Zhongguancun Big Data Industry Alliance was announced.

According to IDC, an IT analyst firm, China's big data technology and services market will grow rapidly to $616 million in 2016. But this is only a fraction of the world market.

Qin, a network research expert, said in a media interview that China does not seem to lag behind the United States in the timing of setting up big data institutions. But big data applications involve the entire Internet-centered industrial chain, and the US lead in them is determined by the strength of multinational IT companies such as Cisco, Microsoft and Google. Qin fears that China will not be able to fully surpass it for decades.

On September 30, when the Communist Party's Politburo made a study visit to Zhongguancun, Baidu CEO Robin Li gave a presentation on big data. He believes big data is most valuable in two respects: first, promoting information consumption and speeding up economic restructuring and upgrading; second, attending to people's livelihood and promoting innovation in social management. Li also said that to develop big data at the national level, the country must promote open data, support scientific research and cultivate talent.

This "open data" view represents a consensus among observers of China's big data industry. Internet critic Gockai pointed out: "In an Internet carved up into separate fiefdoms, opening up data cannot be accomplished by companies alone; it can only be pushed from the government level. So many companies now claim to do big data, but most of them are just a shell with no real content. Big data is built on huge volumes of data; without a certain volume of data, it cannot be done. That is why Li put forward the idea of open data."

Gockai believes that the difficulty in opening up data "lies in the supporting management systems and laws and regulations. The government's role is to maintain fairness, resolutely protect the commercial interests of small enterprises, and act well as manager and arbitrator, without letting its own economic interests get involved."

Hequan, an information industry expert and academician of the Chinese Academy of Engineering, wrote this year in an article entitled "Opportunities and Challenges in the Big Data Age": "China has the world's largest population and will become the country that produces the most data, but we do not pay attention to preserving data, and the utilization of stored data is low. In addition, some departments and institutions in our country hold large amounts of data but are unwilling to share it with other departments, which results in incomplete information or duplicated investment. The government should break down these data fiefdoms and blockades through reform of institutions and mechanisms."

Another industry expert told reporters that, beyond promoting information disclosure, government support for the big data industry should take the form of purchasing services rather than launching unnecessary government projects.

In addition, Hequan pointed out that an information disclosure law should be enacted as soon as possible. "Many organizations and businesses now hold large amounts of customer information. We should encourage data mining that serves society while preventing infringement of individual privacy, and promote data sharing while preventing misuse of data." He believes it is necessary to define the authority for and scope of data mining and use, to prevent information from being damaged, tampered with, leaked or stolen, and to protect citizens' information security.

"Standards and an industrial structure (for big data) have not yet taken shape, which is a valuable opportunity for China to achieve leapfrog development," Hequan said. "We should pay attention to scientific planning and avoid following the herd."

(Responsible editor: The good of the Legacy)
