Describe in detail the origin and future of big data

Source: Internet
Author: User

Some time ago, the industry turned "big data" from a common noun into a proper noun. But many people still do not understand the relationship and the difference between "becoming a bunch of numbers" and "big data." This article draws on two books, "When We Become a Bunch of Numbers" and "The Big Data Era," to introduce the two concepts.

First, "When We Become a Bunch of Numbers." Have you noticed? Every time we search for a word or an event on Google or Baidu and then go on to browse other sites, the ads that appear in the sidebar or in the banner at the top of the page are related to what we just searched for. For example, if you search for information about digital cameras, you will immediately find ads for digital cameras at the top or on the right of the pages you visit next.

The same thing happens when we use webmail. If you send an e-mail to a friend to discuss which resort to visit on your next holiday, the next time you open your mailbox you will find ads beside it for hotels at that resort or for round-trip flights. All of this is tied to an internet term: the "cookie."

What does "cookie" mean? Literally a cookie is a biscuit, but in the online world a cookie is a small piece of data (a small text file) that a server stores temporarily on your computer so that it can identify your computer later. When you browse a website or send webmail, the web server sends a small piece of information to your computer, and the cookie records the text you typed on the site or some of the options you chose. The next time you visit the same website, the web server checks whether the cookie it left last time is still there; if so, it identifies the user from the cookie's contents and sends you content tailored accordingly.
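The round trip described above can be sketched with Python's standard `http.cookies` module. This is only a minimal illustration: the cookie name `visitor_id` and the two helper functions are hypothetical, and a real site would add security attributes and server-side storage.

```python
from http.cookies import SimpleCookie
from typing import Optional


def server_set_cookie(visitor_id: str) -> str:
    """Build the Set-Cookie header value a server might send on a first visit."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id          # hypothetical cookie name
    cookie["visitor_id"]["path"] = "/"         # valid for the whole site
    cookie["visitor_id"]["max-age"] = 3600     # browser keeps it for one hour
    return cookie["visitor_id"].OutputString()


def server_read_cookie(cookie_header: str) -> Optional[str]:
    """Parse the Cookie header a browser sends back on a later visit."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


# First visit: the server hands the browser an identifier.
header = server_set_cookie("abc123")

# Later visit: the browser returns it, and the server recognizes the visitor.
returning_visitor = server_read_cookie("visitor_id=abc123; theme=dark")
first_time_visitor = server_read_cookie("theme=dark")
```

Here `returning_visitor` is the identifier the server stored earlier, while `first_time_visitor` is `None` because no such cookie was sent.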

Through cookies, each of us is digitized: our personal information, personal preferences, range of daily activities and purchasing habits all appear in the online world as a series of numbers.

Because of cookies, whenever we buy a book on Joyo or Dangdang (or even just browse a few books on their sites), the next time we visit, the site recommends books on topics related to our last purchase, as if it knew our preferences.

If you travel abroad with a laptop, then once you have settled in at the resort and turn on your computer to check your webmail (for example, Gmail) over WiFi, you will often receive a warning from the mail server: your account has been accessed from a location where it has never logged in before, and if it is indeed you logging in, please follow the steps below to reactivate your mailbox. This is another example of cookies at work.

The book "When We Become a Bunch of Numbers" cites many such cases, detailing how internet geeks and entrepreneurs use the digital information they collect from the web and spend a great deal of time analyzing the correlation and causality within group data, in order to create new business opportunities, improve social efficiency and improve our lives.

So what about "big data"?

With the proliferation of smartphones, tablets and laptops accelerating the "digitization of our identities," web servers face a new challenge in dealing with such massive amounts of data and information: the demand for processing power and storage capacity has grown explosively. Do you still remember? Fifteen years ago our PC hard drives held 200 M or 500 M, but five years ago they held 250 G or 500 G. One G is 1000 M. The portable hard drives now on sale often hold a few T, and one T is 1000 G. Yet the storage a web server now needs in order to handle the data on the network is measured in peta (P). As you might guess, one P is 1000 T.
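To make the unit arithmetic concrete, here is a small sketch that converts between M, G, T and P using the factor of 1000 per step given in the text. The `convert` helper is illustrative only.

```python
# Units in ascending order; each is 1000 times the previous one, as in the text.
UNITS = ["M", "G", "T", "P"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a storage size between M, G, T and P (decimal, factor 1000)."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    factor = 1000 ** abs(steps)
    # Going to a smaller unit multiplies; going to a larger unit divides.
    return value * factor if steps >= 0 else value / factor


one_p_in_t = convert(1, "P", "T")               # 1 P expressed in T
one_p_in_m = convert(1, "P", "M")               # 1 P expressed in M
drives_per_p = convert(1, "P", "G") / 500       # 500 G drives needed to hold 1 P
```

Under these assumptions, one P corresponds to 1000 T, or the capacity of 2000 of the 500 G drives mentioned above.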

With the current state of the art, hardly any single computer can process that much data, and hardly any single storage device has that much capacity. Fortunately, the concept and technology of the "cloud" have matured recently, so by drawing on the massive computing power of "cloud computing" and the massive capacity of "cloud storage," internet geeks and entrepreneurs have successfully entered the era of "big data"!

The development and application of big data has three characteristics:

1, the data is no longer a sample but the whole population. The analysis covers all the data collected.

2, the data need not be 100% accurate and exceptions need not be removed; instead, the characteristics shared by most people are summarized from the "massive" data. The "characteristics of the majority" stand for the "characteristics of the whole."

3, the focus is no longer on "causality" in the data, only on "correlation."
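As a rough illustration of these three points, the sketch below computes a Pearson correlation coefficient over every record in a data set, leaves an obvious outlier in place, and makes no claim about cause. The numbers and variable names are invented for the example, and Pearson correlation is just one common way to measure "correlation."

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: camera searches per user and camera ads clicked per user.
# The last record is an outlier and is deliberately kept (point 2).
camera_searches = [1, 2, 3, 4, 5, 6, 7, 8]
camera_ad_clicks = [2, 4, 6, 8, 10, 12, 14, 5]

# All records are used, not a sample (point 1); the result only says the two
# quantities move together, not that one causes the other (point 3).
r = pearson(camera_searches, camera_ad_clicks)
```

Even with the outlier left in, `r` stays clearly positive, which is the sense in which "the characteristics of the majority" carry through.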

The book "The Big Data Era" cites many examples to illustrate these points.

1, language translation

Traditional translation software is built on a large set of grammar rules: it uses these rigid rules to break down each sentence and then translates word for word to produce sentences in the other language. This tends to make the translations stiff and highly error-prone. A joke about Microsoft's "machine translation department" circulates in the industry: the quality of translation improves every time a linguist in the department resigns.

Google's translation software took a different approach. Most members of its translation team were not linguists, and many did not even know the languages being translated. They were statisticians who compared large numbers of existing translations to find patterns and generate translated text. The translations collected online are of uneven quality and contain many mistakes, but with such a huge amount of data those mistakes are naturally drowned out. This method has proved to improve the quality and accuracy of translation greatly. In other words, big data that is "not 100% accurate" paired with a simple algorithm is more effective than precise small data paired with a complex algorithm!
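The idea that errors get drowned out by volume can be shown with a toy sketch: count how often each source phrase is paired with each translation in a noisy corpus and pick the most frequent one. This is not Google's actual system, only a simplified illustration; the corpus and function names are invented.

```python
from collections import Counter, defaultdict


def build_phrase_table(pairs):
    """Count how often each source phrase is paired with each target phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def translate(table, phrase):
    """Return the most frequently observed translation, or None if unseen."""
    if phrase not in table:
        return None
    return table[phrase].most_common(1)[0][0]


# Hypothetical noisy corpus: most pairs are right, a few are typos or errors.
corpus = (
    [("bonjour", "hello")] * 8
    + [("bonjour", "good day")] * 3
    + [("bonjour", "helo")]          # a typo found online
    + [("merci", "thank you")] * 5
    + [("merci", "mercy")]           # a wrong translation found online
)

table = build_phrase_table(corpus)
best_bonjour = translate(table, "bonjour")
best_merci = translate(table, "merci")
```

No grammar rules are involved: the mistaken pairs are simply outvoted, so `best_bonjour` and `best_merci` come out as the majority translations.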

2, flu prediction

In the past, the health authorities' information about flu epidemics was usually gathered from major hospitals and clinics. The biggest drawback of this method is that the information lags behind: by the time the "disaster" has become a fact and the community is warned, many people have already been infected.

Here too, Google proposed a different forecasting method. It watches how many people search for phrases such as "cough" or "what to do about a fever." Besides noticing a sharp rise in the frequency of these searches, it can pinpoint the regions where people are starting to search heavily for answers to these questions. Google can therefore tell the public early that the flu has begun to spread and in which direction the affected area is moving. This allows health agencies to prepare vaccines and other preventive measures earlier and to contain the epidemic as soon as possible, significantly reducing the spread of the flu.
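A minimal sketch of the spike-detection idea follows: compare each region's current count of flu-related searches with its usual baseline and flag the regions where it has risen sharply. The region names, counts and the threshold `ratio` are all assumptions made for illustration, not the method Google actually used.

```python
def flag_regions(baseline, current, ratio=2.0):
    """Return regions whose current search count is at least `ratio` times baseline."""
    flagged = []
    for region, count in current.items():
        base = baseline.get(region, 0)
        # Skip regions with no baseline; otherwise flag sharp increases.
        if base > 0 and count >= ratio * base:
            flagged.append(region)
    return sorted(flagged)


# Hypothetical weekly counts of searches such as "cough" or "fever".
baseline_searches = {"North": 100, "South": 120, "East": 90}
current_searches = {"North": 110, "South": 300, "East": 200}

alert_regions = flag_regions(baseline_searches, current_searches)
```

With these made-up numbers, "East" and "South" are flagged because their search volume has at least doubled, while "North" is not.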

3, the relationship between airfares and how early you book

You might intuitively think that the earlier you buy a plane ticket, the cheaper it will be. The founder of Farecast was inspired to create a new service by his own experience: he found that the passenger sitting next to him had bought a ticket several days later than he had, yet had paid less. So he collected data on all the airlines' fares and booking times and built a mathematical model. Now any of us can go to his website, farecast.com, enter the origin and destination plus the date of travel, and the site will tell you whether to buy the ticket right away or wait a few more days.
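The "buy now or wait" decision can be illustrated with a deliberately simple rule: if past flights on the route were typically cheaper at some point closer to departure than the fare quoted today, suggest waiting. This is not Farecast's actual model, which the article does not describe; the fare history and the `buy_or_wait` function are hypothetical.

```python
def buy_or_wait(history, days_left, current_fare):
    """Suggest "buy" or "wait" from typical past fares.

    `history` maps days-before-departure to the typical fare seen on past flights.
    """
    # Typical fares at points in time that are still ahead of us.
    future = [fare for day, fare in history.items() if day < days_left]
    if not future:
        return "buy"  # no later data point to wait for
    return "wait" if min(future) < current_fare else "buy"


# Hypothetical typical fares for one route, keyed by days before departure.
fare_history = {30: 420, 21: 380, 14: 350, 7: 400, 3: 520}

advice_three_weeks_out = buy_or_wait(fare_history, 21, 380)
advice_two_weeks_out = buy_or_wait(fare_history, 14, 350)
```

In this toy data the fare usually dips two weeks before departure, so the sketch advises waiting at three weeks out and buying at two weeks out.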
