Big data processing technology is changing how we use computers. We have already benefited enormously from it: it is big data processing that gave us the Google search engine. But the story is just beginning, and for several reasons we can say that big data technology is changing the world:
* It can handle almost every type of data, whether microblog posts, articles, emails, documents, audio, video, or any other form.
* It works very fast: practically in real time.
* It is available to everyone, because it uses the most common, low-cost hardware.
Big data already provides solutions for companies such as eBay, Facebook, LinkedIn, Netflix, Twitter, and Zynga.
In fact, big data processing is not a single new technology; it is a collection of technologies, only some of which are recent. Many of them have been with us for years, but it was around 2012 that the newer pieces came together and big data suddenly caught fire.
**The big data market has reached $70 billion and is growing 15-20% a year**
Pat Gelsinger, a top executive at data storage giant EMC, recently revealed that the market for big data processing has already reached $70 billion and is growing at an annual rate of 15-20%. Almost every major technology company is interested in big data and has invested heavily in products and services in this area, including IBM, Oracle, EMC, HP, Dell, SGI, Hitachi, and Yahoo, and the list keeps growing.
Watching these deep-pocketed companies make move after move, venture capitalists are not sitting idle either, because the field could become their next cash cow. They are looking for solid big data start-ups to back: the venture firm Accel set up a $100 million "Big Data" fund last November, and IA Ventures established a similar fund shortly before that.
**The big data field is drawing in large numbers of people**
Everything about big data is "big": the potential market is big, the businesses in this space are big, and even small teams that have just entered the field to start a company are landing big investments. So it is no surprise to see large numbers of top Silicon Valley engineers moving into the field. Engineers from Google, Facebook, and Yahoo are lining up to join big data start-ups such as Cloudera, Hortonworks, and MapR.
**Cheap technology makes big data possible**
Big data processing has become possible because:
* Cloud technology gives people cheap access to enormous amounts of computing and storage. You don't have to buy a mainframe or build a data center; you pay only for the part you use.
* Social media means that everyone is creating interesting data and consuming it.
* Smartphones with GPS are providing new insight into people's daily lives.
* Widespread broadband connections keep people online all the time.
**Breaking it down: big data is made up of four kinds of technology**
As mentioned earlier, big data technology is really a collection of technologies, chiefly:
* Analytics
* In-memory databases
* NoSQL databases
* Distributed computing
**Analytics means mining massive amounts of data for answers in real time**
People wonder what we can actually do with cloud technology. Lauren States, IBM vice president and CTO for cloud computing, explains that by combining big data with analytics, we hope to gain real insight. She cites the Australian Open tennis tournament as an example: the organizing committee built an analytics engine called Slam Tracker on IBM's cloud platform, and Slam Tracker has collected nearly 39 million data points over the last five years. Analyzing this data reveals some of the patterns in how players perform when they win.
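To make the idea concrete, here is a toy sketch of that kind of analysis, not IBM's actual Slam Tracker: given per-match statistics, compare the averages of winners and losers to surface patterns linked to winning. The field names and numbers are hypothetical.

```python
# Toy analysis: which statistics separate winners from losers?
# All match records below are made up for illustration.
from statistics import mean

matches = [
    {"player": "A", "first_serve_pct": 68, "break_points_saved": 7, "won": True},
    {"player": "B", "first_serve_pct": 61, "break_points_saved": 3, "won": False},
    {"player": "C", "first_serve_pct": 72, "break_points_saved": 5, "won": True},
    {"player": "D", "first_serve_pct": 58, "break_points_saved": 2, "won": False},
]

for stat in ("first_serve_pct", "break_points_saved"):
    winners = mean(m[stat] for m in matches if m["won"])
    losers = mean(m[stat] for m in matches if not m["won"])
    print(f"{stat}: winners avg {winners:.1f}, losers avg {losers:.1f}")
```

At real scale the same comparison runs over millions of data points, but the shape of the question is the same.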
**In-memory database technology lets information flow rapidly**
Big data analysis often uses an in-memory database to process a large stream of records quickly. For example, it can analyze a national retail chain's sales records for the day, extract certain characteristics, and then promptly give consumers rewards according to predefined rules.
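A minimal sketch of the idea, assuming a simple in-memory store (a plain dict standing in for an in-memory database) and a hypothetical "spend $500 today" reward rule:

```python
# Keep running totals in memory while a stream of sales records flows past,
# and trigger a reward as soon as a customer crosses a spending threshold.
# The rule, records, and threshold are hypothetical.
from collections import defaultdict

REWARD_THRESHOLD = 500.0          # hypothetical reward rule
totals = defaultdict(float)       # customer_id -> running spend, held in memory
rewarded = set()

def process_sale(record):
    cid = record["customer_id"]
    totals[cid] += record["amount"]
    if totals[cid] >= REWARD_THRESHOLD and cid not in rewarded:
        rewarded.add(cid)
        print(f"send coupon to customer {cid} (spent {totals[cid]:.2f} today)")

sales_stream = [
    {"customer_id": "c1", "amount": 320.0},
    {"customer_id": "c2", "amount": 80.0},
    {"customer_id": "c1", "amount": 240.0},   # c1 crosses the threshold here
]
for sale in sales_stream:
    process_sale(sale)
```

Because everything stays in memory, the rule can fire while the customer is still in the store rather than after an overnight batch job.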
**NoSQL databases are a new data-processing model built on cloud platforms**
NoSQL databases are often called cloud databases, because their data-processing model is fully distributed across many low-cost servers and disks, which lets web pages and interactive applications process massive volumes of data quickly. NoSQL provides web application support for Zynga, AOL, Cisco, and other businesses. A conventional database requires data to be organized into categories, such as names and account numbers, with a defined structure and labels. A NoSQL database does not care about any of that; it can handle documents of all kinds.
Nor does it struggle when huge volumes of data arrive at once. If ten million people are logged in to a Zynga game at the same time, the data is spread across servers around the world and processed there, so the result is no different than if only ten thousand people were online.
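A small sketch of the "no schema" point, using MongoDB (one of the NoSQL databases named below) via the pymongo driver. The two player documents have different fields, and the database accepts both without any table definition; it assumes a MongoDB server is reachable at the default local address, and the data is invented.

```python
# Schemaless document storage: two documents with different shapes
# go into the same collection with no upfront schema.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
players = client["game"]["players"]

players.insert_one({"name": "alice", "level": 12, "guild": "red"})
players.insert_one({"name": "bob", "last_login": "2012-05-01", "items": ["sword", "map"]})

for doc in players.find({"name": "alice"}):
    print(doc)
```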
**NoSQL comes from players of all sizes**
There are many NoSQL offerings today: commercial products such as Couchbase, 10gen's MongoDB, and Oracle NoSQL; free open-source projects such as CouchDB and Cassandra; and Amazon's recently launched NoSQL cloud service.
**Distributed computing combines NoSQL and real-time analytics**
If you want to handle real-time analytics and NoSQL data at the same time, you need distributed computing. Distributed technology combines a series of techniques to analyze massive data in real time; more importantly, it runs on very cheap hardware, which is what makes it practical to adopt widely.
SGI's Sunny Sundstrom explains that analyzing data which seems uncorrelated and unorganized can yield very valuable results, such as discovering new patterns or new behaviors. Using distributed computing, for instance, banks can spot fraudulent online transactions by examining consumers' behavior and spending patterns.
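As an illustration of the pattern-based idea (deliberately simplified, not how any particular bank does it): flag a transaction as suspicious when it falls far outside a customer's usual spending pattern. Real systems combine many more signals and run the analysis across a distributed cluster; the history and threshold here are hypothetical.

```python
# Flag transactions that deviate strongly from a customer's past behavior.
from statistics import mean, stdev

history = {  # hypothetical past transaction amounts per customer
    "c1": [25.0, 40.0, 18.0, 32.0, 27.0],
    "c2": [300.0, 450.0, 380.0, 290.0, 410.0],
}

def is_suspicious(customer_id, amount, z_threshold=3.0):
    past = history[customer_id]
    mu, sigma = mean(past), stdev(past)
    # suspicious if the amount is many standard deviations above the usual spend
    return sigma > 0 and (amount - mu) / sigma > z_threshold

print(is_suspicious("c1", 950.0))   # True: far outside c1's usual pattern
print(is_suspicious("c2", 420.0))   # False: in line with c2's history
```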
**Distributed computing makes the impossible possible**
Distributed computing is turning the impossible into the possible, and Skybox Imaging is a good example. The company analyzes satellite images in real time to answer questions such as how many parking spaces are free in a given city or how many ships are in a given port, and it sells these real-time results to customers who need them. Without this technology, analyzing that much satellite imagery quickly and cheaply would be impossible.
**Distributed computing was born at Google, but Yahoo laid its foundation**
Today's distributed computing technology is based on techniques created by Google but was built anew at Yahoo. Google published two papers: one in 2004 describing MapReduce, a way of splitting data processing across many computers, and another in 2003 mainly about how to store data across many servers.
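For a sense of what the 2004 paper describes, here is a toy, single-process illustration of the MapReduce pattern (word counting). In a real system the map and reduce tasks run in parallel on many machines; this only shows the shape of the computation.

```python
# MapReduce in miniature: map each document to (key, value) pairs,
# group the pairs by key ("shuffle"), then reduce each group.
from collections import defaultdict

def map_phase(doc):
    for word in doc.split():
        yield word, 1

def reduce_phase(word, counts):
    return word, sum(counts)

documents = ["big data is big", "data beats opinion"]

grouped = defaultdict(list)          # shuffle: collect values per key
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

results = dict(reduce_phase(w, c) for w, c in grouped.items())
print(results)   # {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'opinion': 1}
```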
Doug Cutting, an engineer at Yahoo, read the two papers and built a distributed computing platform, Hadoop, named after his son's toy elephant. Cutting has since left Yahoo to join Cloudera, the largest distributed-systems start-up. Other start-ups in the space include MapR and Yahoo's own Hortonworks, and all the biggest IT vendors now offer this technology, either as products or as part of their cloud computing platforms.
**The technology is often free, but the consulting is expensive**
Most big data technologies are open-source projects: the software itself is free, and the money is made by providing services around it. Many of the companies that need these technologies don't know how to build such applications themselves, nor do they need to. Mainstream IT companies are building products and services to help businesses take full advantage of distributed technology, and so are a number of emerging start-ups. It is reasonable to believe that more of the future Googles will come from among these start-ups.