Has the cloud-computing tide receded for now? The latest survey by Gartner, the research firm, suggests that the cloud-computing sector will keep growing, though growth will slow; after all, cloud computing has had no shortage of trouble.
Meanwhile, a new concept as loosely defined as cloud computing is gaining popularity: "big data". Big data has already begun to reshape the IT landscape. According to Gartner, big data drove $28 billion of global IT spending in 2012 alone, and that figure is expected to rise to $34 billion in 2013.
Tracing big data back to its origins
The theory of big data inevitably originated in the United States, and it had to be the United States: not because of superior American technology, but because US companies had an enormous base of Internet-service users. Social networking, the Internet of Things, and e-commerce started early there, and mobile devices spread quickly; these "congenital" factors meant their data was no longer "pure", and simple data formats could no longer meet business needs. Of the three types of data, structured, semi-structured, and unstructured, structured data is well served by mature traditional RDBMS (relational database management system) technology, which holds obvious performance advantages; for the other two forms, however, solutions are still maturing or have only just begun.
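The three forms of data mentioned above can be illustrated with a minimal Python sketch; the table, fields, and sample values here are invented for illustration and are not from the article.

```python
import json
import sqlite3

# Structured: a relational row with a fixed schema (classic RDBMS territory).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 19.99)")
row = conn.execute("SELECT amount FROM orders WHERE id = 1").fetchone()

# Semi-structured: self-describing JSON whose fields can vary per record.
event = json.loads('{"user": "alice", "tags": ["mobile", "checkout"]}')

# Unstructured: free text with no schema at all; it calls for search or
# text analysis rather than SQL.
review = "Fast shipping, but the packaging was damaged."
word_count = len(review.split())

print(row[0], event["tags"], word_count)
```

An RDBMS answers the first query instantly, but it has nothing useful to say about the other two records, which is exactly the gap the article describes.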
The rise of big data can no doubt be credited to Internet companies, but they are far from its only users: banks, insurers, telecom operators, some manufacturing enterprises, meteorology, and healthcare are all real and direct users of big data today. Internet, e-commerce, and FMCG enterprises, whose data volumes grow fastest, have pushed their demands to the forefront. Mr Courio believes these companies each have, to varying degrees, their own solutions and technologies, and that judging from the American experience, Hadoop, though not the only option, has become one of the main processing technologies, especially since it was open-sourced in 2006; after six or seven years of development it is growing ever more stable.
Dispelling misconceptions about big data
Although some say big data and cloud computing are two distinct concepts, they undeniably overlap in many places; some even say "big data is inseparable from the cloud." At the hardware layer, distributed storage and flexible virtual-server support are core features of cloud computing, and it is precisely this overlap that has given rise to some mistaken ideas.
Myth one: "big data is storage", a new storage technology.
In fact, this "misunderstanding" reflects a one-sided view. Storage is certainly the foundation of big data, but processing matters more; storage, after all, exists to prepare data for further processing. From this perspective the common understanding is somewhat off. Keep in mind that big data requires storage and computation together.
Myth two: the audience is a narrow set of industries, so big data is not widely applicable.
Although big data originated on the Internet, the prevalence of heterogeneous data means many traditional industries actually need it more urgently: fields such as graphics and image recognition, and many scenarios in automatic control, all require big data's help.
Of course, some will argue that structured data is comparatively easy to process and therefore falls outside the concept of "big data", or that processing large volumes of data amounts only to BI, that is, providing business intelligence. In fact, beyond BI, there are times when you need to search text or images, and cases where the goal is to improve the user experience, as with carriers and financial-insurance companies. We can begin with a layered description of the data:
Top layer: hot data. This is the "hottest" tier, with the strictest real-time requirements: queries must return results within a few seconds;
Middle layer: warm data. It retains some "temperature": it must be queryable at any time, but results need not arrive within seconds;
Bottom layer: cold data. Its defining feature is that it looks as though it will never be used, yet it must be kept on hand just in case.
Of these three tiers, the easiest to handle is the cold data at the bottom, which can simply be kept on disk as long as conditions permit. The most natural starting point is the top layer: the large body of data-mining and data-warehousing cases and solutions makes relational hot data easy to put to use.
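The three-tier model above can be sketched as a simple routing rule. The function name, the age thresholds, and the idea of routing by last access time are all illustrative assumptions, not details from the article.

```python
from datetime import datetime, timedelta

# Hypothetical tiering thresholds (illustrative, not from the article):
HOT_WINDOW = timedelta(days=7)     # queried constantly; seconds-level answers
WARM_WINDOW = timedelta(days=90)   # queryable any time; relaxed latency

def tier_for(last_access: datetime, now: datetime) -> str:
    """Assign a record to the hot / warm / cold tier by access recency."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"    # fast store, sub-second query expectations
    if age <= WARM_WINDOW:
        return "warm"   # online storage, queries may take longer
    return "cold"       # archived to disk, kept "just in case"

now = datetime(2013, 1, 1)
print(tier_for(datetime(2012, 12, 30), now))  # recently touched -> hot
print(tier_for(datetime(2012, 11, 1), now))   # months old -> warm
print(tier_for(datetime(2011, 6, 1), now))    # long idle -> cold
```

Real systems would also weigh query-latency requirements, not just recency, but the sketch captures the article's point that each tier trades immediacy for cost.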
Of course, completing all three tiers of data processing already shows that a company has data-lifecycle management in place. But the point comes back to the data itself: what is all this data for? How valuable is it to keep? Perhaps the real question is how to set yourself apart from your peers, offer services your competitors cannot, and make the user experience your starting point. Looking at today's market, many enterprises have this demand, but most big-data solutions are delivered as one-off projects rather than as products or standard offerings for an industry. This makes it hard for users to articulate their own needs clearly and creates a huge obstacle for technical implementation. It is this situation that led us to introduce an integrated software/hardware big-data appliance, delivered to users as a product, to help grow the big-data market.
Hadoop is a double-edged sword
Edge one: internal pressure on talent and technology
Since Hadoop was announced as open source it has kept improving, yet after many years of development the pressures on talent and technology remain. An obvious example: the established Internet companies each have such a team, but those teams still serve their own company through internal projects rather than bringing anything to market. At present such teams need to take products and solutions as their main task and organize their services by vertical industry, both to relieve the pressure on services and to provide customers with local technical support.
Edge two: competitive pressure from the external environment
Hadoop is not the only way to tackle big data. Different vendors have their own solutions, whether veteran IT giants or pioneers in the storage field, and many ship distributions deeply optimized for Hadoop. The difference is that Hadoop is merely "software" built on top of an operating system: the software can manage massive amounts of data, but the operating system underneath cannot be overlooked.
Today's integrated big-data appliances have value beyond their technology. When users do not yet know their specific needs, or sense a direction of demand without being able to pin it down, an appliance can help users discover, and even guide them to mine, their needs, "plugging in" standard products for different industries to support all walks of life.
Big-data services rest on three pillars: hardware, software, and services. From the foundation up to applications, and on to business value, all three must be bundled together so that users experience a one-stop service.
This is less a product-planning strategy than a means of cultivating the market. For big-data services, and especially Hadoop-based ones, it is important to choose a service suited to the local soil and water.
Behind the cloud, returning to the data itself
On today's road to the cloud, the big data to be processed is precisely the so-called semi-structured and unstructured data that ordinary enterprises do not currently handle. Most enterprise management software today uses databases or data stores for storage and analysis, yet this covers only 15% of a traditional enterprise's data. The remaining 85% awaits a big-data processing platform for further analysis, in which to find the sources of competitive differentiation and make the customer experience better.
But the real value of big data, as commonly described, is often idealized. In the three-dimensional space formed by the 3Vs, velocity, volume, and variety, the dimensions often contradict and exclude one another. The author deliberately omits the fourth V, value, because it is a far less certain factor and much harder to define than the other three, and so it does not enter the 3V space.
Viewed differently, the 3Vs can be drawn as an ellipse: any three points on its edge compose a 3V triangle, and the triangle's shape inevitably has a bias. Some workloads are huge but undemanding in real time; others demand real time while the data is not especially large. Different vendors may draw the ellipse differently, but all of these fall within big data's scope of analysis and processing. A new data source calls for a new processing platform; once introduced, its output can also feed the enterprise's existing BI systems or data warehouses, which existing management software can call directly to see the results. From this point of view, big data and traditional enterprise data are in fact complementary.
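The "biased triangle" idea can be made concrete with a small sketch: score each workload on the three Vs and see which dimension dominates. The class name, the 0-to-1 scores, and the example workloads are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    """A workload scored 0-1 on each of the 3Vs (scores are illustrative)."""
    name: str
    volume: float
    velocity: float
    variety: float

    def dominant_v(self) -> str:
        """Return the V this workload leans toward most heavily."""
        scores = {"volume": self.volume,
                  "velocity": self.velocity,
                  "variety": self.variety}
        return max(scores, key=scores.get)

# Huge archive, low real-time demand: the triangle stretches toward volume.
archive = Workload("clickstream archive", volume=0.9, velocity=0.2, variety=0.3)
# Real-time feed, modest size: the triangle stretches toward velocity.
ticker = Workload("trade ticker", volume=0.3, velocity=0.9, variety=0.2)

print(archive.dominant_v())  # volume
print(ticker.dominant_v())   # velocity
```

No single platform optimizes all three corners at once, which is the mutual-exclusion point the article makes.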
Among the three cloud-service models, IaaS, PaaS, and SaaS, the first two should in theory be the technologies that support SaaS. Yet today, especially domestically, all three forms remain unsettled, and SaaS, which should generate the most revenue, has still not come into its own at home. In other words, the domestic cloud industry has not fully developed, and now a swarm is rushing into the big-data wave. Storage and network technology may well satisfy big data's needs, but can cloud-chasing users really articulate their own big-data requirements? Should users with unknown requirements adopt big data at all? For the question of how, in the end, to deal with big data, referring to productized solutions may be a good stepping stone.
Surveying the many big-data solution providers, the winning formula is to champion technology in establishing industry-standard solutions, to win users with products, to serve them with technology, to lead the industry with concepts, and to build the ecosystem on the brand.