I've been working on quite a few big-data-related projects recently, and have naturally picked up some experience and impressions along the way. I realize my earlier understanding of this field was incomplete, even a bit blind, and as a result I took plenty of detours during specific projects and proof-of-concept stages. The upside is that through these projects I got to know many partners and colleagues and learned a great deal from them. Here I'll try to sort these thoughts out and share them, in the hope that they help anyone interested in big data. The original slide deck has two main parts: one on big data application scenarios and how they differ from traditional solutions, and another on domestic big data solution providers and some real-world application cases. Only the first part is posted here; I hope it helps clear up some conceptual questions about big data.
In the figure above, "precise data" means each record has an exact meaning and a definite value, conveying a clear message — for example, a manufacturing record. Traditional relational databases were built to handle this type of data, and their strength is extracting business value from it through complex logical analysis.
Data in the big data era is characterized by a large volume of "fuzzy" data: a single record has no definite value or clear meaning on its own — for example, a click record on a web page. Hadoop's strength is that it can aggregate and sort huge volumes of such fuzzy data into meaningful results, generating business value from the statistics of a very large sample rather than from any individual record.
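To make this concrete, here is a minimal sketch (in plain Python, with made-up click records) of the map/reduce idea behind that aggregation: each click alone says nothing, but mapping every record to a (page, 1) pair and reducing by key yields per-page traffic counts that do carry business meaning.

```python
from collections import defaultdict

# Hypothetical click records: any single one carries little meaning.
clicks = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/product/42"},
    {"user": "u1", "page": "/product/42"},
    {"user": "u3", "page": "/home"},
]

# Map phase: emit a (key, 1) pair for every click.
mapped = [(c["page"], 1) for c in clicks]

# Reduce phase: sum the counts per key; aggregate value emerges
# only from the mass of records, not from any individual one.
counts = defaultdict(int)
for page, n in mapped:
    counts[page] += n

print(dict(counts))  # {'/home': 2, '/product/42': 2}
```

On a real Hadoop cluster the same two phases run in parallel across many machines over billions of records; the toy version only illustrates the shape of the computation.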
So essentially these are two different technologies for different objects in different scenarios. If you want to use Hadoop to replace the databases and BI applications running on the traditional RISC architecture, you have to break with data structures the business has relied on for decades and redefine the data models, table structures, and so on — or, as I put it before, found a whole new school of martial arts. Whether the result is actually more efficient or more effective than before is, judging from the tests in my several projects, not something I'm optimistic about.
That said, in some cases big data technology solves certain traditional structured-data problems better than the RISC architecture does — ETL, for example. In some industries ETL jobs run through very long pipelines; using map/reduce can greatly shorten the ETL workflow and improve efficiency, and as data volumes grow this advantage becomes more and more obvious. So the key question, when trying to use Hadoop to replace an existing RISC-based architecture, is whether the data volume is large enough and the data types diverse enough.
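A rough sketch of why ETL fits the map/reduce model so well (illustrative Python, with an invented log format): the transform step is applied to each record independently, so it parallelizes trivially as a map, and the aggregation into load-ready output is a reduce.

```python
from functools import reduce

# Hypothetical raw log lines -- the "extract" input.
raw = [
    "2012-05-01|u1|order|99.5",
    "2012-05-01|u2|order|10.0",
    "bad line",
    "2012-05-02|u1|order|25.0",
]

def parse(line):
    """Transform step: split, type-convert, and drop malformed lines.
    Each line is handled independently, so this maps in parallel."""
    parts = line.split("|")
    if len(parts) != 4:
        return None
    date, user, event, amount = parts
    return (date, float(amount))

rows = [r for r in map(parse, raw) if r is not None]

# Reduce step: fold the clean rows into load-ready daily totals.
def merge(acc, row):
    date, amount = row
    acc[date] = acc.get(date, 0.0) + amount
    return acc

daily = reduce(merge, rows, {})
print(daily)  # {'2012-05-01': 109.5, '2012-05-02': 25.0}
```

Because neither step depends on record order, adding machines shortens the pipeline almost linearly — which is exactly the advantage that grows with data volume.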
The picture above is taken from BI Research. With the latency requirement of data queries on the vertical axis, and data volume and degree of structure on the horizontal axis, it lays out the application scenarios of Hadoop versus the traditional relational database (RDBMS). Hadoop emerged to handle offline analysis of huge volumes of unstructured data, so its sweet spot is exactly that: large data volume, low degree of structure, and loose real-time requirements on analysis. Of course, as the technology develops, components such as Hive keep pushing that boundary outward — but completely replacing the existing RDBMS is still next to impossible.
As the first picture says, in the big data era there is no one-size-fits-all solution. The enterprise of the future will run a mix of solutions to handle its various types of data. Next I'll try to classify several typical database application scenarios and list some solution names from home and abroad. For simplicity, among the foreign offerings I only list those with quite distinct characteristics. I left out Exadata because it is something of a hybrid, and pigeonholing it into a single area would be inappropriate — and since no domestic solution is really of the same type, there was no point listing it either. Some other day I'll write up my superficial understanding of Exadata for everyone to pick apart. As for domestic offerings, I've listed only providers I know or have worked with, so there are certainly many omissions, and I've skipped a few whose distinguishing features or core technology I couldn't make out. Here I only give their names and focus areas; I won't paste detailed introductions — you can find them all on Sina Weibo anyway, heh.
Of course, the scenarios listed in the previous slide aren't exclusive to any one solution; some products can cover multiple scenarios.
For example, MongoDB can also run map/reduce jobs, and Hive provides a SQL interface on top of the Hadoop stack, and so on.
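To illustrate what a SQL layer like Hive's buys you, here is a small sketch using Python's built-in SQLite as a stand-in for the warehouse (the table and data are invented): the user writes an ordinary GROUP BY, and in Hive's case the engine compiles that same query shape down to map/reduce jobs over files in HDFS rather than executing it locally.

```python
import sqlite3

# A tiny in-memory table standing in for click data. Hive exposes a
# similar SQL dialect (HiveQL) over HDFS files, translating queries
# into MapReduce jobs instead of running them on one machine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("u1", "/home"), ("u2", "/home"), ("u1", "/product")],
)

# The GROUP BY below is exactly what becomes the shuffle/reduce
# phase of the generated map/reduce job under Hive.
rows = conn.execute(
    "SELECT page, COUNT(*) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('/home', 2), ('/product', 1)]
```

The point is that the analyst keeps a familiar relational interface while the execution model underneath is the batch-oriented one described earlier — which is how these components expand Hadoop's reach without turning it into an RDBMS.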
Finally, I'd like to share some general impressions of big data solution providers in China. Again, the same caveat: these views come only from the handful of solutions I've worked with and don't represent the overall situation in the country — I have neither the energy nor the ability to survey it all. Take them as reference only.
On the appropriate customer base, what I wrote above is only my personal suggestion. I believe these domestic solution providers can truly mature — become commercial products and challenge the well-known foreign ones — only through real enterprise application cases and the help of partners. As things stand, the technology is not the problem and the direction is not wrong; the key is their own planning and their ability to turn the technology into commercial, production-grade operations. I also sincerely hope the big companies in China will give these home-grown providers some opportunities, so they can accumulate experience, grow, and develop.
That's it for this installment. There is another part comparing domestic big data offerings with foreign ones, and Intel's Hadoop distribution with Cloudera's, but since I wrote it mainly for internal sharing at my company, it isn't public for now — interested friends can discuss it with me in person next time. There are also a few industry cases I was personally involved in; once I'm back at the company, if I get permission I'll share them openly. Finally, have a great weekend, haha!
Original link: http://blog.sina.com.cn/s/blog_62242b8d01014d1w.html