Overview of big data: Architecture and Algorithm
Like mobile Internet, O2O, and wearable devices, "big data" has swept the world since it was first proposed. It took only a few years to go from a technical term to a social phenomenon that penetrates all walks of life. So will big data end up like the many concepts that were once red-hot and are now hard to find? Will people one day look up calmly and find that the wind has died down and the water is still, with nothing left but the sunset, and wonder whether it too was only a bubble? Judging from the present situation, there do seem to be signs of this. When a concept is so popular that any passer-by pulled off the street can talk about it endlessly, and when a new term is so pervasive that hearing it once more makes you feel sick, these are indeed typical signs of a bubble about to burst. More and more skeptical voices are now being heard, and in such a feverish atmosphere rational questioning is especially valuable. After all, big data is ultimately a game for the few, yet at present everyone seems to be doing big data, which looks irrational.

However, from the perspective of social development, big data is clearly one of the largest trends visible to the naked eye. From the traditional IT industry to the Internet, from the Internet to the mobile Internet, from a mobile Internet whose main terminals are smartphones and tablets to one built on wearable devices, and then on to the Internet of Things: this is an irreversible path of development. Along this path, more and more data is generated, in more and more forms, and big data is a clear and inevitable consequence. In the final analysis, then, big data is a field that is overheated in the short term but underappreciated in the long term.
If we look back on the present ten years from now, we may find that today we are still wandering at the foot of the mountain, trying to find a narrow, winding trail that leads to the top. We certainly should not follow trends blindly and keep chasing whatever is hot, but ignoring the power of a trend is not a rational choice either.

The above is the background against which this book was born. There are currently many books about big data on the market: some explain the concepts to a general audience, and others explain big data technology. This book belongs to the second category and focuses on the architectures and algorithms related to big data processing. I believe it is a book that organizes big data technologies comprehensively. I began to follow this area and collect technical materials at the end of 2010. At that time, of course, I had not yet heard of the concept of big data; what is now called big data came later. I was initially interested in NoSQL-related technologies, especially a series of work from Google and Amazon. I had a vague sense that this was a new direction in technology, perhaps even a major paradigm shift, so I invested more and more energy in it: collecting, reading, and sorting technical materials by category, using my spare time to write the book slowly, chapter by chapter, and applying these technologies and systems in my actual work wherever possible.

Writing the book took about three years, on and off, which is roughly what I had estimated at the outset. One reason is that I could not devote much time to it yet still wanted to write a high-quality technical book, so I had to work at it steadily. The other is that big data processing, as a new field, involves a great many technical points and is still developing rapidly.
Of all the fields I have worked in, this one covers the widest range of knowledge. Starting from the underlying hardware, it involves basic theory, large-scale data storage systems, distributed architecture design, system designs differentiated for different application scenarios, parallel algorithms for machine learning and data mining, and an endless stream of new architectures and new systems. It would be an exaggeration to say that it covers everything, but it is rare for one field to require a grasp of so many knowledge points. In addition, because the field is developing so quickly, the technologies are tangled together, and there is no mature classification of this knowledge to refer to, I had to keep sorting out the differences and connections between related knowledge points and classify them myself. Organizing such varied technologies into clear and reasonable chapters was a headache for a long time, but through this process the outline of the whole big data technology system became increasingly clear to me. I believe you can see this from the table of contents.

Another obstacle is that the available reference materials and systems are numerous and of uneven quality. I had to separate the good from the bad and select, as far as possible, the theories, solutions, and systems that are representative and have potential for development, which is also very labor-intensive. Although each chapter lists only a few references, the documents and systems I actually consulted were several times as many; only the essential ones are listed, to spare readers the work of picking out high-quality documents.

When I was still studying at the Chinese Academy of Sciences more than a decade ago, I was deeply struck by the vigorous development of the Internet and its impact on life and work.
At that time, my most direct impression was that the latest international conference papers had become very easy to obtain. Papers of interest could often be downloaded soon after a conference ended, and some authors posted their papers online even before the conference was held. I thought then that, although the level of domestic research was not high and it was still difficult to publish at the best international conferences, a powerful global information-sharing tool like the Internet should allow research standards to rise very quickly as it spread, because when it comes to tracking the latest technical progress everyone now starts from the same line, and the strengths of Chinese researchers should gradually come into play. This has proved to be true. In recent years, Chinese authors have accounted for a growing share of papers at the top international conferences, and it is fair to expect this trend to accelerate further.

The same reasoning applies to the research and development of big data technologies. Domestic strength in this field still lags far behind that of other countries: outstanding systems and technical solutions tend to come from well-known Internet companies such as Google, Amazon, Facebook, and LinkedIn, while most of the domestic industry is still at the stage of applying open-source big data systems to the problems at hand. Nevertheless, I believe that in the near future domestic big data systems and solutions of international standard will gradually emerge, for the same reason as in the academic example above. At present, most of the excellent systems are open source, and the relevant technical documents are easy to find.
For technical people who want to improve, what is lacking now is not reference material. On the contrary, there is so much of it that many people feel lost and do not know where to start. As long as Chinese engineers are willing to work hard and have a good environment for career development, high expectations of themselves, and technical ideals, more and more world-class big data processing systems will come from Chinese hands, and that day may be just around the corner. I hope this book makes a small contribution to helping these engineers fully understand and master the best big data processing technologies.

Shen Li also took part in writing part of this book.

I thank my wife, father-in-law, and mother-in-law. The book took three years to complete, and without your full support it could not have been published.

And to my daughter, dear Xue qing: you have taken the first step of your life and spoken your first sentence. I always think of you standing in this new world, confused and helpless, and it pains me. Each of your countless first times has brought me so much joy, yet I have spent too little time with you. Whenever I hear you knocking on the closed study door with your small hand, I feel torn. I do not know whether I am doing right or wrong by you, but I cannot help feeling downcast. I believe that one day I will regret not having given you more time. I will introduce you to this colorful and cruel world, and ask it to treat you well. I want to stay with you until your hair has turned white: in the midday sunshine of early autumn, I see you sitting in the swaying shade of a tree, telling stories to your children and grandchildren, and I am still smiling, just as I am today. I can picture the scene so clearly that it feels as if it were happening now.
Although I know full well that this is an unattainable hope, the dream will always stay in my heart, like your clear eyes and pure smile. Time is our friend and our enemy. I hope you will forgive me.

Zhang Junlin, Beijing, March 2014