Written at the Beginning of 2012: Retrospect and Prospect

Source: Internet
Author: User
Tags: zookeeper
2011 has passed and 2012 has begun, so in this article I would like to review the past year and look ahead. I have now been working for half a year at NetEase in Hangzhou, and in those six months I have gained and grown a lot. As an engineer, when I look back over the past year and think about the next, what comes to mind is mostly about the industry and technology, so this may be a little dry and a little disorganized.

1. In the past year I spent two months as an intern and six months as a full-time employee, and got to experience (relatively) real development inside a company. The NetEase Hangzhou Research Institute has many excellent engineers, and I learned more from them than in four years of university; I have matured a good deal professionally. My work has mainly been on a distributed file system and on fixing bugs. I first came into contact with distributed systems at university, probably in my sophomore year, and that was also a distributed file system. Back then it felt difficult and out of reach. In these six months, however, I have come to understand the whole file system thoroughly, and it has become my main work. What I looked up to in awe at university is now part of my daily life, and I have gone a step deeper: I now have a clear picture of how file systems are implemented for different applications and of the strengths and weaknesses of each. Once I had learned enough about that, I started thinking about things a little further out: access to large data sets, high concurrency, low latency, distributed systems, CAP, NoSQL, RDBMS, and so on.
Big data has become a trend, and serving highly concurrent access to large data sets with low latency is now a problem the whole industry faces. The solutions touch almost every area of computing. Data volumes keep growing, and the arrival of SSDs has completely overturned the IO characteristics of traditional disks. The relational database, which pushed traditional disk IO to its limit, has supported almost the entire computer industry for the past few decades. Its in-memory page cache, logging system, checkpoints, B+ tree structures, indexes, and so on were all designed and optimized around mechanical disk IO. SSDs overturn all of this: there is no longer an intolerably slow head to move, and the huge gap between random and sequential reads and writes largely disappears. SSD prices should fall further this year and SSDs will become more widespread. Full adoption of SSDs will upend the traditional RDBMS, which is bound to change enormously. PCM (phase-change memory) storage, which has so far lived in the laboratory, has also made a major breakthrough. Its technical problems have been solved and volume production may come quickly. It has lower cost than SSD, longer life, and access speed close to that of memory; IBM's laboratory says it could enter mass production in about five years. That would be a huge disruption for traditional disks. In a word: memory is disk, disk is tape.
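To put a rough number on the gap between random and sequential access on a mechanical disk, here is a minimal back-of-envelope sketch. It is not a benchmark. The seek time, rotational latency, transfer rate, and page size are all assumed, illustrative values, and the function names are made up for this example.

```python
# Back-of-envelope model; every number below is an assumed, illustrative value.
SEEK_MS = 8.0              # assumed average head seek time
ROTATION_MS = 4.0          # assumed average rotational latency
TRANSFER_MB_PER_S = 100.0  # assumed sustained sequential transfer rate
PAGE_KB = 4.0              # assumed size of one randomly read page


def random_read_mb_per_s(seek_ms=SEEK_MS, rotation_ms=ROTATION_MS,
                         transfer_mb_per_s=TRANSFER_MB_PER_S, page_kb=PAGE_KB):
    """Throughput when every page read pays a seek plus a rotational delay."""
    page_mb = page_kb / 1024.0
    transfer_s = page_mb / transfer_mb_per_s
    per_page_s = (seek_ms + rotation_ms) / 1000.0 + transfer_s
    return page_mb / per_page_s


def sequential_read_mb_per_s(transfer_mb_per_s=TRANSFER_MB_PER_S):
    """Throughput when the head does not move: just the transfer rate."""
    return transfer_mb_per_s


if __name__ == "__main__":
    rnd = random_read_mb_per_s()
    seq = sequential_read_mb_per_s()
    print(f"random:     {rnd:.2f} MB/s")
    print(f"sequential: {seq:.2f} MB/s")
    print(f"gap:        {seq / rnd:.0f}x")
```

Under these assumptions the random case comes out hundreds of times slower than the sequential one. Setting `seek_ms` and `rotation_ms` to zero, a crude stand-in for a device with no moving head, makes the two figures coincide in this model.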
In fact, it is memory that should be called the new generation of solid-state storage. We are in an era of change: the rules of computing that held for the past few decades are being broken, and the RDBMS, which has ruled data for decades, is steadily being shaken by NoSQL. There is only one reason: computers are becoming ever more integrated into our lives. Today computers are part of everyone's life, and we constantly produce all kinds of data that computers collect and process. The data we now produce in a year is roughly equal to the total produced over the previous 20 years. Faced with such a large volume of unstructured data, the RDBMS looks a little weak, and so NoSQL systems are springing up. At the same time, the growth of data is driving the development of storage hardware: the demand for low-latency access and high transfer speed gave birth to the SSD and to the next generation, PCM. Such a disruptive era is bound to produce its heroes. I do not know who will end up standing on the crest of the data wave, but this is an era in which data is king.

2. Here I want to pay tribute to Google. Perhaps many people do not like Google: it promotes openness, yet it has open-sourced little of its own core systems, and all we can see of them is a few papers, like castles in the air. But those papers are great works, and because of them I want to salute Google. It may not have opened its core systems, but the papers it published in 2004 and the following few years set off a wave of data. Google itself may end up being swept away by that wave, but these papers will be the cornerstone of the next generation of data systems.
Before Google, distributed systems were a mysterious and daunting beast for anyone outside the big companies. With the first three papers, that changed. The file system paper (GFS) became the cornerstone from which distributed file systems sprang up, and people broke away from the traditional POSIX distributed file systems (such as NFS; these systems are widely used and indeed great, but for various reasons they are not a good fit here). MapReduce addressed the lack of tools for processing large data sets, and BigTable opened the prelude to NoSQL. Within three or four years Hadoop implemented these papers one by one, bringing them within everyone's reach, and almost the entire industry went wild.

Among these papers, the one on Chubby is easy to overlook, along with ZooKeeper, its counterpart in the Hadoop ecosystem. If the systems above work backstage, ZooKeeper is backstage of the backstage: it hides behind them and is hard to notice. But those who really know it know how great it is. It solves many problems of a distributed environment, contributes to the availability of the whole distributed system and to the consistency of its data, and gives developers who build distributed systems for big data a sharp weapon. ZooKeeper's code is very short, and its core is two things: Paxos-style consensus (in ZooKeeper's case the Zab protocol) and a variant of two-phase commit. Such compact code solving such problems is something to savor. A salute to Paxos, Chubby, and ZooKeeper.

And all of this came from Google's papers; I cannot help but admire this leader. You might say that Google's real core algorithms have not been published. But those are highly mathematical things, close to philosophy. These papers are about engineering, yet they have had an enormous impact on us. Looking back, they are the cornerstone and source of the current data age. Google may not be the one standing on top of the data wave in the end, but it won the opening battle. I salute it again.

3. In half a year of work, because the job touches so many areas, I have come to understand many things more deeply: the lower layers of the operating system (file systems, processes, memory, IO, and so on), hardware characteristics, the assorted problems of complex environments, compiler principles (I worked on integrating SQL over a server's internal metadata; perhaps everyone will sneer at this...), and more. This made me aware of all the aspects of a real system, and of more of my own shortcomings. It also made me truly understand that in computing the language is the least important factor (you can even write a JVM in JS). Whatever you work on, a real system requires you to understand every aspect: the huge gap between random and sequential disk IO; the large differences in measured results caused by the CPU's multilevel caches; the operating system cache, which consumes huge amounts of memory on large-memory servers and greatly optimizes file reads and writes; the various asynchronous models and lock-free algorithms (mainly CAS), whose value depends on the application, since in some systems and environments (software and hardware) they bring a large gain while in others they drag efficiency down; and the ideas driven by major changes in hardware, such as COW (copy-on-write; in fact the core idea of NoSQL is COW, and COW can be seen in SSDs as well). It reminds me of eBay's summary of system design: make everything you can asynchronous, minimize all unnecessary operations, and keep the system as simple and clear as possible.

4. As for what I will do in the coming year: what I learned in these six months overturned my assumptions and shocked me. The more I know, the more I feel my own ignorance. Perhaps my theme for the next year is just one: understand everything I can see. Learn, learn, learn.

5. In the new year, I hope to stay obsessed with technology, to have a good learning environment, to keep following the technology trends of the times, to think more, and to do more. No obsession, no achievement...
