Distributed Systems: Learning the Basics
The term "distributed" is used here in a very narrow sense: it refers to the distributed storage and computing systems built around Google's troika of GFS, Map/Reduce, and BigTable. Beginners like me usually start with Google's classic papers. They sketch a basic blueprint for distributed storage and computation and give a glimpse of its appeal, but because they come without code or implementation examples, the picture stays somewhat blurry and it is hard to get a concrete feel for it. Luckily, we also have open source, and Hadoop. Hadoop is a Java-based, open-source distributed storage and computing project. As one of the most prestigious open-source projects in this field, it counts big names in cloud computing among its users, including Yahoo, Amazon, and Facebook (well, perhaps a certain school as well, but that really does not mean anything...). Hadoop itself implements a distributed file system, HDFS, and a distributed computing framework, Map/Reduce. It does not fight alone, either: Hadoop comes with a series of extension projects, including the distributed database HBase (corresponding to Google's BigTable) and the distributed coordination service ZooKeeper (corresponding to Google's Chubby), among others.
So a promising golden pairing emerges: Google's papers plus the Hadoop implementation. Follow the framework laid out in the papers to study the concrete implementation, and use the implementation to deepen your understanding of the logic in the papers. It looks attractive, at least. Many predecessors have already analyzed the Hadoop source code. The one I follow most closely is here: the blogger has finished analyzing HDFS, and the Map/Reduce analysis is in full swing. The posts are updated frequently and the analysis is unusually detailed, so if you are passing by, do not miss it. In addition, many Hadoop followers and users have posted related articles, such as here and here. You can also visit the Chinese-language Hadoop site (I do not know whether it is community-run or official...) to collect some learning materials...
I have personally benefited from the material above. The notes I am compiling differ somewhat from those source-code analyses: rather than going through the implementation module by module, they follow the line of the papers and the basic flow of the system, so they can be regarded as offering a view from another angle. My understanding of distributed systems is still shallow and I lack sufficient practical experience, so I will not wade into the deep questions and will only sort out and analyze the basics. Experts who have read this far can safely skip the rest...
A. Distributed File System
The distributed file system occupies the lowest and most fundamental position in the entire distributed system: storage. Without data, even the best computing platform and the most polished database system are like a boat without water. So what is a distributed file system? As the name implies, it is distributed + file system, and it covers both aspects. From the point of view of a client using it, it is a standard file system that provides a set of APIs for creating, moving, and deleting files and directories, and for reading and writing files. From the point of view of its internal implementation, a distributed file system no longer manages a local disk the way an ordinary file system does: file contents and the directory structure are not stored on the local disk but are transmitted over the network to remote systems. Moreover, a single file is not stored on just one machine. It is stored across a cluster of machines that cooperate to serve it, which is what makes it "distributed"...
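To make the two aspects a little more concrete, here is a minimal toy sketch in Python. It is not how HDFS or GFS is implemented; the class name ToyDFS, the tiny block size, and the round-robin placement are all assumptions made up for illustration. On the outside it offers ordinary file operations (create, read, rename, delete); on the inside a file is cut into blocks that are spread over several in-memory "machines", with a namespace map recording where each block lives.

```python
class ToyDFS:
    """Toy model only: a namespace map plus blocks spread over in-memory 'machines'."""

    def __init__(self, num_nodes=3, block_size=4):
        self.block_size = block_size
        # Each "machine" is just a dict of block_id -> bytes (hypothetical stand-in
        # for a remote storage server reached over the network).
        self.nodes = [dict() for _ in range(num_nodes)]
        # Namespace: path -> ordered list of (node_index, block_id).
        self.namespace = {}
        self._next_block_id = 0

    def create(self, path, data):
        """Create a file, splitting its content into blocks across the nodes."""
        if path in self.namespace:
            raise FileExistsError(path)
        locations = []
        for offset in range(0, len(data), self.block_size):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Assumed placement policy: simple round-robin over the nodes.
            node_index = block_id % len(self.nodes)
            self.nodes[node_index][block_id] = data[offset:offset + self.block_size]
            locations.append((node_index, block_id))
        self.namespace[path] = locations

    def read(self, path):
        """Reassemble a file by fetching its blocks in order."""
        return b"".join(self.nodes[n][b] for n, b in self.namespace[path])

    def rename(self, old_path, new_path):
        """Moving a file only touches the namespace, not the stored blocks."""
        if new_path in self.namespace:
            raise FileExistsError(new_path)
        self.namespace[new_path] = self.namespace.pop(old_path)

    def delete(self, path):
        """Remove the namespace entry and free the blocks on every node."""
        for n, b in self.namespace.pop(path):
            del self.nodes[n][b]


fs = ToyDFS(num_nodes=3, block_size=4)
fs.create("/a.txt", b"hello world!")    # 12 bytes, so 3 blocks of 4 bytes
print(fs.namespace["/a.txt"])           # block locations on different nodes
fs.rename("/a.txt", "/b.txt")
print(fs.read("/b.txt"))
```

The client only ever sees paths and bytes; which machine holds which block is an internal detail, which is roughly the split between the two aspects described above.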
Therefore, to study the implementation of a distributed file system, we can analyze these two aspects one after the other. First, see how it implements the basic create, delete, update, and read operations that any file system needs. Then, see how it takes the characteristics of a distributed system into account to provide better fault tolerance, load balancing, and so on. Putting the two together gives us an understanding of the overall implementation model of a distributed file system...
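As a rough illustration of the second aspect, the following toy Python sketch shows one simple way fault tolerance and load balancing could look. It is an assumption-laden sketch, not the policy of HDFS or GFS: the class name ReplicatedStore, the replica count of 2, and the "fewest blocks first" placement rule are all hypothetical. Each block is written to several nodes, new blocks go to the least-loaded live nodes, and a read falls back to another replica when a node is down.

```python
class ReplicatedStore:
    """Toy model only: replicated blocks with least-loaded placement."""

    def __init__(self, num_nodes=4, replicas=2):
        self.replicas = replicas
        self.nodes = [dict() for _ in range(num_nodes)]   # block_id -> bytes
        self.alive = [True] * num_nodes                   # simulated node health
        self.locations = {}                               # block_id -> node indexes

    def put(self, block_id, data):
        """Write a block to `replicas` live nodes, preferring the emptiest ones."""
        candidates = [i for i in range(len(self.nodes)) if self.alive[i]]
        if len(candidates) < self.replicas:
            raise OSError("not enough live nodes for replication")
        # Assumed load-balancing rule: nodes holding the fewest blocks come first.
        candidates.sort(key=lambda i: len(self.nodes[i]))
        chosen = candidates[:self.replicas]
        for i in chosen:
            self.nodes[i][block_id] = data
        self.locations[block_id] = chosen

    def get(self, block_id):
        """Read a block from the first live replica."""
        for i in self.locations[block_id]:
            if self.alive[i]:
                return self.nodes[i][block_id]
        raise OSError("all replicas of %r are unavailable" % (block_id,))


store = ReplicatedStore(num_nodes=4, replicas=2)
store.put("b0", b"first block")
store.put("b1", b"second block")
store.alive[store.locations["b0"][0]] = False   # simulate one node failing
print(store.get("b0"))                          # still served by the other replica
```

A real system also has to detect failures, re-replicate lost blocks, and keep replicas consistent; none of that is modeled here.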