Learning materials for architecture-related fields

Source: Internet
Author: User
Document directory
  • 1. Operating Systems
  • 2. Virtual Machines
  • 3. Design Revisited
  • 4. Programming Model
  • 5. Distributed Algorithms
  • 6. Overlay Networking, and P2P DHT
  • 7. Distributed Systems
  • 8. Controversial Computing Models
  • 9. Debugging

Engineers often hit a growth bottleneck after a certain stage. To break through it, you need to learn more broadly within your technical field: understand the nature of its problems, its methodologies and design ideas, and its history. Below are some learning materials for architecture-related fields, with brief comments, for interested engineers. I hope that by studying these fields you will gain a deeper understanding of system design principles and reach the realm of freedom in your own work.

1. Operating Systems

Mach[Intro: http://www-2.cs.cmu.edu/afs/cs/project/mach/public/www/mach.html, Paper: http://www-2.cs.cmu.edu/afs/cs/project/mach/public/www/doc/publications.html]

In a traditional kernel implementation, interrupt handling is implemented in one "big function". It is called a big function because everything from interrupt entry to exit runs in a single control flow. When interrupts are re-entered, the implementation logic becomes very complicated. Most operating systems, such as UNIX, use this monolithic kernel architecture.

The Mach project, which began in 1985, proposed a new microkernel structure. The academic community, which had felt there was nowhere left to go since UNIX was developed in the 1970s, suddenly found a breakthrough, and a heated debate between monolithic kernels and microkernels began.

An aside: Richard Rashid, who led Mach, was a professor at CMU. As the story goes, Bill Gates asked him to persuade Jim Gray to join Microsoft, and he ended up joining himself and founding Microsoft Research. He has given several "21st Century Computing" keynotes in China.

Exokernel[Intro: http://pdos.csail.mit.edu/exo/,Paper: http://pdos.csail.mit.edu/PDOS-papers.html#Exokernels]

Although the microkernel structure is elegant, it is not widely used in practice. One reason is its poor performance. Another is that people gradually realized that the real problem for an OS is not implementation complexity but how to give applications more flexibility in using resources. This is why the debate over kernel architecture gradually faded after kernel extension mechanisms (such as loadable modules in Linux) appeared.

Exokernel, proposed by MIT, appeared in this context. It does not provide the abstractions typical of traditional operating systems (such as processes and virtual memory) but focuses on resource isolation and multiplexing. On top of the exokernel, libraries called libOSes implement the interfaces of various operating systems. This structure gives applications maximum flexibility, so that different applications can emphasize scheduling fairness, or real-time response, or resource efficiency to optimize performance. From today's perspective, an exokernel looks more like a virtual machine monitor.

Singularity[Intro: http://research.microsoft.com/os/Singularity/, Paper: http://www.research.microsoft.com/os/singularity/publications/hotos2005_broadnewresearch.htm]

Singularity was proposed by Microsoft Research in the early 21st century, an era when viruses and spyware seemed endless. Both academia and industry were discussing how to provide a trustworthy computing environment and how to make computer systems more manageable. Singularity holds that solving these problems requires the underlying system to provide hard isolation, and that the hardware virtual memory mechanism people had relied on cannot provide both high flexibility and good performance. Once runtimes such as .NET and Java appeared, a software-level solution became possible.

Built on a microkernel, Singularity uses .NET to define a type-safe assembly format as its ABI and specifies a message-passing mechanism for data exchange, which fundamentally rules out modifying another process's isolated data. Together with security checks on applications, this yields a controllable and manageable operating system. Thanks to continued optimization of the .NET CLR and advances in hardware, the performance cost of these checks is acceptable given the features Singularity provides.

At present, this design is still at the lab stage. Whether it can succeed depends on whether it gets the kind of opportunity UNIX had in its day.

2. Virtual Machines

VMware["Memory Resource Management in VMware ESX Server", OSDI '02, Best paper award]

VMware is familiar to everyone and needs no introduction.

Xen["Xen and the Art of Virtualization", SOSP '03]

A high-performance VMM from Cambridge.

Denali["Scale and Performance in the Denali Isolation Kernel", OSDI '02, UW]

An application-level virtual machine designed for Internet services; it can run thousands of VMs on ordinary machines. Its VMM is based on an isolation kernel and provides isolation, but it does not insist on absolute fairness in resource allocation, which reduces performance overhead.

Entropia["The Entropia Virtual Machine for Desktop Grids", VEE '05]

To pool a company's desktop machines for computation, compute tasks must be packaged carefully so that normal use of the machines is not affected and user data stays isolated. Entropia provides such a computing environment by implementing an application-level virtual machine on Windows. The basic approach is to redirect the system calls made by a compute task in order to guarantee isolation. Similar work includes FVM: "A Feather-weight Virtual Machine for Windows Applications".

3. Design Revisited

"Are Virtual Machine Monitors Microkernels Done Right ?", HotOS '05

The title is a bit of a tongue-twister. It asks whether VMMs are in fact the right way to build a microkernel. The paper discusses VMMs and microkernels in detail and is an excellent reference for understanding both concepts.

"Thirty Years Is Long Enough: Getting Beyond C", HotOS '05

C may be the most successful programming language in the world, but its shortcomings are also obvious. For example, it has no built-in support for threads, which makes it look inadequate on today's highly parallel hardware, and this is exactly where functional programming languages are strong. How to combine the advantages of both is a very promising field.

4. Programming Model

"Why Threads Are a Bad Idea"

A server built purely on threads has difficulty achieving high performance because of memory usage, context-switching overhead, synchronization overhead, and the programming complexity of getting locking right.

"SEDA: An Architecture for Well-Conditioned, Scalable Internet Services", SOSP '01

Threads are not good, but events cannot solve every problem either, so people looked for a way to combine the two. SEDA splits an application into multiple stages connected by queues. Within a stage, multiple threads can be started to process the events in its queue, and the number of threads is adjusted automatically through feedback.
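The staged structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SEDA's actual implementation: the names `Stage` and `run_pipeline` and the three toy stages are invented for this example, and the feedback-driven adjustment of thread counts is left out (each stage gets a fixed pool).

```python
import queue
import threading


class Stage:
    """One stage: an input queue drained by a small, fixed thread pool."""

    def __init__(self, name, handler, num_threads=2, next_stage=None):
        self.name = name
        self.handler = handler          # function applied to each event
        self.next_stage = next_stage    # downstream stage, or None for the last
        self.inbox = queue.Queue()
        for _ in range(num_threads):
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, event):
        self.inbox.put(event)

    def _run(self):
        while True:
            event = self.inbox.get()
            try:
                result = self.handler(event)
                if self.next_stage is not None:
                    self.next_stage.enqueue(result)
            finally:
                # Mark the event done only after it was handed downstream.
                self.inbox.task_done()


def run_pipeline(events):
    """Push events through parse -> render -> sink and return the results."""
    results = []
    lock = threading.Lock()

    def collect(event):
        with lock:
            results.append(event)
        return event

    sink = Stage("sink", collect, num_threads=1)
    render = Stage("render", lambda s: s.upper(), next_stage=sink)
    parse = Stage("parse", lambda s: s.strip(), next_stage=render)
    for event in events:
        parse.enqueue(event)
    # Drain the stages in pipeline order.
    for stage in (parse, render, sink):
        stage.inbox.join()
    return sorted(results)
```

Because every stage owns its queue, the queue length is a natural signal for a controller that adds or removes threads per stage, which is where the feedback mentioned above would plug in.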

Software Transactional Memory

If memory could provide transactional semantics, the world we face would be completely different: languages, compilers, operating systems, and runtimes would all change fundamentally. Although Intel is currently working on hardware transactional memory, it is not expected to be commercially available in the foreseeable future, so people have turned to software solutions. As one would expect, this approach cannot be implemented at the level of native assembly; implementations currently exist for C#, Haskell, and other languages. For more information, see Wikipedia.

5. Distributed Algorithms

Logical clock, ["Time, clocks, and the ordering of events in a distributed system", Leslie Lamport, 1978]

This is the classic paper on logical clocks, timestamps, and distributed synchronization.
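As a minimal sketch of the logical clock rule from that paper: each process keeps a counter, increments it on every local event, stamps outgoing messages with it, and on receipt jumps past the stamp it received. The class and method names below are chosen for this example.

```python
class LamportClock:
    """A per-process logical clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new time."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value is the message timestamp."""
        return self.tick()

    def receive(self, msg_time):
        """Move past both the local time and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

For example, if process A has had one local event and then sends a message, the message carries timestamp 2; a fresh process B that receives it moves its clock to 3, so the receive is ordered after the send.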

Byzantine["The Byzantine generals problem", Leslie Lamport, 1982]

Distributed systems fail in different ways. A node may simply stop when it fails, it may slow down, or, more seriously, it may start behaving maliciously. Malicious behavior is like a general turning traitor, and it can have a severe impact on the system. For this class of problems, Lamport proposed the Byzantine failure model. For a state machine made up of 3f + 1 replicas, as long as the number of traitorous replicas is at most f, the state machine as a whole still works correctly.
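The 3f + 1 bound is easy to turn into arithmetic. The two helper names below are made up for this example; they only restate the bound given above.

```python
def min_replicas(f):
    """Replicas needed to tolerate f Byzantine (traitorous) replicas."""
    return 3 * f + 1


def max_byzantine_faults(n):
    """Largest f such that n replicas satisfy n >= 3f + 1."""
    return (n - 1) // 3
```

So 4 replicas tolerate 1 traitor, 7 tolerate 2, and 3 replicas tolerate none.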

Paxos["The Part-Time Parliament", Leslie Lamport, 1998]

How to reach consensus in an asynchronous distributed environment is the most fundamental problem in distributed algorithm research, and Paxos is the pinnacle of this class of algorithms. The paper is very hard to read, though. It is said that only 3.5 people in the world understand it, so Lamport later wrote a popular version, "Paxos Made Simple", which is still hard to understand. Also see Butler Lampson's "The ABCD's of Paxos" (PODC '01); its description of the replicated state machine will greatly deepen your understanding of the nature of the parallel world. That is the caliber of a Turing Award winner.
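To make the two phases of the basic (single-value) algorithm concrete, here is a toy sketch. It assumes things the papers do not: acceptors are plain in-memory objects, calls are synchronous, no messages are lost, and proposal numbers are positive integers handed in by the caller. The names `Acceptor` and `propose` are chosen for this example.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0           # highest proposal number promised so far
        self.accepted_n = 0         # number of the accepted proposal (0 = none)
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, None, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered prepare was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Run one round; return the chosen value, or None if the round failed."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None
    # If some acceptor already accepted a value, the proposer must adopt
    # the one with the highest proposal number instead of its own.
    prev_n, prev_value = max(granted, key=lambda pair: pair[0])
    if prev_n > 0:
        value = prev_value
    accepted = sum(1 for a in acceptors if a.accept(n, value))
    return value if accepted >= majority else None
```

With three acceptors, `propose(acceptors, 1, "A")` chooses "A". A later `propose(acceptors, 2, "B")` also returns "A", because the second proposer learns in phase 1 that a value was already accepted. A stale `propose(acceptors, 1, "C")` then fails, since every acceptor has promised a higher number.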

The name Leslie Lamport has come up repeatedly. He kept opening up new problems in distributed computing and eventually became a master of the field. There are also a few anecdotes about him. I remember that he wrote on his MSR homepage something like "When I was studying logical clocks, Bill Gates was still in diapers..." (the original text can no longer be found). In addition, when writing papers he liked to poke fun at other well-known researchers by working altered versions of their names into the text. This may be why he hasn't received the Turing Award.

For more about Lamport's achievements, see the paper presented for his 60th birthday: "Lamport on mutual exclusion: 27 years of planting seeds", PODC '01.

6. Overlay Networking, and P2P DHT

RON["Resilient Overlay Networks", SOSP '01]

RON describes how to build an overlay at the application layer that recovers from wide-area network-layer failures within seconds, whereas existing routing protocols take at least several minutes to restore connectivity. This fast recovery and flexibility have led to the wide use of overlay networking.

Application Level Multicast

"End System Multicast", SigMetrics '00

"Scalable Application Layer Multicast", SigComm '02

Many ALM papers describe how to build a mesh network for robustly transmitting control information and, on top of it, a multicast tree for efficiently transmitting data, followed by some layered delivery based on the characteristics of multimedia data. Systems such as CoolStreaming and PPLive, which appeared in the past few years, are commercial descendants of this work.

P2P

The emergence of P2P has changed the network. P2P networks fall into three types according to their structure.

1. Napster-style: a centralized directory service, with data transferred peer to peer.

2. Gnutella-style: queries spread by gossip among neighbors; also known as unstructured P2P.

3. DHT: unlike unstructured P2P, a DHT query comes with a guarantee. If the data exists, the query returns within a bounded number of hops, usually log N, where N is the number of nodes in the system.

Typical DHTs include CAN, Chord, Pastry, and Tapestry. This research concentrates on the algorithm layer; on the systems side, the main work is building wide-area storage systems on top of DHTs. Some people also study mechanism design, such as how to encourage users to share and how to prevent cheating.
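The bounded-hop guarantee can be illustrated with a simplified, Chord-style lookup. This is a sketch under several assumptions, not the protocol from the Chord paper: the ring uses 16-bit identifiers, finger tables are built once with global knowledge instead of through joins and stabilization, and the names `Ring`, `ident`, and `lookup` are invented here.

```python
import hashlib
from bisect import bisect_left

M = 16              # identifier bits (assumed for this sketch)
RING = 2 ** M


def ident(name):
    """Hash a string onto the identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING


def in_open(x, a, b):
    """True if x lies strictly between a and b, going clockwise."""
    if a < b:
        return a < x < b
    return x > a or x < b


def in_half_open(x, a, b):
    """True if x lies in the clockwise interval (a, b]."""
    return x == b or in_open(x, a, b)


class Ring:
    def __init__(self, node_names):
        self.ids = sorted({ident(n) for n in node_names})
        # fingers[n][i] is the first node at or after n + 2**i.
        self.fingers = {
            n: [self.successor((n + 2 ** i) % RING) for i in range(M)]
            for n in self.ids
        }

    def successor(self, key):
        """Node responsible for key: the first node clockwise from it."""
        i = bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def lookup(self, start, key):
        """Route from node `start`; return (responsible node, hops taken)."""
        node, hops = start, 0
        while True:
            succ = self.fingers[node][0]
            if in_half_open(key, node, succ):
                return succ, hops
            # Forward to the farthest finger that still precedes the key.
            node = next(f for f in reversed(self.fingers[node])
                        if in_open(f, node, key))
            hops += 1
```

Each forward uses a strictly smaller finger index than the previous one, so in this sketch a lookup never takes more than M forwards regardless of where it starts; with N evenly spread nodes the typical count is on the order of log N.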

7. Distributed Systems

GFS/MapReduce/BigTable/Chubby/Sawzall

Google's series of papers is familiar to everyone and easy to find with a search.

Storage

There are too many papers on distributed storage systems to list them all. The following are the most relevant ones.

"Chain Replication for Supporting High Throughput and Availability", OSDI '04.

"Dynamo: Amazon's Highly Available Key-value Store", SOSP '07.

"BitVault: a Highly Reliable Distributed Data Retention Platform", SIGOPS OSR '07.

"PacificA: Replication in Log-Based Distributed Storage Systems", MSR-TR.

Distributed simulation

"Simulating Large-Scale P2P Systems with the WiDS Toolkit", MASCOTS '05. The interesting thing about distributed simulation is that the simulated protocol is distributed and the simulation engine itself is also distributed. Logical and physical time and events must both be handled carefully in such a system.

8. Controversial Computing Models

Today's software systems have become too complex for humans to fully grasp. Many systems ship with numerous deterministic or non-deterministic bugs and can only be kept going by continuous patching. Since our limitations as humans mean we cannot eliminate every bug in a system, we can only approach the problem from another angle and find ways to keep the system working in this frustrating environment. This is like a distributed system, where failures cannot be avoided, so we choose to make the system as a whole highly reliable.

The following three are typical examples. The research mainly focuses on 1) how to save state properly; 2) how to catch errors and restore state; 3) how to ensure that recovering one unit does not affect the system as a whole.

Recovery Oriented Computing

Failure oblivious computing, OSDI '04

Treating Bugs as Allergies, SOSP '05

9. Debugging

Systems are so complex that humans cannot analyze them directly through logic; they can only be observed at the macro level using data mining methods.

Black box debugging["Performance Debugging for Distributed Systems of Black Boxes", SOSP '03]

Performance debugging for large systems is very difficult because many problems are nondeterministic and cannot be reproduced. The only option is to mine logs for matching calls and messages in order to locate the problem.

CP-Miner["CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code", OSDI '04]

Many people reuse code by copy-paste. However, even a simple copy-paste can cause serious problems, for example when a local variable name is left over from the original. CP-Miner analyzes the code, builds a syntax tree structure, and then mines for this type of error.
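The kind of bug meant here can be shown with a toy check. This is not CP-Miner's algorithm; it is a much simpler stand-in that assumes the original and pasted fragments are already known and have the same shape. It pairs up identifiers by position and reports any identifier that was renamed in some places but not in others. The function name `rename_conflicts` is invented for this example.

```python
import re
from collections import defaultdict

# Matches C-style identifiers and keywords; numbers are skipped.
IDENT = re.compile(r"[A-Za-z_]\w*")


def rename_conflicts(original, pasted):
    """Return identifiers of `original` that map to several names in `pasted`."""
    mapping = defaultdict(set)
    for old, new in zip(IDENT.findall(original), IDENT.findall(pasted)):
        mapping[old].add(new)
    return {old: sorted(news) for old, news in mapping.items() if len(news) > 1}
```

For the pair `for (i = 0; i < n; i++) total += a[i];` and `for (j = 0; j < m; i++) sum += b[j];`, the check reports `{'i': ['i', 'j']}`: the loop variable was renamed to `j` everywhere except in the increment, which is exactly the forgotten-rename mistake.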
