[Erlang 0060] Notes on Joe Armstrong's paper "Making reliable distributed systems in the presence of software errors"

Source: Internet
Author: User
Over the weekend I read two papers: "On Designing and Deploying Internet-Scale Services" and Joe Armstrong's "Making reliable distributed systems in the presence of software errors". Both contain a lot of practical material, so I am writing these notes to keep from forgetting it.

On Designing and Deploying Internet-Scale Services [HTML]
English version: "Making reliable distributed systems in the presence of software errors" [PDF]
Chinese version: "Building Reliable Distributed Systems in the Face of Software Errors" [PDF]

Architecture Definition
An architecture is the set of significant decisions about the organization of a software system, the selection of the structural elements and their interfaces by which the system is composed, together with their behaviour as specified in the collaborations among those elements, the composition of these structural and behavioural elements into progressively larger subsystems, and the architectural style that guides this organization: these elements and their interfaces, their collaborations, and their composition.
Booch, Rumbaugh, and Jacobson [19]
Of the definitions of architecture I have come across, the one quoted in Joe Armstrong's paper is quite pragmatic and has practical guiding value. First, it agrees with Martin Fowler and others that architecture is a set of significant decisions about how a system is organized. Those decisions cover the elements that make up the system, the interfaces between the elements, and the way the elements collaborate. Architecture is also a method of composition that builds these structural and behavioural elements into larger subsystems. Finally, it is a style of construction that guides how the elements and their interfaces, their collaborations, and their composition are organized.

This abstract notion of architecture can be mapped onto practice through six concrete aspects, which brings us down from the clouds onto solid ground.

1. Problem domain: what type of problem is our architecture designed to solve? A software architecture must not be generic; it is designed to solve a specific type of problem. An architecture that does not say which problems it solves is incomplete. Notes: Likewise, comparing programming languages without stating the context is as meaningless as a fight between Guan Gong and Qin Qiong (two generals who lived centuries apart). Some "technical experts", once they join a project, mechanically carry over the design from their previous project. This is more than laziness; it shows a lack of professional responsibility for the project. Worse, when the project fails they walk away and leave the team with a mess, and the damage to the team takes even longer to repair.

2. Philosophy: what is the reasoning behind the way the software is constructed? What is the core idea of the architecture? Notes: After solving a problem or finishing a design, we should review the approach and the design principles. The specific technical details may not carry over, but the way of thinking and the design method can be reused. When reading open-source projects, picking up language techniques is one aspect; more important is to understand which problems the project solves, how it solves them, and its overall idea.

3. Software construction guidelines: how do we plan a system? We need an explicit set of software construction guidelines. Our system will be written and maintained by a team of programmers, so it is important that all programmers and system designers understand the system architecture and its underlying philosophy. In practice, this knowledge is most easily maintained in the form of a set of software construction guidelines. A complete set includes programming rules, example programs, and training material. Notes: I feel this deeply. Some current projects, such as the PlayStation Suite SDK, use C# as their scripting language, yet although they all use the same C# language, each has its own rules and style for organizing class libraries and for naming.

4. Pre-defined components: designing by "selecting from a set of pre-defined components" is far easier than designing from scratch. Erlang's OTP libraries contain a complete set of ready-made components (called behaviour libraries) from which common kinds of system can be built. For example, the gen_server behaviour can be used to build client-server systems, and the gen_event behaviour can be used to build event-based programs. Notes: An important part of learning a language is mastering the libraries it ships with. On the one hand, the libraries can be used directly without reinventing the wheel; on the other hand, they teach you the programming style of the language.
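
To make the gen_server idea concrete, here is a minimal sketch of a client-server counter built on that behaviour. The module name counter_server and its functions increment/0, value/0 and stop/0 are made up for illustration and do not come from the paper; only the gen_server calls themselves are standard OTP. The sketch assumes a reasonably recent OTP release, where gen_server:stop/1 exists and the callbacks not shown here are optional.

```erlang
-module(counter_server).
-behaviour(gen_server).

%% Client API (hypothetical names chosen for this sketch).
-export([start_link/0, increment/0, value/0, stop/0]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2]).

%% Start the server and register it locally under the module name.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, 0, []).

%% Asynchronous request: the caller does not wait for a reply.
increment() ->
    gen_server:cast(?MODULE, increment).

%% Synchronous request: the caller blocks until the server replies.
value() ->
    gen_server:call(?MODULE, value).

stop() ->
    gen_server:stop(?MODULE).

%% The server state is just the current count.
init(Initial) ->
    {ok, Initial}.

handle_call(value, _From, Count) ->
    {reply, Count, Count}.

handle_cast(increment, Count) ->
    {noreply, Count + 1}.
```

After compiling the module, counter_server:start_link() starts the server, counter_server:increment() sends an asynchronous cast, and counter_server:value() makes a synchronous call that returns the current count. All of the client-server plumbing lives in the behaviour; the module only supplies the callbacks.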

5. Description methods: how do we describe the interface to a component? How do we describe the communication protocol between two components in the system? How do we describe the static and dynamic structure of the system? To answer these questions we introduce dedicated notations, some for describing program APIs and others for describing protocols and system structure. Notes: When learning a new body of knowledge, the barrier to entry lies in its concepts and notation. Both compress a great deal of information into a compact form, which makes them hard to understand without supporting background. When we first meet a new field, we often cannot even describe the problems we run into correctly, because we do not yet understand its terms and concepts.
6. Configuration methods: how do we start, stop, and configure our system? Can we reconfigure the system while it is running? Notes: Writing the code for the business functions is not everything. Configuration and deployment also need attention, and they should be considered from the very start of the design. Different hardware needs different configuration, and routine maintenance requires some scripts prepared for operations.

Erlang fault isolation

To build a fault-tolerant software system that still behaves reasonably in the presence of software errors, Erlang's approach is as follows:
1. Organize the tasks into a hierarchy and try to perform the top-level task.
2. If an error occurs while performing a task and the error cannot be corrected, abort the task immediately and start a new one.

The most essential problem is fault isolation. Operating systems use the concept of a process to implement fault isolation. A process provides a protection domain: an error in one process does not affect the operation of other processes. Different applications written by different programmers run in different processes, so errors in one application should have no side effects on other applications running in the system. This choice certainly meets the initial requirement. However, because all processes share the same CPU and physical memory, processes that compete for CPU time or use large amounts of memory can adversely affect other processes in the system. How much processes interfere with each other depends on the design of the operating system.

In Erlang, processes and concurrent programming are part of the language rather than something provided by the host operating system. Erlang applications are built from large numbers of parallel processes that communicate with each other.
The reasons for this are:

Architectural infrastructure: we can organize our system as a set of communicating processes. By defining the channels over which the processes pass messages, we can easily divide the system into well-defined subcomponents that can be implemented and tested separately.

Potential efficiency: a system designed as many independent parallel processes can conveniently be implemented on a multiprocessor or on a network of distributed processors. Note that this efficiency gain is only potential. It is realised only when the application can be broken down into many truly independent tasks; if there are strong data dependencies between the tasks, the gain is often out of reach.

Fault isolation: concurrent processes that share no data provide a strong form of fault isolation. A software error in one concurrent process does not affect the other processes in the system.

Of these three uses of concurrency, the first two are not essential: a built-in scheduler could provide them through various forms of pseudo-parallel time sharing between processes. The third is essential. Each independent activity runs in a completely independent process; these processes share no data and communicate only by message passing, which limits the consequences of a software error.
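
The fault isolation point can be tried out directly. The following is a minimal sketch, not code from the paper: the module name isolation_demo, the function run/0 and the simulated_bug error are all invented for illustration. One process crashes on purpose while a second, unrelated process keeps answering; spawn_monitor/1 is the standard BIF that delivers the crash to the observer as an ordinary 'DOWN' message.

```erlang
-module(isolation_demo).
-export([run/0]).

%% A healthy process that answers ping messages until told to stop.
healthy_loop() ->
    receive
        {ping, From} ->
            From ! pong,
            healthy_loop();
        stop ->
            ok
    end.

run() ->
    Healthy = spawn(fun healthy_loop/0),
    %% A faulty process that fails immediately with a simulated software error.
    {Faulty, Ref} = spawn_monitor(fun() -> error(simulated_bug) end),
    %% The monitor reports the crash to us as a message.
    Reason = receive
                 {'DOWN', Ref, process, Faulty, R} -> R
             after 1000 ->
                 timeout
             end,
    %% The healthy process shares nothing with the faulty one and still works.
    Healthy ! {ping, self()},
    Reply = receive
                pong -> pong
            after 1000 ->
                timeout
            end,
    Healthy ! stop,
    {Reason, Reply}.
```

Calling isolation_demo:run() returns a pair whose first element is the exit reason of the crashed process (the simulated_bug error together with a stack trace) and whose second element is the pong reply from the process that was not affected.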

As soon as two processes share any common resource, such as memory, a pointer to memory, or a mutex, a software error in one process may corrupt the shared resource. Because eliminating such software errors from large software systems is still an unsolved problem, I think the only practical way to build a large reliable system is to break it into many independent parallel processes and to provide mechanisms for monitoring and restarting those processes.

In a sense, the operating system provides "what the programming language designers forgot". In a language such as Erlang, however, the operating system is hardly necessary. All the OS really provides to Erlang is a set of device drivers; the OS mechanisms for processes, message passing, scheduling, and memory management are not needed.

Writing a concurrent program in a concurrency-oriented programming language (COPL) is quite easy. It takes three steps:
1. Identify the truly concurrent activities in the real-world activity.
2. Identify all the message channels between the concurrent activities.
3. Write down all the messages that can flow on the different message channels.

It is worth noting that the concurrency provided by a COPL must be true concurrency. Objects represented as processes are therefore truly concurrent, and messages between processes are truly asynchronous messages, not remote procedure calls in disguise as in many object-oriented languages.
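
The three steps can be sketched on a toy problem. This example is an assumption made for illustration, not something taken from the paper: the module shop_demo, its functions, and the shop and customer scenario are all hypothetical. The comments mark where each of the three steps shows up in the code.

```erlang
-module(shop_demo).
-export([start_shop/1, order/2]).

%% Step 1: the concurrent activities are the shop and its customers;
%% each of them is a process.
start_shop(Stock) ->
    spawn(fun() -> shop_loop(Stock) end).

%% Step 2: there is one channel from customer to shop and one back.
%% Step 3: the messages allowed on those channels are
%%   customer -> shop : {order, CustomerPid, Item}
%%   shop -> customer : {delivered, Item} | {out_of_stock, Item}
shop_loop(Stock) ->
    receive
        {order, Customer, Item} ->
            case lists:member(Item, Stock) of
                true ->
                    Customer ! {delivered, Item},
                    shop_loop(lists:delete(Item, Stock));
                false ->
                    Customer ! {out_of_stock, Item},
                    shop_loop(Stock)
            end
    end.

%% The send is asynchronous: `!` returns at once, and the customer
%% decides separately how long it is willing to wait for an answer.
order(Shop, Item) ->
    Shop ! {order, self(), Item},
    receive
        {delivered, Item} -> delivered;
        {out_of_stock, Item} -> out_of_stock
    after 1000 ->
        timeout
    end.
```

With Shop = shop_demo:start_shop([tea]), the first shop_demo:order(Shop, tea) is delivered and a second identical order is answered with out_of_stock. The message list in the comment is the whole protocol, which is what makes the parts easy to implement and test separately.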

Overview of the Erlang world view

• Everything is a process.
• Processes are strongly isolated.

• Process creation and destruction is a lightweight operation.

• Message passing is the only way for processes to interact.
• Processes have unique names.
• If you know the name of a process you can send it a message.
• Processes share no resources.
• Error handling is non-local.

• Processes do what they are supposed to do or fail.
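
Several of these points (named processes, message passing as the only interaction, non-local error handling, "do the job or fail") can be seen together in one small sketch. Everything named here is hypothetical and chosen for illustration: the module worldview_demo, the registered name echo, and the keeper process that restarts the worker. It is a hand-rolled stand-in for what an OTP supervisor does, not the paper's own code.

```erlang
-module(worldview_demo).
-export([run/0]).

%% Worker: does its job or fails. It contains no error-handling code.
worker() ->
    receive
        {From, Text} when is_pid(From) ->
            From ! {echo, Text},
            worker();
        crash ->
            error(simulated_bug)
    end.

%% Keeper: a separate process that handles the worker's failure by
%% restarting it. Observer is told about every (re)start so that the
%% demo can synchronise with it.
keeper(Observer) ->
    {Pid, Ref} = spawn_monitor(fun worker/0),
    true = register(echo, Pid),
    Observer ! {started, Pid},
    receive
        {'DOWN', Ref, process, Pid, _Reason} ->
            keeper(Observer);
        stop ->
            exit(Pid, kill),
            receive {'DOWN', Ref, process, Pid, _} -> ok end,
            Observer ! stopped
    end.

wait_started() ->
    receive {started, Pid} -> Pid after 1000 -> timeout end.

wait_echo() ->
    receive {echo, Text} -> Text after 1000 -> timeout end.

run() ->
    Self = self(),
    Keeper = spawn(fun() -> keeper(Self) end),
    First = wait_started(),
    %% Knowing the name is enough to send a message.
    echo ! {Self, hello},
    Reply1 = wait_echo(),
    %% Make the worker fail; the keeper, not the worker, deals with it.
    echo ! crash,
    Second = wait_started(),
    echo ! {Self, again},
    Reply2 = wait_echo(),
    Keeper ! stop,
    receive stopped -> ok after 1000 -> ok end,
    {Reply1, Reply2, First =/= Second}.
```

Calling worldview_demo:run() returns {hello, again, true}: the name echo answers before and after the crash, and the two answers come from different processes. In a real system the keeper role would normally be played by an OTP supervisor.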

