Erlang in the Smoke: A Guide to Diagnosing, Debugging, and Solving Problems in Erlang Production Systems

Original English title: Stuff Goes Bad: Erlang in Anger

English author: Fred Hebert

Download address: http://vdisk.weibo.com/s/iGQ-rFuJU0-4

Translator's Preface


In nearly 20 years of software development, I have used many programming languages besides Erlang: C++ and Java for work, and Lisp, Haskell, Scala, and others as a hobby. My favorite is Erlang. Apart from my background in telecom software development, an important reason is Erlang's unique design philosophy and approach to solving problems.

People usually hear about Erlang because of its strong support for high concurrency. In fact, Erlang's core feature is fault tolerance; in a sense, concurrency is just a byproduct of the constraints that fault tolerance imposes. Fault tolerance is in Erlang's DNA, and it is the essential difference between Erlang and all other programming languages.

We know that handling errors is the most important part of software development. Other programming languages all focus on "defense": they rely on powerful static types, static analysis tools, and extensive testing to find every error before the software reaches production. Erlang focuses on "tolerance": it allows software to run with errors, and it provides mechanisms and tools for handling them. To compare a software system to the human body, other languages attend only to hygiene in order to prevent illness, while Erlang provides an immune system: it lets the virus in, and the system grows stronger and more likely to survive by fighting it.

This difference has a fundamental impact on software development. In developing and maintaining a large system, what we fear most is being unable to control the impact of a change. We want each change to affect only one place, and we achieve this through good modularity and abstraction. But if a change slips past static checks and tests and fails at run time, the whole system crashes, even though the change was localized at the static level. Erlang isolates not only changes at the static level but also errors at run time [1]. Keeping runtime errors local greatly reduces the risk of releasing and deploying software.
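The idea of keeping a runtime error local can be illustrated with a toy model. The sketch below is written in Python rather than Erlang and is not how Erlang implements isolation (Erlang relies on lightweight processes that share no memory, supervised by other processes). The names `Worker`, `Supervisor`, `handle`, and `send` are hypothetical and exist only for this illustration.

```python
# Toy model of run-time error isolation: each worker owns its own state,
# and a supervisor restarts a crashed worker without touching the others.
# All names here (Worker, Supervisor, handle, send) are hypothetical.

class Worker:
    def __init__(self, name):
        self.name = name
        self.handled = 0  # private state, lost when the worker crashes

    def handle(self, message):
        if message == "boom":
            raise RuntimeError(f"{self.name} crashed")
        self.handled += 1
        return f"{self.name} handled {message!r}"


class Supervisor:
    def __init__(self, names):
        self.workers = {name: Worker(name) for name in names}
        self.restarts = {name: 0 for name in names}

    def send(self, name, message):
        try:
            return self.workers[name].handle(message)
        except Exception:
            # Contain the failure: replace only the crashed worker
            # with a fresh one that starts from a clean state.
            self.workers[name] = Worker(name)
            self.restarts[name] += 1
            return None


sup = Supervisor(["a", "b"])
sup.send("a", "hello")
sup.send("b", "hello")
sup.send("a", "boom")            # only worker "a" is restarted
result = sup.send("b", "again")  # worker "b" keeps its state
```

In this sketch the crash of worker "a" costs only that worker's state; worker "b" continues as if nothing had happened, which is the property the paragraph above describes.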

Powerful concurrency support is another feature of Erlang, and one that other languages often imitate. However, there is a fundamental difference between Erlang and its imitators: fair scheduling [2]. Erlang goes to almost unreasonable lengths to be fair. Why take so much trouble? For a highly concurrent system, soft real-time behavior, low latency, and responsiveness are common goals, and hard ones to reach. In particular, when the system is overloaded, we want service to degrade in a consistent, predictable way. Fair scheduling is the best way to achieve these goals, and Erlang is currently the only language that schedules concurrent work fairly.
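To make the idea of reduction-based fair scheduling concrete, here is a toy model in Python. It is only a sketch under stated assumptions: each `yield` stands in for one reduction, the budget of 3 is an arbitrary illustrative value, and the names `task`, `run`, and `REDUCTION_BUDGET` are hypothetical. The real Erlang scheduler is far more elaborate.

```python
from collections import deque

REDUCTION_BUDGET = 3  # hypothetical budget; each yield counts as one reduction


def task(name, steps, log):
    """A task that performs `steps` units of work, yielding after each one."""
    for i in range(steps):
        log.append((name, i))
        yield


def run(tasks):
    """Round-robin scheduler: preempt a task once its budget is spent."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        for _ in range(REDUCTION_BUDGET):
            try:
                next(current)
            except StopIteration:
                break  # task finished, drop it
        else:
            ready.append(current)  # budget spent, go to the back of the queue


log = []
run([task("long", 7, log), task("short", 2, log)])
order = [name for name, _ in log]
```

Because the long task is preempted after its budget is spent, the short task gets to run after only three units of the long task's work instead of waiting for all seven. That bounded wait is what keeps latency predictable under load.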

Because of Erlang's uniqueness in fault tolerance and fair concurrent scheduling, it is fair to say that over the years Erlang has been imitated but never surpassed.

In a sense, Erlang is not only a programming language but also a system platform. It supports not only the development phase but also the operations phase, with powerful capabilities that other languages lack. In fact, the problems caught by static checks and testing tend to be the "less interesting" ones; those that escape are the ones that are really hard to deal with [3]. Concurrency and distribution bugs in particular are hard to catch with static checks and testing, and traditional debugging methods are ineffective against them. Erlang provides powerful ways to diagnose, debug, and resolve runtime problems. Using Erlang's remote shell, tracing, and introspection mechanisms, together with its concurrency and fault-tolerance support, we can dig deep into a system to diagnose, trace, and fix problems while it keeps running. Even "highly invasive" surgery can be performed online when needed. Once you have solved a hard problem this way, you will never want to give it up. If I had to choose between static typing and this ability, I would choose the latter without hesitation [4].

Two criticisms of Erlang [5] come up most often: the lack of static typing, and performance. Erlang is a dynamically typed language, and such languages are often considered unsuitable for building large systems. I like static typing very much myself. A powerful static type system not only greatly improves the readability of code but also gives us a strong framework for thinking and designing at the logical level. It also gives the compiler, IDE, and other tools deeper structural and semantic information about the code, so that they can offer more advanced static analysis.

However, I take a somewhat different view on building large systems [6]. I believe no one will dispute that the Internet is the largest system in existence today. So what makes it possible to build such a huge system? Obviously not static types; the root cause lies in how the system is organized and how its parts interact. Each part of the Internet is an isolated entity, the parts communicate through well-defined protocols, and the failure of one component does not cause problems for the others. This approach is isomorphic to Erlang's design philosophy. Each Erlang system is a small Internet: each process corresponds to a host, messages between processes correspond to protocols, and a crash in one process does not affect the others. Erlang's design philosophy, crash-oriented and protocol-oriented, is the best way to architect large systems [7].

Of course, you can now have your cake and eat it too. Erlang already supports rich type definitions and type annotations [8], and the Dialyzer tool can perform type inference and static checking.

Next, performance. In compute-intensive work, Erlang's performance is not very high [9]. So if you are writing a tool that requires heavy computation, Erlang is not appropriate. However, if the computation is confined to one module of the system, and the harder problems are system-level design problems such as concurrency, distribution, scaling, fault tolerance, short response times, online upgrades, debugging, and operations, then Erlang is the best choice. In that case you can use Erlang to tackle the system-level challenges and implement the local computational hotspots in another language (such as C) or even in hardware. Erlang offers several ways to integrate with other languages and hardware, so you can choose among them according to your needs (safety, performance).

Some time ago I developed a WebRTC real-time media gateway [10] using Erlang plus C. The media processing is written in C and interacts with Erlang through NIFs, while the system-level challenges are left to Erlang. Within a few months of going live, the system reached millions of users. During that time it ran stably, scaled easily, and performed well (its service degradation under high load was especially satisfying). This is not to say that other languages could not do the job, but the effort would be more than 10 times greater [11].

Many books on Erlang are already available, but almost all of them focus on basic syntax and design methods. There is essentially no book on how to solve problems in Erlang production systems (for example overload, memory leaks and fragmentation, or excessive CPU usage) [12]. This book can be said to fill that gap.

The author, Fred Hebert [13], has extensive experience building and operating large Erlang systems, and he writes well. In this book he not only shares his years of hands-on experience in design, problem diagnosis, debugging, and resolution, but also gives an in-depth analysis of how Erlang's memory management and scheduler work and of its powerful runtime monitoring and introspection capabilities. Even more commendably, he has released the tools he built to solve these problems so that everyone can use them.

After reading this book, readers will understand Erlang's design philosophy and the workings of its virtual machine more deeply. Mastering the knowledge and tools in this book does not guarantee peace of mind when operating Erlang production systems, but when difficulties arise you will at least know where to start, how to collect data, and how to locate the problem, which lays a solid foundation for finally solving it.

Finally, if you find any problems in the translation, please write to me (dhui@263.net). I hope you enjoy the book.


Aaron

November 2014, Shanghai

[1] Operating system processes can also isolate runtime errors, but they are too coarse-grained and too heavyweight.

[2] To schedule efficiently across operating systems, Erlang abandoned time slices in favor of a reduction-based approach. Reductions are counted almost everywhere in the system in order to achieve fair scheduling.

[3] For example, with a race-condition bug, adding a breakpoint may make the race disappear.

[4] In fact, Erlang now supports type definitions, type annotations, and type inference.

[5] Overly subjective criticisms are not discussed here; for example, some people find Erlang's syntax weird.

[6] These views apply only to Erlang. Programs in other dynamically typed languages become hard to understand and maintain as they grow. In Erlang, the "let it crash" philosophy largely avoids many of the problems of dynamically typed languages.

[7] The currently popular microservice architecture is somewhat similar to Erlang's philosophy.

[8] In any case, adding type annotations to a program is good practice.

[9] In this respect, Erlang's performance is roughly 1/7 that of C.

[10] Its main role is interworking between browser WebRTC media streams and IMS network media streams, which requires a large amount of transcoding and control.

[11] This figure is my own comparison; I have developed a similar system in C++.

[12] On the one hand, most systems are not that large or complex, and as long as they are designed and developed according to basic Erlang principles they generally run without problems. On the other hand, few people have the experience, the time, and the writing ability.

[13] Fred Hebert is also the author of the book Learn You Some Erlang for Great Good!
