Designing Data-Intensive Applications

Source: Internet
Author: User

What follows is the content of the book's preface. My English is limited, so please point out anything I have misunderstood. These are my own notes and summary of the book.

Data is at the center of many challenges in system design today, and difficult issues such as scalability, consistency, reliability, efficiency, and maintainability need to be figured out. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. Which are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice and how to make full use of data in modern applications.

In summary, the book has the following five features:

    • Explore the technical details behind the systems you already use, and learn how to use and operate them more effectively
    • Make informed decisions by identifying the strengths and weaknesses of different tools
    • Navigate the trade-offs between consistency, scalability, fault tolerance, and complexity
    • Understand the distributed systems research on which modern databases are built
    • Explore the technical details behind major online services, and learn from their architectures

The author begins by telling the reader the purpose of writing the book:

Technology is a powerful force in our society. Data, software, and communication can be used to do harm: to entrench unfair power structures, to undermine human rights, and to protect vested interests. But they can also be used to do good: to make the voices of underrepresented people heard, to create opportunities for everyone, and to avert disasters. This book is dedicated to everyone working toward the good.

If you have worked in software engineering in recent years, especially in server-side or backend development, you have probably been bombarded with buzzwords about data storage and processing: NoSQL! Big Data! Web-scale! Sharding! Eventual consistency! ACID (atomicity, consistency, isolation, durability)! CAP (consistency, availability, partition tolerance) theorem! Cloud services! MapReduce! Real-time!

Over the past decade we have seen many interesting developments in databases, distributed systems, and the ways applications are built on top of them. Several driving forces lie behind these developments:

  • Internet companies such as Google, Yahoo, Amazon, Facebook, LinkedIn, Microsoft, and Twitter handle huge volumes of data and traffic, which forces them to create new tools that can handle such scale efficiently.
  • Businesses need to be agile, test ideas cheaply, and respond quickly to changing market needs by keeping development cycles short and data models flexible.
  • Free and open source software has become very successful, and many businesses now prefer it to commercial or bespoke in-house software.
  • CPU clock speeds have barely increased in recent years, but multicore processors have become standard on servers and networks are getting faster, which means parallelism will only increase.
  • Even if you work on a small team, you can now build systems that are distributed across many machines and even multiple geographic regions, thanks to infrastructure as a service (IaaS) such as Amazon Web Services (AWS).
  • Many services are now expected to be highly available, and extended downtime caused by outages or maintenance is becoming increasingly unacceptable.

Data-intensive applications are pushing the boundaries of what is possible by making use of these technological developments. Many applications today are data-intensive rather than compute-intensive: raw CPU power is rarely their bottleneck, and the bigger challenges come from the amount of data, its complexity, and the speed at which it changes.

The tools and technologies that help data-intensive applications store and process data have been adapting rapidly to these changes. New types of database systems ("NoSQL") have attracted a lot of attention, but message queues, caches, search indexes, frameworks for batch and stream processing, and related technologies are equally important, and many applications use a combination of them.

The ubiquitous buzzwords are a sign of enthusiasm for new possibilities, which is a good thing. However, as software engineers and architects, if we want to build good applications, we need a deeper and more precise understanding of these technologies and their trade-offs. To get there, we have to dig beneath the buzzwords.

Fortunately, behind the rapid changes in technology there are enduring principles that remain true no matter which version of a particular tool you are using. If you understand these principles, you will know where each tool fits, how to make the most of it, and how to avoid its pitfalls. That is the purpose of this book.

The goal of this book is to help you navigate the diverse and fast-changing landscape of technologies for processing and storing data. It is not a tutorial for one particular tool, nor is it a textbook full of dry theory. Instead, we will look at examples of successful data systems: technologies that run in production, underpin many popular applications, and have to meet scalability, performance, and reliability requirements every day.

We will dig into the internals of these systems, tease apart their key algorithms, and discuss their principles and the trade-offs they have to make. Along the way, we will try to find useful ways of thinking about data systems: not just how they work, but also why they work that way and what questions we need to ask.

After reading this book, you will be in a good position to decide which technology suits which purpose and to understand how tools can be combined into a robust application architecture. You are unlikely ever to need to build your own database storage engine from scratch. However, you should develop a good intuition for what your systems are doing under the hood, so that you can reason about their behavior, make sound design decisions, and track down any problems that arise.

Who should read this book?

If you develop an app that has a server/backend that stores or processes data, and your app uses the Internet (for example, Web applications, mobile apps, or sensors connected to the Internet), then this book is for you.

This book is intended for software engineers, software architects, and technical managers who love to code. It is especially relevant if you need to make decisions about the architecture of the systems you work on, for example if you need to choose tools for solving a given problem and figure out how best to apply them. But even if you have no say over the choice of tools, this book will help you better understand their strengths and weaknesses.

You should have some experience building web applications or network services, and you should be familiar with relational databases and SQL. Familiarity with non-relational databases and other data-related tools is a bonus but not required. A general understanding of common network protocols such as TCP and HTTP is also helpful. The programming language or platform you use makes no difference for this book.

If any of the following applies to you, you will find this book valuable:

  • You want to learn how to make data systems scalable, for example, to support web or mobile apps with millions of users.
  • You need to make your applications highly available (minimizing downtime) and operationally robust.
  • You are looking for ways to make systems easier to maintain in the long run, even as they grow and as requirements and technologies change.
  • You have a natural curiosity about how things work, and you want to know what goes on inside major websites and online services. This book analyzes the internals of various databases and data processing systems, and it is fun to explore the bright ideas behind their design.

Sometimes, when discussing scalable data systems, people make comments along the lines of "You're not Google or Amazon. Stop worrying about scale and just use a relational database." There is truth in this statement: building for a scale you do not need is wasted effort and may lock you into an inflexible design. In effect, it is a form of premature optimization ("premature optimization is the root of all evil"). However, it is also important to choose the right tool for the job, and each technology has its own strengths and weaknesses. As we shall see, relational databases are important, but they are not the final word on data processing.

Scope of the book

This book does not attempt to give detailed instructions on how to install or use specific software packages or APIs, since plenty of documentation already exists for that. Instead, we discuss the principles and trade-offs that are fundamental to data systems, and we explore the different design decisions taken by different products.

The electronic editions of this book include links to the full text of online resources. All links were verified at the time of publication, but unfortunately links tend to break frequently because of the nature of the web. If you come across a broken link, or if you are reading a print copy of this book, you can look up the references using a search engine. For academic papers, you can search for the title in Google Scholar to find an open-access PDF. Alternatively, you can find all of the references at https://github.com/ept/ddia-references, where up-to-date links are maintained.

We focus primarily on the architecture of data systems and how they are integrated into data-intensive applications. For reasons of space, the book does not cover deployment, operations, security, management, and other areas. These are complex and important topics, and a few superficial side notes in this book would not do them justice; each deserves a book of its own.

Many of the technologies described in this book fall under the heading of "Big Data". However, that term is so overused and vaguely defined that it is not useful in a serious engineering discussion. This book uses less ambiguous terms, such as single-node versus distributed systems, or online/interactive versus offline/batch processing systems.

This book favors free and open source software (FOSS), because reading, modifying, and executing source code is a great way to understand how something works in detail. Open platforms also reduce the risk of vendor lock-in. However, where appropriate, we also discuss proprietary software (closed-source software, software as a service, or companies' in-house software that is described in the literature but not released publicly).

References and further reading

Most of what we discuss in this book has already been said elsewhere in some form: in conference presentations, research papers, blog posts, code, bug trackers, and mailing lists. This book summarizes the most important ideas from many different sources and includes pointers to the original literature throughout the text. If you want to explore an area in more depth, the references at the end of each chapter are a great resource, and most of them are freely available online.
