Designing Data-Intensive Applications by Martin Kleppmann

Reliable, scalable, and maintainable data systems------"Designing Data-Intensive Applications" Reading notes 1

Frankly, it was something of a coincidence that I ended up studying distributed systems as a graduate student. Whether the problem is large-scale storage or large-scale computing, the core need is the same: using distributed techniques to exploit parallelism for data-intensive applications. Recently I began to chew through this "Designing

Data model and query language------"Designing Data-Intensive Applications" Reading notes 2

the database engine, which allows the database system to introduce performance improvements without requiring any changes to queries. SQL is more limited in functionality and flexibility, and that restriction is what gives the database more room for automatic optimization. Declarative languages also lend themselves to parallel execution because they specify only the pattern of the results, not the algorithm used to compute them. 4. Summary The

Replication mechanisms and replica synchronization------"Designing Data-Intensive Applications" Reading notes 6

architecture is very reasonable. However, for the reasons mentioned above, we usually do not use synchronous replication. This leads to visible inconsistencies in the data: if you run the same query on the leader and on a follower at the same time, you may get different results, because not all writes have been applied on the follower yet. The inconsistency is only temporary, which is why this behavior is called eventual consistency. For this situati
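
The temporary inconsistency described in this excerpt can be illustrated with a small in-memory sketch. The `Replica` and `Leader` classes and the explicit `replicate()` step are assumptions made for illustration; they stand in for a real replication log and network and are not taken from the article.

```python
class Replica:
    """A node that holds a copy of the data (hypothetical, in-memory only)."""

    def __init__(self):
        self.data = {}


class Leader(Replica):
    """Accepts writes and ships them to followers asynchronously."""

    def __init__(self, followers):
        super().__init__()
        self.followers = followers
        self.pending = []  # writes acknowledged but not yet sent to followers

    def write(self, key, value):
        # The write is applied and acknowledged on the leader immediately.
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stand-in for the asynchronous replication stream catching up.
        for key, value in self.pending:
            for follower in self.followers:
                follower.data[key] = value
        self.pending.clear()


follower = Replica()
leader = Leader([follower])

leader.write("user:1", "alice")
stale = follower.data.get("user:1")  # follower has not seen the write yet

leader.replicate()
fresh = follower.data.get("user:1")  # follower has caught up
```

Between `write` and `replicate`, the same read returns different answers on the two nodes; once the follower catches up, the replicas agree again.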

Designing Data-Intensive Applications: opportunities and challenges for distributed systems

The first part of "Designing Data-Intensive Applications" (see above) introduces the basic theory of data systems, all on the assumption of a single node. In the second part of DDIA (Distributed Data), the field of

Transactions and isolation levels------"Designing Data-Intensive Applications" Reading notes 10

write transactions. If the database tracks the activity of each transaction in great detail, it can determine precisely which transactions need to be aborted, but the bookkeeping overhead can become significant. Less detailed tracking is faster, but may cause more transactions to be aborted than strictly necessary. Serializable snapshot isolation has an advantage over two-phase locking: one transaction does not need to block waiting for locks held by another transaction. Summary: In this article, we

Encoding and schemas------"Designing Data-Intensive Applications" Reading notes 5

Protocol Buffers and Thrift. Each time the database schema changes, an administrator must manually update the mapping from database column names to field tags. With Avro, by contrast, the schema conversion is simply done again on every run. Any program that reads the new data files will see that the fields of the record have changed. 4. Summary Encoding details affect not only efficiency but, more importantly, the architecture of

Linearizability and total order broadcast------"Designing Data-Intensive Applications" Reading notes 12

same order, so if there are concurrent writes, all nodes will agree on which message claiming the username came first. Although total order broadcast guarantees linearizable writes, it does not guarantee linearizable reads: because message delivery is delayed, a node serving a read may return a stale result. Linearizable reads can still be achieved by fetching the position of the latest log message, querying that position, and waiting for all
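
A minimal sketch of the idea in this excerpt, assuming a single shared append-only list as a stand-in for total order broadcast. The names `TotalOrderLog`, `Node`, `claim`, `read_stale`, and `read_linearizable` are hypothetical, and the "wait" is modeled as catching up on the log up to a position fetched at read time.

```python
class TotalOrderLog:
    """Stand-in for total order broadcast: every node sees the same sequence."""

    def __init__(self):
        self.entries = []

    def append(self, message):
        self.entries.append(message)


class Node:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.applied = 0   # number of log entries this node has processed
        self.owners = {}   # username -> node that claimed it first

    def claim(self, username):
        # A write is just a message appended to the shared log.
        self.log.append((username, self.name))

    def catch_up(self, position):
        # Apply entries in log order; the first claim for a username wins.
        while self.applied < position:
            username, claimant = self.log.entries[self.applied]
            self.owners.setdefault(username, claimant)
            self.applied += 1

    def read_stale(self, username):
        # Reads local state only, so it may lag behind the log.
        return self.owners.get(username)

    def read_linearizable(self, username):
        # Fetch the latest log position, process everything up to it, then read.
        position = len(self.log.entries)
        self.catch_up(position)
        return self.owners.get(username)


log = TotalOrderLog()
a, b = Node("a", log), Node("b", log)

a.claim("kleppmann")
b.claim("kleppmann")  # concurrent claim, ordered second in the log

stale = b.read_stale("kleppmann")            # b has not applied the log yet
winner_a = a.read_linearizable("kleppmann")
winner_b = b.read_linearizable("kleppmann")
```

Both nodes agree that the first claim in the log wins, while a read that skips the catch-up step can still miss it.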

Peer-to-peer structure and quorum mechanism------"Designing Data-Intensive Applications" Reading notes 8

capture dependencies between operations, but this is not sufficient when multiple replicas accept writes concurrently. Instead, we need a version number per replica as well as per key. Each replica increments its own version number when it processes a write, and also keeps track of the version numbers it has seen from the other replicas. This information indicates which values to overwrite and which values to keep as siblings. The collection of version numbers from all replicas is calle
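
The per-replica version numbers described in this excerpt can be sketched as a dictionary from replica ID to counter. The helper names `increment`, `dominates`, `compare`, and `merge` are assumptions for illustration, not the API of any particular database.

```python
def increment(vector, replica_id):
    """Return a copy of the vector with this replica's counter bumped."""
    updated = dict(vector)
    updated[replica_id] = updated.get(replica_id, 0) + 1
    return updated


def dominates(a, b):
    """True if a has seen every write that b has seen."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def compare(a, b):
    """Decide whether one version overwrites the other or both are kept."""
    a_ge, b_ge = dominates(a, b), dominates(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a overwrites b"
    if b_ge:
        return "b overwrites a"
    return "siblings"  # concurrent writes: keep both values


def merge(a, b):
    """Version vector after a client resolves siblings: element-wise maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


base = increment({}, "r1")    # first write, handled by replica r1
v_a = increment(base, "r1")   # later write handled by r1
v_b = increment(base, "r2")   # concurrent write handled by r2

causal = compare(v_a, base)
concurrent = compare(v_a, v_b)
merged = merge(v_a, v_b)
```

Here `v_a` supersedes `base` because it has seen everything `base` has, while `v_a` and `v_b` each contain a write the other has not seen, so both values are kept.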

The trouble with distributed systems------"Designing Data-Intensive Applications" Reading notes 11

after another event. 4. Unreliable leases In a distributed system, it is sometimes necessary to ensure that a file in a storage service is accessed by only one client at a time, because the file would be corrupted if multiple clients tried to write to it. A client therefore has to obtain a lease from a lock service before accessing the file, which amounts to a distributed lock. But this lock is sometimes less reliable than we might think. If client 1, which holds the lease, is suspended for a reason such as a G
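
The failure mode in this excerpt can be reproduced with a simulated clock. The `LockService` class, the 10-second lease, the 15-second pause, and the second client are all assumptions chosen for illustration; the pause is modeled simply by advancing the `now` variable.

```python
class LockService:
    """Hypothetical lock service that grants time-limited leases."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, client, now):
        # Grant the lease if nobody holds it or the previous lease expired.
        if self.holder is None or now >= self.expires_at:
            self.holder = client
            self.expires_at = now + self.lease_seconds
            return self.expires_at
        return None


now = 0.0  # simulated clock, in seconds
service = LockService(lease_seconds=10.0)
writes = []  # order in which clients write to the shared file

# Client 1 obtains the lease and checks that it is still valid.
expiry_1 = service.acquire("client 1", now)
lease_looked_valid = now < expiry_1

# Client 1 is then suspended for longer than the lease lasts.
now += 15.0

# Meanwhile the lease expires, and client 2 legitimately acquires it and writes.
expiry_2 = service.acquire("client 2", now)
writes.append("client 2")

# Client 1 resumes and writes, still trusting the check it made before the pause.
writes.append("client 1")
```

Both clients end up writing even though only client 2 holds a valid lease at that point, which is exactly the kind of corruption the lease was meant to prevent.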
