Reading notes on the articles in the All-Time Favorites collection on the High Scalability website

Source: Internet
Author: User
Tags: cassandra

Most of the articles are fairly old, and I don't know what the architectures of Facebook, Tumblr, Pinterest, and Twitter look like now.

1. Clustering vs. sharding? Automatic or manual? (Sharding means removing joins and adding a cache. NoSQL doesn't seem to be as mature as MySQL? But then again, HBase/Cassandra seem to be able to do it.)

2. Technology serves the business, and architecture serves the application, so innovation lies in discovering real, valuable problems (demand).

3. Use a special-purpose database? Materialized "data items", lock-free transactions, append-only storage. For large-scale designs: a general-purpose FS, Ceph/... (a distributed object database)

4. LB (load balancing): shorten the path between the user and the "content".

5. How to protect data? How to use it?

6. The user table (the table storing user information) is not sharded.

7. Shard with capacity planning for large scale (i.e., "hash big") <-- add a timestamp to the hash key?
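One possible reading of "hash big" in note 7 is to choose a large, fixed number of logical shards up front and map them onto however many physical servers exist today. The sketch below illustrates that reading only. The shard count, the server names, and both function names are assumptions made for the example, not something the source articles specify.

```python
import hashlib

# Assumed value: deliberately much larger than the number of servers,
# so capacity can grow without re-hashing any keys.
NUM_LOGICAL_SHARDS = 4096


def logical_shard(key: str) -> int:
    """Hash a key into one of the fixed logical shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def physical_server(shard: int, servers: list) -> str:
    """Map a logical shard to a server by contiguous ranges of shard numbers."""
    per_server = -(-NUM_LOGICAL_SHARDS // len(servers))  # ceiling division
    return servers[shard // per_server]


servers = ["db1", "db2", "db3"]  # hypothetical server names
shard = logical_shard("user:42")
print(shard, physical_server(shard, servers))
```

A key's logical shard never depends on the server list, so adding servers only changes the shard-to-server mapping (the kind of mapping table note 8 refers to), not the hash of any key.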

8. Mapping (shard/storage) and reverse mapping (query)

9. Cache: memcache/Redis (Redis supports somewhat richer data structures). I don't know how complete memcached's features are now.

10. Scripting: sharding filter scheme, migrating data (not so good)

11. Pyres: Python over Redis? (modeled on Resque)

12. Dev: everyone has access to everything, so be careful (a unified global view). This may not be appropriate for small teams using Git, and Git vs. SVN is sometimes just a question of performance when working with a large repo.

13. SOA: the DB proxy itself is also a service!

14. Keep It & Fun

15. The architecture is doing the right thing if growth can be handled by adding more of the same stuff (horizontal scaling).

16. Don't be afraid (?) of losing part of the data, depending on the nature of the data: CAP/BASE.

17. Master-slave lag (the drawback of master-slave replication). Of course, master-master replication introduces a distributed consistency problem, so the first step should be to shard writes. (How do you really achieve a join-free design? By adding redundancy?)

18. Keep load at <= 50% (live capacity must stay under control), i.e., keep headroom in reserve.
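The 50% rule in note 18 turns into simple capacity arithmetic. The sketch below is only an illustration: the function name is made up, the per-server figure borrows the "10,000 writes per second per server" number from the Facebook note, and the peak load of 45,000 writes per second is an invented example value.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float,
                   max_utilization: float = 0.5) -> int:
    """Servers required so that no server exceeds max_utilization at peak."""
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    usable_rps = per_server_rps * max_utilization
    return math.ceil(peak_rps / usable_rps)


# Hypothetical peak of 45,000 writes/s on servers rated at 10,000 writes/s.
print(servers_needed(45_000, 10_000))        # 9 servers at <= 50% load
print(servers_needed(45_000, 10_000, 1.0))   # 5 servers with no headroom
```

The gap between the two results is the headroom that absorbs traffic spikes or the loss of some servers.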

19. Use tools, not frameworks (the former are small and composable; the latter are really an intrusive design, such as the disgusting Spring).

20. To avoid (distributed) joins: denormalize? Design to be extensible/"stretchable" from the start.

21. Turn the website into a service (API): a practice behind Twitter's early success.

22. Prevent abuse

23. Cache vs. log: note the similarity between the two. A cache really holds only the recent hot data, and logs can be deleted once they have been analyzed, so neither will run out of storage space.

24. Facebook 2011: batching I/O, avoiding HBase hot keys?

Java <--> Thrift <--> PHP

Sharding plan: manual sharding? This is supposed to be before the data center.

10,000 writes per second per server

25. Dropbox 2011: Python for both backend and client (TortoiseHg/Mercurial are written in Python and actually perform well; isn't Sublime also written in Python?), but it can't be used on Android (-_-)

Memory Fragmentation Issues

PS: rsync performs poorly when synchronizing a large number of deeply nested small files; a one-time compressed download works better.

26, Anti-Spam (Mollom 2011)

To protect the ML algorithm, users cannot submit data that was wrongly rejected.

Input from free users helps improve ML training.

Disks: SSDs for Cassandra, RAID 10 (stripe & mirror), for heavy writes and row caching; an aging mechanism for data (just for privacy, the "right to be forgotten" in European legislation)

You can design your own local data store with HTML5 local storage to hold all of your users' data ~

Client-side LB: first request the list of available servers, then send requests to them in order; if one fails, move on to the next (requests must not go to a random server here!)

What is the reputation mechanism for IP addresses?

AWS Virtual Server: IO is the bottleneck, scale up

Coredump analysis of 16GB leaks? Oh Crazy
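The client-side load balancing described in the Mollom notes above (fetch a server list, then try the servers in order rather than at random) can be sketched roughly as follows. This is an assumption-laden illustration, not Mollom's actual client: the function name, the `send` callback, the server names, and the choice of `ConnectionError` as the failure signal are all hypothetical.

```python
from typing import Callable, Sequence


def call_with_failover(servers: Sequence[str],
                       send: Callable[[str], str]) -> str:
    """Try each server in list order; fall through to the next on failure."""
    last_error = None
    for server in servers:
        try:
            return send(server)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next server
    raise RuntimeError("all servers failed") from last_error


# Hypothetical transport: the first server is down, the second answers.
def fake_send(server: str) -> str:
    if server == "s1.example":
        raise ConnectionError("s1 is down")
    return "ok from " + server


print(call_with_failover(["s1.example", "s2.example"], fake_send))
```

Keeping the order fixed means every client prefers the same server and only spills over to the next one when it has to.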

27. Redis: LPUSH/LTRIM; LREM; ZADD/ZREVRANGE/ZRANK/ZRANGE; SADD; pub/sub; the usual GET/SET
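A common use of the LPUSH/LTRIM pair in note 27 is a capped "most recent N items" list: push to the head, then trim the tail. The sketch below shows the pattern against a tiny in-memory stand-in so that it runs without a Redis server. The stand-in class, the helper name, and the key name are assumptions for illustration, and the stand-in handles only non-negative indexes.

```python
class InMemoryLists:
    """Minimal stand-in mimicking the three Redis list commands used below."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, value):
        # Like LPUSH: insert at the head, return the new length.
        self._lists.setdefault(key, []).insert(0, value)
        return len(self._lists[key])

    def ltrim(self, key, start, stop):
        # Like LTRIM: keep only start..stop, stop inclusive.
        self._lists[key] = self._lists.get(key, [])[start:stop + 1]

    def lrange(self, key, start, stop):
        # Like LRANGE: return start..stop, stop inclusive.
        return self._lists.get(key, [])[start:stop + 1]


def push_recent(client, key, item, max_len=3):
    """Add item to the head of the list and cap the list at max_len entries."""
    client.lpush(key, item)
    client.ltrim(key, 0, max_len - 1)


store = InMemoryLists()
for event in ["e1", "e2", "e3", "e4", "e5"]:
    push_recent(store, "timeline:42", event)
print(store.lrange("timeline:42", 0, 9))  # newest first, at most 3 entries
```

`push_recent` only calls `lpush` and `ltrim`, and the redis-py client exposes methods with those names, so a real client could be passed in place of the stand-in.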

28. Reliability: MTTR metric <-- MTBF

Capacity Planning & Expect Failure (k), Alone to defeat! )
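The relationship between MTTR and MTBF in note 28 can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The function name and the numbers below are illustrative assumptions, not figures from the articles.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours must be >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


baseline = availability(999.0, 1.0)         # fails every 999 h, 1 h to repair
faster_repair = availability(999.0, 0.5)    # same failure rate, half the MTTR
rarer_failure = availability(1998.0, 1.0)   # same MTTR, double the MTBF
print(baseline, faster_repair, rarer_failure)
```

Because the formula depends only on the ratio of the two values, halving MTTR buys the same availability as doubling MTBF, which is one way to read the note's emphasis on MTTR when failures are expected anyway.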
