From http://duanple.blog.163.com/blog/static/709717672011330101333271/
I. Google papers
1. Google series of papers
2. The Anatomy of a Large-Scale Hypertextual Web Search Engine
3. Web Search for a Planet: The Google Cluster Architecture
4. GFS: The Google File System
5. MapReduce: Simplified Data Processing on Large Clusters
6. Bigtable: A Distributed Storage System for Structured Data
7. Chubby: The Chubby Lock Service for Loosely-Coupled Distributed Systems
8. Sawzall: Interpreting the Data: Parallel Analysis with Sawzall
9. Pregel: A System for Large-Scale Graph Processing
10. Dremel: Interactive Analysis of Web-Scale Datasets
11. Percolator: Large-scale Incremental Processing Using Distributed Transactions and Notifications
12. Megastore: Providing Scalable, Highly Available Storage for Interactive Services
13. Case Study: GFS: Evolution on Fast-forward
14. Google File System II: Dawn of the Multiplying Master Nodes
15. Tenzing: A SQL Implementation on the MapReduce Framework
Google paper series: collected translations
II. Distributed and SQL theory Series
00. Appraising Two Decades of Distributed Computing Theory Research
0. How to Build a Highly Available System Using Consensus
1. Distributed theory Series
2. A Brief History of Consensus, 2PC and Transaction Commit
3. The Byzantine Generals Problem -- Leslie Lamport
4. Impossibility of Distributed Consensus with One Faulty Process
5. Leases: the lease mechanism
6. Paxos Made Simple
7. The Part-Time Parliament -- Leslie Lamport
8. Fast Paxos -- Leslie Lamport
9. Paxos Made Live: An Engineering Perspective
10. Uniform Consensus Is Harder than Consensus
11. The Transaction Concept: Virtues and Limitations -- Jim Gray
12. 2PC (two-phase commit): Notes on Data Base Operating Systems -- Jim Gray
13. 3PC (three-phase commit): Nonblocking Commit Protocols
14. Life beyond Distributed Transactions: An Apostate's Opinion
15. A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem -- Jim Gray
16. Consensus on Transaction Commit -- Jim Gray & Leslie Lamport
21. Time, Clocks, and the Ordering of Events in a Distributed System -- Leslie Lamport
22. Distributed Snapshots: Determining Global States of a Distributed System -- Leslie Lamport
23. Virtual Time and Global States of Distributed Systems
24. Timestamps in Message-Passing Systems That Preserve the Partial Ordering
25. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems
III. NoSQL theory Series
0. Towards Robust Distributed Systems: Brewer's 2000 PODC keynote
1. CAP theory
2. Harvest, Yield, and Scalable Tolerant Systems
3. Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
4. BASE model: BASE: An ACID Alternative
5. Eventual consistency
6. Scalability design patterns
7. Scalability principles
8. NoSQL ecosystem
9. Scalability, Availability, and Stability Patterns
10. The 5 Minute Rule and the 5 Byte Rule
11. The Five-Minute Rule 20 Years Later (and How Flash Memory Changes the Rules)
12. Debate on MapReduce
13. MapReduce: A Major Step Backwards
14. MapReduce: A Major Step Backwards (II)
15. MapReduce and parallel databases: friends or foes? (zz)
16. MapReduce and Parallel DBMSs: Friends or Foes?
17. MapReduce: A Flexible Data Processing Tool
18. A Comparison of Approaches to Large-Scale Data Analysis
19. MapReduce cannot hold up? (zz)
IV. Basic algorithms and data structures
1. Summary of large data volumes and massive data processing methods
2. Summary of large data volumes and massive data processing methods (continued)
3. Consistent hashing and random trees
4. Merkle trees
5. Scalable Bloom Filters
6. Introduction to distributed Hash Tables
7. B-trees and Relational Database Systems
8. The Log-Structured Merge-Tree
9. Lock-free data structures
10. Data Structures for Spatial Database
11. Gossip
12. Lock-free algorithms
13. The Graph Traversal Pattern
V. Basic systems and practical experience
1. Data Structure and algorithm principles behind MySQL Indexes
2. Dynamo: Amazon's Highly Available Key-value Store
3. Cassandra: A Decentralized Structured Storage System
4. PNUTS: Yahoo!'s Hosted Data Serving Platform
5. Introduction to and impressions of PNUTS, Yahoo!'s distributed data platform (zz)
6. LevelDB: a fast and lightweight key-value store
7. LevelDB: implementation
8. Megastore: Providing Scalable, Highly Available Storage for Interactive Services
9. Designs, Lessons and Advice from Building Large Distributed Systems -- Jeff Dean
10. Challenges in Building Large-Scale Information Retrieval Systems -- Jeff Dean
VI. Other auxiliary systems
1. The Ganglia Distributed Monitoring System: Design, Implementation, and Experience
2. Chukwa: A Large-Scale Monitoring System
3. Scribe: a way to aggregate data and, why not, to directly fill the HDFS?
4. Benchmarking Cloud Serving Systems with YCSB
VII. Hadoop-related
0. Hadoop reading list
1. The Hadoop Distributed File System
2. HDFS Scalability: The Limits to Growth
3.
4. HBase Architecture
5. HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs
6. HFile v2
7. Hive: A Warehousing Solution over a Map-Reduce Framework
8. Hive: A Petabyte Scale Data Warehouse Using Hadoop
9. Hive RCFile: an efficient storage structure
10. ZooKeeper: Wait-free Coordination for Internet-scale Systems
11. The Life and Times of a ZooKeeper
12. Avro: a data format for big data (zz)
13. Apache Hadoop Goes Realtime at Facebook
14. Overview of Hadoop platform optimization
15. The Anatomy of Hadoop I/O Pipeline
16. Hadoop Fair Scheduler Guide
17. Next-Generation Apache Hadoop MapReduce
18. Apache Hadoop 0.23
VIII. Miscellaneous
Reflections on Trusting Trust -- Ken Thompson
Who Needs an Architect?
Go To Statement Considered Harmful -- Edsger W. Dijkstra
No Silver Bullet: Essence and Accidents of Software Engineering -- Frederick P. Brooks
Reprinted on: 2011-4-30
I recommend a related article: http://blog.nosqlfan.com/html/1647.html
Most of the papers it lists are the same as those here, but some appear only there.