The long-running Microsoft vs. Yahoo! acquisition battle may have left many people tired of the news. But today I saw a piece of technical news about Yahoo! that is worth reading: Size matters: Yahoo claims 2-petabyte database is world's biggest, busiest. Yahoo! VP Waqar Hasan disclosed that Yahoo!'s data warehouse currently holds 2 PB. It is used to analyze the access behavior of 0.5 billion users every month and processes 24 billion events every day. Yahoo! calls it the largest and busiest database in the world.
Some data warehouses are larger than Yahoo!'s. However, those hold non-relational data or compressed raw data that cannot be analyzed in real time. Yahoo! also has hundreds of TB of such data. The Yahoo! data warehouse, by contrast, stores structured data that can be analyzed. It is estimated that it may expand to dozens of petabytes within the next year. eBay claims a total data volume of 6 PB, but according to some reports its largest single database is only 1.4 PB.
In 2005, Yahoo! bought a startup named Mahat Technologies (Waqar Hasan). The company had developed a new database based on PostgreSQL whose distinguishing feature is column-based rather than row-based storage. It is not hard to see that writes become slower this way, but reads become much faster. [In his talk, Lei Ming gave an example of an optimization he did at Baidu that was very similar to this idea, which is why I said it "inspired" me.] After the purchase, Yahoo! kept improving the product (internal code name: elcaro?), adding compression, stronger parallel processing, and query optimization. The user-facing interface is still PostgreSQL. This should count as another successful case of PostgreSQL at a top-tier enterprise.
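The column-based versus row-based trade-off can be illustrated with a toy sketch in Python. This is not how Yahoo!'s system is implemented; the schema (user_id, page, duration_ms) and the class names RowStore and ColumnStore are hypothetical, chosen only to show why an analytic read touches less data in a column layout while an insert touches more places.

```python
# Toy comparison of row-based and column-based layouts (illustrative only).
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]  # (user_id, page, duration_ms): hypothetical schema


class RowStore:
    """Keeps each record together, as a row-based table does."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        self.rows.append(row)  # a write is a single append

    def total_duration(self) -> int:
        # Every full record is visited, although only one field is needed.
        return sum(r[2] for r in self.rows)


class ColumnStore:
    """Keeps each column together, as a column-based table does."""

    def __init__(self) -> None:
        self.columns: Dict[str, list] = {
            "user_id": [], "page": [], "duration_ms": [],
        }

    def insert(self, row: Row) -> None:
        # A write has to touch every column, which is why writes slow down.
        for name, value in zip(self.columns, row):
            self.columns[name].append(value)

    def total_duration(self) -> int:
        # A read scans only the one column the query asks for.
        return sum(self.columns["duration_ms"])


events: List[Row] = [(1, "home", 100), (2, "mail", 250), (1, "news", 100)]
row_store, col_store = RowStore(), ColumnStore()
for e in events:
    row_store.insert(e)
    col_store.insert(e)

row_total = row_store.total_duration()
col_total = col_store.total_duration()
```

Both stores return the same answer; the difference is how much data each one has to touch to get it. Values within one column also share a type and often repeat, which is what makes the compression mentioned above attractive.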
A database this large is not built on the traditional SMP architecture. Instead it uses ordinary PCs as a cluster (fewer than 1,000 servers). Obviously, this is a shared-nothing rather than a shared-storage database cluster. With the design described above, this massive amount of data can be analyzed effectively. This is a great technological innovation, and a completely different computing model from Google's MapReduce.
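As a rough illustration of the shared-nothing idea, the sketch below partitions events across nodes that each own their slice of the data, then merges per-node partial results. It is a simplified assumption, not a description of Yahoo!'s design: the Node and Cluster classes, the modulo partitioning on user_id, and the (user_id, page) schema are all hypothetical, and the nodes here are plain in-process objects rather than separate machines.

```python
# Toy shared-nothing aggregation: each node owns its data, results are merged.
from collections import Counter
from typing import List, Tuple

Event = Tuple[int, str]  # (user_id, page): hypothetical schema


class Node:
    """One server holding its own slice of the data (no shared disk)."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def local_page_counts(self) -> Counter:
        # Each node aggregates only the data it owns.
        return Counter(page for _, page in self.events)


class Cluster:
    def __init__(self, n_nodes: int) -> None:
        self.nodes = [Node() for _ in range(n_nodes)]

    def insert(self, event: Event) -> None:
        user_id, _ = event
        # Assumed partitioning rule: route each event by user_id modulo node count.
        self.nodes[user_id % len(self.nodes)].events.append(event)

    def page_counts(self) -> Counter:
        # Gather the small partial results and merge them.
        total: Counter = Counter()
        for node in self.nodes:
            total.update(node.local_page_counts())
        return total


cluster = Cluster(n_nodes=4)
for ev in [(1, "home"), (2, "mail"), (5, "home"), (8, "news"), (9, "home")]:
    cluster.insert(ev)

counts = cluster.page_counts()
```

Only the small per-node counters cross node boundaries in this sketch; the raw events stay where they were written, which is the property that lets such a cluster grow by adding ordinary machines.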
What strikes me most is that the figures the article lists for the world's very large databases no longer seem amazing. As I have said before, the information explosion has only just arrived.