Yahoo!'s data warehouse: the busiest data warehouse in the world

Source: Internet
Author: User

The long-running Microsoft vs. Yahoo! acquisition battle may have left many people tired of the news, but today's technical story about Yahoo! is worth reading: "Size matters: Yahoo claims 2-petabyte database is world's biggest, busiest". Yahoo! VP Waqar Hasan disclosed that the company's data warehouse currently holds 2 PB. It is used to analyze the behavior of 500 million visitors a month and processes 24 billion events every day, and Yahoo! calls it the largest and busiest database in the world.
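
To get a feel for the daily figure, it can be converted to an average rate per second. This is only back-of-the-envelope arithmetic on the quoted number; it assumes the load is spread evenly over the day, which the article does not say.

```python
# Convert the quoted daily event count into an average per-second rate.
# Assumption: events are spread evenly over 24 hours (not stated in the article).
EVENTS_PER_DAY = 24_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
print(f"about {events_per_second:,.0f} events per second on average")
```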

Some data warehouses are larger than Yahoo!'s, but those hold non-relational data or compressed raw data that cannot be analyzed in real time. Yahoo! also has hundreds of TB of data of that kind. The Yahoo! data warehouse, by contrast, stores structured data that is ready for analysis, and it is expected to grow to dozens of petabytes within the next year. eBay claims a total data volume of 6 PB, but according to some reports its largest single database is only 1.4 PB.

In 2005 Yahoo! bought a startup named Mahat Technologies (Waqar Hasan's company). The company had developed a new database based on PostgreSQL whose distinguishing feature is column-based rather than row-based storage. The trade-off is that writes become slower while reads become much faster. [In a talk, Lei Ming gave an example of an optimization he did at Baidu that is very similar to this idea, which is why I said it "inspired" me.] After the purchase, Yahoo! kept improving the product (internal code name: elcaro?), adding compression, stronger parallel processing, and query optimization. The user-facing interface is still PostgreSQL. This should count as another success story for PostgreSQL at a top-tier enterprise.
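
The read/write trade-off of a column-based layout can be sketched with a toy example. This is not Yahoo!'s implementation; the sample records and the helper names `to_columns` and `append_row` are invented here purely for illustration.

```python
# Toy comparison of row-oriented and column-oriented layouts (illustrative only).
rows = [
    {"user_id": 1, "page": "home", "ms": 120},
    {"user_id": 2, "page": "mail", "ms": 340},
    {"user_id": 1, "page": "news", "ms": 95},
]

def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def append_row(columns, record):
    """Insert one record: every column list has to be touched."""
    for name, value in record.items():
        columns[name].append(value)

columns = to_columns(rows)

# Row layout: the scan walks every full record just to read one field.
total_row_layout = sum(record["ms"] for record in rows)

# Column layout: the scan reads only the one column it needs.
total_column_layout = sum(columns["ms"])

print(total_row_layout, total_column_layout)
```

Both layouts give the same answer. The difference is how much data each scan touches: an analytic query over one field reads a single list in the column layout, while inserting a record has to update every column.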

A database this large is not built on the traditional SMP architecture; it runs on a cluster of commodity PCs (fewer than 1,000 servers). This is clearly a shared-nothing database cluster rather than a shared-storage one. Combined with the column-based design described above, it makes it possible to analyze this massive amount of data efficiently. This is a significant technical innovation and a computing model entirely different from Google's MapReduce.
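
The shared-nothing idea can be sketched as follows: each node owns a disjoint slice of the data, computes over its own slice only, and a coordinator merges the partial results. This is a generic illustration, not Yahoo!'s design; the function names, the modulo partitioning rule, and the sample events are all assumptions made for the example.

```python
from collections import Counter

def partition(events, num_nodes):
    """Assign each event to exactly one node by user id, so nodes share no data."""
    nodes = [[] for _ in range(num_nodes)]
    for user_id, page in events:
        nodes[user_id % num_nodes].append((user_id, page))
    return nodes

def local_count(node_events):
    """Work done on one node, using only that node's own slice."""
    return Counter(page for _, page in node_events)

def merge(partials):
    """Coordinator step: add up the per-node partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical (user_id, page) events.
events = [(1, "home"), (2, "mail"), (3, "home"), (4, "news"), (5, "home")]

nodes = partition(events, 3)
page_views = merge(local_count(node_events) for node_events in nodes)
print(dict(page_views))
```

Because no node reads another node's slice, capacity can grow by adding more ordinary machines rather than by buying a bigger shared-memory or shared-storage system.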

What strikes me most is that the figures for the world's largest databases cited in earlier articles no longer seem impressive. As I have said before, the information explosion has only just begun.
