Facebook Big Data: Handling More Than 2.5 Billion Pieces of Content and 500TB of Data per Day
Today (local time), Facebook shared a number of "big data" statistics with reporters at its California headquarters. For example, Facebook's systems handle 2.5 billion pieces of content and more than 500TB of data a day. Users click the Like button 2.7 billion times and upload 300 million photos a day, and the system scans about 105TB of data every half hour. Facebook also disclosed details of a new project, "Project Prism," for the first time.
This data is especially important to Facebook, says Jay Parikh, Facebook's vice president of infrastructure. By processing it quickly, Facebook can launch new products, see how users respond, and adjust product designs almost in real time.
Another statistic Facebook revealed is that more than 100 petabytes of data is stored in a single Hadoop disk cluster, which Parikh says is the world's largest single Hadoop system. But he points out that although this amount of data seems enormous to a small business, in a few months no one will care about 100PB stored in a database. Because data is growing so fast and our appetite for it keeps increasing, a 100PB disk cluster will no longer be news in a few months' time.
Parikh added that the data benefits not only Facebook but also advertisers. "By tracking how ads perform across every segment of users (gender, age, interests), we can target our advertising more precisely and make it more effective," he explains. For example, if an ad performs better in California than elsewhere, Facebook will show more of it in California to maximize results for the advertiser.
Facebook does not even need to change the live site to see the effect of the data. Using historical data alone, Facebook can build a model, run simulations, and watch the ad click-through rate (CTR) multiply. In addition, a system called Gatekeeper tests the effect of changes on a small percentage of the user base.
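The article does not describe how Gatekeeper selects that small percentage of users. One common way to do percentage-based gating is to hash each user into a stable bucket and enable a change only for buckets below a threshold. The sketch below illustrates that general technique; `bucket`, `is_enabled`, and the feature name are hypothetical and are not Gatekeeper's actual API.

```python
import hashlib

BUCKETS = 10_000  # hypothetical granularity: 1 bucket = 0.01% of users


def bucket(user_id: int, feature: str) -> int:
    """Map a (feature, user) pair to a stable bucket in [0, BUCKETS)."""
    key = f"{feature}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def is_enabled(user_id: int, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage (0-100)."""
    threshold = round(percent * BUCKETS / 100)
    return bucket(user_id, feature) < threshold


# Roughly 1% of 100,000 users see the hypothetical "new_ad_layout" change.
exposed = sum(is_enabled(uid, "new_ad_layout", 1.0) for uid in range(100_000))
```

Because the bucket depends only on the user and feature, a user stays in the same group across visits, and anyone enabled at 1% remains enabled if the rollout is widened to 5%.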
Next is the new project, "Project Prism." Currently, Facebook stores its entire user database (which changes constantly) in a single data center, while other data centers hold other data and redundant copies. As the user database grows, however, one data center will eventually be unable to hold it all, and the entire database will have to be moved to a larger data center. That migration wastes a great deal of resources.
Parikh said that "Project Prism" lets Facebook split this "huge warehouse" (the user database) across locations while still keeping a single, unified view of the data. This means the data can be hosted separately in Facebook's data centers in California, Virginia, Oregon, North Carolina, and even Sweden.
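The article gives no details of Prism's internals. To illustrate the general idea of one logical view over data physically split across data centers, here is a minimal sketch assuming simple modulo sharding by user ID. `Shard`, `LogicalTable`, and the sharding rule are hypothetical and should not be read as Prism's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Rows held by one physical data center."""
    data_center: str
    rows: dict = field(default_factory=dict)


class LogicalTable:
    """A single logical table whose rows are spread over several data centers."""

    def __init__(self, data_centers):
        self.shards = [Shard(dc) for dc in data_centers]

    def _shard_for(self, user_id: int) -> Shard:
        # Hypothetical placement rule: modulo sharding on the user ID.
        return self.shards[user_id % len(self.shards)]

    def put(self, user_id: int, row: dict) -> None:
        self._shard_for(user_id).rows[user_id] = row

    def get(self, user_id: int):
        # Callers never need to know which data center holds the row.
        return self._shard_for(user_id).rows.get(user_id)

    def location(self, user_id: int) -> str:
        return self._shard_for(user_id).data_center

    def scan(self):
        # A full scan fans out to every shard and merges the results.
        for shard in self.shards:
            yield from shard.rows.items()


sites = ["California", "Virginia", "Oregon", "North Carolina", "Sweden"]
users = LogicalTable(sites)
for uid in range(100):
    users.put(uid, {"name": f"user{uid}"})
```

A production system would also need replication and a way to rebalance data when sites are added; modulo placement is used here only because it keeps the sketch short.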
Internally, Facebook chooses not to partition the data or put up barriers between business units such as advertising and customer support. Product developers can look at data across departments to assess whether a small tweak they make increases the time users spend on the site, generates more user complaints, or raises the number of ad clicks.
Users are bound to feel uneasy at the idea that Facebook employees can know so much about their activities. Facebook, however, has promised that it takes multiple safeguards to prevent employees from abusing user data. All data access is logged, so Facebook can track which employees are looking at which data. Facebook also gives its employees intensive training, and each employee is assigned their own area of data; anyone who oversteps their authority by looking at data they should not will be fired. "We have a zero-tolerance policy and absolutely do not condone any improper use of user data," Parikh declared.