In the online world, data is money. As the world's largest auction site, eBay understands this deeply. Today, eBay's analysis of all kinds of online data is so attentive that it is like a camera installed in front of each customer. There is no doubt that eBay holds a staggering amount of data: it handles 100PB of data every day, including 50TB of machine data. It is fair to say that eBay faces astronomical data challenges every day.
As early as 2006, eBay set up a big data analysis platform. To analyze users' shopping behavior accurately, eBay defined hundreds of types of data for tracking customer behavior. This, however, created new challenges. The company's data volume is almost unimaginable: no one can digest that much data, and no one can build a model based on all of it. In fact, eBay actually uses only a small fraction of the data it collects. "The rest of the data eBay either discards or stores, because maybe someday the technology will make a breakthrough and the data will become useful," said Lin, CEO of eBay Greater China.
So how does eBay now use this data to drive business innovation and profit growth?
Drawing a "portrait" of the user
eBay has nearly 200 million users, and the items listed on the site span more than 30,000 categories. In day-to-day trading on the platform, eBay handles thousands of dollars in transactions per second. Yet these transactions are just the "tip of the iceberg" of eBay's total data.
With big data analysis, eBay has many questions to answer every day, such as "What was the hottest search item yesterday?" Even a question this simple requires processing 5 billion page views. From this point of view, any basic business question is a huge problem for the company.
Lin gave a classic example of how eBay uses big data to increase online trading: a young woman is browsing the eBay site at a Starbucks at 10 in the morning. What should eBay show her?
"We have actually done a lot of research on these points. When a user browses at 10 in the morning, at noon, or at 7 in the evening, she looks at different items. Whether she is in a restaurant or at home also affects browsing and searching, and the user's age, the weather at the time and so on all have an impact on shopping," Lin said. "What eBay has to do is learn the different shopping patterns in different scenarios and push the products users most want to them."
Reportedly, eBay can "guess" what kind of goods a user wants from her previous browsing records, work out what she might want from a set of hundreds of scenarios, or look at what another female user with similar characteristics buys and infer the user's potential needs from that. After weighing all of these considerations, the eBay backend has to push the product page to the user within just a few seconds, which means the eBay system needs very fast computing speed.
This model of operation involves a considerable number of human factors. For example, a machine can collect tens of thousands of pieces of user data, but eBay's engineers may define only 100 of them as valid data, and the model is built on that valid data. In addition, when computers automatically "learn" the trends formed by the various data, eBay needs to anchor the machine learning logic in behavior associated with commodity transactions.
Besides building user "portraits" from big data to push targeted goods, eBay has also tried using big data to optimize its search engine.
In particular, eBay can grasp users' behavior patterns and so make the search engine more "intuitive." A few years ago, the eBay search engine could only understand the literal meaning of a query and search for it word for word; most of the time it did not understand the user's true intention. Now eBay is trying to change or rewrite users' search requests, adding synonyms or alternative phrasings to return more relevant results and increase online trading. All of this depends on the support of big data.
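The idea of rewriting a search request with synonyms can be illustrated with a minimal sketch. It is not eBay's implementation: the `SYNONYMS` table and the `expand_query` function are hypothetical names invented for this example, and a real system would presumably learn its vocabulary from user behavior data rather than from a hand-written dictionary.

```python
# Hypothetical synonym table (an assumption for illustration only).
SYNONYMS = {
    "sneakers": ["trainers", "running shoes"],
    "phone": ["mobile", "smartphone"],
}


def expand_query(query: str) -> list[str]:
    """Return the normalized query followed by variants with one term replaced by a synonym."""
    terms = query.lower().split()
    variants = [" ".join(terms)]
    for i, term in enumerate(terms):
        for alt in SYNONYMS.get(term, []):
            candidate = " ".join(terms[:i] + [alt] + terms[i + 1:])
            if candidate not in variants:
                variants.append(candidate)
    return variants


if __name__ == "__main__":
    for variant in expand_query("Red Sneakers"):
        print(variant)
```

A search backend could then run every variant and merge the results, so that a listing titled "red trainers" is still found by someone who typed "red sneakers".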
Providing "intelligence" to merchants
eBay also gives businesses a wide range of "intelligence" based on user shopping data. For example, eBay tells manufacturers what products users are searching for online, or shares data on various export industries, so that manufacturers can respond immediately.
Often, eBay recommends categories that merchants should sell, based on transactions on its own site or on other e-commerce sites. "This is also the work eBay is doing in Greater China," Lin said. "For example, if a Chinese merchant wants to sell a product to Australia, we can tell him through data analysis roughly how many units he can sell in a month, what range the pricing should be in, how many merchants in the market sell the same product, and roughly what market share he can expect."
On this basis, eBay has also tried to work out merchants' replenishment frequency. Overseas warehousing is a real headache for merchants: a miscalculation can cause either an inventory backlog or a stockout. And on eBay, it is a very serious problem if a user places an order only to find that the merchant is out of stock. Here eBay can analyze past data to find how the first batch of goods sold, at what sales rate the merchant should replenish, and how long logistics will take. From these figures, eBay can calculate the merchant's replenishment logic.
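The replenishment logic described here can be sketched with a standard reorder-point calculation. This is an assumed textbook formula chosen for illustration, not eBay's actual model; the function names and the `safety_days` buffer are hypothetical.

```python
import math


def daily_sales_rate(units_sold: int, days: int) -> float:
    """Average units sold per day, estimated from the first batch's sales history."""
    if days <= 0:
        raise ValueError("days must be positive")
    return units_sold / days


def reorder_point(daily_sales: float, lead_time_days: float, safety_days: float = 0.0) -> int:
    """Stock level at which a new shipment should be ordered.

    Covers expected sales during the logistics lead time plus an
    assumed safety buffer expressed in days of sales.
    """
    return math.ceil(daily_sales * (lead_time_days + safety_days))


def days_until_reorder(stock: int, daily_sales: float,
                       lead_time_days: float, safety_days: float = 0.0) -> float:
    """Days of selling left before stock falls to the reorder point."""
    if daily_sales <= 0:
        return math.inf
    surplus = stock - reorder_point(daily_sales, lead_time_days, safety_days)
    return max(0.0, surplus / daily_sales)


if __name__ == "__main__":
    rate = daily_sales_rate(units_sold=120, days=30)  # made-up sales history
    print(reorder_point(rate, lead_time_days=20, safety_days=5))
    print(days_until_reorder(180, rate, lead_time_days=20, safety_days=5))
```

With these made-up inputs, a merchant selling 4 units a day with a 20-day shipping time and a 5-day buffer should reorder when stock falls to 100 units. Ordering later risks a stockout; ordering much earlier ties up inventory in the overseas warehouse.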
These analyses also help businesses develop new sales categories, because it normally takes a merchant four or five months to figure out how a short-season product sells and how popular it is in each region.
Of course, what eBay does is present businesses with potential opportunities. Whether sellers are willing to go into production, or can find the right suppliers to buy from, is still up to them. Often eBay recommends 200 new categories to sell, and in the end merchants can find suppliers for only 50 new products.
On top of that, eBay can also play the role of quality control (QC), using all the information generated on the platform. For example, suppose a seller plans to sell 1,000 products on eBay: by the time 50 have sold, 5 have had problems; at 200 sold, 20 have had problems; at 400 sold, 40 have had quality problems, and so on. What eBay has to do is warn sellers about these problems early.
Going further, when a seller has sold only 10 or 20 products, eBay has to detect possible problems from return rates, buyer reviews, and so on. At the same time, eBay reminds sellers to press their suppliers to improve quality, to delist the product, or to revise the item description.
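The arithmetic behind the quality control example is simple: 5 problems in 50 sales, 20 in 200 and 40 in 400 all amount to a 10% problem rate. The sketch below shows an early-warning check on that rate. The 5% threshold and the 10-sale minimum are assumptions chosen for illustration, not figures from eBay.

```python
def problem_rate(problems: int, sold: int) -> float:
    """Share of sold items that had a quality problem."""
    if sold <= 0:
        raise ValueError("sold must be positive")
    return problems / sold


def should_warn(problems: int, sold: int,
                threshold: float = 0.05, min_sold: int = 10) -> bool:
    """Flag a seller once enough items have sold and the problem rate exceeds the threshold.

    Both `threshold` and `min_sold` are hypothetical tuning values.
    """
    return sold >= min_sold and problem_rate(problems, sold) > threshold


if __name__ == "__main__":
    # (sold, problems) checkpoints taken from the example in the text.
    for sold, problems in [(50, 5), (200, 20), (400, 40)]:
        print(sold, problem_rate(problems, sold), should_warn(problems, sold))
```

A fixed threshold like this is unreliable when only 10 or 20 items have sold, since one or two returns swing the rate sharply. That is why early prediction needs more than a simple ratio.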
Ideally, the quality control system forms a big data loop that helps sellers reduce returns and sell more goods. If a seller still goes their own way after receiving such a notice, eBay will conclude that the seller does not take product quality seriously, and at some stage it will impose a "quota" on the seller to limit their trading volume.
"The difficulty with quality control is that I need to use data models to find problems while the seller's transaction volume is still small. This early prediction involves complex computation," Lin said. "Once the volume is large, sellers can work out the return rate themselves, but the losses already incurred cannot be undone."
Trial and error and challenge
Like other online trading platforms, eBay is also sensitive about fakes. The company is currently trying to use big data technology to make its system "smart" at identifying fakes.
In fact, fighting counterfeits online is not easy. Fakes appear on the network in many forms and keep coming back despite repeated bans. Take Rolex as an example: a counterfeit seller may insert a space into the word, swap the positions of two letters, or leave the name Rolex out altogether and simply show a picture of a watch that looks like a Rolex. With so many brands on eBay, fakes naturally come in every variety. In this situation, simply matching keywords in the product name or description will not catch the fakes.
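A short sketch shows why plain keyword matching fails on the Rolex tricks described above. The functions are hypothetical and written for this example only. The tolerant matcher handles just two cases, one inserted space and one swap of adjacent letters, and it cannot catch a listing that shows the watch only in a picture.

```python
def naive_match(title: str, brand: str) -> bool:
    """Plain keyword check: is the brand name a substring of the title?"""
    return brand.lower() in title.lower()


def is_adjacent_swap(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of adjacent letters swapped."""
    if len(a) != len(b):
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def tolerant_match(title: str, brand: str) -> bool:
    """Also match a brand name split by one space or with two adjacent letters swapped."""
    brand = brand.lower()
    words = title.lower().split()
    # Rejoin neighboring words to undo a single inserted space.
    candidates = words + [left + right for left, right in zip(words, words[1:])]
    return any(c == brand or is_adjacent_swap(brand, c) for c in candidates)


if __name__ == "__main__":
    for title in ["Rolex Submariner", "Ro lex Submariner", "Roelx Submariner", "Luxury steel watch"]:
        print(title, naive_match(title, "Rolex"), tolerant_match(title, "Rolex"))
```

Each new trick needs its own rule here, which is the weakness of working from names and descriptions alone.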
What eBay is doing now is using data analysis to create models or rules: if a merchant's transactions match those rules or characteristics, the merchant may be selling fakes.
For example, when a seller's goods are priced very low and sell quickly but are followed by a lot of complaints and returns, the system identifies the pattern as "suspicious," and staff then determine whether the seller is selling fakes. "Even though the amount of data is large, people who sell fakes follow relatively fixed patterns," Lin said. In this way, eBay has effectively identified a number of counterfeit sellers.
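The "suspicious" pattern described here (low price, fast sales, many complaints and returns) can be written down as a simple rule. The sketch below is only an illustration: the `SellerStats` fields and all three thresholds are assumptions, and a positive result merely queues the seller for human review, as in the article.

```python
from dataclasses import dataclass


@dataclass
class SellerStats:
    avg_price: float             # seller's average price for the item
    market_price: float          # typical price of the genuine item
    units_per_day: float         # how fast the item is selling
    orders: int                  # completed orders so far
    complaints_and_returns: int  # orders followed by a complaint or a return


def is_suspicious(s: SellerStats,
                  price_ratio: float = 0.5,
                  fast_units_per_day: float = 20.0,
                  complaint_rate: float = 0.2) -> bool:
    """True when the seller is cheap AND fast-selling AND heavily complained about.

    All three thresholds are hypothetical values picked for the example.
    """
    if s.orders == 0:
        return False
    cheap = s.avg_price < price_ratio * s.market_price
    fast = s.units_per_day >= fast_units_per_day
    troubled = s.complaints_and_returns / s.orders >= complaint_rate
    return cheap and fast and troubled


if __name__ == "__main__":
    seller = SellerStats(avg_price=300.0, market_price=8000.0,
                         units_per_day=40.0, orders=200, complaints_and_returns=70)
    if is_suspicious(seller):
        print("Send to staff for manual review")
```

Because the rule depends on complaints and returns that arrive after the sale, it can only flag a seller after some buyers have already been affected.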
However, Lin readily admits that this kind of big data analysis also has drawbacks. "With fakes, this approach can only trace the problem after the fact; it cannot predict it beforehand. It's not that easy to solve, because no matter what model you use, fake trading can always fool you for a while," he says.
Beyond the lag in analysis, eBay's big data challenge also shows in the sheer volume of data to process. Although the enterprise data warehouse delivers great query performance, it still cannot meet eBay's needs for storage and flexible processing. These systems cost a great deal of money, and with eBay adding 50TB of data every day, the expense is considerable.
Moreover, a considerable portion of the data eBay collects is currently regarded as useless. After all, the more data is collected, the more variables there are, the more "noise" results, and the more distorted the model becomes. From this perspective, what eBay has to do is record the meaningful data and destroy the unwanted information. The problem is that 85% of the questions eBay wants to analyze are new or unknown. "eBay doesn't know what information might be useful in the future," Lin admits. "Data that looks useless now will probably be digested as technology progresses in the coming years, so for now we can only store it."
On the other hand, if all the information is stored, eBay will add hundreds of millions of pieces of data each month, and in such a vast ocean of data, analytical work cannot be done at all. For eBay, this is a trade-off that has to be balanced.
It should be noted that eBay's current analysis models are far from perfect. Whether "guessing" what users want or analyzing merchants' business, eBay gets a lot wrong. Lin illustrates this with credit cards. In his view, "Banks are actually the most powerful users of big data, but no matter how perfect the risk control model, the world still has a credit card failure rate of about 2%. Moreover, eBay is not using models certified by mature institutions, and much of the time it has to rely on its own guesses, so errors are not surprising."
(Responsible editor: Lvguang)