In the "Up" section of the big data on Silicon Valley's observations (http://www.china-cloud.com/yunjishu/shujuzhongxin/20141208_44107.html?1418016591), I have basically combed through a relatively complete shape of the big data growth situation in the Silicon Valley region. A friend looked at the "next" after the notice on the micro-blog to give me a message, I heard that the next chapter to introduce some of the company's large data department, ask if you can add a Google, especially Google Maps, Because especially want to know the world's largest search engine and daily essential travel artifact is how to be a excavator.
So last week I went to Google to do some interviews. This article looks at how four different kinds of Silicon Valley companies play with big data, including three of the famous FLAG companies (Apple's performance in big data does not stand out).
The material here comes from exclusive interviews with Evernote AI director Zeesha Currimbhoy, LinkedIn senior director of data Simon Zhang, former Facebook data infrastructure engineer Ashish Thusoo, and a front-line engineer in Google's big data department, plus a close look at Google Maps. Enjoy~~
Evernote: this year's new AI department takes aim at deep learning
At Evernote's global conference this year, CEO Phil Libin said that an important direction for Evernote is "to make Evernote a powerful brain." To see how they plan to get there, you have to look at the Augmented Intelligence team they just reorganized. I met the team's manager, Zeesha Currimbhoy, at Stanford, and what follows is an analysis of firsthand information from her.
What is it?
Earlier this year, the two-year-old data-processing team was reorganized into the Augmented Intelligence team, led by Zeesha. With fewer than ten people, it keeps a very low profile and is hardly heard from day to day. So what exactly are they doing?
Unlike what we usually call AI (artificial intelligence), Evernote's team is named Augmented Intelligence, usually abbreviated IA.
Zeesha is clearly a veteran of the team: "I joined Evernote in 2012 and went straight to the newly established data-processing team, which became the embryo of today's AI team. Our first projects were simple ones, such as optimizing the input experience according to each user's personal typing style."
Traditional AI uses large amounts of data and algorithms to let machines learn, analyze, and make decisions. IA, by contrast, still lets the computer do a certain amount of computation, but the ultimate goal is to arm the human brain so that people can make better decisions. The two ideas share a lot in implementation, but their starting points are completely different.
This distinction is also the highlight of Evernote's AI team. As a note-taking tool, Evernote's biggest difference from a search engine like Google is that it is deeply personal. The notes, links, photos, and videos a user stores embody that person's way of thinking and concerns.
Where does it come from?
The original intention of Zeesha's group is to help users think: by analyzing the notes a user has stored, learn that user's way of thinking, and then use the same patterns to pull information from third-party databases (that is, the various open information sources on the Internet). In this sense, Zeesha's version of the future Evernote is more like a super plug-in for the brain, feeding it powerful and digestible data support.
At present the whole team's entry point is small and focused.
"We're not just helping users do the search, but it's more important to push the right information to the user at the right time." ”
The first step toward this goal is to categorize a user's own notes and find the points of correlation among them. Earlier this year, Evernote rolled out a feature called Descriptive Search in the English version of its Mac client. Users can directly describe what they want to find, and Evernote automatically returns all relevant information.
For example, a user can search directly for "all pictures taken in Prague after 2012" or "all vegetarian menus". No matter how the user's notes are categorized, Descriptive Search finds the relevant items and avoids returning an overly broad set of results. This is just the start; the AI team's long-term goal is to build a series of intelligent products on top of it.
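To make the idea concrete, here is a minimal sketch of how a descriptive query might be mapped onto structured note metadata. Everything in it, the field names, the parsing rules, the toy notes, is an illustrative assumption, not Evernote's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Note:
    title: str
    media_type: str            # e.g. "image" or "text" (hypothetical field)
    location: Optional[str]
    year: Optional[int]

def parse_query(query: str) -> dict:
    """Turn a plain-language description into simple structured filters."""
    filters = {}
    if "picture" in query or "photo" in query:
        filters["media_type"] = "image"
    if (m := re.search(r"after (\d{4})", query)):
        filters["min_year"] = int(m.group(1))
    if (m := re.search(r"in ([A-Z][a-z]+)", query)):
        filters["location"] = m.group(1)
    return filters

def descriptive_search(notes: List[Note], query: str) -> List[Note]:
    f = parse_query(query)
    hits = []
    for n in notes:
        if f.get("media_type") and n.media_type != f["media_type"]:
            continue
        if f.get("min_year") and (n.year is None or n.year <= f["min_year"]):
            continue
        if f.get("location") and n.location != f["location"]:
            continue
        hits.append(n)
    return hits

notes = [
    Note("Prague trip", "image", "Prague", 2013),
    Note("Grocery list", "text", None, 2014),
]
print(descriptive_search(notes, "all pictures in Prague after 2012"))
```

The point is that the query is interpreted against the user's own metadata rather than a global index, which is exactly what makes the feature feel personal.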
Where is it going?
Needless to say, such a new team faces many challenges, and one of the more important technical difficulties comes from the nature of Evernote's user data. Although Evernote now has about 100 million users, the AI team does not do cross-user data analysis, both because of its focus on personalized analysis and because of privacy protection and other concerns.
The result is that the team has to analyze 100 million separate small datasets. Even if I have only ten notes in Evernote, for example, Evernote should still be able to produce useful analysis from that tiny amount of data. The direct consequence is that the more a user uses Evernote, the better the personalized experience becomes, which in the long run also increases user stickiness.
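A rough sketch of what "useful analysis from a tiny dataset" could mean is a per-user profile built from just a handful of notes, kept strictly isolated per user. This is an illustration of the constraint described above, not Evernote's code; the word-frequency profile and the example notes are assumptions.

```python
from collections import Counter
from typing import List

def build_user_profile(notes: List[str], top_k: int = 5) -> List[str]:
    """Return the user's most frequent terms as a crude interest profile."""
    words = Counter()
    for note in notes:
        for w in note.lower().split():
            if len(w) > 3:          # skip very short, stop-word-like tokens
                words[w] += 1
    return [w for w, _ in words.most_common(top_k)]

# Even with only three notes the profile is usable, which matches the
# constraint of analyzing many small per-user datasets in isolation.
profile = build_user_profile([
    "vegetarian recipes for dinner",
    "Prague travel itinerary and photos",
    "vegetarian restaurants near the office",
])
print(profile)   # e.g. ['vegetarian', 'recipes', ...]
```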
But Zeesha also admits: "Indeed, we all know that without big data there is no so-called intelligent analysis. What we are doing now is finding new, appropriate algorithms under that constraint." She did not go into detail about which algorithms the team is exploring, but given that there is no very successful precedent in this area, we have reason to expect some interesting results from the Evernote AI team under Zeesha.
Facebook: big data for precise external advertising and smoother internal communication
Facebook had a team of more than 30 people spend nearly four years building its data-processing platform, and today more than 100 engineers are still needed to support its day-to-day operation. You can imagine how time-consuming the infrastructure for big data analysis is.
One of Facebook's greatest assets is the data produced every day by its more than 1.35 billion active users. Yet it was only in 2013, after seven or eight years of exploration, that its big data division settled on precise advertising as its key focus and began to build its own data-processing systems and teams, along with a series of matching acquisitions, such as buying Atlas, then the world's second-largest advertising platform.
According to former Facebook data infrastructure manager Ashish Thusoo, Facebook's data-processing platform is a self-service, self-managed platform that handles more than one exabyte of data. Departments across the company can see processed data in real time and run further analysis as needed.
More than 30% of the company's staff, including engineers, product managers, business analysts, and people in many other roles, use the service every month. The platform makes it easy for different departments to communicate with one another through data, which has significantly changed how the company operates.
Looking back, Facebook's first big data prototype dates to 2005 and was built by Zuckerberg himself. The approach was simple: store and manage data with Memcache and MySQL.
Problems soon emerged. As the number of users grew rapidly, the Memcache-plus-MySQL approach began to hinder Facebook's fast development lifecycle (change-fix-release), and synchronization inconsistencies appeared frequently. The solution was TAO ("The Associations and Objects"), a distributed database handling on the order of a million read operations and millions of write operations per second, which mainly resolved the problem of servers hanging when a particular resource was overloaded.
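For readers unfamiliar with the Memcache-in-front-of-MySQL pattern the text describes, here is a minimal cache-aside sketch. The `cache` and `db` objects stand in for real memcached and MySQL clients; this illustrates the general idea and its consistency weakness, not Facebook's implementation.

```python
class CacheAsideStore:
    def __init__(self, cache, db):
        self.cache = cache   # stand-in for a memcached client
        self.db = db         # stand-in for a MySQL access layer

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                  # cache hit
        value = self.db.select(key)       # cache miss: read from the database
        if value is not None:
            self.cache.set(key, value)    # populate the cache for next time
        return value

    def put(self, key, value):
        self.db.update(key, value)        # write to the source of truth
        self.cache.delete(key)            # invalidate; next read repopulates
        # The race between this delete and a concurrent read is exactly the
        # kind of inconsistency that motivated a purpose-built system like TAO.

# Tiny in-memory stand-ins so the sketch runs end to end.
class DictBackend:
    def __init__(self): self.data = {}
    def get(self, k): return self.data.get(k)
    def set(self, k, v): self.data[k] = v
    def delete(self, k): self.data.pop(k, None)
    def select(self, k): return self.data.get(k)
    def update(self, k, v): self.data[k] = v

store = CacheAsideStore(cache=DictBackend(), db=DictBackend())
store.put("user:1", {"name": "alice"})
print(store.get("user:1"))   # first read misses the cache, then populates it
```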
In the first quarter of 2013, Zuckerberg's strategy was to focus the company on big data, and specifically not to make excessive demands on profitability but to demand the following three capabilities built on big data:
Launch new advertising products, such as features for targeting specific groups of friends and functions that improve the precision of advertisers' placements.
Cooperate with Datalogix, Epsilon, Acxiom, and BlueKai to enhance advertisers' targeting capabilities.
Acquire the Atlas advertising suite so that advertisers can better judge the ROI of their digital media advertising spend.
LinkedIn: how big data directly supports sales and monetization
An important function of LinkedIn's big data division is to mine and analyze the vast amount of user and employer information on the site and to directly support sales and monetization. Simon Zhang, director of its core Business Analytics team, says that everyone back in China is now talking about the cloud, cloud computing, big data, and big data platforms, but few people ask: how do I use data to generate more value, or in plain terms, to make money directly?
But that question matters, because it means direct income. From the resumes of LinkedIn's hundreds of millions of users, information on roughly 3 million companies can be extracted. A salesperson cannot possibly call every one of them, so the questions become: which companies should we go after, and would calling them actually be useful?
When salespeople asked Simon, he said the answer could only come from data analysis. Before the big data department existed, these decisions were made by gut feel.
Simon and his only three colleagues at the time built a model and found that the people who actually drove purchases of LinkedIn's service were front-line product managers and the people doing the actual recruiting on LinkedIn; the boss merely signed off after they had decided. That had been the puzzle. Once the analysis results came out, the sales staff changed strategy and targeted these middle managers, and the sales conversion rate roughly tripled.
At that time LinkedIn had only about 500 employees, and Simon alone supported 200 salespeople. He predicted how much Google alone would spend recruiting technical talent through LinkedIn, and the salespeople said: Simon, that's impossible.
"But that's what the data shows, but it's probably more or less." I realized that it was important to process this step. ”
Today, LinkedIn's "headhunting" business accounts for 60% of total revenue. How it developed in four years, he revealed that there were several steps to build the model:
Analyze how many employees each company has.
Analyze how many people each company has recruited.
Analyze each person's position, function, and level; all of these parameters are features of the model. Then analyze how many HR staff each company has, how many of them handle recruiting, how many recruiters it has lost, and how much time they spend on LinkedIn each day.
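The sketch below shows how features like these could be combined into a simple lead score for ranking companies. The linear form, the weights, and the example numbers are assumptions for illustration only, not LinkedIn's actual model.

```python
from dataclasses import dataclass

@dataclass
class CompanyFeatures:
    employees: int
    hires_last_year: int
    recruiters: int
    recruiter_churn: int
    avg_daily_minutes_on_site: float

def lead_score(c: CompanyFeatures) -> float:
    """Higher score = more likely to buy a recruiting product (toy weights)."""
    return (
        0.2 * c.hires_last_year / max(c.employees, 1)   # hiring velocity
        + 0.4 * c.recruiters                            # existing recruiting effort
        + 0.2 * c.recruiter_churn                       # unmet recruiting need
        + 0.2 * c.avg_daily_minutes_on_site             # engagement with LinkedIn
    )

companies = {
    "A": CompanyFeatures(500, 80, 4, 1, 25.0),
    "B": CompanyFeatures(5000, 100, 2, 0, 5.0),
}
ranked = sorted(companies, key=lambda name: lead_score(companies[name]), reverse=True)
print(ranked)   # sales calls the highest-scoring companies first
```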
That was the first thing LinkedIn's big data unit did.
Simon told 36Kr that from this basic piece of internal big data analysis, the company can keep iterating new product lines: "LinkedIn's three business models are talent solutions, marketing solutions, and paid subscriptions, our three traditional revenue pillars. In fact we have another, a fourth business model called Sales Solutions, which went online at the end of July this year."
This one is sold to enterprise users. Going back to the sales example, LinkedIn's big data system is a general model: change a keyword or a parameter inside it and it becomes another product. "We want to help enterprise users find out, as fast as possible, who wants to buy their stuff."
Although this fourth business model does not contribute much revenue yet, only about 1%, it leaves unlimited room for imagination, and the company has high expectations for it. "I can't tell you how fast it is growing, but it is a trend, and LinkedIn's business-to-business line is without doubt a big trend," Simon said.
Google: a closed-loop big data ecosystem
As the world's largest search engine, what is Google's relationship with big data? Thanks to that message on Weibo, this really is an interesting topic.
Google's foundational big data work began with GFS (Google File System), the first large-scale commercial distributed file system, published in 2003; the other two pillars are MapReduce and BigTable. The former is a software framework for parallel computation over big data, and the latter is regarded as the forerunner of modern NoSQL databases.
GFS made computation over big data feasible, and the many distributed file systems and NoSQL databases that have emerged since are undeniably influenced by these early Google projects.
MapReduce and BigTable, published in 2004 and 2006 respectively, completed the cornerstone of Google's three core big data products. Behind these publications are founders Sergey Brin and Larry Page, who were both PhD students at Stanford; the way research power seeps into industry is always a wonderful thing.
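To show what the MapReduce programming model looks like, here is the classic word-count example reduced to a single machine. A real MapReduce job distributes the map and reduce phases across many machines with a shuffle in between; this sketch only shows the shape of the computation.

```python
from collections import defaultdict

def map_phase(document):
    # map: emit a (word, 1) pair for every word in the document
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    # shuffle + reduce: group pairs by key, then sum each group's counts
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))   # {'the': 3, 'quick': 2, 'brown': 1, ...}
```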
In 2011 Google launched BigQuery, a query and storage service built on Google's own infrastructure that offers customers big data capabilities, somewhat like Amazon's AWS. Although its market share is not yet in the same order of magnitude, its pricing is more attractive. Google has used it to ride the trend of Internet companies buying services rather than building them, and has let a number of third-party services integrate BigQuery as a visual query tool, staking out a position in the market for big data storage and analysis.
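As a small example of what using BigQuery looks like from the outside, the snippet below queries one of Google's public sample tables with the official google-cloud-bigquery Python client. The project name is a placeholder, and credentials must already be configured in the environment.

```python
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")   # placeholder project

query = """
    SELECT word, SUM(word_count) AS total
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY total DESC
    LIMIT 5
"""

# Run the query on Google's infrastructure and stream back the result rows.
for row in client.query(query).result():
    print(row.word, row.total)
```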
Google's own enterprise services, such as BigQuery and GAE (Google App Engine), form a big data ecosystem: a closed loop of program creation, data collection, and data processing and analysis.
Then look at Google's product lines: search, advertising, maps, images, music, video. All of them rely on big data, using models optimized for different types of data to improve the user experience and grow market share.
Take Google Maps alone: it holds more than 40% of the global mobile-map market and is the go-to travel tool in the United States. It marks almost every Internet-covered corner of the globe, and 3D visual processing of buildings was already completed last year. That data processing may be the largest in scale, but it stays at the level of aggregating data. The real data analysis and mining shows up elsewhere: when you enter a destination, the route most recently taken by the most users is recommended to you first.
Google has also tagged and processed images from Google+, Panoramio, and its other cloud platforms, combining picture content with geographic information. After image recognition and scoring by the social system, Google can push higher-quality images to users, improving the visual experience of browsing the map.
Big data has brought Google huge profits, for example from the Google ads (AdSense) that are everywhere when you browse the web in America. Of course, it is a double-edged sword: it brings site owners income, but balancing that against user privacy is another difficulty big data processing has to overcome, and one that may also require a more mature Internet order to support.
As stated in "Top", most companies have no ability to handle data themselves, except for several leading companies such as Facebook. Finally attached two examples, to say that this side of the big companies do not have independent large data sector is also normal, the adoption of outsourcing cooperation is a common phenomenon:
Pinterest:
Pinterest tried to build a data-processing platform of its own on top of Amazon EMR, but because it could not keep the platform stable as data volume grew rapidly, it eventually decided to use the services provided by Qubole. On Qubole, a third-party platform, Pinterest can handle the vast amount of data generated by its 70 million users every day and run many kinds of data processing, including ETL, search, and ad hoc queries. Although Pinterest is itself a technology company with engineers good enough to build a data-processing team, it still chose a specialist like Qubole to provide the data-processing service.
Nike:
It is not only Silicon Valley Internet companies; more and more traditional companies are also starting to use big data technologies, and Nike is a typical example. Nike has worked with the API services company Apigee since 2012. On one hand, it uses Apigee's APIs to improve its internal data management, integrating data across departments so the company runs more smoothly and efficiently; on the other hand, it also develops Nike FuelBand-related mobile products through APIs. In 2014 it launched the Nike+ Fuel Lab project, opening up its APIs so that a large number of developers can build data-analysis products on Nike's vast trove of data, successfully linking Nike's traditional retail business, new technology development, and the value of big data.