In 2006 Jonathan Goldman joined LinkedIn, the business social networking site. LinkedIn was still a start-up then, with fewer than 8 million registered accounts, but members were inviting their friends and classmates to join, so registrations were growing rapidly. Yet users were not connecting with people already on the site at the rate managers had expected. Clearly, some social experience was missing. As one LinkedIn manager put it, it was like arriving at a conference reception and finding that you don't know anyone: you stand off to the side, and you probably leave early.
Goldman, who holds a PhD in physics from Stanford, was fascinated by the growing number of connections between users and by their rich profiles. Taken as is, it all made for messy data and unwieldy analysis, but as he began to explore the connections between users, he started to see new possibilities. He began forming theories, testing hunches, and building models to predict whom a given user would be willing to connect with. He felt that the new features he was developing could bring value to users. But LinkedIn's engineers were busy scaling up the site's performance and showed little interest, and some colleagues openly doubted Goldman's ideas: why would users need LinkedIn to tell them whom to connect with? The site already had an address book importer that could pull in all of a member's contacts.
Fortunately, Reid Hoffman, LinkedIn's cofounder and CEO at the time, believed in the power of data analysis because of his experience at PayPal, and he gave Goldman a high degree of autonomy. For one thing, Goldman could bypass the traditional product release process and publish small modules, in the form of ads, on the site's most popular pages.
Through one such module, Goldman began to test what would happen if users were shown the names of people they probably knew but had not yet connected with on the site, such as people from the same school or workplace. Based on the background information users had entered on the site, he identified the three people each user was most likely to want to connect with and built a custom ad displaying them. Within a few days it was obvious that something remarkable was happening: the click-through rate on those ads was the highest the site had ever seen. Goldman then refined his recommendations using ideas such as "triangle closing": if you know both Zhang and Doe, there is a good chance that Zhang and Doe know each other. Goldman and his team also reduced the action needed to respond to a suggestion to a single click.
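The triangle-closing idea can be illustrated with a minimal sketch. It assumes a toy network stored as a dictionary of sets and ranks candidates by how many connections they share with the user; the function name `suggest_connections`, the sample data, and the ranking rule are hypothetical illustrations, not LinkedIn's actual method.

```python
from collections import Counter


def suggest_connections(connections, user, top_n=3):
    """Return up to top_n people who share the most connections with user.

    connections maps each person to the set of people they are connected to.
    This is an illustrative heuristic only (count of mutual connections).
    """
    direct = connections.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in connections.get(friend, set()):
            # Skip the user and anyone the user is already connected to.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken alphabetically.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical toy network: "you" knows both Zhang and Doe,
# but Zhang and Doe are not yet connected to each other.
connections = {
    "you": {"Zhang", "Doe"},
    "Zhang": {"you", "Lee"},
    "Doe": {"you", "Lee"},
    "Lee": {"Zhang", "Doe"},
}

print(suggest_connections(connections, "Zhang"))  # ['Doe']
```

Here Zhang and Doe share two connections ("you" and Lee), so Doe is the suggestion shown to Zhang. A real system would presumably combine such signals with profile data like shared schools and employers.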
It did not take long for LinkedIn's top managers to recognize a good idea, and it was made a standard feature. That is when things really took off. "People You May Know" ads achieved a click-through rate 30% higher than that of the site's other prompts to visit more pages, and they generated millions of new page views. Thanks to this one feature, LinkedIn's growth rate rose dramatically.
New career
Goldman is a good example of an important new kind of player in organizations: the "data scientist." It is a high-ranking professional role that calls for both the training and the curiosity to make discoveries in an ocean of data. The title has been around for only a few years. It was coined in 2008 by D.J. Patil (one of the authors) and Jeff Hammerbacher, who at the time led the data and analytics efforts at LinkedIn and Facebook, respectively. Today thousands of data scientists work at start-ups and established companies alike. Their sudden prominence reflects the fact that companies now have to deal with information arriving in volumes and varieties they have never encountered before. If your organization stores multiple petabytes of data, or the information most critical to your business no longer fits into rows and columns of numbers, or answering your biggest question would require a "mashup" of several analytical efforts, you have arrived in the age of big data.
At this stage, much of the enthusiasm for big data focuses on the technologies that process it, such as Hadoop, the most widely used framework for distributed file system processing, along with related open source tools, cloud computing, and data visualization. These breakthrough technologies are important, but at least as important are the people with the skills and the mind-set to put them to good use. Demand for data scientists has raced ahead of supply, and the shortage of talent is already becoming a serious constraint in some industries. Greylock, an early-stage venture capital firm that has invested in Facebook, LinkedIn, Palo Alto Networks, and Workday, is worried enough about the tight talent pool that it has built its own recruiting team to channel talent to the companies in its portfolio. "Once these companies have data, they need people who can manage it and find insights in it," said Dan Portillo, head of the recruiting team.
Who are they?
Profiting from big data requires hiring scarce data scientists, so managers face three major challenges: identifying that talent, attracting it, and making it productive. None of these tasks is as straightforward as it is for positions with established responsibilities. For one thing, there are no widely accepted university programs for training data scientists. There is also little consensus on where data scientists belong in an organization, how they can create the most value, and how their performance should be measured.
So to find data scientists, first understand what they do in a business. Then ask: what skills do they need, and in which existing fields are those skills found?
Above all, data scientists make discoveries while swimming in an ocean of data; it is their preferred way of navigating the world around them. At ease in the digital realm, they turn large amounts of scattered data into structured data that can be analyzed. They identify rich data sources, join them with other, potentially incomplete data sources, and clean up the resulting data set. In a competitive environment where challenges keep changing and new data never stops flowing in, data scientists help decision makers move from ad hoc analysis to an ongoing conversation with data.
Data scientists run into technical limitations, but they do not let technology obstruct their search for novel solutions. When they discover something, they communicate what they have found and suggest new business directions. They are often creative in displaying information visually and in making the patterns they find clear and persuasive. They advise product managers and executives on what the patterns in the data imply for products, processes, and decisions.
Because the profession is still in its infancy, data scientists often have to build their own tools and even conduct academic-style research. A group of data scientists hired early on at Yahoo helped develop Hadoop. Facebook's data team created Hive, a language for programming on Hadoop. Many other data scientists have added to and refined the tool kit, especially at data-driven companies such as Google, Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter.
What kind of person can do all this? What skills make a data scientist successful? Think of a data scientist as a hybrid of data hacker, analyst, communicator, and trusted adviser. The combination is extremely powerful, and rare.
The most basic, universal skill of data scientists is the ability to write code. That may be less true five years from now, when many more people will have "data scientist" printed on their business cards. More enduring will be the ability to communicate in language that all stakeholders understand, and the special skill of telling stories with data, whether verbally, visually, or both.
But we think the dominant trait of data scientists is intense curiosity: a desire to go beneath the surface of a problem, find the questions at its core, and distill them into clear hypotheses that can be tested. For example, one data scientist we know who was studying a fraud problem realized that it closely resembled a type of DNA sequencing problem. By bringing those two unrelated worlds together, he and his team found a solution that significantly reduced fraud losses.
Now you can see why this new role is called "scientist." Experimental physicists, for example, also have to design equipment, collect data, run repeated experiments, and present their results. It follows that companies looking for people who can handle complex data often find the most talented candidates among those with backgrounds in the physical or social sciences. Some of the best and most promising data scientists hold PhDs in fields that study complex systems, such as ecology or systems biology. George Roumeliotis, head of a data science team at Intuit in Silicon Valley, trained in astronomy. More commonly, the data scientists working in industry today come from computer science, mathematics, economics, or any other data-intensive and computation-intensive field.