GraphLab: Big Data Analytics from Concept to Production



GraphLab provides a complete platform that lets organizations use scalable machine learning systems to build big data analytics products. Its users include Zillow, Adobe, Zynga, Pandora, Bosch, and ExxonMobil. They pull data from other applications or services and turn big data concepts into predictive applications that can run in production, using models such as recommendation systems, fraud detection systems, and sentiment and social network analysis systems.



Carlos Guestrin is the co-founder and CEO of GraphLab and the Amazon Professor of Machine Learning at the University of Washington. An internationally recognized leader in machine learning, Carlos has received a number of honors: Popular Science magazine named him one of its 2008 "Brilliant 10", he received the IJCAI Computers and Thought Award for outstanding contributions to AI, and he is a recipient of the Presidential Early Career Award for Scientists and Engineers.



This article is based on a Q&A with GraphLab chief executive Carlos Guestrin about using AWS services.



Q: What is machine learning? How has it developed over the past 10 years?



Carlos Guestrin: Machine learning is the science of having computers learn models from large amounts of data, and of using what they learn as a basis for automatic, accurate predictions and decisions. Over the past 10 years, machine learning has been applied in areas such as self-driving vehicles, online store recommendations, marketing targeting, and credit card fraud detection. Because it can turn "big data" into insights that improve production and everyday life, the growing variety and volume of data have made machine learning a hot area for investment.



Q: Can you share the story behind GraphLab's founding? Why did you start this business?



Carlos Guestrin: The GraphLab prototype was born in 2008 at Carnegie Mellon University, in a project I led with two of my students at the Ph.D. and postdoctoral level. Before that, the team had been working on advanced graph analytics applications, and to reach their goals they needed tools with greater scalability. The tools attracted a lot of attention as soon as they were built: a simple workshop drew more than 300 participants, 10 times as many as expected. That response showed strong market demand and also validated the platform's design. At the time, the team used EC2 to make breakthrough progress in graph analytics and asynchronous communication, achieving an order-of-magnitude performance advantage over comparable graph analytics systems.



By 2012, my wife, who is also a computer science professor, and I were considering new positions. Jeff Bezos, the founder and chairman of Amazon, met with the two of us and helped persuade us to move to the University of Washington, where two Amazon Professor of Machine Learning positions were established. We then moved to the Pacific Northwest and met talented students who wanted to make their mark in the emerging field of big data analytics. With the support of Madrona Ventures and NEA, the GraphLab company was officially born in March 2014, and its first commercial product, GraphLab Create™, was released in beta form.



Q: Can you talk about GraphLab Create and how it simplifies big data analysis?



Carlos Guestrin: Today, turning raw data into insights and building a predictive application is still challenging and complex, and it requires data scientists or equally knowledgeable software engineers. The work also requires a large number of complex tools to collect, clean, model, and analyze data and to deliver the results to a store or application. In many cases, moving prototype code into a production environment is a lengthy and expensive process. As a result, many data scientists without programming experience cannot put their work to use, and organizations struggle to extract value from their data.



GraphLab emerged to fill this gap. It provides a platform that lets data scientists without programming experience quickly turn ideas into products that can run in a production environment. GraphLab already has a large number of users, and GraphLab Create helps them raise productivity and deliver value quickly without extensive programming experience or staffing.



Q: Does it support deploying predictive applications on AWS?



Carlos Guestrin: The path from raw data to business-transforming predictive analytics often starts with a data scientist, a laptop, and a prototype whose key concepts must then be validated in a large-scale test. AWS can shorten this process because it scales so easily, but for many data scientists it is still difficult to turn that prototype into production code.



This is where GraphLab comes in. GraphLab Create runs in all AWS environments. By changing just one line of code, data scientists can move a GraphLab-based prototype from a laptop to AWS. Data sets and models of any size can be loaded from and stored in Amazon S3, while GraphLab provides deployment, monitoring, data pipeline optimization, and prediction services across the AWS cluster.
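The "one line" idea can be illustrated with a small sketch. It does not use GraphLab Create's API: `DATA_LOCATION`, `describe_location`, `run_pipeline`, and the bucket name are hypothetical names chosen for illustration. The sketch only shows how a pipeline that takes its input location as a single setting can switch from a local file to Amazon S3 without any other change.

```python
from urllib.parse import urlparse

# Hypothetical setting: the only line that changes between laptop and AWS.
DATA_LOCATION = "s3://example-bucket/ratings.csv"  # on a laptop: "data/ratings.csv"


def describe_location(location):
    """Classify a data location as local or S3 and split it into its parts."""
    parsed = urlparse(location)
    if parsed.scheme == "s3":
        # For s3://bucket/key, the bucket is the network location and the key is the path.
        return {"backend": "s3", "bucket": parsed.netloc, "key": parsed.path.lstrip("/")}
    if parsed.scheme in ("", "file"):
        return {"backend": "local", "path": parsed.path}
    raise ValueError(f"unsupported data location: {location!r}")


def run_pipeline(location):
    """Stand-in for the load, train, and deploy steps; it only reports the backend."""
    info = describe_location(location)
    return f"loading from {info['backend']}"


if __name__ == "__main__":
    print(run_pipeline(DATA_LOCATION))
```

Because the rest of the pipeline only sees the result of `describe_location`, switching between the laptop and AWS in this sketch means editing the `DATA_LOCATION` line and nothing else.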



Q: Can you share some common use cases? How do people use GraphLab Create?



Carlos Guestrin: Common uses of GraphLab Create span a variety of fields:



Retail: recommendation systems and price forecasting (e.g. airfares)

Financial services: fraud prevention through behavior and transaction analysis

Biomedicine: disease prediction from medical record analysis, customized drug design

Telecommunications: predicting customer churn

Social network analysis: identifying key networks and the people who influence communities

Marketing and media: sentiment analysis, targeting



Q: Are these applications used only by enterprises?



Carlos Guestrin: Not at all. State and local governments use GraphLab to analyze public sentiment and determine which regions' local infrastructure needs attention. Biomedical research teams use GraphLab to analyze clinical records and predict patients' disease progression. Sensor networks of various kinds use GraphLab to extract valuable data that helps improve air and rail transport safety. In general, governments, research institutions, health care providers, and service organizations all expect to improve operational efficiency through effective use of data.



Q: So, do early-stage companies need to focus on data science? At what stage should startups start focusing on big data?



Carlos Guestrin: For companies of any size, data science and data-driven decisions are of great importance. Large companies cannot stop at historical analysis; they need to make their existing customer recommendation systems more effective using leading-edge predictive techniques, including real-time analytics. Text and sentiment analysis of survey and comment fields helps them understand how users feel and thereby reduce the number of incidents. Data science is just as important for startups whose business model is built on data analysis, especially in sales, marketing, media, advertising, and similar fields. There are already startups with data science at their core that build highly specialized custom services for a particular vertical or application, such as health care waste analysis, supply chain optimization, and insurance claims.



All of these companies, regardless of size, share a common trait: they have a lot of data but lack data science resources and computing power. Those are exactly what AWS and GraphLab together provide, and by removing scaling bottlenecks they are helping big data move from hype to real production use.



Q: Ten years from now, what do you see machine learning doing for big data?



Carlos Guestrin: Ten years from now, machine learning will be in the hands of far more people than today's data scientists and experienced engineers, and those people will be more productive than they are now. For example, business analysts and line-of-business owners will rely more on real-time profit forecasts from predictive services, and government, health care, and private-sector service providers will be able to tailor their products to demand. At the same time, the value of machine learning and data-driven decision-making in making non-specialists more independent will be recognized.



Q: What's the next move for GraphLab?



Carlos Guestrin: GraphLab is on a path to popularize machine learning, aiming to realize the "universal machine learning" vision described above. In the near term, we are working on our flagship 1.0 release, and GraphLab Create will be generally available on October 15. After that initial release, we will bring machine learning capabilities to all organizations. At the same time, we expect to see more big data needs being met.



Original link: HTTPS://MEDIUM.COM/AWS-ACTIVATE-STARTUP-BLOG/GRAPHLAB-BIG-DATA-ANALYTICS-SCALED-FROM-INSPIRATION-TO-PRODUCTION-A9891D059CDD



(Translator: Shirongyang; Editor: yuping)

