Cloudera brings Oryx, an open source machine learning tool for Hadoop
Keywords: machine learning, recommender systems, open source tools, recommendation engines
Cloudera, a Hadoop distributor, attracted little attention when it acquired the London-based startup Myrrix last year, and it has rarely promoted the company's machine learning technology since. But the value and influence of Myrrix's technology, and of its founder Sean Owen, in machine learning should not be underestimated.
Owen is currently developing an open source machine learning project called Oryx (an oryx is a kind of antelope; Cloudera also has a product named after an antelope, Impala).
Oryx's goal is to help Hadoop users build and deploy machine learning models that can be queried in real time, such as spam filters and recommendation engines. As new data continues to flow in, Oryx will support updating its models automatically.
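To make the idea of a model that is queried in real time and updated as data arrives more concrete, here is a minimal sketch in Python. It is not Oryx code and does not use any Oryx API: the class `StreamingSpamFilter`, its methods, and the sample messages are all hypothetical, and the technique shown is a toy naive Bayes classifier that learns one message at a time.

```python
import math
from collections import Counter


class StreamingSpamFilter:
    """Toy multinomial naive Bayes that learns one message at a time.

    Hypothetical illustration only; this is not how Oryx is implemented.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()
        self.vocabulary = set()

    def update(self, text, label):
        """Fold one newly labeled message into the model."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.message_counts[label] += 1
        self.vocabulary.update(words)

    def spam_score(self, text):
        """Return log-odds of spam versus ham; positive means spam-like."""
        words = text.lower().split()
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary) or 1
        scores = {}
        for label in ("spam", "ham"):
            # Laplace smoothing keeps unseen words and empty classes finite.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            total_words = sum(self.word_counts[label].values())
            score = math.log(prior)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores["spam"] - scores["ham"]


model = StreamingSpamFilter()
model.update("win free money now", "spam")
model.update("meeting notes attached", "ham")
before = model.spam_score("free prize")

# New labeled data arrives; the model is updated without a full retrain.
model.update("claim your free prize", "spam")
after = model.spam_score("free prize")
```

In this sketch the same object serves queries (`spam_score`) and absorbs new data (`update`), so the score for "free prize" rises after the third message is seen. A production system would separate batch model building from the serving layer, which is the part a tool like Oryx is meant to handle.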
Oryx can scale on demand for both model building and deployment. Owen sees this as Oryx's "sweet spot" compared with traditional Hadoop use: it marks the difference between exploratory analytics and operational analytics.
Owen believes that Apache Mahout, the traditional technology for deploying machine learning on Hadoop, has reached the end of the road.
"Mahout is constrained by first-generation MapReduce, which can only handle batch jobs, and users have to do a lot of work to build a machine learning system and get it running. Myrrix rewrote Mahout and solved all of the old problems. If Mahout could still be saved, Cloudera would not have bought Myrrix. Almost 90% of Oryx's code comes from Myrrix, and some of the code comes from Cloudera," Owen said.
An open source recommendation engine that everyone can use?
Oryx is not positioned as a library of machine learning algorithms. Owen focuses on four key areas: regression, classification, clustering, and collaborative filtering (that is, recommendation). Recommender systems are very popular, and Owen is working with several Cloudera customers to help them deploy recommender systems with Oryx.
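For readers unfamiliar with collaborative filtering, the following is a minimal sketch of the general idea: items are recommended to a user because other users rated them similarly to items that user already rated. It is an illustration only and makes no claim about the algorithm Oryx uses. The ratings data and the function names `item_vectors`, `cosine`, and `recommend` are hypothetical, and the technique shown is simple item-based filtering with cosine similarity.

```python
import math
from collections import defaultdict

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"book_a": 5.0, "book_b": 4.0},
    "bob": {"book_a": 4.0, "book_b": 5.0, "book_c": 4.0},
    "carol": {"book_c": 5.0, "book_d": 4.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors[item][user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[user] * b[user] for user in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank unseen items by similarity to the items the user has rated."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in vectors.items():
        if candidate in seen:
            continue
        # Weight each similarity by how much the user liked the seen item.
        score = sum(
            cosine(vector, vectors[item]) * rating
            for item, rating in seen.items()
        )
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


alice_recommendations = recommend("alice", ratings)
bob_recommendations = recommend("bob", ratings)
```

Here alice is recommended `book_c` because bob, who rated the same books she did, also rated it; `book_d` is not recommended to her because no user links it to anything she has rated. Computing all pairwise similarities this way does not scale to large catalogs, which is why large deployments rely on distributed model building.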
Positioning Oryx as a standardized tool for building recommender systems will bring the project a great deal of attention, because recommender systems are almost standard equipment for mainstream websites: both e-commerce and content sites need one to improve user experience and conversion rates. But the biggest problem with recommendation engine technology is the lack of standards and open source tools.
Oryx is not the only effort dedicated to standardizing recommendation technology. Another cloud computing startup, Mortar Data, is also actively promoting recommendation engine technology and showcasing the advantages of its open source recommendation framework. Other companies, such as Expect Labs, are not open source but are trying to automate recommender systems through an AI API.
Not yet a product
Owen believes that all of Cloudera's customers (and most Hadoop users) will eventually want to deploy operational analytics systems, not just recommender systems. Oryx could become a tool for implementing them in the future, but at the moment it is only an experimental project.
Owen still spends much of his time as a contributor to the Apache Spark project, and he wants to rewrite Oryx to use Spark instead of MapReduce as its main processing framework, because Spark has become a popular technology for next-generation big data applications. Because it is faster than MapReduce and easier to use, Spark now has a large community of users and contributors. Spark is also a better fit for the low-latency, real-time, and iterative computing requirements of next-generation big data applications, including real-time machine learning systems built on Oryx.