"BDTC Sneak Peek" Chen Yixin: Most Concerned About Large-Scale Data Mining Algorithms and Dimensionality Reduction Techniques


From December 12 to 14, 2014, the 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014), hosted by the China Computer Federation (CCF) with the CCF Big Data Expert Committee and co-organized by the Chinese Academy of Sciences and CSDN, will open at the Crowne Plaza Hotel, New Yunnan, Beijing. The three-day conference aims to promote the development of big data technology in industry applications. It will feature theme forums and industry summits including "Big Data Infrastructure", "Big Data Ecosystem", "Big Data Technology", "Big Data Application", "Big Data Internet Finance Technology", "Intelligent Information Processing", and many others. The "2014 Second CCF Big Data Academic Conference", hosted by the CCF with the CCF Big Data Expert Committee and co-organized by Nanjing University, will be held at the same time and will share keynote reports with the technology conference.

The conference will invite nearly 100 top experts and front-line practitioners in big data technology from home and abroad to discuss the latest developments in open source software (OSS) such as YARN, Spark, Tez, HBase, Kafka, and OceanBase; trends in NoSQL/NewSQL, in-memory computing, stream computing, and graph computing; how the OpenStack ecosystem addresses big data computing needs; and the latest industry applications of big data visualization, machine learning/deep learning, business intelligence, and data analysis. Speakers will share the technical characteristics of real production systems and their practical experience.

Ahead of the conference, CSDN spoke briefly with Chen Yixin, a speaker in the "Big Data Application" forum, tenured professor at Washington University in St. Louis, and big data/cloud computing scientist at the China Unicom Research Institute. Chen Yixin believes that Hadoop solves the problem of big data storage, but that for the telecommunications industry to mine and analyze the value in its data, attention must also go to large-scale data mining algorithms, dimensionality reduction techniques, "interpretable machine learning", and related technologies.

On December 14, at the 2014 China Big Data Technology Conference, Chen Yixin will share more practical insights on big data in telecommunications. According to him, the talk will also cover cross-industry cooperation with the financial sector, which should inspire big data owners, technicians, and marketing personnel in all industries and broaden their thinking.

Chen Yixin

Tenured professor, Washington University in St. Louis; big data scientist, China Unicom

Ph.D.; professor of computer science at Washington University in St. Louis; affiliated with health statistics at Peking Union Medical College; chief scientist for big data/cloud computing at the China Unicom Research Institute; leader of a 973 Program project under China's Ministry of Science and Technology.

His research areas include data mining, medical big data, artificial intelligence, and cloud computing. He has published more than 100 papers in world-class journals such as TKDE, TKDD, JAIR, and AIJ and at top conferences such as ICML, KDD, IJCAI, and AAAI. He has served on the editorial boards of a number of top academic journals in the big data field and on the program committees of a number of first-class international conferences. He has served on science and technology evaluation committees for the National Science Foundation, the Hong Kong Research Grants Council, the Austrian National Science Foundation, the Swiss National Science Foundation, and the China Science and Technology Evaluation Center. He is a member of the expert group for the Ministry of Education 111 project undertaken by the University of Science and Technology of China, and a member of the first CCF Big Data Expert Committee.

His research has been continuously funded by the National Science Foundation, the United States Department of Energy, the National Institutes of Health, the United States Center for Research Science, Microsoft, the Sloan Cancer Center, the American Bath Jewish Medical Fund, and the 973 Program of China's Ministry of Science and Technology.

He has received best paper awards at international conferences including KDD (2014), AAAI (2010), ICTAI (2005), and ICMLC (2004), and best paper award nominations at ICDM (2013), RTAS (2012), KDD (2009), and ITA (2004). His groundbreaking research has also won the Microsoft Young Professor Award (2007), a startup project allocation award from the American Energy Science Computing Center (2007), and the DOE Distinguished Young Professor Award (2006).

The interview with Chen Yixin follows:

CSDN: Which big data technologies does your company use? What are you satisfied with in these technologies, and where are you dissatisfied?

Chen Yixin: The big data technology Unicom currently uses is concentrated in storage and query. The typical example is the traffic log query system, which uses Hadoop as its underlying technical framework and HBase to support the application layer. This technology supports Unicom's current Internet traffic query work very well and has also significantly reduced Unicom's operating costs. However, the data mining and analysis that Hadoop itself can do is relatively limited, which to some extent restricts Unicom's further development in big data.

CSDN: As you understand it, what is the biggest difficulty that similar enterprises currently face with data?

Chen Yixin: The biggest difficulty is combining big data analysis techniques with application scenarios. Hadoop's biggest contribution is solving the problem of big data storage, so the "big" in big data has been dealt with. What big data really needs to solve is mining the value in data, and that is not simply a matter of using big data technology. The hardest part is bringing decades of research results in machine learning, artificial intelligence, and related fields into business scenarios to drive innovation in data products, in big data services, and even in business models.

CSDN: Which big data technologies are you currently following and studying, and why are you optimistic about them?

Chen Yixin: In addition to Hadoop, we are tracking a variety of new big data technologies, such as the in-memory analysis tool Spark, stream data analysis tools, GraphLab for social analysis, and in-memory MPP databases. What we care about most, however, is still large-scale data mining algorithms and dimensionality reduction techniques: whatever the tool or architecture, it ultimately has to be backed by advanced algorithms. Distributed processing can only bring a constant-factor speedup, whereas the efficiency gain from a better algorithm can be polynomial or even exponential. So the final solution to many complex problems in practice depends on algorithms, not just on a distributed architecture such as MapReduce. For example, the MVC algorithm we presented at ICML 2013 increases the size of the data sets the algorithm can process effectively by reducing the complexity from n cubed to n squared.
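The contrast between a constant-factor speedup and a polynomial one can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only an illustration, not the MVC algorithm: it assumes a hypothetical budget of 10^12 basic operations per machine, ideal linear scaling across machines, and cost models of exactly n^3 and n^2 operations. The function name and all numbers are assumptions chosen for the example.

```python
def max_problem_size(budget_ops: float, exponent: int, workers: int = 1) -> int:
    """Largest n with n**exponent <= budget_ops * workers.

    Assumes ideal linear speedup: `workers` machines do `workers` times the work.
    """
    total = budget_ops * workers
    n = int(round(total ** (1.0 / exponent)))
    # Correct any floating-point rounding error in the root.
    while n ** exponent > total:
        n -= 1
    while (n + 1) ** exponent <= total:
        n += 1
    return n


BUDGET = 1e12  # hypothetical operation budget per machine

baseline = max_problem_size(BUDGET, exponent=3)                    # n^3 algorithm, 1 machine
distributed = max_problem_size(BUDGET, exponent=3, workers=1000)   # n^3 algorithm, 1000 machines
better_algo = max_problem_size(BUDGET, exponent=2)                 # n^2 algorithm, 1 machine

print(f"n^3 on 1 machine:     n = {baseline}")
print(f"n^3 on 1000 machines: n = {distributed}")
print(f"n^2 on 1 machine:     n = {better_algo}")
```

Under these assumptions, a thousand machines only raise the feasible data set size tenfold for the cubic algorithm, while moving to a quadratic algorithm on a single machine raises it a hundredfold.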

Another important technology is "interpretable machine learning". In many application fields (such as e-commerce and medical treatment), we not only need accurate and reliable data models, but also want these models to be interpretable and understandable, so that they can be turned into action decisions in marketing, treatment, and so on. The FFD classification algorithm, which we presented at KDD 2014, won the best student paper award for some breakthroughs in interpretability. Interpretable machine learning is attracting attention: besides us, machine learning scholars at Harvard University, MIT, the University of Washington in Seattle, Cornell University, and elsewhere are also conducting research on it.
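To give a sense of what an interpretable model looks like, here is a minimal sketch of a one-rule classifier (a decision stump). It is not the FFD algorithm mentioned above; it is a generic textbook technique, and the feature (monthly data usage), the labels, and the marketing action are hypothetical data invented for the example. The point is only that the fitted model can be read directly as a rule a marketer could act on.

```python
from typing import List, Tuple


def fit_stump(xs: List[float], ys: List[int]) -> Tuple[float, float]:
    """Fit the rule 'predict 1 if x >= threshold' and return (threshold, accuracy).

    Tries every observed value as a threshold and keeps the most accurate one.
    """
    best_threshold, best_acc = xs[0], 0.0
    for t in sorted(set(xs)):
        correct = sum((x >= t) == bool(y) for x, y in zip(xs, ys))
        acc = correct / len(xs)
        if acc > best_acc:
            best_threshold, best_acc = t, acc
    return best_threshold, best_acc


# Hypothetical training data: monthly data usage in GB, and whether the
# customer accepted an upgrade offer (1) or not (0).
usage_gb = [0.5, 1.0, 1.5, 2.0, 6.0, 7.5, 8.0, 9.0]
accepted = [0, 0, 0, 1, 1, 1, 1, 1]

threshold, accuracy = fit_stump(usage_gb, accepted)
rule = f"offer upgrade if monthly usage >= {threshold} GB"
print(rule, f"(training accuracy {accuracy:.0%})")
```

A single threshold is far too simple for real data, but it shows the trade-off being discussed: the model is weaker than a black-box classifier, yet its entire decision logic fits in one sentence.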

CSDN: Please talk about the topic you are about to share at this conference.

Chen Yixin: In a field as hot as big data, I think the current priority is finding breakthrough applications for the technology, which takes both solid industry knowledge and technical strength. I will share two cases. The first is a "search engine project", in which we used search engine technology, based on our understanding of industry problems, to solve traditional marketing problems; a major feature of this search engine is that it "searches for people" rather than documents. The second is "financial credit", whose biggest feature is "cross-industry cooperation": working with the financial industry, we match data from the two industries and apply big data technology and algorithms to rate personal credit, enabling rapid approval of small personal consumer loans. I hope the two cases will inspire you.

CSDN: Which audiences should pay the most attention to these topics? What problems can your talk help the audience solve?

Chen Yixin: Big data owners, technicians, and marketing personnel from all industries can follow this topic, draw some inspiration from it, and broaden their thinking; we are only offering a starting point. Big data applications are often cross-industry: it is the collision of different industries and of their data that matters, and only exchange among different industries and different practitioners can drive the development of this technology and even of the industries themselves. Data owners can learn from our talk how to exploit their data advantages to capture greater value and benefit society; technicians can see how to turn technology into products; and marketers can learn how to open up new business opportunities.


The 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014) will be held at the Crowne Plaza Hotel, New Yunnan, from December 12 to 14, 2014. Held since 2008 and refined over seven editions, the China Big Data Technology Conference is currently the most influential and largest technology event in the big data field. At this conference you will hear about the latest achievements and development trends of leading big data open source projects from Apache Hadoop committer Uma Maheswara Rao G (also a member of the Project Management Committee), Yi Liu, and Bikas Saha, a member of the Apache Hadoop and Tez Project Management Committees, as well as dozens of practical talks from Tencent, Alibaba, Cloudera, LinkedIn, NetEase, and other organizations. A limited number of discounted tickets are currently available.

