AI technology plays an increasingly important role in big data, and IT practitioners are increasingly interested in IBM Watson. Recently Li Yonghui, a Distinguished Engineer in IBM's Systems and Technology Group, attended the 2014 China Big Data Technology Conference and gave an exclusive interview to CSDN Cloud Computing, using Watson as an example to analyze the practice and prospects of applying AI technology in industry. Li Yonghui believes that combining structured and unstructured data to generate insight is the future direction of big data, that the cognitive computing technology represented by Watson is meant to achieve this goal, and that its capabilities will be integrated into various industry solutions and IT products, including through a cloud delivery model.
The contents of the interview are summarized as follows:
CSDN: Please tell us about some recent developments in IBM's understanding of big data technology.
Li Yonghui: First of all, congratulations on the success of the 2014 China Big Data Technology Conference. We are delighted to see more than 1000 people attending, including various vendors and many overseas experts, which shows how much influence and attention big data has in China. From IBM's point of view, the four V characteristics of big data (Volume, the amount of data; Variety, the types of data; Velocity, the speed of data; and Veracity, the authenticity of data, or Value) will each develop in many different ways, so future hardware and software development will focus on serving these four Vs together.
IBM believes that both traditional database technology for processing structured data and NoSQL technology for unstructured information will develop greatly in the future. We want to combine structured and unstructured data to create insights that guide business innovation; we see this as a major direction for the big data field.
CSDN: In mining the value of different types of data, IBM is now paying special attention to artificial intelligence. Which big data problems do you think require AI technology to solve?
Li Yonghui: IBM classifies AI under cognitive computing. This direction is characterized by a shift from processing traditional structured data to processing big data and unstructured streaming data, and from simple data queries to data discovery and data mining. Today we mostly look at data produced by people. In the future there may be far more data generated by sensors, the Internet of Things, machines, wearable devices and so on, and a more intelligent analysis system will be needed to help sift through it.
As for future use, as the cost of computing resources falls, tools open up and open source products develop, the barrier to entry will keep dropping. Cognitive computing will develop in many directions, each industry will have its own characteristics, and each product developed will have a lower barrier to entry and will be integrated with, or developed together with, other IT products.
CSDN: Watson is the representative of IBM's cognitive computing technology. How do you think its advantages show up in industry?
Li Yonghui: Watson was designed and researched by IBM to mark its 100th anniversary. In 2011 it competed on the American TV quiz show Jeopardy! and won what we call the second man-versus-machine contest. It uses deep question answering (DeepQA) technology: semantic analysis captures the keywords and determines the nature of the question, the search for answers is distributed across many machines that search and compare in parallel, and the results are combined into an evidence-based answer.
What distinguishes this machine is that it combines IBM's best technology from every field. The hardware platform is a cluster scaled out to 2,880 IBM Power processor cores, with the IBM GPFS parallel file system providing high-performance computing, flexible expansion, high parallelism and tiered storage management. The software combines many years of IBM research and development, including natural language parsing, self-learning and unstructured data analysis. It is therefore a combination of IBM's strongest hardware and best software. In the future, Watson technology will be used in many fields, and its key technologies will be integrated into IBM's new hardware, software and service products.
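The question-answering flow described above (keyword capture, question-type analysis, parallel search across sources, and an evidence-based merge) can be illustrated with a toy Python sketch. This is not IBM's DeepQA implementation: the corpus, the candidate list, the stopword list and every function name below are hypothetical, and simple keyword overlap stands in for real semantic analysis.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus standing in for the document collections a real system searches.
CORPUS = {
    "geography": ["Paris is the capital of France.", "Lyon is a large city in France."],
    "history": ["Paris hosted the 1900 World's Fair.", "France's capital, Paris, lies on the Seine."],
}
CANDIDATES = ["Paris", "Lyon"]  # assumed candidate answers, fixed for illustration
STOPWORDS = {"what", "is", "the", "of", "a", "which", "who", "in"}


def extract_keywords(question):
    """Keyword capture: lowercase the words and drop stopwords."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    return [w for w in words if w and w not in STOPWORDS]


def classify_question(question):
    """Very rough question-type analysis based on the first word."""
    first = question.split()[0].lower()
    return {"who": "person", "where": "place", "when": "date"}.get(first, "thing")


def search_source(source_name, keywords):
    """Search one source; return (candidate, score, supporting passage) tuples."""
    hits = []
    for passage in CORPUS[source_name]:
        text = passage.lower()
        overlap = sum(1 for k in keywords if k in text)
        if overlap == 0:
            continue
        for cand in CANDIDATES:
            if cand.lower() in text:
                hits.append((cand, overlap, passage))
    return hits


def answer(question):
    """Fan the search out in parallel, then merge scores and keep the evidence."""
    keywords = extract_keywords(question)
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda s: search_source(s, keywords), sorted(CORPUS)))
    scores, evidence = defaultdict(int), defaultdict(list)
    for hits in per_source:
        for cand, score, passage in hits:
            scores[cand] += score
            evidence[cand].append(passage)
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return [(c, scores[c], evidence[c]) for c in ranked]


if __name__ == "__main__":
    for cand, score, passages in answer("What is the capital of France?"):
        print(cand, score, passages)
```

Each ranked answer carries the passages that support it, which mirrors the evidence-based character of the result; a production system would replace the thread pool with a distributed cluster and the overlap score with many learned scorers.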
CSDN: You just said there are solutions for different industries. Can you introduce some of the industry solutions based on Watson? Can any of them be delivered through a cloud environment?
Li Yonghui: After winning the second man-versus-machine contest, we wanted to commercialize Watson by building its technology into our products and by working with experts in different industries. The first industry we chose was medicine, in the hope of tackling cancer treatment problems that people currently cannot handle well, or of suggesting treatment options for cancer.
The second industry we entered was finance, because we see a lot of demand there, for example risk management, customer relationship management and the analysis of financial statements, all of which Watson's analytical capabilities can assist. We will support more industries in the future.
As for the cloud, one of Watson's plans is to move toward an as-a-service approach, including offering some APIs in the cloud. There are currently 7 services available to users on the IBM Bluemix platform, many of them related to language parsing and analysis. In the foreseeable future we will continue to put more capabilities online.
CSDN: Language parsing and analysis also has many applications in big data, but Chinese differs from foreign languages and needs localization. What localization research and development has been done on the Watson system for China? What successful cases are there?
Li Yonghui: Language analysis is closely tied to each country's culture and language habits, and it requires a learning process. Among the services Watson provides, only some currently support Chinese, which will help the next step of big data analysis applications. IBM has also set up a research institute in China that carries out technology research and product development related to Chinese culture, so the fit will improve in the future.
Watson is not like an ordinary machine, where a set of hardware or a set of software is sold to the customer. It provides a platform that we hope will combine well with each industry and be applied in the way the customer chooses to develop it. To promote the commercialization of Watson, IBM set up a new Watson business group this year to offer Watson-based solutions for specific industries. As I mentioned, our first combination was with the medical industry, where we are doing joint research with some medical enterprises in the United States. In the financial industry, we cooperate globally with Citibank, DBS and others. Since some of Watson's services are open on the web, we anticipate that a wide variety of applications will integrate them in the future.
CSDN: You mentioned the cancer treatment application. What is its latest progress?
Li Yonghui: Cancer treatment was our first application. It started around 2012 and went through one to two years of installation, application tuning and learning. Learning is the most important part, because the medical industry involves a large amount of historical data, including patient cases and a large body of medical journals. Through continuous machine learning, the system gives physicians evidence-based suggestions to help them decide the next step in diagnosing and treating cancer. Because a machine cannot make medical decisions for doctors, Watson only offers suggestions and lists the data or links behind each recommendation, so that doctors have evidence-based analysis or recommendations available when they make their judgment.
There have been some practical successes in the United States. In treating some cancer patients, doctors found very rare symptoms that fewer than 10 doctors worldwide knew how to treat. Because the machine had learned from the latest medical journals and research reports, it could suggest the next best action to any physician, or provide evidence-based analysis results. In general, compared with the vast body of medical knowledge, doctors have limited time each year to learn new things, and with the help of machine learning they can improve the accuracy and reliability of their diagnoses.
CSDN: Cancer may be the biggest problem facing the medical profession today. Can the same principle be used to solve other problems?
Li Yonghui: As a solution, it can be used beyond the field of cancer as well. But every industry, and even every specialty within the medical industry, has its own professional knowledge, so the relevant information has to be collected and analyzed, and the system can be used only after tuning. That is why the two cases we previously worked on with the medical industry each took more than a year of cooperation before yielding good results.
CSDN: Is the training cycle related to the volume of data and the complexity of the illness? What other factors are involved?
Li Yonghui: First, each industry's specialized knowledge has its own terminology, which general language parsing and analysis may not understand. Part of IBM Watson's technology is to discover new terms, learn how they relate to one another, and judge which knowledge is important and which is not, so that when it scans the data it knows how to pull out the relevant material. This is one difficulty.
Second, each industry, especially a professional one, is itself developing very quickly. In medicine, for example, with the emergence of biotechnology and wearable devices, the data collected on a person's physical condition and pathological characteristics may increase enormously. How to cross-analyze historical medical data with this new data to produce something useful is a future challenge.
In addition, the machine's own mechanisms may need to become more efficient when dealing with large amounts of data.
CSDN: You have emphasized open source and openness throughout. What efforts has IBM made in these two areas?
Li Yonghui: IBM is an important contributor to the open source community. One of the international standards in the field of language analysis is UIMA, which provides a common platform for unstructured analysis and reduces duplicated development. IBM has contributed a great deal to it, and it is one of the main technologies Watson uses in language analysis. As a next step, we will develop a scale-out technique through Watson to achieve a high degree of parallelism in language analysis. As for openness, as I mentioned, a core processor like IBM's POWER is very high-performance, and through the OpenPOWER Alliance this is the first time a high-end processor technology has been opened to the industry. Some CAPI high-speed I/O interfaces will also be opened up. We expect this to bring many hardware innovations and changes to the IT industry in the future.
CSDN: You just mentioned that Watson has put some services on the web. What do third-party developers need in order to apply Watson's results in their own work?
Li Yonghui: We see two directions for Watson's future use. One lets more users access Watson's capabilities through open APIs or SaaS, which is essentially a freely available service. The other is professional systems, where IBM will work more closely with each industry to develop industry-specific applications. Every analytical system integrated with an industry requires a lot of training, and each industry's requirements are different. To return to the earlier example, treating cancer is not the same as treating a cold.
CSDN: Can you sum up the key research and development directions for Watson in the future?
Li Yonghui: IBM is an IT company that focuses heavily on research and development, and one benefit is that it can combine hardware, software and services, including cloud services. One direction is to leverage the results of the OpenPOWER Alliance, for example better integration of GPU and FPGA acceleration, memory sharing and other technologies through the CAPI high-speed interconnect, to enhance the hardware platform's large-scale real-time processing. Beyond that, a major direction for Watson is to expand the number of industries it is combined with, including cross-industry combinations, and to deliver more and better services to the general public through the cloud, so that they too can share in the results of Watson's research and development.
CSDN: What advice does IBM have for enterprises that want to use AI technology when implementing big data?
Li Yonghui: First, we suggest that an enterprise understand its own industry and identify, within its scope of business, what can be combined with big data or unstructured data to further improve its services. We see that areas such as customer relationship management and risk management benefit greatly when combined with big data analysis, for example social data analysis. So the first step is to dig out the data relevant to the business chain. The second step is to start small and scale up once it succeeds. Beyond the traditional database, use some unstructured data analysis tools; combining the two supports the next step of development.
When choosing a platform, enterprises should also consider future development: how to improve utilization in a multi-tenant environment and how to provide a better platform. IBM hopes that Power hardware and IBM software products can give customers more and better choices.
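As a small illustration of the advice above to start small and combine a traditional database with unstructured analysis such as social data, the following Python sketch joins structured customer records with a crude sentiment score derived from free-text posts. All records, word lists and function names are hypothetical, and the word-counting score is only a stand-in for a real text analysis tool.

```python
# Hypothetical structured records, as they might come from a CRM table.
customers = [
    {"id": 1, "name": "Alice", "monthly_spend": 120.0},
    {"id": 2, "name": "Bob", "monthly_spend": 80.0},
]

# Hypothetical unstructured data: free-text social posts linked to customers.
posts = [
    {"customer_id": 1, "text": "Great service, love the new app"},
    {"customer_id": 2, "text": "Terrible support, slow and bad experience"},
    {"customer_id": 2, "text": "Still slow today"},
]

# Assumed word lists; a real deployment would use a proper language analysis service.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"terrible", "slow", "bad"}


def sentiment(text):
    """Count positive words minus negative words in one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def enrich(customers, posts):
    """Attach the summed sentiment of each customer's posts to the structured record."""
    totals = {}
    for post in posts:
        cid = post["customer_id"]
        totals[cid] = totals.get(cid, 0) + sentiment(post["text"])
    return [dict(c, sentiment=totals.get(c["id"], 0)) for c in customers]


def at_risk(enriched, threshold=0):
    """Names of customers whose overall sentiment falls below the threshold."""
    return [c["name"] for c in enriched if c["sentiment"] < threshold]


if __name__ == "__main__":
    print(at_risk(enrich(customers, posts)))
```

The point of the sketch is the join: the structured table alone says nothing about dissatisfaction, and the posts alone say nothing about customer value, but the enriched record supports a customer relationship or risk decision.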
CSDN: Finally, what are your impressions of and suggestions for this year's BDTC conference?
Li Yonghui: First of all, I thank the organizers for inviting IBM to take part in the 2014 Big Data Technology Conference. We hope that more people will join the big data field in the future, and that at this conference IBM and China's local technical experts can have more exchanges across regions, so that big data in China continues to develop in depth. We also hope that China's IT industry will contribute more of its research and development results to open and open source communities, to jointly promote the development of the industry.