The 2014 China Big Data Technology Conference came to a close on December 14, with nearly a hundred technical experts gathering to share their latest research and practical results. This article, by China Economic Net management consultant Yang, mainly interprets the relationship between big data and deep learning, as well as the future development of the industry's technology.
Below is the author's original text:
From December 12 to 14, 2014, the 2014 China Big Data Technology Conference (Big Data Technology Conference 2014, BDTC 2014) and the Second CCF Big Data Conference opened at the New Yunnan Crowne Plaza Hotel in Beijing. The conference was hosted by the China Computer Federation (CCF), organized by the CCF Big Data Expert Committee, and co-organized by the Chinese Academy of Sciences and CSDN, with promoting big data research, application, and industrial development as its main theme.
In recent years, internet companies and traditional enterprises at home and abroad have come to recognize the potential value of turning data into assets at scale. How to store and process hundreds of terabytes or even exabytes of data efficiently and at low cost is a great challenge. The drive to extract value from data has left almost every industry facing big data problems, and "big data" has sparked a new round of "industrial revolution" in IT.
Cheng released the 2015 big data development trend forecast
Cheng, a researcher at the Institute of Computing Technology of the Chinese Academy of Sciences and Secretary General of the CCF Big Data Expert Committee, presented a big data white paper and development trends report at the meeting. The "White Paper on China's Big Data Technology and Industry Development (2014)" mainly covers the background and dynamics of big data, typical big data applications, progress in big data technology, the big data IT industry chain and ecosystem, and big data development trends and recommendations. For the 2015 forecast of big data trends, Cheng summed it up in these words: integration, crossover, foundation, and breakthrough.
First, big data analysis combined with intelligent computing has become a hot spot. Artificial intelligence technologies related to big data, such as neural computing, deep learning, and semantic computing, have become focal points in the field of big data analysis.
Second, data science is driving the convergence of multiple disciplines. As the digitization of society deepens, more and more disciplines come to resemble one another at the data level and can be studied in a unified way using similar approaches.
Third, cross-disciplinary data fusion analysis and applications will become a major trend in the future development of big data analysis.
Fourth, big data will increasingly intersect with the Internet of Things, mobile internet, cloud computing, social computing, and other hot technology areas, giving rise to many integrated applications. The Internet of Things and mobile computing strengthen the link with the physical world and with people, while big data and cloud computing strengthen back-end data storage, management, and computing capabilities.
Fifth, diversified processing modes for big data and the underlying hardware and software infrastructure are gradually being consolidated. In-memory computing will continue to be the primary means of improving big data processing performance; in-memory computing represented by Spark is gradually moving toward commercial use and coexisting with Hadoop, alongside systems and hardware designed specifically for big data processing (see the sketch after this list for the in-memory iteration idea). Diversified processing modes will continue to coexist, and converged, integrated big data processing platforms are gradually becoming a trend.
Sixth, big data security and privacy remain worrisome.
Seventh, new computing models such as crowdsourcing will make breakthroughs; deep learning may not be the only future for big data.
Eighth, a variety of visualization techniques and tools will enhance big data analysis. Data should be explored before it is analyzed, and visualization will play a large role in that process.
Ninth, building a big data technology curriculum and training talent are issues that deserve close attention.
Tenth, open source systems will become the mainstream technology and system choice in the big data field.
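To make trend five's point about in-memory iteration concrete, here is a minimal PySpark sketch: the input is parsed once, cached in cluster memory, and reused by every pass of an iterative job, rather than being re-read from disk on each pass as in a classic MapReduce-style workflow. The file path, data layout, and the simple gradient-descent loop are illustrative assumptions, not part of the conference material.

```python
# Minimal sketch of in-memory iteration: parse once, cache, reuse across passes.
from pyspark import SparkContext

sc = SparkContext("local[*]", "in-memory-iteration-demo")

# Assumed input layout: each line is "x1,x2,...,xn,label".
points = (sc.textFile("data/points.csv")
            .map(lambda line: [float(v) for v in line.split(",")])
            .cache())                     # keep parsed records in cluster memory

n = points.count()
num_features = len(points.first()) - 1
w = [0.0] * num_features

for _ in range(10):                       # simple batch gradient descent for linear regression
    grad = (points
            .map(lambda p: [(sum(wi * xi for wi, xi in zip(w, p[:-1])) - p[-1]) * xi
                            for xi in p[:-1]])
            .reduce(lambda a, b: [ai + bi for ai, bi in zip(a, b)]))
    w = [wi - 0.1 * gi / n for wi, gi in zip(w, grad)]

print("learned weights:", w)
sc.stop()
```

Because the parsed records are cached, each iteration reuses them from memory; without the cache() call, every pass would re-read and re-parse the input, which is exactly the overhead the in-memory computing trend addresses.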
Eric P. Xing shares a distributed machine learning platform for big data
At the plenary session on the first day of the China Big Data Technology Conference, Eric P. Xing, professor at Carnegie Mellon University and program chair of ICML 2014, pointed out that current data-processing platforms waste a large amount of resources on cluster communication. Even on the better platforms, computation accounts for only about 20% of the time while communication takes 80%; on Hadoop, communication can account for 90%.
Petuum, developed by his team, is a new distributed machine learning platform for big data. It supports both data parallelism and model parallelism, and it is designed around the specific characteristics of machine learning programs. Its overall architecture includes a parameter server, which provides a virtual shared-memory programming abstraction so that the programmer does not have to handle communication with each machine individually, and a scheduler, which can partition the model effectively, even dynamically, and then distribute the pieces.
The parameter server exposes a programming interface in which reads and writes to shared memory require no machine-specific instructions. It uses a rather ingenious semi-synchronous coordination mechanism that significantly reduces the time spent on communication and increases the time available for computation. By tuning the synchronization parameters, communication time can be cut sharply, even to less than the computation time, so that the machines' resources are put to the most effective use.
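This semi-synchronous mechanism corresponds to what the Petuum papers call Stale Synchronous Parallel (SSP): workers may read slightly stale parameters, and a fast worker only blocks when it gets more than a bounded number of clock ticks ahead of the slowest one. Below is a toy, single-process Python simulation of that bounded-staleness idea; the class and method names are invented for illustration and are not Petuum's actual API.

```python
# Toy simulation of bounded-staleness (SSP-style) parameter-server coordination.
import threading

class BoundedStalenessServer:
    def __init__(self, num_workers, staleness):
        self.params = {}                    # shared parameter table
        self.clocks = [0] * num_workers     # per-worker logical clocks
        self.staleness = staleness
        self.cond = threading.Condition()

    def read(self, key):
        with self.cond:
            return self.params.get(key, 0.0)   # may return a stale value

    def update(self, key, delta):
        with self.cond:
            self.params[key] = self.params.get(key, 0.0) + delta

    def clock(self, worker_id):
        """Advance this worker's clock; block only if it is too far ahead."""
        with self.cond:
            self.clocks[worker_id] += 1
            # A fast worker waits until the slowest is within the staleness bound.
            while self.clocks[worker_id] > min(self.clocks) + self.staleness:
                self.cond.wait()
            self.cond.notify_all()

def worker(server, worker_id, num_iters):
    for _ in range(num_iters):
        w = server.read("w")                  # possibly stale read
        server.update("w", 0.01 * (1.0 - w))  # toy "gradient" step toward 1.0
        server.clock(worker_id)               # advance clock, maybe wait

if __name__ == "__main__":
    server = BoundedStalenessServer(num_workers=4, staleness=2)
    threads = [threading.Thread(target=worker, args=(server, i, 50)) for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("final w ~", round(server.read("w"), 3))
```

The key design point is that reads and updates never block; only the clock() call can wait, and only when the staleness bound would be violated, which is what lets communication overlap with computation instead of forcing a full synchronization barrier on every step.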
Petuum is also an open source project. Current observations show that it not only scales well but is basically on par with the best systems available today. According to Professor Xing, who had just received the latest results from his students, a group has, surprisingly, already used the system to run independent comparisons against Spark and Hadoop. The vision for Petuum covers both the application software and the underlying system software; it is now a member of the Hadoop ecosystem and can be downloaded and extended for your own development.
Yu Kai: Deep learning will play a key role in the era of artificial intelligence
Yu Kai, deputy dean of Baidu Research, director of its deep learning lab, and senior director of the Image Search Department, introduced Baidu's progress in artificial intelligence. The most important technology for internet companies is artificial intelligence based on big data. What is AI? Perception, thinking, and control are several of its important aspects. A truly intelligent system becomes more intelligent as its experience grows, and experience is data.
Yu Kai argued in his speech that one of the essential characteristics of artificial intelligence is the ability to learn: whether a system can continuously evolve and improve as empirical data accumulates. The advent of the big data era therefore provides unprecedented opportunities for the development of artificial intelligence. Against this backdrop, deep learning's breakthrough progress on so many fronts is no accident, because from a statistical and computational point of view, deep learning may be the best method we can find for discovering complex regularities in massive amounts of data.
In addition, he believes deep learning has two significant advantages. First, it is a rich modeling language, or modeling framework, with which we can express the rich relationships and structure in data, such as the 2D spatial structure of images or the temporal structure of natural language. Second, deep learning is almost the only end-to-end machine learning system: it works directly on the raw data, automatically transforms features layer by layer, and the whole learning process directly optimizes an objective function tied to the problem, whereas traditional machine learning is often broken into several disconnected steps that are not consistently optimized toward a single goal. Therefore, the era of interconnected things and exploding data also means the arrival of the era of artificial intelligence, in which deep learning will play a key role. From now until 2020, we will see continuous breakthroughs in artificial intelligence in speech recognition, computer vision, natural language understanding, robotics, and autonomous driving.
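As a rough illustration of the end-to-end point, the NumPy sketch below trains a tiny two-layer network in which the feature transformation and the classifier are optimized jointly against a single objective (binary cross-entropy), rather than as separate pipeline stages; the toy data and network sizes are assumptions chosen purely for illustration.

```python
# End-to-end sketch: one objective drives every layer of the model.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary problem: label = 1 if the two inputs have the same sign (XOR-like).
X = rng.normal(size=(512, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(float).reshape(-1, 1)

# Layer 1 learns the intermediate features; layer 2 acts as the classifier.
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

lr = 0.5
for step in range(2000):
    # Forward pass: raw data -> learned features -> prediction.
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

    # Backward pass: gradients of the single binary cross-entropy objective.
    grad_logits = (p - y) / len(X)
    grad_W2 = h.T @ grad_logits
    grad_b2 = grad_logits.sum(axis=0)
    grad_h = grad_logits @ W2.T * (1.0 - h ** 2)
    grad_W1 = X.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

acc = ((p > 0.5).astype(float) == y).mean()
print("training accuracy after joint, end-to-end optimization:", round(acc, 3))
```

In a traditional pipeline, by contrast, hand-crafted feature extraction and a separately trained classifier would each be built in isolation, with no single objective tying the stages together, which is exactly the disconnect Yu Kai describes.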
The age of intelligence: will big data be equated with deep learning?
The organizers arranged a group media interview with several of the experts. I asked experts including Li Guojie a question: "Secretary-General Cheng just said that when the big data expert panel voted on next year's top big data technology trend, it was the combination of big data with neural computing and artificial intelligence; some members also said their first expectation for next year is that the big domestic internet companies will do more to drive big data development. Does this mean that in 2015 big data will be equated with AI, or that it points to deep learning?"
"Guojie" so many experts to equate large data with AI, but not only large data, the whole intelligent technology, is China in the future a very important direction. The past few decades is digital, next is automation, then networked, intelligent also done, but relatively not so important, but the more later, now found that the demand for intelligent more and more high, I hope it contributes to the industry more and more. So intelligentization must be a very important subject matter. But there is no end to intelligence, for example, as if the shadow of a person, like the sun over there, the old want to step on the shadow, is always a boundary. What was thought to be intelligent is now considered not intelligent.
Intelligentization also has some bearing on the "new normal" of the economy that we all care about. China's economy used to be factor-driven, and GDP growth is now slowing; the next step is to grow GDP through the innovation of the human brain. For many years we relied on muscle and sweat; in the future, artificial intelligence will rely on innovation, on the brain.
This is the beginning of a new era, so the so-called new normal is not about decline; I think it is about moving up, about more people finding their way toward intelligence. The tensions of environmental pollution and resource depletion will ease, because consuming brainpower and consuming resources are two different problems. But an economy that runs on brainpower may make social polarization very serious: incomes may differ several-fold, and the gap between the smart and the less smart will show up more and more in society. How to make the whole society fairer and more inclusive is a new subject.
"Biopo" You just that question is that big data and AI and depth learning how to equate? Is such a relationship, large data is a research object, artificial intelligence is a goal, we have to achieve the goal of artificial intelligence, to understand the data, the methodology is machine learning or intelligent computing. Deep learning is a finger inside 10 fingers in machine learning. For example, just the teacher talked about the network security, to encrypt data to do a classification, you can not use the depth of learning to solve. Therefore, the relationship between the three is not equal to the relationship between the equals, not even the tolerance of the relationship, or different levels of the argument. Like deep learning in artificial intelligence, machine learning where can be used, where can not be used, in the academic sector and industry are very clear. Less mature we will slowly study, but this does not mean that the depth of learning is all-encompassing.
Recently some public figures, such as Hawking and the CEO of Tesla, have said that artificial intelligence will lead to a revolution or become more powerful than humans. These should be treated as popular science and entertainment topics, not taken too seriously. Artificial intelligence is not a substitute for humans; it should do the things people cannot do. For example, few AI scientists have great interest in building bionic robots. Bionics and artificial intelligence are two different things, and at the engineering and technology level there are not many successful bionic examples; bionics is mostly used for popular science messaging, the aircraft being one example. The principle of an aircraft is completely different from the flying principle of birds, just as the operating principles of machine learning and the brain are completely different; those claims belong to popularization. We cannot equate AI with deep learning, as if studying the human brain were enough to achieve artificial intelligence.
"Cheng" I agree with Professor Biopo's view that the big data and artificial intelligence must not be equated, the large data itself can be used as a discipline in the future, but it is still a phenomenon. But on the other hand, we are talking about the conclusion that we call intelligent computing, that is to say, how to express the intelligence in large data calculation, or solve the problem of intelligence, which may be the concern of academia and industry for some time in the future, including machine intelligence, artificial intelligence, all kinds of intelligence.
What problem does AI solve? Does it model human thinking, human prediction, or the phenomena themselves? What is the mechanism of artificial intelligence? In short, big data can at least advance predictive decision-making within the scope of artificial intelligence.