Xiong Hui: How I Found My Life's Work in Data Mining

Keynote speaker: Xiong Hui

Moderator: Carey

Organizer: Zhongguancun Big Data Industry Alliance

Mr. Xiong Hui is a tenured professor and associate director of the Department of Management Science and Information Systems at Rutgers, the State University of New Jersey, in the United States. His main research areas include data mining, big data, business intelligence, mobile computing, and information security. He received the highest academic award at the University of Rochester in 2009. He has been invited to help organize top international conferences (such as KDD, ICDM, ICML, ICDE, and SDM) and has served as a member or chair of their organizing committees. He is currently an associate editor of IEEE Transactions on Knowledge and Data Engineering (TKDE) and Knowledge and Information Systems (KAIS), and a member of the editorial board of the China Computer Federation (CCF) newsletter and of the CCF big data committee.

The following is a full transcript of the talk:

My background is unusual: I hold a PhD in computer science, and I was the first Chinese scholar to go directly from a computer science PhD into a business school. Because the success of big data applications depends on both technical and domain capabilities, I am in a fairly unique position at the junction of computer technology and business applications, and I would like to share my personal understanding of big data from there.

In 1995 I graduated from the University of Shenzhen. It was the height of the entrepreneurship wave, and I went to work in Shenzhen. Six months into the job, I felt for the first time the anguish of not knowing where my life was going. So I read a great deal of Chinese history, philosophy, and military treatises (bingshu), and began to rethink my direction in life. I decided then that my future career must meet three requirements:

1) it is an occupation I am genuinely interested in; 2) personal value increases with age; 3) whatever the rise and fall of the national economy, the occupation offers very high job security.

It was just then that I was fortunate enough to come across the concept of data mining. I immediately felt an affinity for it that I could not put into words: data mining should be my lifelong career. It is a career that meets all three requirements.

First of all, I am interested in history, military treatises, and fortune-telling. Data mining is about predicting the future from history, seeking certainty in uncertainty, and looking for rules in complex and changeable things. These have long been my own interests.

Second, in my personal experience, the more data mining practice people do, the smarter they become and the more accurate their judgment. With more experience, personal value naturally increases with age. For example, my consulting fee is much higher than it was in 2005, because I am more experienced and more efficient. (Editor's note: Teacher Xiong majored in automation.)

Finally, because the development of human society as a whole relies more and more on information technology, the demand for data mining professionals will only grow. At the same time, the profession makes comprehensive demands on its people: data mining practitioners need both strong domain knowledge and strong technical knowledge. The disciplinary divisions of education in China objectively make it difficult to train cross-disciplinary talent. In other words, the threshold for doing well in this field is high. Excellent data mining professionals therefore enjoy very high job security. In fact, there were many data mining experts in ancient times, such as Bowen, Chaoleven and so on.

The concept of data mining was introduced to solve the problem of analyzing massive data. So why introduce the new concept of big data? The root cause is the explosive growth of data-driven applications and the many new requirements they bring. Many applications today, such as high-frequency trading and online social media, produce data that is not only large in volume but also constantly changing. By analogy, we used to fish in a clear stream, and now we are fishing in the fast, muddy Yangtze River. Solving these new big data applications requires the ability to observe, analyze, and solve problems in a timely manner.

Data mining as an occupation is very much like medicine. Data with different characteristics from different fields are like patients with different conditions coming to a doctor. Like doctors, data mining people must grow through practice. In fact, my research has always been closely tied to practice, and many of my research problems came out of practice. Here are some lessons from that experience:

1) The hardest part is problem extraction, in other words, how to find a valuable problem in a practical application scenario. This requires domain knowledge strong enough to penetrate the business process and find the business pain point, plus the ability to abstract that pain point into a mathematical problem. For example, we did a business-to-business (B2B) market analysis for a Fortune 500 US company, with the aim of discovering valuable new customers, getting more out of existing customers, and shortening the time needed to sign a contract. These are real business needs, but how do they turn into data mining problems? With today's new, fine-grained means of data collection, we have the opportunity to observe people and the entire business process in detail and to decompose it. We can then find the inefficient links and make them efficient, so that the whole business process becomes more efficient.

2) For most data mining projects, the part that consumes the most time and energy is feature engineering. The success or failure of many projects depends not on the model but on extracting the right features. In many cases, feature extraction requires a full understanding of the domain. As with a doctor, a great deal of experience goes into this. In my projects I usually spend a large percentage of the time on understanding the data and preprocessing it.

3) Instance selection is important. In many cases, more data is not better; more appropriate data is better. Much data is not noise in itself but is not very relevant to the problem we want to analyze, in which case we need to prune the data sensibly. In fact, when it comes to big data, there are many misunderstandings about the word "big". "Big" is a relative concept: in many applications the issue is that the amount of data to be processed per unit of time exceeds the existing computing capacity. For example, in high-frequency trading, we may need to process 1M of data in a millisecond. 1M is not big, but the processing has to happen in real time.

4) Before modeling, you also need to understand the characteristics of the data, such as noise, sparsity, and distribution. It is like a doctor, who must understand the patient's illness before choosing the treatment.
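The fourth point, looking at noise, sparsity, and distribution before choosing a model, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's own tooling: the function name `profile_column` is hypothetical, missing values are assumed to be encoded as `None`, sparsity is taken to mean the share of missing or zero entries, and the share of values outside Tukey's 1.5 × IQR fences is used as a rough stand-in for noise.

```python
import statistics


def profile_column(values):
    """Rough pre-modeling profile of one numeric column (None = missing).

    Hypothetical helper for illustration; the definitions of sparsity and
    noise used here are assumptions, not taken from the talk.
    """
    present = [v for v in values if v is not None]
    if len(present) < 4:
        raise ValueError("need at least 4 observed values to profile a column")

    # Sparsity: share of entries that are missing or exactly zero.
    empty = sum(1 for v in values if v is None or v == 0)
    sparsity = empty / len(values)

    # Distribution: centre and spread; mean far from median hints at skew.
    mean = statistics.fmean(present)
    median = statistics.median(present)
    stdev = statistics.stdev(present)

    # Noise proxy: observed values outside Tukey's 1.5 * IQR fences.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    return {
        "n": len(values),
        "sparsity": sparsity,
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "outlier_share": len(outliers) / len(present),
        "outliers": outliers,
    }


if __name__ == "__main__":
    column = [0, 0, None, 1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 100.0]
    for key, value in profile_column(column).items():
        print(f"{key}: {value}")
```

A column with one extreme reading, like the one in the usage example, shows a mean well above its median and a non-empty outlier list. That is a cue to look at the data source before modeling, much as a doctor examines the patient before prescribing.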

Finally, I want to say that life itself is a data mining process. A person's whole life consists of several key points, and at each one we need to make decisions based on the information we have.

Interactive content:

Q1: Does separating the true from the false in data (quweicunzhen) account for a large share of the work in data mining? Is experience a determinant of analytical ability?

A: That's a good question. Every industry is now desperately collecting data, much of it garbage, so the value density per unit of data is declining. Data mining is like a doctor examining a patient: SARS patients and common-cold patients share many symptoms, and our task is to identify the features that help us tell them apart.

On the future, I propose a concept called information class theory. As big data awareness and data mining technology spread, information asymmetry in human society will grow. A large part of future class division (the gap between rich and poor) will stem from differences in the mastery of information resources. In many fields, mastery of information resources can create a situation in which a higher dimension strikes a lower one. It is like the gap in weapons in the Second World War: when the brave Polish cavalry met the German tank formations, it was simply a slaughter.

Q2: How can we tell which data are genuine and which are spurious? by Liu Donghua

A: It depends on the specific application. For example, for a particular market forecasting problem you may have a lot of data, from different sources and of different quality. My experience is that when the amount of data is sufficient, I would rather discard the low-quality data.
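The rule of thumb in this answer can be sketched as follows. This is a minimal illustration, not a method described by the speaker: the function name `select_sources`, the per-source `quality` score, and the `rows_needed` threshold are all hypothetical, and in practice someone who knows the domain would have to decide how quality is scored and how much data counts as "sufficient".

```python
def select_sources(sources, rows_needed):
    """Keep the highest-quality sources until enough rows are collected.

    `sources` is a list of (name, quality, row_count) tuples. `quality` is a
    hypothetical score in [0, 1]; how it is assigned is left to the analyst.
    """
    kept, total = [], 0
    ranked = sorted(sources, key=lambda s: s[1], reverse=True)
    for name, _quality, rows in ranked:
        if total >= rows_needed:
            # Enough data already: the remaining, lower-quality sources
            # are discarded rather than mixed in.
            break
        kept.append(name)
        total += rows
    return kept, total


if __name__ == "__main__":
    # Made-up sources, for illustration only.
    sources = [
        ("survey", 0.4, 5000),
        ("crm", 0.9, 8000),
        ("scraped", 0.2, 50000),
        ("partner", 0.7, 4000),
    ]
    print(select_sources(sources, rows_needed=10000))
```

When the threshold cannot be met from the better sources alone, the sketch falls back to keeping everything, which matches the condition in the answer: low-quality data is only discarded once the volume is sufficient.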

Q3: In China, traditional manufacturing enterprises account for a large share of the economy and are an important economic pillar. Will data mining bring great value to traditional manufacturing? What advice does Teacher Xiong have? by Shanyankai

A: The entire history of human development is a pursuit of standardization, and standardization improves production efficiency. Supply chain management in traditional manufacturing is an area where data mining can help greatly. The problem is that, as humanity pursues standardization, any standardized industry becomes a sunset industry, and job opportunities will decrease.

@ Shanyankai: I agree with Teacher Xiong's point of view. What Chinese manufacturing lacks most is standards!

@ pomelo: And the field of economic forecasting?

A: Economic forecasting is a good example. On the surface, a lot of data can be used to predict the economy, but for China, perhaps only a small number of indicators such as electricity generation, steel production, and rail transport are reliable data, so reading the economy through these strong indicators is more credible.

The development of big data also shifts the focus of how humans approach problem solving. The East has long favored induction, a bottom-up way of solving problems, while the West has long favored deduction, a top-down way. In modern times deductive reasoning prevailed, so Western civilization became highly developed. But the data-driven approach to problem solving gives inductive thinking a chance to move ahead.

Q4: Which organizations in the country are relatively strong in data mining? by Carey

A: Data mining is a dynamic field. There are still more opportunities to get rich in China than in the United States, but as institutions become more complete, those opportunities will become fewer. A lot of wealth arises from information asymmetry. At the present stage China has a great deal of institutional information asymmetry: every time a rule is made or changed, it spawns a batch of ideas and opportunities for getting rich. In the United States the asymmetry is more technical, that is, it is created by means of data mining. This produces a potential difference in competition, so that one side can bring an absolute advantage to bear against the other's absolute disadvantage. Of course, China is also making great strides in technical information asymmetry.

Because of urban-rural differences in information platforms and information awareness, rural children are at a big disadvantage in information technology, and their disadvantage in competition keeps growing. Worst of all, information awareness in rural areas has not kept up at all. Future class division will therefore objectively be caused by differences in the mastery of information resources. At the educational level, efforts should be made to narrow the information gap between urban and rural areas.

The pace of social development is accelerating, and so is the turnover of wealth: the shifts from rich to poor, from poor to rich, from weak to strong, and from strong to weak all happen faster than before. It will be a spiral in which renewal keeps accelerating, and those who cannot hold on will be thrown off. To stay rich for three generations, a family must cultivate a good family information system.

Comment: This is an era of reset in enterprise informatization. The ability to mine operational data will open up gaps between enterprises, and yesterday's eldest brother may become an only child! by Shanyankai

Q5: What patterns are there in the United States for using big data in investment analysis? If it were you, which areas of data mining would you focus on? by Rain Drunk Heaven

A: Because the future is the era of wireless networks, I am optimistic about wireless applications such as indoor navigation, connected vehicles, and smart homes.

Q6: Which companies or organizations are stronger in the field of data mining? In China? by Carey

A: Judging by the pattern in the United States, companies that have both data and economic strength breed large numbers of data mining professionals. In the United States, FLAGM, consulting companies, and financial companies have a large pool of data mining talent. But many of America's new startups are also data-driven.

Q7: Any recommendations for the Zhongguancun Big Data Industry Alliance? by Carey

A: The Alliance should build four libraries: a think tank, a case library, a problem library, and a database. If it also forms a mechanism for sharing both resources and benefits, it can remain stable over the long term.

Q8: Who can do forecasting work well in China's food safety field, and how? by Lin Chunyu

A: Rely on the power of the masses. In food safety, as mobile phones become smarter, we will be able to rely on the public to collect data on water quality and food quality. So the key is to produce the corresponding sensors and simple test methods, and to establish a means of collection by the public.

Compiled by: @ grass

Thank you to Teacher Xiong for the wonderful talk. All friends are welcome to join the discussion! When reprinting, please indicate the source: Big Data (Bigdatainsight)

(Responsible editor: The good of the Legacy)
