The 2014 Zhongguancun Big Data Day was held in Zhongguancun on December 11, 2014. Under the theme "Aggregate Data Assets, Promote Industrial Innovation", the conference explored key issues including data asset management and transformation, in-depth big data technologies, innovation in industry data applications, and ecosystem building. It also examined the needs and practices of government departments, the financial sector, telecom operators, and others in pursuing transformation and industry innovation through the management and operation of data assets.
At the afternoon Government @ Big Data Forum, Dong Xin, co-founder of Anxin, delivered a keynote speech titled "Data-Driven Security". He focused on how big data can be put into practice, especially in government and enterprises, and argued that security, and information security in particular, will be an excellent landing point for big data.
Dong Xin: Thank you, and thanks to Asiainfo for giving us this opportunity to discuss big data. Today's topic is data assets; put more bluntly, what is the value of big data? There is a popular joke in our circle: big data is like sex. Everyone loves talking about it and gets very excited, but few people know what it really means. There is so much data and so much noise, and there are so many gold mines in it. What value can it give the government, and what added value can it produce? I believe this is a question that scholars, the government, and enterprises are all exploring. Today let's talk about how big data can be put into practice, especially in government and business. We believe that security, and information security in particular, will be a great place for big data to land.
Why do I say that? I'd bet that nobody here is safe. I don't mean personal safety or property security, but the security of your privacy and your digital assets in cyberspace. Why? Over the last few years there have been a large number of data breaches on the Internet. A few examples: in 2013, Target, one of the largest U.S. retailers, suffered a breach involving information on 70 million users, more than 10 million of which was leaked. What is interesting is that before the breach, an information security company had warned Target that its data was unsafe and likely to leak, but Target thought everything was fine. When the breach happened, its chief executive was forced out. Similarly, at JPMorgan Chase, information on 76 million individual customers was leaked. In 2014, Sony Pictures had 11 TB of data leaked. Sony's case is different from the others, which mainly exposed consumers' household and personal information. In the industry, the Sony incident looks like a mistake only a primary-school student would make: even the most basic security measures, processes, and product deployments were very rudimentary. Besides many of its films, a great deal of other information leaked, including social security numbers. This is absurd.
Why is big data associated with security? If someone had said years ago that the future of society's information security would be big data, it would have seemed inconceivable. This picture shows Beijing 100 years ago, when the city had nine gates in the inner city and seven in the outer city. People then probably thought: I have very good gates, guards on both sides of each gate, and a moat, so I am safe enough. Everyone has to pass through a gate, and the guards check everyone against the wanted notices. The core of this model is the city wall and its gates, plus the wanted notices. The information age works exactly the same way; logically there is no difference. For the past 30 years, whether for stand-alone machines or networks, information security has centered on defense. What does that mean? Gatekeeping devices artificially draw a network boundary, and inside that boundary is my intranet, which I want to keep secure. But with the arrival of the cloud era, this boundary is increasingly blurred. We have office software and office information on our phones, and we can even bring our own computers into the office environment. Our data no longer lives only in the data center, so the boundary is very blurred.
How does traditional security work? Put simply, like antivirus: when a virus appears, security vendors analyze its code and produce a signature. But in January this year, Obama formally signed a bill under which U.S. federal and local governments can no longer rely solely on signature-based security systems. These devices are all rule-based. For example, I define spam as any message whose content contains "sales invoice". The rule is very simple, but after a few days the spammers change how they write "invoice", so you have to keep chasing the rule changes. There is no adaptability.
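To make the spam example concrete, here is a minimal Python sketch of a keyword rule of the kind described above. The keyword set, the function name `is_spam`, and the obfuscated variant are illustrative assumptions, not any vendor's actual rule engine; the point is only that a fixed rule matches exactly what it was written for and misses a trivially altered message.

```python
# Hypothetical rule set: a message is spam if it contains any listed phrase.
SPAM_KEYWORDS = {"sales invoice"}


def is_spam(message: str, keywords: set[str] = SPAM_KEYWORDS) -> bool:
    """Return True if the message contains any keyword (case-insensitive)."""
    text = message.lower()
    return any(keyword in text for keyword in keywords)


original = "Discount SALES INVOICE available, contact us"
variant = "Discount s-a-l-e-s in.voice available, contact us"

print(is_spam(original))  # True: the rule was written for this wording
print(is_spam(variant))   # False: a trivial rewrite slips past the rule

# The only fix is to update the rules by hand after the fact.
updated = SPAM_KEYWORDS | {"s-a-l-e-s in.voice"}
print(is_spam(variant, updated))  # True, until the wording changes again
```

Every new variant requires another manual rule update, which is exactly the lack of adaptability the speaker describes.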
Over the past 30 years, the core of information security has been signatures and rules, and both were declared dead in 2012 and 2013, because they can only solve problems that have already occurred. They can only defend against the known and have no way of anticipating the unknown.
Similarly, look at the chart on the right side of the slide, in small print: many companies have installed antivirus software, firewalls, intrusion prevention, intrusion detection, and so on, often the most expensive products, yet they are still attacked and still suffer data leaks. Why? Because the overall security architecture and application delivery model have changed. Pure defense is like this room: it has a door and a lock, but a door and a lock obviously cannot stop every thief. Back in the real world, Beijing no longer has walls, gates, or wanted posters on the street, yet you are safer and more comfortable walking through the city than ever. Why? One interesting reason is that we have deployed a huge number of cameras throughout the city. We can observe every vehicle every day, and when a case occurs, we can trace it through the cameras, analyze the footage, and gather evidence to catch the criminals. The same applies in IT. In the past, vendors sold locks and firewalls, that is, perimeter products. In the future, walls alone will not be enough; you also need cameras. All the IT products in an enterprise or a government agency are used for this analysis, and the data produced by these so-called cameras will be astronomical in volume. Since 2013, in discussions at the U.S. Congress, it has been acknowledged that there are many possible solutions to future security problems, but the most widely recognized one is turning information security into a big data problem. This approach is recognized worldwide. Based on this idea, we are moving in the same direction, toward active and even intelligent security.
This chart, which can be downloaded directly from IDC's website, breaks down the global information security market since the 1970s. From the 1970s to 2000, security went through one change: around 1998, stand-alone antivirus gave way to network antivirus, because viruses no longer spread via floppy disks but over the network. The second change is happening now. IDC predicts that the security market will grow significantly over the next few years, and not because of firewalls and similar products that everyone already understands well. First, cloud security: our information is all moving to the cloud, so how do we secure the cloud? Second, Internet of Things security: devices are no longer just PCs and mobile phones; in the future, air conditioners and refrigerators will generate large amounts of data, so how do we secure them? These two areas have a common denominator, security intelligence; in other words, using big data for security intelligence analysis. Put simply, by applying big data analytics, information security can achieve at least a tenfold, or even a hundredfold, increase in performance and revenue. That is IDC's view of information security.
Let's also look at another research firm, Gartner, on dealing with online fraud and internal information leaks. Gartner believes traditional firewalls are ineffective against online fraud and leaks. Currently only 6% of enterprises have deployed such analytics systems, but Gartner expects that within five years at least a quarter of enterprises will have them, and that these companies will be able to deploy them within six months. IT interaction has changed enormously: the PC is no longer the only terminal, and a variety of hosted applications has transformed how we interact. The traditional security architecture, based on rules, signatures, and single machines, is dead. The future belongs to big data and distributed architectures; this is the trend for future security. Gartner also discussed how companies will allocate their security budgets. At present, 90% of security spending goes to defense, that is, buying all kinds of firewalls and gatekeeping devices, and only 10% goes to detection and analysis. By 2020, budgets worldwide will shift substantially: 60% will go to detection and analysis, meaning money spent on forensics and on how to respond to and resolve problems when they arise, and only a small share will go to traditional defense-centric security.
In other words, in the future, enterprises, Internet companies, cloud providers, and governments will spend much of their money not on high-grade security doors and locks, but on data and IT systems: on the cameras, and on the massive storage and analysis behind them. All of this is big data. Here is a concrete example. One of our clients is the headquarters of a commercial bank. It has purchased the world's most expensive security equipment and analytics software, but it still has a headache: because of the architecture and logic of its hardware and software, it can only analyze what happened in the last 10 hours. Even with so many of the most expensive security devices, it still finds many attacks getting through, some of which have already had an impact. So the bank is very keen to have a big data security analytics system that can cover all of its massive data, extending the window from the past 10 hours to the past three days or even three months, with real-time search, reports generated on demand, and real-time detection of anomalies in all incoming traffic. We are helping this bank do exactly that, and the bank is very satisfied with the results.
From our perspective, enterprises, governments, and cloud providers generate a great deal of data every day. How do we analyze it? We are not analyzing advertising behavior or consumption patterns, nor supporting business decisions; we make decisions from a security perspective. We collect, store, and analyze vast amounts of security and IT information. In short, we store all machine-generated data. One goal is to help enterprises achieve security visibility and security intelligence. Visibility means I can see whatever I want, whenever I want: for example, what a particular person did today, exactly when, on which computer, on which site, what was infected, and what the infection did. As for security intelligence, all security vendors used to rely on rules. Our approach is different: we rely mainly on machine learning, artificial intelligence, and, currently most popular, deep learning. Baidu, for example, is spending heavily to have scientists do deep learning research, and we are also working in this direction. By recording all the data, we can capture the footprint of every user, every access, and every application system, and then model it. This work is still in progress. Our company's chief scientist has spent more than 10 years in core security research and holds many patents on security models in North America. We hope to link machine learning, artificial intelligence, and deep learning to security, using models to determine what is safe. For example, suppose my account is compromised: my client password and some forum passwords are stolen. Then someone uses a legitimate identity to do legitimate things, but to our brain this is abnormal, because that person is not me.
Here is a simple example; the real thing is of course much more complicated. We model all of a person's behavior: what their consumption behavior looks like, how they click the mouse, and so on. This data can be quantified, so when their behavior differs one day, we will notice that there may be a problem. Why? Because the behavior does not fit our data model. At that point the system alerts the administrator, who checks in practice whether this person is legitimate, for example by telephone. This is a relatively simple example of applying mathematical models to information security. In an enterprise it becomes very complicated: there may be thousands of dimensions and a very large amount of data. We do not only store it; the key is analysis, including correlation analysis.
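The behavioral modeling described above can be illustrated with a deliberately simple baseline. The Python sketch below learns a per-feature mean and standard deviation from a user's past activity and flags a new day whose largest z-score exceeds a threshold. The feature names, the threshold of 3, and the sample data are hypothetical; the speaker's system uses machine learning over thousands of dimensions, and a z-score baseline is swapped in here only to show the idea of comparing new behavior against a learned model.

```python
from statistics import mean, stdev

Observation = dict[str, float]


def build_baseline(history: list[Observation]) -> dict[str, tuple[float, float]]:
    """Learn (mean, sample standard deviation) for each behavioral feature."""
    baseline = {}
    for feature in history[0]:
        values = [day[feature] for day in history]
        baseline[feature] = (mean(values), stdev(values))
    return baseline


def anomaly_score(baseline: dict[str, tuple[float, float]], today: Observation) -> float:
    """Largest absolute z-score across features; higher means more unusual."""
    score = 0.0
    for feature, (mu, sigma) in baseline.items():
        if sigma == 0:
            # A feature that never varied: any change at all is maximally unusual.
            z = 0.0 if today[feature] == mu else float("inf")
        else:
            z = abs(today[feature] - mu) / sigma
        score = max(score, z)
    return score


def needs_review(baseline, today: Observation, threshold: float = 3.0) -> bool:
    """Flag the day for an administrator to verify, e.g. by telephone."""
    return anomaly_score(baseline, today) > threshold


# Hypothetical daily features for one user: logins, MB downloaded, distinct hosts used.
history = [
    {"logins": 4, "mb_downloaded": 120, "distinct_hosts": 2},
    {"logins": 5, "mb_downloaded": 100, "distinct_hosts": 2},
    {"logins": 3, "mb_downloaded": 140, "distinct_hosts": 3},
    {"logins": 4, "mb_downloaded": 110, "distinct_hosts": 2},
    {"logins": 5, "mb_downloaded": 130, "distinct_hosts": 3},
]
baseline = build_baseline(history)

normal_day = {"logins": 4, "mb_downloaded": 125, "distinct_hosts": 2}
odd_day = {"logins": 4, "mb_downloaded": 2000, "distinct_hosts": 15}
print(needs_review(baseline, normal_day))  # False
print(needs_review(baseline, odd_day))     # True
```

Note that the account in the speaker's example uses valid credentials throughout; only the deviation from the learned pattern gives it away.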
This is something others cannot do. The traditional approach is a funnel: all my data comes in, I discard the useless data and filter some more out, and in the end perhaps only a small part considered valuable is left for analysis. Moreover, that analysis is statistical rather than based on machine learning. What we do is the opposite: the more raw and cluttered your data, and the longer its history, the better. We save all of it to our big data platform and index it for real-time retrieval without altering the data itself. Whenever you want to find something relevant, or chain related events together, you can find anything related to security. This is what we call full-volume, full-dimension analysis.
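The "keep everything raw, index it, search it later" approach can be sketched as a tiny inverted index in Python. Every raw log line is stored verbatim, and each token points back to the lines that contain it, so any combination of terms can be retrieved after the fact. The class name `RawLogIndex`, the tokenization rule, and the sample log format are assumptions for illustration; a production platform would distribute the index across many machines.

```python
import re
from collections import defaultdict


class RawLogIndex:
    """Store raw log lines untouched and index every token for later search."""

    def __init__(self) -> None:
        self.lines: list[str] = []  # raw data, never filtered or discarded
        self.postings: defaultdict[str, set[int]] = defaultdict(set)  # token -> line ids

    def add(self, line: str) -> None:
        line_id = len(self.lines)
        self.lines.append(line)
        # Tokens keep dots so IP addresses and file names stay whole.
        for token in re.findall(r"[\w.]+", line.lower()):
            self.postings[token].add(line_id)

    def search(self, *terms: str) -> list[str]:
        """Return the raw lines that contain all of the given terms."""
        ids = None
        for term in terms:
            matches = self.postings.get(term.lower(), set())
            ids = set(matches) if ids is None else ids & matches
        return [self.lines[i] for i in sorted(ids or ())]


logs = [
    "2014-12-11 09:15:02 login user=alice host=10.0.0.5 status=ok",
    "2014-12-11 09:17:40 login user=alice host=203.0.113.7 status=failed",
    "2014-12-11 09:18:03 download user=bob host=10.0.0.8 file=design.dwg",
]
index = RawLogIndex()
for line in logs:
    index.add(line)

print(index.search("alice", "failed"))  # only the failed login line
print(index.search("design.dwg"))       # only the download line
```

Because nothing is thrown away at ingestion time, a question nobody thought of in advance can still be answered later by a new combination of search terms.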
In doing this, we feel we are relatively at the forefront, though we see a few new companies working on it as well. This stage is very much like the early Internet in the late 1990s. Indexing every web page to build search was possible in theory but seemed unlikely in practice, yet Yahoo and Google achieved search across the entire web. The same is happening in security: I collect all of an enterprise's logs, workflows, events, and even network traffic. This is both good news and bad news for us. The good news is that everyone recognizes it can be done sooner or later, and we have started a business in this field.
In short, the biggest feature of our product, built on big data analytics models, is that it can find unknown threats that rule-based approaches cannot, through data analysis and correlation. A very hot term in the past two years is APT (advanced persistent threat), especially as it relates to escalating cyber tensions between China and the United States and to national security. An APT is a high-level attack with no prior reference, so it cannot be caught with a simple signature; it is a long-term, targeted attack, and big data analysis can match it effectively. Attackers leave traces. They cannot steal your data on the first day; they spend a long time studying your architecture, your core drawings and source code, and your core business interests, and that time leaves footprints. Second, we use big data to do forward-looking work for enterprise security: just as Google lets you search documents and other content, we provide forensic search and search for critical events. Third, we help the enterprise see everything in one place. Previously, different devices and applications were isolated; what we have done is bring their data back together and present, in a customizable dashboard, how many attacks occurred, whether there have been recent anomalies or any major threats, and what our models found after analysis, all through a single console.
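The point about APT attackers leaving footprints over a long period can be illustrated by correlating individually minor events per host across a long time window. In the Python sketch below, each event type is mapped to an attack stage, and a host is flagged only when its events within the window span several stages, even though no single event would trip a rule. The stage mapping, the 90-day window, and the three-stage threshold are hypothetical choices, not the speaker's model.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical mapping of low-severity event types to attack stages.
STAGE_OF = {
    "port_scan": "reconnaissance",
    "failed_login": "credential_access",
    "archive_created": "staging",
    "large_upload": "exfiltration",
}


def correlate(events, window=timedelta(days=90), min_stages=3):
    """Return hosts whose events inside `window` span at least `min_stages` stages.

    `events` is a list of (timestamp, host, event_type) tuples; the window ends
    at the most recent timestamp in the data.
    """
    latest = max(ts for ts, _, _ in events)
    stages_by_host = defaultdict(set)
    for ts, host, event_type in events:
        if latest - ts <= window and event_type in STAGE_OF:
            stages_by_host[host].add(STAGE_OF[event_type])
    return {host for host, stages in stages_by_host.items() if len(stages) >= min_stages}


events = [
    (datetime(2014, 9, 10), "db-01", "port_scan"),
    (datetime(2014, 10, 15), "db-01", "failed_login"),
    (datetime(2014, 12, 1), "db-01", "archive_created"),
    (datetime(2014, 12, 2), "web-03", "failed_login"),
]
print(correlate(events))                             # {'db-01'}
print(correlate(events, window=timedelta(days=30)))  # set(): the short window misses it
```

The second call shows why retention matters: with only a short window of data, the same slow campaign is invisible.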
Let me add one more point. 90% of our people work in research and development, and many of them build mathematical models and algorithms. You might think these algorithms are somewhat magical, but they are not. Many of them appeared 30 years ago; we did not invent them, and they could have been used back then. In recent years, especially with the Internet economy and search, many algorithms have found applications. We apply a series of methods, mainly clustering and classification, to grade, group, and classify security events, then find the differences and create new models.
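Clustering security events, as mentioned above, can be shown with textbook k-means. The Python sketch below groups events described by two hypothetical features (failed logins and megabytes sent out) into clusters that an analyst could then grade by severity. The features, the data, and the choice of k-means with the first k points as initial centroids are assumptions made to keep the example deterministic; the speaker does not say which algorithms the company uses.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # deterministic start: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(axis) / len(members) for axis in zip(*members)]
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical events: (failed logins, MB sent out of the network).
events = [(1, 2), (0, 3), (2, 1), (40, 500), (35, 620), (1, 1)]
centroids, labels = kmeans(events, k=2)

# Grade clusters: the centroid farther from the origin is treated as more severe.
severity_order = sorted(range(2), key=lambda i: math.hypot(*centroids[i]))
print(labels)             # [0, 0, 0, 1, 1, 0]
print(severity_order[-1])  # 1, the high-volume cluster
```

In practice the features would be scaled first, since otherwise the megabytes column dominates the distance.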
Traditional approaches can do this too. Take the customers we serve: it is not that they could not do it, but the cost was too high. To produce a report, they had to wait three hours for results. Now we help them find what they are looking for, and finding a previously undetected attack takes less than 24 hours. This is a huge breakthrough, and these enterprises depend heavily on the underlying big data technology. When we talk about big data, why do we consider this data valuable? Because we have the technical ability to analyze it and to apply big data technology to security systems. The core members of our company have been doing big data for the past few years; we were a technical partner in China and received the first Hadoop certification and authorization in China. Now we are more specialized: we focus big data on security, combining big data with algorithms to find things that could not be found before. We already have a lot in place. Part of our product is now completely free: businesses and governments that want search and visualization can download the tool we have released online, free for up to 100 TB of logs, covering at least collection and retrieval, all free of charge. It is built on a series of open-source frameworks, to which we have made many changes, including Chinese-language support and a range of other enhancements. Going forward, we will keep the underlying big data processing architecture free, but at the algorithm and application level we will build our core competitiveness. We officially released this free product last week and will open-source it early next year. We hope more enterprises will pay attention to information security and use free, and even open-source, products to manage the security incidents they already have.
Back to the client I mentioned at the beginning: the bank bought a variety of security analysis software, but its biggest frustration is that these are previous-generation products with very poor support for data and information. The bank generates about a terabyte of logs every day, and its traditional database had been overwhelmed. It also found that despite deploying such expensive products, with a 10-person operations team writing rules every day and 30 outsourcing companies helping with operations, it still could not see into its other business systems; it did not know what was happening, because it simply could not see. So the bank wanted a solution that would, first, store everything and, second, help find things. Our product officially went live at this bank and achieved very good results. Their previous system peaked at 3,000 events, which was already very high and at its limit. After we moved it to big data, we easily reached 50,000, which is trivial for a big data platform. We can display all of the data and logs, and within a single day the model found a series of problems they had never found before, including some unusual accesses.
The working principle is simple, much as I described: starting from a large amount of data, we model the enterprise's big data from a security perspective. That is our strength. From then on, all new data that comes in is first checked against the model. That is our approach in simple terms, and we have made some progress in this area. Combining big data with security is, I believe, a relatively new field, but we think it fits the idea of big data especially well, which is why we believe this is the right direction; we are also seeing very good implementations around the world. So we formally established the company, quickly built the team, and at founding received investment from a top-tier Silicon Valley fund. We now have actual customers, including banks and telecom operators. Next, we hope to find more and more partners to build this ecosystem together and combine information security with big data. We provide the core algorithms for big data security analysis. That, simply put, is our idea.
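The principle that every new piece of data is first checked against the model can be sketched as a streaming detector. The Python example below keeps a running mean and variance with Welford's online algorithm, scores each incoming value against the model learned so far, and only then folds the value into the model. The single metric, the warm-up length, the threshold, and the sample hourly login counts are hypothetical; this is one simple way to realize the idea, not the company's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class OnlineBaseline:
    """Running mean and variance of one metric, updated with Welford's algorithm."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        """Sample standard deviation; 0.0 until at least two values are seen."""
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0

    def score(self, x: float) -> float:
        """Z-score of x against the model learned so far (0.0 if no spread yet)."""
        s = self.std()
        return abs(x - self.mean) / s if s > 0 else 0.0


def process_stream(values, threshold=3.0, warmup=5):
    """Check each new value against the model first, then fold it into the model."""
    model = OnlineBaseline()
    alerts = []
    for i, x in enumerate(values):
        if model.count >= warmup and model.score(x) > threshold:
            alerts.append(i)
        model.update(x)
    return alerts


# Hypothetical hourly login counts for one system; index 7 is a sudden spike.
hourly_logins = [100, 104, 98, 101, 99, 102, 100, 450, 101]
print(process_stream(hourly_logins))  # [7]
```

Folding flagged values back into the model, as done here, lets the baseline adapt to genuine change but also lets a patient attacker shift it slowly; excluding alerted values from updates is the opposite trade-off.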
When I spoke with a journalist this afternoon, he also mentioned big data security analytics, which seemed out of reach to them. But this year it has truly been put into practice. Not only companies like ours, but also some giant Internet companies, including the largest search engine, are doing this kind of analysis. They are also trying to work out how much of their data is relevant to security, and how to turn their data-mining capabilities for individuals into services for enterprises. I believe we need to walk this road together. Whether you are a potential partner or a customer, if you are interested, let's discuss how artificial intelligence, data modeling, and big data can be combined to help enterprises do things that firewalls and intrusion prevention simply cannot.
Thank you.
(Responsible editor: Mengyishan)