With the advent of cloud computing, "big data" has become one of the most widely discussed keywords in the industry, and many companies are already looking for the right BI tools to handle the large volumes of data they collect from different sources. Yet despite the increased awareness, only a few companies, such as Google and Facebook, can genuinely use big data to unlock business value.
In fact, with the arrival of the big data era, an enterprise's understanding of big data should not stop at basic technologies such as Apache Hadoop; enterprises should understand and protect the data they own from an infrastructure perspective. Over the next three to five years, we will see the gap widen between enterprises that truly understand big data and can mine it for value and those that cannot. The enterprises that really know how to use big data will hold a strong competitive advantage and become the big players in their industries.
Many companies have already started paying attention to big data: vendors have begun rolling out big data products, and related conferences keep coming, which shows that the awareness campaign has succeeded. But awareness is only the ideological groundwork. When we look for businesses that can genuinely extract value from big data, very few exist, so the mining of big data value is still at an early stage.
The companies best positioned to earn the first bucket of gold from big data are those like Facebook and Google, which have an innate advantage in data management and data mining; there is good reason to believe they will lead the big data age. Beyond them, any other company that wants to stay at the front in the big data era must be a leader in its own industry, because such companies have both the ambition and the early positioning to set industry standards.
The Role of Big Data
What role does big data play in the IT field? Consider two examples. If a pharmaceutical company wanted to break into the top 100 of its industry, it would have to crawl data from millions of related web pages, analyze it, strip out the useless information, and finally isolate what is valuable. A carmaker, by contrast, needs real-time information sent back from cars driving on the road.
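Sketched very roughly, the crawl-and-filter step in the pharmaceutical example might look like the following. The seed URLs, keywords, and relevance rule are hypothetical placeholders; a real crawler would add politeness delays, robots.txt handling, and proper HTML parsing.

```python
import re
import urllib.request

# Hypothetical seed pages for the pharmaceutical example.
SEED_URLS = [
    "https://example.com/pharma/news1",
    "https://example.com/pharma/news2",
]

# Crude relevance rule: keep pages that mention drug-pipeline terms.
KEYWORDS = re.compile(r"clinical trial|FDA approval|patent", re.IGNORECASE)

def fetch_text(url: str) -> str:
    """Download one page and strip HTML tags (very rough cleaning)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="ignore")
    return re.sub(r"<[^>]+>", " ", html)  # erase markup, keep the text

valuable = []
for url in SEED_URLS:
    text = fetch_text(url)
    if KEYWORDS.search(text):       # discard useless pages, keep the rest
        valuable.append((url, text[:200]))

for url, snippet in valuable:
    print(url, "->", snippet)
```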
Although companies have become aware of big data, they do not know how to tap business value from it. Big data is like a huge fishing net cast deep into the sea: it brings up tuna, great white sharks, and other fine catch, but also shrimp, shells, and other cheap bycatch. Enterprises haul in everything at once, and mining value out of such a mass of data has become a headache.
Semantic Data Models in Big Data
A large share of enterprise data is unstructured: voice, video, pictures, documents, forum posts, web pages, and so on. How can you manipulate such data conveniently? Creating a semantic data layer is a good approach: build a semantic model layer above the database to extract the usable data and help you understand all the information underneath.
After collecting data from different sources, an enterprise puts it together and then begins to analyze and process it. The traditional approach is to build a data warehouse: extract the collected data into the warehouse and generate reports from it. But this is a time-consuming process, and it is inflexible; every time you want a change, you have to go back into the data warehouse to make it, which is quite a headache.
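For contrast, the traditional extract-load-report cycle described above can be sketched as below, assuming a hypothetical sales.csv source file. Note how the report logic is tied to the warehouse schema, which is why every change forces a trip back into the warehouse.

```python
import csv
import sqlite3

# Extract: read a hypothetical source file of sales records.
rows = list(csv.DictReader(open("sales.csv")))  # columns: region, amount

# Load: push the records into a fixed warehouse schema.
db = sqlite3.connect("warehouse.db")
db.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(r["region"], float(r["amount"])) for r in rows],
)
db.commit()

# Report: aggregate inside the warehouse. Changing this report means
# changing the schema or reloading data -- the inflexibility noted above.
for region, total in db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
):
    print(region, total)
```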
The data volumes in big data are so large that we must handle a great deal of related information from different sources. Different people describe the same thing in different ways, and semantic techniques can help determine whether two terms refer to the same thing. For example, some people call IBM "IBM," while others call it "International Business Machines" or "Big Blue"; all of them mean one company. Computers are, in fact, quite literal-minded, and only through a semantic data model layer can they make that judgment well.
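One minimal way to give a program that judgment is an alias table inside the semantic layer, as in this sketch (the alias list itself is an assumption for illustration):

```python
# A toy semantic layer: map surface names to one canonical entity.
ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "big blue": "IBM",
}

def canonical(name: str) -> str:
    """Resolve a surface form to its canonical entity, if known."""
    return ALIASES.get(name.strip().lower(), name)

# Records from different sources that mention the same company.
mentions = ["IBM", "Big Blue", "International Business Machines"]
assert len({canonical(m) for m in mentions}) == 1  # all one entity
```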
Risk Management in Big Data
In data management, putting all the data in one place is risky; for the sake of security, data should be stored in different places. Numerical data, for example, can go into a database, while unstructured data can live in documents or files. Adding a semantic description to the risk information from these different sources means we can quickly grasp the overall risk picture.
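That split might be sketched like this, with each record tagged with the same semantic risk label regardless of where it lands, so an overall risk view can be assembled across stores (the record shapes and labels are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metrics (name TEXT, value REAL, risk TEXT)")
doc_store = {}  # stand-in for a document/file store

def store(record: dict, risk: str) -> None:
    """Route numeric records to the database and everything else to the
    document store, tagging both with the same semantic risk label."""
    if isinstance(record.get("value"), (int, float)):
        db.execute("INSERT INTO metrics VALUES (?, ?, ?)",
                   (record["name"], record["value"], risk))
    else:
        doc_store[record["name"]] = {"risk": risk, "body": record}

store({"name": "q3_exposure", "value": 1.2e6}, risk="high")
store({"name": "audit_memo", "text": "summary of findings"}, risk="high")

# One comprehensive risk view assembled across both stores.
flagged = [name for (name,) in
           db.execute("SELECT name FROM metrics WHERE risk = 'high'")]
flagged += [k for k, v in doc_store.items() if v["risk"] == "high"]
print(flagged)  # -> ['q3_exposure', 'audit_memo']
```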
One of the biggest benefits of a semantic data model is that when you make changes, you do not need to go back down into the legacy systems and databases underneath. Because the semantic model sits above the data, it is far less disruptive than other techniques: once we provide a semantic definition for the data from one source, we can apply it directly to data from other sources.
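One way to picture that "define once, apply everywhere" property is a per-source field mapping onto a shared semantic schema; the source names and fields below are hypothetical:

```python
# Shared semantic schema: every source maps its fields onto these names.
SEMANTIC_FIELDS = ("customer", "revenue")

# Per-source field mappings live above the data; changing one mapping
# never touches the underlying legacy systems.
MAPPINGS = {
    "crm": {"customer": "client_name", "revenue": "deal_value"},
    "erp": {"customer": "acct_holder", "revenue": "net_sales"},
}

def to_semantic(source: str, record: dict) -> dict:
    """Translate one source record into the shared semantic schema."""
    mapping = MAPPINGS[source]
    return {field: record[mapping[field]] for field in SEMANTIC_FIELDS}

print(to_semantic("crm", {"client_name": "Acme", "deal_value": 1000}))
print(to_semantic("erp", {"acct_holder": "Acme", "net_sales": 2500}))
```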
This technique is designed not for programmers or database administrators but for business people. A business user needs to understand what the data means to him; he does not understand the underlying tables, yet he wants to see visually how sales relate to other factors over time, and only a semantic data model layer makes that possible. In recent years the boundary between IT and the business has begun to blur: business units can define their own needs better, and IT departments can meet those needs better. It is not perfect yet, but things are moving in that direction.
Security Issues with Big Data
Opening up access to all this collected data also presumes that the enterprise can keep the data secure.
The biggest mistake many companies make with data security is to finish all the architecture, design, and development work first and only then start thinking about security. Real data security must be considered from the very beginning, as part of the security architecture.
A security architecture is only one aspect. To keep data secure, enterprises are advised to store their data in slices, because slicing allows more precise control. Each slice of data is an enterprise asset, and employees can be granted permissions on each slice, such as view, modify, or delete. The slices are also encrypted, so even if someone hacks into the database and steals part of the data, we remain relatively safe: data without context means little to a thief, since the value density of big data is very low.
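A minimal sketch of such per-slice permissions plus encryption, using the open-source `cryptography` package's Fernet cipher as one stand-in (the slice names, users, and permissions are invented for illustration):

```python
from cryptography.fernet import Fernet

# One key and one access-control list per data slice (all illustrative).
# A real deployment would keep keys in a separate key store, not next to
# the ciphertext they protect.
slices = {}
acl = {"sales_2023": {"alice": {"view", "modify"}, "bob": {"view"}}}

def write_slice(name: str, data: bytes) -> None:
    key = Fernet.generate_key()
    slices[name] = (key, Fernet(key).encrypt(data))  # stored encrypted

def read_slice(name: str, user: str) -> bytes:
    if "view" not in acl.get(name, {}).get(user, set()):
        raise PermissionError(f"{user} may not view {name}")
    key, token = slices[name]
    return Fernet(key).decrypt(token)

write_slice("sales_2023", b"region=EU, total=1.2M")
print(read_slice("sales_2023", "alice"))    # permitted
# read_slice("sales_2023", "mallory")       # would raise PermissionError
```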
Here we have to mention "toxic data," a term put forward by Forrester that refers to dangerous data in the enterprise's hands. Imagine the data a wireless carrier collects: which signal towers users log on to, how long they spend online, what data they transfer, their geographic locations, and so on. The carrier can analyze user behavior from this data, but at the same time it is also collecting personal information such as users' credit card passwords, social networking site passwords, and buying habits.
This data clearly has considerable value, so why call it "toxic"? Because once it leaks and falls into the wrong hands, it is bound to cause huge losses for both the enterprise and individuals.
The world is fair: reward is proportional to risk. To reduce that risk, encrypting the data becomes especially critical.
For big data, the most basic approach is transparent data encryption: encrypt all captured data as it is written, so that everything the enterprise holds is encrypted. In the past, many companies balked at the cost, but today there are many open-source encryption options to choose from.
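In spirit, transparent encryption means callers never touch plaintext at rest. A toy encrypt-on-write, decrypt-on-read wrapper, again using Fernet as one of those open-source options, might look like this:

```python
from cryptography.fernet import Fernet

class TransparentStore:
    """Toy key-value store that encrypts everything on write and
    decrypts on read, so callers never handle ciphertext directly."""

    def __init__(self) -> None:
        self._cipher = Fernet(Fernet.generate_key())
        self._data: dict[str, bytes] = {}   # only ciphertext at rest

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = self._cipher.encrypt(value)

    def get(self, key: str) -> bytes:
        return self._cipher.decrypt(self._data[key])

store = TransparentStore()
store.put("user:42", b"name=Zhang, card=****")
print(store.get("user:42"))   # plaintext only in memory, never at rest
```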