Big data will challenge enterprise storage architectures and data center infrastructure, and will set off ripple effects in cloud computing, data warehousing, data mining, business intelligence and related fields. In 2011, companies will use more terabyte-scale (1 TB = 1,000 GB) datasets for business intelligence and business analytics, and by 2020 global data volume is expected to grow 44-fold to 35.2 ZB (1 ZB = 1 billion TB).
Big Data challenges
With such large volumes of data, how to put the data to work in complex applications has become a research hotspot in data warehousing, business intelligence and data analysis. Data mining means discovering hidden regularities in large amounts of data in order to solve the problem of application quality. Its most important use is to exploit the useful data fully and discard false or useless data. The data in a traditional database is strongly structured, that is, fully structured, whereas the most important feature of the data here is that it is semi-structured, so mining it is much more complicated than mining a single data warehouse.
When it comes to traditional data warehouses, people inevitably buy storage devices and select servers, whether on an IBM Power or an Oracle architecture. These are well-known brands from the traditional database era. Assembled together they make up a data warehouse, with vendors such as Microsoft and Cognos providing solutions.
Enterprise business needs not only high scalability but also the ability to respond to dynamic demand: equipment should expand freely, without anyone having to care which platform or which machines the data warehouse and its applications actually run on, and the cost of computing capacity should track the growth of the business.
Traditional architectures have been used for this kind of project for 10 to 20 years. What distinguishes data warehouse access from traditional database access is that the queries are very large and the query statements very long and complex. Unlike a bank deposit or withdrawal, which looks up only one or two records among a huge number, a warehouse query has the characteristics of a big data query, and traditional query indexes are of very limited help. A query joins several tables in the database and also computes summaries, standard deviations and other complex operations. On the other hand, there are not many concurrent requests: an enterprise has perhaps 1,000 or so business analysts analyzing data.
As a result, the data warehouse has faced a system bottleneck since the day it was born. The answer is to decompose a large query into small tasks that parallel servers complete; the emphasis is on many small machines rather than on one machine with many CPUs. The data warehouse is therefore inherently MPP (massively parallel processing): open-architecture CPUs plus horizontal scale-out across parallel nodes.
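The idea of splitting one large aggregate query into small tasks can be illustrated with a minimal sketch. It assumes a single machine, with a thread pool standing in for the parallel servers; the function names and the sample figures are hypothetical and are not taken from any particular MPP product. Each "node" returns a small partial result (count, sum, sum of squares), and a coordinator merges them into a total and a standard deviation.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_stats(rows):
    """Work done by one node: count, sum and sum of squares of its rows."""
    return len(rows), sum(rows), sum(x * x for x in rows)


def merge_stats(parts):
    """Coordinator step: combine the per-node partial results."""
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    # Population variance via E[x^2] - mean^2. This form is simple to merge,
    # but it can lose precision on very large or very skewed inputs.
    variance = total_sq / n - mean * mean
    return n, total, math.sqrt(max(variance, 0.0))


def parallel_summary(values, workers=4):
    """Split one large aggregate into `workers` small tasks and merge them."""
    partitions = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, partitions))
    return merge_stats(parts)


# Hypothetical transaction amounts.
amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
count, total, std_dev = parallel_summary(amounts)
print(count, total, std_dev)
```

Only the three-number partial results travel to the coordinator, never the raw rows, which is why this kind of aggregate scales out across many small machines.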
When big data encounters cloud computing
Why does cloud computing prevail? Compare how the two kinds of systems are built. For application systems in the Internet domain, the customer base is uncertain, the system size is uncertain, the investment is not fixed, and the business applications partition cleanly for parallel execution. For a data warehouse system, the scale can be estimated, the investment is tied to the value and return of business analysis, business intelligence applications operate on the data as a whole, and the data warehouse system is built in a SaaS pattern.
Big data management relies on distributed file systems such as Hadoop, on MapReduce for data partitioning and execution, and on SQL interfaces as represented by Hive on Hadoop. Building the next-generation data warehouse with cloud computing on top of big data technology has become a hot topic. From the perspective of system requirements, big data architecture presents new challenges to the system:
1. Higher integration. A standard chassis completes as much of a specific task as possible.
2. More balanced configuration and higher speed. Storage, controllers, I/O channels, memory, CPU and network are designed in balance and optimized for data warehouse access, running more than an order of magnitude faster than comparable traditional platforms.
3. Lower overall energy consumption. The same computing task is completed with the least energy.
4. A more stable and reliable system. It eliminates single points of failure of every kind and unifies the quality and standards of components and devices.
5. Low management and maintenance cost. Routine data warehouse management is fully integrated.
6. A road map for system expansion and upgrades that can be planned and foreseen.
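To make the MapReduce partitioning and Hive-style SQL support mentioned before this list more concrete, here is a minimal pure-Python sketch of the map, shuffle and reduce phases. It does not use Hadoop or Hive APIs; the record layout and function names are assumptions for illustration. The job computes what a query along the lines of SELECT region, SUM(amount) ... GROUP BY region would return over a hypothetical sales table.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    region, amount = record
    yield region, amount


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    """Run map over every record of every split, then shuffle and reduce."""
    pairs = [kv for split in splits for rec in split for kv in map_phase(rec)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Two input splits, standing in for blocks of a distributed file.
splits = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
]
totals = run_job(splits)
print(totals)
```

In a real cluster each split would be mapped on the machine that stores it, and only the grouped intermediate pairs would cross the network.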
Cloud computing environment as a big data processing platform
1. Differentiation of basic computing units in cloud computing environments
Although an enterprise cloud computing platform has many CPUs computing in parallel, it does not create a single super CPU with super data processing capability, so the cloud platform needs a software system capable of parallel computing. At the same time, once all users' data is in the cloud, storage capacity can be expanded easily, but simple data processing logic cannot keep up with the massive data processing requests that users initiate simultaneously.
A considerable number of e-commerce companies in China ran on minicomputers and Oracle for several years, with the country's top Oracle experts continually tuning their Oracle databases and minicomputers. Early growth may have been fast, but as data proliferated the business began to suffer seriously. The most typical example is undoubtedly the recent downtime at Jingdong Mall under large-scale access requests. As a result, these companies have begun to abandon Oracle or MS and gradually move to a distributed MySQL + x86 architecture.
Today's basic computing units are usually ordinary x86 servers that together form a large cloud. Future clouds may have distinct storage units, computing units and coordination units, and the overall efficiency will be higher.
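One common way a distributed database layout on commodity servers spreads load is to shard rows across servers by key. The article does not say which scheme these companies adopted, so the following is only a sketch under that assumption; the shard names and the order-ID format are hypothetical.

```python
import hashlib

# Hypothetical shard host names; a real deployment would load these from config.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Route 1,000 hypothetical order IDs and count how many land on each shard.
orders = [f"order-{i}" for i in range(1000)]
placement = {}
for order_id in orders:
    placement.setdefault(shard_for(order_id), []).append(order_id)

for shard in SHARDS:
    print(shard, len(placement.get(shard, [])))
```

A stable hash is used instead of Python's built-in hash() so that every application server routes the same key to the same shard. Plain modulo routing remaps most keys when the number of shards changes; consistent hashing is the usual alternative when servers are added often.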
2. The need for system stability
Stability under large-scale access has to be pursued on many fronts, including network stability and database stability. One overriding principle applies to the whole system: eliminate every single point of failure. That covers not only single points of failure in the network but also, for example, one in your call center; wherever a single point of failure exists, it must be removed.
For the e-commerce industry every second is money; the loss from one hour of downtime can be calculated. The industry therefore needs very comprehensive technical monitoring and alerting. Even so, you will sometimes find that by the time monitoring reveals a technical problem it is already too late.
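The two points above, removing single points of failure and catching problems before they cause downtime, can be combined in a simple monitoring check: flag any tier that has fewer than two healthy instances, rather than waiting for the tier to go down. This is a minimal sketch; the tier names, instance names and health flags are hypothetical, and a real system would pull them from its monitoring data.

```python
def single_points_of_failure(topology):
    """Return the tiers that have fewer than two healthy instances."""
    risky = []
    for tier, instances in topology.items():
        healthy = [name for name, ok in instances.items() if ok]
        if len(healthy) < 2:
            risky.append(tier)
    return sorted(risky)


# Hypothetical inventory: tier -> {instance name: is it currently healthy?}
topology = {
    "load_balancer": {"lb-1": True, "lb-2": True},
    "web": {"web-1": True, "web-2": True, "web-3": False},
    "database": {"db-primary": True, "db-replica": False},
    "call_center_gateway": {"cc-1": True},
}

spof = single_points_of_failure(topology)
for tier in spof:
    print(f"ALERT: {tier} has lost its redundancy")
```

Here the database tier is still serving traffic, but the alert fires as soon as its replica is unhealthy, while there is still time to act.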
(Responsible editor: The good of the Legacy)