With the advent of the cloud era and the SaaS model, more and more enterprises deliver SaaS applications through Internet platforms run by SaaS application providers and carriers, and the data volume of these applications is growing by terabytes. Different SaaS application systems produce different data structures, including text, graphics, and even small databases. Because SaaS application data lives on cloud service platforms, it may also be spread across different servers. How to mine this heterogeneous, distributed data is a challenge for enterprises in the cloud era.
Challenges for enterprise data mining in the cloud era
Mining efficiency: In the cloud computing era, BI thinking has changed. Data mining used to operate on closed enterprise data; it now faces the volume of heterogeneous data brought in by Internet applications (it is estimated that by 2020 the total data volume will exceed 35 ZB, where 1 ZB = 1 billion TB). Against data at this scale, the efficiency of current parallel mining algorithms is very low.
Multi-source data: With cloud computing, enterprise data may be stored on a platform that provides public cloud services or on a self-built private cloud. Mining across these different data sources is also a challenge.
Heterogeneous data: The defining feature of web data is that it is semi-structured, as in documents, reports, webpages, sounds, images, and videos. Cloud computing brings a large number of SaaS applications built on Internet models, and organizing their data effectively is a challenge.
Data mining for SaaS applications is therefore expected to run fast parallel mining algorithms on top of a massive data storage platform in order to improve the quality of data mining.
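As a rough illustration of what a parallel mining algorithm looks like, the sketch below counts item support across data partitions in a map-and-merge style. It is only a minimal example under stated assumptions: the partitions, the item names, and the threshold of 3 are invented for illustration, and a thread pool stands in for the cluster of workers that a real platform would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Map step: count item occurrences within one data partition."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def parallel_item_counts(partitions, max_workers=4):
    """Merge step: combine the per-partition counts into a global count."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(count_items, partitions):
            total.update(partial)
    return total


# Hypothetical data: each inner list is one transaction, and each outer
# list is a partition that could live on a different server.
partitions = [
    [["bread", "milk"], ["bread", "eggs"]],
    [["milk", "eggs"], ["bread", "milk", "eggs"]],
]

support = parallel_item_counts(partitions)
frequent = sorted(item for item, n in support.items() if n >= 3)
```

Because each partition is counted independently, the map step can be moved to wherever the data is stored, and only the small per-partition counters need to travel over the network.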
How to choose a reasonable infrastructure
Enterprises urgently need to integrate and mine data from their various applications and extract useful commercial information from it. Most traditional BI models are based on data warehouses and relational databases. Faced with the rapid growth of heterogeneous data, the traditional data warehouse and the original parallel computing technology cannot handle massive data mining: their mining efficiency is too low, which delays the extraction of data.
For a long time, business intelligence systems were built on minicomputers with a traditional SMP architecture. As the performance, availability, and scalability of the x86 platform have improved in recent years, x86 has eroded the minicomputer's share in more and more markets, and business intelligence has become another battlefield in the x86 architecture's push into the server-class CPU market. For example, the Exadata database cloud server launched by Oracle on the Intel Xeon platform uses Smart Scan technology and a design that pushes data processing down toward the storage layer, so that it delivers high OLAP (data warehouse) and OLTP performance on the x86 architecture. IBM has also launched a business intelligence solution on the x86 platform: built on IBM eX5 architecture servers and the XIV grid storage system, it provides intelligent information processing capabilities that do not lose out to minicomputers.
1. High Availability:
The infrastructure layer of BI requires a data mining cloud service platform, and that platform must be highly available.
From the perspective of high availability, three problems need to be addressed together. The first is data protection: hardware mechanisms such as CRC and ECC verify and correct transmitted data, and data that cannot be corrected is isolated so that the damage does not spread and force a system restart or downtime.
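The verify-then-isolate idea can be sketched in software with a CRC-32 checksum. This is only an illustration of the principle, not how the hardware does it: a CRC detects corruption but does not correct it (correction is the job of ECC), and the framing format, function names, and sample payload here are assumptions made for the example.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload before sending."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(message: bytes):
    """Return the payload if its checksum matches, otherwise None."""
    payload = message[:-4]
    received = int.from_bytes(message[-4:], "big")
    return payload if zlib.crc32(payload) == received else None


good = frame(b"sales,2011,Q3,1024")
corrupted = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit "in transit"

# Isolate damaged messages instead of letting them reach the mining job.
quarantine = []
for msg in (good, corrupted):
    if verify(msg) is None:
        quarantine.append(msg)
```

Only the corrupted message ends up in quarantine; the intact one passes through unchanged.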
Currently, Intel Xeon 7500 and E7 processors offer many advantages, such as low cost, high performance, high reliability (RAS), and good scalability. In terms of scalability, the x86 platform scales out by combining two or more machines into a cluster. Such a cluster can meet the load requirements of the key application environments of most enterprises, including databases with high memory and CPU demands, commercial applications, and virtualization. It also avoids many of the dilemmas of the traditional UNIX dual-host solution, such as high cost, backup servers that sit idle and waste resources, and user services that are forced to stop during host failover.
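The difference from a dual-host setup can be illustrated with a toy request router: every node in the cluster does useful work, and when one fails the remaining nodes simply absorb its share. The node names and the round-robin policy below are hypothetical and are not taken from any particular product.

```python
def route(request_id, nodes, healthy):
    """Send a request to one of the healthy nodes, round-robin by request id."""
    live = [n for n in nodes if n in healthy]
    if not live:
        raise RuntimeError("no healthy node available")
    return live[request_id % len(live)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
healthy = {"node-a", "node-b", "node-c"}

before = [route(i, nodes, healthy) for i in range(3)]

healthy.discard("node-b")  # simulate a node failure
after = [route(i, nodes, healthy) for i in range(4)]
```

After the simulated failure, requests continue to be served by node-a and node-c without a standby machine having to take over.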
In addition, some Xeon 7500 designs minimize scheduled downtime through features such as system partition management and hot add and hot removal of CPUs and memory, which reduce system maintenance time.
Data mining cloud services still rely on virtualization technology, because they require computing resources to be allocated and scheduled independently. In other words, virtualization is the foundation of data mining cloud services.
Never be fooled by concepts
Big data has many different use cases, so enterprises need to choose a data mining platform that fits their own business conditions. For customers who focus on application analysis and processing, there are many specialized solutions, such as HP Vertica, as well as many high-performance NAS or target systems.
Similarly, for video, security surveillance, closed-circuit television, simulation, and other workloads that need high bandwidth or throughput, you can consider HP IBRIX, Dell Exanet, BlueArc, HDS, NetApp, DataDirect Networks, Oracle 7000, EMC Isilon, and VNX.
In general, users will face a lot of hype urging them to migrate to more expensive systems. Your current system may be good enough if it can be expanded, and what a vendor offers may not necessarily run well in your current environment.
Users need to be cautious about the big data hype and should narrow down their selection. Beyond the opportunities that big data brings, there are many other aspects to consider, such as a product's features, applications, use cases, and deployment options.