Big Data Analysis Guide
TM Forum Frameworx Best Practices
Unleashing Business Value in Big Data
Preface
This article is excerpted from the TM Forum Big Data Analytics Guidebook.
TM Forum document copyright information
Copyright © TeleManagement Forum 2013. All rights reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published, and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this section are included on all such copies and derivative works. However, this document itself may not be modified in any way, including by removing the copyright notice or references to TM Forum, except as needed for the purpose of developing any document or deliverable produced by a TM Forum Collaboration Project Team (in which case the rules applicable to copyrights, as set forth in the TM Forum IPR Policy, must be followed) or as required to translate it into languages other than English.
Faced with the many emerging big data analysis technologies, CSPs need a clear reference model that helps them understand the different technologies and define processes rationally, so that the right technical framework and route can be selected for each specific business use case.
To meet this need, this Guide provides reference cases, reusable components, and reference implementation frameworks for big data analysis, helping CSPs derive commercial value from big data analysis technology.
This Guide covers the following topics:
1. Basic Concepts and Technologies of Big Data and Big Data Analysis
1.1 Big Data
1.2 Big Data Analysis
1.3 Big Data Analysis Technology
2. Reference Model: Big Data Analysis Solution
2.1 Overview
2.2 Data Loading
2.3 Data Management
2.4 Data Analysis
2.5 Data Storage
2.6 Data Governance
2.7 Data Processing
3. Big Data Analysis Business Value Roadmap
4. Big Data Analysis Cases
5. Big Data Analysis Components
1. Basic Concepts and Technologies of Big Data and Big Data Analysis
1.1. Big Data
Many standards organizations, consulting firms, and trade groups have attempted to define "Big Data" and distinguish it from "ordinary" data, reaching slightly different conclusions: all of these views describe big data by its characteristics (3V, 4V, and so on) rather than offering a more substantive definition. As of the release of this Guide, the 3V model (volume, velocity, and variety) remains the most popular characterization of big data.
A newer definition of "Big Data" is as follows:
Big data applies inductive statistics to data whose volume makes it possible to infer laws and, to a certain extent, predict the future behavior of the data.
The original text reads:
A newer model (Big Data Paris, 2013) looks at big data as utilizing Inductive Statistics with data, the volume of which allows inferring laws and predicting to a certain extent future behaviors of the data.
The above definition comes from: http://www.andsi.fr/tag/dsi-big-data/
Traditional Business Intelligence uses descriptive statistics.
1.2. Big Data Analysis
Whichever definition is used, the value of big data lies in analysis results, prediction, and the actions taken on them. The TMF big data analysis project therefore focuses not on big data itself, but on big data analysis technologies and methods.
Big data analysis requires high-performance processing of massive data volumes with reasonable response times. To meet these requirements, a number of non-traditional technologies have emerged over the past ten years, characterized by shared-nothing architectures, massive parallelism, and horizontal scaling.
1.3. Big Data Analysis Technology
MapReduce framework and Hadoop
- MapReduce programming model
- HDFS (distributed file system)
- HBase (distributed database)
- Pig, Hive (data access)
- Impala (real-time ad hoc query)
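The following is a minimal sketch of the MapReduce programming model in Python, simulating the map, shuffle, and reduce phases locally on a single machine. It assumes no Hadoop cluster; all names and sample records are illustrative only.

```python
from collections import defaultdict

# --- map phase: emit (word, 1) pairs for each input record ---
def map_phase(records):
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

# --- shuffle phase: group intermediate values by key ---
def shuffle_phase(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

# --- reduce phase: aggregate the values of each key ---
def reduce_phase(grouped):
    return {key: sum(values) for key, values in grouped.items()}

if __name__ == "__main__":
    records = ["big data analysis", "big data platform", "data governance"]
    counts = reduce_phase(shuffle_phase(map_phase(records)))
    print(counts)  # {'big': 2, 'data': 3, 'analysis': 1, 'platform': 1, 'governance': 1}
```

In an actual Hadoop deployment the same map and reduce logic runs in parallel across the cluster, with HDFS providing distributed storage for the input and output data.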
NoSQL storage
Four types of NoSQL databases:
- Key-value stores (such as Amazon Dynamo and Voldemort)
- Columnar stores (such as Cassandra and HBase)
- Document stores (such as MongoDB)
- Graph stores (such as Neo4j and AllegroGraph)
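To make the four storage models concrete, the sketch below shows how the same subscriber record might be shaped under each model; the structures are illustrative only and do not target any specific database product.

```python
# Key-value model: an opaque value addressed by a single key
key_value = {"subscriber:8613800000000": b'{"name": "Alice", "plan": "4G-Pro"}'}

# Columnar (wide-column) model: values grouped into column families per row key
columnar = {
    "8613800000000": {
        "profile": {"name": "Alice", "plan": "4G-Pro"},
        "usage":   {"voice_min": 320, "data_mb": 2048},
    }
}

# Document model: a self-describing, nested document per entity
document = {
    "_id": "8613800000000",
    "name": "Alice",
    "plan": "4G-Pro",
    "devices": [{"imei": "490154203237518", "type": "smartphone"}],
}

# Graph model: entities as nodes, relationships as edges
graph = {
    "nodes": [{"id": "u1", "label": "Subscriber"}, {"id": "c1", "label": "Cell"}],
    "edges": [{"from": "u1", "to": "c1", "label": "ATTACHED_TO"}],
}
```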
HDFS-based real-time query
For example, Impala.
Search
2. Reference Model
The purpose of the reference model is to define the functional components of a big data analysis platform, so that the roles and responsibilities of the different components are clearly delineated and a common understanding can be reached across the big data analysis field.
2.1 Overview
The reference model for big data analysis covers the big data ecosystem and the functional layers of the platform. Functions are grouped into layers according to data relevance and data density, and each layer exposes external and internal APIs to the other functional layers and to third-party applications.
Note:
1. The reference model is designed to cover the overall needs of any big data use case. Depending on the specifics of a given use case, only a subset of the model's functions may be required.
2. Each layer in the reference model is an abstract grouping of similar functions, not a component of a big data platform. The actual mapping of a layer's functions onto a big data platform therefore depends on the vendor's specific implementation.
3. Unlike the ISO OSI seven-layer model or the TCP/IP four-layer model, the layers in the reference model are neither hierarchical nor sequential. Apart from the data loading layer, which receives data from external data sources, the order and combination of the other layers can be changed to suit the actual situation.
4. Data storage can be regarded as a component of the big data platform. Besides storing raw and processed data, it also supports data flows between the different layers.
5. Laws and regulations protecting consumer privacy often limit a CSP's ability to monetize data and reduce the chances of establishing partnerships along the data value chain. The privacy, security, and regulatory functions in "Data Governance" address these issues through data privacy protection technologies. A big data analysis application can be regarded as a combination of different layers of the reference model.
6. "batch processing" refers to offline processing (or scheduled Processing). It is executed as needed and assumes that there is a large amount of memory space. After an external request occurs, batch processing can process a limited number of datasets within a limited period of time. In the batch processing mode, the signaling stream and the data stream are separated, while in the stream processing mode, the signaling stream is included in the data stream. Stream processing mode refers to online processing, which continuously processes data streams as needed. Stream processing modes can be related to complex event processing technologies, real-time learning, real-time prediction, and other technologies.
7. The reference model can be regarded as a PaaS that supports business intelligence. The data management layer and data analysis layer cover all business intelligence functions and can be used by external applications or user interfaces, either locally or in the cloud.
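As noted in point 6 above, the two processing modes differ in how results are produced. The following minimal sketch contrasts them; the record structure and totals are illustrative only, and no specific processing engine is assumed.

```python
# Batch mode: process a bounded dataset on request, one result for the whole set
def batch_total_mb(cdr_file):
    return sum(record["mb"] for record in cdr_file)

# Stream mode: keep running state and emit an up-to-date result for every event
class StreamTotalMB:
    def __init__(self):
        self.total = 0

    def on_event(self, record):
        self.total += record["mb"]
        return self.total

cdrs = [{"mb": 10}, {"mb": 25}, {"mb": 5}]
print(batch_total_mb(cdrs))          # 40 (produced once the whole batch is read)
stream = StreamTotalMB()
for record in cdrs:
    print(stream.on_event(record))   # 10, 35, 40 (continuously updated)
```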
2.2 Data Loading
Integration
Establish connections between different systems for data transfer.
Data Import
Import data from external data sources into the big data platform. Data can be tagged to indicate which data source it came from.
Data Formatting
Unify the format of data coming from different data sources. For example, the IMSI received over 2G, 3G, and 4G interfaces may use different encoding formats, so this function normalizes the data into a common format before it is passed to the other layers.
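A hedged sketch of this normalization: assuming, purely for illustration, that one interface delivers the IMSI as an ASCII string while another delivers it TBCD-encoded (two swapped digits per byte, 0xF as filler), both are converted to a single canonical string before being passed to the other layers.

```python
def imsi_from_ascii(raw: str) -> str:
    # Interface A (illustrative): the IMSI already arrives as a decimal string
    return raw.strip()

def imsi_from_tbcd(raw: bytes) -> str:
    # Interface B (illustrative): the IMSI arrives TBCD-encoded, two digits per
    # byte with the low nibble holding the first digit and 0xF as filler
    digits = []
    for byte in raw:
        low, high = byte & 0x0F, byte >> 4
        digits.append(str(low))
        if high != 0x0F:            # skip the filler nibble on odd-length IMSIs
            digits.append(str(high))
    return "".join(digits)

def normalize_imsi(raw, source_format: str) -> str:
    # Single canonical representation handed to the other layers
    if source_format == "ascii":
        return imsi_from_ascii(raw)
    if source_format == "tbcd":
        return imsi_from_tbcd(raw)
    raise ValueError(f"unknown IMSI format: {source_format}")

print(normalize_imsi("460001234567890", "ascii"))
print(normalize_imsi(bytes([0x64, 0x00, 0x10, 0x32, 0x54, 0x76, 0x98, 0xF0]), "tbcd"))
# both calls yield the same canonical IMSI string: 460001234567890
```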
2.3 Data Management
Conversion
Map the raw data onto the data model so that it becomes meaningful, usable data. Typical data conversions include:
- Comparison
- Date and Time
- Logic
- Formula
- Statistics
- Text
- Trigonometry
- Encoding
- List Management
- URL Management
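A minimal sketch of such a conversion, assuming an illustrative raw CDR record whose compact timestamp, numeric call-type code, and duration are mapped onto named, typed fields of a simple data model (all field names are assumptions):

```python
from datetime import datetime

CALL_TYPES = {1: "voice", 2: "sms", 3: "data"}   # illustrative code list

def convert_cdr(raw: dict) -> dict:
    """Map a raw CDR onto the target data model (illustrative fields only)."""
    return {
        "msisdn": raw["msisdn"].lstrip("+"),                             # encoding
        "start_time": datetime.strptime(raw["start"], "%Y%m%d%H%M%S"),   # date and time
        "call_type": CALL_TYPES.get(int(raw["type"]), "unknown"),        # list management
        "duration_min": round(int(raw["duration_s"]) / 60, 2),           # formula
        "is_long_call": int(raw["duration_s"]) > 600,                    # comparison / logic
    }

print(convert_cdr({"msisdn": "+8613800000000", "start": "20130501093000",
                   "type": "1", "duration_s": "754"}))
```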
Association
Associate data from different data sources that represent the same business entity. For example, the MSISDN in CDRs can be associated with the subscriber number in the CRM system (both represent the same business entity, the subscriber) to provide richer information about that subscriber.
Consolidation
Combine multiple data sources that describe the same business entity (such as a subscriber) into a full view of that entity's information. In some cases the sources are several CSP databases; in other cases part of the data comes from big data analysis results.
For example, a subscriber's gender, age, education level, and income can be predicted with reasonable accuracy from their browsing history and location data.
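A hedged sketch covering both association and consolidation: CDRs keyed by MSISDN are linked to the CRM record of the same subscriber and then merged, together with an earlier analysis result, into one full subscriber view. All field names and values are illustrative.

```python
# Illustrative source data
cdrs = [{"msisdn": "8613800000000", "duration_s": 120},
        {"msisdn": "8613800000000", "duration_s": 300}]
crm  = {"8613800000000": {"name": "Alice", "plan": "4G-Pro"}}
analysis_results = {"8613800000000": {"predicted_age_band": "25-34"}}  # from earlier analysis

def build_subscriber_view(msisdn: str) -> dict:
    # Association: link the CDRs and the CRM record that describe the same subscriber
    usage = [c for c in cdrs if c["msisdn"] == msisdn]
    view = {"msisdn": msisdn, **crm.get(msisdn, {})}
    # Consolidation: merge all sources into one full view of the subscriber
    view["total_voice_s"] = sum(c["duration_s"] for c in usage)
    view.update(analysis_results.get(msisdn, {}))
    return view

print(build_subscriber_view("8613800000000"))
```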
Data Operations
Data operations include:
- Merge
- Intersection
- Sort
- Filter
- Compression
- Deduplication/Replication
- Group
- Summary
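A few of these operations in a minimal, engine-agnostic sketch (the records are illustrative only):

```python
from itertools import groupby
from operator import itemgetter

a = [{"msisdn": "m1", "mb": 10}, {"msisdn": "m2", "mb": 5}, {"msisdn": "m2", "mb": 5}]
b = [{"msisdn": "m3", "mb": 7}]

merged = a + b                                                                 # merge
deduplicated = [dict(t) for t in {tuple(sorted(r.items())) for r in merged}]   # deduplication
filtered = [r for r in deduplicated if r["mb"] >= 6]                           # filter
ordered = sorted(filtered, key=itemgetter("msisdn"))                           # sort
grouped = {k: sum(r["mb"] for r in g)                                          # group + summarize
           for k, g in groupby(ordered, key=itemgetter("msisdn"))}
print(grouped)   # {'m1': 10, 'm3': 7}
```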
Data quality assurance
Data quality assurance includes:
- Data cleansing
- Data Integrity assurance
For example, records with an incorrect checksum are written to a log and then discarded.
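A minimal sketch of this log-and-discard pattern, assuming each illustrative record carries a checksum of its payload:

```python
import hashlib
import logging

logging.basicConfig(level=logging.WARNING)

def checksum(payload: str) -> str:
    return hashlib.md5(payload.encode()).hexdigest()[:8]

def clean(records):
    """Keep records whose checksum matches; log and discard the rest."""
    good = []
    for rec in records:
        if checksum(rec["payload"]) == rec["checksum"]:
            good.append(rec)
        else:
            logging.warning("discarding corrupt record: %s", rec)
    return good

records = [
    {"payload": "msisdn=8613800000000;mb=120",
     "checksum": checksum("msisdn=8613800000000;mb=120")},
    {"payload": "msisdn=8613800000001;mb=45", "checksum": "deadbeef"},   # corrupted
]
print(len(clean(records)))   # 1: the corrupt record was logged and dropped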
2.4 Data Analysis
This layer supports big data analysis through batch processing and stream processing modes, including metric computing, data modeling, complex event processing, and machine learning.
The data analysis layer relies on many technologies, including:
- Event pattern detection
- Real-time learning
- Event Abstraction
- Event-Level Modeling
- Event relationship detection (causality, membership, and timing relationships)
- Event-driven processing
- Trigger-based action execution
Key functions of the data analysis layer include:
Descriptive, predictive, and prescriptive modeling
Descriptive, predictive, and prescriptive modeling (explaining the past, predicting the future, and recommending the best course of action) using machine learning and data mining algorithms, including:
- Classification Analysis
- Cluster Analysis
- Pattern mining
- Recommendation and collaborative filtering
- Statistical relational learning
- Text, voice, and video analysis
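As a hedged illustration of two of these techniques, the sketch below uses scikit-learn (assumed to be available) with a tiny made-up dataset: k-means clustering to segment subscribers by usage, and a decision tree classifier to predict churn.

```python
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Illustrative features: [monthly voice minutes, monthly data MB]
usage = [[600, 200], [580, 150], [50, 4000], [80, 3500], [300, 1200]]

# Cluster analysis: segment subscribers into usage groups
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(usage)
print("segments:", segments)

# Classification analysis: predict churn from the same features
churned = [0, 0, 1, 1, 0]                      # illustrative labels
clf = DecisionTreeClassifier(random_state=0).fit(usage, churned)
print("churn prediction:", clf.predict([[70, 3800]]))
```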
Complex Event Processing
Most complex event processing solutions and concepts fall into one of two categories:
- Computation-oriented complex event processing:
Runs online algorithms over the event data entering the system; for example, continuously computing the average of the incoming event data.
- Detection-oriented complex event processing:
Focuses on detecting combinations of events (event pattern detection); for example, detecting events that match a specific sequence.
Complex event processing makes big data analysis scenarios that require real-time processing feasible, providing online capabilities such as stream data processing, event correlation, and KPI computation. Based on business rules supplied by the user, complex event processing raises alarms that drive subsequent actions in external systems.
In a big data environment, complex event processing can be implemented with a complex event processor capable of massively parallel computation, such as Storm, an open-source project from Twitter.
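A minimal sketch of the two categories, not based on Storm or any other specific processor: a computation-oriented processor keeps a continuously updated average over incoming measurements, and a detection-oriented processor fires when events arrive in a specific sequence (event names and values are illustrative).

```python
from collections import deque

class RunningAverage:
    """Computation-oriented CEP: continuously recompute the mean of incoming values."""
    def __init__(self):
        self.n, self.total = 0, 0.0

    def on_event(self, value):
        self.n += 1
        self.total += value
        return self.total / self.n

class SequenceDetector:
    """Detection-oriented CEP: fire when events arrive in a specific order."""
    def __init__(self, pattern):
        self.pattern = pattern
        self.window = deque(maxlen=len(pattern))

    def on_event(self, event):
        self.window.append(event)
        return list(self.window) == self.pattern

avg = RunningAverage()
detector = SequenceDetector(["alarm_minor", "alarm_minor", "alarm_major"])
for latency, event in [(20, "alarm_minor"), (35, "alarm_minor"), (90, "alarm_major")]:
    print(avg.on_event(latency), detector.on_event(event))
# the detector returns True on the third event, once the full pattern has been seen
```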
Trigger-based action execution
The results of big data analysis can trigger alarms and execute actions.
- Alarm: send an alarm to a user for subsequent decision-making (machine-to-person).
- Trigger: raise an alarm to another system, which automatically executes the corresponding action (machine-to-machine).
For example, a network performance monitoring system uses complex event processing to detect network element alarms. When the number or severity of the alarms exceeds a threshold, the system raises a critical alarm to the maintenance staff and triggers a policy change (routing network traffic to other network elements).
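A minimal sketch of this machine-to-person / machine-to-machine split, with an illustrative alarm-count threshold and hypothetical helper functions:

```python
SEVERE_THRESHOLD = 10   # illustrative threshold on alarms per monitoring interval

def notify_operator(message):          # machine-to-person: alert for a human decision
    print(f"[ALERT to maintenance staff] {message}")

def reroute_traffic(element_id):       # machine-to-machine: automatic action
    print(f"[POLICY CHANGE] traffic routed away from {element_id}")

def on_alarm_count(element_id, alarm_count):
    if alarm_count > SEVERE_THRESHOLD:
        notify_operator(f"{alarm_count} alarms on {element_id}")
        reroute_traffic(element_id)

on_alarm_count("eNodeB-4711", 13)
```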
Indicator Calculation
Calculate relevant business metrics, such as TM Forum business metrics (including Frameworx metrics, customer experience management metrics, balance points, and so on) and any other metrics.
Report Generation
Data reports can be generated in real time, daily, weekly, monthly, or on demand. Reports are used to visualize the results of big data analysis, and many efficient visualization tools are currently available.