Abstract: Building portraits of mobile phone users is an important step for telecom operators toward "data-driven business and operations". This paper first introduces methods for protecting personal privacy when building mobile phone user portraits, then analyzes the data sources and the big data technologies used to implement them, and finally uses a data sample to analyze how mobile phone user portraits can be applied to personal credit evaluation.
Introduction
With the development of computer network technology, the era in which "data is a resource" has arrived: the age of big data. User portraits are an important means for telecom operators to avoid the risk of becoming mere "dumb pipes" and to achieve "data-driven business and operations". On the one hand, user portraits built with big data technology are closely related to customer classification: a portrait is the accumulation of many attribute tags on an individual customer. On the other hand, evaluating the personal credit of mobile phone users in consumer finance is a process of combining and classifying many features drawn from their portraits.
1. Mobile phone user portrait and privacy protection
A user portrait is the collection of all information tags about an individual user: the user's demographic attributes, social contacts, behavioral preferences and other key information are collected and analyzed, and all of the user's tags are synthesized to outline the user's overall characteristics and profile. In the Internet economy, meeting consumers' individual needs has become the main way for operators to differentiate themselves from competitors. User portraits identify customer types more accurately, which makes them an important aid for telecom operators in avoiding the "dumb pipe" risk and achieving "data-driven business and operations".
The "portrait" is a metaphor. Big data technology lets us learn more about mobile phone users, but because of implementation costs and privacy protection constraints, the portrait is not a holographic "photograph" or "video". It is designed on demand and cannot be refined indefinitely; a user "portrait" with extremely high "resolution" that ignores cost and demand is unrealistic. In general, user portraits are closely related to customer classification. Big data methods for classifying customer groups, such as cluster analysis and discriminant and logistic analysis, divide users into categories according to their characteristics, and classifying customers along these many dimensions produces a series of attribute tags. For a single customer, the portrait is the intersection of these classes, that is, the accumulation of many tags on one customer, which gradually fills out the customer's image until it takes shape. At the same time, superimposing many features can produce an "emergence" effect, a shift from quantitative to qualitative change: on the basis of the tag information, the features can be recombined as needed into a relatively complete "composite attribute" tag, which enables further classification. From this point of view, personal credit evaluation of mobile phone users is likewise an application of many features of the mobile phone user portrait.
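To make the idea of tags accumulating on one customer concrete, here is a minimal Python sketch. The three classification dimensions, their thresholds, the tag names and the composite rule are all hypothetical; in practice each dimension would come from a clustering or discriminant model rather than a hand-written threshold.

```python
# Hypothetical classification dimensions: each maps a raw user record to one tag.
# Thresholds and tag names are illustrative assumptions, not taken from the paper.
def age_tag(user):
    return "age:young" if user["age"] < 30 else "age:mature"


def arpu_tag(user):
    return "arpu:high" if user["arpu"] >= 150 else "arpu:normal"


def tenure_tag(user):
    return "tenure:long" if user["tenure_months"] >= 36 else "tenure:short"


DIMENSIONS = [age_tag, arpu_tag, tenure_tag]


def build_portrait(user):
    """A portrait is the set of tags the user receives on every dimension,
    i.e. the intersection of the classes the user falls into."""
    return {classify(user) for classify in DIMENSIONS}


def composite_tags(tags):
    """Combine basic tags into a coarser 'composite attribute' tag (hypothetical rule)."""
    result = set(tags)
    if {"arpu:high", "tenure:long"} <= tags:
        result.add("segment:stable_high_value")
    return result


user = {"age": 42, "arpu": 180, "tenure_months": 60}
print(sorted(composite_tags(build_portrait(user))))
```

Adding a dimension only means appending another classifier to `DIMENSIONS`, which mirrors how a portrait grows as more classifications are applied.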
For telecom operators, following the principle of risk prevention, the first step is to strengthen day-to-day control and monitoring of system security, including process planning, tiered permissions, download monitoring and compliance inspections (see Table 1). The second step is to use big data technology for threat intelligence and security data analysis. In recent years, as network attack and defense techniques have evolved, APTs (advanced persistent threats), which rely on social engineering, no longer target user information alone; they are long-term threats, pursued for specific purposes, to the security of the entire network system. The advent of APTs has shifted telecom operators' approach to data security from vulnerability-centric real-time defense to security analytics management based on threat intelligence. Security analytics management uses big data technology to analyze and mine the following business system data:
① Business process data: the enterprise's organizational structure, business links, business chain, staff levels and division of responsibilities, attendance records, and so on. These data are often hard to obtain directly from machines, but they help search for and locate potential threat points;
② Network data: FPC (full packet capture) data, PSTR (packet string) data, and session or flow data;
③ Device, host and application logs: web proxy logs, router and firewall logs, VPN logs, Windows security and system logs, etc.;
④ Alert data: notifications sent when detection tools find anomalies.
Through comprehensive analysis of these data, signs of intrusion can be found in time, with the aim of stopping attackers before they complete their mission and tracing them back to their source.
Of course, the day-to-day real-time defenses described above remain the foundation of security management and provide defense in depth; without them, the subsequent security analysis would be moot.
Finally, data desensitization is performed to protect the privacy of personal data. Desensitization applies mainly when data are used for analysis and released. The main techniques at present are masking, generalization, encryption, distortion and merging. Masking hides part of the information. Distortion perturbs the original data by adding noise while keeping the statistical properties of the original data unchanged. Encryption encapsulates the data using cryptography; it offers the best protection but is expensive. Generalization replaces data with a more abstract description; for example, an age of 18 can be generalized to the range [14, 25]. Merging quantizes numeric indicators into classes according to a given standard to form attribute parameters; for example, handsets worth more than 5,000 yuan are placed in the high-end class and recorded as parameter 1. Merging combines the properties of distortion and generalization and is often used in user portraits.
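The four non-cryptographic techniques can be sketched in a few lines of Python. This is an illustration under assumptions, not the operator's implementation: the 11-digit number format, the age bins other than [14, 25], and the noise scale are hypothetical, and the noise is Laplace-distributed as in the Laplace mechanism of differential privacy. Encryption is omitted because it should rely on a vetted cryptographic library.

```python
import random


def mask_phone(number: str) -> str:
    """Masking: hide the middle four digits of an 11-digit mobile number."""
    return number[:3] + "****" + number[7:]


# Generalization: replace an exact age with the range containing it.
# Only the [14, 25] bin comes from the text; the other edges are assumptions.
AGE_BINS = [(0, 13), (14, 25), (26, 40), (41, 60), (61, 120)]


def generalize_age(age: int) -> tuple:
    for low, high in AGE_BINS:
        if low <= age <= high:
            return (low, high)
    raise ValueError(f"age out of range: {age}")


def perturb(values, scale, seed=0):
    """Distortion: add zero-mean Laplace noise with the given scale, generated
    as the difference of two exponential variables, so the expected mean of
    the data is unchanged."""
    rng = random.Random(seed)
    return [v + rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for v in values]


def merge_handset_price(price_yuan: float) -> int:
    """Merging: map a handset price to a class parameter (1 = high-end, > 5000 yuan)."""
    return 1 if price_yuan > 5000 else 0


print(mask_phone("13812345678"), generalize_age(18), merge_handset_price(6999))
```

Merging and generalization discard detail deterministically, while `perturb` keeps record-level granularity but blurs each value; a larger `scale` gives stronger protection at the cost of accuracy.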
At present, many departments closely tied to people's daily lives, such as public security household registration, social security and housing provident fund management, hold sensitive information including home addresses, social relations, work history and personal income, and are therefore preferred targets for hackers. Yet these departments have not stood idle: while continuously improving personal information security and privacy protection technology, they make full use of big data technology to provide comprehensive services to the public. What these departments can do, telecom operators can do as well.
2. Main data sources and technical framework
2.1 Main data sources
The data used for mobile phone user portraits fall into four categories: demographic attribute data, social network data, behavioral preference data and other data.
2.1.1 Demographic attribute data
Demographic attribute data are basic information such as the mobile phone user's name, age, gender, handset model, unique user identifier and subscribed package type, together with extended information such as the number the user actually uses, where the phone was registered and the residential address on the user's ID card. With the real-name registration system for mobile phone cards in force since September 1, 2015, "temporary accounts", "group cards", "agent cards", incompletely registered cards, and cases where the registered owner and the actual holder do not match will be phased out, so this information will become important basic data for mobile phone user portraits.
2.1.2 Social network data
The mobile phone is an important communication tool, and a user's social network can be mapped from the records of calls the user makes and receives. Social network research takes two forms. The first, called the "ego-centric network", takes one person as the core node and focuses on the relationships between that node and the other nodes and on the strength of those connections. The second, called the "socio-centric network", takes the whole network as its object, covering all members within a defined scope, and focuses on the network's structure and on how information spreads within it. Social network analysis uses both static and dynamic interaction methods; common research tools include Ucinet, Pajek, NWB, NodeXL and Gephi.
Social network analysis for mobile phone user portraits mainly concerns the first form, the ego-centric network. Whether two users have called each other within a given period, how long and at what times they talked, and other factors can be used to mark the degree of social interaction and the stability of the relationship. For example, some argue that, judging by call records, users who place more calls than they receive may hold a relatively dominant position. A single indicator is prone to misjudgment, however: a user who places many calls may be a courier in the logistics industry, and a user who receives many calls may be a tour guide or a conference organizer. Other factors must therefore be considered, such as the duration of two-way calls over a period. In addition, following the idea that "birds of a feather flock together", information about the closely connected group in a user's social network, such as their ARPU and the overall price level of their handsets, can indirectly reflect the user's social environment and status. Social networks can also alleviate information asymmetry and exert a "reputation constraint" on mobile phone users' behavior.
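A minimal sketch of how call detail records could be turned into ego-centric tie measures along the lines above. The (caller, callee, seconds) record layout, the rule that a close tie needs two-way calling plus at least 600 seconds of total talk time, and the ARPU lookup are assumptions for illustration only.

```python
from collections import defaultdict


def ego_ties(ego, calls):
    """Summarise the ego user's ties from call detail records.

    calls: iterable of (caller, callee, duration_seconds) tuples (hypothetical format).
    Returns {contact: {"out": n, "in": n, "seconds": total, "two_way": bool}}.
    """
    ties = defaultdict(lambda: {"out": 0, "in": 0, "seconds": 0})
    for caller, callee, seconds in calls:
        if caller == ego:
            tie = ties[callee]
            tie["out"] += 1
        elif callee == ego:
            tie = ties[caller]
            tie["in"] += 1
        else:
            continue  # the call does not involve the ego user
        tie["seconds"] += seconds
    for tie in ties.values():
        tie["two_way"] = tie["out"] > 0 and tie["in"] > 0
    return dict(ties)


def close_contacts(ties, min_seconds=600):
    """Close ties: calls in both directions plus enough total talk time (assumed threshold)."""
    return {c for c, t in ties.items() if t["two_way"] and t["seconds"] >= min_seconds}


def neighbourhood_arpu(contacts, arpu):
    """'Birds of a feather': mean ARPU of close contacts as an indirect status signal."""
    known = [arpu[c] for c in contacts if c in arpu]
    return sum(known) / len(known) if known else None
```

Requiring two-way calling is what keeps a courier's many outgoing calls, or a tour guide's many incoming ones, from being mistaken for close relationships.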
2.1.3 Behavioral preference data
There are two ways to obtain mobile phone users' Internet behavior. The first is to mine the logs of websites that the telecom operator runs itself. For example, China Telecom's Best Tone website has modules for travel, group buying, performance tickets, shopping and local information queries, as well as water, electricity and cable TV bill payment; when users log in to browse and shop, their behavior is recorded in the site logs. The second is signaling analysis of the mobile Internet. Compared with website log analysis, collecting this behavior data is more complex; at present it is mainly done by analyzing Gb interface signaling. Signaling decoding and analysis systems such as Wireshark and Compass decode the captured Gb messages, and the domain names of the visited sites, or text and image analysis of the visited web pages, are used to identify the content accessed, finally yielding an analysis of Internet behavior. A user's GPRS Internet session passes through five steps: attach, PDP context activation, WAP connection, data transfer and connection release. Gb messages are mainly captured in the WAP connection phase, and the signaling data can be acquired by optical splitting or by switch port mirroring. The data obtained for each user include the session start time (online time), the session end time (offline time), the cell in which the user is located, the types of web pages browsed, and the traffic transferred for each site visited.
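Once Gb signaling has been decoded into per-session records, the user-level fields listed above can be aggregated roughly as follows. The session tuple layout and the domain-to-category table are hypothetical; a real system would classify pages by domain lists or content analysis as described. Overlapping sessions are simply summed in this sketch.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical mapping from visited domain to a browsing category.
DOMAIN_CATEGORY = {
    "news.example.com": "news",
    "shop.example.com": "shopping",
    "video.example.com": "video",
}


def summarise_sessions(sessions):
    """sessions: (user, online_time, offline_time, domain, bytes) tuples with
    ISO-8601 timestamps.

    Returns {user: {"online_minutes": float, "bytes_by_category": Counter,
    "top_category": str}}.
    """
    summary = defaultdict(lambda: {"online_minutes": 0.0, "bytes_by_category": Counter()})
    for user, start, end, domain, nbytes in sessions:
        entry = summary[user]
        span = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        entry["online_minutes"] += span.total_seconds() / 60
        # Unknown domains fall into an "other" bucket.
        entry["bytes_by_category"][DOMAIN_CATEGORY.get(domain, "other")] += nbytes
    for entry in summary.values():
        # Preference = category with the most transferred traffic.
        entry["top_category"] = entry["bytes_by_category"].most_common(1)[0][0]
    return dict(summary)
```

Ranking categories by traffic rather than by visit count is itself a design choice: it favors heavy media use, so a visit-count ranking may suit other scenarios better.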
In addition, with the user's permission, base station positioning technology can record the user's location and movement track.
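Base station location records are usually a time-ordered sequence of cell IDs. One simple way to turn them into an activity track is to collapse consecutive events in the same cell into stays. The event format and the 30-minute minimum stay are assumptions, and this processing presupposes the user's consent.

```python
from datetime import datetime


def stay_segments(events, min_minutes=30):
    """Collapse time-ordered (iso_timestamp, cell_id) events into stays.

    Consecutive events in the same cell form one stay, running from its first
    to its last event. Stays shorter than min_minutes (an assumed threshold)
    are treated as pass-through and dropped.
    """
    segments = []
    for ts, cell in events:
        t = datetime.fromisoformat(ts)
        if segments and segments[-1]["cell"] == cell:
            segments[-1]["end"] = t
        else:
            segments.append({"cell": cell, "start": t, "end": t})
    return [
        (s["cell"], s["start"], s["end"])
        for s in segments
        if (s["end"] - s["start"]).total_seconds() >= min_minutes * 60
    ]
```

The resulting list of (cell, start, end) stays is a compact track; frequently repeated night-time and daytime cells are typical inputs for inferring home and work areas.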
2.1.4 Other (anomaly) data
Examples include a single real-name user holding many mobile numbers (more than 10), a user frequently changing numbers within a short period, and unpaid bills. To distinguish unintentional arrears, attention should focus on the cumulative overdue amount and the longest overdue record.
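The anomaly signals above can be expressed as simple flags per real-name ID. In this sketch only the "more than 10 numbers" rule comes from the text; the 90-day window, the limit of 3 number changes within it, and the input layouts are hypothetical.

```python
from datetime import date


def anomaly_flags(numbers_under_id, number_change_dates, overdue_records,
                  max_numbers=10, window_days=90, max_changes=3):
    """Flag the anomalies described above for one real-name ID.

    numbers_under_id: mobile numbers registered to the ID.
    number_change_dates: dates on which the user switched numbers.
    overdue_records: list of (amount_yuan, days_overdue) for unpaid bills.
    window_days and max_changes are hypothetical thresholds.
    """
    changes = sorted(number_change_dates)
    # Frequent changes: more than max_changes switches inside any window.
    frequent_changes = any(
        sum(1 for d in changes if 0 <= (d - start).days <= window_days) > max_changes
        for start in changes
    )
    return {
        "too_many_numbers": len(numbers_under_id) > max_numbers,
        "frequent_number_changes": frequent_changes,
        # Cumulative and longest overdue values separate habitual arrears
        # from a one-off, unintentional late payment.
        "cumulative_overdue_yuan": sum(amount for amount, _ in overdue_records),
        "longest_overdue_days": max((days for _, days in overdue_records), default=0),
    }


flags = anomaly_flags(
    numbers_under_id=["13800000001", "13800000002"],
    number_change_dates=[date(2015, 3, 1), date(2015, 3, 20)],
    overdue_records=[(120.5, 15), (80.0, 62)],
)
print(flags)
```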
Of course, user portrait data can be further combined according to the needs of the application scenario. For example, starting from a user's social network (2.1.2), one can identify the stable, intimate members of the network and then analyze their age structure and other demographic attributes (2.1.1), their Internet behavior (2.1.3), and especially any anomalies among individual close contacts (2.1.4). Such combined analysis has become one of the main channels for finding leads in criminal investigations.
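Combining the categories amounts to a join keyed on phone number. The sketch below covers the demographic (2.1.1) and anomaly (2.1.4) sides for a set of close contacts taken from 2.1.2; the dictionary layouts and the age bands are hypothetical.

```python
def profile_close_circle(close_contacts, demographics, anomalies):
    """Join a user's close contacts with their demographic data and anomaly flags.

    demographics: {number: {"age": int}}; anomalies: {number: set of flag names}.
    Both layouts are hypothetical.
    """
    ages = [demographics[c]["age"] for c in close_contacts if c in demographics]
    age_structure = {
        "under_30": sum(a < 30 for a in ages),
        "30_to_50": sum(30 <= a <= 50 for a in ages),
        "over_50": sum(a > 50 for a in ages),
    }
    # Contacts carrying at least one anomaly flag, in a stable order.
    flagged = sorted(c for c in close_contacts if anomalies.get(c))
    return {"age_structure": age_structure, "flagged_contacts": flagged}
```

Internet behavior summaries could be joined in the same way; they are left out here to keep the example short.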
2.2 MPP+Hadoop big data technology framework
User portraits are not holographic, and big data is characterized by large volume but often sparse value. Mining valuable features from massive data therefore presupposes a cost-effective, affordable big data technology solution.
Much like the "impossible trinity" in economics, a three-way dilemma has long troubled resource coordination and management for data storage and query analysis: because of their inherent characteristics, Hadoop and MPP can each satisfy only two of the following three requirements, never all three. Specifically, data analysis aims to achieve the following objectives:
(1) Real-time performance. Single-node execution systems have a clear advantage here, MPP excels in this respect, and other approaches weaken real-time performance to some extent. Although the more recent Spark technology has helped Hadoop improve its real-time performance, implementation costs remain high and the related technologies still need to mature.
(2) Scalability, that is, the ability to expand as data volume grows. Beyond a certain scale, MPP expansion is limited by factors such as data transmission, whereas Hadoop MapReduce performs better in this respect.
(3) The ability to handle complex queries and complex analysis. Both Hadoop and MPP can implement these through algorithms, but they differ in difficulty and maturity.
Figure 1: The operator's "MPP+Hadoop" big data technology framework
The "MPP+Hadoop" hybrid model adopted by telecom operators resolves this dilemma (see Figure 1). MPP mainly handles data from the BSS (business support system) domain and part of the OSS (operation support system, i.e. network management) domain: data with high requirements for accuracy and real-time performance, mainly user identity information, billing records, arrears information, package information, registered address, access network type and user terminal type. Hadoop mainly handles data from the MSS (management support system) domain, the NSS (network security system) domain and part of the OSS domain, including location trajectories, online time, call duration and frequency, application usage time, Internet preferences, complaints, user perception, social networks and security threat intelligence.
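The division of labor above can be read as a routing rule. The sketch below is a hypothetical encoding of it: the data-type names are illustrative labels, not identifiers from any real system, and splitting OSS data by type is an interpretation of "part of the OSS domain".

```python
# Hypothetical routing table following the MPP/Hadoop split described above.
MPP_TYPES = {"identity", "billing", "arrears", "package", "registered_address",
             "access_network_type", "terminal_type"}
HADOOP_TYPES = {"location_trajectory", "online_time", "call_duration", "call_frequency",
                "app_usage_time", "internet_preference", "complaint",
                "user_perception", "social_network", "threat_intelligence"}


def route(domain: str, data_type: str) -> str:
    """Return the platform ("MPP" or "Hadoop") that stores and analyses a data item."""
    if domain == "BSS":
        return "MPP"  # accuracy- and latency-sensitive business data
    if domain in ("MSS", "NSS"):
        return "Hadoop"  # large-volume, batch-oriented data
    if domain == "OSS":  # OSS is split by the nature of the data
        if data_type in MPP_TYPES:
            return "MPP"
        if data_type in HADOOP_TYPES:
            return "Hadoop"
    raise ValueError(f"no routing rule for {domain}/{data_type}")
```

Failing loudly on an unknown combination, rather than defaulting to one platform, keeps new data sources from silently landing on the wrong side of the trade-off.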
3. Application example of user portraits in credit reporting
In March 2015, China Unicom and Wing Lung Bank jointly established Merchants Union Consumer Finance Company and actively developed Internet consumer finance business. Because such consumer finance products require neither collateral nor guarantees, they can adopt relatively flexible credit policies and extend credit more widely, but they also face certain risks. To improve repayment performance, telecom operators need to manage personal consumer credit from the perspective of user portraits.
Personal credit evaluation is essentially a classification problem. The basic process of personal credit reporting is to integrate local information scattered across different sources into a complete description of a consumer's credit status, so the operator's user portrait method also applies to personal credit assessment (see Figure 2).
Figure 2: Application flow of mobile phone user portraits in personal credit evaluation
In this sense, applying a user portrait means re-merging the user's tags to suit the scenario and re-ranking the importance of the tags. It is commonly assumed that portrait samples can be drawn from banks' personal credit records, but using the repayment of loans already granted for this kind of discrimination essentially falls into an "a priori bias" trap. Borrowers who received loans had already been reviewed and screened by the bank's risk control department, so the observed overdue and bad debts come from samples that passed pre-loan review, not from a genuinely complete sample of first-time applicants. Here we take the Anhui branch of the People's Bank as an example and empirically analyze personal loan applications from part of the province. First, 3,525 mobile phone users who applied for personal loans were selected as samples, and their credit was evaluated in two ways: one is the bank's credit decision based on its existing audit data for the applicants, and the other is credit evaluation using the mobile phone user portrait method. The steps are as follows:
3.1 First, staff of the bank's risk control department graded all samples according to the bank's own credit rating standard, which is based on the analytic hierarchy process (AHP); the specific indicators are shown in Table 2. Samples are generally divided into 9 levels. To simplify the distinction further, we take the bank's final lending decision as the basis and divide the samples into two classes, "grantable" and "not grantable". (Of course, this does not guarantee that no bad loans will occur in the future; a small amount of bad debt is normal in the banking business.)
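Table 2 is not reproduced here, so the following is only a generic AHP sketch. It derives indicator weights from a pairwise comparison matrix with the row geometric-mean approximation, checks consistency with Saaty's random index, and collapses a weighted 1 to 9 score into the two classes used in the study. The example matrix and the cutoff of 5 are hypothetical, not the bank's rule.

```python
import math

# Saaty's random consistency index for comparison matrices of order n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method.

    Returns (weights, consistency_ratio); a ratio below 0.1 is the usual
    acceptance threshold for the pairwise judgements.
    """
    n = len(matrix)
    geo = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geo)
    weights = [g / total for g in geo]
    # Estimate the principal eigenvalue from A.w ~= lambda_max * w.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] else 0.0
    return weights, cr


def grant_decision(scores, weights, cutoff=5.0):
    """Weighted 1-9 grade, collapsed to the two classes (hypothetical cutoff)."""
    grade = sum(s * w for s, w in zip(scores, weights))
    return ("grantable" if grade >= cutoff else "not grantable"), grade


# Three hypothetical indicators, e.g. income > repayment history > job stability.
weights, cr = ahp_weights([[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]])
print([round(w, 3) for w in weights], round(cr, 6), grant_decision([7, 6, 3], weights))
```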
3.2 According to the needs of the credit application scenario, the quantified indicators of the mobile phone user portrait are re-merged and combined, as shown in Table 3.
3.3 The samples are further divided into two parts: a training set containing 60% of the samples and a test set containing the remaining 40%, with the grantable and not-grantable samples making up the same proportions in both sets.
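Keeping both classes in the same proportion in the training and test sets is a stratified split. A minimal stdlib version, assuming samples and labels arrive as parallel lists and using an arbitrary fixed seed for reproducibility:

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, train_frac=0.6, seed=42):
    """Split into training/test sets so each class keeps the same share in both."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for label, members in by_class.items():
        rng.shuffle(members)  # random order within each class
        cut = round(len(members) * train_frac)
        train += [(s, label) for s in members[:cut]]
        test += [(s, label) for s in members[cut:]]
    return train, test
```

Splitting within each class, rather than over the pooled samples, avoids a test set that by chance contains too few "not grantable" cases to measure agreement reliably.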
3.4 Using the bank's traditional grading results as labels, a support vector machine (a supervised learning algorithm) is applied to reduce the dimensionality of the mobile phone user portrait features in the training sample and to find the key combination of attribute values. In classification, more variables generally carry more information and yield more accurate judgments; however, more variables also mean more cost and time to collect samples, so the best approach uses fewer variables while still achieving good accuracy. The portrait indicators found to be effective for personal credit are shown in Table 4 (owing to practical constraints, mobile Internet behavior data were not collected or analyzed).
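The paper does not specify how the SVM was used for dimension reduction. One common technique that fits the description is SVM recursive feature elimination (SVM-RFE): train a linear SVM, drop the feature with the smallest absolute weight, and repeat. The pure-Python sketch below assumes that technique, uses a simple full-batch subgradient solver instead of a production library, and runs on synthetic data in which only the first two hypothetical tags determine the label.

```python
import random


def train_linear_svm(X, y, lam=0.5, lr=0.05, epochs=200):
    """Soft-margin linear SVM without intercept (features assumed centred),
    trained by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x))."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the regulariser
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:  # hinge term active
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, names, keep):
    """Recursive feature elimination: retrain, drop the feature with the
    smallest |weight|, repeat until `keep` features remain."""
    active = list(range(len(names)))
    while len(active) > keep:
        w = train_linear_svm([[row[j] for j in active] for row in X], y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Synthetic demo: labels (+1 grantable, -1 not) depend only on the first two tags.
rng = random.Random(0)
names = ["network_tenure", "on_time_payment_rate", "random_tag_a", "random_tag_b"]
X = [[rng.uniform(-1, 1) for _ in names] for _ in range(300)]
y = [1 if row[0] + row[1] > 0 else -1 for row in X]
print(svm_rfe(X, y, names, keep=2))
```

With real portrait data one would standardize the features first and choose `keep` by checking test-set accuracy as features are removed, which matches the trade-off between fewer variables and good accuracy described above.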
3.5 The test samples are classified using the attribute value combination obtained above, and the results are compared with those of the bank's traditional AHP method. The agreement rate is 94.35%, which essentially meets the standard (see Table 5).
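Step 3.5 compares two label vectors on the same test samples. A small helper such as the following (hypothetical, with illustrative label strings) reports the agreement rate together with the grant counts needed to see which of the two methods is stricter.

```python
def agreement_report(bank_labels, portrait_labels, positive="grantable"):
    """Compare the bank's AHP decisions with the portrait method's decisions."""
    if len(bank_labels) != len(portrait_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(bank_labels, portrait_labels))
    agree = sum(b == p for b, p in pairs)
    return {
        "agreement_rate": agree / len(pairs),
        "bank_grantable": sum(b == positive for b, _ in pairs),
        "portrait_grantable": sum(p == positive for _, p in pairs),
        # Granted by the bank but refused by the portrait method.
        "stricter_cases": sum(b == positive and p != positive for b, p in pairs),
        # Refused by the bank but granted by the portrait method.
        "looser_cases": sum(p == positive and b != positive for b, p in pairs),
    }
```

A method that is stricter than the bank shows `stricter_cases` well above `looser_cases`, which is the pattern that the fewer grantable samples noted below suggest.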
Notably, the portrait method classifies fewer samples as grantable than the bank's traditional method, indicating that it is stricter and more conservative, and therefore more robust. Credit evaluation using mobile phone user portraits is thus highly effective in "thin information" situations. In "thick information" scenarios it can still supplement the available information, and using both sources together gives better results.
4. Concluding remarks
In the context of "Internet Plus", big data technology will keep expanding into telecom operators' business applications, and continuous innovation in smartphone technology and functions has driven changes in consumption patterns, the opening of the industry chain and data fusion on a wider scale. Provided that personal information security and privacy protection are properly addressed, customer-centric mobile phone user portraits help operators make full use of existing data resources, achieve precision marketing and personalized services, and carry out useful exploration and practice in personal credit reporting.
About the author
Ding Wei, senior engineer, PhD, mainly engaged in investment planning and big data analysis; Wang Problem, senior engineer, Master's degree, mainly engaged in communication network consulting, planning and design; Liu Nao Hai, associate researcher, PhD, mainly engaged in credit risk management, big data and Internet finance; Han, research assistant, PhD, mainly engaged in smart cities, emergency communications and big data analysis.
Research on Mobile Phone User Portraits and Credit Reporting Based on Big Data Technology