Basic Flow:
Data collection, data modeling, building user profiles (data portraits), and risk pricing.
Data collection: network behavior data, behavior data within enterprise services, user content preference data, user transaction data, authorized data sources, third-party data sources, partner data sources, public data sources.
Data modeling: text mining, natural language processing, machine learning, predictive algorithms, clustering algorithms.
Data portraits: basic attributes, purchasing power, behavioral characteristics, hobbies, psychological characteristics, social networks.
Risk pricing: application model, behavioral monitoring model, default model, and collection strategy model.
At present, more than 90% of modeling teams in China use logistic regression to build scorecards; a few use decision trees.
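To make the scorecard idea concrete, here is a minimal sketch of how a logistic regression output can be turned into score points. It uses the common "points to double the odds" scaling, which is an assumption here and not something the original post specifies. The coefficients, the feature meanings, and the values base_score=600, base_odds=50 and pdo=20 are all hypothetical.

```python
import math

def default_probability(features, weights, bias):
    """Logistic regression: P(default) = sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def score_from_probability(p_default, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a default probability to scorecard points.

    base_score points correspond to good:bad odds of base_odds:1, and every
    additional pdo points doubles those odds. All three values are
    illustrative assumptions, not values from the original post.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p_default) / p_default
    return offset + factor * math.log(odds_good)

# Hypothetical coefficients and applicant, for illustration only.
weights = [0.8, -0.5]   # e.g. past delinquencies, years at current job
bias = -3.0
applicant = [1.0, 2.0]

p = default_probability(applicant, weights, bias)
print("P(default):", round(p, 4), "score:", round(score_from_probability(p)))
```

A higher score means lower estimated risk, so cut-off rules can be written directly against the score.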
Application Cases:
1. Fraud risk model: a social network model.
It judges how likely a new case is to be a fraudulent application from the relationships between cases.
2. Credit risk model: mainly logistic regression, used to build a scorecard.
It quantifies the probability that a new applicant will default, and different credit rules and collection strategies are set based on the score.
3. Post-loan management model: also a scorecard, in this case a behavioral scorecard.
It is used for tasks such as credit limit adjustment and customer risk sub-pool management.
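As a rough illustration of the social network idea in case 1, the sketch below links cases that share an attribute value and measures what share of a new case's direct neighbors are known fraud cases. The attribute names (phone, device), the sample cases, and the neighbor-share measure are assumptions made for this example; real fraud models are considerably richer.

```python
from collections import defaultdict

def build_links(cases):
    """Link cases that share any attribute value (e.g. phone or device id)."""
    by_value = defaultdict(set)
    for case_id, attrs in cases.items():
        for key, value in attrs.items():
            by_value[(key, value)].add(case_id)
    neighbors = defaultdict(set)
    for ids in by_value.values():
        for a in ids:
            neighbors[a] |= ids - {a}
    return neighbors

def fraud_share_of_neighbors(case_id, neighbors, known_fraud):
    """Fraction of a case's direct neighbors that are known fraud cases."""
    linked = neighbors.get(case_id, set())
    if not linked:
        return 0.0
    return len(linked & known_fraud) / len(linked)

# Hypothetical application cases, for illustration only.
cases = {
    "A": {"phone": "111", "device": "d1"},
    "B": {"phone": "222", "device": "d1"},
    "C": {"phone": "222", "device": "d2"},
    "new": {"phone": "111", "device": "d2"},
}
known_fraud = {"A"}

neighbors = build_links(cases)
share = fraud_share_of_neighbors("new", neighbors, known_fraud)
print("neighbors of new case:", sorted(neighbors["new"]), "fraud share:", share)
```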
Notes:
User data needs to be structured and transformed into feature vectors of the same dimension before an algorithm can use it.
For structured data, feature extraction often begins with labeling the data, for example by purchase channel, age, gender, and so on.
Labels that have already been assigned can be discretized according to the analysis scenario, or a categorical label can be split into multiple 0/1 tags. Machine learning modeling such as clustering, classification, prediction, and correlation analysis can then be applied. The resulting vectors often have thousands of dimensions.
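The labeling steps above can be sketched in a few lines: discretize a numeric label, then split each categorical label into 0/1 tags so that every user ends up with a vector of the same length. The age cut points, field names, and sample users are hypothetical.

```python
def age_bucket(age):
    """Discretize age into coarse bands (cut points are illustrative)."""
    if age < 25:
        return "under_25"
    if age < 40:
        return "25_to_39"
    return "40_plus"

def build_vocabulary(records):
    """Collect every (label, value) pair so each one gets a fixed position."""
    pairs = {(k, v) for rec in records for k, v in rec.items()}
    return {pair: i for i, pair in enumerate(sorted(pairs))}

def to_vector(record, vocabulary):
    """Turn one labeled record into a 0/1 vector of fixed length."""
    vec = [0] * len(vocabulary)
    for pair in record.items():
        if pair in vocabulary:
            vec[vocabulary[pair]] = 1
    return vec

# Hypothetical users, for illustration only.
raw_users = [
    {"channel": "app", "age": 23, "gender": "F"},
    {"channel": "web", "age": 41, "gender": "M"},
]
labeled = [
    {"channel": u["channel"], "age": age_bucket(u["age"]), "gender": u["gender"]}
    for u in raw_users
]
vocab = build_vocabulary(labeled)
vectors = [to_vector(r, vocab) for r in labeled]
for v in vectors:
    print(v)
```

With many labels and many distinct values per label, the vocabulary grows quickly, which is how the vector dimension reaches the thousands.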
Actual modeling case: http://blog.csdn.net/l18930738887/article/details/50662900
Big data risk control model