Machine Learning in Practice: Decision Trees, or Helping the Glasses Guy Buy Glasses

Source: Internet
Author: User

A decision tree is an extremely easy-to-understand algorithm. Once the model is built, it is just a series of nested if... else... statements (or nested switch statements).
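For intuition, here is a hand-written sketch of what such a model looks like once flattened into conditionals. The attribute names anticipate the contact-lens example later in this article; the rules shown are a simplified, hypothetical subset of the real tree:

def recommend_lens(tear_rate, astigmatic):
    # Hypothetical hand-written tree: each branch is just an if/else test.
    if tear_rate == 'reduced':
        return 'no lenses'
    else:                          # tear_rate == 'normal'
        if astigmatic == 'yes':
            return 'hard'
        else:
            return 'soft'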

Advantages: low computational complexity, easily interpretable output, insensitivity to missing values, and the ability to handle irrelevant features;

Disadvantages: prone to overfitting the training data;

Applicable data types: numeric and nominal.


Python implementation of a decision tree:

(1) First, implement a few utility functions: entropy calculation, a helper that splits the dataset, and majority-class selection;

(1) Entropy calculation: entropy measures the degree of disorder in a set; the more disordered the set, the greater the entropy:

from math import log

def entropy(dataset):
    # Shannon entropy of the class labels; the label is the last column of each row.
    log2 = lambda x: log(x) / log(2)
    results = {}
    for row in dataset:
        r = row[-1]
        results[r] = results.get(r, 0) + 1
    ent = 0.0
    for r in results:
        p = float(results[r]) / len(dataset)
        ent -= p * log2(p)
    return ent
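A quick sanity check on two hypothetical toy sets (only the last column, the class label, matters): a pure set has entropy 0, while an evenly split two-class set has entropy 1 bit:

pure  = [['a', 'yes'], ['b', 'yes']]   # one class only
mixed = [['a', 'yes'], ['b', 'no']]    # two classes, evenly split
print(entropy(pure))    # 0.0
print(entropy(mixed))   # 1.0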

(2) Extract a subset of the dataset by attribute and value:

def fetch_subdataset(dataset, k, v):
    # Rows whose column k equals v, with column k removed from each row.
    return [d[:k] + d[k+1:] for d in dataset if d[k] == v]
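For example, on two hypothetical rows, asking for column 0 equal to 'young' keeps the first row and strips that column:

rows = [['young', 'myope', 'soft'],
        ['pre',   'myope', 'hard']]
print(fetch_subdataset(rows, 0, 'young'))   # [['myope', 'soft']]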

This function is a single short line: it selects from dataset the rows whose column k has value v, and removes column k from each selected row. Python is simple and elegant.

(3) Compute the majority class. When all decision attributes have been consumed but the data still cannot be uniquely classified, we fall back to majority voting to choose the final class:

def get_max_feature(class_list):
    # Count each class label and return the most frequent one.
    class_count = {}
    for cla in class_list:
        class_count[cla] = class_count.get(cla, 0) + 1
    sorted_class_count = sorted(class_count.items(), key=lambda d: d[1], reverse=True)
    return sorted_class_count[0][0]
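For example, with a hypothetical label list:

print(get_max_feature(['soft', 'hard', 'soft']))   # 'soft'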

(2) A function that selects the optimal way to partition the data:

Which column's values should we split the set on to obtain the maximum information gain?
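For reference, the quantity being maximized is the standard ID3 information gain,

    Gain(D, a) = H(D) - sum over v of (|D_v| / |D|) * H(D_v)

where H is the entropy defined above and D_v is the subset of D whose attribute a takes value v. Since H(D) is the same for every candidate attribute, the function below equivalently minimizes the weighted entropy sum.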

def choose_decision_feature(dataset):
    # Try every feature column; keep the split with the lowest weighted entropy,
    # i.e. the split with the highest information gain.
    best_ent, best_feature = 100000000, -1
    for i in range(len(dataset[0]) - 1):        # the last column is the class label
        feat_values = set(e[i] for e in dataset)
        weighted_ent = 0.0
        for f in feat_values:
            sub_data = fetch_subdataset(dataset, i, f)
            weighted_ent += entropy(sub_data) * len(sub_data) / len(dataset)
        if weighted_ent < best_ent:
            best_ent, best_feature = weighted_ent, i
    return best_feature


(3) Recursive construction of the decision tree:

def build_decision_tree(dataset, datalabel):
    # The tree is a nested dict: {feature_label: {feature_value: subtree_or_class}}.
    cla = [c[-1] for c in dataset]
    if cla.count(cla[0]) == len(cla):
        return cla[0]                        # all rows share one class: leaf node
    if len(dataset[0]) == 1:
        return get_max_feature(cla)          # no features left: majority vote
    feature = choose_decision_feature(dataset)
    feature_label = datalabel[feature]
    decision_tree = {feature_label: {}}
    del datalabel[feature]
    unique_feat_value = set(d[feature] for d in dataset)
    for value in unique_feat_value:
        sub_label = datalabel[:]             # copy so sibling branches don't share labels
        decision_tree[feature_label][value] = build_decision_tree(
            fetch_subdataset(dataset, feature, value), sub_label)
    return decision_tree

(4) Using the decision tree to classify new data:

def classify(decision_tree, feat_labels, test_vec):
    # Walk down the nested dict until a leaf (a plain class label) is reached.
    label = list(decision_tree.keys())[0]    # the feature tested at this node
    next_dict = decision_tree[label]
    feat_index = feat_labels.index(label)
    for key in next_dict:
        if test_vec[feat_index] == key:
            if isinstance(next_dict[key], dict):
                c_label = classify(next_dict[key], feat_labels, test_vec)
            else:
                c_label = next_dict[key]
    return c_label
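Assuming lense_tree is the tree built by test() further down, classifying a single patient looks like this (for the input below, the tree shown later yields 'soft'):

labels = ['age', 'prescript', 'astigmatic', 'tearRate']
# a young, myopic, non-astigmatic patient with a normal tear rate
print(classify(lense_tree, labels, ['young', 'myope', 'no', 'normal']))   # 'soft'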

(5) Decision Tree persistence

(1) Storage

import pickle

def store_decision_tree(tree, filename):
    # pickle requires the file to be opened in binary mode.
    with open(filename, 'wb') as f:
        pickle.dump(tree, f)

(2) Reading

def load_decision_tree(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)
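A round trip, again assuming lense_tree from test() below ('lense_tree.pkl' is an arbitrary filename):

store_decision_tree(lense_tree, 'lense_tree.pkl')
assert load_decision_tree('lense_tree.pkl') == lense_tree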

(6) Finally, it is time to return to the topic and help the glasses guy buy glasses.

The following contact-lens dataset comes from the UCI machine-learning repository. It records the observed eye conditions of many patients along with the contact-lens type the doctor recommended; the lens types are hard, soft, and "no lenses" (contact lenses not suitable).

The data is as follows:

(Columns: age, prescript, astigmatic, tearRate, recommended lens; in lenses.txt the fields are tab-separated with no header row.)

young       myope  no   reduced  no lenses
young       myope  no   normal   soft
young       myope  yes  reduced  no lenses
young       myope  yes  normal   hard
young       hyper  no   reduced  no lenses
young       hyper  no   normal   soft
young       hyper  yes  reduced  no lenses
young       hyper  yes  normal   hard
pre         myope  no   reduced  no lenses
pre         myope  no   normal   soft
pre         myope  yes  reduced  no lenses
pre         myope  yes  normal   hard
pre         hyper  no   reduced  no lenses
pre         hyper  no   normal   soft
pre         hyper  yes  reduced  no lenses
pre         hyper  yes  normal   no lenses
presbyopic  myope  no   reduced  no lenses
presbyopic  myope  no   normal   no lenses
presbyopic  myope  yes  reduced  no lenses
presbyopic  myope  yes  normal   hard
presbyopic  hyper  no   reduced  no lenses
presbyopic  hyper  no   normal   soft
presbyopic  hyper  yes  reduced  no lenses
presbyopic  hyper  yes  normal   no lenses


The test procedure is as follows:

def test():
    # lenses.txt holds the tab-separated rows shown above.
    with open('lenses.txt') as f:
        lense_data = [line.strip().split('\t') for line in f]
    lense_label = ['age', 'prescript', 'astigmatic', 'tearRate']
    lense_tree = build_decision_tree(lense_data, lense_label)
    return lense_tree

The test result is the following nested dict (pretty-printed here for readability; dict key order may differ):

{'tearRate': {'reduced': 'no lenses',
              'normal': {'astigmatic': {'no': {'age': {'young': 'soft',
                                                       'pre': 'soft',
                                                       'presbyopic': {'prescript': {'myope': 'no lenses',
                                                                                    'hyper': 'soft'}}}},
                                        'yes': {'prescript': {'myope': 'hard',
                                                              'hyper': {'age': {'young': 'hard',
                                                                                'pre': 'no lenses',
                                                                                'presbyopic': 'no lenses'}}}}}}}}

Read from the root: a reduced tear rate always means no lenses; with a normal tear rate, non-astigmatic patients mostly get soft lenses, and astigmatic myopes get hard lenses.


The glasses guy can finally buy the right glasses...


All the code is put together below:

from math import log
import pickle


def entropy(dataset):
    # Shannon entropy of the class labels; the label is the last column of each row.
    log2 = lambda x: log(x) / log(2)
    results = {}
    for row in dataset:
        r = row[-1]
        results[r] = results.get(r, 0) + 1
    ent = 0.0
    for r in results:
        p = float(results[r]) / len(dataset)
        ent -= p * log2(p)
    return ent


def fetch_subdataset(dataset, k, v):
    # Rows whose column k equals v, with column k removed from each row.
    return [d[:k] + d[k+1:] for d in dataset if d[k] == v]


def get_max_feature(class_list):
    # Count each class label and return the most frequent one.
    class_count = {}
    for cla in class_list:
        class_count[cla] = class_count.get(cla, 0) + 1
    sorted_class_count = sorted(class_count.items(), key=lambda d: d[1], reverse=True)
    return sorted_class_count[0][0]


def choose_decision_feature(dataset):
    # Keep the split with the lowest weighted entropy (highest information gain).
    best_ent, best_feature = 100000000, -1
    for i in range(len(dataset[0]) - 1):
        feat_values = set(e[i] for e in dataset)
        weighted_ent = 0.0
        for f in feat_values:
            sub_data = fetch_subdataset(dataset, i, f)
            weighted_ent += entropy(sub_data) * len(sub_data) / len(dataset)
        if weighted_ent < best_ent:
            best_ent, best_feature = weighted_ent, i
    return best_feature


def build_decision_tree(dataset, datalabel):
    # The tree is a nested dict: {feature_label: {feature_value: subtree_or_class}}.
    cla = [c[-1] for c in dataset]
    if cla.count(cla[0]) == len(cla):
        return cla[0]
    if len(dataset[0]) == 1:
        return get_max_feature(cla)
    feature = choose_decision_feature(dataset)
    feature_label = datalabel[feature]
    decision_tree = {feature_label: {}}
    del datalabel[feature]
    unique_feat_value = set(d[feature] for d in dataset)
    for value in unique_feat_value:
        sub_label = datalabel[:]
        decision_tree[feature_label][value] = build_decision_tree(
            fetch_subdataset(dataset, feature, value), sub_label)
    return decision_tree


def store_decision_tree(tree, filename):
    with open(filename, 'wb') as f:
        pickle.dump(tree, f)


def load_decision_tree(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)


def classify(decision_tree, feat_labels, test_vec):
    # Walk down the nested dict until a leaf (a plain class label) is reached.
    label = list(decision_tree.keys())[0]
    next_dict = decision_tree[label]
    feat_index = feat_labels.index(label)
    for key in next_dict:
        if test_vec[feat_index] == key:
            if isinstance(next_dict[key], dict):
                c_label = classify(next_dict[key], feat_labels, test_vec)
            else:
                c_label = next_dict[key]
    return c_label


def test():
    with open('lenses.txt') as f:
        lense_data = [line.strip().split('\t') for line in f]
    lense_label = ['age', 'prescript', 'astigmatic', 'tearRate']
    return build_decision_tree(lense_data, lense_label)


if __name__ == "__main__":
    tree = test()
    print(tree)

