Topic models with gensim
http://blog.csdn.net/pipisorry/article/details/42129099
Step 1: Install gensim (for example, with pip install gensim)
Step 2: Corpora and Vector Spaces
Convert each document, given as a string, into a document vector of (token id, count) pairs:
from gensim import corpora

documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey",
]

# Disabled alternative: use StemmedCountVectorizer (a user-defined class,
# not part of scikit-learn) to get a stemmed vocabulary without stop words.
# Vectorizer = StemmedCountVectorizer
# # Vectorizer = CountVectorizer
# vectorizer = Vectorizer(stop_words='english')
# vectorizer.fit_transform(documents)
# texts = vectorizer.get_feature_names()
# print(texts)

# Tokenize: lowercase each document and split it on whitespace
texts = [doc.lower().split() for doc in documents]
# print(texts)

# Build the dictionary that maps each token to an integer id
dictionary = corpora.Dictionary(texts)
# print(dictionary, dictionary.token2id)

# Use the dictionary to convert each tokenized document
# into a document vector of (token id, count) pairs
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)
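To make the target representation concrete, here is a minimal pure-Python sketch of the bag-of-words conversion. It is an illustration of the idea only, not gensim's implementation: the helper names build_token2id and doc_to_bow are hypothetical, and ids are assigned in first-seen order, so the ids gensim assigns may differ.

```python
from collections import Counter


def build_token2id(texts):
    """Assign each distinct token an integer id, in first-seen order."""
    token2id = {}
    for tokens in texts:
        for token in tokens:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def doc_to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs; unknown tokens are skipped."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Two of the documents from the text, tokenized the same way
documents = [
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
]
texts = [doc.lower().split() for doc in documents]

token2id = build_token2id(texts)
corpus = [doc_to_bow(text, token2id) for text in texts]

print(token2id)
print(corpus)
```

Each document becomes a sparse vector: only tokens that occur in it appear, and a repeated token such as "system" in the second document is stored once with a count of 2.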
[http://www.52nlp.cn/javase]
From: http://blog.csdn.net/pipisorry/article/details/42129099
Ref: http://radimrehurek.com/gensim/tutorial.html