Clustering based on w2v word vectors (problem still to be solved)

Source: Internet
Author: User

1. The code for training the word vectors is as follows:

from gensim.models import Word2Vec

# Train a vector representation for each word
def w2v_train(self):
    # Use the text of every question as the corpus for training a w2v model
    ques = self.cu.execute('select question from activity')
    da_all = []
    for d in ques:
        da_all.append(d[0])
    sentences = self.get_text(da_all)
    model = Word2Vec()
    model.build_vocab(sentences)
    # model.epochs was called model.iter in older gensim releases
    model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)
    model.save("./tmp/user_w2corpus")
Training produces one vector for each word.
2. Retrieve each of a user's questions, segment it into words, then cluster:
import gensim
import numpy as np

def simmetric_topic_a(self, clust_num, userid):
    from sklearn.cluster import KMeans
    from sklearn.externals import joblib  # on scikit-learn 0.23+, use "import joblib" instead
    texts = self.get_dict(userid)[1]  # vocabulary
    texts_len = len(texts)
    model = gensim.models.Word2Vec.load('./tmp/user_w2corpus')
    texts_vec = []  # holds the vector computed for each sentence, i.e. the sentence vectors to return
    X = []
    for text in texts:  # loop over every sentence
        # The default w2v vector dimension is 100, so initialize to 100 zeros.
        # This was originally initialized to a plain 0, but then a sentence whose
        # only word was not in the trained vocabulary had a different dimension
        # from the other sentence vectors.
        text_vec = np.zeros((100,))
        for t in text:  # sum the vectors of the words in each sentence
            try:
                # text_vec += model.wv[t]  # accumulate the sentence vector
                # Add the word's vector to X; a word that appears in several
                # documents is added several times
                X.append(model.wv[t])
            except Exception as e:
                print("the trained vector set does not contain this word")
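The listing above stops before the clustering step, so here is a minimal sketch of how the remaining idea could look: average the word vectors of each sentence into a fixed-length sentence vector, then run k-means over those vectors. It is not the author's code. To stay runnable without gensim or scikit-learn, it assumes a hypothetical toy dictionary `TOY_VECTORS` in place of the trained model and a small hand-written Lloyd's iteration in place of `sklearn.cluster.KMeans`; the helper names `sentence_vector` and `kmeans` are also made up for illustration.

```python
import random

# Hypothetical 3-dimensional embeddings standing in for a trained w2v model
# (a real model would give 100-dimensional vectors by default).
TOY_VECTORS = {
    "cat": [1.0, 0.0, 0.1],
    "dog": [0.9, 0.1, 0.0],
    "pet": [1.0, 0.1, 0.1],
    "stock": [0.0, 1.0, 0.9],
    "bond": [0.1, 0.9, 1.0],
    "market": [0.0, 1.0, 1.0],
}
DIM = 3


def sentence_vector(words, vectors, dim):
    """Average the vectors of known words; return all zeros if none are known."""
    total = [0.0] * dim
    known = 0
    for word in words:
        vec = vectors.get(word)
        if vec is None:  # word missing from the vocabulary: skip it
            continue
        total = [a + b for a, b in zip(total, vec)]
        known += 1
    if known == 0:
        return total  # still has length dim, so every sentence vector lines up
    return [a / known for a in total]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point the label of its nearest center
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


sentences = [["cat", "dog"], ["pet", "cat"], ["stock", "market"], ["bond", "unknown"]]
points = [sentence_vector(s, TOY_VECTORS, DIM) for s in sentences]
labels = kmeans(points, k=2)
print(labels)
```

Averaging rather than summing keeps long and short sentences on a comparable scale, and the zero-vector fallback addresses the dimension-alignment issue mentioned in the code comment above. With the real model, the lookup would go through `model.wv` and the clustering through scikit-learn's `KMeans`.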
