CountVectorizer
From: Python learning: text feature extraction (II), CountVectorizer, TfidfVectorizer, Chinese processing (CSDN Blog)
The usual input is a list in which each element is a string representing one article. Each string must already be segmented into words. CountVectorizer also works with Chinese text.
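The input form described above can be sketched as follows. This is a minimal illustration that assumes the articles were already split into words by some segmenter (for Chinese, a tool such as jieba is a common choice, but the source does not name one). The name `segmented_articles` and the sample sentences are made up for the example.

```python
# Hypothetical output of a word segmenter: one list of tokens per article.
segmented_articles = [
    ["我", "喜欢", "自然", "语言", "处理"],
    ["文本", "特征", "提取"],
]

# Join each article's tokens with spaces, giving one string per article.
corpus = [" ".join(tokens) for tokens in segmented_articles]

print(corpus)
```

The resulting `corpus` is a plain list of strings, which is the shape the paragraph above describes.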
| Parameter | Function |
| --- | --- |
| stop_words | Stop-word list; a custom list of stop words can be supplied |
| token_pattern | Filter rule: the regular expression that decides what counts as a token |
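The effect of the two parameters above can be imitated in plain Python. The sketch below is not scikit-learn's implementation: `tokenize` is a hypothetical helper that lowercases the text, extracts tokens with a regular expression, and drops stop words. The first pattern is CountVectorizer's default `token_pattern`, which only keeps tokens of two or more characters, so single-character Chinese words disappear unless the pattern is changed.

```python
import re

# Default token_pattern of CountVectorizer: tokens of 2+ word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# Alternative pattern that also keeps single-character tokens.
SINGLE_CHAR_PATTERN = r"(?u)\b\w+\b"


def tokenize(doc, token_pattern=DEFAULT_PATTERN, stop_words=()):
    """Hypothetical helper: lowercase, extract tokens, drop stop words."""
    stop_set = set(stop_words)
    tokens = re.findall(token_pattern, doc.lower())
    return [token for token in tokens if token not in stop_set]


doc = "我 爱 自然 语言 处理"  # already segmented, words separated by spaces

default_tokens = tokenize(doc)                            # single-character words dropped
all_tokens = tokenize(doc, SINGLE_CHAR_PATTERN)           # single-character words kept
filtered_tokens = tokenize(doc, SINGLE_CHAR_PATTERN, ["我"])  # custom stop word removed

print(default_tokens)
print(all_tokens)
print(filtered_tokens)
```

When single-character words matter, passing a pattern like `SINGLE_CHAR_PATTERN` as `token_pattern` is one way to keep them.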
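| Attribute | Function |
| --- | --- |
| vocabulary_ | Vocabulary; a dict that maps each term to its column index |
| get_feature_names() | Vocabulary of all texts; list type (replaced by get_feature_names_out() in newer scikit-learn releases) |
| stop_words_ | Returns the set of terms that were ignored during fitting |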
CountVectorizer converts the words in the texts into a term-frequency matrix through the fit_transform function. The matrix element a[i][j] is the frequency of word j in text i, that is, the number of times that word occurs in that text. get_feature_names() lists the keywords found across all texts, and toarray() shows the resulting term-frequency matrix.
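A minimal sketch of this workflow, assuming scikit-learn is installed. The two sample sentences are made up for illustration. To stay independent of the scikit-learn version, the term list is read from `vocabulary_` (sorted by column index) rather than from `get_feature_names()`.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy documents, already split into space-separated words.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(corpus)  # sparse term-frequency matrix

# vocabulary_ maps each term to its column index; sort terms by that index.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
counts = matrix.toarray()  # dense array: counts[i][j] = count of terms[j] in corpus[i]

print(terms)
print(counts)
```

Each row of `counts` corresponds to one text and each column to one term, so the last column shows that "the" occurs twice in both sentences.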
| Method | Function |
| --- | --- |
| fit_transform(X) | Fits the model and returns the term-frequency matrix of the texts |