Term frequency analyzer


If you reprint this article, please credit the source: http://blog.csdn.net/u012027907

We sometimes need to count how often each word occurs in an article. A word frequency analyzer solves this problem.

Basic Idea:

First, store the text, read either from a file or from user input, in a string, then scan it forward from the first character. When a letter is encountered, start a new word with it and keep scanning: as long as the next character is also a letter, append it to the current word; the first non-letter character ends the word. Repeating this until the end of the string separates out every word. Each extracted word is then compared in turn with the words already stored. If it matches one, that word's count is incremented by one instead of storing the word again; otherwise it is stored as a new word with a count of one.
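The scan described above can be sketched compactly in standard C++. This is only an illustration of the idea, not the code used in this article: the function name `countWords` is hypothetical, and it assumes a `std::map` for the word table instead of the fixed arrays used later.

```cpp
#include <cctype>
#include <map>
#include <string>

// Scan the text once: letters extend the current word, anything else ends it.
// countWords is a hypothetical helper, not part of the ReadWord class.
std::map<std::string, int> countWords(const std::string &text)
{
    std::map<std::string, int> counts;
    std::string current;
    for (unsigned char ch : text)
    {
        if (std::isalpha(ch))
        {
            current += static_cast<char>(std::tolower(ch));
        }
        else if (!current.empty())
        {
            ++counts[current];  // a new word starts at 0 and becomes 1
            current.clear();
        }
    }
    if (!current.empty())
        ++counts[current];      // the text may end without punctuation
    return counts;
}
```

Calling `countWords("The cat and the hat.")` would give a table with four entries in which `"the"` has a count of 2.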


I also save the analysis results to an Access database, which is accessed through ADO. I previously wrapped the ADO connection in a separate class (ADOConn), so this time it can simply be reused.

Code:

void ADOConn::OnInitADOConn()
{
    // CString sysPath = "C:\\fg\\MuscicPlayer\\";
    HRESULT hr;
    ::CoInitialize(NULL);  // initialize the OLE/COM environment
    hr = m_pConnection.CreateInstance("ADODB.Connection");  // create the connection object
    // Wrapping the calls in try...catch is recommended,
    // because ADO sometimes raises unexpected errors.
    try
    {
        if (SUCCEEDED(hr))
        {
            const char *dbPath = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=WordData.mdb";
            hr = m_pConnection->Open(_bstr_t(dbPath), "", "", adModeUnknown);
        }
        // _bstr_t strConnect = "Provider=SQLOLEDB;Server=LENOVO-PC;Database=Study;uid=sa;pwd=zyc123";
        // m_pConnection->Open(strConnect, "", "", adModeUnknown);
    }
    catch (_com_error e)  // catch the exception
    {
        e.Description();
    }
}

void ADOConn::ExitConnect()
{
    // close the record set and the connection
    if (m_pRecordset != NULL)
        m_pRecordset->Close();
    m_pConnection->Close();
    // release the COM environment
    ::CoUninitialize();
}

_RecordsetPtr &ADOConn::GetRecordSet(_bstr_t bstrSQL)
{
    try
    {
        // if the connection object is NULL, reconnect to the database
        if (m_pConnection == NULL)
            OnInitADOConn();
        // create the record set object
        m_pRecordset.CreateInstance(__uuidof(Recordset));
        // retrieve the records of the table
        m_pRecordset->Open(bstrSQL, m_pConnection.GetInterfacePtr(),
                           adOpenDynamic, adLockOptimistic, adCmdText);
    }
    catch (_com_error e)
    {
        e.Description();
    }
    // return the record set
    return m_pRecordset;
}

BOOL ADOConn::ExecuteSQL(_bstr_t _bstrSQL)
{
    _variant_t RecordsAffected;
    try
    {
        // connect first if the database is not connected yet
        if (m_pConnection == NULL)
            OnInitADOConn();
        m_pConnection->Execute(_bstrSQL, NULL, adCmdText);
        return true;
    }
    catch (_com_error e)
    {
        e.Description();
        return false;
    }
}


The words are best kept in dedicated data structures:

//////// structure for every word separated from the text ////////
typedef struct WordStore
{
    char word[30];
} wordstore;

//////// structure for each distinct word with its count and frequency ////////
typedef struct WordStoreAll
{
    float frequency;
    int number;
    char word[30];
} wordstoreall;
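To show how a fixed-size table of such records might be filled, here is a small find-or-add sketch. It is an assumption-laden illustration rather than the article's code: `WordEntry`, `addWord`, and the capacity `MAX_WORDS` are hypothetical names (the value of the original `MAX` is not shown), and the frequency field is left out for brevity.

```cpp
#include <cstring>

const int MAX_WORDS = 1000;  // hypothetical capacity; the original MAX is not shown

struct WordEntry             // mirrors wordstoreall without the frequency field
{
    int number;
    char word[30];
};

// Adds w to the table or increments its count; returns the new number of entries.
int addWord(WordEntry table[], int count, const char *w)
{
    const std::size_t maxLen = sizeof(table[0].word) - 1;  // 29 letters plus '\0'
    for (int j = 0; j < count; j++)
    {
        // compare at most 29 characters so that truncated words still match
        if (std::strncmp(table[j].word, w, maxLen) == 0)
        {
            table[j].number++;
            return count;
        }
    }
    if (count >= MAX_WORDS)
        return count;                      // table full: the word is ignored
    std::strncpy(table[count].word, w, maxLen);
    table[count].word[maxLen] = '\0';      // strncpy does not always terminate
    table[count].number = 1;
    return count + 1;
}
```

The bounds checks are the point of the sketch: a 30-byte `word` field holds at most 29 letters, and the table itself holds a fixed number of entries.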

 

Main class:

class ReadWord
{
    int AllWord;
    int Count;
    bool AddTrue;
    wordstore WordNode[MAX];
    wordstoreall Word[MAX];
    // database operations use ADOConn
    ADOConn m_AdoConn;
    _RecordsetPtr m_pRecordset;
    _ConnectionPtr m_pConnection;
    char *tablename;

public:
    int Choice;

public:
    ReadWord();
    int GetCount() { return Count; }
    bool GetAddTrue() { return AddTrue; }
    char *ReadFromText();
    char *ReadFromScreen();
    void strcopy(char *dest, const char *sour);
    void Transform(char *str);
    void GetEveryWord();
    void Countword();
    void Frequency();
    void Order();
    void Print(int X);
    void color(int a);
    // ------- database operations -------
    void IsAddToAccess();  // ask whether to add the analysis result to the database
    void AddToAccess();    // add the analysis result to the database
    void Select();         // query the data in the database
    void CreateTable();
    void GetTableName(int n);
    void ReadTableName();
    void WriteTNameToFile();
};
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>
#include "Readword.h"

//////// constructor ////////
ReadWord::ReadWord()
{
    AddTrue = false;
    tablename = NULL;
}

//////// add some color ////////
void ReadWord::color(int a)  // color function
{
    SetConsoleTextAttribute(GetStdHandle(STD_OUTPUT_HANDLE), a);
}

//////// read text from the screen into a string ////////
char *ReadWord::ReadFromScreen()
{
    char *string;
    string = (char *)malloc(1000);
    if (NULL == string)
    {
        printf("memory allocation failed!\n");
        return NULL;
    }
    printf("Enter:");
    // fflush(stdout);
    if (NULL == fgets(string, 1000, stdin))  // fgets, unlike gets, cannot overflow the buffer
        string[0] = '\0';
    return string;
}

//////// read text from a file into a string ////////
char *ReadWord::ReadFromText()
{
    FILE *fp;
    char *string;
    char FileName[30];
    size_t len;
    string = (char *)malloc(5000);
    if (NULL == string)
    {
        printf("memory allocation failed!\n");
        return NULL;
    }
    printf("Enter the file name (no suffix):");
    scanf("%25s", FileName);  // leave room for ".txt" and the terminator
    // gets(FileName);
    strcat(FileName, ".txt");
    fp = fopen(FileName, "rb");
    if (NULL == fp)
    {
        printf("file opening error\n");
        free(string);
        return NULL;
    }
    // fread takes the whole file (up to the buffer size); fgets would stop at the first line break
    len = fread(string, 1, 4999, fp);
    string[len] = '\0';
    fclose(fp);
    return string;
}

//////// convert non-letter characters into spaces ////////
void ReadWord::Transform(char *str)
{
    while (*str != '\0')
    {
        if (*str < 'a' || *str > 'z')
            *str = ' ';
        str++;
    }
}

//////// copy function ////////
void ReadWord::strcopy(char *dest, const char *sour)
{
    while (*sour != '\0')
    {
        *dest++ = *sour++;
    }
    *dest = '\0';
}

//////// separate each word ////////
void ReadWord::GetEveryWord()
{
    char tempw[30];
    char *String;
    char *p;
    int i = 0, n = 0;
    if (1 == Choice)
        String = ReadFromText();
    else
        String = ReadFromScreen();
    if (NULL == String)
    {
        AllWord = 0;
        return;
    }
    strlwr(String);     // convert uppercase letters to lowercase
    Transform(String);  // convert non-letters to spaces
    for (p = String; n < MAX; p++)
    {
        if (*p != ' ' && *p != '\0')
        {
            if (i < 29)  // letters that do not fit in the word buffer are dropped
                tempw[i++] = *p;
        }
        else if (i > 0)
        {
            // A space or the end of the text closes the word, so the last
            // word is counted even when no punctuation follows it.
            tempw[i] = '\0';
            strcopy(WordNode[n].word, tempw);
            n++;
            i = 0;
        }
        if (*p == '\0')
            break;
    }
    free(String);
    AllWord = n;  // total number of words separated
}

//////// count all extracted words ////////
void ReadWord::Countword()
{
    int i, j;
    int count = 0, m_flag = 0;
    for (i = 0; i < AllWord; i++)
    {
        m_flag = 0;
        for (j = 0; j < count; j++)  // compare only with the words already stored
        {
            if (!strcmp(Word[j].word, WordNode[i].word))  // the word already exists: add one to its count
            {
                Word[j].number++;
                m_flag = 1;
                break;
            }
        }
        if (0 == m_flag)  // a new word: add it to Word with a count of one
        {
            strcopy(Word[count].word, WordNode[i].word);
            Word[count].number = 1;
            count++;
        }
    }
    Count = count;
}

//////// calculate the frequency of each word ////////
void ReadWord::Frequency()
{
    int i;
    for (i = 0; i < Count; i++)
    {
        // divide by the total number of words so that the frequencies sum to 1
        Word[i].frequency = (float)Word[i].number / AllWord;
    }
}

//////// bubble sort (alphabetical order) ////////
void ReadWord::Order()
{
    wordstoreall temp;
    int m, n;
    for (m = 1; m < Count; m++)
        for (n = 0; n < Count - m; n++)
        {
            if (strcmp(Word[n].word, Word[n + 1].word) > 0)
            {
                // exchange the word, its count, and its frequency together
                temp = Word[n];
                Word[n] = Word[n + 1];
                Word[n + 1] = temp;
            }
        }
}

//////// output the result ////////
void ReadWord::Print(int X)
{
    int i;
    printf("Word frequency analysis of the text:\n");
    printf("No.\tword\tnumber\tfrequency\n");
    for (i = 0; i < X; i++)
    {
        printf("%-3d\t%-10s\t%d\t%.2f%%\n", i + 1, Word[i].word, Word[i].number, Word[i].frequency * 100);
    }
}

//////// query the result from the database ////////
void ReadWord::Select()
{
    m_AdoConn.OnInitADOConn();  // initialize the connection class
    // char *sql = "Select * From Wordtable";
    char sql[50];
    const char *Sql1 = "SELECT * FROM ";
    strcopy(sql, Sql1);
    strcat(sql, tablename);
    m_pRecordset = m_AdoConn.GetRecordSet((_bstr_t)sql);  // open and obtain the record set
    m_pConnection.CreateInstance(__uuidof(Connection));   // create the connection object
    _variant_t var;
    char strword[30] = "";
    int number = 0;
    double frequen = 0.0;
    int i = 0;
    try
    {
        if (!m_pRecordset->adoBOF)  // the table is not empty: move to the first record
            m_pRecordset->MoveFirst();
        else
        {
            printf("\nthe data table is empty!\n");
            m_AdoConn.ExitConnect();
            return;
        }
        printf("Word frequency analysis of the text:\n");
        printf("No.\tword\tnumber\tfrequency\n");
        // read each field of every record and output it
        while (!m_pRecordset->adoEOF)
        {
            var = m_pRecordset->GetCollect("Word");  // the Word field of the record
            if (var.vt != VT_NULL)
                strcopy(strword, (LPCSTR)_bstr_t(var));
            var = m_pRecordset->GetCollect("Num");   // the Num field of the record
            if (var.vt != VT_NULL)
                number = var.intVal;
            var = m_pRecordset->GetCollect("Frequency");
            if (var.vt != VT_NULL)
                frequen = var.dblVal;
            printf("%-3d\t%-10s\t%d\t%.2lf%%\n", i + 1, strword, number, frequen * 100);
            i += 1;
            m_pRecordset->MoveNext();
        }
    }
    catch (_com_error &e)  // ADO throws _com_error objects, not pointers
    {
        // AfxMessageBox(e.ErrorMessage());
        e.Description();
    }
    m_AdoConn.ExitConnect();
}

//////// ask whether to add the result to the database ////////
void ReadWord::IsAddToAccess()
{
    char button;
    printf("\nAdd the result to the database? Press [Y] to add it, or any other key to skip.\n");
    scanf(" %c", &button);  // the leading space skips a pending line break
    if (button == 'y' || button == 'Y')
    {
        AddTrue = true;
        GetTableName(1);
        CreateTable();
        AddToAccess();
    }
}

//////// add the result to the database ////////
void ReadWord::AddToAccess()
{
    m_AdoConn.OnInitADOConn();
    char sql[50];
    const char *Sql1 = "SELECT * FROM ";
    strcopy(sql, Sql1);
    strcat(sql, tablename);
    m_pRecordset = m_AdoConn.GetRecordSet((_bstr_t)sql);  // open and obtain the record set
    m_pConnection.CreateInstance(__uuidof(Connection));   // create the connection object
    int i;
    try
    {
        if (AddTrue)
        {
            for (i = 0; i < Count; i++)
            {
                m_pRecordset->AddNew();
                m_pRecordset->PutCollect("Word", _variant_t(Word[i].word));
                m_pRecordset->PutCollect("Num", _variant_t((long)Word[i].number));
                m_pRecordset->PutCollect("Frequency", _variant_t((float)Word[i].frequency));
                m_pRecordset->Update();
            }
        }
        printf("\nadded successfully!\n");
        m_AdoConn.ExitConnect();
    }
    catch (...)
    {
        printf("operation failed\n");
    }
}

//////// store the table name entered by the user in a file ////////
void ReadWord::WriteTNameToFile()
{
    char Table[32];
    const char *str = " ";  // assumed separator: a space between the stored names
    strcopy(Table, str);
    strcat(Table, tablename);
    strcat(Table, str);
    FILE *fp;
    if ((fp = fopen("TableNameList.txt", "ab+")) == NULL)
    {
        printf("file cannot be opened\n");
        return;
    }
    if (fwrite(Table, strlen(Table) * sizeof(char), 1, fp) != 1)
        printf("file write error!\n");
    fclose(fp);
}

//////// read the table names from the saved file ////////
void ReadWord::ReadTableName()
{
    FILE *fp;
    char string[300];
    const char *FileName = "TableNameList.txt";
    fp = fopen(FileName, "rb");
    if (NULL == fp)
    {
        printf("file opening error\n");
        return;
    }
    if (NULL == fgets(string, 300, fp))
        string[0] = '\0';
    fclose(fp);
    printf("available table names:\n");
    printf("%s\n", string);
}

//////// obtain the name of the table to be created or queried ////////
void ReadWord::GetTableName(int n)
{
    free(tablename);  // release the name from an earlier call (free(NULL) does nothing)
    tablename = (char *)malloc(20);
    if (NULL == tablename)
    {
        printf("memory allocation failed!\n");
        return;
    }
    tablename[0] = '\0';
    if (1 == n)
    {
        printf("enter the name of the table to be created:");
        scanf("%19s", tablename);  // at most 19 characters fit in the buffer
        WriteTNameToFile();
    }
    else
    {
        ReadTableName();
        printf("enter the name of the table you want to query:");
        scanf("%19s", tablename);
    }
}

//////// create a table ////////
void ReadWord::CreateTable()
{
    m_AdoConn.OnInitADOConn();
    // _bstr_t sql;
    char sql[100];
    const char *Sql1 = "CREATE TABLE ";
    const char *Sql2 = "(Word varchar(30), Num int, Frequency float)";
    strcopy(sql, Sql1);
    strcat(sql, tablename);
    strcat(sql, Sql2);
    if (!m_AdoConn.ExecuteSQL((_bstr_t)sql))
    {
        printf("\nfailed to create the table!\n");
        return;
    }
    m_AdoConn.ExitConnect();
}
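The code above orders the words alphabetically. For a frequency report it can also be useful to list the most frequent words first. The following is a minimal sketch of that variation in standard C++, not part of the original program: `WordStat` and `rankWords` are hypothetical names, and the frequency is assumed to mean a word's count divided by the total number of word occurrences.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct WordStat  // hypothetical record: one distinct word with its statistics
{
    std::string word;
    int number;
    double frequency;
};

// Computes each frequency as number / total occurrences,
// then sorts the most frequent words first.
void rankWords(std::vector<WordStat> &stats)
{
    long total = 0;
    for (const WordStat &s : stats)
        total += s.number;
    for (WordStat &s : stats)
        s.frequency = total > 0 ? static_cast<double>(s.number) / total : 0.0;
    std::sort(stats.begin(), stats.end(),
              [](const WordStat &a, const WordStat &b)
              {
                  if (a.number != b.number)
                      return a.number > b.number;
                  return a.word < b.word;  // equal counts stay alphabetical
              });
}
```

Using `std::sort` with a comparison function replaces the hand-written bubble sort and keeps the word, count, and frequency of each record together while sorting.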
