Use SQLite for website search
- Download the demo project-389 KB
- Database file-1.38 KB
- Download source code-111 KB
Introduction
If your website or program needs a text search engine, unless you write one, otherwise, you will hand over the data you want to search to a network search engine and then spend money to index your website, but now you can use my program to index your text/html/asp files. It will save keywords to the database for future search.
The program uses SQLite as the database. It is an open source database that can be used for free. For more information, see this website.
I use the. Net package of finsar to use SQLite. This package implements an ADO. Net driver for SQLite and is easy to use. Its dll can be downloaded from SourceForge.
Database Design
You can see the E/R link diagram of the database:
This figure illustrates the relationship between m <-> N. Based on this relationship, we convert itWord_pageTable (index table ).
Create a database (class dbwse)
I think a program should be able to create its own database, so it is a complete program, that is, its executable file does not need to carry an empty database. We useCREATE
SQL statement to create the required table.
Collapse
private void CreateDataBase(string path){try{FileStream fi = File.Create(path);fi.Close();DBFile = path;OpenDataBase();string strSQL = "Create Table words (" +"WID INTEGER PRIMARY KEY ," +"Word NVarChar(50)" +")";SQLiteCommand sqd = new SQLiteCommand(strSQL, sqconn);sqd.ExecuteNonQuery();sqd.CommandText ="Create Table pages(" +"PID INTEGER PRIMARY KEY ," +"path NVarChar(100) NOT NULL," +"nofw INTEGER NOT NULL," +"Date_add NVARChar(10) NOT NULL" +")";sqd.ExecuteNonQuery();sqd.CommandText ="Create Table word_page(" +"WID INTEGER FORIGEN KEY REFERENCES words (wid)," +"PID INTEGER FORIGEN KEY REFERENCES pages (pid)," +"QTY INTEGER NOT NULL," +"PRIMARY KEY (WID,PID)" +")";sqd.ExecuteNonQuery();}catch (Exception ex){throw ex;}}
The above Code creates an empty database file.
Add data to the word, page, and word_page tables
Now we need to store the data in the database.INSERT
Command.
public int AddPage(string path, int nofw){int i = PID(path);if (i < 0){// the page must be addstring strSQL = "INSERT INTO pages (path,nofw,Date_add)" +"VALUES ('" + path + "'," + nofw + ",'" +DateTime.Now.ToShortDateString() + "')";SQLiteCommand sqc = new SQLiteCommand(strSQL, sqconn);sqc.CommandText = strSQL;sqc.ExecuteNonQuery();//return PIDi = PID(path);}return i;} |
The "PID function" is used to return the ID (page ID) of the new page. Next we store the page data in the database.
private int createWord(string word){string strsql = "INSERT INTO words (word) VALUES ('" + word + "')";SQLiteCommand sqc = new SQLiteCommand(strsql, sqconn);sqc.ExecuteNonQuery();return WID(word);} |
This function adds a keyword to the "word" table.
public void AddWord(int page, string word, int QTY){int i = WID(word);if (i < 0)i = createWord(word);string strsql = "INSERT INTO word_page (WID,PID,QTY) VALUES " +"(" + i.ToString() + "," + page.ToString() +"," + QTY.ToString() + ")";SQLiteCommand sqc = new SQLiteCommand(strsql, sqconn);sqc.ExecuteNonQuery();} |
According to the current design, the keywords cannot be repeated. If a word is found during query, the ID corresponding to the word is returned.(WID)
Otherwise-1 is returned. Of course, we must first store words in the words table. In fact, this function does not store more redundant data when storing page indexes.
Search databases
Now we can search for a word in the database.
public DataTable SearchWord(string word){string strSQL = "SELECT pages.path,pages.nofw,pages.Date_add,word_page.QTY " +"FROM words INNER JOIN word_page ON words.wid=word_page.wid" +" INNER JOIN pages ON word_page.pid=pages.pid" +" WHERE words.word='"+word+"'";SQLiteDataAdapter sqd = new SQLiteDataAdapter(strSQL, sqconn);DataTable dt = new DataTable();sqd.Fill(dt);return dt;} |
Function returnDataTable
Objects are used in programs.
Create an index for data stored in the database
Now you have saved all the data in the database. To use the data, we need to create an index for these pages!
It was not difficult at the beginning! We can use some delimiters to split an article into words. for example: space, ')', '(', '[', '\', etc., but this is not very accurate.
You must remove unimportant or repeated words before creating an index.
Split Article Section
string[] split = words.Split (new Char[] { ' ', ',', '.', ':',';','{','}','[',']','(',')','\'','"','\\','<','>','=','+','-','/','*','#','&','%','^','`','~','0','1','2','3','4','5','6','7','8','9' }); |
Create a keyword list (class scanfile) fold
private void StartMonitoring(string p){//first step must read fileStreamReader stR = File.OpenText(p);string words = stR.ReadToEnd();stR.Close();string[] split = words.Split(new Char[] { ' ', ',', '.', ':',';','{','}','[',']','(',')','\'','"','\\','<','>','=','+','-','/','*','#','&','%','^','`','~','0','1','2','3','4','5','6','7','8','9' });max = split.Length;int index;int k = 0;list1.Clear();for (int i = 0; i < split.Length; i++){WordInfo word = new WordInfo();word.Word = split[i].Trim();if (word.Word.Length > 2){SearchableWord = word.Word;index = list1.FindIndex(HaveWord);if (index < 0)//not found{word++;list1.Add(word);k++;}else{// increment count of wordlist1[index]++;k++;}}// Progress( this is a event )OnProgress(System.EventArgs.Empty);}total = k;}private static bool HaveWord(WordInfo str){if (str.Word.ToUpper() == SearchableWord.ToUpper())return true;elsereturn false;}
In this function, I usedWordInfo
AndList
Class, and creates a new event for the retrieved file"OnProgress
".
SCF.Progress += new EventHandler(SCF_Progress); |
To findList)
To define a delegate:
private static bool HaveWord(WordInfo str) |
However, we cannot find the word with this parameter. To solve this problem, I added a private variable for string comparison:
private static string SearchableWord; |
Class replusetrivial)
To exclude words that have no value, you must have a list of these words. I want to save these words to the database so that SQL statements can be used conveniently. therefore, I created a garbage vocabulary and used insert/Delete to add and delete these words.
When you createRepluseTrivial
Object, you need to pass it a word list and then callRepulse
Function to remove valuable words.
public void Repluse(){List<wordinfo> temp = new List<wordinfo>();for (int i = 0; i < list1.Count; i++)if (!IsTrivial(list1[i].Word))temp.Add(list1[i]);list1 = temp;}</wordinfo></wordinfo> |
Function callIsTrivial
Function to check whether it is a Spam word.
How to use my code
Now you have a list of spam words and a database for storing these words. Use the algorithm mentioned above to call the "search function" to search for them. of course, this function can be further developed to hide indexes like Google and Microsoft engines. (efficiency cannot be compared ).
We can use a progress bar to display the progress:
public Form1(){InitializeComponent();SCF = new ScanFile();SCF.Progress += new EventHandler(SCF_Progress);}void SCF_Progress(object sender, EventArgs e){progressBar1.Maximum = SCF.MaxProgress;progressBar1.Value++;} |
Create a list to remove spam words:
private void button2_Click(object sender, EventArgs e){progressBar1.Value = 0;SCF.Scan(textBox1.Text);label2.Text = "Total Word:"+SCF.Total.ToString();List<wordinfo> list1=new List<wordinfo>();list1 = SCF.WordList;string str =Path.GetDirectoryName(Application.ExecutablePath)+"\\Trivial.db3";RepluseTrivial rt = new RepluseTrivial(str);list1 = rt.Repluse(list1);listView1.Items.Clear();for (int i = 0; i < list1.Count; i++){ListViewItem li = new ListViewItem(new string[] { i.ToString(),list1[i].Count.ToString(), list1[i].Word });listView1.Items.Add(li);}}</wordinfo></wordinfo> |
Save the list to the database:
private void button3_Click(object sender, EventArgs e){string str = Path.GetDirectoryName(Application.ExecutablePath);DBWSE db = new DBWSE(str+""\\WSEDB.db3");int i=db.AddPage(textBox1.Text,SCF.Total);progressBar1.Maximum = SCF.WordList.Count;progressBar1.Value = 0;for (int j=0;j < SCF.WordList.Count;j++){db.AddWord(i, SCF.WordList[j].Word, SCF.WordList[j].Count);progressBar1.Value++;}} |
You can easily display the query results inDataGridView
Control:
private void button7_Click(object sender, EventArgs e){string str = Path.GetDirectoryName(Application.ExecutablePath);DBWSE db = new DBWSE(str + "\\WSEDB.db3");dataGridView2.DataSource = db.SearchWord(textBox3.Text);} |