Using Lucene.Net with ASP.NET and C#

Source: Internet
Author: User

Do not mistake Lucene for a Google-like search engine. Lucene is not even an application; it is just a tool, a library. You can think of it as a set of easy-to-use APIs that encapsulate indexing and search. With these APIs you can build many search-related features conveniently.

Lucene can index and search any data, regardless of the format of the data source, as long as the data can be converted to text. Whether the file is MS Word, HTML, PDF, or some other format, once you can extract its text content, Lucene can index and search it.


1. First, look at the code structure

The program is divided into three parts. LuceneTest is the indexing program: it reads externally provided text files and writes them into the index.

Pangu.Lucene.Analyzer is the analyzer, based on the PanGu word segmentation tool.

Website is the externally facing service interface, which exposes the indexed data for querying.

2. Analyze the text and build the index:

Official example

Index creation:

IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
IndexDocs(writer, new System.IO.FileInfo(args[0]));
writer.Optimize();
writer.Close();

An IndexWriter is the class that writes to an index: you use it to create the index and then add documents to it. Note that it is not the only class that can modify an index; other classes can also modify the index after it has been built.
The first constructor parameter is the name of the folder where the index is stored. The second parameter is an analyzer, which extracts the content to be indexed from the text and discards content that does not need to be indexed, for example common words such as "a". The analyzer also determines whether matching is case-sensitive. You choose among these behaviors by passing a different analyzer. The third parameter determines whether an existing index is overwritten.
The second step is to use the writer to add files to the index.
The third step is to optimize the index.
The fourth step is to close the writer.
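
The four steps can be sketched against the Lucene.Net 3.0.3 API, which is the version the project code further down appears to target (an assumption, based on its use of Dispose and IndexWriter.MaxFieldLength). The sketch indexes plain strings into an in-memory RAMDirectory instead of a folder so that it can run on its own. The class name IndexStepsSketch, the method BuildAndCount, and the field name "content" are illustrative and not part of the original project.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class IndexStepsSketch
{
    // Builds a small in-memory index and returns how many documents it holds.
    public static int BuildAndCount(string[] texts)
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // Step 1: create the writer (create: true overwrites any existing index).
        var writer = new IndexWriter(directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);

        // Step 2: add one document per input text.
        foreach (string text in texts)
        {
            var doc = new Document();
            doc.Add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }

        // Step 3: optimize the index. Step 4: close the writer.
        writer.Optimize();
        writer.Dispose();

        // Count the indexed documents by running a query that matches everything.
        var searcher = new IndexSearcher(directory, true);
        int count = searcher.Search(new MatchAllDocsQuery(), 10).TotalHits;
        searcher.Dispose();
        directory.Dispose();
        return count;
    }
}
```

Calling IndexStepsSketch.BuildAndCount with an array of strings should return the number of strings passed in, since each string becomes one document.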
Project code:
using Baitone.DSP.ClearModel;
using Lucene.Net.Analysis.PanGu;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace LuceneTest
{
    public static class CreateIndex
    {
        public static string Indexpath = System.Configuration.ConfigurationManager.AppSettings["Indexpath"];
        public static string logpath = System.Configuration.ConfigurationManager.AppSettings["LogPath"];
        // Recursively collect every file under dir into list.
        static ArrayList GetAll(DirectoryInfo dir, ArrayList list)
        {
            FileInfo[] allFile = dir.GetFiles();
            foreach (FileInfo fi in allFile)
            {
                list.Add(fi);
            }
            DirectoryInfo[] allDir = dir.GetDirectories();
            foreach (DirectoryInfo d in allDir)
            {
                GetAll(d, list);
            }
            return list;
        }
        public static void Main()
        {
            DirectoryInfo logDir = new DirectoryInfo(logpath);
            if (!System.IO.Directory.Exists(Indexpath))
            {
                System.IO.Directory.CreateDirectory(Indexpath);
            }
            ArrayList flst = new ArrayList();
            GetAll(logDir, flst);
            foreach (FileInfo file in flst)
            {
                Console.WriteLine("Start " + file.FullName + " analysis." + "\n");
                Crudindex(file.FullName);
                Console.WriteLine("Completed " + file.FullName + " analysis." + "\n");
            }
        }
        /// <summary>
        /// Update the index library.
        /// </summary>
        private static void Crudindex(string path)
        {
            if (File.Exists(path) == false)
            {
                Console.WriteLine(path + " log file does not exist " + DateTime.Now.ToString());
                return;
            }
            List<string> returnList2 = new List<string>(); // TODO: Remove
            DateTime dtB = DateTime.Now;
            Console.WriteLine(dtB.ToString("yyyy-MM-dd HH:mm:ss") + " start reading " + path + "...");
            StringBuilder sb = new StringBuilder();
            StreamReader sr = new StreamReader(path, Encoding.UTF8);
            string line;
            List<string> arrNo = new List<string>();
            FSDirectory directory = FSDirectory.Open(new DirectoryInfo(Indexpath), new NativeFSLockFactory());
            bool isExist = IndexReader.IndexExists(directory);
            if (isExist)
            {
                // Release a write lock left behind by an earlier run.
                if (IndexWriter.IsLocked(directory))
                {
                    IndexWriter.Unlock(directory);
                }
            }
            // Create a new index only if none exists yet; otherwise append to it.
            IndexWriter writer = new IndexWriter(directory, new PanGuAnalyzer(), !isExist, IndexWriter.MaxFieldLength.UNLIMITED);
            // Each line of the log file is one JSON record. Field names are case-sensitive
            // and must match the names used at search time.
            while ((line = sr.ReadLine()) != null)
            {
                if (path.IndexOf("Log_tags") > 0) // tag records
                {
                    Tags _temp = Newtonsoft.Json.JsonConvert.DeserializeObject<Tags>(line);
                    Document document = new Document();
                    document.Add(new Field("gid", _temp.gid.ToString(), Field.Store.YES, Field.Index.ANALYZED));
                    document.Add(new Field("andid", _temp.andid, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("dpidsha1", _temp.dpidsha1, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("flag", _temp.flag.ToString(), Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("macsha1", _temp.macsha1, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("mac", _temp.mac, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("make", _temp.make, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("md", _temp.md, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("content", line, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    writer.AddDocument(document);
                }
                else if (path.IndexOf("Log_base") > 0) // base-info records
                {
                    Baseinfo _temp = Newtonsoft.Json.JsonConvert.DeserializeObject<Baseinfo>(line);
                    Document document = new Document();
                    document.Add(new Field("gid", _temp.gid.ToString(), Field.Store.YES, Field.Index.ANALYZED));
                    document.Add(new Field("ip", _temp.ip.ToString(), Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("lat", _temp.lat.ToString(), Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("lon", _temp.lon.ToString(), Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    if (_temp.eventtime != null)
                    {
                        document.Add(new Field("eventtime", _temp.eventtime, Field.Store.YES, Field.Index.ANALYZED,
                            Field.TermVector.WITH_POSITIONS_OFFSETS));
                    }
                    document.Add(new Field("appname", _temp.appname, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    document.Add(new Field("content", line, Field.Store.YES, Field.Index.ANALYZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS));
                    writer.AddDocument(document);
                }
            }
            sr.Dispose();
            writer.Dispose();
            directory.Dispose();
        }
    }
}
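
As an alternative to the hand-written recursion in GetAll, the .NET base class library can walk the directory tree itself. The following sketch is a variation, not the project's code: it uses Directory.EnumerateFiles with SearchOption.AllDirectories and returns typed FileInfo objects instead of an ArrayList. The names LogFileFinder and FindAll are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LogFileFinder
{
    // Returns every file under root, including files in nested subfolders.
    public static List<FileInfo> FindAll(string root)
    {
        return Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Select(path => new FileInfo(path))
            .ToList();
    }
}
```

With a typed list, the loop in Main would no longer need to cast each element from object. Like GetFiles, EnumerateFiles throws if a subfolder cannot be read, so a production version may need error handling around it.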

Configuration file (sets the index and log directories):

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <add key="Indexpath" value="D:\web\dspindex"/>
    <add key="LogPath" value="D:\statlog\aynclog"/>
  </appSettings>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.2"/>
  </startup>
</configuration>

3. External web service:

The service takes a query keyword and searches the index.

using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using PanGu;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Web;
using System.Web.Services;
/// <summary>
/// Summary description for S
/// </summary>
[WebService(Namespace = "http://tempuri.org/")]
[WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
// To allow this web service to be called from script using ASP.NET AJAX, uncomment the following line.
// [System.Web.Script.Services.ScriptService]
public class S : System.Web.Services.WebService
{
    public S()
    {
        // If you use designed components, uncomment the following line.
        // InitializeComponent();
    }
    public static string Indexpath = System.Configuration.ConfigurationManager.AppSettings["Indexpath"];
    [WebMethod]
    public string search(string keyword)
    {
        if (keyword != null && keyword != "")
        {
            var watch = Stopwatch.StartNew();
            Analyzer analyzer = null;
            analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
            // Open the index read-only for searching.
            IndexSearcher searcher = new IndexSearcher(FSDirectory.Open(new DirectoryInfo(Indexpath)), true);
            // Field names are case-sensitive and must match the names used at index time.
            string[] fields = { "gid", "dpidsha1", "content" };
            // Query expression
            MultiFieldQueryParser queryP = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, fields, analyzer);
            // Parse the keyword into a query.
            Query query = queryP.Parse(keyword);
            var hits = searcher.Search(query, 200);
            // Create the PanGu highlighter.
            PanGu.HighLight.SimpleHTMLFormatter simpleHTMLFormatter =
                new PanGu.HighLight.SimpleHTMLFormatter("<span style=\"font-weight:bold;color:red;\">", "</span>");
            PanGu.HighLight.Highlighter highlighter =
                new PanGu.HighLight.Highlighter(simpleHTMLFormatter,
                    new Segment());
            highlighter.FragmentSize = 50;
            StringBuilder sb = new StringBuilder();
            // ScoreDocs holds at most 200 entries, while TotalHits counts every match.
            for (int i = 0; i < hits.ScoreDocs.Length; i++)
            {
                Document doc = searcher.Doc(hits.ScoreDocs[i].Doc);
                sb.Append(doc.Get("content") + "\r\n<br/>");
                // Highlighted fragment of the content field (computed here but not yet returned).
                string sample = highlighter.GetBestFragment(keyword, doc.Get("content"));
            }
            searcher.Dispose();
            watch.Stop();
            return sb.ToString();
        }
        else
        {
            return "";
        }
    }
}
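
The query flow in the search method can be tried in isolation with a sketch like the following, again assuming Lucene.Net 3.0.3. It indexes a few strings in memory with StandardAnalyzer, parses the keyword with QueryParser against a single "content" field, and loops over ScoreDocs, which is bounded by the requested maximum, rather than up to TotalHits, which counts every match. The class name SearchSketch, the sample inputs, and the use of the same analyzer for indexing and searching are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class SearchSketch
{
    // Indexes texts in memory, then returns the stored content of up to
    // maxResults documents that match keyword.
    public static List<string> Search(string[] texts, string keyword, int maxResults)
    {
        var version = Lucene.Net.Util.Version.LUCENE_30;
        var analyzer = new StandardAnalyzer(version);
        var directory = new RAMDirectory();

        var writer = new IndexWriter(directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);
        foreach (string text in texts)
        {
            var doc = new Document();
            doc.Add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        writer.Dispose();

        var searcher = new IndexSearcher(directory, true);
        var parser = new QueryParser(version, "content", analyzer);
        Query query = parser.Parse(keyword);
        TopDocs hits = searcher.Search(query, maxResults);

        var results = new List<string>();
        foreach (ScoreDoc scoreDoc in hits.ScoreDocs)
        {
            results.Add(searcher.Doc(scoreDoc.Doc).Get("content"));
        }
        searcher.Dispose();
        directory.Dispose();
        return results;
    }
}
```

Because the keyword is passed through the query parser, characters such as ":" or "*" are treated as query syntax and malformed input raises a ParseException. If the keyword comes straight from a user, escaping it first with QueryParser.Escape is one option.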
This is a simple example. More in-depth demos will follow.
