The previous article briefly introduced the Lucid application; this one analyzes LWE-Core.
LucidWorks Enterprise is free to use but not open source. Understanding the structure of this Solr-based application is very helpful. The Solr-based application inside LucidWorks Enterprise is called LWE-Core.
Previous article: Full-text search (Solr) front-end application analysis, http://www.cnblogs.com/2018/archive/2011/07/29/2121519.html
Start and Stop
In start.bat:
start "lucidworks LWE-core" /B "%JVM%" %java_opts% %app_opts% %java_memory_opts% %jetty_opts% %misc_opts% %misc_opts2% -Djetty.home=jetty -jar jetty/start.jar 1>./logs/core-stdout.log 2>./logs/core-stderr.log
This uses Jetty to start the web application. By default, Solr is served on port 8888, which provides:
Interfaces based on the Solr specification, such as http://127.0.0.1:8888/solr/collection1/select/?q=nickchase. These interfaces are described in detail in the Solr documentation. In Lucid, the Lucid query parser replaces the original Solr parser to give better results.
The Solr web interface: http://localhost:8888/solr/
A set of enhanced REST service APIs wrapped by Lucid, such as http://localhost:8888/api/collections/collection1/CES. The main role of the REST APIs is to control and monitor data sources and indexes.
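As a rough illustration of the Solr-style select interface listed above, the following Python sketch builds a query URL and reads the matching documents. The function names are hypothetical, the base URL assumes the default port 8888, and wt=json is the standard Solr parameter for requesting a JSON response; whether a given LWE installation returns exactly this structure has not been verified here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL: the default LWE-Core port mentioned in the text.
BASE_URL = "http://127.0.0.1:8888/solr"


def build_select_url(collection, query, rows=10, base_url=BASE_URL):
    """Build a Solr select URL that asks for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


def search(collection, query, rows=10):
    """Run the query against a running server and return the matching documents."""
    with urlopen(build_select_url(collection, query, rows), timeout=10) as resp:
        payload = json.load(resp)
    # Standard Solr JSON responses keep the hits under response.docs.
    return payload["response"]["docs"]
```

With the server running, search("collection1", "nickchase") would issue the same query as the example URL above.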
Stopping the program simply means shutting down Jetty.
Solr web application analysis
Under LWE-core in the installation directory:
LWE-core\doc: the Lucid documentation.
LWE-core\solr: a JSP application provided by Solr, which serves the web interface.
LWE-core\WEB-INF\: the basic files of the web application. Here lib\lucidworks-1.8-1127.jar implements the extensions, including the REST API.
In web.xml, the REST API is wired up through com.lucid.servlet.lweservletmodule using Google Guice [a lightweight IoC container developed by Bob Lee of Google].
Related Solr index settings and data
solr\cores\collection** holds the index and settings for each set of data.
conf: solrconfig.xml, schema.xml, and fieldtypes.xml are the configuration files used by Solr. Their syntax is the same as in standard Solr.
data: the index area of a specific data source.
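To make the layout above concrete, here is a small Python sketch that walks a cores directory and reports which of the three configuration files each collection has and whether it has a data directory. The function name is hypothetical and the path passed in is whatever your installation uses; only the file names come from the text.

```python
from pathlib import Path

# Configuration file names taken from the text; nothing else is assumed
# about the contents of a core directory.
CONF_FILES = ("solrconfig.xml", "schema.xml", "fieldtypes.xml")


def describe_cores(cores_dir):
    """Return {core name: {"conf": [config files present], "has_data": bool}}."""
    result = {}
    for core in sorted(Path(cores_dir).iterdir()):
        if not core.is_dir():
            continue
        conf = core / "conf"
        result[core.name] = {
            "conf": [name for name in CONF_FILES if (conf / name).is_file()],
            "has_data": (core / "data").is_dir(),
        }
    return result
```

Calling describe_cores on the cores directory of an installation gives a quick inventory of the collections before opening any of their indexes.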
Other main associated files under LWE-core\WEB-INF\lib:
Metadata Extraction
aperture-core aperture-tools-demork
Aperture (http://aperture.sourceforge.net/) is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems.
Tika: extracting content from multiple data formats
Tika is a toolkit for content and text extraction. It integrates POI and PDFBox and provides a unified interface for text extraction.
PDF: via PDFBox
MS-*: via POI
HTML: uses NekoHTML to tidy nonstandard HTML into XHTML
OpenOffice formats: provided by Tika
Archives: zip, tar, gzip, and bzip
RTF: supported by Tika
Java class: class parsing is done by ASM
Image: only image metadata can be extracted
XML
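The list above amounts to a dispatch table: one entry point, with the actual work delegated to a format-specific library. The Python sketch below only illustrates that idea. The file extensions are assumptions chosen for the example, and real Tika detects types by MIME type and content rather than by extension alone.

```python
from pathlib import Path

# Extension -> backend, following the list above. The extensions are
# illustrative assumptions, not Tika's actual type registry.
BACKENDS = {
    ".pdf": "PDFBox",
    ".doc": "POI",
    ".xls": "POI",
    ".ppt": "POI",
    ".html": "NekoHTML",
    ".htm": "NekoHTML",
    ".odt": "Tika",
    ".rtf": "Tika",
    ".class": "ASM",
    ".jpg": "metadata only",
    ".png": "metadata only",
}


def backend_for(filename):
    """Return the backend named in the list for this file, or None if unlisted."""
    return BACKENDS.get(Path(filename).suffix.lower())
```

For example, backend_for("Report.PDF") maps to PDFBox, while a file type that is not in the table yields None.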
Packages for processing the different formats:
pdfbox-1.1.0 fontbox-1.1.0
poi-3.7-20100617171931
poi-ooxml-3.7-20100617171931 poi-ooxml-schemas-3.7-20100617171931 poi-scratchpad-3.7-20100617171931
htmlparser-1.6.bundle.jar
metadata-extractor-2.4.0-beta1.bundle
A useful tool for detecting the encoding of text files
juniversalchardet
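To show what encoding detection involves, here is a deliberately simplified Python stand-in. It checks for a byte order mark and then tries a short list of candidate encodings in order. This is not how juniversalchardet works internally (that library uses statistical analysis of the byte stream); the function name and the candidate list are assumptions made for the example.

```python
import codecs

# BOMs are checked longest first so that UTF-32 LE (FF FE 00 00) is not
# mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("utf-8", "gb18030")):
    """Return a codec name that can decode the bytes, or None if none fits."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # No BOM: fall back to trial decoding, first success wins.
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None
```

Trial decoding only proves that the bytes are valid in an encoding, not that the encoding is the right one, which is why an indexer benefits from a real detector for files of unknown origin.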
quartz-1.8.4
Quartz is an open-source project that provides a wide range of job scheduling features.
Viewing index data with Luke
Luke\Luke.bat
You can use this tool to inspect indexed data, including the fields, terms, and documents in the index. It is also an open-source tool.
Summary
With the above information, a similar solution can be implemented in one of two ways:
1. Use LWE-Core directly as the service and build a custom interface on top of it, which gives a customized system; or use the whole product as it is, which provides both the interface and the service [the usage restrictions of this software must of course be observed].
2. Follow the approach above to implement a new Solr-based application. After all, many applications do not need to index so many file types.