Analysis of LWE-Core, the core of the full-text search (Solr) front-end application

Source: Internet
Author: User
Tags: solr lucidworks

The previous article briefly introduced the LucidWorks application; this one analyzes LWE-Core.

LucidWorks Enterprise is free to use but is not open source. Understanding how this Solr-based application is structured is very helpful. The Solr-based application inside LucidWorks Enterprise is called LWE-Core.

Previous article: Full-text search (Solr) front-end application analysis http://www.cnblogs.com/2018/archive/2011/07/29/2121519.html

Start and Stop

In start.bat:

start "lucidworks LWE-core" /B "%JVM%" %JAVA_OPTS% %APP_OPTS% %JAVA_MEMORY_OPTS% %JETTY_OPTS% %MISC_OPTS% %MISC_OPTS2% -Djetty.home=jetty -jar jetty/start.jar 1>./logs/core-stdout.log 2>./logs/core-stderr.log
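To make the parts of that command line easier to see, here is a minimal Python sketch of the same launch. It is an illustration only, not part of LucidWorks: the function names build_command and start are hypothetical, the environment variable names are taken from start.bat (written in upper case here as an assumption), and the option strings are split on whitespace, which would not handle quoted values.

```python
import os
import subprocess

# Option variables in the order they appear in start.bat.
OPT_VARS = ["JAVA_OPTS", "APP_OPTS", "JAVA_MEMORY_OPTS",
            "JETTY_OPTS", "MISC_OPTS", "MISC_OPTS2"]


def build_command(env):
    """Assemble the Jetty launch command from a dict of environment variables."""
    cmd = [env.get("JVM", "java")]  # fall back to "java" if JVM is unset
    for name in OPT_VARS:
        # Naive whitespace split; unset variables contribute nothing.
        cmd += env.get(name, "").split()
    cmd += ["-Djetty.home=jetty", "-jar", "jetty/start.jar"]
    return cmd


def start(env=None, log_dir="logs"):
    """Launch Jetty in the background, redirecting stdout and stderr to log files."""
    env = dict(os.environ) if env is None else env
    os.makedirs(log_dir, exist_ok=True)
    out = open(os.path.join(log_dir, "core-stdout.log"), "w")
    err = open(os.path.join(log_dir, "core-stderr.log"), "w")
    return subprocess.Popen(build_command(env), stdout=out, stderr=err)
```

Calling start() from the LWE-core directory would correspond to running start.bat; stopping the returned process corresponds to shutting down Jetty.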

This uses Jetty to start the web application. By default, Solr is served on port 8888, which provides:

Interfaces based on the Solr specification, such as http://127.0.0.1:8888/solr/collection1/select/?q=nickchase. This interface is described in detail in the Solr documentation. In LucidWorks, the Lucid query parser replaces the original Solr query parser in order to give better results.

The Solr web interface: http://localhost:8888/solr/

An enhanced set of REST service APIs wrapped by Lucid, such as http://localhost:8888/API/collections/collection1/CES. The main role of the REST APIs is to control and monitor data sources and indexes.
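As a small illustration of the Solr select interface mentioned above, the following Python sketch builds such a query URL and, optionally, fetches it. The helper names select_url and fetch are hypothetical, the lowercase /solr/ path and the default host, port, and collection are taken from the example URL and are assumptions about a local installation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(query, collection="collection1",
               host="127.0.0.1", port=8888, **params):
    """Build a Solr select URL; extra keyword arguments become query parameters."""
    qs = urlencode({"q": query, **params})
    return f"http://{host}:{port}/solr/{collection}/select/?{qs}"


def fetch(url, timeout=5):
    """Send the request and return the raw response body (needs a running server)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(select_url("nickchase"))
# prints http://127.0.0.1:8888/solr/collection1/select/?q=nickchase
```

Standard Solr parameters such as rows can be passed the same way, for example select_url("nickchase", rows=5).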

Stopping the program simply means shutting down Jetty.

Solr web application analysis

Under LWE-core in the installation directory:

LWE-core\doc: the Lucid documentation.

LWE-core\solr: the JSP application provided by Solr, used for the web interface.

LWE-core\WEB-INF\: the basic files of the web application; lib\lucidworks-1.8-1127.jar implements the extensions, including the REST API.

In web.xml, the REST services are set up through com.lucid.servlet.lweservletmodule, using Google Guice (a lightweight IoC container developed by Bob Lee of Google).

Related Solr index settings and data

solr\cores\collection** holds the index and settings for each set of data.

conf: solrconfig.xml, schema.xml, and fieldtypes.xml are the configuration files used by Solr. Their syntax is the same as in standard Solr.

data: the index area for a specific data source.
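The layout described above can be inspected with a short script. This is a minimal sketch, assuming each collection directory under the cores directory contains a conf subdirectory with XML files and a data subdirectory; the function name list_collections and the path solr/cores used in the demo are illustrative assumptions.

```python
from pathlib import Path


def list_collections(cores_dir):
    """Map each collection directory to its config file names and data-dir presence."""
    result = {}
    for core in sorted(Path(cores_dir).iterdir()):
        if not core.is_dir():
            continue  # skip stray files next to the collection directories
        conf = core / "conf"
        result[core.name] = {
            "conf": sorted(p.name for p in conf.glob("*.xml")) if conf.is_dir() else [],
            "has_data": (core / "data").is_dir(),
        }
    return result


if __name__ == "__main__":
    cores = Path("solr/cores")  # assumed path relative to LWE-core
    if cores.is_dir():
        for name, info in list_collections(cores).items():
            print(name, info)
```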

 

Other main libraries under LWE-core\WEB-INF\lib:

Metadata extraction

aperture-core aperture-tools-demork

http://aperture.sourceforge.net/ Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems.

Tika: extracting text from multiple data formats

Tika is a toolkit for content extraction. It integrates POI and PDFBox and provides a unified interface for text extraction.

PDF: via PDFBox

MS-*: via POI

HTML: uses NekoHTML to tidy non-standard HTML into XHTML

OpenOffice formats: provided by Tika

Archives: zip, tar, gzip, and bzip

RTF: supported by Tika

Java class: class parsing is done by ASM

Image: only the metadata of images can be extracted

XML

Libraries for processing the different formats:

pdfbox-1.1.0 fontbox-1.1.0

poi-3.7-20100617171931

poi-ooxml-3.7-20100617171931 poi-ooxml-schemas-3.7-20100617171931 poi-scratchpad-3.7-20100617171931

htmlparser-1.6.bundle.jar

metadata-extractor-2.4.0-beta1.bundle

A tool for detecting the encoding of text files

juniversalchardet

Quartz 1.8.4

Quartz is an open source project that provides a rich set of job scheduling features.

Viewing the index data: Luke

luke\luke.bat

This tool lets you inspect indexed data, including the various details stored in the index. It is also open source.

Summary

With the above information, a similar solution can be implemented in one of the following ways:

1. Use LWE-Core directly as the service and build a custom interface on top of it to get a customized system; or use the whole solution as it is, which provides both the interface and the service (the usage restrictions of the software must of course be observed).

2. Follow the approach above to implement a new Solr-based application. After all, many applications do not need to index so many file types.
