20 open-source search engine systems

Source: Internet
Author: User

This article introduces 20 open-source search engine systems, including open-source web search engines and desktop search engines.

Sphider

Sphider is a lightweight web spider and search engine written in PHP that stores its data in MySQL. You can use it to add a search function to your website. Sphider is small and easy to install and modify, and it has been used by thousands of sites.

Risearch PHP

Risearch PHP is an efficient and powerful search engine, especially suited to small and medium-sized websites. It is very fast: it can search pages in less than one second. Risearch is an index-based search engine, which means it first indexes your website and builds a database of the keywords on all of its pages so that searches are fast. As a full-text search script, it adds every keyword in your documents to the index except those excluded in the configuration file. Risearch uses the classic inverted index approach (the same one used by large search engines), which is why it is faster than many other search scripts.
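
To illustrate what an inverted index with a configurable exclusion list looks like, here is a minimal Python sketch. It is not Risearch's code: the tokenizer and the `STOP_WORDS` set (standing in for the words excluded in the configuration file) are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical exclusion list, standing in for the words a configuration
# file might tell the indexer to skip.
STOP_WORDS = {"the", "a", "and", "of"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each non-excluded word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages containing every query word, using index lookups only."""
    words = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    "/about.html": "About the search engine",
    "/faq.html": "Search FAQ and help",
}
index = build_inverted_index(pages)
print(sorted(search(index, "search engine")))  # ['/about.html']
```

Because a query only looks up its words in the prebuilt index instead of rereading every page, query time stays low as the site grows.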

PHPDig

PHPDig is a web crawler and search engine written in PHP. It builds a vocabulary by indexing both dynamic and static pages. When you run a query, it displays a results page listing the pages that contain the keywords, ordered by its ranking rules. PHPDig includes a template system and can index PDF, Word, Excel, and PowerPoint documents. It is suited to specialized search engines that need to crawl deep into sites, which makes it a strong choice for building a vertical search engine for a particular field.

Openwebspider

Openwebspider is an open-source, multi-threaded web spider (robot, or crawler) and search engine with many useful features.

Egothor

Egothor is an open-source, efficient full-text search engine written in Java. Because Java is cross-platform, Egothor can be used in applications on any platform that runs Java. It can be configured as a standalone search engine for full-text search.

Nutch

Nutch is an open-source search engine implemented in Java. It provides all the tools needed to run your own search engine, including full-text search and a web crawler.

Lucene

Apache Lucene is a full-text search engine library written in Java. It makes it easy to add full-text search to Java software. Lucene's core task is to index every word in a document; searching that index is far more efficient than scanning the documents themselves. Lucene provides an API for parsing, filtering, and analyzing documents and for building and querying an index. Besides being efficient and simple, its most important strength is that users can customize its behavior whenever they need to.
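
Lucene analyzers chain a tokenizer with a series of token filters, and customizing that chain is one of the main ways users adapt Lucene. The Python sketch below imitates such a pipeline (tokenizer, lowercase filter, stop-word filter) only to show the idea; it does not use Lucene's API, and every function name in it is hypothetical.

```python
from typing import Callable, Iterable, Iterator

# A token filter takes a stream of tokens and yields a transformed stream.
TokenFilter = Callable[[Iterable[str]], Iterator[str]]


def whitespace_tokenizer(text: str) -> Iterator[str]:
    """Split raw text on whitespace, like a very simple tokenizer."""
    yield from text.split()


def lowercase_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Normalize case so 'Search' and 'search' index to the same term."""
    for token in tokens:
        yield token.lower()


def make_stop_filter(stop_words: set[str]) -> TokenFilter:
    """Build a filter that drops common words carrying little meaning."""
    def stop_filter(tokens: Iterable[str]) -> Iterator[str]:
        for token in tokens:
            if token not in stop_words:
                yield token
    return stop_filter


def analyze(text: str, filters: list[TokenFilter]) -> list[str]:
    """Run the tokenizer, then each filter in order, and collect the terms."""
    tokens: Iterable[str] = whitespace_tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return list(tokens)


# Customization means choosing which filters to chain, and in what order.
chain = [lowercase_filter, make_stop_filter({"a", "the", "to"})]
print(analyze("Add Search to the Application", chain))
# ['add', 'search', 'application']
```

The order matters: the stop filter runs after lowercasing, so "The" and "the" are both removed.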

Oxygen

Oxygen is a pure-Java web search engine.

Bddbot

Bddbot is a simple search engine that is easy to understand and use. Its crawler fetches the URLs listed in a file (urls.txt) and stores the results in a database. It also includes a simple web server that accepts queries from a browser and returns the results, so it can easily be integrated into your website.
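
The crawl-from-a-list workflow described above can be sketched as follows. This is not Bddbot's code: the one-URL-per-line format of `urls.txt` (with `#` comments), the `pages` table, and the injected `fetch` function are all assumptions, chosen so the example runs offline against an in-memory SQLite database.

```python
import sqlite3
from typing import Callable


def read_url_list(text: str) -> list[str]:
    """Parse urls.txt-style text: one URL per line, skipping blank lines and '#' comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def crawl(urls: list[str], fetch: Callable[[str], str], db: sqlite3.Connection) -> None:
    """Fetch each listed URL and store its content in a 'pages' table."""
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    for url in urls:
        body = fetch(url)  # injected fetcher; a real crawler would make an HTTP request here
        db.execute("INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)", (url, body))
    db.commit()


def query(db: sqlite3.Connection, word: str) -> list[str]:
    """Return URLs whose stored body contains `word` (naive substring match, no index)."""
    rows = db.execute("SELECT url FROM pages WHERE body LIKE ? ORDER BY url", (f"%{word}%",))
    return [row[0] for row in rows]


# Offline demo: a dict stands in for the web.
fake_web = {"http://example.com/a": "open source search", "http://example.com/b": "hello world"}
db = sqlite3.connect(":memory:")
seed_text = "# seed list\nhttp://example.com/a\n\nhttp://example.com/b\n"
crawl(read_url_list(seed_text), lambda url: fake_web[url], db)
print(query(db, "search"))  # ['http://example.com/a']
```

The `query` function plays the role of the built-in web server's search handler; a real deployment would wrap it in an HTTP endpoint and use a proper index rather than `LIKE`.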

Zilverline

Zilverline is a search engine that lets you search content on a local hard disk or an intranet through a web interface. Zilverline can extract content from PDF, Word, Excel, PowerPoint, RTF, TXT, Java, CHM, ZIP, RAR, and other documents to build summaries and indexes, and you can then search the results from the local hard disk or the intranet. Zilverline supports multiple languages, including Chinese.

Xqengine

Xqengine is a full-text search engine for XML documents that uses XQuery as its front-end query language. It lets you query a collection of XML documents using logical combinations of keywords, much as Google and other search engines do for HTML documents. Xqengine is a compact, embeddable component written in Java.
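
Xqengine expresses such queries in XQuery; the Python sketch below only illustrates the underlying idea of combining keywords with AND, OR, and NOT over posting sets (the set of documents containing each word). The document names and index contents are made up, and this is not Xqengine's API.

```python
# Hypothetical index: each keyword maps to the set of XML documents containing it.
index: dict[str, set[str]] = {
    "xml": {"doc1.xml", "doc2.xml", "doc3.xml"},
    "query": {"doc1.xml", "doc3.xml"},
    "java": {"doc2.xml"},
}
all_docs: set[str] = set().union(*index.values())


def docs(word: str) -> set[str]:
    """Documents containing `word`; an unknown word matches nothing."""
    return index.get(word, set())


print(sorted(docs("xml") & docs("query")))   # xml AND query -> ['doc1.xml', 'doc3.xml']
print(sorted(docs("query") | docs("java")))  # query OR java -> ['doc1.xml', 'doc2.xml', 'doc3.xml']
print(sorted(all_docs - docs("java")))       # NOT java -> ['doc1.xml', 'doc3.xml']
```

AND, OR, and NOT map directly onto set intersection, union, and difference, which is why keyword combinations stay cheap once the index exists.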

Mg4j

Mg4j lets you build a compressed full-text index for large document collections using interpolative coding.
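
Binary interpolative coding compresses a sorted list of document numbers (a posting list) by encoding the middle element only within the range its neighbours leave open, then recursing on each half, so dense runs cost few or even zero bits. The Python sketch below shows the technique on plain `'0'`/`'1'` strings for readability. It is a textbook-style illustration, not Mg4j's implementation (which is written in Java and works on real bit streams). The example postings are made up.

```python
class BitReader:
    """Reads unsigned integers of a given bit width from a '0'/'1' string."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read(self, width: int) -> int:
        if width == 0:
            return 0  # a range holding exactly one value needs no bits
        value = int(self.bits[self.pos:self.pos + width], 2)
        self.pos += width
        return value


def encode(xs: list[int], lo: int, hi: int) -> str:
    """Interpolative-code a sorted list of distinct integers known to lie in [lo, hi]."""
    out: list[str] = []

    def rec(xs: list[int], lo: int, hi: int) -> None:
        if not xs:
            return
        m = len(xs) // 2
        x = xs[m]
        # m smaller values sit below x and len(xs) - m - 1 larger ones above it,
        # so x is confined to [low, high].
        low, high = lo + m, hi - (len(xs) - m - 1)
        width = (high - low).bit_length()
        if width:
            out.append(format(x - low, f"0{width}b"))
        rec(xs[:m], lo, x - 1)      # left half lies in [lo, x - 1]
        rec(xs[m + 1:], x + 1, hi)  # right half lies in [x + 1, hi]

    rec(xs, lo, hi)
    return "".join(out)


def decode(bits: str, n: int, lo: int, hi: int) -> list[int]:
    """Invert encode(): rebuild n integers in [lo, hi], reading fields in the same order."""
    reader = BitReader(bits)

    def rec(n: int, lo: int, hi: int) -> list[int]:
        if n == 0:
            return []
        m = n // 2
        low, high = lo + m, hi - (n - m - 1)
        x = low + reader.read((high - low).bit_length())
        left = rec(m, lo, x - 1)
        right = rec(n - m - 1, x + 1, hi)
        return left + [x] + right

    return rec(n, lo, hi)


postings = [3, 8, 9, 11, 12, 13, 17]  # example document numbers in [1, 20]
bits = encode(postings, 1, 20)
print(len(bits), "bits vs", 5 * len(postings), "with fixed 5-bit numbers")
# 17 bits vs 35 with fixed 5-bit numbers
print(decode(bits, len(postings), 1, 20) == postings)  # True
```

The decoder must know the list length and the range `[lo, hi]` up front; an index would store those alongside each posting list. Note that document 12 costs zero bits here: once 11 and 13 are known, nothing else fits between them.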

JXTA search

JXTA search is a distributed search system designed for peer-to-peer networks and websites.

Yacy

Yacy is a P2P distributed web search engine and also an HTTP caching proxy server. The project takes a new approach to building a P2P web index network: you can search your own index or the global index, crawl your own web pages, or start a distributed crawl.
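
One common way for a P2P search network to split a global index among peers is to hash each word and assign it to a peer, so that any peer can work out who holds a given word's entries. The Python sketch below shows simple hash partitioning with made-up peer names; it is a generic illustration, not Yacy's actual distributed hash table protocol.

```python
import hashlib

PEERS = ["peer-a", "peer-b", "peer-c"]  # hypothetical peer names


def responsible_peer(word: str, peers: list[str]) -> str:
    """Choose the peer that stores the index entries for `word` by hashing it."""
    digest = hashlib.sha1(word.lower().encode("utf-8")).digest()
    return peers[int.from_bytes(digest[:8], "big") % len(peers)]


def distribute(local_index: dict[str, set[str]], peers: list[str]) -> dict[str, dict[str, set[str]]]:
    """Split a local word -> URLs index into one shard per responsible peer."""
    shards: dict[str, dict[str, set[str]]] = {peer: {} for peer in peers}
    for word, urls in local_index.items():
        shards[responsible_peer(word, peers)][word] = urls
    return shards


local_index = {"search": {"http://example.com/1"}, "p2p": {"http://example.com/2"}}
for peer, shard in distribute(local_index, PEERS).items():
    print(peer, sorted(shard))
# A searching peer hashes its query word the same way and asks only that peer.
print(responsible_peer("search", PEERS))
```

Plain modulo hashing reassigns most words whenever a peer joins or leaves; production DHTs typically use consistent hashing or a similar scheme to limit that churn.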

Red-Piranha

Red-Piranha is an open-source search system that genuinely "learns" what you are looking for. It can serve as a personal desktop search engine (Windows, Linux, and Mac), an enterprise intranet search engine, or the search function of your website. It can also act as a P2P search engine, be combined with a wiki as a knowledge or document management solution, search aggregated RSS feeds for the information you want, or search your company's systems (including SAP, Oracle, or any other database or data source). In addition, it can be used to manage PDF, Word, and other documents, run as a web service that provides search, or provide a search backend for your applications (Web, Swing, SWT, Flash, Mozilla XUL, PHP, Perl, or C#/.NET).

Lius

Lius is an indexing framework based on the Jakarta Lucene project. Lius adds indexing support to Lucene for many file formats, such as MS Word, MS Excel, MS PowerPoint, RTF, PDF, XML, HTML, TXT, OpenOffice documents, and JavaBeans. Indexing JavaBeans is particularly useful when you want to index a database, or when your application uses an ORM persistence layer such as Hibernate, JDO, Torque, or TopLink.
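
Indexing a JavaBean amounts to reading an object's properties and turning each one into a named index field. The Python sketch below mimics that idea with a dataclass standing in for a JavaBean or ORM entity. The `Book` class, its fields, and the flat string-valued document format are hypothetical; Lius itself does this in Java on top of Lucene.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:  # hypothetical entity, e.g. a row loaded through an ORM
    isbn: str
    title: str
    year: int


def bean_to_document(obj) -> dict[str, str]:
    """Turn each dataclass field into a named, string-valued index field."""
    return {f.name: str(getattr(obj, f.name)) for f in fields(obj)}


print(bean_to_document(Book("0001", "Search Basics", 2005)))
# {'isbn': '0001', 'title': 'Search Basics', 'year': '2005'}
```

Because the mapping works on whatever objects the persistence layer returns, database rows can be indexed without writing a separate extractor for each table.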

Apache SOLR

SOLR is a high-performance full-text search server written in Java 5 and based on Lucene. Documents are added to the index by sending XML over HTTP, and the index is queried over HTTP, which returns results as XML or JSON. Its main features include efficient, flexible caching; vertical search; highlighting of search results; higher availability through index replication; a powerful data schema for defining fields and types and configuring text analysis; and a web-based administration interface.
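
To make the XML-over-HTTP workflow concrete, the Python sketch below builds (but does not send) an update message in SOLR's XML `<add><doc>` format, plus a query URL that requests a JSON response through the `wt=json` parameter. The host, port, core name `mycore`, and field names are assumptions; check the documentation for your SOLR version for the exact endpoint paths.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr/mycore"  # hypothetical host and core name


def build_add_xml(docs: list[dict[str, str]]) -> str:
    """Build an <add> message with one <doc> per dict and one <field> per key."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = value  # ElementTree escapes characters such as '&'
    return ET.tostring(add, encoding="unicode")


def build_query_url(q: str, rows: int = 10) -> str:
    """Build a select URL; wt=json asks for a JSON response instead of XML."""
    params = urlencode({"q": q, "wt": "json", "rows": rows})
    return f"{SOLR_BASE}/select?{params}"


print(build_add_xml([{"id": "1", "title": "Lucene in Action"}]))
# <add><doc><field name="id">1</field><field name="title">Lucene in Action</field></doc></add>
print(build_query_url("title:lucene"))
# http://localhost:8983/solr/mycore/select?q=title%3Alucene&wt=json&rows=10
```

To actually index the document, the XML would be POSTed to the core's `/update` handler and followed by a commit; the query URL can be fetched with any HTTP client.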

Paoding

Paoding is a Chinese word segmentation component for search engines, written in Java, that can be integrated into Lucene applications on the Internet and on enterprise intranets. Paoding fills a gap in open-source Chinese word segmentation components in China and aims to become the preferred open-source Chinese word segmenter for Internet websites. It is designed for efficient segmentation and a good user experience.
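
Chinese text has no spaces between words, so a segmenter must decide where words begin and end before anything can be indexed. A classic dictionary-based approach is forward maximum matching: at each position, take the longest dictionary word that fits. The Python sketch below implements that generic technique with a tiny made-up dictionary; it is not Paoding's algorithm.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedily split text into the longest dictionary words, left to right.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary: 中文 (Chinese), 分词 (word segmentation),
# 搜索 (search), 引擎 (engine), 搜索引擎 (search engine)
words = {"中文", "分词", "搜索", "引擎", "搜索引擎"}
print(forward_max_match("中文分词搜索引擎", words))
# ['中文', '分词', '搜索引擎']
```

Preferring the longest match keeps 搜索引擎 as one term instead of two; real segmenters add larger dictionaries and extra rules to handle ambiguous splits.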

Carrot2

Carrot2 is an open-source search results clustering engine. It automatically organizes search results into topic categories. Its architecture can fetch search results from various sources, including the Yahoo API, Google API, MSN Search API, ls Meta Search, Alexa Web Search, PubMed, OpenSearch, a Lucene index, and SOLR.
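
Carrot2's own clustering algorithms (such as Lingo and STC, Suffix Tree Clustering) are considerably more sophisticated. The Python sketch below only illustrates the basic idea of grouping results into labelled topics: each result title joins the cluster of its most widely shared keyword. The stop-word list, the grouping rule, and the sample titles are all assumptions.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # assumed list


def keywords(title: str) -> set[str]:
    """Distinct lowercase words in a title, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS}


def cluster_by_keyword(titles: list[str], min_size: int = 2) -> dict[str, list[str]]:
    """Put each title under its most widely shared keyword.

    Titles whose best keyword appears in fewer than `min_size` results go to 'Other'.
    """
    counts = Counter(w for t in titles for w in keywords(t))
    clusters: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        words = keywords(title)
        # Highest count wins; ties are broken alphabetically for stable labels.
        label = min(words, key=lambda w: (-counts[w], w)) if words else None
        if label is not None and counts[label] >= min_size:
            clusters[label].append(title)
        else:
            clusters["Other"].append(title)
    return dict(clusters)


results = [
    "Apache Lucene tutorial",
    "Lucene scoring explained",
    "Jaguar speed in the wild",
    "Jaguar car review",
    "Python web crawler",
]
for label, members in cluster_by_keyword(results).items():
    print(label, members)
```

A single shared word is a crude topic signal (both "Jaguar" results land together even though one is about the animal and one about the car); avoiding that kind of mix-up is exactly what dedicated clustering algorithms are for.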

Regain

Regain is a desktop search engine that works much like a web search engine, except that it searches your own documents and files instead of Internet content. With Regain you can search large amounts of data (several gigabytes) in a few seconds. Regain uses Lucene's query syntax, so it supports multiple query types, searches across multiple indexes, and advanced file-based search. It also supports URL rewriting and file-to-HTTP bridging, and it provides good support for Chinese characters.

Regain comes in two editions: desktop search and server search. The desktop edition provides fast search over documents on ordinary desktop computers and over web pages on a LAN. The server edition is mainly installed on web servers, where it searches websites and file servers on a LAN.

Source: Open-open

