OSC search engine frameworks: search-framework, TngouDB, GSO

Last Update:2016-01-22 Source: Internet

Author: User

Tags solr connection reset git clone


Project objective: a simple wrapper framework from OSChina for full-text search

License: Public Domain

Contents:

Index rebuild tool, IndexRebuilder.java
Incremental index build tool, IndexUpdater.java
Full-text search framework

http://git.oschina.net/oschina/search-framework

TngouDB background

TngouDB is a Chinese search engine database developed by Tengu Network (tngou.net) for the agricultural search engine of its agricultural site. Tengu hopes to use the power of open source to build TngouDB into a dedicated NoSQL database for Chinese-language indexing.

Brief introduction

TngouDB is a Java-based, cross-platform network database built on Lucene (storage engine), IK (word segmenter), Netty (communication), and other components.

TngouDB simplifies calls to the Lucene API: data CRUD operations are performed with SQL statements.

Characteristics

TngouDB can run independently of a standalone Lucene setup: it can be deployed over the network on a separate server that handles storage and queries on its own.

TngouDB also removes the complexity of Solr. Users perform data operations through simple SQL statements, so they need no Lucene or Solr knowledge; ordinary SQL statements are enough.

Document

Documentation address: http://www.tngou.net/doc/tndb provides complete installation, configuration, and usage documentation.

Use case

TngouDB is currently an internal test version; please do not use it in production projects. We will continue to develop and update it, and will release an official version later.

TngouDB is currently used for the search service of Tengu Network (http://www.tngou.net/search).

http://git.oschina.net/397713572/TngouDB

This project is the complete source code of TSE, the Peking University search engine (two separate projects: the indexer and the crawler). TSE is the prototype implementation presented in the book "Search Engine: Principle, Technology and System"; interested readers can study TSE alongside the book.

The source code provided with the book at http://sewm.pku.edu.cn/book/ is often inaccessible, so I am publishing the copy I downloaded earlier, with the detailed comments I added while studying it. Besides the source comments, there are detailed study notes in a CSDN blog column: http://blog.csdn.net/column/details/inside-tse.html. I hope this is of some help to beginners.

Directory description:

Tse081227: TSE's web collection subsystem (crawler).

index: TSE's preprocessing and query service subsystem. This directory is very large, not because of the source code but because the file index/data/tianwang.raw.2559638448, which holds the crawled raw web page data, is very large.

The original index/data/tianwang.raw.2559638448 file was over 300 MB, and uploading it exceeded the git.oschina.net per-file limit (100 MB), so much of its content was deleted to make the file smaller. This does not affect the operation of the system, because the file only holds crawled raw web page data and a smaller sample works just as well.

http://git.oschina.net/lewsn2008/LBTSE

GSO (Google so)

This is a Google search proxy service written in Node.js. It works by forwarding the user's keywords to a Google server and then returning the search results to the user.
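As a rough illustration of the forwarding step, the sketch below builds the upstream search URL from a user's keyword. The function name `buildSearchUrl` and its parameters are hypothetical and are not taken from the GSO source; the only assumption about Google is its public `/search?q=` URL form.

```javascript
// Hypothetical helper, not part of GSO: turn a keyword into the upstream
// Google search URL that a proxy like this would request.
function buildSearchUrl(googleHost, keyword) {
  // Resolve the /search path against the chosen Google host.
  const url = new URL('/search', 'https://' + googleHost);
  // searchParams takes care of encoding spaces and non-ASCII keywords.
  url.searchParams.set('q', keyword);
  return url.toString();
}

console.log(buildSearchUrl('www.google.com', 'solr connection reset'));
```

The proxy would then fetch this URL on the server side and render the parsed results for the user.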


Certificate note: the certificate provided in the file list is for testing only; replace it with your own certificate in a production environment.

Deployment installation:

git clone https://git.oschina.net/lenbo/gso.git
cd gso
npm install --production

Run commands

Test/debug:

npm start
or
node ./bin/run

Production environment

Start with Forever:
forever start -e err.log -o output.log ./bin/run
Start with PM2:
pm2 start ./bin/run -i max

Custom settings

Site name

Once set, the site name is displayed in the browser title bar and under the homepage logo. Modify the conf/config.js file, locate the name node, and change it to your own site name:

name: 'Valley Search'

Statistics script

Paste the script into the views/partials/statistics.ejs file.

Homepage random text

Paste the text into data/words.txt, with each sentence separated by a blank line. HTML code is supported.

Set up multiple Google IPs to prevent blocking

Put the available IPs into the conf/ip.txt file, one IP per line.
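A minimal sketch of how a one-IP-per-line file could be consumed is shown below. The function names and the round-robin strategy are assumptions made for illustration; the article does not say how GSO actually chooses among the IPs. The addresses come from a reserved documentation range.

```javascript
// Hypothetical helpers, not taken from the GSO source.

// Parse the text of an ip.txt-style file: one IP per line, blank lines ignored.
function parseIpList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Return a function that hands out the IPs in turn (round-robin is an assumption).
function createIpRotator(ips) {
  if (ips.length === 0) {
    throw new Error('IP list is empty');
  }
  let index = 0;
  return function nextIp() {
    const ip = ips[index % ips.length];
    index += 1;
    return ip;
  };
}

const nextIp = createIpRotator(parseIpList('203.0.113.10\r\n203.0.113.11\r\n'));
console.log(nextIp(), nextIp(), nextIp());
```

Spreading requests over several addresses means a single blocked IP affects only some of the searches.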

Setting up an HTTP proxy server

Sometimes a proxy server is needed, for example when the Google IPs are temporarily unavailable or have been blocked by Google. Modify the conf/config.js file and locate the proxy node:

proxy: {
    enable: false, // whether the proxy is enabled
    timeout: 5000, // timeout; takes effect when enable is true
    host: '',      // proxy server address
    port: 80       // proxy server port
}
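To show how a proxy block of this shape might be applied, here is a sketch that turns it into options for Node's `http.request`. The helper `buildRequestOptions` is hypothetical and is not GSO's code; it covers only plain-HTTP targets (an HTTPS target would need a CONNECT tunnel) and assumes the timeout is in milliseconds, which the article does not state.

```javascript
// Hypothetical helper, not taken from the GSO source.
function buildRequestOptions(proxy, targetUrl) {
  const target = new URL(targetUrl);
  if (!proxy.enable) {
    // Direct connection: talk to the target host itself.
    return {
      host: target.hostname,
      port: Number(target.port) || 80,
      path: target.pathname + target.search,
      headers: { Host: target.host }
    };
  }
  // Forward proxy: connect to the proxy and send the absolute URL as the path.
  return {
    host: proxy.host,
    port: proxy.port,
    path: target.href,
    headers: { Host: target.host },
    timeout: proxy.timeout // assumed to be milliseconds
  };
}

const options = buildRequestOptions(
  { enable: true, timeout: 5000, host: '198.51.100.7', port: 80 },
  'http://www.google.com/search?q=solr'
);
console.log(options);
```

The returned object can be passed to `http.request(options, callback)`; with `enable` set to false the same call goes straight to the target host.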

Static file compression

The code is uncompressed after cloning and can be compressed with the grunt tool.

Compress JS and CSS files

Install the grunt tool: npm install -g grunt-cli
Run in the project root directory: grunt static
Change the R_prefix value in conf/config.js to /public

Note: before running the grunt command, install the dependencies with npm install, not npm install --production

HTML code compression

Set NODE_ENV to production before you start the service, for example: NODE_ENV=production forever start bin/run

Completed features

Added a "related searches" feature;
OpenSearch support, so IE, Firefox, and Chrome can set it as the default search engine;
Simple sensitive-word detection; without it the connection gets blocked by the firewall (connection reset);
HTML code compression: the already rendered HTML is compressed with the html-minifier module;
Headroom feature (the search area disappears when the page scrolls down and reappears when the page scrolls up; I personally find this works better on small-screen laptops and tablets, and especially on phones);
HTTPS support (keywords are encrypted);
Use cheerio instead of jQuery for parsing;
Autocomplete in the input box;
Language switch for search content;
Filter results by time period;
When searching with the filetype directive, the result item prefix displays the file type;
Support for setting up multiple Google IPs (2014-12-25);
Added HTTP proxy support (2014-12-28);
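The simple sensitive-word detection mentioned in the list could be as small as the sketch below. The function name, the case-insensitive substring match, and the placeholder word list are all assumptions for illustration; the article does not describe how GSO implements the check.

```javascript
// Hypothetical helper, not taken from the GSO source.
// Returns true when the keyword contains any word from the list,
// compared case-insensitively as a plain substring.
function containsSensitiveWord(keyword, sensitiveWords) {
  const normalized = keyword.toLowerCase();
  return sensitiveWords.some((word) => normalized.includes(word.toLowerCase()));
}

// Placeholder list; a real deployment would maintain its own.
const sensitiveWords = ['blockedword'];

console.log(containsSensitiveWord('some BlockedWord query', sensitiveWords));
console.log(containsSensitiveWord('full-text search', sensitiveWords));
```

A proxy could run a check like this before forwarding a keyword and refuse the request instead of sending a query that would trigger a connection reset.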

Todo

[] Optimize tablet display and fonts;
[] Optimize the experience on mobile phones;
[] Support keyboard shortcuts;
[] Support Wikipedia search;
[] Improve error logging;
[] Support video metadata retrieval (also retrieving playable sources);
[] Add an online proxy feature (for blocked websites that appear in the search results);

http://git.oschina.net/lenbo/gso

The code was written a year ago, so the crawlers may no longer work, but they should be fine as a base to modify.

K:\git\dianying\scripts>tree /f
Folder PATH listing
Volume serial number is EE77-EC45
K:.
│  iqiyi_movie_test.py
│  letv_movie_test.py
│  m1905_movie_test.py
│  pps_movie_test.py
│  pptv_movie_test.py
│  qq_movie_test.py
│  sohu_movie_test.py
│  tudou_movie_test.py
│  xunlei_movie_test.py
│  youku_movie_test.py
│
└─douban
        doubanapi_1.py
        doubanapi_2.py
        doubanapi_3.py
        doubanapi_xj.py
        douban_movie_test.py

Search site

dianying_web.py provides a web interface for the hundreds of thousands of records that the crawlers saved to MongoDB, and supports keyword queries.

http://git.oschina.net/awakenjoys/dianying


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
