OSC search engine frameworks: search-framework, TngouDB, GSO


Project objective: OSChina's simple wrapper framework for full-text search.

License: Public Domain

Contents:

    • Index rebuild tool, IndexRebuilder.java

    • Incremental index update tool, IndexUpdater.java

    • Full-text search framework

http://git.oschina.net/oschina/search-framework



TngouDB background

TngouDB is a Chinese search-engine database developed by Tengu Network (tngou.net) for the search engine of its agricultural site. Tengu hopes, with the strength of the open source community, to build TngouDB into a dedicated Chinese full-text-indexing NoSQL database.

Brief introduction
TngouDB is a Java-based, cross-platform database that combines Lucene (storage and indexing engine), the IK analyzer (Chinese word segmentation), Netty (network communication), and other components into a networked database.

TngouDB simplifies direct calls to the Lucene API: CRUD operations on data are performed with SQL statements.

Characteristics
Unlike embedded, stand-alone Lucene, TngouDB can be deployed over the network on a separate server, so storage and query workloads can be handled independently.

While avoiding much of Solr's complexity, TngouDB lets users perform data operations through simple SQL statements; the Lucene-specific knowledge that Solr requires can be set aside entirely, and ordinary SQL statements are enough.

Documentation

Documentation address: http://www.tngou.net/doc/tndb, which covers installation, configuration, and usage in full.

Use case
TngouDB is currently an internal test version; please do not use it in production projects! Development and updates will continue, and an official release will be published later.

TngouDB is already used for the search feature of the Tengu network site (http://www.tngou.net/search).

http://git.oschina.net/397713572/TngouDB



This project is the complete source code of the Peking University search engine TSE (two separate sub-projects: the indexer and the crawler). TSE is the prototype implementation described in the book "Search Engines: Principles, Technology and Systems"; interested readers can study TSE with the book as a reference.

"Search engine-principle, technology and system" to provide source code http://sewm.pku.edu.cn/book/
Often can not access, here I will be the previous download learning to add the details of the source code to open, not only the source of the comments, there is a detailed study Notes--CSDN blog column address: http://blog.csdn.net/column/details/ Inside-tse.html, I hope to have some help for the beginner's friends.

Directory description:

Tse081227 -- TSE's web collection subsystem (the crawler).

Index -- TSE's preprocessing and query service subsystem. This directory is very large, not because the source code is large but because index/data/tianwang.raw.2559638448 is huge; that file holds the raw crawled web page data.

In addition, the original index/data/tianwang.raw.2559638448 file was over 300 MB, which exceeded git.oschina.net's per-file limit (100 MB) on upload, so much of its content was deleted to shrink the file. This has no effect on running the system, since the file is just raw crawled web page data and can be much smaller.

http://git.oschina.net/lewsn2008/LBTSE




GSO (Google so)

This is a Google search proxy service written in Node.js. The principle is simple: take the user's keywords, query Google's servers, and return the results to the user.
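
A minimal sketch of that principle, written in TypeScript against Node's built-in http/https modules; it is an illustration only, not GSO's actual code (GSO parses and re-renders the results with Cheerio, and adds HTTPS, IP pools, and proxy support):

    import * as http from "http";
    import * as https from "https";

    // Forward the user's keywords to Google and relay the response back.
    const server = http.createServer((req, res) => {
      const url = new URL(req.url ?? "/", "http://localhost");
      const q = url.searchParams.get("q") ?? "";
      const upstream = "https://www.google.com/search?q=" + encodeURIComponent(q);

      https
        .get(upstream, (googleRes) => {
          res.writeHead(googleRes.statusCode ?? 502, { "Content-Type": "text/html; charset=utf-8" });
          googleRes.pipe(res); // a real proxy would parse and re-render this HTML
        })
        .on("error", () => {
          res.writeHead(502).end("Upstream error");
        });
    });

    server.listen(3000);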


Note on the certificate: the certificate provided in the file list is for testing only; replace it with your own certificate in a production environment.

Deployment and installation:

    git clone https://git.oschina.net/lenbo/gso.git
    cd gso
    npm install --production

Run command (test/debug):

    npm start
    # or
    node ./bin/run

Production environment
    • Start with Forever:
      forever start -e err.log -o output.log ./bin/run

    • Start with PM2:
      pm2 start ./bin/run -i max

Custom settings: site name

After you set the site name, it is displayed in the browser title bar and under the homepage logo. Edit the conf/config.js file, locate the name field, and change it to your own site name:

    name: 'Valley Search'
Statistics script

Paste your statistics (analytics) script into the views/partials/statistics.ejs file.

Homepage random text

Paste text into data/words.txt, with each sentence separated by a blank line; HTML code is supported.

Set up multiple Google IPs to prevent blocking

Put the available IPs into the conf/ip.txt file, one IP per line.
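
For illustration only, here is a TypeScript sketch of how such an IP list might be consumed with simple round-robin rotation (not GSO's actual implementation):

    import { readFileSync } from "fs";

    // Load conf/ip.txt: one Google IP per line, blank lines ignored.
    const ips = readFileSync("conf/ip.txt", "utf8")
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line.length > 0);

    let next = 0;

    // Rotate through the pool so no single IP takes all the traffic.
    export function pickGoogleIp(): string {
      const ip = ips[next];
      next = (next + 1) % ips.length;
      return ip; // connect to this IP instead of resolving www.google.com
    }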

Setting up an HTTP proxy server

Sometimes a proxy server is needed, for example when a Google IP becomes temporarily unavailable or gets blocked by Google. Edit the conf/config.js file and locate the proxy node:

    proxy: {
      enable: false,   // whether the proxy is enabled; set to true to activate it
      timeout: 5000,   // timeout
      host: '',        // proxy server address
      port: 80         // proxy server port
    }
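
For illustration, this is the general pattern for sending a plain-HTTP request through a forward proxy configured like the node above (a TypeScript sketch with assumed example values, not GSO's code; HTTPS targets additionally require a CONNECT tunnel):

    import * as http from "http";

    const proxy = { host: "127.0.0.1", port: 8080, timeout: 5000 }; // example values only

    const req = http.request(
      {
        host: proxy.host,             // connect to the proxy server ...
        port: proxy.port,
        path: "http://example.com/",  // ... and ask it to fetch the full target URL
        headers: { Host: "example.com" },
        timeout: proxy.timeout,
      },
      (res) => res.pipe(process.stdout)
    );
    req.on("timeout", () => req.destroy());
    req.end();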
Static file compression

The code after clone is uncompressed; it can be compressed with the Grunt tool.

Compress JS and CSS files
    1. Install the Grunt CLI: npm install -g grunt-cli

    2. Run grunt static in the project root directory

    3. Change the r_prefix value in conf/config.js to /public

Note: before running the grunt command, the dependencies must be installed with npm install, not npm install --production

HTML code compression

Set NODE_ENV to production before starting the service, for example: NODE_ENV=production forever start bin/run
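
As a sketch of the general pattern (not GSO's actual code), the html-minifier module mentioned in the completed-features list below can be applied to rendered pages only when NODE_ENV is production; the option set shown here is an assumption:

    import { minify } from "html-minifier";

    // Compress rendered HTML only in production; pass it through unchanged otherwise.
    export function compressHtml(html: string): string {
      if (process.env.NODE_ENV !== "production") {
        return html;
      }
      return minify(html, {
        collapseWhitespace: true, // assumed options; the project may use different ones
        removeComments: true,
      });
    }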

Completed features
    1. Added a "related searches" feature;

    2. OpenSearch support, so IE, Firefox, and Chrome can set it as the default search engine;

    3. Simple sensitive-word detection; without it the connection would be blocked (connection reset);

    4. HTML code compression, using the html-minifier module to compress the already-rendered HTML;

    5. Headroom behavior (the search area hides when the page scrolls down and reappears when it scrolls up; personally I feel this works better on small-screen laptops and tablets, and especially on phones);

    6. HTTPS support (keywords are encrypted in transit);

    7. Use Cheerio instead of jQuery for parsing;

    8. Autocomplete for the input box;

    9. Language switching for search results;

    10. Filtering of results by time period;

    11. When searching with the filetype: directive, the file type is shown as a prefix on each result item;

    12. Support for setting up multiple Google IPs (2014-12-25);

    13. Added HTTP proxy support (2014-12-28);

Todo
    1. [ ] Tablet display optimization, font optimization;

    2. [ ] Improve the mobile phone experience;

    3. [ ] Support keyboard shortcuts;

    4. [ ] Support Wikipedia search;

    5. [ ] Improve error logging;

    6. [ ] Support video metadata retrieval (retrieving playable sources at the same time);

    7. [ ] Add an online proxy feature (to proxy some blocked websites that appear in search results);

http://git.oschina.net/lenbo/gso




The code was written a year ago, so the crawlers may no longer work, but adapting them on this basis should be fine.

K:\git\dianying\scripts> tree /f
Folder PATH listing
Volume serial number is EE77-EC45
K:.
│   iqiyi_movie_test.py
│   letv_movie_test.py
│   m1905_movie_test.py
│   pps_movie_test.py
│   pptv_movie_test.py
│   qq_movie_test.py
│   sohu_movie_test.py
│   tudou_movie_test.py
│   xunlei_movie_test.py
│   youku_movie_test.py
│
└───douban
        doubanapi_1.py
        doubanapi_2.py
        doubanapi_3.py
        doubanapi_xj.py
        douban_movie_test.py
Search Sites

dianying_web.py presents the hundreds of thousands of records that the crawlers saved to MongoDB as a website and supports keyword queries.
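
The project itself is written in Python, but for illustration (and for consistency with the earlier sketches) here is a TypeScript sketch of the kind of keyword query involved; the database, collection, and field names are assumptions, not the project's actual schema:

    import { MongoClient } from "mongodb";

    // Case-insensitive substring match on a movie title (hypothetical schema).
    export async function searchMovies(keyword: string) {
      const client = new MongoClient("mongodb://localhost:27017");
      await client.connect();
      try {
        return await client
          .db("dianying")
          .collection("movies")
          .find({ title: { $regex: keyword, $options: "i" } })
          .limit(20)
          .toArray();
      } finally {
        await client.close();
      }
    }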

http://git.oschina.net/awakenjoys/dianying


