Architecture and installation of Sphinx + MySQL full-text search

Source: Internet
Author: User

Preface:

This article describes a full-text search (search engine) architecture for tens of millions of records that has been verified in a production environment. This page contains only excerpts from the opening chapters, not the full text.

Test environment: a DELL PowerEdge 6850 server (four 64-bit Intel Xeon MP 7110N processors, 8 GB RAM), Red Hat AS4 Linux, MySQL 5.1.26 with the MyISAM storage engine, key_buffer = m, and a single table of 10 million records. The table has more than 10 fields of types such as int, datetime, varchar, and text, with only a primary key and no other indexes. An SQL query that uses the primary key as its WHERE condition is very fast, taking only 0.01 seconds.

Sphinx, an open-source full-text search engine from Russia, allows a single index to contain up to 100 million records, and with 10 million records a query takes 0.x seconds (millisecond level). Building an index of 1 million records takes 3 to 4 minutes, an index of 10 million records can be built within 50 minutes, and rebuilding an incremental index that contains only the latest 100,000 records takes a few dozen seconds.
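Because a full rebuild of 10 million records takes up to 50 minutes while a 100,000-record incremental index takes seconds, the incremental index is usually organized as a "main + delta" scheme: a large main index that is rebuilt rarely, plus a small delta index holding only the records added since the last full rebuild. The Python sketch below illustrates the idea with a toy inverted index. The `Doc` type, the `checkpoint_id` boundary, and the assumption that records are only appended (never updated) are hypothetical simplifications, not Sphinx internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    id: int
    text: str


def split_main_delta(docs, checkpoint_id):
    """Ids <= checkpoint go to the rarely rebuilt main index;
    ids > checkpoint go to the small, frequently rebuilt delta index."""
    main = [d for d in docs if d.id <= checkpoint_id]
    delta = [d for d in docs if d.id > checkpoint_id]
    return main, delta


def build_index(docs):
    """Toy inverted index: lower-cased word -> set of document ids."""
    index = {}
    for d in docs:
        for word in d.text.lower().split():
            index.setdefault(word, set()).add(d.id)
    return index


def search(word, *indexes):
    """Query every index and merge the matching ids (sorted for readability)."""
    hits = set()
    for idx in indexes:
        hits |= idx.get(word.lower(), set())
    return sorted(hits)
```

Only `build_index(delta)` needs to run frequently; the main index is rebuilt on a slower schedule, after which the checkpoint moves forward and the delta starts empty again.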

Based on the above, I designed this search engine architecture. It has been running in production for a week with very good results. If time permits, I plan to develop a MySQL storage engine plug-in with simple logic, high speed, low memory usage, and no table-level locking to replace MyISAM, solving the table-lock delays that the MyISAM storage engine suffers under frequent updates. In addition, distributed search is already technically feasible.

I. Search Engine Architecture Design:

1. Search engine architecture diagram:

2. Search engine architecture design ideas:

(1) Simple invocation:

For the convenience of front-end Web engineers, an efficient search needs only one simple SQL statement: SELECT ... FROM myisam_table JOIN sphinx_table ON (sphinx_table.sphinx_id = myisam_table.id) WHERE query = '...';
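A front-end could build that statement with a small helper. In SphinxSE the text placed in the `query` column consists of the search keywords followed by semicolon-separated options such as `mode=any` or `limit=20`. The sketch assumes a Python MySQL driver that uses `%s` placeholders (for example PyMySQL); the helper name `sphinxse_query` is hypothetical, and only a few of the available options are shown.

```python
def sphinxse_query(keywords, mode="all", offset=None, limit=None):
    """Build the value for the SphinxSE `query` column:
    keywords first, then ';'-separated option=value pairs."""
    if ";" in keywords:
        # ';' separates SphinxSE options, so it must not appear in the keywords.
        raise ValueError("keywords must not contain ';'")
    parts = [keywords, f"mode={mode}"]
    if offset is not None:
        parts.append(f"offset={offset}")
    if limit is not None:
        parts.append(f"limit={limit}")
    return ";".join(parts)


# Table and column names follow the statement in the article.
SEARCH_SQL = (
    "SELECT myisam_table.* FROM myisam_table "
    "JOIN sphinx_table ON (sphinx_table.sphinx_id = myisam_table.id) "
    "WHERE sphinx_table.query = %s"
)
```

With a hypothetical driver cursor, `cursor.execute(SEARCH_SQL, (sphinxse_query("mysql sphinx", mode="any", limit=20),))` passes `mysql sphinx;mode=any;limit=20` as a bound parameter, so the keywords never need manual quoting.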

(2) Fast indexing and querying:

① Sphinx is a high-performance full-text search software package developed by Andrew Aksyonoff of Russia and released under a dual license (GPL or a commercial license).

Sphinx features:

  • High-speed indexing (up to 10 MB/s, compared with 1.8 MB/s for Lucene)
  • High-performance search (average query time under 0.1 seconds on 2 to 4 GB of text)
  • High scalability (in actual tests, up to 100 GB of text can be indexed, and a single index can contain 100 million records)
  • Supports distributed search
  • Supports combined result ranking based on phrase proximity and statistical relevance
  • Supports any number of document fields (numeric attributes or full-text fields)
  • Supports different search modes ("match all", "match phrase", and "match any")
  • Provides a MySQL storage engine (SphinxSE)
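The three search modes differ only in how the query words must appear in a document. The toy Python functions below illustrate those semantics with plain whitespace tokenization; they are a simplified assumption for explanation and ignore Sphinx's morphology, stop words, and ranking.

```python
def tokenize(text):
    """Lower-case and split on whitespace (no stemming or stop words)."""
    return text.lower().split()


def match_all(doc, query):
    """Every query word must occur somewhere in the document."""
    words = set(tokenize(doc))
    return all(w in words for w in tokenize(query))


def match_any(doc, query):
    """At least one query word must occur in the document."""
    words = set(tokenize(doc))
    return any(w in words for w in tokenize(query))


def match_phrase(doc, query):
    """The query words must occur consecutively and in the same order."""
    d, q = tokenize(doc), tokenize(query)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))
```

For the document "Sphinx full text search engine", the query "search sphinx" matches in "match all" mode but not in "match phrase" mode, because the two words are neither adjacent nor in that order.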

② Tests by the High Performance MySQL expert team abroad show that primary-key queries of the form "SELECT ... FROM ... WHERE id = ..." (where id is the PRIMARY KEY) can handle more than 10,000 queries per second, while ordinary SELECT queries handle only dozens to a few hundred per second.
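The gap comes from the access path: a primary-key condition is resolved through the index, while a condition on an unindexed text column forces the engine to scan every row. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL (in MySQL, `EXPLAIN` plays the same role) and collects the query plan for each kind of statement. The `docs` table and its contents are made up for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # 'detail' is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs (id, title, body) VALUES (?, ?, ?)",
    [(i, f"title {i}", f"body text {i}") for i in range(1, 1001)],
)

# Primary-key lookup: resolved through the index.
pk_plan = query_plan(conn, "SELECT title FROM docs WHERE id = ?", (42,))
# Substring match on an unindexed text column: every row must be examined.
like_plan = query_plan(conn, "SELECT title FROM docs WHERE body LIKE ?", ("%text 42%",))

print(pk_plan)
print(like_plan)
```

SQLite reports a `SEARCH ... USING INTEGER PRIMARY KEY` step for the first query and a `SCAN` of the table for the second, which is why this architecture hands text matching to Sphinx and leaves only primary-key lookups to the database.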

③ Sphinx does not store text fields. Suppose the id, date, title, and body fields of a database table are used to build the search index. Sphinx is queried by keyword, time, category, range, and similar criteria, and it returns only non-text information such as the IDs of the matching records. To display the title, body, and other content, you must then query the MySQL database by those IDs or fetch the data from other storage such as Memcachedb. Installing SphinxSE as a MySQL storage engine is a convenient way to combine MySQL with Sphinx.
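The two-step pattern described in ③ (search for IDs, then load the rows by primary key) can be sketched in Python as follows. `sqlite3` stands in for MySQL, and the hard-coded `ranked_ids` list stands in for the IDs a Sphinx query would return; both, along with the `documents` table layout and the `fetch_by_ids` helper, are assumptions for illustration.

```python
import sqlite3


def fetch_by_ids(conn, ids):
    """Load (id, title) rows by primary key, in the order given by `ids`."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, title FROM documents WHERE id IN ({placeholders})", list(ids)
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) returns rows in no guaranteed order, so restore the search
    # engine's ranking; IDs no longer present in the table are skipped.
    return [by_id[i] for i in ids if i in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO documents (id, title, body) VALUES (?, ?, ?)",
    [(1, "Sphinx intro", "body text"), (2, "MySQL tuning", "body text"),
     (3, "SphinxSE join", "body text")],
)

# Hypothetical result of a Sphinx query: matching IDs, best match first.
ranked_ids = [3, 1]
results = fetch_by_ids(conn, ranked_ids)
```

The reordering step matters because the search engine, not the database, knows the relevance order; an `IN` lookup alone would lose it.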

Create a table of type SPHINX and JOIN the primary key id of the MyISAM table with the id of the Sphinx table. For the MyISAM table, the query then amounts to nothing more than a primary-key lookup of the form WHERE id = ..., while the search conditions in the WHERE clause are handed over to Sphinx. This plays to the strengths of both and enables high-speed search and querying.
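SphinxSE moves the ID lookup into a single JOIN. In a real deployment the Sphinx table is declared with `ENGINE=SPHINX` and a `CONNECTION` string pointing at the Sphinx search daemon, and its first columns hold the document ID, the match weight, and the query text; MySQL obtains the rows from Sphinx when the query runs. SQLite has no such engine, so the sketch below assumes the search has already run and inserts its (id, weight) results by hand, only to show how the JOIN and the ordering by weight behave. Table and column names follow the article; the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myisam_table (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
-- Stand-in for the SphinxSE table: (document id, match weight) for one query.
CREATE TABLE sphinx_table (sphinx_id INTEGER, weight INTEGER);
""")
conn.executemany(
    "INSERT INTO myisam_table VALUES (?, ?, ?)",
    [(1, "Sphinx intro", "body text"), (2, "MySQL tuning", "body text"),
     (3, "SphinxSE join", "body text")],
)
# Hypothetical search result: documents 3 and 1 matched, 3 ranked higher.
conn.executemany("INSERT INTO sphinx_table VALUES (?, ?)", [(3, 2), (1, 1)])

# The database side only resolves primary keys; ORDER BY weight keeps
# the search engine's relevance order in the joined result.
rows = conn.execute("""
SELECT myisam_table.id, myisam_table.title, sphinx_table.weight
FROM myisam_table
JOIN sphinx_table ON (sphinx_table.sphinx_id = myisam_table.id)
ORDER BY sphinx_table.weight DESC
""").fetchall()
```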
