Here are several books on search engines. I think the following three are the best available so far for studying search engines. If a better book appears, I will introduce it as well. Later I will also cover material on wireless search.
1. Title: Develop Your Own Search Engine: Lucene 2.0 + Heritrix (with CD)
Author: Qiu Zhe
[Content Overview]
This book explains in detail how to use Lucene for search engine development. By studying it, you can build an enterprise-level search engine website.
The book is divided into 14 chapters: search engine and information retrieval basics, an introductory Lucene example, creating Lucene indexes, building searches with Lucene, Lucene sorting, Lucene analyzers, parsing documents in Word, Excel, and PDF formats, the Compass search engine framework, distributed Lucene and the Google Search API, the Heritrix crawler, and a comprehensive example presented in preparation, HTMLParser, DWR, and Web parts...
This book is the first in China to explain how to build a search engine with Lucene and Heritrix. It analyzes source code and aims to take readers deep into the core of these tools, so that they can extend and develop the corresponding components themselves, use their imagination, and build more creative search engine products. The book is suitable for Java programmers; programmers in other areas of software development and search engine enthusiasts can also use it as an introductory text.
Currently, few books on the market introduce search engines at the technical level. Those that exist mostly stay at the level of theory rather than walking through how a search engine is developed. It is therefore fair to say that this is the first book in China to describe the search engine development process in detail.
(1) It uses the latest Lucene 2.0. Most users previously worked with version 1.4.3, but Lucene 2.0 rewrote many APIs and greatly optimized the internal implementation. All of the code in this book has been debugged against version 2.0, which helps readers get to know Lucene's new features.
(2) It provides a complete search engine case. The case has great practical value and can be applied to real projects with slight modification. Its market value is more than 30000 RMB!
(3) It focuses on the problems that give developers headaches. The book aims to guide project practice, so it does not list the usage of every API; instead it discusses common development issues in depth. For example, Chapter 7 is devoted to parsing Word, Excel, and PDF files.
(4) The content is novel, up to date, and practical. The book introduces Compass, Heritrix, DWR, and HTMLParser, all of which are important and practical technologies in search engine development. The author demonstrates them through his own practice, hoping that readers can broaden their horizons while learning Lucene.
CD features:
The CD contains the complete search engine case. The case has strong practical value and can be applied to real projects with slight modification. Its market value is more than 30000 RMB!...
2. Title: Search Engine: Principle, Technology and System
Author: Li Xiaoming
[Content Overview]
This book systematically introduces the working principles, implementation technology, and system construction of Internet search engines. It consists of 13 chapters. It starts with an overview of the basic working principles and the implementation of a small, simple search engine, then discusses in detail the design points and key technologies of large-scale distributed search engine systems. Finally, it introduces topic-oriented and personalized Web information services and describes technologies such as automatic classification of Chinese web pages and their applications. The book is well organized and moves from simple to advanced material. It combines in-depth theoretical analysis with a large amount of experimental data, giving it both academic and practical value. It can be used as a teaching reference for graduate students or senior undergraduates majoring in computer science and technology, information management and information systems, e-commerce, and related fields. It is also a valuable reference for scientific and technical personnel engaged in research and application development in areas such as network technology, web site management, digital libraries, and Web mining.
[Table of contents]
Preface
Chapter 1 Introduction
Section 1 Concepts of search engines
Section 2 Development history of search engines
Section 3 Some well-known search engines
Chapter 2 working principle and architecture of Web Search Engines
Section 1 Basic Requirements
Section 2 Web page collection
Section 3 preprocessing
Section 4 query services
Section 5 Architecture
Chapter 3 collection of Web Information
Section 1 Introduction
Section 2 web page collection
Section 3 Running multiple collection programs in parallel
Section 4 How to avoid collecting the same web page repeatedly
Section 5 How to collect important web pages first
Section 6 Types of information collected
Section 7 Summary of this Chapter
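
The stages named in the chapter list above (web page collection, preprocessing, query services, and avoiding repeated collection) can be pictured with a toy example. The following is only a minimal Java sketch under my own assumptions, not code from the book: the class name `TinySearchEngine`, its methods, and the example URLs are all hypothetical, the "collection" step is simulated by passing page text in directly instead of crawling, and the tokenizer only lowercases and splits on non-letter characters, so it does not handle Chinese word segmentation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory search engine (hypothetical names throughout):
// collect pages, preprocess them into an inverted index, answer queries.
class TinySearchEngine {
    // URLs already collected, so the same page is never indexed twice.
    private final Set<String> seenUrls = new HashSet<>();
    // Inverted index: term -> sorted set of URLs whose text contains it.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Preprocessing: lowercase the text and split on runs of characters
    // that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Collection (simulated): index a page unless its URL was seen before.
    // Returns false when the URL is a repeat and the page is skipped.
    boolean addPage(String url, String text) {
        if (!seenUrls.add(url)) {
            return false;
        }
        for (String term : tokenize(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
        }
        return true;
    }

    // Query service: URLs that contain every term of the query (AND).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> postings = index.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinySearchEngine engine = new TinySearchEngine();
        engine.addPage("http://example.com/a", "Lucene builds an index");
        engine.addPage("http://example.com/b", "A crawler collects web pages");
        engine.addPage("http://example.com/a", "repeated URL, skipped");
        System.out.println(engine.search("lucene index"));
    }
}
```

A real system would of course crawl pages over the network, rank results, and keep the index on disk; the sketch only shows how the three stages fit together.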
3. Title: Lucene in Action (Chinese edition)
Authors: Gospodnetic, O. and Hatcher, E.; translated by Tan Hong et al.
Abstract:
This book introduces Lucene, an open-source full-text search toolkit written in Java. Using plain language, numerous annotated figures, rich code examples, and a clear structure, it shows the powerful features that make Lucene an excellent open-source project; the book describes Lucene as the best open-source Java search engine available. The book has 10 chapters in two parts. Part 1, Core Lucene, introduces Lucene's core API, organized in the order in which Lucene is integrated into a program. Part 2, Lucene applications, presents Lucene's built-in tools and demonstrates advanced uses of Lucene and its ports to other programming languages.
This book can serve both as learning material and as a reference manual. It is suitable for readers who are familiar with basic Java programming and for developers who want to add powerful search functions to their applications. It is also a valuable reference for engineers and technicians working on search engines, and for programmers developing software of all kinds on the Java platform.
Table of contents:
Foreword
Translator's preface
Preface 1
Preface 2
Acknowledgments
About this book
Part 1 Core Lucene
Chapter 1 Meet Lucene
1.1 Evolution of information organization and access
1.2 Understanding Lucene
1.2.1 What Lucene is
1.2.2 What Lucene can do
1.2.3 History of Lucene
1.2.4 Who uses Lucene
1.2.5 Lucene ports: Perl, Python, C++, .NET, and Ruby
1.3 Indexing and searching
1.3.1 What is indexing, and why is it important?
1.3.2 What is searching?
1.4 Lucene in action: a sample application
1.4.1 Creating an index
1.4.2 Searching an index
1.5 Understanding the core indexing classes
1.5.1 IndexWriter
1.5.2 Directory
1.5.3 Analyzer
1.5.4 Document
1.5.5 Field
1.6 Understanding the core searching classes
1.6.1 IndexSearcher
1.6.2 Term
1.6.3 Query
1.6.4 TermQuery
1.6.5 Hits
1.7 Review of similar search products
1.7.1 Information retrieval libraries
1.7.2 Indexing and searching applications
1.7.3 Online resources
1.8 Summary
Chapter 2 Indexing
Chapter 3 Adding search to your application
Chapter 4 Analysis
Chapter 5 Advanced search techniques
Chapter 6 Extending search
Part 2 Applied Lucene
Chapter 7 Parsing common document formats
Chapter 8 Lucene tools and extensions
Chapter 9 Lucene ports
Chapter 10 Case studies
Appendix A Installing Lucene
Appendix B Lucene index file format
Appendix C Resources