Modern Information Retrieval (2nd)


Basic Information
Original Title: Modern Information Retrieval: The Concepts and Technology behind Search, Second Edition
Authors: Ricardo Baeza-Yates, Berthier Ribeiro-Neto
Translators: Huang Xiaojing, Zhang Qi, Qiu Xipeng
Series: Computer Science Series
Publisher: China Machine Press
ISBN: 9787111385998
Published on: October 2012
Format: 16mo
Page number: 1
Edition: 1-1
Category: Computer


Introduction
Modern Information Retrieval (2nd edition of the original book) discusses the concepts and technologies of information retrieval, their application in search engines, and their impact on related fields. The main topics include: user interface design; classic information retrieval models, evaluation of result quality, and user feedback; concepts and techniques relating to documents and queries; indexing and searching document collections; crawling, retrieving, and ranking Web documents; structured text retrieval, multimedia retrieval, and enterprise search; and library systems and digital libraries.
The book is broad in coverage, detailed, and easy to understand. It can serve as a textbook or reference for undergraduate and graduate students in information management and information systems, computer science and technology, library science, information science, archival science, and related disciplines. It is also a valuable reference for practitioners engaged in information retrieval and in system analysis and design.
Table of Contents
Publisher's Note
Translator's Preface
Preface to the Second Edition
Preface to the First Edition
Acknowledgments for the Second Edition
Acknowledgments for the First Edition
Publisher's Acknowledgments
Chapter 1 Introduction
1.1 Information Retrieval
1.1.1 Early Development of Information Retrieval
1.1.2 Information Retrieval in Libraries and Digital Libraries
1.1.3 Information Retrieval at the Center of the Stage
1.2 The Information Retrieval Problem
1.2.1 The User's Task
1.2.2 Information Retrieval versus Data Retrieval
1.3 The Information Retrieval System
1.3.1 Software Architecture of the Information Retrieval System
1.3.2 The Retrieval and Ranking Processes
1.4 The Web
1.4.1 A Brief History of the Web
1.4.2 The E-Publishing Era
1.4.3 How the Web Changed Search
1.4.4 Practical Issues on the Web
1.5 Organization of the Book
1.5.1 Focus of the Book
1.5.2 Contents of the Book
1.6 The Book's Teaching Resource Web Site
1.7 Bibliographic Discussion
Chapter 2 User Interfaces for Search
2.1 Introduction
2.2 How People Search
2.2.1 Information Lookup versus Exploratory Search
2.2.2 Classic versus Dynamic Models of Information Seeking
2.2.3 Navigation versus Search
2.2.4 Observations of the Search Process
2.3 Search Interfaces Today
2.3.1 Getting Started
2.3.2 Query Specification
2.3.3 Query Specification Interfaces
2.3.4 Display of Search Results
2.3.5 Query Reformulation
2.3.6 Organizing Search Results
2.4 Visualization in Search Interfaces
2.4.1 Visualizing Boolean Syntax
2.4.2 Visualizing Query Terms within Search Results
2.4.3 Visualizing Relationships among Words and Documents
2.4.4 Visualization for Text Mining
2.5 Design and Evaluation of Search Interfaces
2.6 Trends and Research Issues
2.7 Bibliographic Discussion
Chapter 3 Information Retrieval Modeling
3.1 Information Retrieval Models
3.1.1 Modeling and Ranking
3.1.2 Characterization of an Information Retrieval Model
3.1.3 A Taxonomy of Information Retrieval Models
3.2 Classic Information Retrieval
3.2.1 Basic Concepts
3.2.2 The Boolean Model
3.2.3 Term Weighting
3.2.4 TF-IDF Weights
3.2.5 Document Length Normalization
3.2.6 The Vector Model
3.2.7 The Probabilistic Model
3.2.8 Brief Comparison of the Classic Models
3.3 Alternative Set-Theoretic Models
3.3.1 Set-Based Model
3.3.2 Extended Boolean Model
3.3.3 Fuzzy Set Model
3.4 Alternative Algebraic Models
3.4.1 Generalized Vector Space Model
3.4.2 Latent Semantic Indexing Model
3.4.3 Neural Network Model
3.5 Alternative Probabilistic Models
3.5.1 The BM25 Model
3.5.2 Language Models
3.5.3 Divergence from Randomness Model
3.5.4 Bayesian Network Models
3.6 Other Models
3.6.1 Hypertext Model
3.6.2 Web-Based Models
3.6.3 Structured Text Retrieval
3.6.4 Multimedia Retrieval
3.6.5 Enterprise and Vertical Search
3.7 Trends and Research Issues
3.8 Bibliographic Discussion
Chapter 4 Retrieval Evaluation
4.1 Introduction
4.2 The Cranfield Paradigm
4.2.1 A Brief History
4.2.2 Reference Collections
4.3 Retrieval Metrics
4.3.1 Precision and Recall
4.3.2 Single-Value Summaries: P@n, MAP, MRR, F
4.3.3 User-Oriented Measures
4.3.4 Discounted Cumulated Gain
4.3.5 Binary Preferences
4.3.6 Rank Correlation Metrics
4.4 Reference Document Collections
4.4.1 The TREC Collections
4.4.2 Other Reference Collections
4.4.3 Other Small Test Collections
4.5 User-Based Evaluation
4.5.1 Human Experimentation in the Lab
4.5.2 Side-by-Side Panels
4.5.3 A/B Testing
4.5.4 Crowdsourcing
4.5.5 Evaluation Using Click Data
4.6 Practical Caveats
4.7 Trends and Research Issues
4.8 Bibliographic Discussion
Chapter 5 Relevance Feedback and Query Expansion
5.1 Introduction
5.2 A Framework for Feedback Methods
5.3 Explicit Relevance Feedback
5.3.1 Relevance Feedback for the Vector Model: The Rocchio Method
5.3.2 Relevance Feedback for the Probabilistic Model
5.3.3 Evaluation of Relevance Feedback
5.4 Explicit Feedback through Clicks
5.4.1 Eye Tracking and Relevance Judgments
5.4.2 User Behavior
5.4.3 Clicks as an Indicator of User Preferences
5.5 Implicit Feedback through Local Analysis
5.5.1 Implicit Feedback through Local Clustering
5.5.2 Implicit Feedback through Local Context Analysis
5.6 Implicit Feedback through Global Analysis
5.6.1 Query Expansion Based on a Similarity Thesaurus
5.6.2 Query Expansion Based on a Statistical Thesaurus
5.7 Trends and Research Issues
5.8 Bibliographic Discussion
Chapter 6 Documents: Languages and Properties
6.1 Introduction
6.2 Metadata
6.3 Document Formats
6.3.1 Text
6.3.2 Multimedia
6.3.3 Graphics and Virtual Reality
6.4 Markup Languages
6.4.1 SGML
6.4.2 HTML
6.4.3 XML
6.4.4 RDF
6.4.5 HyTime
6.5 Text Properties
6.5.1 Information Theory
6.5.2 Natural Language Modeling
6.5.3 Text Similarity
6.6 Document Preprocessing
6.6.1 Lexical Analysis of the Text
6.6.2 Stopword Removal
6.6.3 Stemming
6.6.4 Keyword Selection
6.6.5 Thesauri
6.7 Organizing Documents
6.7.1 Taxonomies
6.7.2 Folksonomies
6.8 Text Compression
6.8.1 Basic Concepts
6.8.2 Statistical Methods
6.8.3 Statistical Methods: Modeling
6.8.4 Statistical Methods: Coding
6.8.5 Dictionary Methods
6.8.6 Compression and Preprocessing
6.8.7 Comparison of Text Compression Techniques
6.8.8 Structured Text Compression
6.9 Trends and Research Issues
6.10 Bibliographic Discussion
Chapter 7 Queries: Languages and Properties
7.1 Query Languages
7.1.1 Keyword-Based Queries
7.1.2 Non-Keyword Queries
7.1.3 Structured Queries
7.1.4 Query Protocols
7.2 Query Properties
7.2.1 Characteristics of Web Queries
7.2.2 User Search Behavior
7.2.3 Query Intent
7.2.4 Query Topic
7.2.5 Query Sessions and Tasks
7.2.6 Query Difficulty
7.3 Trends and Research Issues
7.4 Bibliographic Discussion
Chapter 8 Text Classification
8.1 Introduction
8.2 A Characterization of Text Classification
8.2.1 Machine Learning
8.2.2 Text Classification
8.2.3 Text Classification Algorithms
8.3 Unsupervised Algorithms
8.3.1 Clustering
8.3.2 Naive Text Classification
8.4 Supervised Algorithms
8.4.1 Decision Trees
8.4.2 The k-NN Classifier
8.4.3 The Rocchio Classifier
8.4.4 Probabilistic Naive Bayes Document Classification
8.4.5 The Support Vector Machine (SVM) Classifier
8.4.6 Ensemble Classifiers
8.4.7 Concluding Remarks on Supervised Algorithms
8.5 Feature Selection or Dimensionality Reduction
8.5.1 Contingency Table for Categories
8.5.2 Term Document Frequency
8.5.3 TF-IDF Weights
8.5.4 Mutual Information
8.5.5 Information Gain
8.5.6 Chi-Square Test
8.5.7 Impact of Feature Selection
8.6 Evaluation Metrics
8.6.1 Contingency Table
8.6.2 Accuracy and Error Rate
8.6.3 Precision and Recall
8.6.4 F-Measure and F1
8.6.5 Cross-Validation
8.6.6 Standard Document Collections
8.7 Organizing the Categories: Building a Taxonomy
8.8 Trends and Research Issues
8.9 Bibliographic Discussion
Chapter 9 Indexing and Searching
9.1 Introduction
9.2 Inverted Indexes
9.2.1 Basic Concepts
9.2.2 Full Inverted Indexes
9.2.3 Searching
9.2.4 Ranking
9.2.5 Construction
9.2.6 Compressed Inverted Indexes
9.2.7 Structured Queries
9.3 Signature Files
9.4 Suffix Trees and Suffix Arrays
9.4.1 Structure: Tries and Suffix Trees
9.4.2 Searching for Simple Strings
9.4.3 Searching for Complex Patterns
9.4.4 Construction
9.4.5 Compressed Suffix Arrays
9.5 Sequential Searching
9.5.1 Simple Strings: Horspool
9.5.2 Complex Patterns: Automata and Bit-Parallelism
9.5.3 Faster Bit-Parallel Algorithms
9.5.4 Regular Expressions
9.5.5 Multiple Patterns
9.5.6 Approximate Searching
9.5.7 Searching Compressed Text
9.6 Multi-Dimensional Indexing
9.7 Trends and Research Issues
9.8 Bibliographic Discussion
Chapter 10 Parallel and Distributed Information Retrieval
10.1 Introduction
10.2 A Taxonomy of Distributed Information Retrieval Systems
10.3 Data Partitioning
10.3.1 Document Collection Partitioning
10.3.2 Document Collection
10.3.3 Inverted Index Partitioning
10.3.4 Partitioning of Other Indexes
10.4 Parallel Information Retrieval
10.4.1 Description
10.4.2 Parallel Information Retrieval on MIMD Architectures
10.4.3 Parallel Information Retrieval on SIMD Architectures
10.5 Cluster-Based Information Retrieval
10.6 Distributed Information Retrieval
10.6.1 Introduction
10.6.2 Indexing
10.6.3 Query Processing
10.6.4 Web Issues
10.7 Federated Search
10.8 Retrieval in Peer-to-Peer Networks
10.9 Trends and Research Issues
10.10 Bibliographic Discussion
Chapter 11 Web Retrieval
11.1 Introduction
11.2 A Challenging Problem
11.3 The Web
11.3.1 Characteristics
11.3.2 Structure of the Web Graph
11.3.3 Modeling the Web
11.3.4 Link Analysis
11.4 Search Engine Architectures
11.4.1 Basic Architecture
11.4.2 Cluster-Based Architecture
11.4.3 Caching
11.4.4 Multi-Level Indexes
11.4.5 Distributed Architectures
11.5 Search Engine Ranking
11.5.1 Ranking Signals
11.5.2 Link-Based Ranking
11.5.3 Simple Ranking Functions
11.5.4 Learning to Rank
11.5.5 Learning the Ranking Function
11.5.6 Quality Evaluation
11.5.7 Web Spam
11.6 Managing Web Data
11.6.1 Assigning Identifiers to Documents
11.6.2 Metadata
11.6.3 Compressing the Web Graph
11.6.4 Handling Duplicate Data
11.7 Search Engine User Interaction
11.7.1 The Search Rectangle Paradigm
11.7.2 The Search Engine Results Page
11.7.3 Educating Users
11.8 Browsing
11.8.1 Flat Browsing
11.8.2 Structure-Guided Browsing and Web Directories
11.9 Beyond Browsing
11.9.1 Hypertext and the Web
11.9.2 Combining Searching with Browsing
11.9.3 Web Query Languages
11.9.4 Dynamic Search
11.10 Related Problems
11.10.1 Computational Advertising
11.10.2 Web Mining
11.10.3 Metasearch
11.11 Trends and Research Issues
11.11.1 Beyond Static Text Data
11.11.2 Current Challenges
11.12 Bibliographic Discussion
Chapter 12 Web Crawling
12.1 Introduction
12.2 Applications of a Web Crawler
12.2.1 General Web Search
12.2.2 Focused Crawling
12.2.3 Web Characterization
12.2.4 Mirroring
12.2.5 Web Site Analysis
12.3 A Taxonomy of Crawlers
12.4 Architecture and Implementation
12.4.1 The Crawler
12.4.2 Practical Issues
12.4.3 Parallel Crawling
12.5 Scheduling Algorithms
12.5.1 Selection Policy
12.5.2 Revisit Policy
12.5.3 Politeness Policy
12.5.4 Combining Policies
12.6 Evaluation
12.6.1 Evaluating Network Usage
12.6.2 Evaluating Long-Term Scheduling
12.7 Trends and Research Issues
12.7.1 Crawling the "Hidden Web"
12.7.2 Crawling with the Help of Web Sites
12.7.3 Distributed Crawling
12.8 Bibliographic Discussion
Chapter 13 Structured Text Retrieval
13.1 Introduction
13.2 Structuring Power
13.2.1 Explicit versus Implicit Structure
13.2.2 Static versus Dynamic Structure
13.2.3 Single versus Multiple Hierarchies
13.3 Early Text Retrieval Models
13.3.1 Model Based on Non-Overlapping Lists
13.3.2 Model Based on Proximal Nodes
13.3.3 Ranking Structured Text Results
13.4 XML Retrieval
13.4.1 Challenges in XML Retrieval
13.4.2 Indexing Strategies
13.4.3 Ranking Strategies
13.4.4 Removing Overlaps
13.5 XML Retrieval Evaluation
13.5.1 Document Collections
13.5.2 Topics
13.5.3 Retrieval Tasks
13.5.4 Relevance
13.5.5 Measures
13.6 Query Languages
13.6.1 Characteristics
13.6.2 Classification of XML Query Languages
13.6.3 Examples of XML Query Languages
13.7 Trends and Research Issues
13.8 Bibliographic Discussion
Chapter 14 Multimedia Information Retrieval
14.1 Introduction
14.1.1 What Is Multimedia?
14.1.2 Multimedia Retrieval
14.1.3 Text Retrieval versus Multimedia Retrieval
14.2 Challenges
14.2.1 The Semantic Gap
14.2.2 Feature Ambiguity
14.2.3 Machine-Generated Data
14.3 Content-Based Image Retrieval
14.3.1 Color-Based Retrieval
14.3.2 Texture
14.3.3 Saliency
14.4 Audio and Music Retrieval
14.4.1 Audio Fingerprinting
14.4.2 Speech Recognition
14.4.3 Speaker Recognition
14.4.4 Spoken Document Retrieval
14.4.5 Audio Basics
14.5 Searching and Browsing Video
14.5.1 Video Summaries
14.5.2 Static Summaries
14.5.3 Image Stitching and Jumping Stills
14.5.4 Animated Summaries
14.5.5 Notebook Summaries
14.5.6 Visual versus Auditory Browsing
14.5.7 Evaluating Summaries
14.6 Fusion Models: Combining All the Information
14.6.1 Face Naming
14.6.2 Image Naming
14.6.3 Audio Naming
14.6.4 Audio-Visual Speech Recognition
14.6.5 Multimedia Processing Combining Audio and Video
14.7 Segmentation
14.7.1 Video Segmentation Examples
14.7.2 Video Segmentation Solutions
14.7.3 Video Segmentation by Edges
14.7.4 Speech Segmentation
14.7.5 Segmentation Evaluation
14.8 Compression and the MPEG Standards
14.8.1 Intensity and Sampling
14.8.2 Color
14.8.3 Lossy Compression
14.8.4 Lossless Compression
14.8.5 Temporal Redundancy
14.8.6 Motion Prediction
14.8.7 The MPEG Standards
14.9 Trends and Research Issues
14.10 Bibliographic Discussion
Chapter 15 Enterprise Search
15.1 Introduction
15.1.1 Characteristics and Applications of Enterprise Search
15.1.2 Enterprise Search
15.1.3 Workplace Search
15.2 Enterprise Search Tasks
15.2.1 Examples of Search-Supported Tasks
15.2.2 Search Types
15.2.3 Research on Enterprise Search
15.3 Architecture of Enterprise Search Systems
15.3.1 Gathering
15.3.2 Extracting
15.3.3 Indexing
15.3.4 Text Annotation Indexing
15.3.5 Querying
15.3.6 Presenting Search Results
15.3.7 Security Models
15.3.8 Federated Search and Metasearch
15.4 Enterprise Search Evaluation
15.4.1 Public Test Collections for Enterprise Search
15.4.2 In-House Evaluation of Enterprise Search
15.4.3 Tuning Enterprise Search
15.4.4 What to Expect
15.5 Possible Reasons for Dissatisfaction
15.6 Context and Personalization
15.6.1 Context Controls and Tools
15.6.2 Context: Local, Enterprise, or Global
15.6.3 Profile Privacy
15.6.4 Defining, Creating, and Maintaining Profiles
15.6.5 User Modeling
15.6.6 Implicit Comments
15.6.7 Information Filtering
15.6.8 Social Recommender Systems
15.7 Trends and Research Issues
15.8 Bibliographic Discussion
Chapter 16 Library Systems
16.1 The Library Information Environment
16.2 Online Public Access Catalogs
16.2.1 OPACs and Bibliographic Records
16.2.2 ILS Information Retrieval
16.2.3 Hybrid Library Integration
16.2.4 OPACs and End Users
16.2.5 ILS: Vendors and Products
16.3 Information Retrieval Systems and Document Databases
16.3.1 Bibliographic and Full-Text Databases
16.3.2 Database Record Content
16.3.3 The Online Industry: Database Vendors
16.3.4 Information Retrieval from Document Databases
16.4 Information Retrieval within Organizational Units
16.5 Trends and Research Issues
16.6 Bibliographic Discussion
Chapter 17 Digital Libraries
17.1 Introduction
17.2 Defining Digital Libraries
17.3 General Architecture
17.4 Basic Concepts
17.4.1 Digital Objects and Collections
17.4.2 Metadata and Catalogs
17.4.3 Repositories and Archives
17.4.4 Services
17.5 Social and Economic Issues
17.5.1 Social Issues
17.5.2 Economic Issues
17.6 Software Systems
17.6.1 Greenstone
17.6.2 EPrints
17.6.3 DSpace
17.6.4 Fedora
17.6.5 ODL
17.6.6 The 5S Suite
17.7 Digital Library Case Studies
17.7.1 Networked Digital Library of Theses and Dissertations
17.7.2 National Science Digital Library
17.7.3 ETANA-DL Archaeological Digital Library
17.8 Trends and Research Issues
17.8.1 Comments
17.8.2 Integration
17.8.3 Other Research Challenges
17.9 Bibliographic Discussion
Appendix A Open Source Search Engines
Appendix B About the Authors
References
Index

This book is from: China Interactive publishing network
