This section describes the process of indexing: adding content to the SOLR index and, if necessary, modifying that content or deleting it. Adding content to the index is what makes it searchable.
SOLR indexes can receive data from many different sources, including XML files, comma-separated values (CSV) files, databases, and common file formats such as Microsoft Word or PDF.
Here are three different ways to load data into the index:
- Use SOLR Cell, a framework built on Apache Tika, to ingest binary or structured files such as Microsoft Office documents (Word and others), PDF, and other proprietary formats.
- Upload an XML file by sending an HTTP request to the SOLR server.
- Use the SOLR Java Client API (SolrJ) to write a custom Java application that ingests the data. (The Java Client API may be the best choice if you are working with an application, such as a content management system, that already offers a Java API.)
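To make the second option more concrete, here is a minimal sketch using only the Python standard library. It builds an XML `<add>` update message and prepares, but does not send, an HTTP POST to a collection's `/update` handler. The server address `http://localhost:8983`, the collection name `mycollection`, the field names, and the helper functions `build_add_xml` and `make_update_request` are all hypothetical and chosen only for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts (field name -> content) into an XML <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Each field becomes <field name="...">content</field>
            field_el = ET.SubElement(doc_el, "field", name=name)
            field_el.text = str(value)
    return ET.tostring(add, encoding="unicode")


def make_update_request(base_url, collection, xml_body):
    """Prepare (but do not send) a POST to the collection's /update handler."""
    # commit=true asks the server to commit right away so the documents become searchable
    url = f"{base_url}/solr/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_add_xml([{"id": "doc1", "title": "Indexing basics"}])
    req = make_update_request("http://localhost:8983", "mycollection", body)
    print(req.full_url)
    print(body)
    # Against a running server (assumption), the request could be sent with:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Keeping message construction separate from sending makes it easy to inspect or log the XML before it reaches the server. For large batches, committing once at the end rather than on every request is usually the cheaper choice.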
Regardless of how the data is ingested, everything fed into a SOLR index shares a common basic structure: a document containing multiple fields, each with a name and content, where the content may be empty. One of the fields is usually designated as a unique ID field, analogous to a primary key in a database.
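The document structure described above can be modeled in a few lines. The following is a minimal sketch, not SOLR's own data model: the `SolrDocument` class and the `check_unique_key` helper are hypothetical, and the unique key field is assumed to be named `id`. Fields map a name to content, empty content is allowed, and the check verifies that every document in a batch carries a non-empty, distinct value for the unique key.

```python
from dataclasses import dataclass, field


@dataclass
class SolrDocument:
    """A document: a set of named fields whose content may be empty."""
    fields: dict = field(default_factory=dict)

    def add_field(self, name, content=""):
        # Empty content is permitted; the field still exists by name
        self.fields[name] = content


def check_unique_key(docs, key="id"):
    """Return True if every document has a non-empty, distinct value for the key field."""
    seen = set()
    for doc in docs:
        value = doc.fields.get(key)
        if not value or value in seen:
            return False
        seen.add(value)
    return True


if __name__ == "__main__":
    doc1 = SolrDocument()
    doc1.add_field("id", "doc1")
    doc1.add_field("title", "Indexing basics")
    doc1.add_field("summary")  # empty content
    doc2 = SolrDocument({"id": "doc1"})  # same key as doc1
    print(check_unique_key([doc1]))
    print(check_unique_key([doc1, doc2]))
```

Note that SOLR itself does not reject a document whose unique key already exists in the index; by default the newer document replaces the older one. A client-side check like this one is only useful for catching unintended duplicates within a batch before they silently overwrite each other.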
1.6.1 What is indexing