Re-indexing

From time to time it may be necessary to completely re-generate an index. Reasons for re-indexing include:

  • A Content Engine upgrade. Some upgrades include modifications to the default Solr schema used by Content Studio.

  • Changes to one or more of your publications, or the addition of new search functionality, that require changes to your own custom Solr schema.

In theory, all you need to do to re-index your publications is click on the Reindex... button on the indexer web application's admin page. However, the re-indexing process may take several hours on large sites, and while it is in progress, search requests will return incomplete results. In many production environments, reduced search functionality over several hours is not acceptable. In such cases you can avoid the problem by generating the new index using a separate, non-production Tomcat instance, and then copying the new index to the production environment.

The exact procedure for doing this is installation-dependent, but involves the following general steps:

  1. Install a new Solr instance somewhere in your network that you can use for generating the new index. See Install Solr for details.

  2. Copy context.xml from one of your production Tomcat configurations to your indexing Tomcat instance. This ensures that your indexer web application will be correctly configured to communicate with the Content Engine's indexer web service. By default, context.xml is located in /opt/tomcat-engine1/conf/.

  3. Copy the Solr configuration files (usually located in /etc/escenic/solr/solr-core) from your production Solr instance to your indexing instance.

  4. Modify the copied configuration as necessary for generating the new index. You might, for example, need to replace the schema file, schema.xml.

  5. Start the new Solr instance:

    $ /opt/solr/bin/solr start

  6. Start a browser and display the new indexer web application's admin page (http://host:port/indexer-webapp/admin/).

  7. Click on Reindex..., then click on your browser's Back button to redisplay the admin page.

  8. Wait for the indexing job to complete. The Current state section of the admin page shows the progress of the indexing operation, but it is not refreshed automatically. Click on your browser's Refresh button from time to time and check the Number of documents read but not yet processed value. When this value reaches 0, indexing is complete.

  9. Test the generated index. The easiest way to do this is to use Solr's administration interface. Open a web browser, go to http://host:port/solr/solr-core and follow links to the correct administration page (exactly how you get there is installation-dependent). The administration page contains a search field that you can use to execute test searches, plus links to the Solr documentation.

  10. If you are not satisfied with the results, make the required changes to your configuration files, and try again (from step 6). Otherwise, continue.

  11. Stop the Tomcat instance in which your production Solr instance is running.

  12. Copy your modified Solr configuration files from your indexing instance to the production instance.

  13. Copy the new index file (usually /opt/escenic/indexer/head-tail.index) from your indexing instance to the production instance.

  14. Restart your production Tomcat instance.
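
The test search in step 9 and the copy operations in steps 12 and 13 can be scripted. The following is a minimal sketch only, not a supported tool. It assumes that curl and rsync (over SSH) are available on the indexing host, and that the default paths quoted above apply. The host names indexer.example.com and prod.example.com, the port 8983, the variable names and the run helper are all hypothetical and must be adapted to your installation. Stopping and restarting the production Tomcat instance (steps 11 and 14) is deliberately left out, because how this is done is installation-dependent.

```shell
#!/bin/sh
# Sketch of steps 9, 12 and 13. Host names and port are hypothetical;
# the paths are the defaults mentioned in the procedure above.

SOLR_URL="${SOLR_URL:-http://indexer.example.com:8983/solr/solr-core}"  # hypothetical host:port
PROD_HOST="${PROD_HOST:-prod.example.com}"                              # hypothetical host
SOLR_CONF_DIR="${SOLR_CONF_DIR:-/etc/escenic/solr/solr-core}"
INDEX_FILE="${INDEX_FILE:-/opt/escenic/indexer/head-tail.index}"
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print the commands, 0 = execute them

# In dry-run mode print the command, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'would run: %s\n' "$*"
  else
    "$@"
  fi
}

# Step 9: send a match-all query to the new index. rows=0 suppresses the
# documents themselves, so the response mainly carries the hit count.
test_index() {
  run curl -s "$SOLR_URL/select?q=*:*&rows=0"
}

# Steps 12 and 13: copy the configuration files and the index file to the
# production host. Run this on the indexing host, and only while the
# production instance is stopped (step 11).
promote_index() {
  run rsync -a "$SOLR_CONF_DIR/" "$PROD_HOST:$SOLR_CONF_DIR/"
  run rsync -a "$INDEX_FILE" "$PROD_HOST:$INDEX_FILE"
}

test_index
promote_index
```

With the default DRY_RUN=1 the script only prints the commands it would execute, which makes it possible to check hosts and paths before anything is copied. Setting DRY_RUN=0 executes them.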