Search Engine Deployment and Configuration

The Content Engine's search functionality is provided by the Java search engine Apache Solr, which runs as a web application. The standard Content Engine installation includes a solr web application and an associated indexer web application that indexes the content of all Escenic publications. These two applications are deployed along with the Content Engine by the ece script's deploy action.

Following the basic installation procedure described in Installation Procedure therefore results in a solr instance and an indexer web application being deployed on every engine host in your installation, all with identical configurations.

graphics/solr-indexer-multiple-host.png

This set-up works, but it is relatively inefficient and unlikely to perform well in a production environment. There are two main reasons for this:

Solr memory usage

Solr can at times consume large amounts of memory and trigger heavy garbage collection in the JVM, which can severely degrade Content Engine performance. It should therefore not be run in the same JVM as the Content Engine on production systems. The simplest way to achieve this separation on a single-host installation is to run solr and the indexer web application in a separate Tomcat instance. For more about this, see Isolating The Search Engine.

Solr optimization issues

The default solr configuration is optimized for editorial purposes: it indexes all the fields needed to support the search functionality provided by Content Studio, resulting in very large indexes. This is acceptable in the editorial context, since the number of concurrent Content Studio users, even in a very large organisation, is unlikely to be high. The presentation hosts in a large Escenic installation, however, may have to serve many thousands of concurrent users, and the default solr configuration may perform poorly in this context.

The default configuration is therefore fine for the editorial hosts in a production system, but for the presentation hosts you are recommended to create a custom indexer configuration that indexes only the fields actually needed to support the kinds of search your publications require.
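A practical first step towards such a configuration is to compare the fields the schema currently indexes with the fields your presentation-side searches actually use. The following Python sketch does this for a Solr-style schema.xml. It is illustrative only: the sample schema and its field names (id, title, body, author, internal_note) are hypothetical, as is the set of "needed" fields. It also assumes that a field without an explicit indexed attribute is indexed; in a real Solr schema that default is inherited from the field's fieldType, so check the type definitions before acting on the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified schema fragment; real Escenic schemas differ.
SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="false"/>
    <field name="author" type="text" stored="true"/>
    <field name="internal_note" type="text" indexed="false" stored="true"/>
  </fields>
</schema>"""


def indexed_fields(schema_xml: str) -> list[str]:
    """Return the names of <field> elements not explicitly marked indexed="false".

    Assumption: a missing indexed attribute is treated as "true"; in real Solr
    the default comes from the field's fieldType.
    """
    root = ET.fromstring(schema_xml)
    return [
        field.get("name")
        for field in root.iter("field")  # matches <field> only, not <dynamicField>
        if field.get("indexed", "true").lower() != "false"
    ]


def unneeded_indexed_fields(schema_xml: str, needed: set[str]) -> list[str]:
    """Return indexed fields that the presentation-side searches do not use."""
    return [name for name in indexed_fields(schema_xml) if name not in needed]


if __name__ == "__main__":
    # Hypothetical: presentation templates only search on id and title.
    print(unneeded_indexed_fields(SAMPLE_SCHEMA, {"id", "title"}))
```

With the sample above this reports body and author as candidates for removal from the presentation-side index; internal_note is already excluded because it is explicitly not indexed.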

To do this, open var/lib/escenic/schema.xml for editing on each of your presentation hosts and modify the index schema to meet your requirements. Editing this file is outside the scope of this manual. To tune the search engine you need to take account of the contents of your publications, your users' search requirements and the limitations imposed by your particular hardware configuration. For further information and advice on tuning, see the Solr documentation at http://lucene.apache.org/solr/.
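If you prefer to prepare a trimmed schema with a script rather than entirely by hand, the following Python sketch shows one possible approach: it switches indexing off for every field not in a keep-list, leaving the field definitions in place so the change is easy to review and revert. It is a hypothetical illustration, not an Escenic tool: the sample schema, the trim_schema function and the keep-list are all assumptions. Note that the field named as the schema's uniqueKey must stay indexed, and that ElementTree's default parser drops XML comments and the XML declaration, so compare the output with the original file before using it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real schema.xml contains many more fields and types.
SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="false"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>"""


def trim_schema(schema_xml: str, keep: set[str]) -> str:
    """Return a copy of the schema with indexed="false" set on every <field>
    whose name is not in `keep`. No field definitions are removed.

    The uniqueKey field (here "id") must be included in `keep`.
    """
    root = ET.fromstring(schema_xml)  # comments are discarded by the default parser
    for field in root.iter("field"):
        if field.get("name") not in keep:
            field.set("indexed", "false")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical keep-list for a presentation host that only searches titles.
    print(trim_schema(SAMPLE_SCHEMA, {"id", "title"}))
```

Switching indexing off rather than deleting fields is a deliberately conservative choice: templates that still read stored values keep working, and restoring a field only means changing one attribute back.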

There are many more changes you can make to your search engine set-up in order to optimize it for your particular needs. For a discussion of the general principles involved, see Search Engine Configuration.