Search Engine Deployment and Configuration
The Content Engine's search functionality is provided by the Java search engine Apache Solr, which runs as a web application. The standard Content Engine installation includes a solr web application and an associated indexer web application that indexes the content of all Escenic publications. These two applications are deployed along with the Content Engine by the ece script's deploy action.
The result of following the basic installation procedure described in Installation Procedure, therefore, is that a solr instance and an indexer web application are deployed on every engine host in your installation, all with identical configurations.
This set-up will work, but it is relatively inefficient and is unlikely to perform well in a production environment. There are three main reasons for this:
- Solr memory usage
-
solr and the indexer can at times consume large amounts of memory and trigger large garbage collection operations in the JVM, which has severe effects on Content Engine performance. They should therefore not be run in the same JVM as the Content Engine on production systems. solr already runs in its own webapp container (and therefore in a different JVM), but the indexer is deployed to the same Tomcat instance as the Content Engine. The simplest way to achieve this separation on a single-host installation is to move the indexer webapp to a separate Tomcat instance. For more about this, see Isolating The Search Engine.
- Solr stemming
-
In the default solr configuration, English stemming is enabled. This means that searching non-English content might give unexpected results. If your content is in a language other than English, you should either disable stemming or modify the configuration to suit your language.
To disable stemming, remove the following line from schema.xml:
<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>
Disabling stemming will also improve solr's performance. For information about how to configure stemming for other languages, see the Solr documentation on http://lucene.apache.org/solr/.
- Solr optimization issues
-
The default solr configuration is optimized for editorial purposes: it indexes all the fields needed to support the search functionality provided by Content Studio, resulting in very large indexes. This is acceptable in the editorial context, since the number of concurrent Content Studio users, even in a very large organisation, is not likely to be very large. The presentation hosts in a large Escenic installation, however, can be required to serve many thousands of concurrent users, and the default solr configuration may perform poorly in this context.
The default configuration, therefore, is fine for the editorial hosts in a production system, but for the presentation hosts you are advised to create a custom indexer configuration that only indexes the fields actually needed to support the kinds of search required in your publications.
To do this, open /var/lib/escenic/solr-core/schema.xml for editing on each of your presentation hosts, and modify the index schema to meet your requirements. Editing this file is outside the scope of this manual. In order to tune the search engine you need to take account of the contents of your publications, your users' search needs and the limitations imposed by your particular hardware configuration. For further information and advice on tuning, see the Solr documentation on http://lucene.apache.org/solr/.
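As a rough illustration of the kind of change involved, the fragment below sketches how field definitions in schema.xml might be trimmed on a presentation host. The field names (title, body, editorial_note) and the field types text and string are hypothetical and are not taken from the Escenic schema; only the field element and its indexed and stored attributes are standard Solr schema syntax.

```xml
<!-- Hypothetical field names and types, for illustration only. -->
<fields>
  <!-- Fields that public search actually queries stay indexed. -->
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="body" type="text" indexed="true" stored="false"/>
  <!-- A field needed only by editorial search is neither indexed nor
       stored here, so it adds nothing to the presentation index. -->
  <field name="editorial_note" type="string" indexed="false" stored="false"/>
</fields>
```

Which fields can safely be dropped depends entirely on the searches your publications perform, so treat this as a pattern to adapt rather than a recommended schema.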
There are many more changes you can make to your search engine set-up in order to optimize it for your particular needs. For a discussion of the general principles involved, see Search Engine Configuration and Management.
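Returning to the stemming issue in the list above: as a minimal sketch, assuming your content is in German and that the rest of the analyzer chain in schema.xml is left as shipped, the filter line could be changed rather than removed. German is one of the language names accepted by Solr's SnowballPorterFilterFactory; whether the existing protwords.txt entries still make sense for your language is something to check separately.

```xml
<!-- Sketch: the same filter as in the default configuration, with only
     the language attribute changed. German is an illustrative choice. -->
<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="German"/>
```

If the filter is part of the index-time analyzer, existing content would need to be re-indexed before the change takes full effect.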