HTML Markup

Version 4 of the Content Engine is very lenient about HTML markup, and accepts many different "flavors" of HTML in HTML fields. This makes it very easy to import content into the Content Engine, but can cause problems when the imported data is displayed in Content Studio and in browsers, and also makes content harder to process and repurpose. Version 5 takes a different approach, and requires all the content in an HTML field to be valid XHTML markup.

This requirement is an obstacle when upgrading from version 4 to version 5, but it is in the long-term interest of most Content Engine users because it will:

  • Increase the overall reliability of the publishing process.

  • Make it easier to automatically process publication content, thereby increasing its potential value.

The importer automates the conversion to XHTML as far as possible, and for publications with "good quality" HTML content the conversion may be completely invisible. During import, all HTML field content is automatically passed through an HTML "clean-up" process, which silently converts the input stream to valid XHTML. This process cannot, however, convert every possible variation of HTML to XHTML. When it encounters HTML that it cannot convert, it fails: an error is written to the log and the offending content item is not imported. You then need to correct the offending HTML and re-import the content item.