Importing Content Items That Contain HTML Entities

The import service accepts only valid XML data, so the content of rich text fields must be valid XHTML, not HTML. XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), which means that standard HTML named character entities such as &mdash; and &nbsp; are not allowed.

The following example shows import data whose body field includes such HTML entities:

<?xml version="1.0" encoding="UTF-8"?>
<escenic xmlns="http://xmlns.escenic.com/2009/import" version="2.0">
  <content source="ece-auto-gen" sourceid="6ecfd92e-12e3-4773-877c-0dff82811c29" ...>
    <uri>article68.ece</uri>
    ...
    <field name="body">
      <p xmlns="">Here are some HTML character entities: &mdash; and non-breaking&nbsp;space</p>
    </field>
    ...
  </content>
</escenic>

Two solutions to this problem are described below.

Adding a DOCTYPE declaration

Before importing, add a DOCTYPE declaration that declares every named entity used in the content. The example below shows the same import data with an in-line DOCTYPE declaration that maps each entity used in the content to its numeric character reference:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE htmlEntities [
    <!ENTITY mdash "&#x2014;">
    <!ENTITY nbsp "&#xa0;">
    ]>
<escenic xmlns="http://xmlns.escenic.com/2009/import" version="2.0">
  <content source="ece-auto-gen" sourceid="6ecfd92e-12e3-4773-877c-0dff82811c29" ...>
    <uri>article68.ece</uri>
    ...
    <field name="body">
      <p xmlns="">Here are some HTML character entities: &mdash; and non-breaking&nbsp;space</p>
    </field>
    ...
  </content>
</escenic>
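
If many files need this treatment, the declaration could be generated rather than written by hand. The following is a minimal Python sketch, not part of the import service: the function name add_entity_doctype is hypothetical, the DOCTYPE name htmlEntities is simply copied from the example above, and the entity table comes from Python's standard html.entities module (HTML 4 names only). It assumes the file does not already contain a DOCTYPE declaration.

```python
import re
from html.entities import name2codepoint

# Entities that XML predefines; these never need a declaration.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
XML_DECL_RE = re.compile(r"<\?xml[^>]*\?>")


def add_entity_doctype(xml_text, doctype_name="htmlEntities"):
    """Insert a DOCTYPE that declares the HTML named entities found in xml_text.

    Assumption: xml_text has no DOCTYPE declaration yet.
    """
    # Collect the known HTML entity names actually used in the text.
    names = sorted({
        name for name in ENTITY_RE.findall(xml_text)
        if name not in XML_PREDEFINED and name in name2codepoint
    })
    if not names:
        return xml_text

    declarations = "\n".join(
        '    <!ENTITY {} "&#x{:x};">'.format(name, name2codepoint[name])
        for name in names
    )
    doctype = "<!DOCTYPE {} [\n{}\n]>\n".format(doctype_name, declarations)

    # The DOCTYPE must come after the XML declaration, if there is one.
    match = XML_DECL_RE.match(xml_text)
    if match is None:
        return doctype + xml_text
    head = xml_text[:match.end()]
    tail = xml_text[match.end():].lstrip("\r\n")
    return head + "\n" + doctype + tail
```

Entity names that are not in the HTML 4 table are left undeclared, so a file that uses them would still need manual attention.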

Replacing the entities

Before importing, replace the named entities with numeric character references, which are valid in any XML document without a declaration. The example below shows the same import data processed in this way:

<?xml version="1.0" encoding="UTF-8"?>
<escenic xmlns="http://xmlns.escenic.com/2009/import" version="2.0">
  <content source="ece-auto-gen" sourceid="6ecfd92e-12e3-4773-877c-0dff82811c29" ...>
    <uri>article68.ece</uri>
    ...
    <field name="body">
      <p xmlns="">Here are some HTML character entities: &#x2014; and non-breaking&#xa0;space</p>
    </field>
    ...
  </content>
</escenic>
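
The replacement can be scripted as a pre-processing step. The following is a minimal Python sketch under stated assumptions, not a tool shipped with the import service: the function name replace_named_entities is hypothetical, and the lookup table is Python's standard html.entities.name2codepoint, which covers the HTML 4 entity names. The five entities that XML predefines are left untouched, as are names the table does not know.

```python
import re
from html.entities import name2codepoint

# Entities that XML predefines; these are already valid and must be kept.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(text):
    """Replace HTML named entities with hexadecimal character references."""
    def to_numeric(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names as they are.
            return match.group(0)
        return "&#x{:x};".format(name2codepoint[name])

    return ENTITY_RE.sub(to_numeric, text)


sample = "Here are some HTML character entities: &mdash; and non-breaking&nbsp;space"
print(replace_named_entities(sample))
```

Because this is a purely textual substitution, it also rewrites entity-like strings inside CDATA sections and comments; if the import data contains those, a more careful approach would be needed.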