Importing Content Items That Contain HTML Entities
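The import service accepts only well-formed XML, so the content of rich
text fields must be valid XHTML, not HTML. XML predefines only five named
entities (&amp;, &lt;, &gt;, &quot; and &apos;), which means that standard
HTML named character entities such as &mdash; and &nbsp; are not allowed
unless they are declared.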
Here is an example of some content that includes such HTML entities.
<?xml version="1.0" encoding="UTF-8"?>
<escenic xmlns="http://xmlns.escenic.com/2009/import" version="2.0">
  <content source="ece-auto-gen" sourceid="6ecfd92e-12e3-4773-877c-0dff82811c29" ...>
    <uri>article68.ece</uri>
    ...
    <field name="body">
      <p xmlns="">Here are some HTML character entities: &mdash; and non-breaking&nbsp;space</p>
    </field>
    ...
  </content>
</escenic>
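The failure can be reproduced before import with any XML parser. The following is a minimal sketch in Python; it uses the standard-library parser as a stand-in for the import service (an assumption, since the service's own parser is not described here), and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Named HTML entities are undefined in plain XML, so a parser rejects this fragment.
html_style = "<p>Here are some HTML character entities: &mdash; and non-breaking&nbsp;space</p>"

# Numeric character references need no declaration, so this fragment is accepted.
xml_style = "<p>Here are some HTML character entities: &#8212; and non-breaking&#160;space</p>"

print(is_well_formed(html_style))
print(is_well_formed(xml_style))
```

A check of this kind can be run over import files to catch undeclared entities before they reach the import service.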
Two solutions to this problem are described below.
Adding a DOCTYPE declaration
Before importing, add a DOCTYPE declaration to the import file that declares the entities it uses. The example below shows the same import data with an internal DOCTYPE declaration that defines the entities used in the content:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE htmlEntities [
  <!ENTITY mdash "&#8212;">
  <!ENTITY nbsp "&#160;">
]>
<escenic xmlns="http://xmlns.escenic.com/2009/import" version="2.0">
  <content source="ece-auto-gen" sourceid="6ecfd92e-12e3-4773-877c-0dff82811c29" ...>
    <uri>article68.ece</uri>
    ...
    <field name="body">
      <p xmlns="">Here are some HTML character entities: &mdash; and non-breaking&nbsp;space</p>
    </field>
    ...
  </content>
</escenic>
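Adding the declaration by hand does not scale to many files, so the step can be scripted. The following is a minimal sketch in Python, not part of the import service. It assumes the file is available as a string and has no DOCTYPE declaration yet, and that every entity used is a standard HTML 4 entity listed in Python's html.entities.name2codepoint table. The function name add_entity_doctype is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser already knows; they need no declaration.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
XML_DECL = re.compile(r"<\?xml[^>]*\?>")


def add_entity_doctype(xml_text: str, doctype_name: str = "htmlEntities") -> str:
    """Insert an internal DOCTYPE declaring the HTML entities used in xml_text.

    Hypothetical helper: assumes xml_text does not already contain a DOCTYPE.
    """
    used = {m.group(1) for m in ENTITY_REF.finditer(xml_text)} - XML_PREDEFINED
    known = sorted(name for name in used if name in name2codepoint)
    if not known:
        return xml_text  # nothing to declare

    declarations = "\n".join(
        f'  <!ENTITY {name} "&#{name2codepoint[name]};">' for name in known
    )
    doctype = f"<!DOCTYPE {doctype_name} [\n{declarations}\n]>\n"

    # The DOCTYPE must come after the XML declaration, if there is one.
    decl = XML_DECL.match(xml_text)
    if decl:
        rest = xml_text[decl.end():].lstrip()
        return xml_text[:decl.end()] + "\n" + doctype + rest
    return doctype + xml_text


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<escenic xmlns="http://xmlns.escenic.com/2009/import" version="2.0">'
    '<p xmlns="">a &mdash; b&nbsp;c &amp; d</p>'
    "</escenic>"
)
patched = add_entity_doctype(sample)
print(patched)

# With the declarations in place, a standard XML parser accepts the data.
root = ET.fromstring(patched.encode("utf-8"))
```

Entity names that the table does not know are left undeclared, so a file containing them still fails to parse and the problem stays visible.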
Replacing the entities
Before importing, replace the named entities with the equivalent numeric character references, which are valid in any XML document without a declaration. The example below shows the same import data processed in this way:
<?xml version="1.0" encoding="UTF-8"?>
<escenic xmlns="http://xmlns.escenic.com/2009/import" version="2.0">
  <content source="ece-auto-gen" sourceid="6ecfd92e-12e3-4773-877c-0dff82811c29" ...>
    <uri>article68.ece</uri>
    ...
    <field name="body">
      <p xmlns="">Here are some HTML character entities: &#8212; and non-breaking&#160;space</p>
    </field>
    ...
  </content>
</escenic>
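The replacement can be done with a simple text substitution before the file is submitted. The following is a minimal sketch in Python, assuming the import data is available as a string; it relies on the standard html.entities.name2codepoint table, and the function name to_numeric_entities is hypothetical.

```python
import re
from html.entities import name2codepoint

# XML's own predefined entities must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(xml_text: str) -> str:
    """Replace HTML named entities with numeric character references.

    Hypothetical helper: unknown names are left as they are, so that the
    XML parser still reports them instead of silently losing content.
    """
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_REF.sub(replace, xml_text)


body = '<p xmlns="">Here are some HTML character entities: &mdash; and non-breaking&nbsp;space</p>'
print(to_numeric_entities(body))
```

Because the substitution is purely textual, it also rewrites entity references inside comments and CDATA sections; that is harmless for comments, but data that relies on literal entity names inside CDATA would need a parser-aware approach instead.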