What is the problem

If you import an XML document into Déjà Vu, or a document in some other XML-based format (e.g. XLIFF), you may find that some segments contain a lot of raw markup tags, even though you would expect Déjà Vu to process them. For example, something like this:

CDATA001_1

This is what you see in the translation grid if you import a file that contains the following:

CDATA002_1

As the example illustrates, you have most likely come across a CDATA tag (formally, a CDATA section) inside the XML file. You can find more information about what a CDATA tag is here. Essentially, the CDATA is extracted in one go: Déjà Vu puts the whole content of a CDATA tag into one segment in the translation grid and does nothing else with it. That is why there is HTML in the grid when you import these files, and why the segment contains several sentences that were not split into separate segments.

Déjà Vu has to act this way because the XML standard requires it: everything inside a CDATA tag must be treated as literal text, not as markup, so any tags that appear there are passed through as ordinary characters.
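The same behaviour can be observed outside Déjà Vu with any standard XML parser. The following is a minimal sketch using Python's built-in xml.etree.ElementTree module; the sample document and its element names (item, description) are made up for illustration and do not come from a real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented for this sketch.
SAMPLE = """<item>
  <description><![CDATA[<p>First sentence. <b>Second</b> sentence.</p>]]></description>
</item>"""


def cdata_text(xml_source):
    """Return the text of <description> and the number of child elements it has."""
    root = ET.fromstring(xml_source)
    description = root.find("description")
    return description.text, len(list(description))


text, child_count = cdata_text(SAMPLE)
print(text)         # the whole CDATA content, HTML tags included, as one string
print(child_count)  # 0: the <p> and <b> inside the CDATA are not XML elements
```

The parser hands back a single string and reports no child elements, which is why a tool that follows the XML standard sees one block of text rather than separate tags and sentences.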

What you can do about it

How to deal with the tags

The XML filter in Déjà Vu, and most of Déjà Vu's filters for XML-based formats, have an option that makes working with the content of CDATA tags much easier: the Process Embedded HTML option.

Using this option with the example above, we get this result:

CDATA004_1
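To give a rough idea of what processing embedded HTML involves, here is a minimal sketch that runs a second, HTML-aware pass over the text taken from a CDATA tag and replaces each tag with a numbered placeholder. It uses Python's built-in html.parser module. The class name TagSeparator, the function protect_tags and the {n} placeholder format are assumptions made for this illustration; this is not how Déjà Vu implements the option.

```python
from html.parser import HTMLParser


class TagSeparator(HTMLParser):
    """Collects text as-is and swaps every HTML tag for a numbered placeholder."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.tag_count = 0

    def _placeholder(self):
        # Hypothetical placeholder format, chosen only for this sketch.
        self.tag_count += 1
        self.parts.append("{%d}" % self.tag_count)

    def handle_starttag(self, tag, attrs):
        self._placeholder()

    def handle_endtag(self, tag):
        self._placeholder()

    def handle_data(self, data):
        self.parts.append(data)


def protect_tags(fragment):
    """Return the fragment with its HTML tags replaced by placeholders."""
    parser = TagSeparator()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(protect_tags("<p>First sentence. <b>Second</b> sentence.</p>"))
# {1}First sentence. {2}Second{3} sentence.{4}
```

The translatable text stays readable while the markup is reduced to short codes that can be moved around but not accidentally edited.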

How to deal with the segmentation

In the example shown above, Déjà Vu put all the sentences it extracted from the CDATA tag into a single segment. Because of the way Déjà Vu processes CDATA tags, the segmentation rules are not applied to the text extracted from them.

At the moment, there is no automatic way to split these segments. If you run into a CDATA tag that contains several sentences that should have been split, you have to split the segment manually.
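To make it concrete where such a segment would normally be split, the sketch below applies a deliberately naive rule (split after a period, exclamation mark or question mark that is followed by whitespace) to a piece of extracted text. It runs outside Déjà Vu and only illustrates the sentence boundaries involved; the rule and the function name split_sentences are assumptions for this example and are not Déjà Vu's segmentation rules.

```python
import re

# Naive boundary: whitespace that follows sentence-ending punctuation.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences at the naive boundary defined above."""
    return [part for part in SENTENCE_BOUNDARY.split(text.strip()) if part]


for sentence in split_sentences("First sentence. Second sentence. Third one!"):
    print(sentence)
```

A rule this simple would also split after abbreviations such as "e.g.", which is one reason real segmentation rules are more elaborate.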

XML files and the CDATA tag
