Unlike other file formats (FrameMaker, Word, Excel, etc.), SGML (Standard Generalized Markup Language) and XML (eXtensible Markup Language) are not file formats in their own right; they are standards for tagging files and for defining those tags. Because every set of SGML/XML files can use a different set of tags, an SGML/XML filter must be created for each set of SGML/XML files.
SGML defines a standard for creating DTDs (Document Type Definitions). For example, the World Wide Web Consortium (W3C) has published DTDs for the various specifications of HTML; this means that HTML is a markup language defined according to SGML rules. You will probably be somewhat familiar with the structure and tags in HTML, so we will use it as an example in our explanations.
Tags and Attributes
SGML files are text files that encode formatting, layout, and image information using tags. Tags have the following format:
<TAGNAME ATTRIBUTE1="VALUE1" ATTRIBUTE2="VALUE2">
A tag can contain attributes; each attribute is a name/value pair that further qualifies the tag.
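To make the tag format concrete, the following minimal sketch splits a single opening tag into its name and its attribute name/value pairs. It is an illustration only, not how Déjà Vu reads tags: the function name parse_tag is hypothetical, and the sketch assumes double-quoted attribute values, which is stricter than what SGML allows.

```python
import re

# Matches NAME="VALUE" pairs inside a tag (double-quoted values only).
ATTR_RE = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')
# Matches one opening tag and captures its name and the raw attribute text.
TAG_RE = re.compile(
    r'<([A-Za-z_][\w.-]*)((?:\s+[A-Za-z_][\w.-]*\s*=\s*"[^"]*")*)\s*>'
)


def parse_tag(tag):
    """Return (tag_name, {attribute: value}) for a single opening tag."""
    match = TAG_RE.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a simple opening tag: {tag!r}")
    name, raw_attrs = match.groups()
    return name, dict(ATTR_RE.findall(raw_attrs))


name, attrs = parse_tag('<TAGNAME ATTRIBUTE1="VALUE1" ATTRIBUTE2="VALUE2">')
print(name)   # TAGNAME
print(attrs)  # {'ATTRIBUTE1': 'VALUE1', 'ATTRIBUTE2': 'VALUE2'}
```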
Because Déjà Vu X2 Professional does not need to interpret tags and keys, there are only three pieces of information that you must provide:
- Embeddable tags: An embeddable tag is one that can appear in the middle of a segment, and Déjà Vu should not split the segment before or after this tag. For example, the <B> and <I> tags in HTML (which specify bold and italic attributes) are embeddable, while the <P> tag (which specifies a paragraph change) is not.
- Extractable text between tags: You can define whether the text between certain tags is extractable (the default) or not extractable, i.e., not translatable. For example, if the text between certain tags always contains dates or numbers that do not need to be translated, you can mark that text as not extractable.
If you define the text between tags that contain nested subtags (for example, <tag1> text <tag2> text </tag2></tag1>) as non-extractable, the text between the nested subtags will not be extracted either.
However, attributes (see below) are not affected by a choice not to extract the text between tags. For example, the attributetext in <tag1 attribute="attributetext"> text </tag1> would be extracted if so defined, even though the text of tag1 may be defined as not extractable.
- Extractable attributes: Certain tags may contain attributes whose values are translatable and must therefore be extracted. For example, the <IMG> tag in HTML (which inserts an image into the text) has the ALT="[alternate text for the image]" attribute, which specifies the text to display if the browser cannot load the image. This text should be translated, so the attribute is extractable.
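The three settings above can be illustrated with a small sketch that applies them to a string of markup. This is not Déjà Vu's implementation: the setting names (EMBEDDABLE, NON_EXTRACTABLE, EXTRACTABLE_ATTRS), the CODE tag, and the function extract_segments are all hypothetical, and the sketch assumes double-quoted attribute values and properly closed non-extractable tags.

```python
import re

# Hypothetical filter settings; the names are illustrative, not Déjà Vu's.
EMBEDDABLE = {"B", "I"}                # tags that may sit inside a segment
NON_EXTRACTABLE = {"CODE"}             # text between these tags is skipped
EXTRACTABLE_ATTRS = {("IMG", "ALT")}   # (tag, attribute) values to extract

TOKEN_RE = re.compile(r'(<[^>]+>)')
TAG_NAME_RE = re.compile(r'</?\s*([\w.-]+)')
ATTR_RE = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')


def extract_segments(markup):
    """Return the translatable segments found in markup, in document order."""
    segments, current, skip_depth = [], [], 0

    def flush():
        text = "".join(current).strip()
        # Keep the segment only if it contains text outside of tags.
        if re.sub(r'<[^>]+>', '', text).strip():
            segments.append(text)
        current.clear()

    for token in TOKEN_RE.split(markup):
        if not token:
            continue
        if not token.startswith("<"):
            if skip_depth == 0:          # plain text outside skipped regions
                current.append(token)
            continue
        match = TAG_NAME_RE.match(token)
        if match is None:                # comment, declaration, etc.
            flush()
            continue
        name = match.group(1).upper()
        closing = token.startswith("</")
        if name in EMBEDDABLE:
            if skip_depth == 0:          # embeddable tag stays in the segment
                current.append(token)
            continue
        flush()                          # any other tag ends the segment
        if not closing:
            # Attributes are extracted even inside a non-extractable region.
            for attr, value in ATTR_RE.findall(token):
                if (name, attr.upper()) in EXTRACTABLE_ATTRS and value.strip():
                    segments.append(value)
        if name in NON_EXTRACTABLE:
            skip_depth = max(0, skip_depth + (-1 if closing else 1))
    flush()
    return segments


sample = ('<P>Click <B>Save</B> now.</P><CODE>x = <I>1</I></CODE>'
          '<P><IMG SRC="a.png" ALT="A cat">Done</P>')
print(extract_segments(sample))
# ['Click <B>Save</B> now.', 'A cat', 'Done']
```

In this sketch the <B> tags stay inside the first segment, the <P> tags split segments, everything between <CODE> and </CODE> (including the nested <I> text) is skipped, and the ALT value is extracted as its own segment.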
Déjà Vu offers two possibilities for creating an SGML filter file:
- from the DTD file
- directly from the SGML/XML files
In general, it is advisable to combine the two methods, because doing so produces a more accurate SGML filter.
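One way to see why the two sources complement each other is to compare the elements a DTD declares with the tags a sample file actually uses. The sketch below is an assumption-laden illustration, not a description of how Déjà Vu builds its filter: the function names and the sample DTD are hypothetical, names are compared case-sensitively as in XML, and only simple <!ELEMENT name ...> declarations are recognized (not name groups or parameter entities).

```python
import re

# Simple element declarations only: <!ELEMENT name ...>
ELEMENT_DECL_RE = re.compile(r'<!ELEMENT\s+([\w.-]+)', re.IGNORECASE)
# Opening tags only; closing tags and declarations do not match.
OPEN_TAG_RE = re.compile(r'<([A-Za-z_][\w.-]*)')


def declared_elements(dtd_text):
    """Element names declared in the DTD text."""
    return set(ELEMENT_DECL_RE.findall(dtd_text))


def used_elements(document_text):
    """Element names that appear as opening tags in a sample file."""
    return set(OPEN_TAG_RE.findall(document_text))


# Hypothetical DTD and sample file, for illustration only.
dtd = """<!ELEMENT doc (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA|b)*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT note (#PCDATA)>"""
sample = '<doc><title>Guide</title><para>Press <b>OK</b>.</para></doc>'

print(sorted(declared_elements(dtd) - used_elements(sample)))  # ['note']
print(sorted(used_elements(sample) - declared_elements(dtd)))  # []
```

Here the DTD declares a note element that this sample file never uses, so a filter built only from the file would not know about it.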
Click on the version of Déjà Vu that you are using below to see how to create an XML filter.
- Déjà Vu X2
- Déjà Vu X3