Hi,
I am new to Genero.
Could somebody help me understand how Genero BDL and Genero Studio handle ASCII and UTF-8?
I am using Genero Studio 3.10.11, running on Microsoft Windows Server 2019 (US English language).
I would like to generate XML files in which every character that is not 7-bit ASCII is escaped.
However, I am confused by how the GST debug output displays non-ASCII characters.
Here is what I have done so far:
DEFINE input_text STRING
DEFINE doc xml.DomDocument
DEFINE node xml.DomNode
DEFINE text_node xml.DomNode
DEFINE w om.SaxDocumentHandler
# Input text with UTF-8 characters (shown incorrectly, what encoding is this?)
LET input_text = ">Value at £42 other • bullet 1 • bullet 2"
# Generate ASCII encoded XML with all non-ASCII escaped
LET doc = xml.DomDocument.CreateDocument("x")
CALL doc.setXmlEncoding("ASCII")
LET node = doc.getDocumentElement()
LET text_node = doc.createTextNode(input_text)
CALL node.appendChild(text_node)
CALL doc.save("output_ascii.xml")
# Generate UTF-8 encoded XML with minimal escaping (< and > only)
LET w = om.XmlWriter.createFileWriter("output_sax.xml")
CALL w.startDocument()
CALL w.startElement("d", NULL)
CALL w.characters(input_text)
CALL w.endElement("d")
CALL w.endDocument()
# Output text from output_ascii.xml with all non-ASCII (non-7-bit) characters escaped
# This is the output I need
<?xml version="1.0" encoding="ASCII" standalone="no"?><x>>Value at £42 other • bullet 1 • bullet 2</x>
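To be clear about the escaping I mean: every character above U+007F replaced by a numeric character reference. Here is a minimal sketch in Python (not Genero, just to illustrate the target; the function name is made up, and Genero may well write the references in hexadecimal rather than decimal):

```python
# Hypothetical illustration in Python, not Genero code.
text = "Value at £42 other • bullet"


def escape_non_ascii(s: str) -> str:
    """Replace every character above U+007F with a decimal &#N; reference."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = escape_non_ascii(text)
print(escaped)
```

The result is pure 7-bit ASCII, so it reads the same whatever code page the viewer assumes.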
# Output text from output_sax.xml, showing dots as "•" (which is correct) instead of "â€¢" (which is incorrect)
# This is a valid output, which correctly displays all characters,
# including those that need more than 7 bits (or more than 1 byte)
<?xml version='1.0' encoding='UTF-8'?><d>>Value at £42 other • bullet 1 • bullet 2</d>
# If I load either file using Genero BDL om.XmlReader
# and display the contents to the Debug output of the Genero Studio IDE,
# I get the output with the "incorrect" characters:
>Value at Â£42 other â€¢ bullet 1 â€¢ bullet 2
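My guess is that the "incorrect" characters are the UTF-8 bytes being shown as if they were Windows-1252, which would fit a US English Windows machine. This is only an assumption on my side, but a quick check in Python (outside Genero; the function name is made up) produces that kind of garbling:

```python
# Hypothetical check, outside Genero: decode UTF-8 bytes as Windows-1252.
text = ">Value at £42 other • bullet 1 • bullet 2"


def misdecode(s: str) -> str:
    """Encode as UTF-8, then decode the bytes as if they were Windows-1252."""
    return s.encode("utf-8").decode("cp1252")


garbled = misdecode(text)
print(garbled)

# The garbling is reversible, so the bytes themselves are intact.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == text)
```

If that is what is happening, the files are fine and only the display is using the wrong code page.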
# Is there a way to turn on "display UTF-8" in the GST debug output?