Topic: Newbie: how to create fully (7-bit) escaped XML?
Francois G.
Posts: 16

« on: December 04, 2020, 11:03:38 pm »


I am new to Genero.
Could somebody help me with understanding some aspects of how the Genero BDL and the Genero Studio handle ASCII and UTF-8?

I am using Genero Studio 3.10.11, running on Microsoft Windows Server 2019 (US English language).

I would like to generate XML files with all the characters that are not 7-bit ASCII escaped.
However, I am confused by the GST debug output behaviour when non-ASCII characters are displayed.

Here is what I have done so far:

DEFINE input_text STRING
DEFINE doc xml.DomDocument
DEFINE node xml.DomNode
DEFINE text_node xml.DomNode
DEFINE w om.SaxDocumentHandler

# Input text with UTF-8 characters (shown incorrectly, what encoding is this?)
LET input_text = ">Value at £42 other • bullet 1  • bullet 2"

# Generate ASCII encoded XML with all non-ASCII escaped
LET doc = xml.DomDocument.CreateDocument("x")
CALL doc.setXmlEncoding("ASCII")
LET node = doc.getDocumentElement()
LET text_node = doc.createTextNode(input_text)
CALL node.appendChild(text_node)
CALL doc.save("output_ascii.xml")

# Generate UTF-8 encoded XML with minimal escaping (< and > only)
LET w = om.XmlWriter.createFileWriter("output_sax.xml")
CALL w.startDocument()
CALL w.startElement("d", NULL)
CALL w.characters(input_text)
CALL w.endElement("d")
CALL w.endDocument()

# Output text from output_ascii.xml with all non-ASCII (non-7-bit) characters escaped
# This is the output I need
<?xml version="1.0" encoding="ASCII" standalone="no"?><x>&gt;Value at &#163;42 other &#8226; bullet 1  &#8226; bullet 2</x>

# Output text from output_sax.xml, showing dots as "•" (which is correct) instead of "â€¢" (which is incorrect)
# This is a valid output, which correctly displays all characters,
# including those that need more than 7bits (or more than 1 byte)
<?xml version='1.0' encoding='UTF-8'?><d>&gt;Value at £42 other • bullet 1  • bullet 2</d>

# If I load either file using Genero BDL om.XmlReader
# and display the contents to the Debug output of the Genero Studio IDE,
# I get the output with the "incorrect" characters:
>Value at £42 other • bullet 1  • bullet 2

# Is there a way to turn on "display UTF-8" in the GST debug output?
Sebastien F.
Four Js
Posts: 505

« Reply #1 on: December 05, 2020, 11:51:42 am »

Hello François,

Regarding Genero BDL and charset/locale management:

It's very important to understand charset configuration in all software components that interact with a Genero application:

- Database locale / collation (server side)
- Database client charset configuration (application side)
- Genero BDL application locale (at runtime AND compile time)
- Extension libraries like Genero Web Services
- File I/O with OS file system with API like om/XML ...
- When using TUI, the terminal emulator charset definition
... etc

We can help you on this.
I suggest that you contact your support center, so we can organize a conf call.

In the meantime, I recommend that you read this chapter of the documentation:

Check the schema in this page, to understand where charset config matters (in yellow):

Since you are using Microsoft SQL Server, I also recommend that you read this:

(Note that Genero V4 will support UTF-8 in char/varchar SQL Server columns, but V3.20 does not)

Regarding the XML question, we need to investigate that.
I see that you are using characters other than 7-bit ASCII in your sources.
You need to define what charset encoding you use for the Genero fglcomp/fglform compilers and the fglrun runtime system.
You must also define the charset you use in the GST configuration.

Sebastien F.
Four Js
Posts: 505

« Reply #2 on: December 05, 2020, 03:34:19 pm »

After some investigation, it looks like the UTF-8 string you want to use should be:

>Value at £42 other ⤢ bullet 1  ⤢ bullet 2

Can you confirm?

In fact you see:

>Value at £42 other • bullet 1  • bullet 2

This is because the sequence of UTF-8 bytes is probably being interpreted as the CP-1252 charset.
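
The effect can be reproduced outside Genero. Below is a minimal sketch in Python (used here only for illustration, it is not part of the original sample, and the function names are made up) that encodes the intended characters as UTF-8 and then decodes the same bytes as CP-1252:

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were CP-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the UTF-8 bytes and decode them correctly."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    shown = misread_as_cp1252("£42 •")
    print(shown)          # the garbled form, as in the debug output
    print(repair(shown))  # the intended text again
```

With this mapping, "£" (bytes C2 A3) turns into "Â£" and "•" (bytes E2 80 A2) turns into "â€¢", which matches the garbled string in the first post. The data itself is intact; only the charset used to display it is wrong.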

I did a quick test with your sample on my Linux box, setting LANG=en_US.UTF-8, and I get the following in the output files:

$ cat output_ascii.xml
<?xml version="1.0" encoding="ASCII" standalone="no"?><x>&gt;Value at &#163;42 other &#10530; bullet 1  &#10530; bullet 2</x>

$ cat output_sax.xml
<?xml version='1.0' encoding='UTF-8'?>
<d>&gt;Value at £42 other ⤢ bullet 1  ⤢ bullet 2</d>

Which looks good to me.
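
To cross-check the numbers in the ASCII file, here is a small Python sketch (illustration only, not Genero code; the helper name `to_ascii_xml_text` is hypothetical). It escapes the XML markup characters first and then replaces every character outside 7-bit ASCII with a decimal character reference:

```python
from xml.sax.saxutils import escape


def to_ascii_xml_text(text: str) -> str:
    """Escape &, < and >, then turn every non-ASCII character into &#N;."""
    # "xmlcharrefreplace" is a standard Python codec error handler that
    # substitutes the decimal code point for characters ASCII cannot hold.
    return escape(text).encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_xml_text(">Value at \u00a342 other \u2022 bullet 1"))
    print(to_ascii_xml_text("\u2922"))
```

The number in each reference is simply the Unicode code point in decimal, so &#163; is U+00A3 (pound sign), &#8226; is U+2022 (bullet) and &#10530; is U+2922. A different number in the ASCII output therefore means a different character was in the input string.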

On Windows, you must set the env var LANG=.fglutf8 (we can explain this setting in more detail, or you can just read the documentation).

In GST, you need to set the UTF-8 locale as well.

PS: I just realized that you mentioned Windows Server, not SQL Server.

Francois G.
Posts: 16

« Reply #3 on: December 07, 2020, 09:57:12 am »

Thanks for the prompt answer Seb.

Windows dev server: runs GST, set up as US English.
Linux dev backend connects to an Informix database.
Linux production backend connects to its own Informix database.
All set up by Four Js.

The text is entered by our customers (copy and paste from a web page or a word processor) in full UTF-8.

I will read the documentation that you pointed to.

Sebastien F.
Four Js
Posts: 505

« Reply #4 on: December 07, 2020, 10:31:52 am »

Hello François,

If you want to support UTF-8 input and DB storage, you will have to set up a UTF-8 locale for the whole chain:

- Genero application (fglrun) (LANG/LC_ALL)
- Informix Client (CLIENT_LOCALE, DB_LOCALE)
- Informix Server (DB_LOCALE when creating the DB)
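
As a rough aid for checking the client side of that chain, here is a minimal Python sketch (illustration only; the helper name and the "contains utf8" heuristic are assumptions, not an official check). It looks at the environment variables named above and reports those that do not appear to select UTF-8:

```python
import os


def non_utf8_settings(env) -> dict:
    """Return {name: value} for settings that do not appear to select UTF-8."""
    settings = {
        # LC_ALL takes precedence over LANG when both are set.
        "LC_ALL/LANG": env.get("LC_ALL") or env.get("LANG"),
        "CLIENT_LOCALE": env.get("CLIENT_LOCALE"),
        "DB_LOCALE": env.get("DB_LOCALE"),
    }
    problems = {}
    for name, value in settings.items():
        # Heuristic (assumption): accept any value mentioning "utf8" or
        # "utf-8" in any letter case, e.g. en_US.UTF-8, en_us.utf8, .fglutf8.
        normalized = (value or "").lower().replace("-", "")
        if "utf8" not in normalized:
            problems[name] = value
    return problems


if __name__ == "__main__":
    for name, value in non_utf8_settings(os.environ).items():
        print(f"{name} = {value!r} does not look like a UTF-8 locale")
```

This only inspects the environment of the process that runs it. It will flag a locale that names its code set by number rather than by name, and it says nothing about the DB_LOCALE that was used when the database was created on the server.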

Here is another doc link related to Informix setup:

When you are done with reading the doc, I suggest that we have a talk so I can explain all of that.
