Topic: XMLWriter - controlling the character encoding used.  (Read 15119 times)
David H.
Posts: 158


« on: September 04, 2009, 10:21:18 am »

Hi all,

We need to generate XML files with specific character encodings to meet our customers' requirements. Currently, using Genero v2.11 on Windows, all the files we generate seem to end up as <?xml version='1.0' encoding='windows-1252'?>. Unless I'm missing something obvious, I don't see any way to control this programmatically. If there is none, please could you consider an enhancement that lets us control which encoding is used for the XML files we generate?

TIA,

David
Frank G.
Four Js
Posts: 48


« Reply #1 on: September 04, 2009, 10:59:10 am »

Hi, have you tried the XML APIs of the GWS package?

Frank
Sebastien F.
Four Js
Posts: 509


« Reply #2 on: September 04, 2009, 11:08:16 am »

David,

Today it's not possible to generate an XML file with the XMLWriter in an encoding other than the current application character set, which is defined by LANG/LC_ALL on Unix and by the system language on Windows.

To write XML files in a different encoding with om.XmlWriter, we would need to integrate or use a charset conversion library like iconv or IBM's ICU.

BTW there is a bug we plan to fix in 2.21: today, when loading an XML file, no character set checking is done, so you can get invalid characters if you don't pay attention. This is not an issue if you use the same encoding for fglrun and the XML files, or when loading ASCII-only XML files. This problem is registered as bug #9500.
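
To illustrate the kind of character set check described above, here is a minimal Python sketch run outside of FGL. It is not how fglrun works or will work: the function names are hypothetical, and it assumes the file uses an ASCII-compatible encoding (so not UTF-16) whose name Python's codec registry recognizes.

```python
import re

# Captures the encoding name from a leading XML declaration, if present.
# Assumption: the declaration itself is readable as ASCII bytes.
_ENC = re.compile(rb"""^<\?xml[^>]*?encoding=["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = _ENC.match(data)
    return match.group(1).decode("ascii") if match else default


def decodes_as_declared(data: bytes) -> bool:
    """True if every byte sequence is valid in the declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        # LookupError covers encoding names Python does not know.
        return False
    return True
```

A file that passes this check can still be mislabelled (most byte sequences are valid windows-1252), so it only catches the obvious cases.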

Seb
David H.
Posts: 158


« Reply #3 on: September 08, 2009, 09:57:27 am »

Hi Frank,

Not as yet, but when I get a chance I'll certainly give it a go... Sorry, I have to ask: since the versions in the XML library seem fully featured, why don't you phase out the built-in OM classes and replace them with the XML library instead?

Regards,

David
Frank G.
Four Js
Posts: 48


« Reply #4 on: September 08, 2009, 11:01:55 am »

Hi David,

  The OM XML library was first designed to handle the user interface dynamically, and was then extended to handle XML documents, but maybe the FGL team can give you more details. Moreover, the GWS XML library was designed according to the W3C (DOM) specification, so it would be difficult to replace the OM implementation with the GWS one without any impact on existing 4GL applications already using the OM library. I would recommend using the OM library for UI tree manipulation, and the GWS library for XML manipulation.

Regards,
Frank
Sebastien F.
Four Js
Posts: 509


« Reply #5 on: September 11, 2009, 10:21:52 am »

About the OM classes in FGL:

The OM library is there for historical reasons and is optimized for AUI tree management. It was extended to manipulate XML files and do basic XML processing. In this context we don't see the need to implement a huge XML lib with external lib dependencies.

Regarding the encoding produced in the <?xml ?> header:

This is a STANDARD XML IANA character set name, resolved from LANG/LC_ALL through the conversion file FGLDIR/etc/charmap.alias.

  http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
  http://www.iana.org/assignments/character-sets

If the third-party tool reading the XML file generated from FGL is not able to support a standard encoding specification in the <?xml ?> declaration, it's a bug in that software.

If you are sure you are writing only pure ASCII characters, you can argue that you want to write an ASCII-encoded file that does not need an encoding specification in the header (it then defaults to UTF-8, but since ASCII is a subset of UTF-8 that's ok), or you may want to set the "ANSI_X3.4-1968" / "ASCII" encoding by hand... The problem is that the FGL runtime system will not check whether the bytes written are valid ASCII characters, and this can be dangerous. Other customers could then complain that the VM generates ASCII XML files with invalid characters... new bug for us.

So far the only workaround I can see is to patch the generated file after it's finished, replacing [encoding="windows-1252"] by [encoding="ANSI_X3.4-1968"], but again this only makes sense if you are using pure ASCII chars, as we have no charset conversion library like iconv built into the VM.
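
The patching workaround could be sketched as a small post-processing step in Python, run on the file after FGL has written it. This is only an illustration, not part of the product: the function name is hypothetical, and the default target name is simply the one quoted above. It refuses to relabel a file containing any byte above 0x7F, which is exactly the check the runtime does not do.

```python
import re

# Matches the encoding pseudo-attribute in a leading XML declaration,
# whichever quote character (' or ") the writer used.
_DECL = re.compile(rb"""^(<\?xml[^>]*?encoding=)(["'])([^"']*)\2""")


def relabel_ascii_xml(data: bytes, new_encoding: str = "ANSI_X3.4-1968") -> bytes:
    """Replace the declared encoding, refusing input that is not pure ASCII."""
    # Relabelling changes no bytes, so it is only safe for 7-bit content.
    if any(byte > 0x7F for byte in data):
        raise ValueError("non-ASCII bytes found; relabelling would mislabel them")
    return _DECL.sub(
        lambda m: m.group(1) + m.group(2) + new_encoding.encode("ascii") + m.group(2),
        data,
        count=1,
    )
```

If the file has no XML declaration at all, the data is returned unchanged.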

Seb
Rene S.
Four Js
Posts: 111


« Reply #6 on: September 11, 2009, 11:18:32 am »

Hello, I don't know what OS you're using.
On my Linux box I would simply call xmllint --encode encoding filename.
This command converts characters, if necessary, and changes the encoding name in the output file.
If the program is not available, install libxml2-utils. On Debian or Ubuntu: apt-get install libxml2-utils.
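
Where xmllint is not available, the same idea (convert the characters and update the declared encoding name) can be sketched in Python. This is a hypothetical helper, not a replacement for a real XML tool: it assumes you already know the source encoding and that both encoding names are ones Python's codecs accept.

```python
import re

# Matches the encoding pseudo-attribute in a leading XML declaration.
_DECL = re.compile(r"""^(<\?xml[^>]*?encoding=)(["'])([^"']*)\2""")


def transcode_xml(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode an XML document and rewrite its declared encoding to match."""
    # Raises UnicodeDecodeError if the bytes are not valid in the source encoding.
    text = data.decode(source_encoding)
    text = _DECL.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{target_encoding}{m.group(2)}",
        text,
        count=1,
    )
    # Raises UnicodeEncodeError if a character cannot be represented in the target.
    return text.encode(target_encoding)
```

For example, transcode_xml(data, "windows-1252", "UTF-8") converts a file written on a windows-1252 system; both error cases fail loudly rather than writing a mislabelled file.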
Rene
Sebastien F.
Four Js
Posts: 509


« Reply #7 on: September 11, 2009, 12:23:13 pm »

David's application is on Windows as mentioned in the first post.

Maybe David can write a little Windows Script (.wsf) program using the Microsoft.XMLDOM class to convert to a different encoding... Sorry, I did not check; I am not an expert in Microsoft.XMLDOM, but it should provide such a conversion function.

We have an example of a .wsf script in FGLDIR/lib/fgldoc of FGL 2.20, doing XSLT transformations.

http://msdn.microsoft.com/en-us/library/15x4407c(VS.85).aspx

Seb
David H.
Posts: 158


« Reply #8 on: September 11, 2009, 01:26:22 pm »

Thanks for the replies/suggestions.

I agree that the character set used should not matter, as the software reading the file should of course handle the conversion. That said, I keep coming across XML specs from third parties which state that a specific character encoding must be used. I've also seen sample files we've generated fail tests because the wrong character encoding was used!

I think for now, I'll alter the encoding externally. Ultimately we only use normal ASCII characters anyway, so we should not have any problems doing this.

In the longer term I'm going to look at switching our XML generation library from the OM classes to the GWS ones, in order to get full control over everything we generate. This looks quite easy coding-wise. The only complication is that I've not installed GWS at any client sites, so I'd need to do some upgrading before I could roll this out globally!

Cheers,

David
Frank G.
Four Js
Posts: 48


« Reply #9 on: September 14, 2009, 09:36:19 am »

Hi,

  Just for your information, the GWS package has been bundled with FGL since version 2.11.05, in the fglgws package.

Regards,

Frank