Author Topic: Convert Program to UTF-8  (Read 11699 times)
.
Posts: 4


« on: January 10, 2018, 03:24:52 pm »

Hello,
we are converting our database (Informix) to UTF-8, so we have to convert our programs as well.
Most of it works fine:
General preferences: Text Encoding = utf-8
XML-Files (.4tb, .4tm) Encoding=utf-8
All Files converted to utf-8
Application - Property: LANG=.fglutf8

Only the forms don't work. If they are compiled with Genero Studio (3.10), characters such as 'ä' or 'ü' end up as '?' in the .42f file.

But if we compile the form with the command "sform *.4fd", the transformation is correct.

What can I do?


best regards

Johanna Koechert

Sebastien F.
Four Js
Posts: 545


« Reply #1 on: January 10, 2018, 03:49:44 pm »

Hello Johanna,

Migration to UTF-8 is not a trivial process, and the motivation should be to support multiple languages in your application.

If your motivation is to support multiple languages in the same instance of your application / database, there is no choice: you have to migrate to UTF-8.

But if you will keep using your current language and are moving to UTF-8 only because it's the new standard, you had better keep using your current single-byte ISO-8859-x locale (SBCS).

When using UTF-8, you enter the world of variable-sized, multi-byte character sets (MBCS).

In UTF-8:
  a = 1 byte
  é = 2 bytes
  是 = 3 bytes
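
The byte counts above can be checked in any language that exposes the UTF-8 encoding. A minimal sketch in Python (used here only for illustration, it is not part of the Genero toolchain; the helper name utf8_len is made up for this example):

```python
# UTF-8 byte length of the three sample characters from the list above.
samples = ["a", "\u00e9", "\u662f"]  # a, é (precomposed form), 是

def utf8_len(text: str) -> int:
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))

byte_lengths = {ch: utf8_len(ch) for ch in samples}

for ch, n in byte_lengths.items():
    print(f"{ch!r}: {n} byte(s)")
```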

Furthermore:

If, in your .4gl source you do something like

  LET c = mystr[10,11]

This uses BYTE length semantics by default, even with UTF-8.
So it extracts the two bytes (not characters) at positions 10 and 11.
Those bytes can be valid on their own (plain ASCII characters such as A), but any non-ASCII byte is only a piece of a multi-byte UTF-8 character (= invalid by itself).
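
To make the problem concrete, here is a rough Python sketch that mimics a byte-based substring on UTF-8 data. It is not Genero code: byte_substring is a hypothetical helper, and the sample string is invented so that a multi-byte character starts at byte 11.

```python
def byte_substring(text: str, start: int, end: int) -> bytes:
    """Return bytes start..end (1-based, inclusive) of the UTF-8 encoding,
    mimicking mystr[start,end] under BYTE length semantics."""
    return text.encode("utf-8")[start - 1:end]

# 'M' is byte 10; 'ü' occupies bytes 11 and 12 in UTF-8.
mystr = "12345678 M\u00fcller"

piece = byte_substring(mystr, 10, 11)  # 'M' plus only the first byte of 'ü'

try:
    piece.decode("utf-8")
    is_valid = True
except UnicodeDecodeError:
    # The slice ends in the middle of a multi-byte character.
    is_valid = False

print(piece, "valid UTF-8:", is_valid)
```

The slice cuts the 'ü' in half, so the result is no longer valid UTF-8.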

To leave the code untouched with UTF-8, Genero supports CHAR length semantics through the FGL_LENGTH_SEMANTICS=CHAR environment variable.
With that setting, 10 and 11 in the above code become character positions, not byte positions.
Similarly, the LENGTH() function returns a number of characters (instead of a number of bytes, as with the default BYTE length semantics).
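
The difference between the two semantics can be sketched in Python, whose strings are indexed by character (strictly speaking by Unicode code point, which may not match Genero's counting in every edge case). The sample string is invented for the illustration:

```python
mystr = "12345678 M\u00fcller"  # contains one 2-byte character, 'ü'

# CHAR-style semantics: positions and lengths count characters.
char_length = len(mystr)
char_piece = mystr[10 - 1:11]  # characters 10..11 (1-based, inclusive)

# BYTE-style semantics: the same string measured in UTF-8 bytes.
byte_length = len(mystr.encode("utf-8"))

print(char_length, byte_length, char_piece)
```

With character positions the substring stays intact ("Mü"), and the two length values differ by one because 'ü' needs two bytes.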

Now the fun:

Informix does not support char length semantics, a VARCHAR(20) means 20 bytes, even in UTF-8.
OK, Informix provides the SQL_LOGICAL_CHAR server parameter (onconfig) to apply a ratio to the size of CHAR/VARCHAR columns in CREATE TABLE / ALTER TABLE, but that's all.
This does not enable full char length semantics: the SQL LENGTH() function will continue to return a number of bytes :-)
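
As a simplified model of what such a ratio does, the Python sketch below multiplies the declared size by the ratio and checks whether a value fits in the resulting number of bytes. The helpers physical_bytes and fits are hypothetical, and the real server has further limits (allowed parameter values, maximum column sizes) that should be checked in the Informix documentation.

```python
def physical_bytes(declared_size: int, logical_char_ratio: int = 1) -> int:
    """Bytes reserved for a CHAR/VARCHAR(declared_size) column in this
    simplified model: declared size multiplied by the ratio."""
    return declared_size * logical_char_ratio

def fits(value: str, declared_size: int, logical_char_ratio: int = 1) -> bool:
    """True if the UTF-8 encoding of value fits in the reserved bytes."""
    return len(value.encode("utf-8")) <= physical_bytes(declared_size, logical_char_ratio)

name = "\u662f" * 10  # 10 characters, 30 bytes in UTF-8

fits_plain = fits(name, 20)     # VARCHAR(20) without ratio: 30 bytes > 20
fits_ratio = fits(name, 20, 3)  # ratio 3: 60 bytes reserved

# A byte-based LENGTH() would still report the byte count for this value.
reported_length = len(name.encode("utf-8"))

print(fits_plain, fits_ratio, reported_length)
```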

You may want to contact your support center for more details.

PLEASE PLEASE PLEASE also read carefully:

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

http://4js.com/online_documentation/fjs-fgl-manual-html/#c_fgl_localization_038.html

It is very important to understand the principles and constraints of locale / charset support in Genero.
Especially: the locale used at compile time (if not ASCII) must be the locale used at runtime, and thus the database client locale.
To introduce more flexibility, we have implemented localized strings:
http://4js.com/online_documentation/fjs-fgl-manual-html/#c_fgl_localized_strings_001.html
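
To illustrate what a locale mismatch between two processing steps looks like in general, here is a small Python sketch. It only demonstrates the encoding mechanics; it is an assumption-free example about encodings, not a diagnosis of what any Genero tool does internally.

```python
text = "\u00e4\u00fc"  # äü

# Bytes written by a tool working in UTF-8 ...
utf8_bytes = text.encode("utf-8")

# ... and read back by a tool that assumes ISO-8859-1: each character
# turns into two unrelated characters (mojibake).
misread = utf8_bytes.decode("iso-8859-1")

# A tool restricted to ASCII that substitutes unknown characters
# produces question marks instead.
ascii_fallback = text.encode("ascii", errors="replace").decode("ascii")

print(repr(misread), repr(ascii_fallback))
```

Question marks in the output are typical of a conversion step that runs with a character set unable to represent the input.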

Seb
Sebastien F.
Four Js
Posts: 545


« Reply #2 on: January 10, 2018, 05:18:07 pm »


With Informix (IDS 12) you also need to set GL_USEGLU=1, both when creating your UTF-8 database and when executing client programs, to enable GLS for Unicode (GLU), based on the International Components for Unicode (ICU) libraries, instead of the default GLS libraries.

This is needed to get Unicode collation (sorting based on locale language rules).

https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.glsug.doc/ids_gug_090.htm

With the default GLS library (GL_USEGLU not set), we have also recently discovered that 4-byte UTF-8 characters such as emojis are rejected, when used in SQL string literals.

Try the following to test your config:

Quote
create temp table tt1 ( pkey integer primary key, name varchar(50) );
insert into tt1 values ( 101, '⌚');
insert into tt1 values ( 102, '⌛');
insert into tt1 values ( 103, '⍐');
insert into tt1 values ( 104, '🀀');
insert into tt1 values ( 105, '😀');
select pkey, name, length(name) length from tt1 order by pkey;
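
For comparison, the UTF-8 byte length of each test character can be computed outside the database. This Python sketch assumes each value is stored as a single code point and that length() reports bytes, as described earlier in this thread:

```python
# Same keys and characters as in the SQL test above.
rows = {
    101: "\u231a",      # ⌚
    102: "\u231b",      # ⌛
    103: "\u2350",      # ⍐
    104: "\U0001f000",  # 🀀
    105: "\U0001f600",  # 😀
}

# Byte length of each value once encoded as UTF-8.
expected_lengths = {pkey: len(name.encode("utf-8")) for pkey, name in rows.items()}

for pkey, n in expected_lengths.items():
    print(pkey, n)
```

Rows 104 and 105 are the 4-byte characters, so they are the ones to watch.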


Seb
Romain W.
Four Js
Posts: 48


« Reply #3 on: January 11, 2018, 08:52:30 am »

Hi Johanna,
Concerning the compilation of the forms in GST 3.10, may I ask you to open a support ticket and send us such a form, please? Don't forget to also attach the DB schema if required.
But just a thought: did you set the LANG variable to .fglutf8 in the Genero Studio configuration?
Thanks.
Regards,
Romain W.
Anderson P.
Posts: 82


« Reply #4 on: January 18, 2018, 06:18:12 pm »

Johanna, you are not alone...

We are also trying to figure out a way to migrate our Informix database to UTF-8; today our database encoding is ISO-8859-1. The goal is to support Arabic character input, because our branch in Dubai needs to register product descriptions in Arabic.

One of the main challenges is that we have 22 servers which are connected to each other, and all of them need to be migrated at the same time.

Besides the problem with the variable size, we are also worried that this migration will break the web services that we provide and consume. Another concern raised is that Arabic is written right to left, which can cause issues in our applications and reports.

Also, it's a shame that Informix does not support character length semantics. This problem by itself is almost a deal breaker for this conversion, since almost all of our text columns are VARCHAR with a length chosen specifically for the information they store.
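
One way to estimate the impact before migrating is to re-encode existing values and compare the result with the column width in bytes. A rough Python sketch, where overflows_after_conversion and the sample values are invented for the illustration:

```python
def overflows_after_conversion(latin1_value: bytes, column_bytes: int) -> bool:
    """True if an ISO-8859-1 value no longer fits in column_bytes
    once it has been re-encoded as UTF-8."""
    utf8_value = latin1_value.decode("iso-8859-1").encode("utf-8")
    return len(utf8_value) > column_bytes

# Hypothetical values as stored today in an ISO-8859-1 database, 9 bytes each.
samples = [b"Sao Paulo", b"S\xe3o Paulo"]
report = {value: overflows_after_conversion(value, 9) for value in samples}

# Arabic letters take 2 bytes each in UTF-8: a 4-letter word needs 8 bytes.
arabic_word = "\u0645\u0646\u062a\u062c"
arabic_bytes = len(arabic_word.encode("utf-8"))

print(report, arabic_bytes)
```

A value that exactly fills a byte-sized column today overflows as soon as it contains one accented character.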

As a matter of fact, we are close to giving up on this. After contacting some engineers from IBM, who confirmed that "there is no easy way of doing this, you need to drop the entire database, create a new database and import the old database converting the charset, hoping to have no big problems during this process", we doubt whether it really is a reasonable thing to do.

But it's at least interesting to know that other companies are trying this same migration, if you could provide us with updates about this, maybe we can help each other.