If I understand correctly, this XSLT adds LStr nodes to all form items where "text" or "comment" are used, as if a %"xxx" localized string was used... right?
Yes.
I would rather add % before each string of the TEXT, COMMENT and TITLE attributes in the actual form source file (the .per file).
Not if you have over 4000 form files to process! Besides, your suggestion does not deal with literal strings in the grid image sections, whereas the XSLT neatly injects mappings for those literals too. The localisation section of the FGL manual suggests:
Warning: It is not possible to specify a static localized string directly in the area of containers like GRID, TABLE or SCROLLGRID. You must use label fields to use localized strings in layout labels:
LAYOUT
GRID
{
[lab01 |f001 ]
}
END
END
ATTRIBUTES
LABEL lab01 : TEXT=%"myform.label01";
EDIT f001 = FORMONLY.field01;
END
I expect I would get lynched if I tried to make the programmers work with that. The entire form would become very cryptic and impenetrable. Not only is each label's TEXT attribute a few pages down in the editor, but its contents aren't even immediately obvious! The TEXT contents could be progressively converted into something reasonably descriptive and rational over time, but how much time would that take for 4000+ forms? I don't think it's an option.
I believe that converting all text and comment attributes would capture all of the descriptive strings, and only those, without damaging any CODE-type attributes. Of course I will need to confirm that, and perhaps add a few exceptions or special rules for particular attributes, but overall the XSLT will make the rather complex and time-consuming process of localizing forms trivial.
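For readers who don't speak XSLT, here is a rough Python equivalent of what the transform is described as doing: walk the compiled form tree and, for every element carrying a text or comment attribute, attach an LStr child that reuses the original string as its key. This is a minimal sketch under assumptions: the element names, the exact LStr attribute layout and the `skip_tags` exception mechanism are illustrative and hypothetical, not checked against the real compiled-form schema.

```python
import xml.etree.ElementTree as ET

# Attributes treated as descriptive (localizable) text, per the discussion.
LOCALIZABLE_ATTRS = ("text", "comment")

def inject_lstr(root, skip_tags=frozenset()):
    """Add an <LStr> child for each localizable attribute in the tree.

    ASSUMPTION: element and attribute names are illustrative only.
    skip_tags is a hypothetical hook for the "exceptions or special
    rules": elements with these tags are left untouched.
    Returns the number of LStr nodes added.
    """
    added = 0
    # Snapshot the tree first so the new LStr nodes are not revisited.
    for elem in list(root.iter()):
        if elem.tag in skip_tags:
            continue
        for attr in LOCALIZABLE_ATTRS:
            value = elem.get(attr)
            if value:
                # The original string doubles as the lookup key,
                # as if %"value" had been written in the .per source.
                ET.SubElement(elem, "LStr", {attr: value})
                added += 1
    return added
```

For example, `inject_lstr(form, skip_tags={"Button"})` on a form containing a Label, an Edit with a comment and a Button would tag the first two and leave the Button alone. Code-type attributes are never touched because only the listed attribute names are inspected.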
Here we are talking about form files, but in the 4GL sources you will have to add a % before every string to be localized. This has to be done by hand...
This problem has not been overlooked. The 4GL sources contain a mixture of codes, SQL string fragments and descriptive text, so there is little an automatic tool can do on its own. However, we should be able to use scripts to progressively identify patterns, exceptions and rules, which can be combined with human intervention to see the conversion of the 4GLs through. At this stage I don't see anything much better than that.
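As a starting point for that kind of script, the sketch below pulls double-quoted literals out of 4GL source and sorts them into rough buckets (SQL fragment, code, descriptive text, unknown) so that a human only has to review the uncertain ones. The heuristics are entirely hypothetical and deliberately crude; the idea is that the rule set grows as patterns and exceptions are discovered. Single-quoted literals and comments are not handled here.

```python
import re

# Hypothetical heuristics, to be refined as patterns and exceptions emerge.
SQL_HINT = re.compile(
    r"^\s*(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|AND|OR|ORDER BY)\b",
    re.IGNORECASE)
CODE_HINT = re.compile(r"^[A-Z0-9_]{1,8}$")      # short upper-case codes
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')  # double-quoted only

def classify(literal):
    """Guess what kind of string a literal is."""
    if SQL_HINT.search(literal):
        return "sql"
    if CODE_HINT.match(literal):
        return "code"
    if " " in literal and re.search(r"[a-z]", literal):
        return "text"      # likely a candidate for %"..." localization
    return "unknown"       # needs a human decision

def scan_source(source):
    """Return (line_no, literal, category) for each double-quoted literal."""
    results = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for m in STRING_LITERAL.finditer(line):
            results.append((line_no, m.group(1), classify(m.group(1))))
    return results
```

Running `scan_source` over every 4GL file and reviewing only the "unknown" and "text" buckets is one way to combine automation with the human intervention mentioned above.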
I would also strongly suggest replacing the original strings with real string resource identifiers:
COMMENT = %"customer_name.comment"
This sounds like a fairly attractive proposition for field comments. Some simple scripts could gather all the comments from all the forms, beg for human intervention when the same field has different strings, attach the comments to the fields in a data dictionary, and then re-comment every field with your symbolic identifier wherever a dictionary entry is available. Yes, I like that.
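The gather-and-reconcile step could look something like the sketch below. It assumes the (form, field, comment) triples have already been extracted from the forms; the extraction itself, the key scheme `<field>.comment` (taken from the example above) and the helper names are assumptions for illustration. Fields whose comment is identical everywhere go straight into the dictionary; fields with differing wording are reported as conflicts for a human to resolve.

```python
from collections import defaultdict

def build_comment_dictionary(entries):
    """entries: iterable of (form_name, field_name, comment) triples.

    Returns (dictionary, conflicts):
      dictionary: field -> (symbolic key, comment) when every form agrees;
      conflicts:  field -> {comment: [forms]} when a human must choose.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for form, field, comment in entries:
        # Ignore stray leading/trailing whitespace when comparing wordings.
        seen[field][comment.strip()].append(form)

    dictionary, conflicts = {}, {}
    for field, variants in seen.items():
        if len(variants) == 1:
            (comment,) = variants  # the single agreed wording
            dictionary[field] = (f"{field}.comment", comment)
        else:
            conflicts[field] = {c: sorted(f) for c, f in variants.items()}
    return dictionary, conflicts

def symbolic_comment(field, dictionary):
    """Replacement COMMENT attribute, or None if no entry exists yet."""
    entry = dictionary.get(field)
    return f'COMMENT = %"{entry[0]}"' if entry else None
```

Fields that only appear in `conflicts` keep their literal comment until someone picks a wording, which lets the re-commenting proceed incrementally.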
If you have non-ASCII characters in the original text,
Fortunately not a problem for us: Australia is an English-speaking country (if you believe the hype; many would disagree :-) ) and we're ASCII all the way.
Our first "localization" is to another English-speaking country, so the conversions will be relatively trivial changes for local custom and legislation. The next possible conversion is to a genuinely non-English language, which fortunately still uses ASCII, or a simple 8-bit character set in the worst case. So the English conversion will be a nice stepping stone for getting some useful infrastructure in place. I think that without a very quick and convenient structured editor for looking up strings, we would be risking a rebellion if we tried to go non-English.
But regardless of whether it's a minor English-to-English conversion or a much larger English-to-X conversion, we're not going to quickly get away from mapping strings like "please enter the cost of widgets" versus logical strings like "top.middle.base". Therefore the option of automagic localization of forms would be very welcome, especially since it would deal with the literal strings in GRID, TABLE and SCROLLGRID containers.
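One reason literal-text keys and logical keys can coexist is a lookup that falls back to the key itself: an untranslated English literal still displays as-is, while a logical key resolves to its entry. The sketch below assumes a simplified resource format of one `"key" = "text"` pair per line, loosely in the spirit of FGL string files; the fallback is this sketch's design choice, not a statement of how the FGL runtime behaves.

```python
import re

# ASSUMPTION: simplified resource format, one  "key" = "text"  per line.
# Lines that don't match (e.g. comments) are skipped; escape sequences
# are kept verbatim rather than decoded.
_ENTRY = re.compile(r'^\s*"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*$')

def load_strings(text):
    """Parse resource text into a {key: translation} dictionary."""
    table = {}
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if m:
            table[m.group(1)] = m.group(2)
    return table

def lookup(table, key):
    """Return the translation, or the key itself if none exists."""
    return table.get(key, key)
```

With this fallback, an English-to-English conversion only needs entries for the strings that actually change (for instance mapping "Please enter the cost of widgets" to a local wording), while logical keys such as "customer_name.comment" can be introduced gradually.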
While we're discussing localization, what thought has been given to allowing parameterisation of the LSTR function? I consider parameter reordering important, because I hear that this example:
Found 10 documents out of 15
is properly translated as
Out of 15 documents, there are 10 found
in some language (Italian? Mandarin? I must re-read my sources).
And of course many, many more parameterised strings will no doubt have ordering issues.
The C# convention of using {0}, {1}, {2}, etc. is probably smarter than the old C style of %d and %x, since it allows the parameters to be reordered. However, the syntax used to interpolate parameters is of course open to suggestions.
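To make the reordering argument concrete, here is a small sketch of numbered-placeholder interpolation applied to the example above. `lstr_format` is a hypothetical helper, not an existing FGL function; it substitutes `{n}` with the n-th argument, so a translated template can place the parameters in whatever order the target language needs while the calling code stays unchanged.

```python
import re

_PLACEHOLDER = re.compile(r"\{(\d+)\}")

def lstr_format(template, *args):
    """Replace {0}, {1}, ... in a localized template with args[0], args[1], ...

    Hypothetical helper for illustration; not part of any FGL API.
    Literal braces are not escapable in this sketch.
    """
    def replace(match):
        index = int(match.group(1))
        if index >= len(args):
            raise IndexError(
                f"template refers to {{{index}}} but only {len(args)} argument(s) given")
        return str(args[index])
    return _PLACEHOLDER.sub(replace, template)

# The same call site works for both word orders:
print(lstr_format("Found {0} documents out of {1}", 10, 15))
print(lstr_format("Out of {1} documents, there are {0} found", 10, 15))
```

The first call yields "Found 10 documents out of 15" and the second "Out of 15 documents, there are 10 found". For comparison, POSIX printf does offer positional conversions such as %1$d, but that is an extension to ISO C rather than part of the classic %d style.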