Author Topic: Wednesday - Breakout - How To Write Portable SQL Code - Sebastien Flaesch
Reuben B.
Four Js
Posts: 834


« on: November 18, 2020, 09:26:16 am »

Quote
How does Genero run the same SQL against many different database types? What is the ODI layer? What is portable SQL, and how do you write it? Join Sebastien, our lead developer for our database drivers, for a technical deep-dive that will increase your understanding and make you a more effective developer when dealing with different databases.

This thread will be used to collate questions and answers from the live chat of Seb's presentation and any questions you may have after watching a recording.

Product Consultant (Asia Pacific)
Developer Relations Manager (Worldwide)
Author of https://4js.com/ask-reuben
Contributor to https://github.com/FourjsGenero
HJ W.
Posts: 2


« Reply #1 on: November 19, 2020, 02:28:29 am »

Hello Sebastien and all,

Some databases provide a "connection pool" to reduce the time spent establishing database connections.
At the moment, we always stay connected to the database from the beginning of a session until it finishes. Are there any plans to support connection pooling, even with connections shared across different sessions?

Regards,
alex
Sebastien F.
Four Js
Posts: 465


« Reply #2 on: November 19, 2020, 11:08:57 am »

Hello Alex and thanks for your question about connection pools.

First we need to clarify what a connection pool is:

"A connection pool is a cache of database connections maintained so that the connections can be reused when future requests to the database are required."

https://en.wikipedia.org/wiki/Connection_pool


The problem with connection pools is that the Genero application code needs to be changed:

The application must add instructions to indicate when it actually needs to execute SQL statements.

Whatever solution you have in mind, the middle-tier connection pool manager needs to know if a client application needs a connection or if that connection can be reused by another client application.

So basically, you would have to do something like this in your code:

Code

MAIN

   -- Register for an SQL connection (warn the connection pool manager that we will do SQL)
   REGISTER FOR CONNECTION TO dbsrc AS "c1" USER uname USING upswd
   ...

   -- Need to do some SQL...

   GET CONNECTION "c1" -- May need some time to get a real SQL session from the pool

   DECLARE c1 CURSOR FOR SELECT ...
   FOREACH c1 ...
   END FOREACH

   PREPARE s1 FROM "..."

   BEGIN WORK
   UPDATE ...
   INSERT ...
   COMMIT WORK

   FREE c1    -- Mandatory!!
   FREE s1    -- Mandatory!!

   RELEASE CONNECTION "c1" -- The SQL connection is not needed for now, it can be reused by another application

   ...

   -- Need to do some SQL...
   GET CONNECTION "c1"
   ...
   RELEASE CONNECTION "c1"

END MAIN

Imagine all the changes needed in your legacy code...

This cannot be auto-magically done in the ODI drivers!


The pros of connection pools:

- faster connection time
- fewer resources used on the DB server and client side

To me, both should be addressed by the DB vendors!
Look at Informix or PostgreSQL to see how fast and memory-efficient a client connection can be.
Yes, the Oracle client is a MONSTER in terms of memory usage.


The cons of connection pools:

- Any SQL connection is established for, and associated with, a DB user.
Sharing an SQL connection among several applications / users is impossible, unless you use the same DB user for all applications, or use some DB-specific SQL feature to switch to another user (like Informix trusted connections).


- An Informix 4GL or Genero BDL application by its very nature needs a dedicated SQL connection.
That SQL connection is not stateless:
You can change SQL session parameters, for example with Oracle's
  ALTER SESSION SET NLS_DATE_FORMAT = ...
With a connection pool, those session parameters would be shared among the different applications using the pool.


- SQL cursors belong to an SQL connection.
Before sharing/passing an SQL connection, ALL cursors or prepared statements must be closed/freed.


- Each DB vendor provides its own connection pool solution, and some don't support it at all.


Note that when you implement web services, you get connection pooling implicitly:
Each WS server process connects to the database and executes SQL statements on behalf of the WS client applications.
So the WS server processes in a GAS infrastructure act as an SQL connection pool.


My conclusion:

Keep dedicated SQL connections in your Genero applications, and add more RAM and CPU to the application server and the DB server.
This will be cheaper than setting up a connection pool!

Another option: ask Oracle to reduce the memory footprint of its DB client library and to reduce the connection time!

Seb
Sebastien F.
Four Js
Posts: 465


« Reply #3 on: November 19, 2020, 11:24:18 am »

Hello!

During my session, I had no time left to talk about new SQL features and changes in Genero V4.
So here is a summary...

Detailed information will of course be available in the Genero BDL documentation upgrade notes.
But if you want more information before the EAP, I can provide this here or during a 1:1 session.

SQL features in Genero V4:

- Support for SQL Server 2019 char/varchar columns using UTF-8 collation
- Better data type mapping for INTERVAL x TO y with Oracle DB
- Data type name mapping of CHAR/VARCHAR to NCHAR/NVARCHAR for IBM DB2 LUW, IBM Netezza, SAP HANA
- PostgreSQL 13
- IBM Netezza Performance Server

Several DB client/server versions that have reached end of life (according to the DB vendors) will be desupported in Genero BDL V4.

Seb
Alessandro (Efisio) R.
Posts: 42


« Reply #4 on: November 19, 2020, 12:04:57 pm »

Hi Sebastien,

is it possible to make the LOAD instruction (for PostgreSQL) behave like it does with Informix SE without transactions? Maybe with an fglprofile entry?

Thanks
Sebastien F.
Four Js
Posts: 465


« Reply #5 on: November 19, 2020, 03:26:32 pm »

Hello Efisio,

About making LOAD with PostgreSQL behave like an Informix DB without transactions:

What behavior do you expect exactly?

Under the covers, the LOAD statement uses an insert cursor, and LOAD will try to start a transaction, to commit or roll back all rows as a whole.

When the database does not support TXs (like Informix SE in your case), no TX block is used, obviously.
The LOAD will insert rows until the first error occurs, and then it stops.
Already inserted rows remain in the table (since no TX is used).
Is this the behavior you are looking for?

First I would suggest that this kind of problem should NOT occur: you had better make sure that the data file does not contain invalid data.

The fact that the LOAD works in an atomic way (when the DB supports TXs) is good...
To continue the load in your case, you would have to deal with already inserted rows.
What are you doing with that?
Delete all rows, fix the data file, then try to reload?
If the table contains rows before the LOAD, how do you identify which rows are the new ones?

As I explained yesterday, PostgreSQL has TXs and in case of error, the whole TX is canceled.
This is ok with the current implementation of LOAD.

If I understand correctly, you are asking for an FGLPROFILE entry to prevent LOAD from using a TX internally?

It should be possible, but is this really a good idea???

BTW note that if you explicitly start your own TX before LOAD, the LOAD will not use its own transaction:

BEGIN WORK
LOAD ...
COMMIT WORK

But that would not help in your case (it is still a TX for PostgreSQL, which would cancel the whole TX)

I see no general solution if you have a lot of LOAD instructions in this situation.

If this is an isolated case, maybe you can give a base.Channel object a try, to load the data yourself and insert the rows one by one.
You then have control over whether to use a BEGIN WORK / COMMIT WORK or not...
Another advantage of using base.Channel is that you control the data types that will be used to insert the rows.
Remember that LOAD has to "guess" what the column data types are, just from an INSERT INTO tabname statement...
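Just as an illustration (the table, column and file names are only examples, not from your application), such a base.Channel load could look like this; with no BEGIN WORK, each INSERT stands on its own, similar to the Informix SE behavior:

Code

FUNCTION load_without_tx(p_fname)
    DEFINE p_fname STRING
    DEFINE ch base.Channel
    DEFINE rec RECORD
               cust_id   INTEGER,
               cust_name CHAR(40),
               cust_addr CHAR(60)
           END RECORD

    LET ch = base.Channel.create()
    CALL ch.setDelimiter("|")    -- same field separator as LOAD/UNLOAD
    CALL ch.openFile(p_fname, "r")
    WHILE ch.read([rec.cust_id, rec.cust_name, rec.cust_addr])
        -- No BEGIN WORK: each INSERT is committed on its own, so rows
        -- already inserted stay in the table if a later row fails.
        INSERT INTO customer (cust_id, cust_name, cust_addr)
            VALUES (rec.cust_id, rec.cust_name, rec.cust_addr)
    END WHILE
    CALL ch.close()
END FUNCTION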

Another question: do your Informix tables have serial columns? That is another story...

It sounds like you plan to move from Informix SE to PostgreSQL ...
Is this a project, or do you already have your application running with PostgreSQL in production?
If you want, we can have a 1:1 talk with the WWDC tool.
Or if you prefer, contact me by email (sf@4js.com), so we can exchange more about your migration project.

Seb
Alessandro (Efisio) R.
Posts: 42


« Reply #6 on: November 19, 2020, 04:40:13 pm »

Thanks Sebastien for the long and extra detailed answer.
I also appreciate your availability for a talk and email contact; I will try to answer here first.

We already have an application in production with Informix and PostgreSQL compatibility.
Of course we have tables with serial columns, and after some struggle we managed to get the same behaviour with serials in all database functionality, including temp tables.
Is there any problem with serials and LOAD?
If I remember correctly, when using the PostgreSQL SERIAL data type, which in reality is an INTEGER with a DEFAULT clause pointing to the nextval of a dedicated sequence, an issue to fix could be that the value of the sequence is not updated at the end of the LOAD according to the max value of the SERIAL column. Am I right?

However, our issue with LOAD is not related to SERIALs, but to cursors not defined with the WITH HOLD clause being closed after the LOAD.

All the things you have pointed out in the answer are extremely useful and correct from the point of view of writing new code.

I do know that transactions are almost necessary in a lot of processing, but since we used to have this application running on Informix SE without transactions, for now it would be nice to have the same behaviour.
What I mean is that we are not worried about the failure of our LOAD instructions (usually they don't fail); but when a LOAD is executed in PostgreSQL, a transaction ends (whether the LOAD succeeds or fails), so cursors that were opened before are closed, and this is where we got errors.

So I was asking whether it would be easy, and would not create problems for other "odi" behaviours, to implement a kind of flag for the LOAD transaction; obviously only if you consider it useful for the first "phase" of a migration.

Anyway, our LOAD instructions do not touch master tables, with the possible issues you mentioned that could come from a partial load; but in general, LOAD instructions outside of a transaction have to be fixed sooner or later (by us).

Thanks in advance
Sebastien F.
Four Js
Posts: 465


« Reply #7 on: November 19, 2020, 05:19:12 pm »

Efisio,

Yes, you are right: a LOAD into a table with a serial column is like doing INSERTs specifying the serial values explicitly.
As a result, since PostgreSQL does not automatically reset the underlying sequence, the next INSERT that does not specify the serial column will produce a new serial number that is lower than the values inserted by the LOAD.
There are some tricks to fix this, see for example:
https://stackoverflow.com/questions/244243/how-to-reset-postgres-primary-key-sequence-when-it-falls-out-of-sync/244265
So it's not only related to LOAD, it also happens with simple INSERTs...
I am investigating whether we can do something for this.
BTW, when using one of our other serial emulations (trigseq), the issue does not exist (but you should not use that emulation now, since you already use the native PostgreSQL serial type).
Anyway, as I said yesterday, the best portable alternative is to use your own SEQUENCEs.
(you would also have to reset your sequences manually after the LOAD, but at least you have total control over this)
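For example (just a sketch, with example table/column names), a single statement based on the PostgreSQL pg_get_serial_sequence() and setval() functions can re-synchronize the sequence after such explicit inserts:

Code

-- Sketch only: re-synchronize the sequence behind a serial column
-- after explicit serial values have been inserted (names are examples).
DEFINE l_sql STRING
DEFINE l_dummy BIGINT
LET l_sql = "SELECT setval(pg_get_serial_sequence('customer','cust_id'),",
            " COALESCE((SELECT MAX(cust_id) FROM customer), 1))"
PREPARE st_syncseq FROM l_sql
EXECUTE st_syncseq INTO l_dummy
FREE st_syncseq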

Regarding LOAD, transactions and opened cursors, I understand better now.
Yes, after a LOAD using a transaction, open cursors not declared WITH HOLD are closed automatically.
In fact Genero behaves like Informix c4gl, as long as the database has transactions (I have tested this, see the attachment load.4gl).
When the DB has no transaction support, cursors remain open...
But with ODI + PostgreSQL we obviously consider that the DB has transactions; that's why you see this behavior.
No matter whether the LOAD fails or succeeds: when a transaction terminates, open cursors are closed.
I will also investigate this, without promising anything.
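In the meantime, a possible workaround is to declare the cursors that must survive a LOAD with the WITH HOLD clause. A minimal sketch (table and file names are examples):

Code

DEFINE l_id INTEGER
DECLARE c_cust CURSOR WITH HOLD FOR SELECT cust_id FROM customer
OPEN c_cust
FETCH c_cust INTO l_id                       -- cursor is now open
LOAD FROM "orders.unl" INSERT INTO orders    -- the implicit TX commits here
FETCH c_cust INTO l_id                       -- still valid: WITH HOLD survives the COMMIT
CLOSE c_cust
FREE c_cust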

Seb

* load.4gl (0.39 KB - downloaded 119 times.)
Alessandro (Efisio) R.
Posts: 42


« Reply #8 on: November 19, 2020, 06:26:55 pm »

Here is a useful function I wrote to set the sequence underlying a PostgreSQL SERIAL column to the right value. I used the PostgreSQL naming convention to derive the sequence name, but I could also query the system catalog or information_schema to be sure which sequence is linked to the column.

Code

FUNCTION adjustSequence( p_column_name, p_table_name )
    DEFINE p_column_name STRING
    DEFINE p_table_name STRING
    DEFINE l_sequence_name STRING
    DEFINE l_query STRING
    DEFINE l_max BIGINT

    -- PostgreSQL default naming convention for serial sequences: <table>_<column>_seq
    LET l_sequence_name = SFMT( "%1_%2_seq", p_table_name CLIPPED, p_column_name CLIPPED )
    LET l_query = SFMT( "SELECT MAX( %1 ) FROM %2", p_column_name CLIPPED, p_table_name CLIPPED )
    PREPARE p_max_grp_seq FROM l_query
    DECLARE c_max_seq CURSOR FOR p_max_grp_seq
    LET l_max = 0
    FOREACH c_max_seq INTO l_max
        EXIT FOREACH
    END FOREACH
    -- Reset or advance the sequence according to the current max column value
    IF l_max IS NULL OR
       l_max == 0 THEN
        LET l_query = SFMT( "ALTER SEQUENCE %1 RESTART;", l_sequence_name )
        #DISPLAY l_query
        EXECUTE IMMEDIATE l_query
    ELSE
        IF l_max == 1 THEN
            LET l_query = SFMT( "ALTER SEQUENCE %1 RESTART;", l_sequence_name )
            #DISPLAY l_query
            EXECUTE IMMEDIATE l_query
            LET l_query = SFMT( "SELECT nextval('%1');", l_sequence_name )
            #DISPLAY l_query
            EXECUTE IMMEDIATE l_query
        ELSE
            LET l_query = SFMT( "SELECT setval('%1', %2, true);", l_sequence_name, l_max )
            #DISPLAY l_query
            EXECUTE IMMEDIATE l_query
        END IF
    END IF
END FUNCTION


Another thing that made me struggle a lot is temp tables created with the structure of existing tables that have SERIAL columns: the resulting column gets the INTEGER data type and all the SERIAL magic is gone. Obviously it has been fixed on our side, but it's important to know, and I didn't find anything related in the documentation.
I suppose the database adaptation guides were written by you, and I must thank you for all the useful information and hints.

Regarding the LOAD instruction without a TX being an enemy of non-WITH HOLD cursors: my idea is only for a first migration step and backward compatibility, because nowadays it's really inconvenient not to use transactions.

Thanks a lot!
Sebastien F.
Four Js
Posts: 465


« Reply #9 on: November 19, 2020, 06:41:41 pm »

Thanks for sharing that code Efisio!

About temp tables + serials, we have a warning in the doc... Can you check if it's what you refer to?
http://4js.com/online_documentation/fjs-fgl-manual-html/#fgl-topics/c_fgl_odiagpgs_023.html
Quote
With Informix SQL, if the source table has a column defined as SERIAL or BIGSERIAL, a SELECT ... INTO TEMP will produce a new temp table with an auto-incremented serial column. With the SELECT INTO TEMP emulation for non-Informix databases, not using the native sequence generators (such as IDENTITY columns in SQL Server), the resulting temporary table will get a simple INTEGER or BIGINT column, instead of an auto-incremented column.
To find this page, enter "postgresql temporary table" in the top right search field of the online doc, and check the results, it's the second item in the results...

I am sorry for all the effort you had to put in to make serials work with PostgreSQL.
Maybe it would have been easier to use self-managed SEQUENCEs directly.
But of course in a migration process it's not easy.
Implementing 100% compatible Informix emulations on top of what other DBs provide is not much easier!

Seb
Alessandro (Efisio) R.
Posts: 42


« Reply #10 on: November 20, 2020, 09:16:28 am »

Now I see the warning in the latest 3.20 documentation, but I started migrating when we were using 3.10, or even 3.00. :)

PostgreSQL, as far as I know, is the DBMS that best fits as a replacement for Informix in terms of compatibility/emulation of specific features in this kind of migration, at least the versions where the OIDs are still special columns (so v. 12 and 13 excluded).

Thanks Seb
Alessandro (Efisio) R.
Posts: 42


« Reply #11 on: November 20, 2020, 12:08:00 pm »

Sorry for the double post, but I found 2 questions that I noted down and didn't ask during your presentation.

Starting from page 11 of your presentation PDF you go deeper into the concept of DB users. What are your strategies and advice, especially in a PostgreSQL environment? Do you recommend a DB user per app user? I think one disadvantage of this is probably that DBAs or devs have to add a DB user every time a new app user is created by the app administrators. Or have you managed to do this automatically?

Another thing I want to point out, which I've not seen in the adaptation guide (for PostgreSQL at least) and which I came across during migration, is using a BDL variable type different from the DB column type in the WHERE clause of a static SQL query to filter results. I think an example will clarify.

Code

SELECT * FROM my_table WHERE my_table.my_column = my_bdl_variable

where my_column is a DECIMAL(11,0) column, which in PostgreSQL maps to NUMERIC(11,0),
and my_bdl_variable is a CHAR(11) BDL variable.

This can happen with many different type combinations, also with the PostgreSQL OID type.
Of course this issue can be fixed by changing the BDL type of the variable, but in PostgreSQL you can also add a cast at database level, since it's not easy to check type mismatches at project level (if I remember correctly from Rene's presentation, v4 will have a feature to prefix a variable name with a dollar sign in static SQL queries).
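To illustrate the database-level cast I mean (just a sketch, and only if an implicit text-based conversion is acceptable), in PostgreSQL an administrator can create something like:

Code

-- Sketch only (PostgreSQL side, run once by a privileged user): add an
-- implicit cast so that comparing a character value to a NUMERIC column
-- no longer fails with a type mismatch.
CREATE CAST (varchar AS numeric) WITH INOUT AS IMPLICIT;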
Sebastien F.
Four Js
Posts: 465


« Reply #12 on: November 20, 2020, 12:30:52 pm »

Efisio,

1) DB Users:
Yes, if you want to benefit from DB features like auditing, permissions, etc, you need to map 1 real physical app user to 1 DB user.
This is a design choice for your application.
And yes, if you follow this recommendation, you have to create a DB user for each real user (so you need some admin application).
And you must figure out how to provide the user/password when connecting to the DB from programs.
There are different solutions to authenticate users at DB connection time, depending on the DB brand.
Even if you have a single DB user to connect to PostgreSQL, you must find a solution to pass the username and password in
  CONNECT TO "dbname" USER user-name USING pswd-string
Hopefully you don't use FGLPROFILE entries for user/pswd: these are provided for development only... (the password is in clear text)
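For example (only a sketch, with made-up names), the program could ask for the credentials at startup instead of reading them from FGLPROFILE:

Code

MAIN
    DEFINE uname, upswd VARCHAR(50)
    -- Ask for the credentials instead of storing them in clear text;
    -- a real application would of course hide the password input.
    PROMPT "Database user: " FOR uname
    PROMPT "Password: " FOR upswd
    CONNECT TO "mydb" USER uname USING upswd
    DISPLAY "Connected as ", uname
    DISCONNECT CURRENT
END MAIN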

2) Using program variables as SQL parameters
You should ALWAYS use a BDL program variable defined with a type that matches the SQL column.
This is especially important for DATE (where the string version depends on DBDATE).
A BDL DECIMAL(P,S) for a PostgreSQL DECIMAL/NUMERIC(P,S) is fine.
You could also use a BDL INTEGER for a PostgreSQL DEC/NUM; that is an acceptable combination as long as the DB server supports the type conversion.
It becomes more tricky when using a character string type for a numeric or date/time type in the DB.
Then serialization/formatting rules come into play, and your code becomes non-portable across databases.
To implement regular app code using SQL (as in your example), I see no issue with using a BDL type corresponding to the column type.
If you have to write generic code, use base.SqlHandle and set the SQL parameter value with a program variable of the matching type.
You should not add a cast in the SQL statement (this is not portable).
Just use the correct BDL type; with a DEFINE ... LIKE matching the .sch schema it's even better.
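A minimal sketch of that last recommendation, reusing the names of your example (my_table / my_column are placeholders):

Code

SCHEMA mydb   -- the .sch schema extracted from the database

FUNCTION find_row(p_code)
    DEFINE p_code LIKE my_table.my_column   -- picks up DECIMAL(11,0) from the schema
    DEFINE rec RECORD LIKE my_table.*
    -- The host variable has exactly the column type, so the comparison
    -- needs no cast and stays portable across databases.
    SELECT * INTO rec.* FROM my_table WHERE my_column = p_code
END FUNCTION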

Seb
Sebastien F.
Four Js
Posts: 465


« Reply #13 on: November 20, 2020, 12:43:33 pm »

Efisio,

Just to clarify: I am not saying that you are forced to use 1 DB user for each physical user.
All I want to say is that the DB server needs to identify the physical user, if you want to use DB engine features like auditing/privileges.
And of course that brings more complexity in user management and authentication.

But if not needed, it's ok to have "abstract" DB users.
And you should have several DB user types:
1) A DB application administrator (with create table privileges etc)
2) A DB user with SELECT/UPDATE/INSERT/DELETE privileges (for regular users)
3) A DB user with minimal privileges (SELECT only, for guest users?)

Seb
Alessandro (Efisio) R.
Posts: 42


« Reply #14 on: November 20, 2020, 01:06:01 pm »

1) DB users:
Ehm, in reality we are using fglprofile entries even in production, or the .pgpass file, which anyway is a simple text file with the password in clear text... What could be a solution?
Thanks for the explanation and the suggestion of the 3 main roles.

2) Type mismatches
I said that because of the migration from ifx to pgs.
Informix is not as strict as PostgreSQL, so it auto-casts many more types than other DBMSes do.
Sure, the variable types have to be the same as the columns', preferably with the LIKE clause; but since Informix used to accept almost everything, I was talking about old code compatibility and emulation of this "feature", also considering the casts at database level.
It is really not easy to find this kind of type mismatch unless you read all the code of your project; usually it's a runtime error that brings the issue to your attention.
I think this can help in the first step of a migration, but not with writing portable code...