Author Topic: Poor performance with large jobs  (Read 14593 times)
Richard T.
Posts: 2


« on: October 02, 2014, 09:23:18 pm »

Hello, we are seeing an issue running our 4GL code against a SQL Server database whenever there is a large set of records to process.  If we split the same job into 4 separate processes, the results are hugely improved.   

For example if we process 100,000 records in one large process, it will take 65 minutes to complete.  If we take the same program (and same data) but run 4 separate processes of 25,000 (simultaneously), all 4 processes complete in less than 9 minutes. 

From monitoring the system during each test, the only difference we can see is that the 4 individual tasks seem to use more of the cpu power available to the server.   

We are trying to determine why 1 large job is so much slower than 4 smaller jobs.  Is there something we can do to improve the large job?     

thx for the info
Reuben B.
Four Js
Posts: 1033


« Reply #1 on: October 02, 2014, 11:13:36 pm »

Are you aware of the profiler (https://4js.com/online_documentation/fjs-fgl-manual-html/#c_fgl_profiler_001.html)? It allows you to see which functions a running program spends its time in.

Have you checked the database query plans?

What architecture do you have? Are the application and the database on the same server or on separate servers? How many CPUs does each server have? etc.

Product Consultant (Asia Pacific)
Developer Relations Manager (Worldwide)
Author of https://4js.com/ask-reuben
Contributor to https://github.com/FourjsGenero
Sebastien F.
Four Js
Posts: 505


« Reply #2 on: October 03, 2014, 11:14:33 am »

Hello Richard,

This is an interesting case...

I guess that the db server is running on a multi-processor machine and can thus handle several client program queries in parallel. And if the client programs also run on a multi-processor machine, the processor resources will be used in parallel for the client programs as well.

When running the 4 processes, there is certainly a bit of overhead on the db server for concurrent access to the same table, but that is exactly what a db server is designed for, so SQL Server probably deals very well with this.

However, we are surprised that it takes ONLY 9 minutes to complete with the 4 processes: logically it should take about 16 minutes (65 minutes / 4).

As Reuben suggested, to understand what happens under the cover, we need ALL the details about your configuration and the conditions of your tests. I suppose, for example, that you ran the tests on an isolated dev machine and not in a production environment with other concurrent users accessing the same db server, that you ran the tests several times, etc.

I guess you are using the db driver based on SQL Native Client. Correct?
Understand that this database driver is based on ODBC. So a Genero BDL fglrun process connecting to SQL Server is just like any other ODBC client program connecting to SQL Server. This information can be important for SQL Server administration.

I suggest that you have a look at the SQL programming tips here:
https://4js.com/online_documentation/fjs-fgl-manual-html/#c_fgl_sql_programming_049.html
Do you use a transaction block (BEGIN WORK / COMMIT WORK) to insert the rows?
If you do not, each INSERT produces a commit, which generates additional processing on the server side.
I don't think this has a big impact with SQL Server, because it natively runs in "auto-commit" mode; it's rather an issue with db servers like Oracle. Still, it's something to try if you don't use transaction blocks yet.
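For example, a minimal sketch of batched commits (the table name, loop bounds and batch size are illustrative only, not taken from Richard's program):

```
# Commit in batches of 1000 rows instead of once per INSERT
define i, cnt integer
begin work
let cnt = 0
for i = 1 to 10000
    insert into t2 values (i, "some data")
    let cnt = cnt + 1
    if cnt = 1000 then   # flush the current batch and open a new tx
        commit work
        begin work
        let cnt = 0
    end if
end for
commit work
```

The batch size is a trade-off: larger batches mean fewer commits but longer-held locks.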

Once we have more details on your config we'll do similar tests in house.

Seb
Sebastien F.
Four Js
Posts: 505


« Reply #3 on: October 03, 2014, 04:21:37 pm »

It would also help to know what kind of SQL queries your program(s) execute...
Ideally, provide a sample program, similar to your real program, that reproduces the issue.
Seb
Richard T.
Posts: 2


« Reply #4 on: October 06, 2014, 09:15:29 pm »

Hi All, thanks for the quick replies!

I was not aware of the profiler feature, so I will definitely check it out, thanks! A few weeks ago I hand-wrote some debug statements within our program to show metrics about individual functions (although this profiler appears to do a much better job of reporting detailed information). From that debugging, I can see that each function seems to take longer when there is a large job, and no single function or statement stands out as the culprit. I will re-test using the profiler utility to see if it can shed light on something we may have overlooked. I will also review the SQL Programming Tips document to ensure we have considered each of the recommendations described there.

The database query plans seem consistent but I will revisit that output and compare the output from the large job versus the output from the smaller jobs.

On the server where we are seeing this issue, the application and the database are on the same machine. It runs Windows Server 2012 with a SQL Server 2012 database engine. The server has 32 GB of RAM and 4 CPU cores. We are using the latest native ODBC driver for SQL Server.

This server is not a production machine so I have been able to run these tests while other users are not accessing the system for anything.

The main cursor in the program selects a large number of database columns from two database tables. As the logic loops through these records, a number of other select statements are executed based on the main record currently being processed. When certain data conditions are met, the program executes several insert statements into other tables. Every time the program loops on the main cursor, it commits its transaction and then starts a new one.

So for example:

1.  main cursor:

select
            table1.field1, table1.field2, table1.field3, table1.field4, table1.field5, table1.field6, table1.field7,
            table1.field8, table1.field9, table1.field10, table1.field11, table1.field12, table2.fieldA, table2.fieldB,
            table2.fieldC, table2.fieldD, table2.fieldE, table2.fieldF, table2.fieldG, table2.fieldH, table2.fieldI,
            table2.fieldJ,  table2.fieldK, table2.fieldL, table2.fieldM, table2.fieldN, table2.fieldO, table2.fieldP,
            table2.fieldQ, table2.fieldR, table2.fieldS, table2.fieldT, table2.fieldU, table2.fieldV, table2.fieldW,
            table2.fieldX,  table2.fieldY, table2.fieldZ, table2.fieldAA, table2.fieldAB
            from
            table1, table2
            where table1.field1 = table2.fieldA
            and table1.field2 = table2.fieldB

(there is an index on both tables: table1(field1, field2) and table2(fieldA, fieldB))


2.  Then, within the loop (depending on the data), there could be 10 or more other queries to gather detail data from other smaller tables. All these queries appear to be using an index effectively.

3.  Then depending on data conditions, the program may then potentially insert records into 5-10 other tables.

Thanks again for the suggestions, we will review the documentation and experiment with the profiler to see if we can uncover the underlying issue.

Richard

Sebastien F.
Four Js
Posts: 505


« Reply #5 on: October 07, 2014, 11:59:46 am »

Hello Richard,
Thanks for the details, we will do some tests too, but I believe it is normal to get better performance when running several batch programs in parallel, since you have a 4-CPU machine...
Maybe you could try to run 2 programs instead of 4, to see if you get better performance: the OS might then dispatch as follows: 2 processors for the programs, and 2 processors for the SQL Server engine.
This configuration parameter can also matter:
dbi.database.dsname.snc.prefetch.rows = N
https://4js.com/online_documentation/fjs-fgl-manual-html/?path=fjs-fgl-manual#c_fgl_dbvendor_param_snc.html
Seb
Sebastien F.
Four Js
Posts: 505


« Reply #6 on: October 07, 2014, 04:34:47 pm »

Richard,

I did some tests on my Windows 7 64-bit, 4-CPU-core machine with SQL Server 2012 and Genero BDL 2.50.16.

I wrote a little test program, using a table with 10,000 rows, with an integer primary key and a varchar(200).
The program basically does a similar task to yours (it fetches large rows and inserts them into another table):

Code
  1. main
  2.    define tm datetime hour to fraction(3)
  3.    define x,i int, v varchar(200),
  4.           rec record
  5.               pk int,
  6.               v1 varchar(200),
  7.               v2 varchar(200),
  8.               v3 varchar(200),
  9.               v4 varchar(200),
  10.               v5 varchar(200),
  11.               v6 varchar(200),
  12.               v7 varchar(200),
  13.               v8 varchar(200),
  14.               v9 varchar(200),
  15.               v10 varchar(200)
  16.           end record
  17.  
  18.    connect to "msvtest1+driver='dbmsnc'" user "msvuser" using "fourjs"
  19.  
  20.    set lock mode to wait 5
  21.  
  22. {
  23. whenever error continue
  24.     drop table t1
  25.     drop table t2
  26. whenever error stop
  27.     create table t1 ( pk int primary key, name varchar(200), addr varchar(200) )
  28.  
  29.     for i=1 to 10000
  30.         let v = "aaaaaaaaaaaaaaaaaaaaaaaa"||i
  31.         insert into t1 values (i, v, v)
  32.     end for
  33.     create table t2 ( pk int, name varchar(200) )
  34. exit program
  35. }
  36.  
  37. declare c1 cursor for select pk, name, name, name, name, name, name, name, name, name, name from t1 order by pk
  38.    for x=1 to arg_val(1)
  39.      foreach c1 into rec.*
  40.        insert into t2 values (rec.pk, rec.v1)
  41.      end foreach
  42.    end for
  43.  
  44. end main

I watched the CPU usage in the Resource Monitor.

When I run the program alone (fglrun prog 4), fetching 4 x 10,000 rows and inserting 40,000 new rows in the second table, it takes about 23 secs.
The 4 CPUs are used equally.

When I run 4 individual programs (4 x fglrun prog 1), fetching 10,000 rows and inserting 10,000 new rows each, every single program takes about 8 secs.
I can of course execute the 4 programs in parallel, and the time to process all the 40,000 rows is then about 8 secs.
The 4 CPUs are used equally.

So, as expected, it takes more time to run a single program processing (sequentially) the same number of rows as 4 individual programs running in parallel.

If it's not too difficult to define the SQL filters to split the rows to process over several programs, maybe this is a good option for you.
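For example, a minimal partitioning sketch (reusing the t1/t2 tables and connection string from the test program above; the `%` modulo operator is SQL Server syntax, and the worker-number arguments are assumptions for illustration):

```
main
    define worker, nworkers int
    define rec record
               pk int,
               v1 varchar(200)
           end record

    # Started as: fglrun prog <worker> <nworkers>, with worker = 0 .. nworkers-1
    let worker = arg_val(1)
    let nworkers = arg_val(2)

    connect to "msvtest1+driver='dbmsnc'" user "msvuser" using "fourjs"

    # Each process fetches only its own slice of the rows
    prepare s1 from "select pk, name from t1 where pk % ? = ? order by pk"
    declare c1 cursor for s1
    foreach c1 using nworkers, worker into rec.*
        insert into t2 values (rec.pk, rec.v1)
    end foreach
end main
```

Modulo on the primary key spreads rows evenly; a range filter (pk between lo and hi) would work too if your keys are dense.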

But if you want to use a single program to process all rows, we may continue to investigate how to reduce the execution time...

I suggest that you set FGLSQLDEBUG and redirect the SQL debug log (stderr) to a file; in it you can see the execution time of SQL statements:

SQL: INSERT INTO t1 VALUES (?,?,?)
 | 4gl source      : 4procs.4gl line=31
 | esm.c:00328(3)  : Nat stmt1 = insert into t1 VALUES (?,?,?)
 | sqlcode         : 0
 | curr driver     : ident='dbmesm'
 | curr connection : ident='esm_msvtest1_ida_utf8+driver='dbmesm'' (dbspec=[esm_msvtest1_ida_utf8+driver='dbmesm'])
 | using: 3
 |  t:                       INTEGER f:01 v:'3245'
 |  t:                  VARCHAR(200) f:01 v:'aaaaaaaaaaaaaaaaaaaaaaaa3245'
 |  t:                  VARCHAR(200) f:01 v:'aaaaaaaaaaaaaaaaaaaaaaaa3245'
 | Execution time  :   0 00:00:00.00151

(last line)

Maybe some SQL statements take much more time than expected (for example, 2-3 seconds for an INSERT is not normal), because SQL Server takes time to perform a given statement, to write TX logs, or so.
Run both solutions (the single program and the 4 programs), and analyse the FGLSQLDEBUG output (grep for "Execution time").
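Something like this (the fglrun invocation is the one used with the test program above; the demo line simply reuses the log excerpt so the grep step can be shown on its own):

```shell
# Enable verbose SQL debug on stderr and capture it to a file:
#   FGLSQLDEBUG=3 fglrun prog 4 2> single.log
# Demo of the filtering step, using a timing line copied from the log above:
printf '%s\n' ' | Execution time  :   0 00:00:00.00151' > single.log
grep 'Execution time' single.log
```

Comparing the filtered timings of the single-program run against the 4-program runs should show whether individual statements really get slower.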

Seb
Laurent G.
Four Js
Posts: 109


« Reply #7 on: October 07, 2014, 04:45:27 pm »

Richard,

I'm not an expert in SQL Server configuration/tuning, but I can relay some information I gathered from one of our ISVs that did the exact same migration as you (Unix/Informix to Windows/SQL Server). Those suggestions are general and might not apply to your specific batch SQL, but I thought I would document them just in case, and for the community:

  • Look at having 2 GB minimum to 4 GB optimal of RAM for each CPU core for SQL Server => so you are more than covered by those guidelines with 4 cores / 32 GB.
  • The disk subsystem will also have a major impact on performance. The more spindles the better. Have 3 separate LUNs for the database server (Data, Logs, and TempDB). Significant performance improvements (up to 80%) have been seen putting the TempDB (and buffer pool cache in SQL Server 2014) on SSD drives (i.e. PCIe SSD cards).

HTH

Laurent