When do child processes of start menu die after the main app dies?

Candy M. · October 18, 2017, 05:18:57 PM

We have a situation where child processes started from the start menu of the main application launcher do not die when the main application dies.
You will see them in a ps -ef with a parent process of 1. In theory, should the child processes die after the main application dies? Or should all processes associated with GDC (ssh) die when GDC is terminated or a disconnection occurs?

The particular users are coming in on a VPN with probably automatic port forwarding enabled. They are working in one of these child processes and get stuck, so
they kill their VPN, GDC or maybe shut off their computer, we're not exactly sure. Well, the parent process of the main application dies, but the child process does not.
In the particular case we saw this morning, the child application was hung but still running, and the application launcher was gone. (These processes were days old.)
Also, it appears it was stuck in the child application at a channel.openPipe command trying to send an email. It was running a command-line email client, postiessl, on Linux, i.e. CALL ch_smtppost.openPipe(cmd_smtppost, "u"). The email client was probably hung communicating with SMTP because the user's email configuration was wrong.
So I conjecture that because it was hung running a Linux command, the child 42r did not die.

So what are the rules for child processes dying when the GDC disconnects from the backend? Is the only way to kill these to have a cron job that searches for them? The problem is that these child processes, no longer connected to a GDC, are each consuming a user license, and the customer calls us complaining that they get "User limit exceeded" and cannot log on.

Any insight on this is appreciated.
Candy

Reuben B. · October 18, 2017, 11:18:20 PM

Candy,

You are confusing two concepts. To understand, note the applications that are running on the client and server machines. With GDC you will see a gdc.exe running on your PC. On your application (I'm guessing Linux) server you will see "fglrun" processes.

For parent/child processes, as you have observed, if the child process has been launched from StartMenu or via RUN WITHOUT WAITING, the parent process id = 1, and if the parent fglrun process dies, the child fglrun process will continue.

The other concept is the fglrun process(es) on your server, your gdc.exe on the PC, and the question is effectively what happens if one of them dies. For the GDC dying case, you need to be aware of two things.

There is the FGLPROFILE setting gui.protocol.pingTimeout http://4js.com/online_documentation/fjs-fgl-manual-html/#c_fgl_feconn_ping_wait.html. The front-end process sends a ping every X seconds to the fglrun process. This FGLPROFILE setting says if I haven't received a ping in Y seconds then assume the front-end process has died and stop the fglrun process. So when the GDC dies, you should expect the fglrun process to remain running, consuming a license for a number of minutes depending upon the value of this timeout. Web Users are very familiar with this setting because it is a lot easier for a user to close a browser tab and historically they have bumped into this issue a lot.

The second thing you need to be aware of is OPTIONS ON CLOSE APPLICATION http://4js.com/online_documentation/fjs-fgl-manual-html/#c_fgl_programs_010.html. If the front-end dies in an orderly fashion, then it is possible to send a signal to the fglrun process and have it end in a tidy fashion without any UI. You may have seen something during the 3.10 EAP where I pointed out that this functionality now works with the Genero Browser Client.

My advice is to leave the timeout settings on their default values, to have OPTIONS ON CLOSE APPLICATION in your code to ensure a clean tidy-up where possible, and also to investigate a CPU-based license if you find that you hit User Limit Exceeded because you have fglrun processes consuming a license while they are waiting to time out. It is also good sysadmin practice to identify day-old fglrun processes and query whether they should still be alive.
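A minimal sketch of that last sysadmin check, written in Python and assuming a Linux server whose ps (procps-ng) supports the etimes column (elapsed seconds). The function names and the one-day threshold are illustrative only. It reports candidates and does not kill anything, since age alone does not show whether a process is still attached to a client.

```python
import os
import subprocess
from typing import List, NamedTuple

DAY_SECONDS = 24 * 60 * 60  # illustrative threshold: "day old"


class Proc(NamedTuple):
    pid: int
    ppid: int
    age_seconds: int
    user: str
    args: str


def parse_ps(output: str) -> List[Proc]:
    """Parse header-less output of: ps -eo pid=,ppid=,etimes=,user=,args="""
    procs = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        pid, ppid, etimes, user, args = fields
        procs.append(Proc(int(pid), int(ppid), int(etimes), user, args))
    return procs


def old_fglrun(procs: List[Proc], min_age: int = DAY_SECONDS) -> List[Proc]:
    """Keep processes whose executable is named fglrun and that are at least min_age old."""
    return [
        p for p in procs
        if os.path.basename(p.args.split()[0]) == "fglrun"
        and p.age_seconds >= min_age
    ]


def report() -> None:
    """Print old fglrun processes for a human to review (assumes procps-ng ps)."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,etimes=,user=,args="],
        capture_output=True, text=True, check=True,
    ).stdout
    for p in old_fglrun(parse_ps(out)):
        hours = p.age_seconds // 3600
        print(f"pid={p.pid} ppid={p.ppid} user={p.user} age={hours}h cmd={p.args}")
```

Calling report() from a daily cron job would produce a list to review. Because programs launched from StartMenu have a parent process id of 1 anyway, the parent id is shown for information only and is not used as a filter.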

I am curious about your program that was hung at openPipe. Did it have any UI? The pingTimeout would not come into play if there was no UI. If there was UI, is the openPipe blocking the timeout from occurring?

Reuben

Candy M. · October 19, 2017, 12:21:53 AM

Hello Reuben,
I apologize that I wasn't clearer in my explanation.
For applications (child processes) being launched from StartMenu, if the ping timeout is reached they should die, is that correct (I believe it is 10 minutes)?
In our case, it is a network failure so the OPTIONS ON CLOSE APPLICATION would not be applicable. We always leave the timeout settings to their default value.
A CPU license would probably cost too much for our customers, so it is not applicable either. Our goal is to kill these processes so that 1) they are not using a license and 2) they are not consuming any database resources.
In the case where the child process is launched from the StartMenu application, the ping timeout is obviously having no effect, as the processes are still there. There is a UI. This example is a Quote being emailed to a customer in the application, and it hangs on the openPipe command. Perhaps the openPipe is blocking the timeout from occurring and that is the problem. So the question is: is that a bug, or is that just what happens, and the process stays there even if there is a UI?
I'm curious as to what criteria you use as to whether a day old fglrun should still be alive? How can you determine that it is not still attached to a client? All production servers are Linux.

Thanks for the discussion and making my explanation clearer.

Candy

Reuben B. · October 19, 2017, 01:06:37 AM

Quote
I'm curious as to what criteria you use as to whether a day old fglrun should still be alive?

From my sysadmin days a number of years ago ...
if you have written the pid and the name of the person running the program into a table when the program starts, then it is simply a case of ringing up the person and asking "did you leave your computer on last night?"

Quote
Perhaps the openPipe is blocking the timeout from occurring and that is the problem.

And that's where you need to create a small example that illustrates it and send it to support.

Candy M. · October 19, 2017, 03:32:15 AM

Quote
From my sysadmin days a number of years ago ...
if you have written the pid and the name of the person running the program into a table when the program starts, then it is simply a case of ringing up the person and asking "did you leave your computer on last night?"

That's a little bit too human intensive. If we have 40+ production servers out there, someone would be on the phone all day, lol! I was hoping your answer would be a little more technical. We have actually had a cron job that would kill the fglruns, but it was killing some that it shouldn't have, so we turned it off for now.

Yes, I should send a small sample, it will be sort of a pain in this case since the smtp was hanging the process, so I'll have to think about how to accomplish it,
plus also support probably doesn't have postiessl on their server.

My observation is that if every fglrun with UI should die when the pingTimeout is reached, then there should not be any stray fglruns out there.
Candy

Candy M. · October 19, 2017, 04:57:26 AM

Quote
Yes, I should send a small sample, it will be sort of a pain in this case since the smtp was hanging the process, so I'll have to think about how to accomplish it,
plus also support probably doesn't have postiessl on their server.

No worries, John H. in support reproduced it and is submitting it as a bug. :-)
