Tue Oct 30, 2012 1:42 pm

While CmdProcess is being discussed in another topic, I have a few questions as well.

1: The Library Reference describes it as STD.System.Util.CmdProcess(), but what is in the system is lib_fileservices.FileServices.CmdProcess(). I assume they are the same, correct?

2: I tried an example from the reference:
OUTPUT(lib_fileservices.FileServices.CmdProcess('echo','George Jetson'));
and it just stays in the executing state without generating any output.

3: Anyway, where is the target command supposed to be executed: on each node in parallel, or in one place? If it is a single place, where is it? And the same question applies to PIPE.
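
For comparison with question 1, this is roughly how the same call looks through the open-source Standard Library. It is a minimal sketch that assumes a platform version where the Std library is available; it shows only the naming difference, not a fix for the hang described in question 2.

```ecl
IMPORT STD;

// Standard Library spelling of the call from question 2.
// First argument: the program to run; second argument: the text handed to it.
OUTPUT(STD.System.Util.CmdProcess('echo', 'George Jetson'));
```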

Re: CmdProcess and PIPE

Wed Oct 31, 2012 1:42 pm

1. Correct. They are indeed the same: the legacy version versus the open source version. Which one you see depends on the cluster version you are connected to.

2. Yes, CmdProcess was reported as not working in version 3.8, but it is fixed in the next update.

3. I think it depends on the command that you are executing. If it affects the cluster then it will be across all nodes as expected.

Re: CmdProcess and PIPE

Wed Nov 07, 2012 10:58 am

Is there the same problem with PIPE (the action and/or the OUTPUT option)?

“3. I think it depends on the command that you are executing. If it affects the cluster then it will be across all nodes as expected.”

Hmm. It’s hard to believe that ECL knows anything about the UNIX command line. My understanding is that it should just blindly execute a process and pass arguments to it.

Re: CmdProcess and PIPE

Wed Nov 07, 2012 12:44 pm

PIPE works, at least in 3.8.4 and 3.8.6. I’m using it.

(Clarification: I’m using the PIPE version of OUTPUT; what follows talks about that, rather than the PIPE built-in function.)

Unless you go to great lengths, the command would be executed on all active nodes. Note that that doesn’t mean all nodes, though. For instance, if you manipulated the data in such a way that all the records you’re processing wind up on one node, then only that node’s external command would be executed. In most cases, this is exactly what you want.
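
To illustrate the single-node case, here is a minimal sketch that assumes the behaviour described above (the external command runs only where records are present). The inline dataset and the constant DISTRIBUTE value are hypothetical, chosen for illustration only; whether nodes holding no records still start the command is not confirmed here.

```ecl
namesRecord := RECORD
  STRING10 Firstname;
  STRING10 Lastname;
END;

// Hypothetical inline data, for illustration only
names := DATASET([{'George', 'Jetson'}, {'Jane', 'Jetson'}], namesRecord);

// A constant distribution value places every record on the same node
oneNode := DISTRIBUTE(names, 0);

// Output pipe, in the same form as used elsewhere in this thread
OUTPUT(oneNode,, PIPE('tee /tmp/names.all'));
```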

Where this becomes a problem is when you want to execute commands based on the overall action instead of individual records (e.g. “disable one external service before processing a group of records, then re-enable it afterwards”). You would have every node disabling and re-enabling the service in that case, so care must be taken.

Re: CmdProcess and PIPE

Wed Nov 07, 2012 2:18 pm

Thanks, now it’s getting even more interesting:

First, I’ve tried the input pipe example from the book (BTW, it’s really the only example I can try, because the others use some non-standard commands for the pipe process):

Code:
//Form 2 with XML input:
namesRecord := RECORD
  STRING10 Firstname;
  STRING10 Lastname;
END;
p := PIPE('echo George Jetson ', namesRecord, XML);
OUTPUT(p);

It output only one record. We have 50 logical nodes on 5 physical machines, so I assume it picked one of the nodes at random to execute the command.
It is not clear, however, how this works if the command generates a number of records: how are they moved across the nodes?

But more interesting things happened after I decided to add an output pipe:

Code:
namesRecord := RECORD
  STRING10 Firstname;
  STRING10 Lastname;
END;
p := PIPE('echo George Jetson ', namesRecord, XML);
OUTPUT(p);
OUTPUT(p,,PIPE('tee /tmp/names.all'));

Now the simple output printed 50 identical rows!
So this means that the OUTPUT, which clearly should happen after the input (I’m using the result dataset in it), is affecting the way the input PIPE is executed!

Regarding the output PIPE, I’ve got 5 files with 1 line each, i.e. 5 records in total. The files were overwritten because I have 10 logical nodes per physical machine.

Re: CmdProcess and PIPE

Wed Nov 07, 2012 3:27 pm

This actually makes sense.

Your initial PIPE command is, in reality, creating the dataset with the echo command. It is executing on every node, so p winds up holding a single record on each of your 50 nodes. The first result you see is actually the result of your first OUTPUT(), where those 50 records are created. Your second OUTPUT, with the PIPE, writes the single record held on each node to /tmp/names.all on that node.

Edit: My earlier comments about using PIPE were explicitly about the PIPE option to OUTPUT. The PIPE built-in function is an input pipe, for getting data into the system. The PIPE() function runs on all nodes all the time, as far as I can tell.
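
One way to check where the records of p actually live is to tag each record with the number of the node that processes it and then count per node. This is a sketch only: it assumes the Std library is available, the names nodeRec, tagNode, tagged and perNode are made up for illustration, and the PIPE line is copied from the example earlier in the thread.

```ecl
IMPORT STD;

namesRecord := RECORD
  STRING10 Firstname;
  STRING10 Lastname;
END;

p := PIPE('echo George Jetson ', namesRecord, XML);

// Hypothetical layout: the original fields plus the node number
nodeRec := RECORD
  UNSIGNED4 nodeNum;
  STRING10 Firstname;
  STRING10 Lastname;
END;

// Tag every record with the node on which this transform runs
nodeRec tagNode(namesRecord L) := TRANSFORM
  SELF.nodeNum := STD.System.Thorlib.Node();
  SELF := L;
END;

tagged := PROJECT(p, tagNode(LEFT));

// One row per node, with the number of records found there
perNode := TABLE(tagged, {nodeNum, UNSIGNED4 cnt := COUNT(GROUP)}, nodeNum, FEW);

OUTPUT(SORT(perNode, nodeNum));
```

If the explanation above holds, each node that ran the echo command should appear with a count of 1.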

Re: CmdProcess and PIPE

Wed Nov 07, 2012 3:37 pm

Re: CmdProcess and PIPE

Wed Nov 07, 2012 3:45 pm

I think you’re right, because the following two pieces of code produce different results on Thor:

Code:
namesRecord := RECORD
  STRING10 Firstname;
  STRING10 Lastname;
END;
p := PIPE('echo George Jetson ', namesRecord, XML);

Code:
namesRecord := RECORD
  STRING10 Firstname;
  STRING10 Lastname;
END;
p := PIPE('echo George Jetson ', namesRecord, XML);

The second one, with the SEQUENTIAL, produces a recordset with the number of records equal to the number of nodes. If it’s not a bug then something should be clarified, I think.

Re: CmdProcess and PIPE

Wed Nov 07, 2012 8:06 pm

