
[Condor-users] Condor as a Unix Shell? (instead of BASH / SH / TCSH)?



Hi All,

I was wondering if there is a tool anywhere on the net that can provide a Condor shell.
I will give you an example of what I mean when I say a "Condor Shell":

Think about a Unix machine, a Linux-style machine. Say my shell knows that every binary or command the user runs (user mode) that is located under / (/etc, /bin, /sbin, /usr/bin etc...) runs on the local machine. Since programs like 'ls' do not consume a lot of CPU resources, they can keep running on the local machine.
Now say that I have installed all of my tools under the /mnt/Software/vendors/ directory, so I have:
    /mnt/Software/vendors/adobe
    /mnt/Software/vendors/autodesk 
    /mnt/Software/vendors/matlab
    /mnt/Software/vendors/starsim
    /mnt/Software/vendors/vera
    /mnt/Software/vendors/....
    /mnt/Software/vendors/....
    /mnt/Software/vendors/....
  
Now, the shell knows that each time I run one of the commands that is included in my $PATH and located under /mnt/Software/vendors/, it converts it to run as a Condor job.
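
A minimal sketch of that decision as a bash function (the names and matching rules here are illustrative, not an existing tool; condor_run submits the command as a job and waits for it to finish):

    # Run a command locally unless its binary resolves under the
    # vendors tree; in that case hand it to the pool via condor_run.
    run() {
        bin=$(command -v "$1") || { echo "not found: $1" >&2; return 127; }
        case "$bin" in
            /mnt/Software/vendors/*) condor_run "$*" ;;  # heavy tool -> pool
            *) "$@" ;;                                   # e.g. ls -> local
        esac
    }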

This is assuming that all the execute machines are identical in their configuration.
In my configuration I also use a local NAS, mounted by both the submit and execute machines (for /home, /mnt/Software, etc.).
All machines in the Condor pool are configured with the same UID_DOMAIN and the same FILESYSTEM_DOMAIN.

Now, submitting a job to the pool works great!
In fact, if you configure the job submit file with the right parameters, then once the job is submitted, all files, logs, directories, or any other data the job creates are saved in the directory from which the user submitted the job. The Condor log file, stdout, and stderr are also created there, and are streamed (using the stream options) until the job finishes, is killed, etc.
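
For example, a submit file along these lines (a sketch only; the executable and arguments are made up, and should_transfer_files = NO relies on the shared FILESYSTEM_DOMAIN described above):

    universe              = vanilla
    executable            = /mnt/Software/vendors/matlab/bin/matlab
    arguments             = -nodisplay -r myscript
    # initialdir: the directory the user submitted from
    initialdir            = /home/sassy/work
    getenv                = True
    should_transfer_files = NO
    output                = job.out
    error                 = job.err
    log                   = job.log
    stream_output         = True
    stream_error          = True
    queue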

When the user starts tailing the stdout file (with the -f option), he has the impression that the job is running locally, but in fact it is running in some slot somewhere in the Condor pool.

Now, assume my queue is quick and well managed, so no big timeouts happen and jobs start almost as soon as the user submits them.

The trick is that, instead of the user writing the submit file, the shell creates it for him, sends it to the queue, and tails the output until the job finishes, as sketched below.
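
In shell terms, once the submit file (like job.sub above) is generated, the wrapper only needs a few lines (a sketch; condor_wait blocks until every job recorded in the log has left the queue):

    condor_submit job.sub      # queue the generated submit file
    tail -f job.out &          # stream stdout as if it ran locally
    condor_wait job.log        # block until the job finishes
    kill %1                    # stop the tail once the job is done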

The shell can take different parameters to include or exclude specific paths or commands, something like INCLUDE_PATH, EXCLUDE_PATH, INCLUDE_COMMAND, EXCLUDE_COMMAND, and maybe other directives we discover are needed. But the basic idea is that the shell is aware of the command.
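
Those directives could be plain shell variables consulted by the wrapper; a sketch, with the names taken from the suggestion above and illustrative matching rules:

    INCLUDE_PATH="/mnt/Software/vendors"
    EXCLUDE_COMMAND="ls cd vi"

    # Decide whether a command should become a Condor job:
    # excluded commands never do; otherwise only binaries that
    # resolve under INCLUDE_PATH are submitted.
    should_submit() {
        bin=$(command -v "$1") || return 1
        case " $EXCLUDE_COMMAND " in
            *" $1 "*) return 1 ;;
        esac
        case "$bin" in
            "$INCLUDE_PATH"/*) return 0 ;;
        esac
        return 1
    }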

Of course this method is limited, since it can only be aware of the first command in the command chain. What I mean by the command chain is that as long as you run commands under the Condor shell, every command covered by the include path
will automatically be wrapped as a Condor job. But if you run a script, a Perl one for example, that uses the system call (system("...")), then the Condor shell cannot be aware of it, since it is outside its domain. From a Condor perspective this means the binary will run locally and not as a Condor job.

Since all of my users work on Windows machines, I have no justification to give any user a higher rank for his own machine. They all share 10 physical machines, 24 cores each, a total of 240 slots and 960GB of RAM, using the NX remote desktop protocol.

Now, please also be aware that all of the commands the users run, like matlabc and maya-render in my example, are well known, both to the users and to the administrator, as heavy consumers of CPU and time.

One last note: this can be done for specific commands using .aliases, but I was thinking of a more general solution (see the sketch below).
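
For comparison, the per-command version is just a handful of bash definitions (functions rather than aliases, since functions handle arguments more cleanly; the command names are the ones from my example above):

    # Per-command wrappers; the quoting passes the full command
    # line to condor_run as the single argument it expects.
    matlabc()     { condor_run "matlabc $*"; }
    maya-render() { condor_run "maya-render $*"; }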

What do you think?

Sassy