[Condor-users] condor - guru needed!!!! (testing app)
- Date: Mon, 31 Jul 2006 12:15:35 -0700
- From: "bruce" <bedouglas@xxxxxxxxxxxxx>
- Subject: [Condor-users] condor - guru needed!!!! (testing app)
further update!!
i've gotten my test up and running, and i can confirm that the test perl apps
are running on both the master and client nodes of my test Condor setup.
however, the "condor_q" output shows only two processes running at a time,
which i imagine equates to one process running on each machine (the master
and the client).
i'd like to have multiple instances running in parallel on both machines.
any ideas/pointers as to how to make this happen?
i should easily be able to have 10-20 of these test apps running in parallel
on each of my test machines...
thanks
-bruce
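One setting that may be relevant here, offered as a sketch and not a confirmed fix: by default the condor_startd advertises one virtual machine per detected CPU, and each virtual machine runs one job at a time, which would explain seeing one running job per single-CPU machine. Overriding the CPU count in condor_config should make the startd advertise more virtual machines. The value 10 is an assumption taken from the "10-20" figure above, and the startd probably needs a full restart (not just a reconfig) to pick the change up.

```
## Assumption: tell Condor this machine has 10 CPUs so that the
## startd advertises 10 virtual machines, each able to run one job.
## The number 10 is illustrative only.
NUM_CPUS = 10
```

After the restart, condor_status should list one entry per virtual machine on each host if the setting took effect.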
update...
hi. this continues my effort to get a two-node Condor setup up and running for testing.
i performed a:
condor_submit stest.sub
i then did:
condor_q
where i see the queued test perl scripts. however, every job stays idle (ST = I) with zero run time:
-- Submitter: laptop2.mesa.com : <192.168.1.33:56278> : laptop2.mesa.com
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 9.0     test    7/31 08:47  0+00:00:00 I  0   9.8  stest.pl
 9.1     test    7/31 08:47  0+00:00:00 I  0   9.8  stest.pl
 9.2     test    7/31 08:47  0+00:00:00 I  0   9.8  stest.pl
 9.3     test    7/31 08:47  0+00:00:00 I  0   9.8  stest.pl
 9.4     test    7/31 08:47  0+00:00:00 I  0   9.8  stest.pl
 9.5     test    7/31 08:47  0+00:00:00 I  0   9.8  stest.pl
.
.
.
which seems to imply that something's wrong in my config files...
i'm pretty sure that whatever is wrong is rather simple/subtle!
is there a Condor guru that I can talk to for a few minutes on this?
my basic needs are to:
1) allow any user to submit a job
2) allow each job to run as fast as possible on the network/machine
3) allow multiple jobs to run on a given machine at the same time
4) track which jobs/apps run on which machine
i want to get/submit a job/app and throw it on the network to run as fast as
possible, which means i want to run multiple apps on the same machine at the
same time... Condor should be great for this, if i could get my hands around
how to properly configure it!
thanks
-bruce
hi...
this is a further continuation of my testing with condor.
i've been able to get a sample app running on a 2-node system. i can do
'condor_submit' from both the master and child nodes, and i see both machines.
the condor_config file for both machines is pretty much the sample file,
with limited changes. using the sample, my test apps appear to have a
wait/delay of 5 mins. my goal is to be able to run as many apps as fast as i
possibly can on the machines in the network. i'd also like to be able to
see which machines the app(s) are actually running on.
i tried to enable the testing mode described in the 'condor_config' file, following this comment:
## Replace UWCS_* with TESTINGMODE_* if you wish to do testing mode.
i also used the following:
StartIdleTime = 2 * $(MINUTE)
ContinueIdleTime = $(MINUTE)
MaxSuspendTime = 1 * $(MINUTE)
MaxVacateTime = 1 * $(MINUTE)
in an attempt to try to run as fast as possible during the tests.
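For comparison, here is a minimal sketch of an "always run jobs" policy written out directly, on the assumption that this is roughly what the sample file's TESTINGMODE_* macros expand to. With a policy like this the idle-time timers above should not matter, because the startd never waits for the machine to become idle and never suspends or evicts a job. This is an illustration for a test pool only, not a recommended production policy.

```
## Assumed equivalent of the sample file's testing mode:
## start jobs immediately and never suspend, preempt, or kill them.
WANT_SUSPEND = False
WANT_VACATE  = False
START        = True
SUSPEND      = False
CONTINUE     = True
PREEMPT      = False
KILL         = False
```

Setting the expressions explicitly like this avoids any doubt about whether every UWCS_* reference in the file was replaced.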
my test doesn't run. instead, the StartLog indicates that i have some kind
of an error. a sample of the StartLog contents is listed below. as i
indicated, the test submit app i'm running ran successfully with the
initial condor_config file, prior to my changes.
any thoughts/suggestions/help would be appreciated!!
thanks
-bruce
sample StartLog contents...
7/30 23:51:40 match_info called
7/30 23:51:40 Received match <192.168.1.33:42714>#1154324088#25
7/30 23:51:40 State change: match notification protocol successful
7/30 23:51:40 Changing state: Unclaimed -> Matched
7/30 23:51:41 DaemonCore: PERMISSION DENIED to unknown user from host <192.168.1.55:33433> for command 442 (REQUEST_CLAIM)
7/30 23:51:41 DaemonCore: PERMISSION DENIED to unknown user from host <192.168.1.55:33062> for command 443 (RELEASE_CLAIM)
7/30 23:53:40 State change: match timed out
7/30 23:53:40 Changing state: Matched -> Owner
7/30 23:53:40 State change: IS_OWNER is false
7/30 23:53:40 Changing state: Owner -> Unclaimed
7/30 23:56:41 DaemonCore: Command received via UDP from host <192.168.1.55:33073>
7/30 23:56:41 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
7/30 23:56:41 match_info called
7/30 23:56:41 Received match <192.168.1.33:42714>#1154324088#27
7/30 23:56:41 State change: match notification protocol successful
7/30 23:56:41 Changing state: Unclaimed -> Matched
7/30 23:56:41 DaemonCore: PERMISSION DENIED to unknown user from host <192.168.1.55:33458> for command 442 (REQUEST_CLAIM)
7/30 23:56:41 DaemonCore: PERMISSION DENIED to unknown user from host <192.168.1.55:33073> for command 443 (RELEASE_CLAIM)
7/30 23:58:41 State change: match timed out
7/30 23:58:41 Changing state: Matched -> Owner
7/30 23:58:41 State change: IS_OWNER is false
7/30 23:58:41 Changing state: Owner -> Unclaimed
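The log above shows the startd accepting the match and then rejecting REQUEST_CLAIM from 192.168.1.55, after which the match times out. One possible cause, stated as an assumption and not a diagnosis, is that host-based authorization on this machine does not grant that address write-level access. A sketch of the relevant condor_config entries, using only the two addresses that appear in this thread:

```
## Assumption: both test machines need READ and WRITE access to
## each other's daemons. The addresses are the two seen in the
## condor_q and StartLog output; keep any entries already present.
HOSTALLOW_READ  = 192.168.1.33, 192.168.1.55
HOSTALLOW_WRITE = 192.168.1.33, 192.168.1.55
```

The settings would need to be present on both machines and followed by a condor_reconfig. If the PERMISSION DENIED lines persist afterwards, the cause is likely something else.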