Re: [HTCondor-users] Unable to run a standard universe job.
- Date: Tue, 18 Jun 2019 18:15:46 +0000
- From: Michael Murphy <Michael.Murphy@xxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Unable to run a standard universe job.
I got the standard universe job to run by dropping the submitter's
firewall. Vanilla jobs work just fine through our host firewalls thanks
to the shared port. Is there a way to constrain the ports required for
standard universe jobs for firewall traversal?
Thanks!
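One option that may be worth testing, sketched here as an assumption
rather than a confirmed fix: HTCondor's LOWPORT and HIGHPORT
configuration macros restrict the range of dynamically assigned ports,
so the host firewall only has to admit that range. The 9600-9700 range
below is an arbitrary placeholder, and whether it covers everything a
standard universe shadow needs has not been verified in this thread.

    # Local HTCondor configuration on the submit host (placement assumed)
    # Placeholder range; size it to the number of concurrently running jobs.
    LOWPORT = 9600
    HIGHPORT = 9700

After restarting the HTCondor daemons, the same range would have to be
opened in the submit host's firewall.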
--
Michael McInerny Murphy
IERUS Technologies, Inc.
2904 Westcorp Blvd., Suite 210
Huntsville, AL 35805
(O): (256) 319-2026 ext 107
-----Original Message-----
From: Collin Mehring <collin.mehring@xxxxxxxxxxxxxx>
Reply-To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Unable to run a standard universe job.
Date: Fri, 14 Jun 2019 10:23:38 -0700
Hi Michael,
From the analyze output it seems like that machine is rejecting your
job. I would either check the START expression on that machine directly
(1) or do a reverse analyze with condor_q (2) to find out why.
1: condor_config_val -name bane.hq.ierustech.com -v START
2: condor_q 183.0 --better-analyze -reverse -machine bane.hq.ierustech.com
Best,
Collin
On Fri, Jun 14, 2019 at 6:43 AM Michael Murphy <
Michael.Murphy@xxxxxxxxxxxxx> wrote:
> Greetings,
>
> I am trying to run a standard job in our condor pool. However, I
> cannot get a test job to execute. The matchmaker is not finding a
> match even though my requirement only specifies a hostname. I have
> never run a standard job in our pool before. I am not sure it's
> configured properly. Here's my submit script:
>
> universe = standard
> executable = ./Cicero_CC_12750
> should_transfer_files = YES
> Requirements = machine == "bane.hq.ierustech.com"
> when_to_transfer_output = ON_EXIT_OR_EVICT
> log = $(Cluster).log
>
> input = test_run.inp
> output = test_run.out
> error = test_run.err
> transfer_input_files = test_run.inp
> queue
>
> The executable is compiled FORTRAN code relinked with
> condor_compile.
>
> When I check the status and try to determine why it's not matched to
> the execute host I use 'condor_q -analyze -better <JOB ID>' with the
> following output:
>
> [michael.murphy@banzai Condor_checkpoint_test]$ condor_q -better
> -analyze 183.0
> -- Schedd: banzai.hq.ierustech.com : <192.168.6.67:9618?...
> The Requirements expression for job 183.000 is
>
> ( machine == "bane.hq.ierustech.com" ) && ( TARGET.Arch ==
> "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( ( CkptArch ==
> TARGET.Arch ) || ( CkptArch is undefined ) ) && ( ( CkptOpSys ==
> TARGET.OpSys ) ||
> ( CkptOpSys is undefined ) ) && ( TARGET.Disk >= RequestDisk )
> && ( TARGET.Memory >= RequestMemory )
>
> Job 183.000 defines the following attributes:
>
> DiskUsage = 3750
> ImageSize = 3500
> RequestDisk = DiskUsage
> RequestMemory = ifthenelse(MemoryUsage =!=
> undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
>
> The Requirements expression for job 183.000 reduces to these
> conditions:
>
> Slots
> Step Matched Condition
> ----- -------- ---------
> [0] 2 machine == "bane.hq.ierustech.com"
> [6] 560 CkptArch is undefined
> [10] 560 CkptOpSys is undefined
>
> No successful match recorded.
> Last failed match: Fri Jun 14 08:24:48 2019
>
> Reason for last match failure: no match found
>
> 183.000: Run analysis summary ignoring user priority. Of 560
> machines,
> 544 are rejected by your job's requirements
> 2 reject your job because of their own requirements
> 14 are exhausted partitionable slots
> 0 match and are already running your jobs
> 0 match but are serving other users
> 0 are available to run your job
>
> WARNING: Be advised:
> Job did not match any machines's constraints
> To see why, pick a machine that you think should match and add
> -reverse -machine <name>
> to your query.
>
> The submitting machine's name is "banzai.hq.ierustech.com" and the
> execution machine is called "bane.hq.ierustech.com".
>
> Have I forgotten to specify some macros to enable std universe jobs?
> Thanks for your time.
>
> --
> Michael McInerny Murphy
> IERUS Technologies, Inc.
> 2904 Westcorp Blvd., Suite 210
> Huntsville, AL 35805
> (O): (256) 319-2026 ext 107
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/