
Re: [HTCondor-users] Job has not yet been considered by the matchmaker



Thanks Ido.

I did not see anything unusual in the log files.
Stopping and restarting all the condor processes did not solve the issue. In the end I rebooted the server, and everything started working.
I don't understand why.

Regards
Vip

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of htcondor-users-request@xxxxxxxxxxx
Sent: 27 July 2020 20:59
To: htcondor-users@xxxxxxxxxxx
Subject: HTCondor-users Digest, Vol 80, Issue 59

Send HTCondor-users mailing list submissions to
	htcondor-users@xxxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
	https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
or, via email, send a message with subject or body 'help' to
	htcondor-users-request@xxxxxxxxxxx

You can reach the person managing the list at
	htcondor-users-owner@xxxxxxxxxxx

When replying, please edit your Subject line so it is more specific than "Re: Contents of HTCondor-users digest..."


Today's Topics:

   1. Re: "Job has not yet been considered by the matchmaker"
      (idoshamay@xxxxxxxxx)
   2. Re: Conda env with workstation pool (Bockelman, Brian)


----------------------------------------------------------------------

Message: 1
Date: Mon, 27 Jul 2020 22:35:58 +0300
From: idoshamay@xxxxxxxxx
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] "Job has not yet been considered by the
	matchmaker"
Message-ID:
	<CAOXPkJ+zz0h5Ot83SswmP=i0nLsDZC-M-n_fNMNqu-s3449d+Q@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Hi,

I'm not an expert, but I have seen negotiation issues from time to time, and this sounds like one.
What does your NegotiatorLog say?
It is in the directory given by LOG (run condor_config_val LOG), usually /var/log/condor/NegotiatorLog. Try going over a single negotiation cycle; each one starts with "---------- Started Negotiation Cycle ----------". Usually you can find the issue with reasonable effort. If not, try adding D_FULLDEBUG to the NEGOTIATOR_DEBUG variable.
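Here is a minimal Python sketch of the "read one negotiation cycle" suggestion. It asks condor_config_val for LOG, falls back to /var/log/condor when that fails, and prints everything from the most recent "Started Negotiation Cycle" banner onward. The function names are hypothetical. It assumes only that each cycle's banner line contains the marker text quoted above, possibly after a timestamp.

```python
import subprocess
import sys
from pathlib import Path

# Banner written at the start of every negotiation cycle (log lines also carry a timestamp prefix).
CYCLE_MARKER = "---------- Started Negotiation Cycle ----------"


def negotiator_log_path() -> Path:
    """Ask condor_config_val for LOG; fall back to the usual default if that fails."""
    try:
        log_dir = subprocess.run(
            ["condor_config_val", "LOG"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        log_dir = "/var/log/condor"
    return Path(log_dir) / "NegotiatorLog"


def last_negotiation_cycle(log_text: str) -> str:
    """Return everything from the most recent cycle banner onward, or '' if there is none."""
    lines = log_text.splitlines()
    starts = [i for i, line in enumerate(lines) if CYCLE_MARKER in line]
    if not starts:
        return ""
    return "\n".join(lines[starts[-1]:])


def main() -> None:
    # An optional first argument overrides the log location.
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else negotiator_log_path()
    if not path.is_file():
        print(f"NegotiatorLog not found at {path}", file=sys.stderr)
        return
    print(last_negotiation_cycle(path.read_text(errors="replace")))


if __name__ == "__main__":
    main()
```

With D_FULLDEBUG enabled, the same slice simply contains more lines per cycle.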

Ido

On Mon, Jul 27, 2020 at 4:48 PM Vipul Davda <vipul.davda@xxxxxxxxxxxxxxxx> wrote:

> Hello,
>
> Over the weekend, I upgraded the condor manager from 8.6.x to 8.8.9. The
> worker nodes are on version 8.8.8.
>
> After the upgrade, I cannot get any jobs to run.
>
> # condor_q -better 1329761
>
>
> The Requirements expression for job 1329761.000 is
>
>     (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") &&
>     (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) &&
>     ((TARGET.FileSystemDomain == MY.FileSystemDomain) ||
>      (TARGET.HasFileTransfer))
>
> Job 1329761.000 defines the following attributes:
>
>     DiskUsage = 1
>     FileSystemDomain = "int10.physics.ox.ac.uk"
>     RequestDisk = DiskUsage
>     RequestMemory = 3000
>
> The Requirements expression for job 1329761.000 reduces to these conditions:
>
>          Slots
> Step    Matched  Condition
> -----  --------  ---------
> [0]          67  TARGET.Arch == "X86_64"
> [1]          67  TARGET.OpSys == "LINUX"
> [3]          67  TARGET.Disk >= RequestDisk
> [5]          67  TARGET.Memory >= RequestMemory
> [8]          67  TARGET.HasFileTransfer
>
>
> 1329761.000:  Job has not yet been considered by the matchmaker.
>
>
> 1329761.000:  Run analysis summary ignoring user priority.  Of 25 machines,
>       0 are rejected by your job's requirements
>       1 reject your job because of their own requirements
>       0 match and are already running your jobs
>       0 match but are serving other users
>      24 are able to run your job
>
> I skimmed through mailing list posts and various documentation sources,
> but I could not find anything that helped.
>
> Any help is much appreciated.
>
> Regards
> Vip
>
>
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx 
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/

------------------------------

Message: 2
Date: Mon, 27 Jul 2020 19:58:51 +0000
From: "Bockelman, Brian" <BBockelman@xxxxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Conda env with workstation pool
Message-ID: <FDA0314C-5F9B-4248-B879-5EA96B3985A7@xxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Hi Matt,

Three ideas come to mind:

1.  We've seen sites wrap the HTCondor interaction in higher-level tools that users use instead of HTCondor commands:
   - Pro: Users need to learn very little -- only enough to invoke the high-level tool correctly.
   - Con: Users don't learn how to help themselves, and the tools can only support a limited number of workflows.

2.  The USER_JOB_WRAPPER, run at the startd, provides a powerful hook for customizing the job startup environment. Separately, note that Singularity can be used to start jobs inside containers you curate.
   - Pro: As you execute arbitrary code, you can do a number of transformations impossible to do at the schedd side.
   - Con: As far as users are concerned, this is outright magic. They'll learn more of the HTCondor system but may not be able to transfer that knowledge elsewhere without said magic.

3.  Perhaps you can just provide an easy way to generate the right environments and have users use HTCondor directly. That is, write a wrapper on the submit side that takes a conda environment and produces a corresponding file (really a Singularity image, but no need to tell the user that), and explain how to run inside that file. This way, the conda environment is always activated in the job, and the job itself can be pure Python.
   - You still end up doing some work to write the "conda-to-singularity" script but after that the users can still see all the pieces working together.
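The "conda-to-singularity" step in idea 3 could start as small as the Python sketch below, which renders a Singularity definition file from a conda environment YAML. Several pieces are assumptions, not anything stated in the thread: the public continuumio/miniconda3 Docker base image, its /opt/conda install location, putting the environment first on PATH as the "activation" method, and the helper names.

```python
import re
from pathlib import Path

# Hypothetical template. It assumes the public continuumio/miniconda3 Docker image,
# which has conda installed under /opt/conda.
DEF_TEMPLATE = """\
Bootstrap: docker
From: continuumio/miniconda3

%files
    {yaml_path} /opt/environment.yml

%post
    /opt/conda/bin/conda env create -f /opt/environment.yml -n {env}
    /opt/conda/bin/conda clean --all --yes

%environment
    # Putting the env first on PATH makes it the "active" one for every job.
    export PATH=/opt/conda/envs/{env}/bin:$PATH
    export CONDA_PREFIX=/opt/conda/envs/{env}

%runscript
    exec "$@"
"""


def conda_def_file(yaml_path: str, env_name: str) -> str:
    """Render a Singularity definition file that bakes in the conda env described by yaml_path."""
    # Keep the name shell-safe because it is pasted into %post and %environment.
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", env_name):
        raise ValueError(f"unsafe environment name: {env_name!r}")
    return DEF_TEMPLATE.format(yaml_path=yaml_path, env=env_name)


def write_def_file(yaml_path: str, env_name: str, out: str = "conda_env.def") -> Path:
    """Write the rendered definition to `out` and return its path."""
    path = Path(out)
    path.write_text(conda_def_file(yaml_path, env_name))
    return path
```

For example, write_def_file("environment.yml", "analysis") followed by "singularity build conda_env.sif conda_env.def" would produce the image. That build step typically needs root or the --fakeroot option. Users still only see their environment YAML and a file to run in.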

Brian

On Jul 27, 2020, at 12:42 PM, West Matthew <matthew.west@xxxxxxxx> wrote:

Hi All,

I want to stress something that I might not have made clear in previous messages.

I work with lots of folks who have little experience running software outside of a single machine. They need to scale up their computing effort to do a quality analysis, but HTC is not why they got interested in the work. This is not to say a biologist or civil engineer cannot learn the tools to containerize their own software, but incentives that prioritize immediate results often make researchers hesitant to try new tools.

- Why do I need all these other skills just to run software like it does on my laptop?

For mature projects and more experienced developers, I believe containers are an easy sell. But I hope y'all can understand the challenge of lowering the barrier to entry for beginners who need to work in an HTC framework.

Cheers,
Matt

________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Michael Pelletier via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Monday, July 27, 2020 1:11:19 AM
To: HTCondor-Users Mail List
Cc: Michael Pelletier
Subject: Re: [HTCondor-users] Conda env with workstation pool

I set up a Singularity container to auto-activate a specified environment at startup, which works pretty well for my users. You might also be able to use the OS-native Python to write a Python wrapper that pulls all the activation environment variables into its own environment and then invokes the target Python script.
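The wrapper idea might look like the minimal sketch below. It assumes the environment is already unpacked into a directory that contains bin/python, and the function names are hypothetical. Instead of sourcing an activate script from bash, the OS-native Python sets the key variables itself and then runs the target script with the environment's own interpreter. This only approximates "conda activate": it does not run the package-specific scripts in etc/conda/activate.d.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def activated_environ(prefix: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: os.environ) with the conda env at `prefix` made active.

    Approximation only: it sets PATH and CONDA_PREFIX, but does not run the
    package-specific scripts in etc/conda/activate.d that `conda activate` would.
    """
    env = dict(os.environ if base is None else base)
    prefix = os.path.abspath(prefix)
    old_path = env.get("PATH", "")
    env["PATH"] = os.path.join(prefix, "bin") + (os.pathsep + old_path if old_path else "")
    env["CONDA_PREFIX"] = prefix
    # A PYTHONHOME inherited from the host would point the env's interpreter at the wrong stdlib.
    env.pop("PYTHONHOME", None)
    return env


def run_in_env(prefix: str, script: str, *args: str) -> int:
    """Run `script` with the env's own interpreter and activated variables; return its exit code."""
    python = os.path.join(os.path.abspath(prefix), "bin", "python")
    return subprocess.run([python, script, *args], env=activated_environ(prefix)).returncode


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python run_in_env.py ENV_DIR SCRIPT [ARGS...]
    sys.exit(run_in_env(sys.argv[1], sys.argv[2], *sys.argv[3:]))
```

Returning the child's exit code keeps HTCondor's view of job success or failure intact.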



Michael V Pelletier
Principal Engineer

Raytheon Technologies
Information Technology
Digital Transformation & Innovation




From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of West Matthew
Sent: Friday, July 24, 2020 7:48 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] Re: [HTCondor-users] Conda env with workstation pool



Hi Josh,

I am working from the CHTC recommendation using conda pack. I am curious whether you know a way to activate the conda environment from within a Python script. Having to wrap my Python executable in a bash script is rather frustrating when I am trying to get away from a multi-language setup.

I can create the directory and extract the contents of the tarball. I just need to activate the env.
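One pure-Python pattern for this, sketched under assumptions, is to have the job script unpack the tarball and then re-execute itself under the environment's interpreter. After the re-exec, the script's later imports resolve against the packed environment. The tarball name, directory and INSIDE_PACKED_ENV marker variable are hypothetical. The sketch also assumes the archive came from conda-pack, which ships a bin/conda-unpack Python script to rewrite hard-coded prefixes; its documentation says any Python on the host can run that script.

```python
import os
import subprocess
import sys
import tarfile
from pathlib import Path

# Hypothetical names: the conda-pack tarball transferred with the job and where to unpack it.
ENV_TARBALL = "my_env.tar.gz"
ENV_DIR = Path("my_env")
MARKER = "INSIDE_PACKED_ENV"  # set once this script has been re-executed inside the env


def unpack_env(tarball: str, dest: Path, fixup: bool = True) -> None:
    """Extract a conda-pack tarball (a trusted archive you built) and repair its prefixes."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:*") as tar:
        tar.extractall(dest)
    if fixup:
        # conda-unpack is a Python script, so the current (host) interpreter can run it.
        subprocess.run([sys.executable, str(dest / "bin" / "conda-unpack")], check=True)


def bootstrap(tarball: str = ENV_TARBALL, dest: Path = ENV_DIR) -> None:
    """Call first in the job script: unpack the env, then restart this script under its python."""
    if os.environ.get(MARKER) == "1":
        return  # already running inside the environment
    unpack_env(tarball, dest)
    prefix = dest.resolve()
    env = dict(os.environ)
    env["PATH"] = str(prefix / "bin") + os.pathsep + env.get("PATH", "")
    env["CONDA_PREFIX"] = str(prefix)
    env[MARKER] = "1"
    python = str(prefix / "bin" / "python")
    # execve replaces the current process, so nothing after this runs in the host interpreter.
    os.execve(python, [python, *sys.argv], env)
```

Call bootstrap() at the very top of the job script, before importing anything that lives only in the environment. The marker variable stops the re-executed copy from unpacking again.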

Cheers,
Matt
________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Josh Karpel via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, July 23, 2020 4:41:59 PM
To: HTCondor-Users Mail List
Cc: Josh Karpel
Subject: Re: [HTCondor-users] Conda env with workstation pool



That pretty much lines up with what we tell people to do at CHTC: http://chtc.cs.wisc.edu/conda-installation.shtml





Josh Karpel
karpel@xxxxxxxx





On Thu, Jul 23, 2020 at 8:25 AM Michael Pelletier via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
I've built a Singularity definition file that installs Miniconda, creates an environment YAML file, builds the environment, and then configures the container so that the environment is activated automatically when the Singularity container starts. With Miniconda, a Singularity container file based on Ubuntu 18.04 with CUDA 10.2 is about 850 megabytes.



Singularity isn't necessarily an extra infrastructure layer, since it doesn't require any services from the host. I bet it would be possible to input-transfer the Singularity executable and run it on an input-transferred container.



Alternatively, you could build out the full virtualenv in a directory with Miniconda, and then input-transfer that whole directory and activate it when the job starts up, which would eliminate the need for modules to be available on the exec node.



Michael V Pelletier
Principal Engineer

Raytheon Technologies
Information Technology
Digital Transformation & Innovation



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of West Matthew
Sent: Thursday, July 23, 2020 7:03 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [External] [HTCondor-users] Conda env with workstation pool



I am trying to run an analysis on my local workstation pool that relies on software in a conda virtual environment. When the job runs on a remote machine, it does not have access to the libraries in that env back on the submit machine.

Given that virtual environments are common practice when running locally, I am hoping there is some means of making Python libraries accessible without resorting to an extra infrastructure layer like Docker.



Cheers,
Matt
