[HTCondor-devel] possible bug in HTCondor 9.0.7


Date: Wed, 01 Dec 2021 20:57:08 +0100
From: Carmelo Pellegrino <carmelo.pellegrino@xxxxxxxxxxxx>
Subject: [HTCondor-devel] possible bug in HTCondor 9.0.7
Dear HTCondor developers,

I'm writing to you because I've found something that may be a bug.

I recently noticed that if the value of the log command in the submit file is a path (relative or absolute) rather than a bare filename, and the job is submitted with condor_submit's -spool flag, then the log file, and only the log file, is not written to the specified path but to the submit directory (SUBMIT_Iwd) instead.

I've reproduced this both on the production cluster I administer and in an htcondor/mini:9.0.7-el7 Docker container.

Here is my submit file:

 $ cat submit.sub
executable            = script.sh
output                = logs/stdout.txt
error                 = logs/stderr.txt
log                   = logs/output.log
should_transfer_files = Yes
queue 1


The script is:

 $ cat script.sh
#!/bin/sh

echo "this goes to stdout"
echo "this goes to stderr" >&2

sleep 10


Here is how I submit it in the htcondor/mini container:

 $ condor_submit -spool submit.sub
Submitting job(s).
1 job(s) submitted to cluster 3.


After the job is completed, I run:

 $ condor_transfer_data 3
Fetching data files...


and this is the resulting content of the submit directory:

 $ tree
.
├── logs
│   ├── stderr.txt
│   └── stdout.txt
├── output.log
├── script.sh
└── submit.sub


Unexpectedly, "output.log" is not in the logs/ subdirectory, even though stdout.txt and stderr.txt are. HTCondor 8.8 did not behave this way.
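Until this is resolved, one possible stopgap is to move the log into place by hand after condor_transfer_data. The function below is a minimal, hypothetical sketch (relocate_userlog is not part of HTCondor). It assumes, as observed above, that the log lands in the submit directory under its bare filename, and that the requested location is the absolute path recorded in SUBMIT_UserLog.

```shell
#!/bin/sh
# Hypothetical helper, not an HTCondor command.
# Usage: relocate_userlog SUBMIT_IWD SUBMIT_USERLOG
# Assumes the user log was delivered to SUBMIT_IWD under its bare filename
# (the behaviour observed with 9.0.7 and -spool) and that SUBMIT_USERLOG
# is an absolute path.
relocate_userlog() {
    rl_iwd=$1
    rl_wanted=$2
    rl_misplaced="$rl_iwd/$(basename "$rl_wanted")"

    # Log already where the submit file asked for it: nothing to do.
    if [ -f "$rl_wanted" ]; then
        return 0
    fi

    # Neither location has the log: report and fail.
    if [ ! -f "$rl_misplaced" ]; then
        echo "relocate_userlog: $rl_misplaced not found" >&2
        return 1
    fi

    # Create the requested directory if needed and move the log there.
    mkdir -p "$(dirname "$rl_wanted")" && mv "$rl_misplaced" "$rl_wanted"
}
```

For example, while job 3 is still in the queue (as it was when the condor_q -l output below was taken): relocate_userlog "$(condor_q 3 -af SUBMIT_Iwd)" "$(condor_q 3 -af SUBMIT_UserLog)". This relies on condor_q -af printing string attributes without surrounding quotes.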

As mentioned above, this is HTCondor 9.0.7 running on CentOS 7.9:

 $ condor_version
$CondorVersion: 9.0.7 Nov 02 2021 BuildID: 562053 PackageID: 9.0.7-1 $
$CondorPlatform: x86_64_CentOS7 $

 $ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

Please find attached the output of the "condor_q -l 3" command.

I hope this is useful. Apologies if it is not, or if this is not the right mailing list.

Best regards,
Carmelo

Arguments = ""
AutoClusterAttrs = "KFlops,MachineLastMatchTime,Offline,RemoteOwner,TotalJobRuntime,ConcurrencyLimits,FlockTo,Rank,Requirements,DiskUsage,ImageSize,MemoryUsage,RequestDisk,RequestMemory,ResidentSetSize"
AutoClusterId = 2
BlockReadKbytes = 0
BlockReads = 0
BlockWriteKbytes = 0
BlockWrites = 0
BytesRecvd = 79.0
BytesSent = 40.0
ClusterId = 3
Cmd = "script.sh"
CommittedSlotTime = 12.0
CommittedSuspensionTime = 0
CommittedTime = 12
CompletionDate = 1638386831
CondorPlatform = "$CondorPlatform: x86_64_CentOS7 $"
CondorVersion = "$CondorVersion: 9.0.7 Nov 02 2021 BuildID: 562053 PackageID: 9.0.7-1 $"
CoreSize = -1
CpusProvisioned = 1
CPUsUsage = 0.1197060934661592
CumulativeRemoteSysCpu = 0.0
CumulativeRemoteUserCpu = 0.0
CumulativeSlotTime = 12.0
CumulativeSuspensionTime = 0
CurrentHosts = 0
DiskProvisioned = 7622244
DiskUsage = 15
DiskUsage_RAW = 14
EncryptExecuteDirectory = false
EnteredCurrentStatus = 1638386831
Environment = ""
Err = "_condor_stderr"
ExecutableSize = 1
ExecutableSize_RAW = 1
ExitBySignal = false
ExitCode = 0
ExitStatus = 0
GlobalJobId = "6f2af0320058#3.0#1638386810"
HoldReason = undefined
HoldReasonCode = undefined
ImageSize = 1
ImageSize_RAW = 1
In = "/dev/null"
Iwd = "/var/lib/condor/spool/3/0/cluster3.proc0.subproc0"
JobCurrentFinishTransferInputDate = 1638386820
JobCurrentFinishTransferOutputDate = 1638386831
JobCurrentStartDate = 1638386819
JobCurrentStartExecutingDate = 1638386821
JobCurrentStartTransferInputDate = 1638386820
JobCurrentStartTransferOutputDate = 1638386831
JobFinishedHookDone = 1638386831
JobLeaseDuration = 2400
JobNotification = 0
JobPrio = 0
JobRunCount = 1
JobStartDate = 1638386819
JobStatus = 4
JobUniverse = 5
LastHoldReason = "Spooling input data files"
LastHoldReasonCode = 16
LastJobLeaseRenewal = 1638386831
LastJobStatus = 2
LastMatchTime = 1638386819
LastPublicClaimId = "<127.0.0.1:9618?addrs=127.0.0.1-9618&alias=6f2af0320058&noUDP&sock=startd_20_4bfb>#1638386427#8#..."
LastRemoteHost = "slot1@6f2af0320058"
LastSuspensionTime = 0
LeaveJobInQueue = JobStatus == 4 && (CompletionDate =?= undefined || CompletionDate == 0 || ((time() - CompletionDate) < 864000))
MachineAttrCpus0 = 1
MachineAttrSlotWeight0 = 1
MaxHosts = 1
MemoryProvisioned = 3975
MemoryUsage = ((ResidentSetSize + 1023) / 1024)
MinHosts = 1
MyType = "Job"
NumCkpts = 0
NumCkpts_RAW = 0
NumJobCompletions = 1
NumJobMatches = 1
NumJobStarts = 1
NumRestarts = 0
NumShadowStarts = 1
NumSystemHolds = 0
OnExitHold = false
OnExitRemove = true
OrigMaxHosts = 1
Out = "_condor_stdout"
Owner = "condor"
PeriodicHold = false
PeriodicRelease = false
PeriodicRemove = false
ProcId = 0
QDate = 1638386810
Rank = 0.0
RecentBlockReadKbytes = 0
RecentBlockReads = 0
RecentBlockWriteKbytes = 0
RecentBlockWrites = 0
RecentStatsLifetimeStarter = 5
ReleaseReason = "Data files spooled"
RemoteSysCpu = 0.0
RemoteUserCpu = 0.0
RemoteWallClockTime = 12.0
RequestCpus = 1
RequestDisk = DiskUsage
RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.HasFileTransfer)
ResidentSetSize = 7500
ResidentSetSize_RAW = 6332
RootDir = "/"
ScratchDirFileCount = 7
ServerTime = 1638387682
ShouldTransferFiles = "YES"
SpooledOutputFiles = ""
StageInFinish = 1638386810
StageInStart = 1638386810
StageOutFinish = 1638386846
StageOutStart = 1638386846
StartdPrincipal = "execute-side@matchsession/127.0.0.1"
StatsLifetimeStarter = 10
StreamErr = false
StreamOut = false
SUBMIT_Cmd = "/test/script.sh"
SUBMIT_Iwd = "/test"
SUBMIT_TransferOutputRemaps = "_condor_stdout=logs/stdout.txt;_condor_stderr=logs/stderr.txt;"
SUBMIT_UserLog = "/test/logs/output.log"
TargetType = "Machine"
TerminationPending = true
ToE = [ Who = "itself"; How = "OF_ITS_OWN_ACCORD"; HowCode = 0; When = 1638386831 ]
TotalSubmitProcs = 1
TotalSuspensions = 0
TransferIn = false
TransferInFinished = 1638386820
TransferInputSizeMB = 0
TransferInStarted = 1638386820
TransferOutFinished = 1638386831
TransferOutputRemaps = undefined
TransferOutStarted = 1638386831
User = "condor@6f2af0320058"
UserLog = "output.log"
WantCheckpoint = false
WantRemoteIO = true
WantRemoteSyscalls = false
WhenToTransferOutput = "ON_EXIT"
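The two attributes that expose the discrepancy, SUBMIT_UserLog ("/test/logs/output.log") and UserLog ("output.log"), can be pulled out of the full ad with a small filter. This is a hypothetical helper (userlog_attrs is an illustrative name, not an HTCondor command), assuming the attribute = "value" layout that condor_q -l prints above.

```shell
#!/bin/sh
# Hypothetical filter: read `condor_q -l` output on stdin and print the
# user-log path as written at submit time (SUBMIT_UserLog) and the path
# recorded in the job ad after spooling (UserLog).
userlog_attrs() {
    # -n suppresses sed's default output; the trailing "p" prints only
    # lines where the substitution matched. Anchoring with ^ keeps the
    # UserLog pattern from also matching SUBMIT_UserLog.
    sed -n \
        -e 's/^SUBMIT_UserLog = "\(.*\)"$/requested: \1/p' \
        -e 's/^UserLog = "\(.*\)"$/rewritten: \1/p'
}
```

Running condor_q -l 3 | userlog_attrs on the ad above prints "requested: /test/logs/output.log" followed by "rewritten: output.log".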
