
[HTCondor-users] Support for ambient credentials in built-in S3 plugin

Hello, I am using Condor 9.0.16 on Windows and I am trying to use the
native S3 file transfer capabilities with an EC2 instance's ambient
credentials (via its instance profile) to write to an S3 bucket.

I tested submitting a job with an S3 URL as the output_destination.
The startd's instance has permission to write to the bucket by way of
the aforementioned instance profile. According to the Condor logs
everything worked, but the files never appear in S3. Note: I had to
configure "SIGN_S3_URLS = FALSE" to tell Condor not to try to sign
the URLs, since my goal is to use the instance's ambient credentials
rather than access keys.
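
To be explicit about what I mean by ambient credentials: no access
keys are configured anywhere; the AWS SDKs resolve temporary
credentials from the EC2 instance metadata. A minimal boto3 sketch of
the idea (the key name here is just an example):

# No keys configured anywhere: boto3's default credential chain
# resolves temporary credentials from the EC2 instance metadata
# (i.e. the instance profile).
import boto3

s3 = boto3.client("s3")
s3.put_object(Bucket="my-bucket-1c83oghox0tpb",
              Key="test/ambient-check.txt",   # illustrative key
              Body=b"written with ambient credentials\n")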

I currently maintain a custom S3 file transfer plugin that leverages
the ambient credentials (we must use temporary credentials, and we
control the startd's instance profile and therefore its permissions),
and I would prefer to retire it in favor of Condor's native S3 file
transfer capabilities. If this is supposed to work already, I have
presumably hit a bug; if it isn't designed to work this way, it would
be nice to see this functionality in a future version.
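
For context, the plugin itself is not much more than the following
simplified sketch (illustrative only, not the production code): it
answers the starter's -classad handshake with a capabilities ad,
reads transfer-request ads from -infile, uploads with boto3's default
credential chain, and writes result ads to -outfile, following the
multi-file transfer plugin protocol described in the manual.

#!/usr/bin/env python3
# Simplified, illustrative S3 multi-file transfer plugin that uses
# ambient credentials; names and error handling are pared down.
import sys
from urllib.parse import urlparse

import boto3
import classad  # ships with the htcondor Python bindings

def capabilities():
    # condor_starter invokes the plugin with -classad and expects an
    # ad advertising the supported URL schemes.
    print(classad.ClassAd({
        "MultipleFileSupport": True,
        "PluginType": "FileTransfer",
        "PluginVersion": "0.1",
        "SupportedMethods": "s3",
    }))

def arg(flag):
    return sys.argv[sys.argv.index(flag) + 1]

def upload(infile, outfile):
    s3 = boto3.client("s3")  # ambient credentials; nothing configured
    with open(infile) as reqs, open(outfile, "w") as results:
        for ad in classad.parseAds(reqs):
            url = urlparse(ad["Url"])  # s3://bucket/key
            result = {"TransferUrl": ad["Url"],
                      "TransferFileName": ad["LocalFileName"]}
            try:
                s3.upload_file(ad["LocalFileName"],
                               url.netloc, url.path.lstrip("/"))
                result["TransferSuccess"] = True
            except Exception as err:
                result["TransferSuccess"] = False
                result["TransferErrorString"] = str(err)
            results.write(str(classad.ClassAd(result)))

if __name__ == "__main__":
    if "-classad" in sys.argv:
        capabilities()
    elif "-upload" in sys.argv:  # output transfers arrive with -upload
        upload(arg("-infile"), arg("-outfile"))
    sys.exit(0)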

Below are the submit file, the relevant entries from the SchedLog and
ShadowLog on the schedd, and the StarterLog on the startd.

===Submit file===

universe=vanilla
executable=C:\Windows\System32\cmd.exe
arguments="/q /c echo Hello World! I am %user% on %computername%. >> helloworld_$(ProcId).txt"
initialdir=C:\jobs
log=$(ClusterId)_log.txt
error=stderr.txt
output=stdout.txt
transfer_executable=False
transfer_input=False
transfer_output=True
transfer_error=True
should_transfer_files=yes
when_to_transfer_output=on_exit
output_destination=s3://my-bucket-1c83oghox0tpb/test/$(ClusterId)
job_lease_duration=60
run_as_owner=True
load_profile=True
request_cpus=2
request_disk=1
request_memory=2000
queue 10


===SchedLog===

03/03/23 18:00:51 (pid:5468) Starting add_shadow_birthdate(3.5)
03/03/23 18:00:51 (pid:5468) Started shadow for job 3.5 on
slot1@<sanitized> <sanitized> for <sanitized>, (shadow pid = 2004)
03/03/23 18:00:53 (pid:5468) Shadow pid 2004 for job 3.5 reports job
exit reason 100.
03/03/23 18:00:53 (pid:5468) Match record (slot1@<sanitized>
<sanitized> for <sanitized>, 3.5) deleted


===ShadowLog===

03/03/23 18:00:53 (3.5) (2004): DoDownload: other side transferred
C:\condor\execute\dir_532\helloworld_5.txt to
s3://my-bucket-1c83oghox0tpb/test/3/helloworld_5.txt and got result 0
03/03/23 18:00:53 (3.5) (2004): DoDownload: other side transferred
C:\condor\execute\dir_532\_condor_stderr to
s3://my-bucket-1c83oghox0tpb/test/3/_condor_stderr and got result 0
03/03/23 18:00:53 (3.5) (2004): DoDownload: other side transferred
C:\condor\execute\dir_532\_condor_stdout to
s3://my-bucket-1c83oghox0tpb/test/3/_condor_stdout and got result 0
03/03/23 18:00:53 (3.5) (2004): Job 3.5 terminated: exited with status 0

===StarterLog===

03/03/23 18:00:51 (pid:532) Communicating with shadow <sanitized>
03/03/23 18:00:51 (pid:532) Submitting machine is "<sanitized>"
03/03/23 18:00:51 (pid:532) setting the orig job name in starter
03/03/23 18:00:51 (pid:532) setting the orig job iwd in starter
03/03/23 18:00:51 (pid:532) Chirp config summary: IO false, Updates
false, Delayed updates true.
03/03/23 18:00:51 (pid:532) Initialized IO Proxy.
03/03/23 18:00:51 (pid:532) Setting resource limits not implemented!
03/03/23 18:00:51 (pid:532) Set filetransfer runtime ads to
C:\condor\execute\dir_532\.job.ad and
C:\condor\execute\dir_532\.machine.ad.
03/03/23 18:00:51 (pid:532) File transfer completed successfully.
03/03/23 18:00:52 (pid:532) Job 3.5 set to execute immediately
03/03/23 18:00:52 (pid:532) Starting a VANILLA universe job with ID: 3.5
03/03/23 18:00:52 (pid:532) IWD: C:\condor\execute\dir_532
03/03/23 18:00:52 (pid:532) Output file:
C:\condor\execute\dir_532\_condor_stdout
03/03/23 18:00:52 (pid:532) Error file: C:\condor\execute\dir_532\_condor_stderr
03/03/23 18:00:52 (pid:532) Renice expr "10" evaluated to 10
03/03/23 18:00:52 (pid:532) Running job as user <sanitized>
03/03/23 18:00:52 (pid:532) About to exec C:\Windows\System32\cmd.exe
/q /c echo Hello World! I am %user% on %computername%. >>
helloworld_5.txt
03/03/23 18:00:52 (pid:532) Create_Process succeeded, pid=1448
03/03/23 18:00:52 (pid:532) DaemonCore: async_pipe is signalled, but
async_pipe_signal is false.
03/03/23 18:00:53 (pid:532) Process exited, pid=1448, status=0
03/03/23 18:00:53 (pid:532) Failed to open '.update.ad' to read update
ad: No such file or directory (2).
03/03/23 18:00:53 (pid:532) my_popen: CreateProcess failed err=193
03/03/23 18:00:53 (pid:532) FILETRANSFER: Failed to execute
C:\condor\bin\box_plugin.py, ignoring
03/03/23 18:00:53 (pid:532) my_popen: CreateProcess failed err=193
03/03/23 18:00:53 (pid:532) FILETRANSFER: Failed to execute
C:\condor\bin/gdrive_plugin.py, ignoring
03/03/23 18:00:53 (pid:532) my_popen: CreateProcess failed err=193
03/03/23 18:00:53 (pid:532) FILETRANSFER: Failed to execute
C:\condor\bin/onedrive_plugin.py, ignoring
03/03/23 18:00:53 (pid:532) Failed to open '.update.ad' to read update
ad: No such file or directory (2).
03/03/23 18:00:53 (pid:532) All jobs have exited... starter exiting
03/03/23 18:00:53 (pid:532) **** condor_starter (condor_STARTER) pid
532 EXITING WITH STATUS 0


Thank you

~
MG