Morning list,
I am running HTCondor on a number of EC2 instances on AWS, with one control node and one or more worker nodes. I am using the docker universe and am struggling to get the executor to pull the Docker container image from AWS ECR. Here is the setup:
- Amazon ECR credential helper is installed on the worker AMI:
https://github.com/awslabs/amazon-ecr-credential-helper
- Tested and able to get auth without sudo.
- IAM role attached to worker node(s) has read/write permissions on the ECR repository.
- I can ssh / ssm into the worker node and run "sudo docker pull" to get the image. The image is then cached, and subsequent jobs run as expected.
- When I submit a job without a locally cached image, I get login errors:
Error: Head "https://##account##.dkr.ecr.us-west-2.amazonaws.com/v2/hidtm-htcondor-repository/manifests/1.0.0": no basic auth credentials
This is an ECR error, not a Condor error.
- I am relatively confident that the jobs are executing as the "nobody" user, given how the cluster was set up.
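
Since Docker reads its credential-helper settings from a per-user config.json, here is a minimal Python sketch of a check that could be run on a worker as each candidate account (my login user, root, or whichever account ends up invoking docker) to see which helper, if any, that account would use for the registry. It relies only on the "credHelpers" and "credsStore" keys of Docker's config.json and on the DOCKER_CONFIG environment variable; the function names and the ACCOUNT placeholder in the registry host are mine, for illustration only.

```python
import json
import os
from pathlib import Path


def docker_config_path(home, docker_config_env=None):
    """Return the config.json path the Docker CLI reads for this home directory.

    DOCKER_CONFIG, when set, names the directory that holds config.json.
    """
    if docker_config_env:
        return Path(docker_config_env) / "config.json"
    return Path(home) / ".docker" / "config.json"


def helper_for_registry(config, registry):
    """Return the credential helper configured for `registry`, or None.

    A per-registry entry in "credHelpers" is checked first, then the
    global "credsStore" value.
    """
    per_registry = config.get("credHelpers", {})
    if registry in per_registry:
        return per_registry[registry]
    return config.get("credsStore") or None


def check(home, registry, docker_config_env=None):
    """Describe which helper (if any) this account would use for `registry`."""
    path = docker_config_path(home, docker_config_env)
    if not path.is_file():
        return f"{path}: missing, so no credential helper is configured"
    config = json.loads(path.read_text())
    helper = helper_for_registry(config, registry)
    if helper is None:
        return f"{path}: no helper configured for {registry}"
    return f"{path}: uses docker-credential-{helper} for {registry}"


if __name__ == "__main__":
    # Placeholder registry host; substitute the real account ID.
    registry = "ACCOUNT.dkr.ecr.us-west-2.amazonaws.com"
    home = os.path.expanduser("~")
    print(check(home, registry, os.environ.get("DOCKER_CONFIG")))
```

If the account that runs docker for the job has no config.json, or one without an entry for the ECR host, that would be consistent with the "no basic auth credentials" error above, though I have not confirmed that this is the cause.
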
Does anyone have experience using HTCondor on AWS with the docker universe? Is Condor running docker with "sudo"? I have noticed that the ECR credential helper needs to be run without elevated permissions (i.e., no sudo).
Best,
Jay