Morning list,
I am running HTCondor on a number of EC2 instances on AWS. I have a control node and then 1+ worker nodes. I am using the docker universe and am struggling to get the executor to pull down the docker container image from AWS ECR. Here is the setup:
- Amazon ECR credential helper is installed on the worker AMI:
https://github.com/awslabs/amazon-ecr-credential-helper
- Tested and able to get auth without sudo.
- IAM role attached to worker node(s) has read/write permissions on the ECR repository.
- I can ssh / ssm into the worker node and run "sudo docker pull" to get the image. The image is then cached and subsequent jobs run as expected.
- When I submit a job without a locally cached image, I get login errors (Error: Head "https://##account##.dkr.ecr.us-west-2.amazonaws.com/v2/hidtm-htcondor-repository/manifests/1.0.0": no basic auth credentials). This is an ECR error, not a Condor error.
- I am relatively confident that the jobs are executing as the "nobody" user, given how the cluster was set up.
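For reference, here is a minimal sketch of the credential-helper configuration involved, assuming the "credHelpers" form described in the helper's README. The function names are illustrative only and are not part of any library. The Docker CLI reads config.json from the invoking user's $HOME/.docker directory (or from DOCKER_CONFIG if that is set), so the file has to be visible to whichever user actually runs docker for the job.

```python
import json
import os
from pathlib import Path


def ecr_docker_config(registry_host: str) -> dict:
    """Build a Docker client config that routes one registry to the ECR helper."""
    # "ecr-login" makes Docker call the docker-credential-ecr-login binary.
    return {"credHelpers": {registry_host: "ecr-login"}}


def docker_config_path(home: str) -> Path:
    """Return where the Docker CLI looks for config.json for a given home dir."""
    # DOCKER_CONFIG, when set, overrides the default $HOME/.docker directory.
    config_dir = os.environ.get("DOCKER_CONFIG", os.path.join(home, ".docker"))
    return Path(config_dir) / "config.json"


if __name__ == "__main__":
    # Registry host redacted exactly as in the error message above.
    registry = "##account##.dkr.ecr.us-west-2.amazonaws.com"
    print(docker_config_path(os.path.expanduser("~")))
    print(json.dumps(ecr_docker_config(registry), indent=2))
```

Running this as different users (for example, the login user versus the user the job runs as) shows which config.json path each one would consult. This is a diagnostic aid under the assumptions above, not a confirmed fix.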
Does anyone have experience using HTCondor on AWS with the docker universe? Is Condor running docker with "sudo"? I have noticed that the ECR credential helper needs to be run without elevated permissions (e.g., no sudo).
Best,
Jay