Re: [Condor-users] Unable to start EC2 instance
- Date: Thu, 23 Jun 2011 09:34:19 -0500
- From: "Timothy St. Clair" <tstclair@xxxxxxxxxx>
- Subject: Re: [Condor-users] Unable to start EC2 instance
You could extract just condor_submit, the gridmanager, and the ec2_gahp.
Cheers,
Tim
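
A minimal sketch of how the submit file from the original message might translate to the ec2 grid type served by the ec2_gahp. It assumes the ec2_* submit commands documented for 7.7 and that credentials are an access key ID and a secret access key stored in files (the Query interface does not use the X.509 certificate and private key). The two credential paths are hypothetical, and the command names should be checked against the 7.7 manual before use.

```
# Sketch only: the original job rewritten for the ec2 (Query interface) grid type.
universe      = grid
grid_resource = ec2 https://ec2.amazonaws.com/
executable    = RunEC2VM

# Assumption: each file holds one credential; both paths are hypothetical.
ec2_access_key_id     = /home/phil/.ec2/access_key_id
ec2_secret_access_key = /home/phil/.ec2/secret_access_key

# Values carried over unchanged from the amazon_* commands in the original.
ec2_ami_id        = ami-4ed12d27
ec2_instance_type = m1.large
ec2_user_data     = condor:landphil.rocksclusters.org:40000:50000
ec2_keypair_file  = keypair.$(Process)

queue 1
```

The periodic_release and periodic_remove expressions from the original file are independent of the grid type and could be kept as they are.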
On Thu, 2011-06-23 at 07:26 -0700, Philip Papadopoulos wrote:
> Do I need all of condor 7.7 or can I just extract the ec2_gahp
> executable from it?
>
> Thanks,
> Phil
>
>
>
> On Thu, Jun 23, 2011 at 4:56 AM, Matthew Farrellee <matt@xxxxxxxxxx>
> wrote:
>
> On 06/22/2011 02:49 PM, Philip Papadopoulos wrote:
>
>
> Trying out Condor 7.6.1 -- installed via the
> rhap.stripped.tar.gz
>
> I get the following in my GAHP log.
> 06/22/11 09:33:37 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
> 06/22/11 09:38:38 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
> "End of file or no input: Operation interrupted or timed out"
> Detail: [no detail]
>
> 06/22/11 09:38:38 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
> 06/22/11 09:42:08 EOF reached on pipe 0
> 06/22/11 09:42:08 stdin buffer closed, exiting
> 06/22/11 09:47:19 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
> "End of file or no input: Operation interrupted or timed out"
> Detail: [no detail]
>
> 06/22/11 09:47:19 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
> 06/22/11 09:48:33 EOF reached on pipe 0
> 06/22/11 09:48:33 stdin buffer closed, exiting
> 06/22/11 09:49:18 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
> "End of file or no input: Operation interrupted or timed out"
> Detail: [no detail]
>
> 06/22/11 09:49:18 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
>
>
> The submission file is simple:
> universe = grid
> grid_resource = amazon https://ec2.amazonaws.com/
> periodic_release = NumHolds < 3
> +NumHolds = 0
> periodic_remove = NumHolds >= 3 || (JobStatus == 2 && time()-ShadowBday > 1*60*60)
> executable = RunEC2VM
> amazon_keypair_file = keypair.$(Process)
>
> amazon_ami_id = ami-4ed12d27
> amazon_instance_type = m1.large
> amazon_user_data = condor:landphil.rocksclusters.org:40000:50000
> amazon_private_key = /home/phil/.ec2/pk.pem
> amazon_public_key = /home/phil/.ec2/cert.pem
>
> queue 1
>
>
> And the condor_config_val output (the salient settings, I think):
> $ condor_config_val -dump | grep -i amazon
> AMAZON_GAHP = $(SBIN)/amazon_gahp
> AMAZON_GAHP_LOG = /tmp/AmazonGahpLog.$(USERNAME)
> GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE_AMAZON = 20
>
> and
> $ condor_config_val -dump | grep -i ssl
> SOAP_SSL_CA_FILE = /etc/pki/tls/cert.pem
> SOAP_SSL_SKIP_HOST_CHECK = True
>
> I've tried both with and without SOAP_SSL_SKIP_HOST_CHECK.
> The SOAP_SSL_CA_FILE exists.
> If I try WITHOUT the
> SOAP_SSL_CA_FILE = /etc/pki/tls/cert.pem
> then I get
> Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
> "SSL_ERROR_SSL
> error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"
> Detail: SSL connect failed in tcp_connect()
>
>
> Right now I'm flummoxed.
>
> Thanks,
> Phil
>
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
>
> 858-822-3628 <tel:858-822-3628> (Ofc)
> 619-331-2990 <tel:619-331-2990> (Fax)
>
> Phil,
>
> I'm assuming you aren't getting those errors 100% of the time, and
> that you're actually talking to AWS's EC2 service.
>
> I've seen similar intermittent issues in the past. They came
> and went by days. After much investigation, I eventually
> chalked them up to transient issues with AWS' EC2 SOAP
> interface. The amazon_gahp was Condor's first means to
> interact with EC2 and was written to the (then popular) SOAP
> interface. Over the years the EC2 Query interface has
> apparently taken hold as the interface of choice, with many
> EC2 clones not supporting SOAP. In response, the ec2_gahp has
> been written, available in 7.7, against the Query interface.
> You should try it out, especially on a day when the SOAP
> interface is failing, so that we might get a better handle on
> whether the issue is truly SOAP vs. Query.
>
> Best,
>
>
> matt
>
>
>
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
> 858-822-3628 (Ofc)
> 619-331-2990 (Fax)