I believe with the new ec2_gahp you need "grid_resource = ec2 https://ec2.amazonaws.com/"
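A minimal sketch of how the submit file from this thread might look after switching grid types, assuming the ec2_* submit commands documented for the 7.7 series. As far as I know, the ec2 grid type reads an access key ID and a secret access key from files instead of the key/certificate pair the amazon type used. The two credential paths below are hypothetical placeholders; the other values are carried over from the original submit file.

```
universe = grid
grid_resource = ec2 https://ec2.amazonaws.com/
executable = RunEC2VM
# Assumed paths: files holding the AWS access key ID and secret access key
ec2_access_key_id = /home/phil/.ec2/access_key
ec2_secret_access_key = /home/phil/.ec2/secret_key
ec2_ami_id = ami-4ed12d27
ec2_instance_type = m1.large
ec2_keypair_file = keypair.$(Process)
ec2_user_data = condor:landphil.rocksclusters.org:40000:50000
queue 1
```

With this grid type the gridmanager is expected to locate the helper through the EC2_GAHP configuration setting (for example, EC2_GAHP = $(SBIN)/ec2_gahp) rather than through AMAZON_GAHP; treat that as something to verify against the manual for the build in use.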
Best,
matt
On 06/23/2011 07:30 PM, Philip Papadopoulos wrote:
Still no love....
I git cloned the head of the Condor tree, rebuilt it, and
copied condor_submit, condor_gridmanager, and ec2_gahp into bin, sbin, and sbin respectively.
I changed the condor config to use the new gahp.
$ condor_config_val -dump | grep AMAZON
AMAZON_GAHP = $(SBIN)/ec2_gahp
AMAZON_GAHP_LOG = /tmp/AmazonGahpLog.$(USERNAME)
GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE_AMAZON = 20
And then submitted with
universe = grid
grid_resource = amazon https://ec2.amazonaws.com/
periodic_release = NumHolds < 3
+NumHolds = 0
periodic_remove = NumHolds >= 3 || (JobStatus == 2 && time()-ShadowBday > 1*60*60)
executable = RunEC2VM
amazon_keypair_file = keypair.$(Process)
amazon_ami_id = ami-4ed12d27
amazon_instance_type = m1.large
amazon_user_data = condor:landphil.rocksclusters.org:40000:50000
amazon_private_key = /home/phil/.ec2/pk.pem
amazon_public_key = /home/phil/.ec2/cert.pem
queue 1
as before.
GridManager.log shows
06/23/11 16:21:27 Setting maximum accepts per cycle 4.
06/23/11 16:21:29 [27034] ================================> AmazonJob::AmazonJob 1
06/23/11 16:21:29 [27034] Found job 2.0 --- inserting
06/23/11 16:21:29 [27034] gahp server not up yet, delaying ping
06/23/11 16:21:29 [27034] (2.0) doEvaluateState called: gmState GM_INIT, condorState 1
06/23/11 16:21:29 [27034] GAHP server pid = 27038
06/23/11 16:21:34 [27034] ERROR "Bad AMAZON_VM_STATUS_ALL Request: E" at line 2256 in file /state/partition1/condor/src/condor_gridmanager/gahp-client.cpp
From this same node, I can use the EC2-native tools to start, stop, and query
instances, e.g.
$ ec2-describe-instances
RESERVATION r-ef3f0283 126101316194 default
INSTANCE i-d91433b7 ami-4ed12d27 ec2-50-17-131-129.compute-1.amazonaws.com ip-10-110-235-155.ec2.internal running 0 m1.large 2011-06-23T23:05:41+0000 us-east-1c aki-e5c1218c monitoring-disabled 50.17.131.129 10.110.235.155 instance-store paravirtual xen sg-427ca02b
and
ec2-terminate-instances i-d91433b7
INSTANCE i-d91433b7 running shutting-down
-P
On Thu, Jun 23, 2011 at 7:41 AM, Philip Papadopoulos wrote:
I will try that when I get in this AM (I'm on the west coast) and
report back.
Thanks,
Phil
On Thu, Jun 23, 2011 at 7:34 AM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
You could extract just condor_submit, condor_gridmanager, and ec2_gahp.
Cheers,
Tim
On Thu, 2011-06-23 at 07:26 -0700, Philip Papadopoulos wrote:
> Do I need all of condor 7.7 or can I just extract the ec2_gahp
> executable from it?
>
> Thanks,
> Phil
>
>
>
> On Thu, Jun 23, 2011 at 4:56 AM, Matthew Farrellee wrote:
>
> On 06/22/2011 02:49 PM, Philip Papadopoulos wrote:
>
>
> Trying out Condor 7.6.1 -- installed via the
> rhap.stripped.tar.gz
>
> I get the following in my GAHP log.
> 06/22/11 09:33:37 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
> 06/22/11 09:38:38 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
> "End of file or no input: Operation interrupted or timed out"
> Detail: [no detail]
>
> 06/22/11 09:38:38 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
> 06/22/11 09:42:08 EOF reached on pipe 0
> 06/22/11 09:42:08 stdin buffer closed, exiting
> 06/22/11 09:47:19 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
> "End of file or no input: Operation interrupted or timed out"
> Detail: [no detail]
>
> 06/22/11 09:47:19 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
> 06/22/11 09:48:33 EOF reached on pipe 0
> 06/22/11 09:48:33 stdin buffer closed, exiting
> 06/22/11 09:49:18 Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
> "End of file or no input: Operation interrupted or timed out"
> Detail: [no detail]
>
> 06/22/11 09:49:18 Command(AMAZON_VM_STATUS_ALL) got error(code:Client, msg:End of file or no input: Operation interrupted or timed out
>
>
> The submission file is simple:
> universe = grid
> grid_resource = amazon https://ec2.amazonaws.com/
> periodic_release = NumHolds < 3
> +NumHolds = 0
> periodic_remove = NumHolds >= 3 || (JobStatus == 2 && time()-ShadowBday > 1*60*60)
> executable = RunEC2VM
> amazon_keypair_file = keypair.$(Process)
>
> amazon_ami_id = ami-4ed12d27
> amazon_instance_type = m1.large
> amazon_user_data = condor:landphil.rocksclusters.org:40000:50000
> amazon_private_key = /home/phil/.ec2/pk.pem
> amazon_public_key = /home/phil/.ec2/cert.pem
>
> queue 1
>
>
> And the condor_config_val output (the salient settings, I think):
> $ condor_config_val -dump | grep -i amazon
> AMAZON_GAHP = $(SBIN)/amazon_gahp
> AMAZON_GAHP_LOG = /tmp/AmazonGahpLog.$(USERNAME)
>
> GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE_AMAZON = 20
>
> and
> $ condor_config_val -dump | grep -i ssl
> SOAP_SSL_CA_FILE = /etc/pki/tls/cert.pem
> SOAP_SSL_SKIP_HOST_CHECK = True
>
> I've tried both with and without SOAP_SSL_SKIP_HOST_CHECK.
> The file named by SOAP_SSL_CA_FILE exists.
> If I try WITHOUT the
> SOAP_SSL_CA_FILE = /etc/pki/tls/cert.pem
> setting, then I get
> Call to DescribeInstances failed: SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
> "SSL_ERROR_SSL
> error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"
> Detail: SSL connect failed in tcp_connect()
>
>
> Right now I'm flummoxed.
>
> Thanks,
> Phil
>
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
> 858-822-3628 (Ofc)
> 619-331-2990 (Fax)
>
> Phil,
>
> I'm assuming you aren't getting those errors 100% of the time, and
> that you're actually talking to AWS's EC2 service.
>
> I've seen similar intermittent issues in the past. They came and
> went by days. After much investigation, I eventually chalked them
> up to transient issues with AWS's EC2 SOAP interface. The
> amazon_gahp was Condor's first means of interacting with EC2 and
> was written against the (then popular) SOAP interface. Over the
> years the EC2 Query interface has apparently taken hold as the
> interface of choice, with many EC2 clones not supporting SOAP. In
> response, the ec2_gahp, available in 7.7, has been written against
> the Query interface. You should try it out, especially on a day
> when the SOAP interface is failing, so that we might get a better
> handle on whether the issue is truly SOAP vs. Query.
>
> Best,
>
>
> matt
>
>
>
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
> 858-822-3628 (Ofc)
> 619-331-2990 (Fax)
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@cs.wisc.edu with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)