HTCondor Project List Archives



[Condor-devel] Patches to src/condor_shadow.V6/pseudo_ops.C [1/3]



The first patch fixes a simple logic error.  The test for retry_wait
being greater than MaxRetryWait should be made before sleep() is
called, not after it.  As the code stands, the final sleep is wasted:
retry_wait is doubled right after the sleep, the doubled value exceeds
MaxRetryWait, and the exception is taken immediately without another
attempt.  Why tie up a machine for 2560 seconds (the max value
retry_wait will assume) simply to do nothing?
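
To illustrate the reordering described above, here is a minimal
standalone sketch of the two loop orderings.  It is not the shadow
code: INITIAL_WAIT and MAX_RETRY_WAIT are hypothetical values picked
only so that the largest wait comes out to 2560 seconds,
always_fails() stands in for a request that never succeeds, adding to
a counter stands in for sleep(), and break stands in for EXCEPT().

```c
/* Hypothetical values, chosen so the largest wait is 2560 seconds. */
enum { INITIAL_WAIT = 5, MAX_RETRY_WAIT = 3600 };

struct result {
    int  attempts;  /* requests issued before giving up */
    long slept;     /* total seconds that would have been slept */
};

/* Stand-in for a checkpoint server request that never succeeds. */
static int always_fails(void) { return -1; }

/* Original ordering: sleep, double, then test. */
static struct result run_original(void)
{
    struct result r = { 0, 0 };
    int retry_wait = INITIAL_WAIT;
    int rval;

    do {
        rval = always_fails();
        r.attempts++;
        if (rval) {
            r.slept += retry_wait;      /* stands in for sleep(retry_wait) */
            retry_wait *= 2;
            if (retry_wait > MAX_RETRY_WAIT) {
                break;                  /* stands in for EXCEPT() */
            }
        }
    } while (rval);
    return r;
}

/* Patched ordering: test first, then sleep and double. */
static struct result run_patched(void)
{
    struct result r = { 0, 0 };
    int retry_wait = INITIAL_WAIT;
    int rval;

    do {
        rval = always_fails();
        r.attempts++;
        if (rval) {
            if (retry_wait > MAX_RETRY_WAIT) {
                break;                  /* stands in for EXCEPT() */
            }
            r.slept += retry_wait;      /* stands in for sleep(retry_wait) */
            retry_wait *= 2;
        }
    } while (rval);
    return r;
}
```

With these assumed values both orderings account for 5115 seconds of
sleeping, but the original gives up after 10 attempts, right after the
2560 second sleep, while the patched ordering makes an 11th attempt
once that sleep is over.
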

--- src/condor_shadow.V6/pseudo_ops.C.ORIG	Tue Mar  9 23:58:29 2004
+++ src/condor_shadow.V6/pseudo_ops.C.SAVE	Thu Dec 16 14:47:59 2004
@@ -706,13 +706,13 @@
 								  file,len,
 								  (struct in_addr*)ip_addr,port);
 			if (rval) { // network error, try again
+				if (retry_wait > MaxRetryWait) {
+					EXCEPT("ckpt server restore failed");
+				}
 				dprintf(D_ALWAYS, "ckpt server restore failed, trying again"
 						" in %d seconds\n", retry_wait);
 				sleep(retry_wait);
 				retry_wait *= 2;
-				if (retry_wait > MaxRetryWait) {
-					EXCEPT("ckpt server restore failed");
-				}
 			}
 		} while (rval);
 
@@ -880,13 +880,13 @@
 			rval = RequestStore(p->owner, scheddName, file, len,
 								(struct in_addr*)ip_addr, port);
 			if (rval) {	/* request denied or network error, try again */
+				if (retry_wait > MaxRetryWait) {
+					EXCEPT("ckpt server store failed");
+				}
 				dprintf(D_ALWAYS, "store request to ckpt server failed, "
 						"trying again in %d seconds\n", retry_wait);
 				sleep(retry_wait);
 				retry_wait *= 2;
-				if (retry_wait > MaxRetryWait) {
-					EXCEPT("ckpt server store failed");
-				}
 			}
 		} while (rval);
 
@@ -2195,13 +2195,13 @@
 			}
 		}
 		if(rval == -1 && LastCkptServer && accum_usage > MaxDiscardedRunTime) {
+			if (retry_wait > MaxRetryWait) {
+				EXCEPT("failed to contact ckpt server");
+			}
 			dprintf(D_ALWAYS, "failed to contact ckpt server, trying again"
 					" in %d seconds\n", retry_wait);
 			sleep(retry_wait);
 			retry_wait *= 2;
-			if (retry_wait > MaxRetryWait) {
-				EXCEPT("failed to contact ckpt server");
-			}
 		}
 	} while(rval == -1 && LastCkptServer && accum_usage > MaxDiscardedRunTime);
 	if (rval == -1) { /* not on local disk & not using ckpt server */

-- 
Daniel K. Forrest	Laboratory for Molecular and
forrest@xxxxxxxxxxxxx	Computational Genomics
(608) 262 - 9479	University of Wisconsin, Madison