[Dirvish] Dirvish jobs that run more than 24 hours

Asheesh Laroia asheesh at asheesh.org
Tue Dec 9 20:44:17 UTC 2008


On Tue, 9 Dec 2008, Keith Lofstrom wrote:

> One of my offsite VPN client machines was moved to a slower internet 
> link.  Meanwhile, in October it went through a major automated upgrade. 
> As a result, the nightly rsync job took longer than 24 hours.  The next 
> night's backup started before the first completed, which attempted to 
> move all those files again, and slowed the link down further. 
> Cascading failures. I was incredibly busy, so I turned off dirvish to 
> that machine rather than fix the problem.

I use Dirvish for >24h backups, too.

I use Dirvish from the Debian packages, which ship a cron job.  I'm 
attaching the original Debian cron job and my modified one.  My changes 
center on a PID file, /var/lock/dirvish-cronjob, which lets me be sure 
that I match not just any rsync, but the actual dirvish cron job.
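
The PID-file idea can be sketched roughly as below.  This is a minimal 
illustration, not the attached script: the function names and the 
overridable LOCKFILE variable are made up for the example, and it uses 
"kill -0" instead of ps to test whether the recorded PID is alive.  It 
assumes the job runs as root (as the Debian cron job does), since 
"kill -0" also fails for processes you are not allowed to signal.

```shell
#!/bin/sh
# Hypothetical sketch of a PID-file lock for a cron job.
# LOCKFILE and the function names are illustrative assumptions,
# not part of Dirvish or the Debian package.
LOCKFILE="${LOCKFILE:-/var/lock/dirvish-cronjob}"

# Returns 0 only if the lock file names a process that is still alive.
lock_is_held() {
	# A missing or empty lock file means nobody holds the lock.
	[ -s "$LOCKFILE" ] || return 1
	other_pid=$(cat "$LOCKFILE" 2>/dev/null)
	# Treat anything but a plain number as a stale or corrupt lock.
	case "$other_pid" in
		''|*[!0-9]*) return 1 ;;
	esac
	# kill -0 sends no signal; it only checks that the PID exists
	# (and that we may signal it, hence the root assumption).
	kill -0 "$other_pid" 2>/dev/null
}

# Record our own PID as the lock holder.
take_lock() {
	echo "$$" > "$LOCKFILE"
}

# Drop the lock once the backup run has finished.
release_lock() {
	rm -f "$LOCKFILE"
}
```

A cron job would check lock_is_held first and exit with a message if it 
succeeds, then call take_lock, run the backup, and call release_lock. 
There is still a small window between the check and take_lock, which is 
harmless for a job that starts once a night but would matter if two 
copies could start at the same moment.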

A problem with Keith's suggestion is that if any user at all is running 
rsync, then the dirvish cron job will fail to start.

I hereby permit distribution of this modification under the same terms as 
Dirvish itself.

My version also has the cron job print a message saying why it did not 
run, so that you get a nightly email telling you what happened.

-- Asheesh.

-- 
Truthful, adj.:
 	Dumb and illiterate.
 		-- Ambrose Bierce, "The Devil's Dictionary"
-------------- next part --------------
#! /bin/sh
#
# daily cron job for the dirvish package
#
if [ ! -x /usr/sbin/dirvish-expire  ]; then exit 0; fi
if [ ! -s /etc/dirvish/master.conf ]; then exit 0; fi

mount_check() {
	mntout=`tempfile -p mount`
	mount $1 >$mntout 2>&1
	if [ ! -d $1/lost+found ]; then # only works for "real" filesystems :-)
					# (Yes, I know about reiserfs.)
		echo "'mount $1' failed?! Stopping."
		echo "mount output:"
		cat $mntout
		rm -f $mntout
		exit 2
	fi

	if stat $1 | grep 'Inode: 2[^0-9]' >/dev/null; then # ditto
		rm -f $mntout
		return 0 # ok
	fi
	echo "$1 isn't inode 2 ?! Mount must have failed; stopping."
	echo ''
	stat $1
	echo "mount output:"
	cat $mntout
	rm -f $mntout
	umount $1
	exit 2
}

## Example of how to mount and umount a backup partition...
# mount_check /backup

/usr/sbin/dirvish-expire --quiet && /usr/sbin/dirvish-runall --quiet
rc=$?

# umount /backup || rc=$?

exit $rc
-------------- next part --------------
#! /bin/sh
#
# daily cron job for the dirvish package
#
if [ ! -x /usr/sbin/dirvish-expire  ]; then exit 0; fi
if [ ! -s /etc/dirvish/master.conf ]; then exit 0; fi

mount_check() {
	mntout=`tempfile -p mount`
	mount $1 >$mntout 2>&1
	if [ ! -d $1/lost+found ]; then # only works for "real" filesystems :-)
					# (Yes, I know about reiserfs.)
		echo "'mount $1' failed?! Stopping."
		echo "mount output:"
		cat $mntout
		rm -f $mntout
		exit 2
	fi

	if stat $1 | grep 'Inode: 2[^0-9]' >/dev/null; then # ditto
		rm -f $mntout
		return 0 # ok
	fi
	echo "$1 isn't inode 2 ?! Mount must have failed; stopping."
	echo ''
	stat $1
	echo "mount output:"
	cat $mntout
	rm -f $mntout
	umount $1
	exit 2
}

## Example of how to mount and umount a backup partition...
# mount_check /backup

## Asheesh's locking addition
fail() {
	echo "Cron job currently running; I'm outta here."
	exit 1;
}


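die_if_dirvish_locked() {
	OTHER_PID=$(cat /var/lock/dirvish-cronjob 2>/dev/null)
	# If the PID file exists and names a process that is still running,
	# another cron job is active, so bail out.
	# (Redirect stdout first, then stderr, so both are discarded.)
	[ -f /var/lock/dirvish-cronjob ] &&
	ps -p "$OTHER_PID" >/dev/null 2>&1 && fail
}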

lock_dirvish() {
	MY_PID=$$
	echo "$MY_PID" > /var/lock/dirvish-cronjob
}

unlock_dirvish() {
	rm -f /var/lock/dirvish-cronjob
}

die_if_dirvish_locked
lock_dirvish

/usr/sbin/dirvish-expire --quiet && /usr/sbin/dirvish-runall --quiet
rc=$?

# umount /backup || rc=$?

unlock_dirvish

exit $rc

