[Dirvish] mysteriously missing snapshots

Jason Boxman jasonb at edseek.com
Mon Jul 26 16:31:12 UTC 2010

On 7/26/2010 11:37 AM, Dave Howorth wrote:
> Jason Boxman wrote:
>> Yes.  Missing in that the expected daily snapshot directory
>> doesn't appear when I go looking for it.  dirvish-runall won't remove
>> them, so it's either dirvish-expire or the missing directory was never
>> created.  I set expire-default +60 and have no special expire rules set
>> anywhere, though.
> As long as you run dirvish-expire before dirvish-runall, there shouldn't
> be any possibility of it deleting a new snapshot, because it won't exist
> yet.
>>> Have you tried running the missing snapshots manually to see what happens?
>> Not yet.  I suspect they will succeed since every vault with a missing
>> snapshot ultimately has 90% of the other snapshot directories.
> As long as it's a random problem with just a few snapshots I guess that
> means you still have a reasonably up-to-date backup of everything (two
> days old or better?)

Yeah, but if it's unpredictable that mostly defeats the point, eh? ;)

>> Yeah, kind of operating in the dark here.  Sigh.
> One other thought I had is to check the logs on the clients to see
> whether they've been accessed by ssh or rsyncd or whatever you're using.
> They might show a rejected connection attempt or a timeout or somesuch.

On a host with a missing snapshot, there was an ssh connection open for
about 15 minutes during the backup window.  Munin also shows the usual
backup activity (interrupts, load average, etc.) on one of the hosts
that has no snapshot directory.  Strange indeed.
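
To pin down exactly which days are affected, something like the
following could list the dates that have no snapshot directory in a
vault.  This is only a rough sketch: it assumes the image directories
are named by date as YYYYMMDD (that depends on the image-default
setting, so the format may need adjusting), and missing_snapshots and
the example vault path are made-up names, not part of dirvish.

```python
from datetime import date, timedelta
from pathlib import Path


def missing_snapshots(vault_dir, start, end, fmt="%Y%m%d"):
    """Return the dates in [start, end] that have no image directory.

    Assumption: each image directory inside the vault is named after
    its date using `fmt` (YYYYMMDD by default).  Adjust `fmt` to match
    the image-default setting actually in use.
    """
    vault = Path(vault_dir)
    # Names of all subdirectories currently present in the vault.
    present = {p.name for p in vault.iterdir() if p.is_dir()}

    missing = []
    day = start
    while day <= end:
        if day.strftime(fmt) not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

For example, missing_snapshots("/backup/somevault", date(2010, 7, 1),
date(2010, 7, 26)) would return the days in that range with no
directory (the path here is hypothetical).  Comparing that list
against the ssh and Munin timestamps might show whether the gaps line
up with anything.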

