Re: How to fix Bacula after tape autoloader dysfunction?

Charles
On 27/12/16 23:30, [hidden email] wrote:

> ------------------------------
>
> Message: 8
> Date: Fri, 23 Dec 2016 10:13:58 +0530
> From: Charles <[hidden email]>
> Subject: [Bacula-users] How to fix Bacula after tape autoloader
> dysfunction?
> To: [hidden email]
> Cc: Aurinoco Systems <[hidden email]>
> Message-ID:
> <[hidden email]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Hello bacula-users :-)
>
> Following misbehaviour by an HP StoreEver 18G2 LTO-6 Ultrium 6250 Tape
> Autoloader, the list volume command shows two tapes with VolStatus Error
> and one tape as Full with VolBytes 12,901,819,392.
>
> We do not believe the information.  Surely an LTO-6 tape (2.5 TB native
> capacity) cannot be filled with about 12.9 GB.  And in a recent, similar
> incident a different tape was shown with Error.  We cleared that by
> re-initialising the Bacula database and the tapes.
>
> Happily we have a parallel backup system and anyway the Bacula system
> has not been running long since re-initialisation.  So we can
> re-initialise again.
>
> Alternatively what can we do to recover Bacula from this situation,
> assuming the autoloader is fixed?  The update command can be used to
> change VolStatus but that is a forceful override.  Is there anything
> equivalent to "update slots" for tapes, to ask Bacula to scan all the
> tapes, updating VolStatus, VolBytes etc?
>
> Best
>
> Charles

Hello bacula-users :-)

Are any tools available to mend Bacula after almost certainly spurious
errors reported by a tape autoloader?

Within a few minutes of Bacula starting to use an autoloader which had
been subject to an abrupt power outage, the autoloader reported a
critical tape alert.  Bacula shows the tape with status Error (simply
reflecting what the autoloader has told it?).

We believe the condition is spurious but we do not know how to recover
from it apart from re-initialising the tape which holds 97% of our
backup volume.

Best

Charles



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: How to fix Bacula after tape autoloader dysfunction?

Alan Brown
On 09/01/17 14:47, Charles wrote:

>
> Hello bacula-users :-)
>
> Are any tools available to mend Bacula after almost certainly spurious
> errors reported by a tape autoloader?
>
> Within a few minutes of Bacula starting to use an autoloader which had
> been subject to an abrupt power outage, the autoloader reported a
> critical tape alert.  Bacula shows the tape with status Error (simply
> reflecting what the autoloader has told it?).

Are you sure it was the loader which had the critical error?


Most "critical errors" of this kind are bacula attempting to unload a
locked drive

(Lesson: issue explicit unlocking commands in your startup sequence)

Or they're an attempt to unload a tape from a drive where the loader has
lost track of what slot it came from.

(Lesson: issue explicit drive unload commands in your startup sequence.)


Yes, they're errors, but not really critical in the overall scheme of
things even if the loader thinks they are.


If you have multipath fibre/SAS/scsi to the drives, then bear in mind
that locks are logically ORed together.

i.e. if you lock drive Z from controller N and bacula starts using the
path from controller P, unlock commands from bacula will come from
controller P and the drive will remain locked by N.

This caught us out for a long time. I eventually wrote a small shell
script which worked out what the paths were to any given drive and
issued unlock commands for ALL of them. This was grafted into a
localised MTX-changer script.
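
Alan's script is not shown in the thread. A minimal sketch of the idea, assuming the device nodes for every path to the drive are already known and passed as arguments, and assuming the installed mt is mt-st (which has an unlock operation), might look like this. The function name and the MT override variable are illustrative only.

```shell
#!/bin/bash
# Sketch: send "unlock" down every path to one tape drive.
# Assumptions (not from the thread): the device nodes are passed as
# arguments, and the installed mt is mt-st, which has an "unlock" operation.
# MT can be overridden to substitute another binary or a stub for testing.
MT="${MT:-mt}"

unlock_all_paths() {
    local dev failed=0
    for dev in "$@"; do
        # Keep going after a failure so every path gets an unlock attempt.
        if ! "$MT" -f "$dev" unlock; then
            echo "unlock failed on $dev" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example (hypothetical device nodes for two paths to the same drive):
#   unlock_all_paths /dev/nst0 /dev/nst4
```

The function returns non-zero if any path failed, so a changer script can decide whether to carry on. Working out which device nodes belong to the same physical drive is the harder part and is left out here.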


The overall lesson is simple: "Make sure your drives are empty, BEFORE
(re)starting bacula-sd"


Other than that: update volume={tapelabel} volstatus=append

(Or "used" if you simply want to put the tape in a safe)
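
For anyone scripting this rather than typing it at the console, a small sketch follows. It only builds the command text given above; the function name is made up for illustration, the volume label is whatever your tape is called, and piping into bconsole assumes bconsole is installed and configured on the host.

```shell
#!/bin/bash
# Sketch: build the console command given above so it can be piped to bconsole.
# Assumptions: bconsole is installed and configured on this host; the volume
# label is supplied by the caller. reset_volstatus_cmd is a hypothetical name.

reset_volstatus_cmd() {
    local volume=$1 status=${2:-append}
    printf 'update volume=%s volstatus=%s\n' "$volume" "$status"
}

# To apply it, pipe the text into the console, for example:
#   reset_volstatus_cmd MyTapeLabel append | bconsole
# or, to retire the tape to a safe:
#   reset_volstatus_cmd MyTapeLabel used | bconsole
```

Printing the command first, before piping it anywhere, makes it easy to check the label before the catalogue is changed.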


> We believe the condition is spurious but we do not know how to recover
> from it apart from re-initialising the tape which holds 97% of our
> backup volume.
>
> Best
>
> Charles

Re: How to fix Bacula after tape autoloader dysfunction?

Kern Sibbald
Hello Charles,

If you do not know who Alan Brown is, I can say that he is at least 10
times more knowledgeable about the use of tape drives with Bacula than I
am, so I defer to his analysis.  The one thing that does not make sense
to me, however, is that your tapeinfo did not show any Tape Alerts, which
I would have expected.

Best regards,
Kern




Re: How to fix Bacula after tape autoloader dysfunction?

Charles
In reply to this post by Alan Brown
On 09/01/17 22:39, Alan Brown wrote:
> On 09/01/17 14:47, Charles wrote:
>> ... Bacula shows the tape with status Error (simply
>> reflecting what the autoloader has told it?).
>
> Are you sure it was the loader which had the critical error?

No

> Most "critical errors" of this kind are bacula attempting to unload a
> locked drive

Confirmed in bacula.log

> (Lesson: issue explicit unlocking commands in your startup sequence)
>
> Or they're an attempt to unload a tape from a drive where the loader has
> lost track of what slot it came from.
>
> (Lesson: issue explicit drive unload commands in your startup sequence.)

Have modified /etc/init.d/bacula-sd (Debian Jessie), adding mt rewoffl and
mtx unload commands immediately before bacula-sd is started.

> ...
> Other than that: update volume={tapelabel} volstatus=append

Done

Many thanks for sharing your insights Alan :)

Comments inline above.

The only wrinkle was with the autoloader.  Presumably it marked a tape
in error because Bacula did.  Clearing the Bacula status did not clear
the autoloader status until the tape was loaded to the drive.

Re: How to fix Bacula after tape autoloader dysfunction?

Charles
In reply to this post by Kern Sibbald
On 10/01/17 03:17, Kern Sibbald wrote:
> ...The one thing that does not make sense
> to me, however, is your tapeinfo did not show any Tape Alerts ...

Thanks for being thorough, Kern.

Our bacula-sd.conf's Device stanza for the autoloader did not have an
Alert Command directive.  Now fixed.
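
For reference, the directive usually looks something like the fragment below, which follows the example in the Bacula manual. Treat it as a sketch: whether %c points at the right SCSI generic node for tapeinfo depends on how the Device resource is set up, so check it against your own configuration.

```
# bacula-sd.conf, inside the tape drive's Device { } resource.
# %c is substituted by Bacula with the control/changer device name.
# tapeinfo comes from the mtx package.
Alert Command = "sh -c 'tapeinfo -f %c | grep TapeAlert'"
```

With this in place the Storage Daemon runs the command after each job and any TapeAlert lines end up in the job report.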



Re: How to fix Bacula after tape autoloader dysfunction?

Alan Brown
In reply to this post by Charles
On 10/01/17 06:04, Charles wrote:
>
> Have modified /etc/init.d/bacula-sd (Debian Jessie), adding mt rewoffl
> and mtx unload commands immediately before bacula-sd is started.
>

That should do the trick, although I'd add mt unlock to make absolutely
sure.

>> ...
>> Other than that: update volume={tapelabel} volstatus=append
>
> Done
>
> Many thanks for sharing your insights Alan :)

No problem

>
> Comments inline above.
>
> The only wrinkle was with the autoloader.  Presumably it marked a tape
> in error because Bacula did.  Clearing the Bacula status did not clear
> the autoloader status until the tape was loaded to the drive.
>

Bacula takes no notice of autoloader status and mtx won't show loader
status anyway (other than full/empty).
What make/model is the loader?
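
Since full/empty is what mtx does report, a pre-start check that the drive is empty can be built on it. The sketch below assumes the usual mtx status layout shown in the comment; the function name is illustrative, and the output of your own loader should be checked before relying on the pattern.

```shell
#!/bin/bash
# Sketch: decide from "mtx status" output whether a drive is empty.
# Assumption: status lines look like
#   Data Transfer Element 0:Empty
#   Data Transfer Element 0:Full (Storage Element 3 Loaded)
# which is the usual mtx layout; verify against your own loader.

drive_is_empty() {
    # Reads "mtx status" text on stdin; $1 is the drive number (default 0).
    local drive=${1:-0}
    grep -Eq "^[[:space:]]*Data Transfer Element ${drive}:Empty"
}

# Example (the changer device node is site-specific):
#   mtx -f /dev/sg5 status | drive_is_empty 0 && echo "drive 0 is empty"
```

The function only inspects text, so it can be tried against saved mtx output without touching the hardware.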





Re: How to fix Bacula after tape autoloader dysfunction?

Charles
On 10/01/17 22:25, Alan Brown wrote:
> On 10/01/17 06:04, Charles wrote:
>>
>> Have modified /etc/init.d/bacula-sd (Debian Jessie), adding mt rewoffl
>> and mtx unload commands immediately before bacula-sd is started.
>>
>
> That should do the trick, although I'd add mt unlock to make absolutely
> sure.

According to both the man pages and experimentation, Debian Jessie's mt
and mtx do not support an unlock command.

> Bacula takes no notice of autoloader status and mtx won't show loader
> status anyway (other than full/empty).
> What make/model is the loader?

HP StoreEver 18G2 LTO-6 Ultrium 6250 Tape Autoloader

tapeinfo reports: HP Ultrium 6-SCSI revision 35GW


Re: How to fix Bacula after tape autoloader dysfunction?

Alan Brown
On 11/01/17 05:16, Charles wrote:
>
> According to both the man pages and experimentation, Debian Jessie's
> mt and mtx do not support an unlock command.

Whilst other debian versions do

# mt --version
mt-st v. 1.3
default tape device: '/dev/tape'


      lock   (SCSI tapes) Lock the tape drive door.

       unlock (SCSI tapes) Unlock the tape drive door.


You might want to check which version of mt you have installed.


(mt -f /dev/nst0 lock)

Note that lock/unlock will _only_ work if there's a tape in the drive.
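
A startup script can make the same check automatically before it tries to unlock anything. The sketch below classifies the first line of mt --version output; the mt-st banner is the one shown above, and the GNU cpio banner is assumed to contain the text "GNU cpio". The function name is illustrative.

```shell
#!/bin/bash
# Sketch: tell mt-st (which has lock/unlock) from the mt shipped with
# GNU cpio (which does not), using the first line of "mt --version".
# Assumption: the GNU cpio banner contains the text "GNU cpio".

mt_flavour() {
    # $1: first line of "mt --version" output
    case $1 in
        mt-st*)         echo mt-st ;;
        *"GNU cpio"*)   echo gnu-cpio ;;
        *)              echo unknown ;;
    esac
}

# Example:
#   flavour=$(mt_flavour "$(mt --version 2>&1 | head -n 1)")
#   [ "$flavour" = mt-st ] && mt -f /dev/nst0 unlock
```

Anything that is not recognised is reported as unknown, so the caller can skip the unlock step instead of failing on an unsupported operation.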





Re: How to fix Bacula after tape autoloader dysfunction?

Charles
On 11/01/17 15:01, Alan Brown wrote:

> On 11/01/17 05:16, Charles wrote:
>>
>> According to both the man pages and experimentation, Debian Jessie's
>> mt and mtx do not support an unlock command.
>
> Whilst other debian versions do
>
> # mt --version
> mt-st v. 1.3
> default tape device: '/dev/tape'
>
>
>       lock   (SCSI tapes) Lock the tape drive door.
>
>        unlock (SCSI tapes) Unlock the tape drive door.
>
>
> You might want to check the version of MT you have installed.
>
>
> (mt -f /dev/nst0 lock)
>
> Note that lock/unlock will _only_ work if there's a tape in the drive.

Thanks Alan

[hidden email]:~# mt -f /dev/st0 lock
mt: invalid argument `lock' for `operation'
Valid arguments are:
   - `eof', `weof'
   - `fsf'
   - `bsf'
   - `fsr'
   - `bsr'
   - `rewind'
   - `offline', `rewoffl', `eject'
   - `status'
   - `bsfm'
   - `eom'
   - `retension'
   - `erase'
   - `asf'
   - `fsfm'
   - `seek'
[hidden email]:~# mt --version
mt (GNU cpio) 2.11
...

To be thorough, the bacula-sd init script should not only run the mt
rewoffl and mtx unload commands; it should also error trap their output.
I suspect that, when our autoloader and backup server are powered up
at the same time, the autoloader does not finish initialising before the
backup server runs the bacula-sd init script.  In that case the mt and
mtx commands are ineffective and SD finds a tape in the drive, resulting
in the tape getting Bacula status Error.

More after I have tested error trapping the mt rewoffl and mtx unload
commands.

Best

Charles


Re: How to fix Bacula after tape autoloader dysfunction?

Charles
On 12/01/17 16:17, Charles wrote:
...

> To be thorough, the bacula-sd init script should not only run the mt
> rewoffl and mtx unload commands; it should also error trap their output.
>  I suspect that, when our autoloader and backup server are powered up at
> the same time, the autoloader does not finish initialising before the
> backup server runs the bacula-sd init script.  In which case the mt and
> mtx commands are ineffective and SD finds a tape in the drive resulting
> in the tape getting Bacula status Error.
>
> More after I have tested error trapping the mt rewoffl and mtx unload
> commands.

Here's my solution.  Beware of line wraps.


[hidden email]:~# diff -u /etc/init.d/bacula-sd{.org,}
--- /etc/init.d/bacula-sd.org 2016-08-17 16:15:12.000000000 +0530
+++ /etc/init.d/bacula-sd 2017-01-15 20:08:37.125034780 +0530
@@ -20,6 +20,10 @@
  #       Customized for Bacula by Jose Luis Tallon <[hidden email]>
  #

+# 15 Jan 2017 Charles for Support #3484 "autoloader1.iciti.av: cartridge
+#    AS0008L6: media error"
+#    * Added AUTOLOADER_INIT
+
  set -e

  PATH=/sbin:/bin:/usr/sbin:/usr/bin
@@ -42,6 +46,7 @@

  CONFIG="${CONFIG:-/etc/bacula/$NAME.conf}"
  STOPTIMEOUT="${STOPTIMEOUT:-180}"
+AUTOLOADER_INIT="${AUTOLOADER_INIT:-/usr/local/etc/init.d/bacula-sd-autoloader}"

  create_var_run_dir

@@ -50,6 +55,15 @@
  do_start()
  {
  if $DAEMON -t -c $CONFIG $ARGS > /dev/null 2>&1; then
+        if [ -x "$AUTOLOADER_INIT" ]; then
+            log_progress_msg "- running $AUTOLOADER_INIT"
+            "$AUTOLOADER_INIT"
+            if [ $? != 0 ]; then
+                log_failure_msg \
+                    "Not starting $DESC: could not initialise autoloader"
+                return 1
+            fi
+        fi
  start-stop-daemon --start --quiet --pidfile $PIDFILE \
  --oknodo --exec $DAEMON -- -c $CONFIG $ARGS
  return 0


[hidden email]:~# cat /usr/local/etc/init.d/bacula-sd-autoloader
#!/bin/bash

# Copyright (C) 2017 Charles Atkinson
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA

# Helper script for /etc/init.d/bacula-sd.
# Ensures the autoloader is suitably initialised

# Usage:
#   * No arguments or options
#   * Configuration by /etc/default/bacula-sd-autoloader
#   * Intended to be called by /etc/init.d/bacula-sd
#   * /etc/init.d/bacula-sd should fail if this script returns non-zero

# 15 Jan 2017 Charles Atkinson
#   * Creation

# Do nothing if bacula-sd is already running
netstat -plnt | grep -q bacula-sd && exit 0

# Configuration
PATH=/sbin:/bin:/usr/sbin:/usr/bin
NAME=bacula-sd-autoloader

. /lib/lsb/init-functions
if [[ -r /etc/default/$NAME ]]; then
     . /etc/default/$NAME
else
     log_failure_msg "Not initialising autoloader: /etc/default/$NAME does not exist"
     exit 1
fi

if [[ $AUTOLOADER_DEV = '' ]]; then
     log_failure_msg "Not initialising autoloader: /etc/default/$NAME did not set AUTOLOADER_DEV"
     exit 1
fi

if [[ $TAPE_DEV = '' ]]; then
     log_failure_msg "Not initialising autoloader: /etc/default/$NAME did not set TAPE_DEV"
     exit 1
fi

# Default to 60 seconds when the configuration does not set TIMEOUT
TIMEOUT=${TIMEOUT:-60}

# Initialise the autoloader
declare -r rewoffl_OK_regex='^$|rmtopen failed: No medium found$'
# [[:digit:]]+ also accepts slot numbers above 9
declare -r unload_OK_regex='^(Unloading drive 0 into Storage Element [[:digit:]]+\.+done|Data Transfer Element 0 is Empty)$'

start_time=$(date +%s)
while true
do
     err_count=0
     err_msg=

     buf=$(mt -f "$TAPE_DEV" rewoffl 2>&1)
     if [[ ! $buf =~ $rewoffl_OK_regex ]]; then
        err_msg+=$'\n'$buf
        ((err_count++))
     fi

     buf=$(mtx -f "$AUTOLOADER_DEV" unload 2>&1)
     if [[ ! $buf =~ $unload_OK_regex ]]; then
        err_msg+=$'\n'$buf
        ((err_count++))
     fi
     ((err_count==0)) && break

     now_time=$(date +%s)
     if ((now_time-start_time>TIMEOUT)); then
         log_failure_msg "Unable to initialise autoloader: $err_msg"
         exit 1
     fi
     sleep 1    # pause between retries while the autoloader initialises
done

exit 0


[hidden email]:~# cat /etc/default/bacula-sd-autoloader
# Configuration script for /usr/local/etc/init.d/bacula-sd-autoloader

# Required conf values
AUTOLOADER_DEV=/dev/sg5
TAPE_DEV=/dev/st0

# Optional conf value
#TIMEOUT=120    # Seconds to wait for autoloader to become available. Default 60


Best

Charles
