Bacula 9.0.2 testing

Phil Stracchino-2
I've now got everything updated to 9.0.2 using a work-in-progress
development version of the app-backup/bacula-9.0.2 ebuild.  I'm running
against MariaDB 10.2.7 (which more or less approximates MySQL 5.7) with
Galera enabled.


Build platforms:
Gentoo Linux on amd64 (AMD Phenom II, Thuban microarchitecture) using
gcc 6.3.0
Solaris 10u9 on amd64 (Intel P4 Xeon, Nocona microarchitecture) using
Solaris Studio 12.2
Solaris 11.3 on amd64 (AMD Opteron 2384, Shanghai microarchitecture)
using gcc 4.9.4

Build considerations:  Solaris 10 required the tgoto prototype in
conio.c to be moved down one line.  No other build issues were
encountered, except that enabling the storage daemon build also forces
the director to be built, even when the director is explicitly disabled.


I did change the DB write batch size limit at sql_create.c:870 from
500000 to 1000 per Galera best-performance recommendations.  I was able
to complete incremental backups and some differential backups.  I was
able to successfully run jobs that backed up as many as 120,000 files,
with wsrep_max_rows at its default of 128K.  A differential job that
tried to back up 177,000 files failed with wsrep_max_rows_exceeded.  If
that is truly the only place in the code that the write batch size is
set, then it appears database write batching is not actually working.

I maintain that even without Galera, 500000 is an unreasonably large
batch size.  Just because a modern database *can* handle it doesn't make
it a good idea.  50000 would be more reasonable, and 10000 would be better.
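The numbers above are consistent with the whole attribute set going out as a single transaction. Below is a minimal sketch of that arithmetic, not Bacula's code. `transaction_sizes()` and `fits_limit()` are hypothetical helpers, and the limit is assumed to be 131,072 rows, the "128K" default cited above. With working batches of 1,000 rows, no transaction comes near the limit. With no effective batching, 120,000 rows fit and 177,000 do not, which matches what was observed.

```python
from typing import List, Optional

# Assumed per-write-set row limit; the post cites a default of "128K" rows.
GALERA_MAX_ROWS = 128 * 1024  # 131072


def transaction_sizes(n_rows: int, batch_size: Optional[int]) -> List[int]:
    """Rows per transaction when inserting n_rows rows.

    batch_size=None models batching that never takes effect, so every row
    lands in one transaction.
    """
    if batch_size is None:
        return [n_rows]
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(n_rows, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def fits_limit(n_rows: int, batch_size: Optional[int],
               limit: int = GALERA_MAX_ROWS) -> bool:
    """True if no single transaction exceeds the row limit."""
    return all(size <= limit for size in transaction_sizes(n_rows, batch_size))


if __name__ == "__main__":
    print(fits_limit(120_000, None))   # True:  one 120,000-row transaction
    print(fits_limit(177_000, None))   # False: 177,000 > 131,072
    print(fits_limit(177_000, 1000))   # True:  177 transactions of 1,000 rows
```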



Problems encountered so far, running the Director and both SDs in the
foreground at -d200:

1.  None of the datetime fields in the schema have defaults.  This is a
problem unless STRICT SQL mode is disabled, which is a bad idea.  It is
probable that in upcoming Oracle MySQL versions (and forks thereof),
strict SQL will be mandatory.

Adding the canonically correct SQL DEFAULT '1970-01-01 00:00:00' to all
datetime fields prevented any further outright DB-related *failures*.
However, this causes problems with Volume Use Duration settings.

Using DEFAULT '0000-00-00 00:00:00' for datetime is permitted by MySQL
5.7 or MariaDB 10.2.x *as long as* SQL_MODE does not include
NO_ZERO_DATE or NO_ZERO_IN_DATE.  This does not APPEAR to cause any
problems with volume expiration.



2.  Various actions in BAT still create multiple overlapping and
often-confusing dialog boxes.  Deleting a volume, for example, emits a
confirmation dialog, followed by three more simultaneous dialogs:

- Warning:  This command will delete volume ... and all Jobs saved on
that volume from the Catalog
- Bat Question:  Are you sure you want to delete Volume ...? (yes/no)
- Text input dialog:  Are you sure you want to delete Volume ...? (yes/no)

You can't respond to the Warning until you respond to the Text Input
Dialog.  You can't respond to the Text Input Dialog until you respond to
the Bat Question.  If you type in the text input dialog's text input
box, it will throw an error.  You have to ignore the text box and click
OK instead.

However, this APPEARS to no longer cause BAT to become unresponsive.  I
have not yet tried a PURGE VOLUME, which is the other operation that
would in the past cause BAT to become unresponsive.



3.  I am having difficulty getting my LTO4 SD to mount and unmount tapes.

This is what the director logged when trying to run a restore from the
LTO4 tape SD with the wrong tape mounted:


29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:279 Read
acquire: Wrong Volume mounted on Tape device "LTO-4"
+(/dev/nst0): Wanted LTO4-FULL-0019 have LTO4-FULL-0013
29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=No medium found

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=No medium found

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=No medium found

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=No medium found

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=Input/output error

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=Input/output error

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=Input/output error

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=Input/output error

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=Input/output error

29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
Tape device "LTO-4" (/dev/nst0) Volume
+"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
"LTO-4" (/dev/nst0): ERR=Input/output error

29-Jul 13:13 babylon5-sd JobId 14248: Fatal error: acquire.c:328 Too
many errors trying to mount Tape device "LTO-4"
+(/dev/nst0) for reading.
29-Jul 13:13 babylon4 JobId 14248: Fatal error: job.c:2699 Bad response
from SD to Read Data command. Wanted 3000 OK data
, got len=11 msg="3000 error "
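To see how many open attempts the SD made before giving up, the warnings above can be tallied by their final ERR= reason. This minimal sketch assumes the log has first been re-joined so that each message sits on one line, since the `+` continuation markers come from line wrapping. The regular expression matches only the "Read open ... failed" warnings, and `tally_open_failures` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the last "ERR=<reason>" of a "Read open ... failed" warning.
READ_OPEN_FAIL = re.compile(r"Read open .* failed: .*ERR=([^:]+)$")


def tally_open_failures(lines: Iterable[str]) -> Counter:
    """Count failed read-open attempts by their trailing error reason."""
    counts: Counter = Counter()
    for line in lines:
        match = READ_OPEN_FAIL.search(line.strip())
        if match:
            counts[match.group(1).strip()] += 1
    return counts
```

Applied to the re-joined log above, this gives 4 "No medium found" and 6 "Input/output error". That is ten failed opens before the fatal "Too many errors".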


If I *START* the SD with the correct tape in place, it automounts it
just fine.  I was able to complete a test restore that required a single
tape by pre-loading the tape.  But I cannot manually mount or unmount
tapes, either from BAT or from the console.  It just plain doesn't work.
Nothing happens.  The SD doesn't log *anything* (at -d200) and, as far
as I can tell, never receives the mount or umount commands.


status storage=babylon5-sd says about the device:

Device status:

Device Tape is "LTO-4" (/dev/nst0) mounted with:
    Volume:      LTO4-FULL-0019
    Pool:        *unknown*
    Media type:  LTO-4
    Total Bytes Read=0 Blocks Read=0 Bytes/block=0
    Positioned at File=0 Block=0
Configured device capabilities:
   EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT !LABEL !ANONVOLS
ALWAYSOPEN
Device state:
   OPENED TAPE LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL
!SHORT !MOUNTED
   Writers=0 reserves=0 blocked=0 enabled=1 usage=1,024
Attached JobIds:
Device parameters:
   Archive name: /dev/nst0 Device name: LTO-4
   File=0 block=0
   Min block=0 Max block=2048000
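The `Device state` line is easier to read when it is split into set and cleared flags. This minimal sketch assumes, as the output suggests, that a leading `!` marks a flag as cleared. `parse_flags` is a hypothetical helper, not part of Bacula. Applied to the state above, it shows the drive as OPENED with a LABEL but with MOUNTED cleared, even though the status header lists LTO4-FULL-0019 as mounted.

```python
from typing import Dict


def parse_flags(text: str) -> Dict[str, bool]:
    """Map each flag name to True (set) or False (prefixed with '!')."""
    flags: Dict[str, bool] = {}
    for token in text.split():
        if token.startswith("!"):
            flags[token[1:]] = False
        else:
            flags[token] = True
    return flags


if __name__ == "__main__":
    state = ("OPENED TAPE LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF "
             "!NEXTVOL !SHORT !MOUNTED")
    flags = parse_flags(state)
    print([name for name, on in flags.items() if on])  # ['OPENED', 'TAPE', 'LABEL']
    print(flags["MOUNTED"])                             # False
```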


Do I need to re-test my tape drive under Bacula 9.x?
Has something changed in tape handling between 7.4.7 and 9.x that
requires configuration changes?


Summary:
- Can't run full backups because I can't mount and unmount LTO4 tapes
except by restarting the SD, which will cause the running jobs to fail
- Database write batching is not working, causing jobs that back up more
than 128K files to fail
- Schema is not compliant with MySQL 5.7 or MariaDB 10.2 with strict SQL
compliance enabled, which will cause many database-related failures



--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: Bacula 9.0.2 testing

Ana Emília M. Arruda
Hello Phil,

I am writing to give you a suggestion about the tape drive issue you are having.

Even if this is a standalone tape drive and not a tape library, you must configure it as an autochanger on the Director side. For example:

* on the Storage Daemon (bacula-sd.conf), you can configure the standalone tape drive as usual:

Device {
  Name = My-Tape-Drive
  Drive Index = 0
  Media Type = LTOX
  Archive Device = /dev/nst0
  AutomaticMount = yes;
  AlwaysOpen = yes;
  AutoChanger = yes;
  LabelMedia = no;
  Maximum Concurrent Jobs = 1
}

* but on the Director (bacula-dir.conf), you must configure it as an autochanger:

Autochanger {
  Name = My-Tape-Autochanger
  Address = bacula-server
  SDPort = 9103
  Password = "nin1GJ0jsy4k_X4RhYZX4aoJxHK0WsKkp"
  Device = My-Tape-Drive
  Media Type = LTOX
  Maximum Concurrent Jobs = 1
  Autochanger = My-Tape-Autochanger
}
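In the example above, the Director's Autochanger names the SD's Device and repeats its Media Type, so the two resources have to stay in agreement. The following is a minimal consistency-check sketch. It assumes the flat `Key = value` layout shown above, with one directive per line, an optional trailing `;`, and optionally quoted values. It also assumes that directive keywords ignore case and spaces. `parse_resource` and `check_pair` are hypothetical helpers, and the sketch does not handle nested resources, includes, or inline comments.

```python
from typing import Dict, List


def parse_resource(text: str) -> Dict[str, str]:
    """Parse one flat Bacula-style resource block into {key: value}.

    Keys are lower-cased with spaces removed, so "Media Type" and
    "MediaType" compare equal. Braces, blank lines and '#' lines are skipped.
    """
    directives: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip().rstrip(";").strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        directives[key.replace(" ", "").lower()] = value.strip().strip('"')
    return directives


def check_pair(sd_device: str, dir_changer: str) -> List[str]:
    """List mismatches between an SD Device and a Director Autochanger."""
    dev = parse_resource(sd_device)
    chg = parse_resource(dir_changer)
    problems: List[str] = []
    if chg.get("device") != dev.get("name"):
        problems.append(f"Director Device={chg.get('device')!r} does not "
                        f"name SD Device {dev.get('name')!r}")
    if chg.get("mediatype") != dev.get("mediatype"):
        problems.append(f"Media Type differs: Director {chg.get('mediatype')!r}, "
                        f"SD {dev.get('mediatype')!r}")
    return problems
```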

I don't remember in which version configuring the tape drive in bacula-dir.conf as a standalone Storage device started causing the mount/unmount bconsole commands to do nothing, but this is the workaround we use to make it work.

Hope this helps.

Best regards,

Ana


On Sat, Jul 29, 2017 at 5:35 PM, Phil Stracchino <[hidden email]> wrote:

Re: Bacula 9.0.2 testing

Phil Stracchino-2
On 07/29/17 19:35, Ana Emília M. Arruda wrote:

> Hello Phil,
>
> I am writing to give you a suggestion about the tape drive issue you are
> having.
>
> Even If this is a stand alone tape drive and not a tape library, you
> must have it configured as an autochanger on Director side. For example:
>
> * on Storage Daemon (bacula-sd.conf), you can have the stand alone tape
> drive configured as usual:
>
> Device {
>   Name = My-Tape-Drive
>   Drive Index = 0
>   Media Type = LTOX
>   Archive Device = /dev/nst0
>   AutomaticMount = yes;
>   AlwaysOpen = yes;
>   AutoChanger = yes;
>   LabelMedia = no;
>   Maximum Concurrent Jobs = 1
> }
>
> * but on Director (bacula-dir.conf), you may have it configured as an
> autochanger:
>
> Autochanger {
>   Name = My-Tape-Autochanger
>   Address = bacula-server
>   SDPort = 9103
>   Password = "nin1GJ0jsy4k_X4RhYZX4aoJxHK0WsKkp"
>   Device = My-Tape-Drive
>   Media Type = LTOX
>   Maximum Concurrent Jobs = 1
>   Autochanger = My-Tape-Autochanger
> }
>
> I don't remember in what version the old tape drive configuration in
> bacula-dir.conf as a stand alone storage device was causing the
> mount/unmount bconsole commands to do nothing and this is the workaround
> we have to make it work.


Thanks Ana, I'll try that.  I *wondered* why the Director was so
insistent about whether the needed drive was in-changer or not.

As to what version it stopped working in, mine was working perfectly
well as a standalone device in 7.4.7.

Is LTOX a shorthand for "LTO Any"?



Re: Bacula 9.0.2 testing

Phil Stracchino-2
In reply to this post by Ana Emília M. Arruda
On 07/29/17 19:35, Ana Emília M. Arruda wrote:

> Autochanger {
>   Name = My-Tape-Autochanger
>   Address = bacula-server
>   SDPort = 9103
>   Password = "nin1GJ0jsy4k_X4RhYZX4aoJxHK0WsKkp"
>   Device = My-Tape-Drive
>   Media Type = LTOX
>   Maximum Concurrent Jobs = 1
>   Autochanger = My-Tape-Autochanger
> }


One thing I'm puzzled about after finding the relevant documentation.
What does the Autochanger *directive* in the Autochanger *resource* do?
It seems ... self-referential.

I'm presuming once I have an Autochanger resource with a Changer Device
specified, I don't need a Changer Device directive in the Device
resource for the drive any more.



Re: Bacula 9.0.2 testing

Phil Stracchino-2
OK, that change was a major step backward.  I modified my Device
resource for my tape drive as specified and added an Autochanger
resource to my Director, and now with that change *alone*, BAT suddenly
cannot browse Storages, Clients, Pools, FileSets, or Jobs.  (However, it
can still browse Media and Jobs Run.)


Trying to mount a tape using bconsole yielded the following:


*mount storage=babylon5-sd
Automatically selected Catalog: Catalog
Using Catalog "Catalog"

[ nothing happens ]

*mount storage=babylon5-changer
Connecting to Storage daemon babylon5-changer at
babylon5.babcom.com:9103 ...
3998 Device ""LTO-4" (/dev/nst0)" is not an autochanger.
Enter autochanger drive[0]:
Enter autochanger slot: 1
No "Changer Command" for "LTO-4" (/dev/nst0). Manual load of Volume may
be requird.
3901 Unable to open device ""LTO-4" (/dev/nst0)": ERR=tape_dev.c:170
Unable to open device "LTO-4" (/dev/nst0): ERR=No medium found



Re: Bacula 9.0.2 testing

Phil Stracchino-2
On 07/29/17 21:41, Phil Stracchino wrote:
> OK, that change was a major step backward.  I modified my Device
> resource for my tape drive as specified and added an Autochanger
> resource to my Director, and now with that change *alone*, BAT suddenly
> cannot browse Storages, Clients, Pools, FileSets, or Jobs.  (However, it
> can still browse Media and Jobs Run.)
[...]


Um.  Never mind, I'm not actually certain what I changed fiddling with
the configuration.  I basically tried moving a couple of directives from
the tape Device resource to the Autochanger resource based on the
Autochanger Support documentation in the 9.0.x manual on bacula.org,
then moved them back when the Director complained (contrary to the
documentation) that they weren't permitted there.  I don't THINK I made
any changes I didn't revert.  But now it's all working again, including
BAT being able to browse everything again.  The odd thing is, no error
was reported.

Having to specify an autochanger device and slot every time I mount and
unmount a tape is, I suspect, going to get irritating, but I can live
with it.  However, the whole concept of having to configure a fictitious
autochanger because support for stand-alone tape drives no longer works
is troubling, to say the least.  Really, *REALLY* troubling.

(As a quick workaround, is there any way to tell Bacula that this
fictitious autochanger has only one drive and only one slot, so that it
can just assume '0,0' without having to prompt?)


Next order of business, I suppose, is to bring down the MariaDB cluster
and bring one node back up as a stand-alone, non-Galera mysqld so that I
can test a full backup to tape.




Re: Bacula 9.0.2 testing

Phil Stracchino-2
OK, brought the DB cluster down, brought one MariaDB node back up as a
standalone server.  Re-ran the diff that failed earlier, and
unsurprisingly, it ran with no issues.  So I tried a Full to disk.

Well, it's not writing to disk.  The BAT console SAYS it is, but it
isn't.  It did not create a new volume in the Full-Disk pool and use it.
Neither is it requesting a tape.


run job="Asgard Backup" fileset="Asgard Full Set" level="Full"
client="asgard" pool="Full-Disk" storage="babylon4-file" priority="10"
when="2017-07-29 23:01:55" yes
Job queued. JobId=14298
run job="Babylon5 Backup" fileset="Gentoo Full Set" level="Full"
client="babylon5" pool="Full-Disk" storage="babylon4-file" priority="10"
when="2017-07-29 23:44:38" yes
Job queued. JobId=14299


Both FULL jobs are just sitting there catatonic, neither starting nor
requesting media nor throwing an error.  Not even a 'Job requires
attention' message.  So I retried it with the director at -d 200 again,
and found this:

minbar-dir: msgchan.c:232-14300 >stored: JobId=14300
job=Babylon5_Backup.2017-07-29_23.53.01_04 job_name=Babylon5Backup
client_name=babylon5 type=66 level=70 FileSet=GentooFullSet NoAttr=0
SpoolAttr=0 FileSetMD5=XnE+++/2d6+IsXhMY9+hQD SpoolData=0
WritePartAfterJob=1 PreferMountedVols=1 SpoolSize=0 rerunning=0
VolSessionId=0 VolSessionTime=0 sd_client=0 Authorization=dummy
minbar-dir: msgchan.c:233-14300 === rstore=0 wstore=7fef1c004dc8
minbar-dir: getmsg.c:151-14300 bget_dirmsg 91: 3000 OK Job SDid=3
SDtime=1501377713 Authorization=LMGK-LHNM-LPGC-GLNB-CPFO-LNNC-DBDI-LFEP

minbar-dir: msgchan.c:235-14300 <stored: 3000 OK Job SDid=3
SDtime=1501377713 Authorization=LMGK-LHNM-LPGC-GLNB-CPFO-LNNC-DBDI-LFEP
minbar-dir: msgchan.c:244-14300
sd_auth_key=LMGK-LHNM-LPGC-GLNB-CPFO-LNNC-DBDI-LFEP
minbar-dir: msgchan.c:322-14300 Wstore=babylon5-sd
minbar-dir: msgchan.c:330-14300 wstore >stored: use storage=babylon5-sd
media_type=LTO-4 pool_name=Full-Tape pool_type=Backup append=1 copy=0
stripe=0
minbar-dir: msgchan.c:337-14300 >stored: use device=LTO-4
minbar-dir: getmsg.c:151-14300 bget_dirmsg -1:
minbar-dir: getmsg.c:151-14300 bget_dirmsg -1:
minbar-dir: getmsg.c:151-14300 bget_dirmsg -1:
minbar-dir: getmsg.c:151-14300 bget_dirmsg -1:
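The `use storage=` line is where the Director's actual choice of storage and pool appears. This minimal sketch splits such a debug line, once re-joined onto one line, into key=value pairs and compares them with what was requested. `parse_kv` and `diff_request` are hypothetical helpers. The sketch assumes that values contain no spaces, which holds for the lines above, and that requested values are keyed by the debug line's own names (`storage`, `pool_name`). If the retry used the same pool and storage as the `run` commands above, the check shows the Director asking the SD for pool Full-Tape on babylon5-sd instead of Full-Disk on babylon4-file.

```python
import re
from typing import Dict, Optional, Tuple

# key=value pairs where the value runs up to the next whitespace.
KV = re.compile(r"(\w+)=(\S*)")


def parse_kv(line: str) -> Dict[str, str]:
    """Extract key=value pairs from one re-joined Director debug line."""
    return dict(KV.findall(line))


def diff_request(requested: Dict[str, str],
                 actual: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each requested key whose actual value differs to (requested, actual)."""
    return {k: (v, actual.get(k)) for k, v in requested.items()
            if actual.get(k) != v}
```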


So, first problem (still present):  You tell it at the console to run a
Full job but use a different pool, and it says "OK, I'm doing what you
said", and then it *ignores* what it was just told and uses the Pool the
schedule says for Full jobs.  Any customization of a Job that conflicts
with what the Schedule specifies for that Job level is simply overridden.

This behavior was wrong in 7.x, and it's still wrong in 9.x.  Job
customization by the administrator should ALWAYS override ANY defaults
for the job.  "The Administrator Is Always Right."  Console overrides
defaults, NOT defaults override console.  Or why have a console at all?
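The precedence argued for here, with the console first, then the Schedule's per-level override, then the Job resource default, can be stated as a small resolution function. This sketch describes the behavior the post asks for, not how the Director currently resolves overrides. The `resolve` helper and the layer dictionaries are hypothetical.

```python
from typing import Dict, Optional

Layer = Dict[str, str]


def resolve(key: str, console: Layer, schedule: Layer, job: Layer) -> Optional[str]:
    """Return the value for key, preferring console overrides, then the
    Schedule's per-level override, then the Job resource default."""
    for layer in (console, schedule, job):
        if key in layer:
            return layer[key]
    return None
```

Under this rule, a console `pool="Full-Disk"` would win over a Schedule that names Full-Tape for Full jobs. The Schedule value would apply only when the console is silent.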


Second problem:  When it decides to go ahead and use a different Pool
and Storage than I told it to, that's not working anyway.  No mount
request, no operator notification.

Usage question here:  Does the Job definition or Schedule need to be
changed to use the fictional Autochanger now INSTEAD of the Storage?
And what's it going to do when the Autochanger wants a drive and slot
number?





Re: Bacula 9.0.2 testing

Kern Sibbald
In reply to this post by Phil Stracchino-2
Hello Phil,

See below ...


On 07/29/2017 10:35 PM, Phil Stracchino wrote:
> I've now got everything updated to 9.0.2 using a work-in-progress
> development version of the app-backup/bacula-9.0.2 ebuild.  I'm running
> against MariaDB 10.2.7 (which more or less approximates MySQL 5.7) with
> Galera enabled.
Warning: MariaDB 10.2.7 has a serious bug that causes batch inserts to
occasionally fail.  I submitted a bug report, and they reproduced the
problem with a script and are working on fixing it.

>
>
> Build platforms:
> Gentoo Linux on amd64 (AMD Phenom II, Thuban microarchitecture) using
> gcc 6.3.0
> Solaris 10u9 on amd64 (Intel P4 Xeon, Nocona microarchitecture) using
> Solaris Studio 12.2
> Solaris 11.3 on amd64 (AMD Opteron 2384, Shanghai microarchitecture)
> using gcc 4.9.4
>
> Build considerations:  Solaris 10 required the tgoto prototype in
> conio.c to be moved down one line.  No other build issues encountered
> other than that enabling building the storage daemon also forces
> enabling the director, even if director is requested to be disabled.
>
>
> I did change the DB write batch size limit at sql_create.c:870 from
> 500000 to 1000 per Galera best-performance recommendations.  I was able
> to complete incremental backups and some differential backups.  I was
> able to successfully run jobs that backed up as many as 120,000 files,
> with wsrep_max_rows at its default of 128K.  A differential job that
> tried to back up 177,000 files failed with wsrep_max_rows_exceeded.  If
> that is truly the only place in the code that the write batch size is
> set, then it appears database write batching is not actually working.
>
> I maintain that even without Galera, 500000 is an unreasonably large
> batch size.  Just because a modern database *can* handle it doesn't make
> it a good idea.  50000 would be more reasonable, and 10000 would be better.
I had always thought the limit was 25,000, so I was a bit surprised
when I saw 500,000.  I suppose it is a rather insane limit that I added.
Whether it works or not, I don't know.  A bit later I found out why I
remembered 25,000: that is the maximum set for PostgreSQL.

>
>
>
> Problems encountered so far, running the Director and both SDs in the
> foreground at -d200:
>
> 1.  None of the datetime fields in the schema have defaults.  This is a
> problem unless STRICT SQL mode is disabled, which is a bad idea.  It is
> probable that in upcoming Oracle MySQL versions (and forks thereof),
> strict SQL will be mandatory.
>
> Adding the canonically-correct-SQL DEFAULT '1970-01-01 00:00:00' to all
> datetime fields prevented any further DB-related outright *failures*.
> However, this causes problems with Volume Use Duration settings.
>
> Using DEFAULT '0000-00-00 00:00:00' for datetime is permitted by MySQL
> 5.7 or MariaDB 10.2.x *as long as* SQL_MODE does not include
> NO_ZERO_DATE or NO_ZERO_IN_DATE.  This does not APPEAR to cause any
> problems with volume expiration.
>
>
>
> 2.  Various actions in BAT still create multiple overlapping and
> often-confusing dialog boxes.  Deleting a volume, for example, emits a
> confirmation dialog, followed by three more simultaneous dialogs:
>
> - Warning:  This command will delete volume ... and all Jobs saved on
> that volume from the Catalog
> - Bat Question:  Are you sure you want to delete Volume ...? (yes/no)
> - Text input dialog:  Are you sure you want to delete Volume ...? (yes/no)
>
> You can't respond to the Warning until you respond to the Text Input
> Dialog.  You can't respond to the Text Input Dialog until you respond to
> the Bat Question.  If you type in the text input dialog's text input
> box, it will throw an error.  You have to ignore the text box and click
> OK instead.
>
> However, this APPEARS to no longer cause BAT to become unresponsive.  I
> have not yet tried a PURGE VOLUME, which is the other operation that
> would in the past cause BAT to become unresponsive.
>
>
>
> 3.  I am having difficulty getting my LTO4 SD to mount and unmount tapes.
>
> This is what the director logged when trying to run a restore from the
> LTO4 tape SD with the wrong tape mounted:
>
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:279 Read
> acquire: Wrong Volume mounted on Tape device "LTO-4"
> +(/dev/nst0): Wanted LTO4-FULL-0019 have LTO4-FULL-0013
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=No medium found
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=No medium found
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=No medium found
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=No medium found
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=Input/output error
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=Input/output error
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=Input/output error
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=Input/output error
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=Input/output error
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
> Tape device "LTO-4" (/dev/nst0) Volume
> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
> "LTO-4" (/dev/nst0): ERR=Input/output error
>
> 29-Jul 13:13 babylon5-sd JobId 14248: Fatal error: acquire.c:328 Too
> many errors trying to mount Tape device "LTO-4"
> +(/dev/nst0) for reading.
> 29-Jul 13:13 babylon4 JobId 14248: Fatal error: job.c:2699 Bad response
> from SD to Read Data command. Wanted 3000 OK data
> , got len=11 msg="3000 error "
>
>
> If I *START* the sd with the correct tape in place, it automounts it
> just fine.  I was able to complete a test restore that required a single
> tape by pre-loading the tape.  But I cannot manually mount or unmount
> tapes, either from BAT or from the console.  It just plain doesn't work.
> Nothing happens.  The SD doesn't log *anything* (at -d200) and as far
> as I can tell, never receives the mount or umount commands.
>
>
> status storage=babylon5-sd says about the device:
>
> Device status:
>
> Device Tape is "LTO-4" (/dev/nst0) mounted with:
>      Volume:      LTO4-FULL-0019
>      Pool:        *unknown*
>      Media type:  LTO-4
>      Total Bytes Read=0 Blocks Read=0 Bytes/block=0
>      Positioned at File=0 Block=0
> Configured device capabilities:
>     EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT !LABEL !ANONVOLS
> ALWAYSOPEN
> Device state:
>     OPENED TAPE LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL
> !SHORT !MOUNTED
>     Writers=0 reserves=0 blocked=0 enabled=1 usage=1,024
> Attached JobIds:
> Device parameters:
>     Archive name: /dev/nst0 Device name: LTO-4
>     File=0 block=0
>     Min block=0 Max block=2048000
>
>
> Do I need to re-test my tape drive under Bacula 9.x?
> Has something changed between 7.4.7 and 9.x in tape handling that
> requires configuration changes?
>
>
> Summary:
> - Can't run full backups because I can't mount and unmount LTO4 tapes
> except by restarting the SD, which will cause the running jobs to fail
> - Database write batching is not working, causing jobs that back up more
> than 128K files to fail
> - Schema is not compliant with MySQL 5.7 or MariaDB 10.2 with strict SQL
> compliance enabled, which will cause many database-related failures
>
>
>
You have specified too many problems for me to deal with -- sorry.

I will say that the driver code for 9.0.x is totally rewritten from 7.4.x.
However, the general high-level code that mounts, unmounts, and so on
has not changed much.  Since it was such a massive rewrite, there is a
possibility of problems, but none of the rather extensive regression
tests show any, and the new code has been working here on my (very
simple) autochanger setup for at least a year.

I am not planning on working on Bat any more, but I do use it myself,
and I have noticed the annoying number of prompts when doing something,
but it is not annoying enough to make me dig into the code.  If it
will not do the few things I need, I will fix it.  Otherwise, I am
trying to switch over to Baculum, but I have not quite succeeded in
making the change.

Most of the things you mention will probably end up being fixed
eventually, except perhaps bat, because Bacula Systems has a team of
programmers working on the problems that are reported to them, and at
some point (or immediately if I notice it) I backport the fixes they
make from the Enterprise version to the Community version.  Obviously,
if the problem strikes me or it is clearly documented in a bug report,
it has a much higher probability of being fixed.


Best regards,
Kern

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-devel

Re: Bacula 9.0.2 testing

Kern Sibbald
Phil,

By the way, if you can definitively tell me how to distinguish Solaris
10 from any newer Solaris, I will attempt to fix the conio.c problem.
This is an old "bug" of Solaris 10 for which I have never had the
information I needed to fix it.  In fact, if I am not mistaken, the
"move the line up one" fix is documented in the source code.

Best regards,

Kern

PS: I am seriously considering deprecating MySQL and MariaDB, because
both are constantly changing their code and causing Bacula to break.  In
other words, they and the C++ programmers don't seem to mind introducing
serious incompatibilities with previous versions.  This is still an open
subject.

With PostgreSQL, we not only get 30% better performance (in my tests),
but the incompatibilities that have been regularly hitting Bacula in
MySQL/MariaDB just don't exist.  Of course, if we suddenly find that no
one uses MySQL and we only have MariaDB, that would be a different
consideration.  To be clear, by deprecation I mean that I will no
longer maintain the code, but I will not remove it.  If it breaks, then
I will wait for a patch from the community.  This is essentially the
current case with bat.

Doing so allows me to concentrate on new features without worrying about
the *huge* task of maintaining the large body of Bacula code for
incompatibilities (whether they are good or bad) with ever changing
external software.

Best regards,

Kern


On 07/30/2017 11:19 AM, Kern Sibbald wrote:

> Hello Phil,
>
> See below ...
>
>
> On 07/29/2017 10:35 PM, Phil Stracchino wrote:
>> I've now got everything updated to 9.0.2 using a work-in-progress
>> development version of the app-backup/bacula-9.0.2 ebuild.  I'm running
>> against MariaDB 10.2.7 (which more or less approximates MySQL 5.7) with
>> Galera enabled.
> Warning: MariaDB 10.2.7 has a serious bug that causes batch inserts to
> occasionally fail.  I submitted a bug report, and they reproduced the
> problem with a script and are working on fixing it.
>>
>>
>> Build platforms:
>> Gentoo Linux on amd64 (AMD Phenom II, Thuban microarchitecture) using
>> gcc 6.3.0
>> Solaris 10u9 on amd64 (Intel P4 Xeon, Nocona microarchitecture) using
>> Solaris Studio 12.2
>> Solaris 11.3 on amd64 (AMD Opteron 2384, Shanghai microarchitecture)
>> using gcc 4.9.4
>>
>> Build considerations:  Solaris 10 required the tgoto prototype in
>> conio.c to be moved down one line.  No other build issues encountered
>> other than that enabling building the storage daemon also forces
>> enabling the director, even if director is requested to be disabled.
>>
>>
>> I did change the DB write batch size limit at sql_create.c:870 from
>> 500000 to 1000 per Galera best-performance recommendations.  I was able
>> to complete incremental backups and some differential backups. I was
>> able to successfully run jobs that backed up as many as 120,000 files,
>> with wsrep_max_rows at its default of 128K.  A differential job that
>> tried to back up 177,000 files failed with wsrep_max_rows_exceeded.  If
>> that is truly the only place in the code that the write batch size is
>> set, then it appears database write batching is not actually working.
>>
>> I maintain that even without Galera, 500000 is an unreasonably large
>> batch size.  Just because a modern database *can* handle it doesn't make
>> it a good idea.  50000 would be more reasonable, and 10000 would be
>> better.
> I had always thought the limit was 25,000, so when I saw 500,000 I was
> a bit surprised.  I suppose it is a rather insane limit I added.
> Whether it works or not, I don't know.  A bit later I found out why I
> remembered 25,000: it is the maximum set for PostgreSQL.
>>
>>
>>
>> Problems encountered so far, running the Director and both SDs in the
>> foreground at -d200:
>>
>> 1.  None of the datetime fields in the schema have defaults. This is a
>> problem unless STRICT SQL mode is disabled, which is a bad idea.  It is
>> probable that in upcoming Oracle MySQL versions (and forks thereof),
>> strict SQL will be mandatory.
>>
>> Adding the canonically-correct-SQL DEFAULT '1970-01-01 00:00:00' to all
>> datetime fields prevented any further DB-related outright *failures*.
>> However, this causes problems with Volume Use Duration settings.
>>
>> Using DEFAULT '0000-00-00 00:00:00' for datetime is permitted by MySQL
>> 5.7 or MariaDB 10.2.x *as long as* SQL_MODE does not include
>> NO_ZERO_DATE or NO_ZERO_IN_DATE.  This does not APPEAR to cause any
>> problems with volume expiration.
>>
>>
>>
>> 2.  Various actions in BAT still create multiple overlapping and
>> often-confusing dialog boxes.  Deleting a volume, for example, emits a
>> confirmation dialog, followed by three more simultaneous dialogs:
>>
>> - Warning:  This command will delete volume ... and all Jobs saved on
>> that volume from the Catalog
>> - Bat Question:  Are you sure you want to delete Volume ...? (yes/no)
>> - Text input dialog:  Are you sure you want to delete Volume ...?
>> (yes/no)
>>
>> You can't respond to the Warning until you respond to the Text Input
>> Dialog.  You can't respond to the Text Input Dialog until you respond to
>> the Bat Question.  If you type in the text input dialog's text input
>> box, it will throw an error.  You have to ignore the text box and click
>> OK instead.
>>
>> However, this APPEARS to no longer cause BAT to become unresponsive.  I
>> have not yet tried a PURGE VOLUME, which is the other operation that
>> would in the past cause BAT to become unresponsive.
>>
>>
>>
>> 3.  I am having difficulty getting my LTO4 SD to mount and unmount
>> tapes.
>>
>> This is what the director logged when trying to run a restore from the
>> LTO4 tape SD with the wrong tape mounted:
>>
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:279 Read
>> acquire: Wrong Volume mounted on Tape device "LTO-4"
>> +(/dev/nst0): Wanted LTO4-FULL-0019 have LTO4-FULL-0013
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=No medium found
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=No medium found
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=No medium found
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=No medium found
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=Input/output error
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=Input/output error
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=Input/output error
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=Input/output error
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=Input/output error
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Warning: acquire.c:235 Read open
>> Tape device "LTO-4" (/dev/nst0) Volume
>> +"LTO4-FULL-0019" failed: ERR=tape_dev.c:170 Unable to open device
>> "LTO-4" (/dev/nst0): ERR=Input/output error
>>
>> 29-Jul 13:13 babylon5-sd JobId 14248: Fatal error: acquire.c:328 Too
>> many errors trying to mount Tape device "LTO-4"
>> +(/dev/nst0) for reading.
>> 29-Jul 13:13 babylon4 JobId 14248: Fatal error: job.c:2699 Bad response
>> from SD to Read Data command. Wanted 3000 OK data
>> , got len=11 msg="3000 error "
>>
>>
>> If I *START* the sd with the correct tape in place, it automounts it
>> just fine.  I was able to complete a test restore that required a single
>> tape by pre-loading the tape.  But I cannot manually mount or unmount
>> tapes, either from BAT or from the console.  It just plain doesn't work.
>> Nothing happens.  The SD doesn't log *anything* (at -d200) and as far
>> as I can tell, never receives the mount or umount commands.
>>
>>
>> status storage=babylon5-sd says about the device:
>>
>> Device status:
>>
>> Device Tape is "LTO-4" (/dev/nst0) mounted with:
>>      Volume:      LTO4-FULL-0019
>>      Pool:        *unknown*
>>      Media type:  LTO-4
>>      Total Bytes Read=0 Blocks Read=0 Bytes/block=0
>>      Positioned at File=0 Block=0
>> Configured device capabilities:
>>     EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT !LABEL !ANONVOLS
>> ALWAYSOPEN
>> Device state:
>>     OPENED TAPE LABEL !MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL
>> !SHORT !MOUNTED
>>     Writers=0 reserves=0 blocked=0 enabled=1 usage=1,024
>> Attached JobIds:
>> Device parameters:
>>     Archive name: /dev/nst0 Device name: LTO-4
>>     File=0 block=0
>>     Min block=0 Max block=2048000
>>
>>
>> Do I need to re-test my tape drive under Bacula 9.x?
>> Has something changed between 7.4.7 and 9.x in tape handling that
>> requires configuration changes?
>>
>>
>> Summary:
>> - Can't run full backups because I can't mount and unmount LTO4 tapes
>> except by restarting the SD, which will cause the running jobs to fail
>> - Database write batching is not working, causing jobs that back up more
>> than 128K files to fail
>> - Schema is not compliant with MySQL 5.7 or MariaDB 10.2 with strict SQL
>> compliance enabled, which will cause many database-related failures
>>
>>
>>
> You have specified too many problems for me to deal with -- sorry.
>
> I will say that the driver code for 9.0.x is totally rewritten from 7.4.x.
> However, the general high-level code that mounts, unmounts, and so on
> has not changed much.  Since it was such a massive rewrite, there is a
> possibility of problems, but none of the rather extensive regression
> tests show any, and the new code has been working here on my (very
> simple) autochanger setup for at least a year.
>
> I am not planning on working on Bat any more, but I do use it myself,
> and I have noticed the annoying number of prompts when doing something,
> but it is not annoying enough to make me dig into the code.  If it
> will not do the few things I need, I will fix it.  Otherwise, I am
> trying to switch over to Baculum, but I have not quite succeeded in
> making the change.
>
> Most of the things you mention will probably end up being fixed
> eventually, except perhaps bat, because Bacula Systems has a team of
> programmers working on the problems that are reported to them, and at
> some point (or immediately if I notice it) I backport the fixes they
> make from the Enterprise version to the Community version.  Obviously,
> if the problem strikes me or it is clearly documented in a bug report,
> it has a much higher probability of being fixed.
>
>
> Best regards,
> Kern
>



Re: Bacula 9.0.2 testing

Ana Emília M. Arruda
In reply to this post by Phil Stracchino-2
Hi Phil,

LTOX was just an example; you can use whatever value you want here.

The "Autochanger = yes" is just to force the mount/unmount commands to work with a standalone tape drive. If you do not specify a drive and a slot, it will finish silently.

You can keep your standalone tape drive configuration, but when issuing a mount/unmount command you need to specify "drive=0 slot=0". If you have scripts, just add these two parameters to the mount/unmount commands.

I think this was introduced in version 9.

Best regards,
Ana


On 30 Jul 2017 3:04, "Phil Stracchino" <[hidden email]> wrote:

Thanks Ana, I'll try that.  I *wondered* why the Director was so
insistent about whether the needed drive was in-changer or not.

As to what version it stopped working in, mine was working perfectly
well as a standalone device in 7.4.7.

Is LTOX a shorthand for "LTO Any"?


--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958



Re: Bacula 9.0.2 testing

Phil Stracchino-2
In reply to this post by Kern Sibbald
On 07/30/17 05:19, Kern Sibbald wrote:

> Hello Phil,
>
> See below ...
>
>
> On 07/29/2017 10:35 PM, Phil Stracchino wrote:
>> I've now got everything updated to 9.0.2 using a work-in-progress
>> development version of the app-backup/bacula-9.0.2 ebuild.  I'm running
>> against MariaDB 10.2.7 (which more or less approximates MySQL 5.7) with
>> Galera enabled.
> Warning: MariaDB 10.2.7 has a serious bug that causes batch inserts to
> occasionally fail.  I submitted a bug report, and they reproduced the
> problem with a script and are working on fixing it.

Useful to know.


>> I maintain that even without Galera, 500000 is an unreasonably large
>> batch size.  Just because a modern database *can* handle it doesn't make
>> it a good idea.  50000 would be more reasonable, and 10000 would be better.

> I had always thought the limit was 25,000, so when I saw 500,000 I was
> a bit surprised.  I suppose it is a rather insane limit I added.
> Whether it works or not, I don't know.  A bit later I found out why I
> remembered 25,000: it is the maximum set for PostgreSQL.

25000 would be a much more reasonable size.


> You have specified too many problems for me to deal with -- sorry.

Not trying to make a bug list here, just reporting my test results, what
I tried, what worked, what didn't, what broke.

> I will say that the driver code for 9.0.x is totally rewritten from 7.4.x.
> However, the general high-level code that mounts, unmounts, and so on
> has not changed much.  Since it was such a massive rewrite, there is a
> possibility of problems, but none of the rather extensive regression
> tests show any, and the new code has been working here on my (very
> simple) autochanger setup for at least a year.

> I am not planning on working on Bat any more, but I do use it myself,
> and I have noticed the annoying number of prompts when doing something,
> but it is not annoying enough to make me dig into the code.  If it
> will not do the few things I need, I will fix it.  Otherwise, I am
> trying to switch over to Baculum, but I have not quite succeeded in
> making the change.

I've been meaning to try out Baculum myself, but haven't gotten around
to it yet.  I really should do that, since I get the strong impression
BAT is more or less end-of-life.  I do like BAT, though; I much prefer a
standalone application and have never been convinced of the merit of the
"EVERYTHING is a web app" model.  HTTP was designed as a stateless
protocol, and there are *many* things it's now being made to do that,
honestly, it just doesn't do well even with the aid of multiple
underlying glue layers.

Years ago there was a humorous list circulated of communication
protocols compared in terms of chickens crossing the road.  According to
that list, the web chicken reached the middle of the road, turned left
and just started running, and flattened everything else it ran across.


> Most of the things you mention will probably end up being fixed
> eventually, except perhaps bat, because Bacula Systems has a team of
> programmers working on the problems that are reported to them, and at
> some point (or immediately if I notice it) I backport the fixes they
> make from the Enterprise version to the Community version.  Obviously,
> if the problem strikes me or it is clearly documented in a bug report,
> it has a much higher probability of being fixed.

When I've got things narrowed down to specific reportable items and
separated out from the configuration-change issues, I'll file some bug
reports.


--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958


Re: Bacula 9.0.2 testing

Phil Stracchino-2
In reply to this post by Kern Sibbald
On 07/30/17 05:31, Kern Sibbald wrote:
> Phil,
>
> By the way, if you can definitively tell me how to distinguish Solaris
> 10 from any newer Solaris, I will attempt to fix the conio.c problem.
> This is an old "bug" of Solaris 10 for which I have never had the
> information I needed to fix it.  In fact, if I am not mistaken, the
> "move the line up one" fix is documented in the source code.


Indeed it is, and anyone installing on Solaris is almost certainly
building from source anyway, which is why I didn't make a big deal of
it and just mentioned applying the usual fix.

I'm not sure how to best autodetect Solaris 11 vs. 10 at the code level
either.  One does presume that there must be a way, though.  It ought to
be possible to set a flag based upon the output of 'uname -r', which
will return either 5.10 or 5.11.


> PS: I am seriously considering deprecating MySQL and MariaDB, because
> both are constantly changing their code and causing Bacula to break.  In
> other words, they and the C++ programmers don't seem to mind introducing
> serious incompatibilities with previous versions.  This is still an open
> subject.

Oracle has been doing a lot of work to clean up MySQL, fix old legacy
cruft, improve performance, and make it comply better with the formal
SQL spec.  The other major fork of community MySQL, Percona Server,
tracks MySQL so closely that I have run MySQL and Percona Server
interchangeably as members of the same cluster with no issues, and
Percona Software has actually contributed a lot of their own bug fixes
and enhancements back to the MySQL community (and Oracle has happily
adopted them).

MariaDB, particularly 10.1 and later, not so much.  MariaDB is
*INTENTIONALLY* diverging from community MySQL.  MariaDB 10.2 and MySQL
5.7 are not really fully interoperable any more.  I actually recommend
against MariaDB to clients at my day job for this reason, and in
particular I have established a policy that we *will not* support
MariaDB Galera clustering in customer environments because it is just
not sufficiently well integrated to be considered production-stable for
mission-critical applications.  In fact, when I first evaluated it at
work we couldn't even get it to successfully form a cluster.  The only
reason I'm currently running MariaDB instead of Percona XtraDB Cluster
(which is Percona Server plus Galera) is that there is no Gentoo ebuild
for XtraDB Cluster, and my efforts to create my own have so far been
unsuccessful.  If a Gentoo ebuild for XtraDB Cluster were released
today, I would convert to XtraDB Cluster today.

> With PostgreSQL, we not only get 30% better performance (in my tests),
> but the incompatibilities that have been regularly hitting Bacula in
> MySQL/MariaDB just don't exist.

Honestly, I think a lot of the performance issues reported with MySQL
stem from the fact that most people who are not professional MySQL DBAs
don't understand how to properly tune MySQL.  It can be a lot of work.
The typical Linux
distribution's default MySQL configuration files are generally not
helpful in this regard.  Until a couple of years ago, Red Hat Enterprise
Linux not only shipped an old end-of-life MySQL release, but shipped it
with a default configuration file in which only one directive actually
*did* anything - and that one directive was *actively harmful*.

The usual response to tuning discussions from PostgreSQL advocates is
that PostgreSQL is "self-tuning" and very little tuning is required.  I
tend to think what that really translates to is that very little tuning
is *possible*.  There are technical choices made in some of PostgreSQL's
low-level implementation that frankly make me shudder.


>  Of course, if we suddenly find that no
> one uses MySQL and we only have MariaDB, that would be a different
> consideration.  To be clear, by deprecation I mean that I will no
> longer maintain the code, but I will not remove it.  If it breaks, then
> I will wait for a patch from the community.  This is essentially the
> current case with bat.


--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958


Re: Bacula 9.0.2 testing

Phil Stracchino-2
In reply to this post by Phil Stracchino-2
On 07/30/17 14:55, Phil Stracchino wrote:
> On 07/30/17 05:19, Kern Sibbald wrote:
>> I had always thought the limit was 25,000, so when I saw 500,000 I was
>> a bit surprised.  I suppose it is a rather insane limit I added.
>> Whether it works or not, I don't know.  A bit later I found out why I
>> remembered 25,000: it is the maximum set for PostgreSQL.
>
> 25000 would be a much more reasonable size.

Oops ... I didn't finish that thought.

What I'd meant to say was:  25000 would be a much more reasonable size,
and it certainly should *work* fine with or without Galera in use (even
though it's larger than ideal according to Galera best-performance
guidelines).  However, what my testing exposed is that unless that limit
also needs to be set in other places, what it's set to is moot, because
the batch size limit doesn't actually work anyway.


--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958
