Release 9.0.2

Release 9.0.2

Kern Sibbald
Hello,

We are pleased to announce that we have just released Bacula version 9.0.2

This is a minor bug fix release, but a few of the bugs are important. The main items fixed are:

– PostgreSQL should now work with PostgreSQL versions prior to 9.0. Note: the SSL connection feature added in 9.0 is not available with PostgreSQL servers older than 9.0 (it needs the new connection API).
– The issues with MariaDB (reconnect variable) are now fixed
– The btape “test” command could report a wrong number of files in the append test; this bug is now fixed. It is unlikely to have affected anything other than btape.
– The bacula-tray-monitor.desktop file is now installed in the scripts directory.
– We recommend that you build with both libz and lzo library support (the developer packages must be installed when building, and the shared object libraries must be installed at run time). However, we have modified the code so that Bacula *should* build and run with either or both of libz and lzo absent.

23Jul17
– Use Bacula in place of Libz variables so we can build with/without libz and lzo
– Apply ideas from bug #2255 prettier status slots output
– Configure and install bacula-tray-monitor.desktop
– Fix btape test which counted files incorrectly on EOT
– Fix bug #2296 where Bacula would not compile with postgres 8 or older
– Fix bug #2294 Bacula does not build with MariaDB 10.2
– baculum: Fix multiple directors support
– baculum: Fix showing errors from the API

Bugs fixed/closed since last release:
2255 2294 2296

Thanks for using Bacula,

Kern



Re: Release 9.0.2

Phil Stracchino-2

Kern,
I've just discovered a serious compatibility problem - which should,
however, be fairly straightforward to fix.  This affects ALL Bacula
releases.  (I'm actually still running 7.4.7, because Gentoo's Bacula
ebuild has not yet updated to 9.0.x.)


The problem occurs when running Bacula against a MySQL+Galera cluster.
In this case, the cluster is MariaDB 10.1 plus Galera, but I have no
doubt the same problem will occur with Percona XtraDB Cluster, and
possibly with Oracle MySQL 5.7 using Group Replication, which is
Oracle's copy of Galera synchronous replication.

The problem is that Bacula apparently sends all of the records of a
backup job to the database in a single massive blast.  Because writesets
must be prepared in memory for certification before being committed,
Galera has a limit on the maximum size of a writeset which can be
applied in a single transaction.  This limit is set by two Galera
variables, wsrep_max_ws_rows and wsrep_max_ws_size.  These default to
128K rows and 2GB total writeset size respectively.  If a single
transaction exceeds either of these, it will fail.  My nightly
incremental backups since I converted to a cluster last week have been
working, but about half of this morning's differentials failed because
they tried to insert too many records in a single transaction.
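
For reference, those limits can be checked, and raised as a stopgap, at
runtime; a minimal sketch against a MariaDB/Galera node (the values below
are purely illustrative, not recommendations):

   -- Illustrative only: both are dynamic global variables on MariaDB+Galera.
   SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws%';

   SET GLOBAL wsrep_max_ws_rows = 262144;        -- example value: rows per writeset
   SET GLOBAL wsrep_max_ws_size = 2147483647;    -- example value: bytes per writeset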

The failure APPEARS to occur when sending spooled attributes to the
database, and the failure point APPEARS to occur somewhere between about
25,000 and 60,000 files backed up, but I don't have enough information
yet to narrow it down more closely than that.

The proper fix for this is to batch these writes into chunks (honestly,
it's best practice to do this anyway; you should never be sending
millions of rows in a single operation because it can consume huge
amounts of memory), and provide a Bacula configuration variable to limit
the chunk size, with the default of 0 for "unlimited".  Users running
standalone or asynchronous-replicated MySQL could simply leave it at the
default, or set it to whatever they feel comfortable with as a batch
size.  Users running against a Galera cluster or Group Replication
should probably set the chunk size no higher than 25K.  Codership Oy,
the creators of Galera, and Percona actually recommend not exceeding
1,000 rows per writeset for best performance.
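
As a rough illustration of the chunking idea (not Bacula code): the staging
table and column names below are taken from the INSERT ... SELECT quoted
later in this thread, and @lo/@hi are hypothetical placeholders that a
chunking loop would advance by the configured chunk size:

   -- Hypothetical sketch: flush one FileIndex range per transaction so a
   -- single writeset never exceeds wsrep_max_ws_rows.
   INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
   SELECT b.FileIndex, b.JobId, Path.PathId,
          Filename.FilenameId, b.LStat, b.MD5, b.DeltaSeq
     FROM batch b
     JOIN Path ON (b.Path = Path.Path)
     JOIN Filename ON (b.Name = Filename.Name)
    WHERE b.FileIndex > @lo AND b.FileIndex <= @hi;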

I will try to find or make time to look at the code myself and see if I
can propose a patch.

As a temporary workaround, I tried bringing my DB cluster down to a
single node (thus, no writeset replication), as well as making sure that
attribute spooling was turned off for ALL jobs.  Neither of these
worked.  I was able to complete backups ONLY by bringing my DB down to a
single standalone node with Galera disabled.


Until such time as there is a DB writeset size limit in Bacula, Bacula
will not be usable against Galera clusters and should probably be
presumed unusable against MySQL Group Replication.



--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958


Re: Release 9.0.2

Kern Sibbald
Hello Phil,

Well, Bacula does not send millions of rows at a time.  It already
batches data together and submits up to a maximum of 500,000 records at
one time.  It has never been necessary to change that number because any
respectable database should be able to handle a batch of 500,000 records
at a time (about 50-100 MB of data), even a cluster, in my opinion.  I
suspect that the Galera guys are very sadly mistaken to suggest an
optimum of only 1,000 at a time for the size of the datasets we see some
Bacula customers using, and I will be *extremely* surprised if this
problem shows up in Oracle MySQL.

That said, you can experiment if you wish and try changing the maximum
number of rows (changes) per batch.  It is in <bacula>/src/cats/sql_create.c
at line 870.  If this is something that comes up frequently, we could
certainly make the maximum a directive in the Catalog resource, or perhaps
put it elsewhere.

Best regards,

Kern




Re: Release 9.0.2

Phil Stracchino-2
On 07/24/17 11:26, Kern Sibbald wrote:

> Hello Phil,
>
> Well, Bacula does not send millions of rows at a time.  It does already
> batch data together and submits up to a maximum of 500,000 records at
> one time.  It has never been necessary to change that number because any
> respectable database should be able to handle a batch of 500,000 records
> at a time (about 50-100MB of data)-- even a cluster, in my opinion.   I
> suspect that the Galera guys are very sadly mistaken to suggest an
> optimum of only 1,000 at a time for the size of the datasets we see some
> Bacula customers using, and I will be *extremely* surprised if this
> problem shows up in Oracle MySQL.

It doesn't show up in MySQL using native asynchronous single-threaded
replication.  There are very sound technical reasons for it when using
Galera synchronous parallel replication, and I would not be in the least
surprised to find a similar limitation in Oracle's MySQL Group
Replication feature since Group Replication is basically Oracle's
reverse-engineered Galera replication with the serial numbers filed off.

> That said, you can experiment if you wish and try changing the maximum
> number of rows (changes).  It is in <bacula>/src/cats/sql_create.c at
> line 870.  If this is something that comes up frequently, we certainly
> could put the maximum on a directive in the Catalog resource or perhaps
> even elsewhere.

Oh good!  If it's a single change in one place, that should be very
simple to test a fix for.  I'll get right on it first chance I get.



--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958


Re: Release 9.0.2

Phil Stracchino-2
On 07/24/17 11:26, Kern Sibbald wrote:

> Hello Phil,
>
> Well, Bacula does not send millions of rows at a time.  It does already
> batch data together and submits up to a maximum of 500,000 records at
> one time.  It has never been necessary to change that number because any
> respectable database should be able to handle a batch of 500,000 records
> at a time (about 50-100MB of data)-- even a cluster, in my opinion.   I
> suspect that the Galera guys are very sadly mistaken to suggest an
> optimum of only 1,000 at a time for the size of the datasets we see some
> Bacula customers using, and I will be *extremely* surprised if this
> problem shows up in Oracle MySQL.

1000 rows is not a hard limit; it is a recommended value for performance
in Galera clusters.  You can use larger transactions, but the cost is
higher memory usage and a higher chance of write commit conflicts.  My
employer had one client who was ignoring that recommendation and had ten
threads simultaneously writing blocks of 100,000 records at a time, and
then couldn't understand why they were getting write conflicts on 2 out
of 3 transactions.

The default hard limit for Galera is 128K rows and 2GB total writeset
size.  But it turns out that transactions that large seriously impact
both performance and memory use.
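
For anyone who wants to watch for this on their own cluster, these standard
Galera status counters climb when transactions are rolled back at
certification or brute-force aborted (illustrative queries only):

   SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';
   SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';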


> That said, you can experiment if you wish and try changing the maximum
> number of rows (changes).  It is in <bacula>/src/cats/sql_create.c at
> line 870.  If this is something that comes up frequently, we certainly
> could put the maximum on a directive in the Catalog resource or perhaps
> even elsewhere.


I tested setting the limit down from 500,000 to 1000 (actually at line
866, not 870), and it had no visible impact.  However, it didn't solve
the problem, either.  The point of failure is a little before that,
starting at line 824 in bdb_write_batch_file_records( ):

   if (!db_sql_query(jcr->db_batch,
      "INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) "
         "SELECT batch.FileIndex, batch.JobId, Path.PathId, "
                "Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq "
           "FROM batch "
           "JOIN Path ON (batch.Path = Path.Path) "
           "JOIN Filename ON (batch.Name = Filename.Name)",
      NULL, NULL))
   {
      Jmsg1(jcr, M_FATAL, 0, "Fill File table %s\n", jcr->db_batch->errmsg);
      goto bail_out;
   }

This query causes a 'wsrep_max_ws_rows exceeded' failure on several of
my differential or even incremental jobs, even if the batch size limit
is set to 1000 at line 866.

This puzzled me, so I devised a way to trace the number of rows in the
batch table as the job progresses.  With the batch limit set to 1000
rows at sql_create.c:866, on a differential backup of my workstation the
row count of the batch table (and thus of the JOIN in that query) peaked
at 115,989 rows.  I believe the intention is that the batch table should
be flushed into the main tables whenever its row count exceeds the value
set at line 866, but that isn't happening.  It appears to be flushed
only at the end of each job.

I hate to say this, but your batch size limit isn't working.




Also, looking at the DB code in general, I have to ask:  Why are you
explicitly locking tables instead of trusting the DB engine to handle
any necessary locking?  In any modern SQL DB engine, explicit table
locks by the application are nearly always a bad idea.  Not only do you
very seldom need to explicitly lock tables, but 99% of the time, you
shouldn't.  Even MySQL's ancient legacy (and all-but-officially
deprecated) MyISAM storage engine will manage all necessary locks for you.
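
To make the contrast concrete, here is a rough sketch (not the exact
statements Bacula issues) of application-level table locking versus simply
letting InnoDB manage locks inside an ordinary transaction:

   -- Explicit application-level table locks:
   LOCK TABLES File WRITE, Path READ, Filename READ;
   -- ... bulk INSERT ... SELECT here ...
   UNLOCK TABLES;

   -- Trusting the storage engine instead:
   START TRANSACTION;
   -- ... the same INSERT ... SELECT here ...
   COMMIT;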

As an experiment, I removed the table locks and unlocks from
bdb_write_batch_file_records( ), and ran several small incremental jobs
that I know will not exceed wsrep_max_ws_rows because they ran
successfully last night - and it worked just fine.  I'm pretty certain
that you'll find, at least on MySQL using InnoDB, and almost certainly
in PostgreSQL as well, that none of those table locks are actually
needed, and in fact it is highly likely they actively harm performance.
The only place I can see it being likely they may be required is in
SQLite, and I believe SQLite support is deprecated for Bacula and
scheduled for removal...?




--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958


Re: Release 9.0.2

Kern Sibbald
Hello Phil,



On 07/24/2017 08:33 PM, Phil Stracchino wrote:

> On 07/24/17 11:26, Kern Sibbald wrote:
>> Hello Phil,
>>
>> Well, Bacula does not send millions of rows at a time.  It does already
>> batch data together and submits up to a maximum of 500,000 records at
>> one time.  It has never been necessary to change that number because any
>> respectable database should be able to handle a batch of 500,000 records
>> at a time (about 50-100MB of data)-- even a cluster, in my opinion.   I
>> suspect that the Galera guys are very sadly mistaken to suggest an
>> optimum of only 1,000 at a time for the size of the datasets we see some
>> Bacula customers using, and I will be *extremely* surprised if this
>> problem shows up in Oracle MySQL.
> 1000 rows is not a hard limit; it is a recommended value for performance
> in Galera clusters.
Yes, I understood that.  I maintain what I wrote.

> You can use larger transactions, but the cost is
> higher memory usage and a higher chance of write commit conflicts.  My
> employer had one client who was ignoring that recommendation and had ten
> threads simultaneously writing blocks of 100,000 records at a time, and
> then couldn't understand why they were getting write conflicts on 2 out
> of 3 transactions.
>
> The default hard limit for Galera is 128K rows and 2GB total writeset
> size.  But it turns out that transactions that large seriously impact
> both performance and memory use.
>
>
>> That said, you can experiment if you wish and try changing the maximum
>> number of rows (changes).  It is in <bacula>/src/cats/sql_create.c at
>> line 870.  If this is something that comes up frequently, we certainly
>> could put the maximum on a directive in the Catalog resource or perhaps
>> even elsewhere.
>
> I tested setting the limit down from 500,000 to 1000 (actually at line
> 866, not 870), and it had no visible impact.
You seem to be using an older Bacula.  I am referring to version 9.0.

> However, it didn't solve
> the problem, either.  The point of failure is a little before that,
> starting at line 824 in bdb_write_batch_file_records( ):
>
>     if (!db_sql_query(jcr->db_batch,
>        "INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) "
>           "SELECT batch.FileIndex, batch.JobId, Path.PathId, "
>                  "Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq "
>             "FROM batch "
>             "JOIN Path ON (batch.Path = Path.Path) "
>             "JOIN Filename ON (batch.Name = Filename.Name)",
>        NULL, NULL))
>     {
>        Jmsg1(jcr, M_FATAL, 0, "Fill File table %s\n", jcr->db_batch->errmsg);
>        goto bail_out;
>     }
>
> This query causes a 'wsrep_max_ws_rows exceeded' failure on several of
> my differential or even incremental jobs, even if the batch size limit
> is set to 1000 at line 866.
Yes, of course, that is where the actual error occurs.  I gave you the
location at which bdb_write_batch_file_records() was called, because
that is where the check on the number of inserts occurs.

>
> This puzzled me, so I devised a way to trace the number of rows in the
> batch table as the job progresses, and with the batch limit set to 1000
> rows at sql_create.c:866, on a differential backup of my workstation the
> row count of the batch table (and thus of the JOIN in that query) peaked
> at 115,989 rows.  I believe the intention is that the batch table should
> be being flushed into the main tables whenever its row count exceeds the
> value set at line 866, but that isn't happening.  It appears to be
> flushed only at the end of every job.
>
> I hate to say this, but your batch size limit isn't working.
It is possible that the person who implemented the code forgot to add an
update of the counter at all the appropriate spots.

>
>
>
>
> Also, looking at the DB code in general, I have to ask:  Why are you
> explicitly locking tables instead of trusting the DB engine to handle
> any necessary locking?  In any modern SQL DB engine, explicit table
> locks by the application are nearly always a bad idea.  Not only do you
> very seldom need to explicitly lock tables, but 99% of the time, you
> shouldn't.  Even MySQL's ancient legacy (and all-but-officially
> deprecated) MyISAM storage engine will manage all necessary locks for you.
>
> As an experiment, I removed the table locks and unlocks from
> bdb_write_batch_file_records( ), and ran several small incremental jobs
> that I know will not exceed wsrep_max_ws_rows because they ran
> successfully last night - and it worked just fine.  I'm pretty certain
> that you'll find, at least on MySQL using InnoDB, and almost certainly
> in PostgreSQL as well, that none of those table locks are actually
> needed, and in fact it is highly likely they actively harm performance.
> The only place I can see it being likely they may be required is in
> SQLite, and I believe SQLite support is deprecated for Bacula and
> scheduled for removal...?
If you want to try to remove table locks, you are completely on your
own, and I would put the odds at 99% that you will encounter conflicts.
If the "table locks" you mention are calls to bdb_lock(), these are not
table locks but a global database lock that is important to the proper
functioning of the batch insert code as implemented in Bacula.  That
said, it is remotely possible that they are less critical now due to some
recent changes in how we use the database, but I am not prepared to try
to remove them, especially since we are not seeing performance problems
even with monster databases.

Best regards,
Kern



Re: Release 9.0.2

Phil Stracchino-2
On 07/25/17 05:24, Kern Sibbald wrote:
> Hello Phil,

>> I tested setting the limit down from 500,000 to 1000 (actually at line
>> 866, not 870), and it had no visible impact.
> You seem to be using an older Bacula.  I am referring to version 9.0.

Yes, I'm still waiting for the Gentoo ebuild to update.  But if it takes
much longer, I'll update it myself.  I really should retest in 9.0.



--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958


Re: Release 9.0.2

Phil Stracchino-2
On 07/25/17 09:10, Phil Stracchino wrote:
> Yes, I'm still waiting for the Gentoo ebuild to update.  But if it takes
> much longer, I'll update it myself.  I really should retest in 9.0.

I am informed by the maintainer that the Gentoo ebuild should be coming
sometime in the next week or so.


--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: +1.603.293.8485
  Mobile:   +1.603.998.6958
