Restores very slow while selecting files

Tom Yates
I've got a fairly big filesystem (3TB, 15M files) of which I want to
(test) restore a part.  I know that if the backend DB is slow the
"Building file list" stage can take some time, but I have it striped over
a 5-SAS-disc RAID-0, and this step takes only about eight minutes.

The problems start once I navigate to the directory I want restored
(which admittedly contains the bulk of the files and about half the total
space), and do an "add home".

The current job has been stuck on this step for over 15 hours, now.  When
I strace bacula-dir I see a lot of:

[pid 26711] write(6, "P\0\0\0\3SELECT FilenameId FROM File"..., 84) = 84
[pid 26711] read(6, "\1\0\0\1\1@\0\0\2\3def\6bacula\10Filename\10Fi"..., 16384) = 102
[pid 26711] poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 26711] write(6, "m\0\0\0\3SELECT FileId, LStat, MD5 F"..., 113) = 113
[pid 26711] read(6, "\1\0\0\1\0030\0\0\2\3def\6bacula\4File\4File\6F"..., 16384) = 249
[pid 26711] poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 26711] write(6, "P\0\0\0\3SELECT FilenameId FROM File"..., 84) = 84
[pid 26711] read(6, "\1\0\0\1\1@\0\0\2\3def\6bacula\10Filename\10Fi"..., 16384) = 102
[pid 26711] poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
[pid 26711] write(6, "m\0\0\0\3SELECT FileId, LStat, MD5 F"..., 113) = 113
[pid 26711] read(6, "\1\0\0\1\0030\0\0\2\3def\6bacula\4File\4File\6F"..., 16384) = 250
[pid 26711] poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)

So I presume it's stepping through the built directory tree querying the
database about each of these files.  Problem is that any restore that
takes ~24 hours just to kick off is not making my clients happy.
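Judging by the alternating pair of SELECTs in the trace, each entry looked up this way costs two client/server round trips, so total time grows with the number of entries times the per-query latency. The sketch below is a hypothetical illustration only. It uses Python's built-in `sqlite3` and a toy two-table schema loosely named after the tables visible in the trace (`Filename`, `File`). It is not Bacula's catalog schema or the director's code. It contrasts per-file lookups with a single joined query that returns the same data.

```python
import sqlite3


def make_catalog(n_files: int) -> sqlite3.Connection:
    """Build a toy in-memory catalog (hypothetical schema, not Bacula's)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name TEXT UNIQUE)")
    con.execute("CREATE TABLE File (FileId INTEGER PRIMARY KEY, FilenameId INTEGER, LStat TEXT)")
    for i in range(n_files):
        cur = con.execute("INSERT INTO Filename (Name) VALUES (?)", (f"file{i}",))
        con.execute("INSERT INTO File (FilenameId, LStat) VALUES (?, ?)",
                    (cur.lastrowid, f"lstat{i}"))
    con.commit()
    return con


def per_file_lookup(con: sqlite3.Connection, names):
    """Two queries per name, mirroring the alternating SELECTs in the trace.

    Assumes every name exists in the catalog.
    """
    result, queries = {}, 0
    for name in names:
        (fid,) = con.execute("SELECT FilenameId FROM Filename WHERE Name = ?",
                             (name,)).fetchone()
        (lstat,) = con.execute("SELECT LStat FROM File WHERE FilenameId = ?",
                               (fid,)).fetchone()
        result[name] = lstat
        queries += 2
    return result, queries


def batched_lookup(con: sqlite3.Connection, names):
    """One joined query for everything; the wanted names are filtered in Python."""
    wanted = set(names)
    rows = con.execute("SELECT Filename.Name, File.LStat "
                       "FROM File JOIN Filename USING (FilenameId)")
    result = {name: lstat for name, lstat in rows if name in wanted}
    return result, 1


con = make_catalog(10_000)
names = [f"file{i}" for i in range(0, 10_000, 2)]
slow, n_slow = per_file_lookup(con, names)
fast, n_fast = batched_lookup(con, names)
print(slow == fast, n_slow, n_fast)  # True 10000 1
```

An in-memory SQLite toy hides the network and parsing overhead a remote MySQL server adds to every query. Treat the query count, not the wall-clock time, as the point of comparison.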

The CentOS 6 server has 16GB of memory and does not seem short of it
(negligible swap usage).  We're currently using the CentOS 6 bacula
packages, which are v5.0.0.  I tried building 5.2.13 from source,
upgrading, and running that, but it wasn't noticeably better, so I
downgraded again.  I'm happy to go to a still-later version if there is
reason to think that this step is better optimised in that version.  If
building custom indexes would help, I'm open to that, too.  If I'm doing
something fundamentally stupid, it would be really useful to know!

Apart from "don't restore your home area", does anyone have any advice?
Thanks.


--

    Tom Yates - Teaparty Network Central - +44/0 1223 704038


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: Restores very slow while selecting files

francisco javier funes nieto
The missing question: which database catalog are you using?

On 12 Apr 2017, 9:26 AM, "Tom Yates" <[hidden email]> wrote:
> I've got a fairly big filesystem (3TB, 15M files) of which I want to
> (test) restore a part.

Re: Restores very slow while selecting files

Tom Yates
On Wed, 12 Apr 2017, Francisco Javier Funes Nieto wrote:

> The missing question, which Database Catalog are you using ? 

The catalogue database is on MySQL, again using the version that comes
with CentOS 6 (5.1.73).


--

   Tom Yates  -  http://www.teaparty.net

Re: Restores very slow while selecting files

Martin Simmons
>>>>> On Wed, 12 Apr 2017 08:03:32 +0100 (BST), Tom Yates said:
> So I presume it's stepping through the built directory tree querying the
> database about each of these files.  Problem is that any restore that
> takes ~24 hours just to kick off is not making my clients happy.

Does that file tree have a lot of hard links (I think the add command only
makes those queries for hard links)?  If so, then using Bacula 7 might help
(see "restore optimizespeed" in
http://www.bacula.org/downloads/Bacula-7.4.0/ReleaseNotes).
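Suppose the extra queries are indeed made for hard links. Then the general idea of trading memory for speed can be sketched as a cache. All names of one hard-linked file share the same target metadata, so that metadata only needs to be fetched once. This is a hypothetical Python illustration. `link_key` and `fetch` are stand-ins invented here, and the sketch makes no claim about how Bacula 7's `optimizespeed` is actually implemented.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def resolve_links(entries: List[Tuple[str, Hashable]],
                  fetch: Callable[[Hashable], str],
                  use_cache: bool) -> Tuple[Dict[str, str], int]:
    """Map each path to the metadata of its hard-link target.

    entries:   (path, link_key) pairs; all names of one hard-linked file
               share the same hypothetical link_key.
    fetch:     stand-in for one catalog query returning the target's metadata.
    use_cache: if True, remember each link_key's metadata after the first fetch.
    Returns (path -> metadata mapping, number of fetch() calls made).
    """
    cache: Dict[Hashable, str] = {}
    resolved: Dict[str, str] = {}
    calls = 0
    for path, key in entries:
        if use_cache and key in cache:
            resolved[path] = cache[key]  # served from memory, no query
            continue
        meta = fetch(key)  # one round trip to the (simulated) catalog
        calls += 1
        if use_cache:
            cache[key] = meta
        resolved[path] = meta
    return resolved, calls


def fetch(key: Hashable) -> str:
    """Hypothetical catalog query: returns fake metadata for a link target."""
    return f"meta-{key}"


# Example: 1,000 names that point at only 10 distinct link targets.
entries = [(f"/home/u/f{i}", i % 10) for i in range(1000)]
uncached, n_uncached = resolve_links(entries, fetch, use_cache=False)
cached, n_cached = resolve_links(entries, fetch, use_cache=True)
print(n_uncached, n_cached)  # 1000 10
```

The cached version makes 10 fetches instead of 1,000. The price is holding the cache in memory while the tree is built.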

__Martin


Re: Restores very slow while selecting files

Tom Yates
On Wed, 12 Apr 2017, Martin Simmons wrote:

> Does that file tree have a lot of hard links (I think the add command only
> makes those queries for hard links)?  If so, then using Bacula 7 might help
> (see "restore optimizespeed" in
> http://www.bacula.org/downloads/Bacula-7.4.0/ReleaseNotes).

That might well be it.  "find . -type f -links +1" says that, of the ten
million or so files in that tree, around a million have more than one
hard link (some have several hundred, don't ask me why).
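A rough Python equivalent of the `find . -type f -links +1` check is sketched below. It also reports the highest link count seen. Like `find`, it counts every name rather than every distinct inode, and it does not follow symbolic links. Unlike `find`, it silently skips entries it cannot stat. The function name `hardlink_stats` is made up for this example.

```python
import os
import stat


def hardlink_stats(root):
    """Return (regular files, files with link count > 1, highest link count).

    Counts every name, as `find ROOT -type f -links +1` does, and does not
    follow symbolic links. Entries that cannot be stat'ed are skipped.
    """
    total = multi = max_links = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # followlinks=False by default
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks, sockets, FIFOs etc. are not "-type f"
            total += 1
            if st.st_nlink > 1:
                multi += 1
                max_links = max(max_links, st.st_nlink)
    return total, multi, max_links
```

Calling it as `hardlink_stats("/home")` would walk the whole tree once. On ten million files that walk is itself slow, so it is best run once, off-hours.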

If the client will permit it, I'll investigate "restore optimizespeed" and
report back.  Thank you!


   Tom Yates
   Cambridge, UK.


Re: Restores very slow while selecting files

Kern Sibbald
Hello,

Bacula 5.0.x was designed to handle a maximum of 10M files.
Since then, file systems have grown a lot, and so has Bacula.  We have
redesigned Bacula a number of times to be able to cope with 50M or more
files.  From what I see, your problems are in the following areas:

1. You are running what I would call an antiquated Bacula -- moving to
5.2.x will probably not help much.  To make a significant improvement
you should move up to version 7.4.7.

2. MySQL is a great database, but it is primarily designed to be fast
for web servers.  It is also easy to use in Bacula. However, if you want
real performance you will need to use PostgreSQL.  I switched about 5
years ago and have not regretted it at all.

3. For both MySQL and PostgreSQL, you need to do a certain minimum of
tuning to make them perform well with Bacula.

4. To make building the restore tree fast, you need lots of memory.  For
the number of files you have, you probably need 32GB.  You need to
allocate up to 1/3 to PostgreSQL, which will use it only if it needs it.

5. Trying to make custom indexes usually results in disaster (very bad
performance) in some other area that is discovered later.  The same goes
for programs such as PostgreSQL: if you give it too much memory, it will
actually run slower.  Stick to the developers' recommendations, which
unfortunately are not configured by default.

6. You need to be 100% sure that batch insert is turned on (normally it
is by default).

7. Martin gave a good tip about "optimizespeed".

As far as I can tell, this problem has nothing to do with spooling and such.

Best regards,

Kern



On 04/12/2017 09:03 AM, Tom Yates wrote:

> I've got a fairly big filesystem (3TB, 15M files) of which I want to
> (test) restore a part.  I know that if the backend DB is slow the
> "Building file list" stage can take some time, but I have it striped over
> a 5-SAS-disc RAID-0, and this step takes only about eight minutes.

Re: Restores very slow while selecting files

Tom Yates
On Thu, 13 Apr 2017, Tom Yates wrote:

> That might well be it.  "find . -type f -links +1" says that, of the ten
> million or so files in that tree, around a million have more than one
> hard link (some have several hundred, don't ask me why).
>
> If the client will permit it, I'll investigate "restore optimizespeed" and
> report back.  Thank you!

So it turns out that going to 7.4.7 was enough.  The FD clients all stayed
on CentOS 6's 5.0.0, and seem to be fine (though testing continues).
"optimizespeed=true" seems to be the default in 7.x; in the first test the
upgrade cut the time for the "add home" phase from twenty-some HOURS to
about eight SECONDS.  We have made no further changes, though we
gratefully note Kern's list of other improvements we could make if things
start to drag again.

Thanks to all, but especially Martin and Kern, for help with this.
Bacula's back on the menu!


   Tom Yates
   Cambridge, UK.


Re: Restores very slow while selecting files

Kern Sibbald
Hello Tom,

Thanks for the feedback.  I am pleased that you got such a nice
improvement in performance.  From the numbers you cite, it doesn't seem
likely you will need any of the other ideas for possible performance
improvements :-)

Best regards,

Kern


On 04/21/2017 10:02 AM, Tom Yates wrote:

> So it turns out that going to 7.4.7 was enough.  The FD clients all stayed
> on CentOS 6's 5.0.0, and seem to be fine (though testing continues).
> "optimizespeed=true" seems to be the default in 7.x; in the first test the
> upgrade cut the time for the "add home" phase from twenty-some HOURS to
> about eight SECONDS.  We have made no further changes, though we
> gratefully note Kern's list of other improvements we could make if things
> start to drag again.