Recreate bootstrap files for jobs from the output of `bls -j`

Konstantin Khomoutov
Hi!

I have a somewhat unusual Bacula setup as we're backing up call records
produced by call-center software -- so the files are tiny in size,
a very large number of them is produced daily, they are all unique
and we have to keep them for several years as required by local
law.

Due to the uniqueness + tiny size + large quantities property of these
data, we have decided to keep only the information on the last few
backup sessions in the database to keep its size manageable, and resort
to "painful" restoration of old files -- for which job records are
expired but the volumes are still there -- using `bextract`.

To do restores using `bextract` we have lists of the files backed up during
each backup session saved in flat files, and we also have Bacula
writing a bootstrap file for each backup session.  So the restoration
of old files basically goes like this:

1) Know the date of the files to be restored.
2) Find what backup session backed them up.
3) Grab the matching bootstrap file.
4) Use `bextract -b bootstrap.bsr -V ... /output/dir`.

The problem is that, as it turned out, Bacula was silently not generating
bootstrap files for our backup jobs despite having the

  Write bootstrap = "/path/to/bootstrap/files/bsr-%j"

directive.  Okay, that's for another question.

So when we need to restore an old file, we know (1) and (2) but fail at
(3).  I learned that `bls -j` is able to dump the list of jobs recorded
on a given volume.  But what I'd really like to know is how I could
recreate bootstrap files from the output of `bls -j` run on old volumes.

Programming the solution is not the problem, but I'd like to know how to
properly form the contents of such bootstrap files given the records
I'd parse from the output of `bls -j`.
The problem I have with interpreting them is that `bls -j` gives
me two pairs of file+block numbers identifying each backup session.
They appear to neatly cover the range of data occupied by a
single backup job run, but how do I properly specify the VolFile and
VolBlock ranges if a job's data crosses the boundary of a file?

For instance, here's a fragment from the real output for one of my
volumes (elided a bit for terseness):

Begin Job Session Record: File:blk=338:1 SessId=142 \
   SessTime=1427996423 JobId=5007
  Job=call-recs.2015-05-12_14.00.00_41 ...
End Job Session Record: File:blk=339:10868 SessId=142 \
   SessTime=1427996423 JobId=5007
  Date=12-May-2015 14:44:22 Level=F Type=B ...

Here, the job starts at 338:1 and ends at 339:10868, how do I write out
the bootstrap file's data to cover that whole range in it?
From [1], it appears that I could use

  VolFile=338-339

to cover the files but the VolBlock spec appears to pertain to a single
VolFile.  I thought of using multiple VolFile + VolBlock pairs but
that would require me to use an open range for the VolBlock of the
first pair, something like

  VolFile=338
  VolBlock=1-
  VolFile=339
  VolBlock=1-10868

(and by the way, are blocks numbered from 0 or 1?)

Is there a solution for my case?
Or should I maybe look for some other data from the output of `bls -j`?

FWIW, for these backups, there's always a single client backing up to
that tape at any given time, so the jobs do not have their data
intermixed on the tape.
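
A minimal sketch of how the session records quoted above could be
parsed.  It assumes the record layout is exactly as in the excerpt and
that the trailing backslashes are line continuations added for the
mail; real `bls -j` output may differ, so the pattern is an assumption
to verify against actual output.  The names `RECORD_RE`,
`join_continuations` and `parse_sessions` are hypothetical.

```python
import re

# Pattern for the Begin/End records as they appear in the excerpt.
# Assumption: each record is one logical line once continuations are joined.
RECORD_RE = re.compile(
    r"(Begin|End) Job Session Record: File:blk=(\d+):(\d+)\s+"
    r"SessId=(\d+)\s+SessTime=(\d+)\s+JobId=(\d+)"
)


def join_continuations(text):
    """Merge lines ending in a backslash with the line that follows."""
    return re.sub(r"\\\s*\n\s*", " ", text)


def parse_sessions(text):
    """Map (SessId, SessTime) to the begin/end (file, block) pairs and JobId."""
    sessions = {}
    for match in RECORD_RE.finditer(join_continuations(text)):
        kind, vol_file, vol_block, sess_id, sess_time, job_id = match.groups()
        key = (int(sess_id), int(sess_time))
        entry = sessions.setdefault(key, {"job_id": int(job_id)})
        entry[kind.lower()] = (int(vol_file), int(vol_block))
    return sessions


SAMPLE = r"""Begin Job Session Record: File:blk=338:1 SessId=142 \
   SessTime=1427996423 JobId=5007
  Job=call-recs.2015-05-12_14.00.00_41 ...
End Job Session Record: File:blk=339:10868 SessId=142 \
   SessTime=1427996423 JobId=5007
  Date=12-May-2015 14:44:22 Level=F Type=B ...
"""

if __name__ == "__main__":
    for key, info in parse_sessions(SAMPLE).items():
        print(key, info)
```

Keying on the (SessId, SessTime) pair rather than on JobId alone pairs
each Begin record with its End record even if job IDs were ever reused.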

1. http://www.bacula.org/5.1.x-manuals/de/main/main/Bootstrap_File.html

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: Recreate bootstrap files for jobs from the output of `bls -j`

Andreas Nastke
How about archiving the 'tiny' items into larger tar or zip files
on a regular basis (say, every 15 minutes) and then backing up only
the archives?



--
Kind regards

Andreas Nastke
IT System Management

g/d/p Markt- und Sozialforschung GmbH
A company of the g/d/p research group
Richardstr. 18
D-22081 Hamburg
Phone: +49 (0)40 / 29876-117
Fax: +49 (0)40 / 29876-127
[hidden email]
www.gdp-group.com

Registered office: Hamburg; Commercial Register Hamburg, HRB 40482
Managing directors: Christa Braaß, Volker Rohweder


Re: Recreate bootstrap files for jobs from the output of `bls -j`

Konstantin Khomoutov
On Wed, 22 Mar 2017 10:31:16 +0100
Andreas Nastke <[hidden email]> wrote:

> how about archiving the 'tiny' items into larger tar- or zip-files
> on a regular basis (say every 15 minutes) and then only backup the
> archives.

The thing is that having "low-level addressing" data in a bootstrap
file allows Bacula to be blazing fast in positioning the tape to the
beginning of the required backup's data.

If the only thing you know is the name of the file, you basically need
Bacula to read all the entries from the beginning of the volume one by
one until it finds the one with the matching name.  That's why archiving
would literally buy us nothing: we already have lists of files to supply
to `bextract` via its "-i" command-line option.  Restoration in this case
takes several hours on an LTO-4 drive -- spent mostly in reading the
headers of the unwanted files and skipping past them.

IOW, proper bootstrap files serve as "indexes" to the tape's data.


Re: Recreate bootstrap files for jobs from the output of `bls -j`

Konstantin Khomoutov
In reply to this post by Konstantin Khomoutov
On Wed, 22 Mar 2017 12:07:55 +0300
Konstantin Khomoutov <[hidden email]> wrote:

[...]
> Due to the uniqueness + tiny size + large quantities property of these
> data, we have decided to keep only the information on the last few
> backup sessions in the database to keep its size manageable, and resort
> to "painful" restoration of old files -- for which job records are
> expired but the volumes are still there -- using `bextract`.
[...]

> Here, the job starts at 338:1 and ends at 339:10868, how do I write
> out the bootstrap file's data to cover that whole range in it?
> From [1], it appears that I could use
>
>   VolFile=338-339
>
> to cover the files but the VolBlock spec appears to pertain to a
> single VolFile.  I thought of using multiple VolFile + VolBlock pairs
> but that would require me to use an open range for the VolBlock of the
> first pair, something like
[...]

Looks like I could combine the output of `bls -j` with the output of
`bls -v` run on the same volume: from the former, I'd get the starting
and ending VolFile-s, and from the latter, I'd get a range of FileIndex
values and/or VolSessionId+VolSessionTime tuple which would cover both
"coarse-grained" tape positioning and "fine-grained" selection of what
to restore.

Still, I'd love to hear opinions on this matter from those well-versed
in how restoration from the tape works.


Re: Recreate bootstrap files for jobs from the output of `bls -j`

Josip Deanovic
In reply to this post by Konstantin Khomoutov
On Wednesday 2017-03-22 12:07:55 Konstantin Khomoutov wrote:

> For instance, here's a fragment from the real output for one of my
> volumes (elided a bit for terseness):
>
> Begin Job Session Record: File:blk=338:1 SessId=142 \
>    SessTime=1427996423 JobId=5007
>   Job=call-recs.2015-05-12_14.00.00_41 ...
> End Job Session Record: File:blk=339:10868 SessId=142 \
>    SessTime=1427996423 JobId=5007
>   Date=12-May-2015 14:44:22 Level=F Type=B ...
>
> Here, the job starts at 338:1 and ends at 339:10868, how do I write out
> the bootstrap file's data to cover that whole range in it?
>
> From [1], it appears that I could use
>
>   VolFile=338-339
>
> to cover the files but the VolBlock spec appears to pertain to a single
> VolFile.  I thought of using multiple VolFile + VolBlock pairs but
> that would require me to use an open range for the VolBlock of the
> first pair, something like
>
>   VolFile=338
>   VolBlock=1-
>   VolFile=339
>   VolBlock=1-10868


Here is an example of the bsr file format I have used in the past
(in this example a job spans two volumes):

Volume = volume-0003
VolSessionId = 592
VolSessionTime = 1435614003
Volume = volume-0004
VolSessionId = 592
VolSessionTime = 1435614003


The above example was created from the output of `bls -j <volume>`.
VolSessionId in the example is the same as SessId in the bls output,
and VolSessionTime is the same as SessTime in the bls output.

That should be enough to get the specific job restored using the
bextract tool, but I tested it only with file storage, where volumes
are files on disk rather than actual tapes.
However, I believe that there shouldn't be much difference in this
case.
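
A minimal sketch of generating a bsr in the form shown above from the
SessId and SessTime values reported by `bls -j`.  The function name
`session_bsr` is hypothetical, and the layout simply mirrors the
example; it has not been checked against every Bacula version.

```python
def session_bsr(volumes, sess_id, sess_time):
    """Build bootstrap text selecting one session on each listed volume."""
    lines = []
    for volume in volumes:
        # One Volume stanza per volume the job touches, same session on each.
        lines.append(f"Volume = {volume}")
        lines.append(f"VolSessionId = {sess_id}")
        lines.append(f"VolSessionTime = {sess_time}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(session_bsr(["volume-0003", "volume-0004"], 592, 1435614003), end="")
```

A bsr like this selects records by session only; it carries no
positioning information, so on tape the whole volume may still have to
be read.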


Regards

--
Josip Deanovic


Re: Recreate bootstrap files for jobs from the output of `bls -j`

Martin Simmons
In reply to this post by Konstantin Khomoutov
>>>>> On Wed, 22 Mar 2017 13:52:50 +0300, Konstantin Khomoutov said:
>
> On Wed, 22 Mar 2017 12:07:55 +0300
> Konstantin Khomoutov <[hidden email]> wrote:
>
> [...]
> > Here, the job starts at 338:1 and ends at 339:10868, how do I write
> > out the bootstrap file's data to cover that whole range in it?
> > From [1], it appears that I could use
> >
> >   VolFile=338-339
> >
> > to cover the files but the VolBlock spec appears to pertain to a
> > single VolFile.  I thought of using multiple VolFile + VolBlock pairs
> > but that would require me to use an open range for the VolBlock of the
> > first pair, something like
> [...]
>
> Looks like I could combine the output of `bls -j` with the output of
> `bls -v` run on the same volume: from the former, I'd get the starting
> and ending VolFile-s, and from the latter, I'd get a range of FileIndex
> values and/or VolSessionId+VolSessionTime tuple which would cover both
> "coarse-grained" tape positioning and "fine-grained" selection of what
> to restore.
>
> Still, I'd love to hear opinions on this matter from those well-versed
> in how restoration from the tape works.

The bsr format has an undocumented keyword VolAddr (which Bacula uses when it
writes a bsr).  When present, it overrides VolFile and VolBlock like this:

VolAddr = (VolFile << 32) + VolBlock
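
As a worked example of the formula, here is a small sketch applied to
the File:blk pairs from the `bls -j` excerpt earlier in the thread
(338:1 and 339:10868).  The helper names are hypothetical, and the
reverse mapping assumes the block number fits in 32 bits.

```python
def vol_addr(vol_file, vol_block):
    """Pack a File:blk pair as (VolFile << 32) + VolBlock."""
    return (vol_file << 32) + vol_block


def split_vol_addr(addr):
    """Recover (VolFile, VolBlock), assuming VolBlock fits in 32 bits."""
    return addr >> 32, addr & 0xFFFFFFFF


if __name__ == "__main__":
    print(vol_addr(338, 1))        # 1451698946049
    print(vol_addr(339, 10868))    # 1455993924212
    print(split_vol_addr(1455993924212))  # (339, 10868)
```

Because the file number occupies the high bits, addresses sort in tape
order, so a job that crosses a file boundary is still covered by a
single pair of numbers.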

__Martin


Re: Recreate bootstrap files for jobs from the output of `bls -j`

Kern Sibbald
Hello Martin,

Thanks for mentioning that VolAddr is undocumented.  That was an
oversight that I will fix.

Please be aware that VolAddr is a relatively new bsr keyword, so it will
not work on older Bacula versions such as 5.2.x.  I don't remember when
it was implemented.

Note also: VolAddr is formed as you say for File and Tape volumes, but
in the next version of Bacula, VolAddr is used for Cloud Volumes and the
algorithm is not the same.  I will document it once it is released.
Bottom line: each driver (File, Tape, Aligned, Dedup, Cloud, ...) has its
own 64-bit VolAddr representation of the address.  In this next version,
for File volumes for example, there is no longer any concept of File and
Block; there is only a 64-bit address.  For tapes, due to their
architecture, we (currently) still must have File and Block.

Best regards,

Kern



Re: Recreate bootstrap files for jobs from the output of `bls -j`

Konstantin Khomoutov
On Thu, 23 Mar 2017 09:58:35 +0100
Kern Sibbald <[hidden email]> wrote:

> Thanks for mentioning that VolAddr is undocumented.  That was an
> oversight that I will fix.
>
> Please be aware that VolAddr is a relatively new bsr keyword, so it
> will not work on older Bacula versions such as 5.2.x.  I don't remember
> when it was implemented.

Too bad for us: for administrative reasons we're stuck with 5.2.6 on
the box which is our tape SD.

> Note also: VolAddr is formed as you say for File and Tape volumes,
> but in the next version of Bacula, VolAddr is used for Cloud Volumes
> and the algorithm is not the same.  I will document it once it is
> released. Bottom line: each driver (File, Tape, Aligned, Dedup,
> Cloud, ...) has its own 64-bit VolAddr representation of the address.
> In this next version, for File volumes for example, there is no longer
> any concept of File and Block; there is only a 64-bit address.  For
> tapes, due to their architecture, we (currently) still must have File
> and Block.

OK, so given that I can't use VolAddr with 5.2.6's bextract, is there a
way to make use of the two pairs of File+Block values extracted for a
particular job from the output of `bls -j` to create a sensible spec
for the bootstrap file to speed up extracts using `bextract`?

To provide an "executive summary" of our situation:

- We have old tapes with many backup sessions.  These sessions are
  always recorded sequentially; no jobs overlap.

- `bls -j` extracts two pairs of File+Block values for each job:
  the first denotes the beginning of the job, and the latter -- its end.

- We'd like to write a script which would consume the output of `bls -j`
  and produce a single bootstrap file for each job found, and we'd like
  these bootstrap files to somehow refer to those low-level "addresses"
  to speed up extractions via `bextract`.

Is there a way to do what we need w/o using VolAddr?

(Extraction based only on the list of file names takes more than two
hours for our LTO-4 drive and a full tape; this is unwieldy, and it's
sad to watch `bextract` scan the whole tape while it could be told to
just scan from there to there -- if only we knew how to do that.)


Re: Recreate bootstrap files for jobs from the output of `bls -j`

Konstantin Khomoutov
On Fri, 24 Mar 2017 14:01:39 +0300
Konstantin Khomoutov <[hidden email]> wrote:

Answering my other question:

> > Thanks for mentioning that VolAddr is undocumented.  That was an
> > oversight that I will fix.
> >
> > Please be aware that VolAddr is a relatively new bsr keyword, so it
> > will not work on older Bacula versions such as 5.2.x.  I don't
> > remember when it was implemented.
>
> Too bad for us: for administrative reasons we're stuck with 5.2.6 on
> the box which is our tape SD.
[...]
> OK, so given that I can't use VolAddr with 5.2.6's bextract, is there
> a way to make use of the two pairs of File+Block values extracted for
> a particular job from the output of `bls -j` to create a sensible spec
> for the bootstrap file to speed up extracts using `bextract`?
[...]

As it turned out, 5.2.6 is perfectly able to make use of the VolAddr
parameter and does use it: I did a restore of the most recent such
backup to have Bacula generate a bootstrap file for me.

So that's the way to go for us.
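
Putting the pieces of this thread together, a per-job bootstrap file
could be produced roughly as sketched below.  Several things here are
assumptions rather than facts from the thread: that VolAddr accepts a
"start-end" range in the same way VolFile and VolBlock do, the function
name `job_bsr`, and the volume name "TAPE-0001", which is made up.
Comparing the result with a bsr that Bacula itself generates on the
same version would be a sensible check before relying on it.

```python
def vol_addr(vol_file, vol_block):
    """Pack a File:blk pair as (VolFile << 32) + VolBlock."""
    return (vol_file << 32) + vol_block


def job_bsr(volume, sess_id, sess_time, begin, end):
    """Return bootstrap text for one job.

    begin and end are the (file, block) pairs from the Begin and End
    Job Session Records.  The "start-end" form of VolAddr is assumed.
    """
    start = vol_addr(*begin)
    stop = vol_addr(*end)
    if stop < start:
        raise ValueError("end address precedes begin address")
    return (
        f"Volume = {volume}\n"
        f"VolSessionId = {sess_id}\n"
        f"VolSessionTime = {sess_time}\n"
        f"VolAddr = {start}-{stop}\n"
    )


if __name__ == "__main__":
    # "TAPE-0001" is a placeholder; the thread does not name the volume.
    print(job_bsr("TAPE-0001", 142, 1427996423, (338, 1), (339, 10868)), end="")
```

Keeping VolSessionId and VolSessionTime alongside the address range
means the selection stays correct even if the range turns out slightly
wider than the job's own data.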

Thank you, Martin and Kern!


Re: Recreate bootstrap files for jobs from the output of `bls -j`

Kern Sibbald
Hello,

That is good to know.

You probably already know two things, but I mention them for those who
do not know:

1. There is a chapter (or at least a section) in the main manual on how
to create bootstrap files from Job output.

2. Bacula 7.4.4 or any other Bacula is perfectly capable of reading
tapes written by Bacula versions much older than yours, so I do not
understand why you feel obliged to remain on Bacula 5.2.6. Of course,
upgrading from Bacula 5.2.6 to Bacula 7.4.7 is a fairly time-consuming task.

Best regards,

Kern


