opportunistic backup?

8 messages
opportunistic backup?

Paul J R
Hi All,

I have a data set that I'd like to back up that's large and not very
important. Backing it up is a "nice to have", not a must have. I've been
trying to find a way to back it up to disk that isn't disruptive to the
normal flow of backups, but every time I end up in a place where Bacula
wants to do a full backup of it (which takes too long and ends up
getting cancelled).

Currently I'm using 7.0.5, and something I noticed in the 7.4 tree is
the ability to resume stopped jobs, but from my brief testing it won't
quite do what I'm after either. Ideally, what I'm trying to achieve is
to give Bacula one hour a night to back up as much as it can and then
stop. Setting a time limit doesn't work, because the backup just gets
cancelled: Bacula forgets everything it has already backed up and starts
from scratch again the following night.
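
For reference, the usual way to express this kind of per-job time limit
is the Max Run Time directive in the Job resource of bacula-dir.conf; a
minimal sketch (resource names hypothetical):

```
Job {
  Name = "NAS02-Job"
  Type = Backup
  Client = nas02-fd
  FileSet = "NAS02-FileSet"
  Schedule = "Nightly"
  Storage = File
  Pool = Default
  Messages = Standard
  # Abort the job after one hour of running time. Note that exceeding
  # this limit *cancels* the job, which discards all progress - it does
  # not checkpoint what was backed up so far.
  Max Run Time = 3600
}
```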

VirtualFull doesn't really do what I'm after either, and I've also tried
populating the database directly in a way that makes Bacula think it
already has a full backup (with varying results, none of them
fantastic). A full backup of the dataset in one hit isn't realistically
achievable.

Before I give up, though, I'm curious whether anyone has tried something
similar, and what results or ideas they had that might work?

Thanks in advance!





------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: opportunistic backup?

Phil Stracchino-2
On 11/16/16 09:12, Paul J R wrote:

> I have a data set that i'd like to backup thats large and not very
> important. Backing it up is a "nice to have" not a must have. [...]
> A full backup of the dataset in one hit isnt realistically achievable.


This is a pretty difficult problem. To restate it: it sounds like you
are trying to create a consistent full backup, in piecewise slices an
hour or two at a time, of a large dataset that is changing while you're
trying to back it up, but without ever actually performing a full
backup. The problem is that you need to keep the state of a stopped
backup for arbitrary periods, and at the same time track whether there
have been changes to what you have already backed up, all without a full
backup to refer back to.


The only thing I can think of is this: is the dataset structured such
that you could split it logically into multiple chunks and back them up
as separate individual jobs?
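
As a concrete sketch of that idea (directory and resource names purely
hypothetical), each chunk would get its own FileSet and Job in
bacula-dir.conf, so each slice completes - and gets recorded in the
catalog - independently of the others:

```
# One FileSet + Job per logical chunk of the dataset.
FileSet {
  Name = "NAS02-Chunk-A"
  Include {
    Options { signature = MD5 }
    File = /export/nas02/chunk-a
  }
}

Job {
  Name = "NAS02-Chunk-A-Job"
  Type = Backup
  Level = Incremental    # upgraded to Full automatically, but only
                         # for this one chunk rather than everything
  Client = nas02-fd
  FileSet = "NAS02-Chunk-A"
  Schedule = "Nightly"
  Storage = File
  Pool = Default
  Messages = Standard
}
```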



--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: 603.293.8485


Re: opportunistic backup?

Bryn Hughes
In reply to this post by Paul J R
On 2016-11-16 06:12 AM, Paul J R wrote:

> I have a data set that i'd like to backup thats large and not very
> important. Backing it up is a "nice to have" not a must have. [...]
> Before I give up though, im curious if anyone has tried doing similar
> and what results/ideas they had that might work?

While there are many things that Bacula does very well, what you are
describing is not one of them.

Check out CrashPlan: you can set it up to back up to another machine for
free. I use it in exactly this sort of manner to back up a few laptops
to a server. It's particularly good at handling intermittent
connectivity, changing files, and things like that. You don't get the
same amount of control over expiry, and it doesn't give you any real way
to send data out to tape (you'd have to use Bacula or something similar
to do a full backup of the entire directory it uses, after stopping
CrashPlan or using filesystem snapshots), but it will handle multiple
machines and multiple versions of backed-up files, and it can be
scheduled the way you want.

Another tool that works well for stuff like this (plus doing a whole lot
of other things) is OwnCloud. I use that on my main personal laptop now.
The files deposited on the OwnCloud server are a lot easier to back up
with tools like Bacula as well, though there is still a separate
database involved that you have to worry about. That reminds me to go
check and make sure I've included OwnCloud's DB in my Bacula jobs!

Bryn


Re: opportunistic backup?

Paul J R
In reply to this post by Phil Stracchino-2
On 17/11/16 07:26, Phil Stracchino wrote:

> On 11/16/16 09:12, Paul J R wrote:
>> I have a data set that i'd like to backup thats large and not very
>> important. [...]
>
> This is a pretty difficult problem. [...]
>
> The only thing I can think of is, is the dataset structured such that
> you could split it [logically] into multiple chunks and back them up as
> separate individual jobs?

I don't need a consistent full; any backup it manages to do is a plus.
I've tried splitting it into multiple job sets, but the way the dataset
changes makes it fairly resistant to that, because the data changes its
name as it gets older.

Recovering the data when it gets broken isn't difficult (the last time
was via a Windows machine connected to it with Samba that got
cryptolockered); it just gets synced back across the internet (which is
a little painful on the internet link for a couple of days). Any backup
data I have is simply a plus that means that particular chunk of data
doesn't have to be re-pulled. Unfortunately, unless Bacula completes a
backup, it doesn't even record what it managed to back up.

This backup had run for four hours and chewed through quite a decent
chunk of data, but it never recorded the files it backed up (though this
was only a test):

+-------+-----------+---------------------+------+-------+----------+----------+-----------+
| JobId | Name      | StartTime           | Type | Level | JobFiles | JobBytes | JobStatus |
+-------+-----------+---------------------+------+-------+----------+----------+-----------+
|    38 | NAS02-Job | 2016-11-15 17:33:24 | B    | F     |        0 |        0 | A         |
+-------+-----------+---------------------+------+-------+----------+----------+-----------+

Part of the reason I was hoping to do it with Bacula is that it would
keep some of the history that gets lost when the dataset is
reconstructed from bare metal (which really isn't important either, in
reality; it's just handy to have).

One thing I did try was moving the data to be backed up out of the way,
getting Bacula to run (which completed the full backup), then moving the
data back into place, which meant Bacula ran the next one as an
incremental. But that didn't have any success either, as it didn't
complete the incremental and didn't record what it backed up.






Re: opportunistic backup?

Josip Deanovic
On Friday 2016-11-18 00:08:36 Paul J R wrote:

> I dont need a consistent full, any backup it manages to do is a plus.
> [...]
> One thing i did try was moving the data to be backed up out of the way,
> getting bacula to run (which completed the full backup) then moving the
> data back into place which then meant bacula did the next one as
> incremental, but that really didnt have any success either as it didnt
> complete the incremental and didnt record what it backed up.


Perhaps you could make use of the LVM snapshot feature and do a
consistent backup from the snapshot.

Of course, you would need some extra space, which should be estimated
from the amount of data that changes during the expected lifetime of the
snapshot.
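
A sketch of that approach (volume group, LV, and mount-point names
hypothetical; requires root and free extents in the volume group):

```
# Snapshot the logical volume holding the dataset; size the snapshot
# to absorb the writes expected during the backup window.
lvcreate --snapshot --size 20G --name nas02-snap /dev/vg0/nas02

# Mount it read-only and point the Bacula FileSet at the mount point,
# so the backup sees a frozen, consistent view of the data.
mkdir -p /mnt/nas02-snap
mount -o ro /dev/vg0/nas02-snap /mnt/nas02-snap

# ... run the backup job against /mnt/nas02-snap ...

# Tear the snapshot down afterwards; an LVM snapshot degrades write
# performance while it exists and is invalidated if it fills up.
umount /mnt/nas02-snap
lvremove -f /dev/vg0/nas02-snap
```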

--
Josip Deanovic


Re: opportunistic backup?

Josh Fisher
In reply to this post by Paul J R

On 11/16/2016 9:12 AM, Paul J R wrote:

> VirtualFull doesnt really do what im after either

Why is that? If you could even once get a full backup, then from then on
it would only require incremental backups. The occasional virtual full
is out-of-band and doesn't require access to the data set. The
incremental backups should be quick enough not to get in the way, and in
any case they do not have to succeed every time, since they are
cumulative.
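
Assuming an initial Full could ever be landed, the cycle described here
might look like this in bacula-dir.conf (a sketch; schedule name and
times hypothetical):

```
# Nightly incrementals against the client, plus a monthly VirtualFull
# that consolidates them on the storage daemon without ever touching
# the client's data set again.
Schedule {
  Name = "NAS02-Cycle"
  Run = Level=Incremental daily at 01:00
  Run = Level=VirtualFull 1st sun at 03:00
}
```

In practice the VirtualFull consolidation also needs a Next Pool
configured on the job's Pool resource, since it writes the synthesized
full to a different pool than the one being read.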

> and i've also tried
> populating the database directly in a way that makes it think its
> already got a full backup (varying results, and none of them fantastic).
> A full backup of the dataset in one hit isnt realistically achievable.
>



Re: opportunistic backup?

Kern Sibbald
In reply to this post by Paul J R
Hello,

This is an interesting problem, but it is outside the design of Bacula,
because Bacula assumes that you can always make a Full backup;
thereafter, you can do things like Incremental Forever and Progressive
Virtual Full backups.

One thing you might try is the "stop" command. I have never used it in
this manner, but it might work for you. Basically, you would start a
Full backup, then once it has run long enough, issue a bconsole "stop"
command. If all works well, Bacula will record everything it has backed
up so far, and at some later time you can restart the job until it
finishes. If this works and subsequent Incrementals also take too long,
you can try the same trick.
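
The sequence described would look roughly like this in a bconsole
session (job name and jobid illustrative; the stop/restart pair is only
available in 7.4.x and later, and lines starting with # are annotations,
not commands):

```
# Kick off the full backup at the start of the nightly window:
* run job=NAS02-Job level=Full yes

# When the hour is up, stop (not cancel!) the running job.
# "stop" records the files backed up so far and marks the job
# Incomplete; "cancel" discards all of its progress.
* stop jobid=38

# The following night, resume from where it left off:
* restart jobid=38
```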

If stop works for you, please let me know.  I will then think about how
we might "automate" this stop feature.  I think you need version 7.4.x
to get the stop command as it was a relatively recent addition.

Best regards,
Kern

On 11/17/2016 02:08 PM, Paul J R wrote:

> I dont need a consistent full, any backup it manages to do is a plus.
> [...]
> One thing i did try was moving the data to be backed up out of the way,
> getting bacula to run (which completed the full backup) then moving the
> data back into place which then meant bacula did the next one as
> incremental, but that really didnt have any success either as it didnt
> complete the incremental and didnt record what it backed up.



Re: opportunistic backup?

Phil Stracchino-2
In reply to this post by Paul J R
On 11/17/16 08:08, Paul J R wrote:
> I dont need a consistent full, any backup it manages to do is a plus.
> I've tried splitting it into multiple job sets but the way the dataset
> changes makes it fairly resistant cause the data changes its name as it
> gets older.


The data *changes its name* as it gets older.

Oy.  Yeah, you have a difficult problem here.

Honestly, I'm not sure that any *file-level* backup is going to do the
job for you here.  You need a *data-level* backup, and that's almost
certainly going to have to be something written specifically for the
purpose.


--
  Phil Stracchino
  Babylon Communications
  [hidden email]
  [hidden email]
  Landline: 603.293.8485
