No Job status returned from FD. Backup fails

Matthias Koch-Schirrmeister
Good morning list, here's my setup:

Director 7.4.2 running on OpenBSD 6.0 (hostname "fafnir")
FD 5.2.6 running on Debian GNU/Linux 7.0 at a remote site (hostname
"perseus")
Connected by a TLS tunnel


I have been using this setup for about a year now to back up both on-site
and off-site machines. The director used to be 7.0.5 on OpenBSD 5.8; the
clients run Linux and Windows.

A few days ago I set up a new director (see above) and moved the
configuration files from the old machine to the new one. Everything worked
as expected, with one exception.

The volume to back up from the above client is large, about 100GB for a
full backup, so a full backup takes up to 20 hours. I haven't been able
to complete a single full backup yet, because at some point the
connection seems to get lost. Yesterday I managed to run a (probably)
full backup, but apparently the "finished" message from the client never
got back to the director, and after a few more hours the connection dropped.

This is what the client reports:

 131  Full   1,007,689    119.0 G  OK       06-Apr-17 14:07 perseus-Backup



Here's the job summary:

06-Apr 18:58 fafnir JobId 131: Error: Bacula fafnir 7.4.2 (06Jun16):
  Build OS:               x86_64-unknown-openbsd6.0 openbsd 6.0
  JobId:                  131
  Job:                    perseus-Backup.2017-04-05_15.39.23_29
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "perseus" 5.2.6 (21Feb12) x86_64-pc-linux-gnu,debian,7.0
  FileSet:                "Unixoid" 2017-03-29 23:05:01
  Pool:                   "Standard" (From Command input)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "Standard-Device" (From command line)
  Scheduled time:         05-Apr-2017 15:39:21
  Start time:             05-Apr-2017 15:39:26
  End time:               06-Apr-2017 18:58:45
  Elapsed time:           1 day 3 hours 19 mins 19 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       1,007,689
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       119,184,574,492 (119.1 GB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               no
  Volume name(s):         FVol-0001|FVol-0013|FVol-0014|FVol-0015
  Volume Session Id:      2
  Volume Session Time:    1491397950
  Last Volume Bytes:      9,574,999,912 (9.574 GB)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Backup Error ***



Here's the output from bacula.log:

05-Apr 15:39 fafnir JobId 131: No prior Full backup Job record found.
05-Apr 15:39 fafnir JobId 131: No prior or suitable Full backup found in catalog. Doing FULL backup.
05-Apr 15:39 fafnir JobId 131: Start Backup JobId 131, Job=perseus-Backup.2017-04-05_15.39.23_29
05-Apr 15:39 fafnir JobId 131: Using Device "HESTIA-files" to write.
06-Apr 01:17 Standard-Device JobId 131: End of medium on Volume "FVol-0001" Bytes=53,687,041,713 Blocks=832,206 at 06-Apr-2017 01:17.
06-Apr 01:17 Standard-Device JobId 131: Volume "FVol-0013" previously written, moving to end of data.
06-Apr 01:17 Standard-Device JobId 131: Ready to append to end of Volume "FVol-0013" size=7,424,802,068
06-Apr 01:17 Standard-Device JobId 131: New volume "FVol-0013" mounted on device "HESTIA-files" (/var/import/hestia/_bacula-sd) at 06-Apr-2017 01:17.
06-Apr 09:50 Standard-Device JobId 131: End of medium on Volume "FVol-0013" Bytes=53,687,066,040 Blocks=832,206 at 06-Apr-2017 09:50.
06-Apr 09:50 fafnir JobId 131: Created new Volume="FVol-0014", Pool="Standard", MediaType="50GB-Medium" in catalog.
06-Apr 09:50 Standard-Device JobId 131: Labeled new Volume "FVol-0014" on file device "HESTIA-files" (/var/import/hestia/_bacula-sd).
06-Apr 09:50 Standard-Device JobId 131: Wrote label to prelabeled Volume "FVol-0014" on file device "HESTIA-files" (/var/import/hestia/_bacula-sd)
06-Apr 09:50 Standard-Device JobId 131: New volume "FVol-0014" mounted on device "HESTIA-files" (/var/import/hestia/_bacula-sd) at 06-Apr-2017 09:50.
06-Apr 12:26 Standard-Device JobId 131: End of medium on Volume "FVol-0014" Bytes=53,687,079,186 Blocks=832,203 at 06-Apr-2017 12:26.
06-Apr 12:26 fafnir JobId 131: Created new Volume="FVol-0015", Pool="Standard", MediaType="50GB-Medium" in catalog.
06-Apr 12:26 Standard-Device JobId 131: Labeled new Volume "FVol-0015" on file device "HESTIA-files" (/var/import/hestia/_bacula-sd).
06-Apr 12:26 Standard-Device JobId 131: Wrote label to prelabeled Volume "FVol-0015" on file device "HESTIA-files" (/var/import/hestia/_bacula-sd)
06-Apr 12:26 Standard-Device JobId 131: New volume "FVol-0015" mounted on device "HESTIA-files" (/var/import/hestia/_bacula-sd) at 06-Apr-2017 12:26.
06-Apr 14:07 Standard-Device JobId 131: Elapsed time=22:28:30, Transfer rate=1.473 M Bytes/second
06-Apr 14:07 Standard-Device JobId 131: Sending spooled attrs to the Director. Despooling 350,356,996 bytes ...
06-Apr 18:57 fafnir JobId 131: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
06-Apr 18:58 fafnir JobId 131: Fatal error: No Job status returned from FD.
06-Apr 18:58 fafnir JobId 131: Error: Bacula fafnir 7.4.2 (06Jun16):


I'm a bit lost at the moment. I'm backing up a few more remote machines
like this, with not quite the same volume of data but still a fair amount.
They all run as they have for about a year; only this one doesn't. The only
apparent change is the move from OpenBSD 5.8 to 6.0 and from Bacula
7.0.5 to 7.4.2.

One of the remote clients with a bigger volume is running precisely the
same OS and FD version.

TIA
Matthias


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: No Job status returned from FD. Backup fails

Heitor Faria

> Good morning list, here's my setup:

Hello, Matthias,

> One of the remote clients with a bigger volume is running precisely the
> same OS and FD version.

An FD as old as 5.2.6, relative to the Director, can also cause problems depending on your Bacula backup settings. I'd suggest upgrading your FD.
Anyhow, this error message usually appears when the network connection is disrupted during the backup; it is not necessarily a Bacula issue.
In some very peculiar situations, enabling the Heartbeat Interval directive in the involved daemons might help.
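For reference, a minimal sketch of what enabling the heartbeat might look like. It assumes Bacula's `Heartbeat Interval` directive in the File daemon, Storage daemon and Director configurations; the resource names are hypothetical placeholders and the 60-second interval is an illustrative value, not one recommended in this thread. One plausible reason it could matter here (not confirmed in the thread): the log shows the Storage daemon finished writing at 14:07 and the Director then spent until 18:57 despooling attributes, so the connection between the Director and the FD may have carried little or no traffic for almost five hours, the kind of gap a TLS tunnel, firewall or NAT idle timeout could cut.

```
# bacula-fd.conf on the client: FileDaemon resource
FileDaemon {
  Name = perseus-fd                # hypothetical name
  # ... existing directives unchanged ...
  Heartbeat Interval = 60          # seconds; 0 (the default) disables it
}

# bacula-sd.conf on the storage host: Storage resource
Storage {
  Name = example-sd                # hypothetical name
  # ... existing directives unchanged ...
  Heartbeat Interval = 60
}

# bacula-dir.conf on the Director: Director resource
Director {
  Name = fafnir-dir                # hypothetical name
  # ... existing directives unchanged ...
  Heartbeat Interval = 60
}
```

After editing, the Director can usually re-read its configuration with the bconsole `reload` command, while the File and Storage daemons will likely need a restart.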


Regards,
--
===========================================================================
Heitor Medrado de Faria | Bacula do Brasil
• Don't get charged by the size of your backups; check out Bacula Enterprise: http://www.bacula.com.br/enterprise/
• I provide in-company Bacula Community training and implementation: http://www.bacula.com.br/in-company/
(61) 98268-4220 | www.bacula.com.br
============================================================================
We also recommend these complementary training courses:
• Basic Shell and Shell Programming with Julio Neves.
• Zabbix with Adail Host.
============================================================================


Re: No Job status returned from FD. Backup fails

Matthias Koch-Schirrmeister

On 07.04.2017 at 14:27, Heitor Faria wrote:

> Having a too old FD (5.2.6) in relation to Director can also result in
> problems depending on the Bacula backup settings.
> I'd suggest you upgrading your FD.

That would have been my first pick. I actually have two remote clients,
both on the same connection, and both running the same OS and Bacula FD
version. Only one of them is giving me trouble; the other one is running
fine.

> Anyhow, this error message usually happens when there is network
> disruption during backup. Not necessarily a Bacula issue.
> In some very peculiar situations enabling Heartbeat directive in
> involved daemons might help.

I have it enabled now. Waiting for the result of the next run.

Kind regards
Matthias





Re: No Job status returned from FD. Backup fails

RAT
In reply to this post by Matthias Koch-Schirrmeister
What's the proper way to purge/prune tapes that have exceeded their retention time?
It is not doing it automatically and I must've goofed it up because:
 
 *label storage=tl4000 pool=Tape slots=3 barcodes
Enter autochanger drive[0]:
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
3306 Issuing autochanger "slots" command.
Device "tl4000" has 48 slots.
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
3306 Issuing autochanger "list" command.
The following Volumes will be labeled:
Slot Volume
==============
3 BAC013L6
Do you want to label these Volumes? (yes|no): yes
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
Sending label command for Volume "BAC013L6" Slot 3 ...
3307 Issuing autochanger "unload slot 20, drive 0" command.
3304 Issuing autochanger "load slot 3, drive 0" command.
3305 Autochanger "load slot 3, drive 0", status is OK.
3920 Cannot label Volume because it is already labeled: "BAC013L6"
Label command failed for Volume BAC013L6.
 
*update slots barcode drive=0 storage=tl4000
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
3306 Issuing autochanger "slots" command.
Device "tl4000" has 48 slots.
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
3306 Issuing autochanger "list" command.
Catalog record for Volume "BAC026L6" is up to date.
Catalog record for Volume "BAC029L6" is up to date.
Volume "BAC013L6" not found in catalog. Slot=3 InChanger set to zero.
Catalog record for Volume "BAC028L6" is up to date.
Catalog record for Volume "BAC035L6" is up to date.
Catalog record for Volume "BAC037L6" is up to date.
Catalog record for Volume "BAC027L6" is up to date.
Catalog record for Volume "BAC036L6" is up to date.
Catalog record for Volume "BAC039L6" is up to date.
Catalog record for Volume "BAC025L6" is up to date.
Catalog record for Volume "BAC024L6" is up to date.
Catalog record for Volume "CLN003L6" is up to date.
Catalog record for Volume "BAC062L6" is up to date.
Catalog record for Volume "BAC073L6" is up to date.
Catalog record for Volume "BAC071L6" is up to date.
Catalog record for Volume "BAC072L6" is up to date.
Catalog record for Volume "BAC063L6" is up to date.
 
 
Robert Threet
http://yesistilluseperl.blogspot.com/



Re: No Job status returned from FD. Backup fails

Kern Sibbald

Strictly speaking, Bacula does not purge or prune tapes. It purges and prunes records in the catalog. When no job records remain in the catalog for a Volume, and automatic recycling is enabled (the default), Bacula will reuse that Volume.
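To illustrate the defaults mentioned above, a minimal Pool resource sketch for bacula-dir.conf. The pool name `Tape` is taken from RAT's `label` command; the retention period is a hypothetical value. With `AutoPrune = yes`, Bacula applies the Volume Retention period when it needs a Volume and finds no appendable one in the Pool, deleting expired Job records for that Volume from the catalog; with `Recycle = yes`, a Volume whose records have all been removed (status Purged) can then be reused. Both are defaults and are written out here only for clarity.

```
# bacula-dir.conf: Pool resource (illustrative values)
Pool {
  Name = Tape
  Pool Type = Backup
  Volume Retention = 30 days      # hypothetical; the default is 365 days
  AutoPrune = yes                 # default: prune expired jobs when a Volume is needed
  Recycle = yes                   # default: reuse Volumes once they are purged
}
```

The same catalog operations can be triggered by hand from bconsole with the `prune` and `purge` commands; note that `purge` removes the records regardless of the retention periods.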


On 04/07/2017 08:50 PM, RAT wrote:
> What's the proper way to purge/prune tapes that have exceeded their retention time?
It is not doing it automatically and I must've goofed it up because:
 
 *label storage=tl4000 pool=Tape slots=3 barcodes
Enter autochanger drive[0]:
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
3306 Issuing autochanger "slots" command.
Device "tl4000" has 48 slots.
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
3306 Issuing autochanger "list" command.
The following Volumes will be labeled:
Slot Volume
==============
3 BAC013L6
Do you want to label these Volumes? (yes|no): yes
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
Sending label command for Volume "BAC013L6" Slot 3 ...
3307 Issuing autochanger "unload slot 20, drive 0" command.
3304 Issuing autochanger "load slot 3, drive 0" command.
3305 Autochanger "load slot 3, drive 0", status is OK.
3920 Cannot label Volume because it is already labeled: "BAC013L6"
Label command failed for Volume BAC013L6.
 
*update slots barcode drive=0 storage=tl4000
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
3306 Issuing autochanger "slots" command.
Device "tl4000" has 48 slots.
Connecting to Storage daemon tl4000 at bacula1.usi.edu:9103 ...
3306 Issuing autochanger "list" command.
Catalog record for Volume "BAC026L6" is up to date.
Catalog record for Volume "BAC029L6" is up to date.
Volume "BAC013L6" not found in catalog. Slot=3 InChanger set to zero.
Catalog record for Volume "BAC028L6" is up to date.
Catalog record for Volume "BAC035L6" is up to date.
Catalog record for Volume "BAC037L6" is up to date.
Catalog record for Volume "BAC027L6" is up to date.
Catalog record for Volume "BAC036L6" is up to date.
Catalog record for Volume "BAC039L6" is up to date.
Catalog record for Volume "BAC025L6" is up to date.
Catalog record for Volume "BAC024L6" is up to date.
Catalog record for Volume "CLN003L6" is up to date.
Catalog record for Volume "BAC062L6" is up to date.
Catalog record for Volume "BAC073L6" is up to date.
Catalog record for Volume "BAC071L6" is up to date.
Catalog record for Volume "BAC072L6" is up to date.
Catalog record for Volume "BAC063L6" is up to date.
 
 


____________________________________________________________
This "Smart Cup" Has the Internet Going Crazy!
howlifeworks.com
http://thirdpartyoffers.netzero.net/TGL3232/58e7dfca7fd2c5fca302bst01duc
SponsoredBy Content.Ad

Good morning list, here's my setup:

Director 7.4.2 running on OpenBSD 6.0 (hostname "fafnir")
FD 5.2.6 running on Debian/GNU Linux 7.0 on a remote site (hostname
"perseus")
Connected by a TLS tunnel


I have been using this for about a year now, for backing up both on-site
and off-site machines. The director used to be 7.0.5 on OpenBSD 5.8, the
clients are running Linux and Windows.

A few days ago I set up a new director (see above), and moved the old
configuration files from the old to the new machine. It all worked well
and as I expected it - with one exception.

The volume to back up from the above client is large, about 100GB for a
full backup, and consequently takes up to 20 hours to run. I haven't
been able to run a single full backup yet, as at some point the
connection seems to get lost. Yesterday I managed to run a (probably)
full backup, but apparently "finished" message from the client never got
back to the director, and after a few more hours the connection dropped.

This is what the client reports:

 131  Full   1,007,689    119.0 G  OK       06-Apr-17 14:07 perseus-Backup



Here's the job summary:

06-Apr 18:58 fafnir JobId 131: Error: Bacula fafnir 7.4.2 (06Jun16):
  Build OS:               x86_64-unknown-openbsd6.0 openbsd 6.0
  JobId:                  131
  Job:                    perseus-Backup.2017-04-05_15.39.23_29
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "perseus" 5.2.6 (21Feb12)
x86_64-pc-linux-gnu,debian,7.0
  FileSet:                "Unixoid" 2017-03-29 23:05:01
  Pool:                   "Standard" (From Command input)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "Standard-Device" (From command line)
  Scheduled time:         05-Apr-2017 15:39:21
  Start time:             05-Apr-2017 15:39:26
  End time:               06-Apr-2017 18:58:45
  Elapsed time:           1 day 3 hours 19 mins 19 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       1,007,689
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       119,184,574,492 (119.1 GB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               no
  Volume name(s):         FVol-0001|FVol-0013|FVol-0014|FVol-0015
  Volume Session Id:      2
  Volume Session Time:    1491397950
  Last Volume Bytes:      9,574,999,912 (9.574 GB)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Backup Error ***



Here's the output from bacula.log:

05-Apr 15:39 fafnir JobId 131: No prior Full backup Job record found.
05-Apr 15:39 fafnir JobId 131: No prior or suitable Full backup found in
catalog. Doing FULL backup.
05-Apr 15:39 fafnir JobId 131: Start Backup JobId 131,
Job=perseus-Backup.2017-04-05_15.39.23_29
05-Apr 15:39 fafnir JobId 131: Using Device "HESTIA-files" to write.
06-Apr 01:17 Standard-Device JobId 131: End of medium on Volume
"FVol-0001" Bytes=53,687,041,713 Blocks=832,206 at 06-Apr-2017 01:17.
06-Apr 01:17 Standard-Device JobId 131: Volume "FVol-0013" previously
written, moving to end of data.
06-Apr 01:17 Standard-Device JobId 131: Ready to append to end of Volume "FVol-0013" size=7,424,802,068
06-Apr 01:17 Standard-Device JobId 131: New volume "FVol-0013" mounted on device "HESTIA-files" (/var/import/hestia/_bacula-sd) at 06-Apr-2017 01:17.
06-Apr 09:50 Standard-Device JobId 131: End of medium on Volume "FVol-0013" Bytes=53,687,066,040 Blocks=832,206 at 06-Apr-2017 09:50.
06-Apr 09:50 fafnir JobId 131: Created new Volume="FVol-0014", Pool="Standard", MediaType="50GB-Medium" in catalog.
06-Apr 09:50 Standard-Device JobId 131: Labeled new Volume "FVol-0014" on file device "HESTIA-files" (/var/import/hestia/_bacula-sd).
06-Apr 09:50 Standard-Device JobId 131: Wrote label to prelabeled Volume "FVol-0014" on file device "HESTIA-files" (/var/import/hestia/_bacula-sd)
06-Apr 09:50 Standard-Device JobId 131: New volume "FVol-0014" mounted on device "HESTIA-files" (/var/import/hestia/_bacula-sd) at 06-Apr-2017 09:50.
06-Apr 12:26 Standard-Device JobId 131: End of medium on Volume "FVol-0014" Bytes=53,687,079,186 Blocks=832,203 at 06-Apr-2017 12:26.
06-Apr 12:26 fafnir JobId 131: Created new Volume="FVol-0015", Pool="Standard", MediaType="50GB-Medium" in catalog.
06-Apr 12:26 Standard-Device JobId 131: Labeled new Volume "FVol-0015" on file device "HESTIA-files" (/var/import/hestia/_bacula-sd).
06-Apr 12:26 Standard-Device JobId 131: Wrote label to prelabeled Volume "FVol-0015" on file device "HESTIA-files" (/var/import/hestia/_bacula-sd)
06-Apr 12:26 Standard-Device JobId 131: New volume "FVol-0015" mounted on device "HESTIA-files" (/var/import/hestia/_bacula-sd) at 06-Apr-2017 12:26.
06-Apr 14:07 Standard-Device JobId 131: Elapsed time=22:28:30, Transfer rate=1.473 M Bytes/second
06-Apr 14:07 Standard-Device JobId 131: Sending spooled attrs to the Director. Despooling 350,356,996 bytes ...
06-Apr 18:57 fafnir JobId 131: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
06-Apr 18:58 fafnir JobId 131: Fatal error: No Job status returned from FD.
06-Apr 18:58 fafnir JobId 131: Error: Bacula fafnir 7.4.2 (06Jun16):
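
A rough timeline check of the log above can be sketched in Python. This is only an illustration, not part of Bacula; it assumes the year 2017 and minute precision, since the log lines carry neither the year nor the seconds. It shows how long the Director kept going after the SD started despooling attributes before the FD connection was reported as reset, and how much data the SD's own elapsed time and transfer rate imply.

```python
from datetime import datetime, timedelta

# Timestamps copied from the JobId 131 log above. The year and the seconds are
# not in the log, so 2017 and minute precision are assumptions of this sketch.
FMT = "%Y %d-%b %H:%M"
despool_start = datetime.strptime("2017 06-Apr 14:07", FMT)  # "Sending spooled attrs to the Director"
fatal_error = datetime.strptime("2017 06-Apr 18:57", FMT)    # "Network error with FD during Backup"

# Time between the start of attribute despooling and the reported connection reset.
gap = fatal_error - despool_start

# Data volume implied by the SD summary line; the rate is rounded, so this is approximate.
elapsed = timedelta(hours=22, minutes=28, seconds=30)  # "Elapsed time=22:28:30"
rate_bytes_per_s = 1.473e6                             # "Transfer rate=1.473 M Bytes/second"
implied_bytes = rate_bytes_per_s * elapsed.total_seconds()

print(f"despool start -> fatal error: {gap}")
print(f"implied data written: {implied_bytes / 1e9:.1f} GB")
```

The gap comes out at 4:50:00, and the implied amount of data (about 119.2 GB) is close to the 119.0 G the client reported for this job, so the data transfer itself appears to have completed well before the error.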


I'm a bit lost at the moment. I'm backing up a few more remote machines
like this, with not quite the same volume of data but still a fair amount.
They all run, as they have for about a year. Only this one doesn't. The
only apparent change is the move from OpenBSD 5.8 to 6.0, and from Bacula
7.0.5 to 7.4.2.

One of the remote clients with a bigger volume is running precisely the
same OS and FD version.

TIA
Matthias



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot


_______________________________________________
Bacula-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-users



Re: No Job status returned from FD. Backup fails

Matthias Koch-Schirrmeister
In reply to this post by Matthias Koch-Schirrmeister
And another error:

Fatal error: append.c:183 Error reading data header from FD. n=-2 msglen=0 ERR=Connection reset by peer

Don't know how to read it, though.

This time a CentOS client running FD 5.2.13 is affected.

I have no reason to believe that we actually lost the physical connection
to the remote machine; we would get a load of other errors then. I've
had ssh sessions open to these machines over the weekend and they
didn't disconnect. I'll fire up the old machine (Director 7.0.5, as shipped
with OpenBSD 5.8) with the same configuration, and see what happens.
Both directors, by the way, are running on the same ESXi cluster.

Matthias



Re: No Job status returned from FD. Backup fails

Matthias Koch-Schirrmeister
I have put the jobs in question back on the old 7.0.5 director and
they've been running fine over the weekend. I figure it must be
something in the combination of client (5.2.x) and director versions,
because all other parameters, connection and everything, are identical.
I wonder if there's a way to narrow the problem down further, because
I'd rather not run two backup systems.

Matthias



Re: No Job status returned from FD. Backup fails

Matthias Koch-Schirrmeister
I have managed to keep a long backup running (125 GB / 23 hours). While
the remote FD reports success

 JobId  Level    Files      Bytes   Status   Finished        Name
======================================================================
   940  Full   1,022,421    125.2 G  OK       20-Jun-17 15:01 backup


the director reports a failure:


Standard-Device Elapsed time=23:34:45, Transfer rate=1.478 M Bytes/second
 Sending spooled attrs to the Director. Despooling 355,532,878
fafnir Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
fafnir Fatal error: No Job status returned from FD.

Error: Bacula fafnir 7.4.2 (06Jun16):
  Build OS:               x86_64-unknown-openbsd6.0 openbsd 6.0
  JobId:                  940
  Job:                    backup.2017-06-19_15.26.55_07
  Backup Level:           Full
  Client:                 "<client>" 5.2.6 (21Feb12) x86_64-pc-linux-gnu,debian,7.0
  FileSet:                "Unixoid" 2017-03-29 23:05:01
  Pool:                   "Standard" (From Command input)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "Standard-Device" (From command line)
  Scheduled time:         19-Jun-2017 15:26:50
  Start time:             19-Jun-2017 15:26:58
  End time:               20-Jun-2017 18:57:39
  Elapsed time:           1 day 3 hours 30 mins 41 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       1,022,421
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       125,469,539,182 (125.4 GB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               no
  Volume name(s):         FVol-0002|FVol-0005|FVol-0001|FVol-0006
  Volume Session Id:      40
  Volume Session Time:    1497446317
  Last Volume Bytes:      40,602,488,821 (40.60 GB)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Backup Error ***


I noticed that the director still considered the job "running" while the
FD had already reported it as finished. The db backend seemed to be
quite busy for about an hour. The last message I got from the director was

 Sending spooled attrs to the Director. Despooling 355,532,878


until finally the error message appeared.

Matthias



Re: No Job status returned from FD. Backup fails

Matthias Koch-Schirrmeister
It is remarkable that the FD reports 15:01 as the finishing time, while the
director reported the failure at 18:57. This probably means that the
director waited almost four hours for something that never happened,
and gave up at 18:57.
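
The timeline can be checked against the job summary above with a few lines of Python. This is an illustrative sketch only, and it assumes the SD's "Elapsed time" is counted from the job's "Start time", which the summary does not state explicitly. Under that assumption the SD finished receiving data at 20-Jun-2017 15:01:43, matching the FD's finish time, and the Director's "End time" is just under four hours later; recomputing the rate from "SD Bytes Written" also reproduces the logged 1.478 MB/s.

```python
from datetime import datetime, timedelta

# Values copied from the JobId 940 summary and SD messages above.
FMT = "%d-%b-%Y %H:%M:%S"
start = datetime.strptime("19-Jun-2017 15:26:58", FMT)    # "Start time"
end = datetime.strptime("20-Jun-2017 18:57:39", FMT)      # "End time" (Director gave up)
sd_elapsed = timedelta(hours=23, minutes=34, seconds=45)  # SD "Elapsed time=23:34:45"
sd_bytes = 125_469_539_182                                # "SD Bytes Written"

# When the SD finished receiving data, assuming its clock started at "Start time".
data_done = start + sd_elapsed

# Time the Director then spent before reporting the network error.
waiting = end - data_done

# Recompute the SD transfer rate (decimal MB/s) to confirm elapsed time and bytes agree.
rate_mb_s = sd_bytes / sd_elapsed.total_seconds() / 1e6

print(f"data transfer finished: {data_done}")  # compare with the FD's "20-Jun-17 15:01"
print(f"Director waited:        {waiting}")
print(f"SD transfer rate:       {rate_mb_s:.3f} MB/s")
```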

Matthias



Re: No Job status returned from FD. Backup fails

Kern Sibbald
In reply to this post by Matthias Koch-Schirrmeister

Hello Matthias,

While we attempt to fully support older FDs, in your case your FD is four years older than the Director. I would suggest that this is not an ideal combination, i.e. I recommend that you upgrade your FD, and if you still have problems, report it again. I suspect that the connection was dropped because you do not have Heartbeat Interval set on the FD and it waited a very long time for the Director. If I am not mistaken, the heartbeat interval is turned on by default in newer FDs. At least that would be something to check.

Best regards,

Kern



On 06/21/2017 11:56 AM, Matthias Koch-Schirrmeister wrote:

Re: No Job status returned from FD. Backup fails

Matthias Koch-Schirrmeister
On 21.06.2017 at 16:57, Kern Sibbald wrote:
At least that would be something to check.

Hi Kern, thanks for the reply. The heartbeat directive is set:

FileDaemon {
  # bacula-fd.conf on the client
  ...
  Heartbeat Interval = 60    # seconds
}


and

Director {
  # bacula-dir.conf on the director
  ...
  Heartbeat Interval = 60    # seconds
}


respectively.

While the job in question is by far the largest, I have other jobs with
up to 50GB that run without issues, even with rather aged Windows FDs.

The trouble is that there are no newer packages available, and I wouldn't
want to start compiling software on production machines.

Kind regards
Matthias

