frequent segfaults

frequent segfaults

Craig Shiroma
Hello All,

Within the last 3-4 weeks, our Bacula 7.0.5 on RHEL 6 has experienced bacula-dir segfaults once or twice a week.  Before that everything ran fine.

Memory and CPU look fine on our Director server.  As far as I can remember, nothing changed on the server before the segfaults started, other than possibly a few more jobs being added.

What is the best way to troubleshoot the cause of these segfaults?

Best regards,
Craig

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: frequent segfaults

Craig Shiroma
This is what appears in /var/log/messages:

Jan 23 22:42:39 bup05 bacula-dir: Bacula interrupted by signal 11: Segmentation violation
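
One way to see whether the crashes cluster around particular times (for example, when certain jobs run) is to pull every such line out of the log. The following is only a minimal sketch: the function names are hypothetical, and the pattern assumes the syslog line format shown above, with an optional PID after `bacula-dir`.

```python
import re
from collections import Counter

# Matches the syslog line format quoted above. Classic syslog timestamps
# carry no year, so only month, day and time are captured.
CRASH_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+bacula-dir(?:\[\d+\])?: Bacula interrupted by signal 11"
)


def find_crashes(lines):
    """Return (month, day, time, host) for each bacula-dir signal 11 line."""
    crashes = []
    for line in lines:
        m = CRASH_RE.match(line)
        if m:
            crashes.append((m["month"], int(m["day"]), m["time"], m["host"]))
    return crashes


def crashes_by_hour(crashes):
    """Count crashes per hour of day, to compare against the job schedule."""
    return Counter(int(time.split(":")[0]) for _, _, time, _ in crashes)
```

To run it against a real log, pass an open file handle, e.g. `find_crashes(open("/var/log/messages", errors="replace"))`; the path is the one mentioned above and may differ on other systems.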



Re: frequent segfaults

Dimitri Maziuk
On 01/24/2017 12:57 PM, Craig Shiroma wrote:
> This is what appears in /var/log/messages:
>
> Jan 23 22:42:39 bup05 bacula-dir: Bacula interrupted by signal 11:
> Segmentation violation

Try googling for sig11?

You may have a DIMM or some other hardware going bad.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



Re: frequent segfaults

Craig Shiroma
Thanks, Dimitri!

I checked the hardware server logs (not /var/log/messages) and don't see any memory alerts.

-craig




Re: frequent segfaults

Josip Deanovic
On Tuesday 2017-01-24 09:48:25 Craig Shiroma wrote:
> Thanks, Dimitri!
>
> I checked the hardware server logs (not /var/log/messages) and don't see
> any memory alerts.

Have you tried taking the server offline and running memtest86+ outside
the OS (from a bootable CD or USB stick) for 12+ hours?

Faulty RAM could also corrupt the file system and any databases on it.

I would also check the integrity of the bacula package, if your packaging
system is capable of doing this.

Also, check the disk drives using their S.M.A.R.T. capability (use the
long test, because the short test can miss bad blocks in most cases).
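
The package and disk checks above can be sketched as a small helper that only assembles the command lines, so they can be reviewed before anything is run. This is a sketch under assumptions: the package name `bacula-director` and the device `/dev/sda` are placeholders rather than values from this thread, `rpm -V` is assumed as the package verifier because the system is RHEL, and `smartctl` (from smartmontools) is assumed to be installed on a machine whose disks expose SMART data.

```python
import shlex


def build_check_commands(package, device):
    """Return command lines for a package verify and a SMART long test.

    `package` and `device` are placeholders supplied by the caller.
    Nothing is executed here; the lists are only built for review.
    """
    return [
        # Verify installed files against the RPM database.
        ["rpm", "-V", package],
        # Start the long (extended) SMART self-test on the drive.
        ["smartctl", "-t", "long", device],
        # Read the self-test log once the test has finished.
        ["smartctl", "-l", "selftest", device],
    ]


# Hypothetical package and device names, printed for inspection only.
for cmd in build_check_commands("bacula-director", "/dev/sda"):
    print(shlex.join(cmd))
```

The long self-test runs in the background on the drive, so the log command only shows a result after the test completes.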

--
Josip Deanovic


Re: frequent segfaults

Josh Fisher
In reply to this post by Craig Shiroma

On 1/24/2017 1:57 PM, Craig Shiroma wrote:
> This is what appears in /var/log/messages:
>
> Jan 23 22:42:39 bup05 bacula-dir: Bacula interrupted by signal 11: Segmentation violation

If you compiled from source, then check your compiler flags or try an RPM package. Otherwise, it is likely a hardware issue. Many things could cause it, including bad RAM, an overheated CPU, a bad swap disk, not enough virtual memory, or a VM with incorrect CPU parameters or insufficient memory allocation.






Re: frequent segfaults

Dimitri Maziuk
On 2017-01-25 07:19, Josh Fisher wrote:

> If you compiled from source, then check your compiler flags

If it ran fine before, that's not impossible, but rather unlikely to be
the cause. You'd expect at least a major libc update or something to
break it...

One thing I forgot to mention: just re-seating the DIMMs *might* fix
it. That's the easiest thing to try, plus when you crack it open you
might see something, like a gunked-up CPU fan.

Dima



Re: frequent segfaults

Craig Shiroma
Hi All,

Thank you very much for all the help!

Sorry for the delayed reply.  What I've done so far is move the Director VM to a different ESXi host.  So far, no segfaults although I'll give it a week before raising the all clear flag.

There are normally many VMs on the ESXi host where the Director VM was running when it segfaulted.  Is it out of the ordinary that only Bacula has experienced problems so far?  Other apps on the host have not experienced any problems that I know of.  However, as I mentioned, there have been no segfaults since I vMotioned it to a different ESXi host, so it's probably something about the machine, as many of you suggested.  I'll have to check with our VMware folks to see when the host can be put in maintenance mode so the memtest can be run.

BTW, as far as I know we did not compile from source, although to be positive I'll have to check with the person who set up our Bacula installation (who now works for a different company).

I'll post back in a week...earlier if a segfault occurs.

Again, much thanks for the help!

Craig




Re: frequent segfaults

Josh Fisher

On 1/27/2017 4:04 AM, Craig Shiroma wrote:
> Hi All,
>
> Thank you very much for all the help!
>
> Sorry for the delayed reply.  What I've done so far is move the Director VM to a different ESXi host.  So far, no segfaults although I'll give it a week before raising the all clear flag.


Are the VMs configured exactly the same on both ESXi hosts? A difference in vCPU, memory limit, etc. in the VM config could potentially reveal the problem.




Re: frequent segfaults

Craig Shiroma
Hi Josh,

>Are the VM's configured exactly the same on both ESXi hosts? A difference in VCPU, memory limit, etc. in the VM config could potentially reveal the problem.

The VM that bacula-dir is running on is configured exactly the same.  No changes were made to it before I vMotioned it to a different ESXi host.

Thanks for the help!
-craig




Re: frequent segfaults

compdoc

On 01/27/2017 10:29 PM, Craig Shiroma wrote:
> The VM that bacula-dir is running on is configured exactly the same.

In these discussions, I don't think I've seen you mention checking the SMART info.


Re: frequent segfaults

Craig Shiroma
Hello All,

My apologies for the late reply.  Been super busy with other work.

I haven't been able to run the memory test yet.  However, no segfaults have occurred since moving the Director VM to a different ESXi host.  So, it's obviously something with the host machine.  The only odd thing is that no other VMs are experiencing problems, but still, moving the VM off seems to have fixed the problem.  Thank you all for the help!

If I ever get the chance to run the memory test, I will report back.

Craig

