Nbpem Connection Errors in NetBackup 7.7.x and 8.0

Problem: nbpem may crash after a policy's schedule type is changed from Calendar to Frequency.

Solution: As a workaround, make sure the schedule holds values appropriate to its type whenever the type is changed. If nbpem already fails to load, delete the affected policy so that nbpem can start.

Here is a step-by-step recap:

Eight RMAN jobs were failing with status 25 (cannot connect on socket), even though the connection to bprd succeeded. We found no TCP, host resolution, or network issues.
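
The host resolution and TCP checks mentioned above can be approximated with a short script. This is a minimal sketch, not the procedure used in this case: the function name check_host is hypothetical, and the caller supplies the host name and port (for example, the master server and the PBX port, which is 1556 by default).

```python
import socket


def check_host(host, port, timeout=5.0):
    """Return (resolved_addresses, tcp_ok) for host:port.

    resolved_addresses is an empty list if name resolution fails.
    tcp_ok is True only if a TCP connection could be opened.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed, so there is nothing to connect to.
        return [], False

    # Each entry's sockaddr starts with the resolved IP address.
    addresses = sorted({info[4][0] for info in infos})

    try:
        # create_connection tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return addresses, True
    except OSError:
        # Refused, timed out, or unreachable.
        return addresses, False
```

Calling check_host with the master server name and port 1556 from the client returns the resolved addresses and whether the port accepted a connection. A successful result only rules out basic network problems; as this case shows, status 25 can still occur when the service behind the connection is down.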

The RMAN log showed that the job failed after sending the policy information:


00:13:09.334 [3020] <2> async_connect: [vnet_connect.c:1477] connect in progress 1 0x1

00:13:09.334 [3020] <2> vnet_pbxConnect: pbxConnectEx Succeeded

00:13:09.334 [3020] <2> do_pbx_service: [vnet_connect.c:2116] via PBX bprd CONNECT FROM TO fd = 15

00:13:09.334 [3020] <2> async_connect: [vnet_connect.c:1644] connect async CONNECT FROM TO fd = 15

00:13:09.334 [3020] <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM TO bprd VIA pbx

00:13:09.335 [3020] <2> logconnections: BPRD CONNECT FROM TO fd = 15

00:13:09.335 [3020] <2> vnet_check_vxss_client_magic_with_info: [vnet_vxss_helper.c:833] VxSS not supported 0 0x0

00:13:09.335 [3020] <4> serverResponse: entering serverResponse.

00:13:09.335 [3020] <4> serverResponse: initial client_read_timeout = <1800>

00:13:09.335 [3020] <4> readCommMessages: Entering readCommMessages

00:13:10.335 [3020] <4> serverResponse: read comm file:<00:13:09 Initiating backup>

00:13:10.335 [3020] <4> serverResponse: read comm file:<00:13:10 INF - Server status = 25>

00:13:10.335 [3020] <16> serverResponse: ERR - server exited with status 25: cannot connect on socket

00:13:10.335 [3020] <16> CreateNewImage: ERR - serverResponse() failed

00:13:10.335 [3020] <4> closeApi: entering closeApi.

00:13:10.335 [3020] <4> closeApi: INF - EXIT STATUS 6: the backup failed to back up the requested files
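
When a log like the one above runs to thousands of lines, it can help to pull out only the error lines that carry a status code. The following is a minimal sketch that assumes the line layout seen in this excerpt (time, [pid], <severity>, function name, message) and that error lines carry severity 16, as they do here. The names parse_line and error_statuses are hypothetical.

```python
import re

# Layout assumed from the excerpt: "HH:MM:SS.mmm [pid] <sev> func: message"
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<sev>\d+)> "
    r"(?P<func>\w+): (?P<msg>.*)$"
)
STATUS_RE = re.compile(r"status (\d+)", re.IGNORECASE)

SAMPLE = [
    "00:13:09.334 [3020] <2> vnet_pbxConnect: pbxConnectEx Succeeded",
    "00:13:10.335 [3020] <4> serverResponse: read comm file:<00:13:10 INF - Server status = 25>",
    "00:13:10.335 [3020] <16> serverResponse: ERR - server exited with status 25: cannot connect on socket",
    "00:13:10.335 [3020] <16> CreateNewImage: ERR - serverResponse() failed",
    "00:13:10.335 [3020] <4> closeApi: INF - EXIT STATUS 6: the backup failed to back up the requested files",
]


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    record["sev"] = int(record["sev"])
    return record


def error_statuses(lines, min_severity=16):
    """Return (time, function, status) for error lines that mention a status code."""
    found = []
    for line in lines:
        record = parse_line(line)
        if record is None or record["sev"] < min_severity:
            continue
        status = STATUS_RE.search(record["msg"])
        if status:
            found.append((record["time"], record["func"], int(status.group(1))))
    return found
```

Run against SAMPLE, error_statuses reports only the serverResponse line with status 25. The informational EXIT STATUS 6 line is skipped because its severity is 4.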

Shared memory errors involving nbpem appeared in the messages log, starting intermittently at the same time the RMAN jobs began to fail:

May  4 15:52:45 server01 SQLAnywhere(nb_servers01): Disconnecting shared memory client, process id not found

May  4 15:52:45 server01 SQLAnywhere(nb_servers01): Disconnected SharedMemory client's AppInfo: IP=;;OSUSER=root;OS='Linux 2.6.32-504.30.3.el6.x86_64 #1 SMP Thu Jul 9 15:20:47 EDT 2015 x';EXE=/usr/openv/netbackup/bin/nbpem;PID=0x2f92;THREAD=0x7f51c5a17700;VERSION=;API=ODBC;TIMEZONEADJUSTMENT=-360
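
The AppInfo string in the second message identifies which process lost its database connection. The sketch below splits that string into fields. It assumes the "key=value;" layout seen in this message and is not based on any documented SQL Anywhere format; the name parse_appinfo is hypothetical.

```python
SAMPLE = (
    "SQLAnywhere(nb_servers01): Disconnected SharedMemory client's AppInfo: "
    "IP=;;OSUSER=root;OS='Linux 2.6.32-504.30.3.el6.x86_64 #1 SMP "
    "Thu Jul 9 15:20:47 EDT 2015 x';EXE=/usr/openv/netbackup/bin/nbpem;"
    "PID=0x2f92;THREAD=0x7f51c5a17700;VERSION=;API=ODBC;TIMEZONEADJUSTMENT=-360"
)


def parse_appinfo(message):
    """Return the AppInfo fields of a syslog message as a dict (empty if absent)."""
    _, found, payload = message.partition("AppInfo: ")
    if not found:
        return {}
    fields = {}
    for part in payload.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip empty segments such as the one after "IP=;"
            fields[key.strip()] = value.strip().strip("'")
    return fields


info = parse_appinfo(SAMPLE)
exe = info["EXE"]            # /usr/openv/netbackup/bin/nbpem
pid = int(info["PID"], 16)   # 0x2f92 is 12178 in decimal
```

Here the disconnected client is nbpem with PID 12178, which ties the database message to the nbpem failures seen elsewhere in this case.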

After rebooting the appliance, we found the nbpem process dumping core, with an assertion error recorded in the nbpem log.

We cleared the nbpem cache, following the technote "Procedure for clearing the NetBackup Policy Execution Manager (nbpem) cache on NetBackup 7.x".

We then found this technote on nbpem assertion errors causing status 25 (cannot connect on socket): "NetBackup 7.1 NBPEM process is core dumping on start with assertion error after LastBackupData::getWindowTimes and anything connecting to nbpem gets status 25."

We disabled the policies and restarted services, and nbpem stayed up.

One of the NBU admins confirmed having copied a policy, which left a schedule issue in it, at about the time the nbpem shared memory errors started.

We deleted the corrupt policy, restarted services, and restarted the jobs.

