Problem: nbpem may crash after a policy schedule type is changed from Calendar to Frequency
Solution: As a workaround, be aware of the appropriate values for the schedule when changing its type. Ultimately, the policy had to be deleted to get nbpem to load.
Here is a step-by-step recap:
Eight RMAN jobs were failing with status 25 (cannot connect on socket) while connections to bprd succeeded. We found no TCP, host resolution, or network issues.
The RMAN log showed that the job was failing after sending the policy information:
00:13:09.334  <2> async_connect: [vnet_connect.c:1477] connect in progress 1 0x1
00:13:09.334  <2> vnet_pbxConnect: pbxConnectEx Succeeded
00:13:09.334  <2> do_pbx_service: [vnet_connect.c:2116] via PBX bprd CONNECT FROM 10.162.244.101.61071 TO 10.162.244.242.1556 fd = 15
00:13:09.334  <2> async_connect: [vnet_connect.c:1644] connect async CONNECT FROM 10.162.244.101.61071 TO 10.162.244.242.1556 fd = 15
00:13:09.334  <2> connect_to_service: connect succeeded STATUS (0) SUCCESS FROM 0.0.0.0 TO server01.domain.com 10.162.244.242 bprd VIA pbx
00:13:09.335  <2> logconnections: BPRD CONNECT FROM 10.162.244.101.61071 TO 10.162.244.242.1556 fd = 15
00:13:09.335  <2> vnet_check_vxss_client_magic_with_info: [vnet_vxss_helper.c:833] VxSS not supported 0 0x0
00:13:09.335  <4> serverResponse: entering serverResponse.
00:13:09.335  <4> serverResponse: initial client_read_timeout = <1800>
00:13:09.335  <4> readCommMessages: Entering readCommMessages
00:13:10.335  <4> serverResponse: read comm file:<00:13:09 Initiating backup>
00:13:10.335  <4> serverResponse: read comm file:<00:13:10 INF - Server status = 25>
00:13:10.335  <16> serverResponse: ERR - server exited with status 25: cannot connect on socket
00:13:10.335  <16> CreateNewImage: ERR - serverResponse() failed
00:13:10.335  <4> closeApi: entering closeApi.
00:13:10.335  <4> closeApi: INF - EXIT STATUS 6: the backup failed to back up the requested files
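When a log like the one above is long, it can help to filter it down to the error lines and pull out the reported status code. The following is a minimal sketch, not a NetBackup tool: it assumes each line follows the `HH:MM:SS.mmm  <severity> function: message` shape seen in the excerpt and that severity 16 marks errors, as it does here. The names `parse_line` and `find_errors` are hypothetical helpers introduced for illustration.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   HH:MM:SS.mmm  <severity> function: message
LINE_RE = re.compile(r"^(\d{1,2}:\d{2}:\d{2}\.\d{3})\s+<(\d+)>\s+([^:]+):\s*(.*)$")
STATUS_RE = re.compile(r"status (\d+)", re.IGNORECASE)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp, severity, function, message = match.groups()
    return {
        "time": timestamp,
        "severity": int(severity),
        "function": function,
        "message": message,
    }


def find_errors(lines, min_severity=16):
    """Return parsed entries at or above min_severity, with any status code found."""
    errors = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["severity"] < min_severity:
            continue
        status = STATUS_RE.search(entry["message"])
        entry["status"] = int(status.group(1)) if status else None
        errors.append(entry)
    return errors


# Lines copied from the excerpt above.
sample = [
    "00:13:10.335  <4> serverResponse: read comm file:<00:13:09 Initiating backup>",
    "00:13:10.335  <4> serverResponse: read comm file:<00:13:10 INF - Server status = 25>",
    "00:13:10.335  <16> serverResponse: ERR - server exited with status 25: cannot connect on socket",
    "00:13:10.335  <16> CreateNewImage: ERR - serverResponse() failed",
]

errors = find_errors(sample)
for entry in errors:
    print(entry["time"], entry["function"], entry["status"], entry["message"])
```

Filtering on severity rather than on the text "ERR" keeps the sketch independent of message wording, but the severity threshold is an assumption drawn from this excerpt only.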
The messages log showed a shared memory error involving nbpem that started intermittently at the same time the RMAN jobs started failing:
May 4 15:52:45 server01 SQLAnywhere(nb_servers01): Disconnecting shared memory client, process id not found
May 4 15:52:45 server01 SQLAnywhere(nb_servers01): Disconnected SharedMemory client's AppInfo: IP=192.168.229.233;HOST=server01.domain.com;OSUSER=root;OS='Linux 2.6.32-504.30.3.el6.x86_64 #1 SMP Thu Jul 9 15:20:47 EDT 2015 x';EXE=/usr/openv/netbackup/bin/nbpem;PID=0x2f92;THREAD=0x7f51c5a17700;VERSION=220.127.116.114;API=ODBC;TIMEZONEADJUSTMENT=-360
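The `AppInfo` string in the second message identifies which client process the database disconnected. As a rough sketch, assuming the fields are `KEY=VALUE` pairs separated by semicolons as in the line above, the executable and process ID can be extracted as follows. The function name `parse_appinfo` is a hypothetical helper, and the sample line is shortened to a few of the fields shown above.

```python
def parse_appinfo(line):
    """Return the KEY=VALUE pairs that follow 'AppInfo: ' in a syslog line."""
    marker = "AppInfo: "
    start = line.find(marker)
    if start == -1:
        return {}
    fields = {}
    for pair in line[start + len(marker):].split(";"):
        key, sep, value = pair.partition("=")
        if sep:  # skip fragments that have no '=' at all
            fields[key.strip()] = value.strip().strip("'")
    return fields


# Shortened copy of the message above (OS, THREAD, and other fields omitted).
syslog_line = (
    "May 4 15:52:45 server01 SQLAnywhere(nb_servers01): "
    "Disconnected SharedMemory client's AppInfo: "
    "IP=192.168.229.233;HOST=server01.domain.com;OSUSER=root;"
    "EXE=/usr/openv/netbackup/bin/nbpem;PID=0x2f92;API=ODBC"
)

info = parse_appinfo(syslog_line)
pid = int(info["PID"], 16)  # the PID is logged in hexadecimal
print(info["EXE"], pid)
```

Converting the hexadecimal PID to decimal makes it easier to match against process listings or core file names from the same time window.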
After rebooting the appliance, we found that the nbpem process was core dumping, with an assertion error recorded in the nbpem log.
We cleared the nbpem cache by following the technote "Procedure for clearing the NetBackup Policy Execution Manager (nbpem) cache on NetBackup 7.x".
We then found this technote on nbpem assertion errors causing status 25 (cannot connect on socket): "NetBackup 7.1 NBPEM process is core dumping on start with assertion error after LastBackupData::getWindowTimes and anything connecting to nbpem gets status 25."
We disabled policies and restarted services, and nbpem stayed up.
One of the NBU admins reported copying a policy and running into a schedule issue around the time the nbpem shared memory errors started.
We deleted the corrupt policy, restarted services, and restarted the jobs.
For more information, go to http://www.veritas.com/docs/000125511