MS Status is DOWN & Ping is SUCCESS in Exadata.

Fatal Error Message

Message=ORPRD_CELLSRVR10 is down. MS Status is DOWN and Ping Status is SUCCESS.

The steps below verify the alert by running dcli commands from a compute node against the Exadata servers.

Verification:
Status of cell services on the Exadata servers

dcli command on uptime and output

[root@ORPRD_SRVR01 ~]# dcli -l root -g all_group "uptime"

output from compute nodes

ORPRD_SRVR01: 02:39:30 up 316 days, 10:46, 3 users, load average: 2.98, 2.97, 2.95
ORPRD_SRVR02: 02:39:31 up 316 days, 10:52, 0 users, load average: 5.99, 4.82, 4.32
ORPRD_SRVR03: 02:39:30 up 316 days, 11:04, 0 users, load average: 2.89, 2.84, 2.91
ORPRD_SRVR04: 02:39:30 up 374 days, 23:20, 0 users, load average: 2.17, 2.28, 2.24
ORPRD_SRVR05: 02:39:30 up 374 days, 22:49, 0 users, load average: 2.43, 2.28, 2.24
ORPRD_SRVR06: 02:39:30 up 33 days, 23:40, 0 users, load average: 2.03, 1.91, 1.77
ORPRD_SRVR07: 02:39:30 up 374 days, 22:46, 0 users, load average: 179.46, 178.36, 178.11
ORPRD_SRVR08: 02:39:30 up 374 days, 22:26, 0 users, load average: 1.94, 2.03, 2.02

output from cell nodes

ORPRD_CELLSRVR01: 02:39:30 up 686 days, 9:31, 0 users, load average: 0.71, 0.77, 0.84
ORPRD_CELLSRVR02: 02:39:30 up 686 days, 9:27, 0 users, load average: 0.99, 0.93, 0.88
ORPRD_CELLSRVR03: 02:39:30 up 686 days, 9:32, 0 users, load average: 0.97, 0.91, 0.89
ORPRD_CELLSRVR04: 02:39:30 up 686 days, 9:34, 0 users, load average: 0.96, 1.05, 0.96
ORPRD_CELLSRVR05: 02:39:30 up 507 days, 14:31, 0 users, load average: 1.00, 0.93, 0.85
ORPRD_CELLSRVR06: 02:39:30 up 686 days, 9:32, 0 users, load average: 0.72, 0.89, 0.93
ORPRD_CELLSRVR07: 02:39:30 up 686 days, 9:36, 0 users, load average: 0.77, 0.92, 0.93
ORPRD_CELLSRVR08: 02:39:30 up 686 days, 9:39, 0 users, load average: 1.07, 0.91, 0.91
ORPRD_CELLSRVR09: 02:39:30 up 686 days, 9:37, 1 user, load average: 1.24, 1.01, 0.97
ORPRD_CELLSRVR10: 02:39:30 up 686 days, 9:15, 0 users, load average: 0.85, 0.90, 0.92
ORPRD_CELLSRVR11: 02:39:30 up 521 days, 20:11, 0 users, load average: 0.81, 0.85, 0.85
ORPRD_CELLSRVR12: 02:39:30 up 686 days, 9:21, 0 users, load average: 1.06, 1.00, 0.98
ORPRD_CELLSRVR13: 02:39:30 up 686 days, 9:19, 0 users, load average: 0.76, 0.89, 0.86
ORPRD_CELLSRVR14: 02:39:30 up 686 days, 9:21, 0 users, load average: 1.30, 1.04, 0.96
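
With this many hosts, an outlier is easy to miss by eye (for instance, ORPRD_SRVR07 above reports a load average near 179). A minimal sketch of a filter that lists only the hosts above a load threshold is shown below. The function name high_load and the default threshold of 10 are assumptions made for illustration, not part of dcli or an Oracle recommendation.

```shell
#!/bin/sh
# Print hosts whose 1-minute load average exceeds a threshold.
# Reads dcli "uptime" output on stdin. The default limit of 10 is an
# illustrative assumption; pick a value that suits your environment.
high_load() {
    awk -v limit="${1:-10}" '
        /load average:/ {
            host = $1
            sub(/:$/, "", host)                  # strip the trailing colon dcli adds
            load = $0
            sub(/.*load average: */, "", load)   # keep only "1m, 5m, 15m"
            split(load, parts, ",")              # parts[1] is the 1-minute load
            if (parts[1] + 0 > limit + 0) print host, parts[1]
        }'
}
```

It could be used as: dcli -l root -g all_group "uptime" | high_load 10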

dcli command on “service celld status” and output

[root@ORPRD_SRVR01 ~]# dcli -l root -g cell_group "service celld status"

ORPRD_CELLSRVR01: rsStatus: running
ORPRD_CELLSRVR01: msStatus: running
ORPRD_CELLSRVR01: cellsrvStatus: running
ORPRD_CELLSRVR02: rsStatus: running
ORPRD_CELLSRVR02: msStatus: running
ORPRD_CELLSRVR02: cellsrvStatus: running
ORPRD_CELLSRVR03: rsStatus: running
ORPRD_CELLSRVR03: msStatus: running
ORPRD_CELLSRVR03: cellsrvStatus: running
ORPRD_CELLSRVR04: rsStatus: running
ORPRD_CELLSRVR04: msStatus: running
ORPRD_CELLSRVR04: cellsrvStatus: running
ORPRD_CELLSRVR05: rsStatus: running
ORPRD_CELLSRVR05: msStatus: running
ORPRD_CELLSRVR05: cellsrvStatus: running
ORPRD_CELLSRVR06: rsStatus: running
ORPRD_CELLSRVR06: msStatus: running
ORPRD_CELLSRVR06: cellsrvStatus: running
ORPRD_CELLSRVR07: rsStatus: running
ORPRD_CELLSRVR07: msStatus: running
ORPRD_CELLSRVR07: cellsrvStatus: running
ORPRD_CELLSRVR08: rsStatus: running
ORPRD_CELLSRVR08: msStatus: running
ORPRD_CELLSRVR08: cellsrvStatus: running
ORPRD_CELLSRVR09: rsStatus: running
ORPRD_CELLSRVR09: msStatus: running
ORPRD_CELLSRVR09: cellsrvStatus: running
ORPRD_CELLSRVR10: rsStatus: running
ORPRD_CELLSRVR10: msStatus: running
ORPRD_CELLSRVR10: cellsrvStatus: running
ORPRD_CELLSRVR11: rsStatus: running
ORPRD_CELLSRVR11: msStatus: running
ORPRD_CELLSRVR11: cellsrvStatus: running
ORPRD_CELLSRVR12: rsStatus: running
ORPRD_CELLSRVR12: msStatus: running
ORPRD_CELLSRVR12: cellsrvStatus: running
ORPRD_CELLSRVR13: rsStatus: running
ORPRD_CELLSRVR13: msStatus: running
ORPRD_CELLSRVR13: cellsrvStatus: running
ORPRD_CELLSRVR14: rsStatus: running
ORPRD_CELLSRVR14: msStatus: running
ORPRD_CELLSRVR14: cellsrvStatus: running
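
Rather than reading all 42 lines, the output can be filtered for anything that is not "running". The sketch below assumes the "host: xxStatus: value" layout shown above; the function name not_running is hypothetical.

```shell
#!/bin/sh
# Print any service line whose status is not "running".
# Reads dcli "service celld status" output on stdin and
# exits non-zero if at least one such line is found.
not_running() {
    awk '
        $2 ~ /Status:$/ && $3 != "running" { print; bad = 1 }
        END { exit bad + 0 }'
}
```

It could be used as: dcli -l root -g cell_group "service celld status" | not_running. No output and a zero exit status mean RS, MS and CELLSRV are running on every cell.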

dcli command on “service celld status” count and output

[root@ORPRD_SRVR01 ~]# dcli -l root -g cell_group "service celld status | wc -l"

ORPRD_CELLSRVR01: 3
ORPRD_CELLSRVR02: 3
ORPRD_CELLSRVR03: 3
ORPRD_CELLSRVR04: 3
ORPRD_CELLSRVR05: 3
ORPRD_CELLSRVR06: 3
ORPRD_CELLSRVR07: 3
ORPRD_CELLSRVR08: 3
ORPRD_CELLSRVR09: 3
ORPRD_CELLSRVR10: 3
ORPRD_CELLSRVR11: 3
ORPRD_CELLSRVR12: 3
ORPRD_CELLSRVR13: 3
ORPRD_CELLSRVR14: 3

Verification in alert logfile

All services are up and running. The only relevant entries in the cell alert log are the ones below, which show MS being stopped and then started again.

[RS] Process /opt/oracle/cell/cellsrv/bin/cellrsmmt (pid: 23167) received clean shutdown signal from pid: 22903, uid: 0

[RS] Stopped Service MS

[RS] Started monitoring process /opt/oracle/cell/cellsrv/bin/cellrsmmt with pid 2556

[RS] Started Service MS with pid 2629

[RS] Process /opt/oracle/cell/cellsrv/bin/cellrsmmt (pid: 4442) received clean shutdown signal from pid: 6616, uid: 0

[RS] Stopped Service MS

[RS] Started monitoring process /opt/oracle/cell/cellsrv/bin/cellrsmmt with pid 7010

[RS] Started Service MS with pid 7079
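
To confirm that every MS stop in the log was followed by a start, the two event types can be counted. This is a minimal sketch that matches the message text shown above; the function name ms_restarts is hypothetical, and the alert log location is left as a placeholder because it varies by installation.

```shell
#!/bin/sh
# Count MS stop and start events in a cell alert log read on stdin.
# Only the "[RS] Stopped Service MS" and "[RS] Started Service MS"
# messages are counted; "Started monitoring process" lines are ignored.
ms_restarts() {
    awk '
        /\[RS\] Stopped Service MS/ { stopped++ }
        /\[RS\] Started Service MS/ { started++ }
        END { printf "stopped=%d started=%d\n", stopped, started }'
}
```

It could be used as: ms_restarts < /path/to/cell/alert.log. Equal counts suggest that each stop was matched by a restart.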

dcli command on “list physicaldisk” count and output

[root@ORPRD_SRVR01 ~]# dcli -l root -g ~/cell_group 'cellcli -e list physicaldisk | grep normal | wc -l'

ORPRD_CELLSRVR01: 16
ORPRD_CELLSRVR02: 16
ORPRD_CELLSRVR03: 16
ORPRD_CELLSRVR04: 16
ORPRD_CELLSRVR05: 16
ORPRD_CELLSRVR06: 16
ORPRD_CELLSRVR07: 16
ORPRD_CELLSRVR08: 16
ORPRD_CELLSRVR09: 16
ORPRD_CELLSRVR10: 16
ORPRD_CELLSRVR11: 16
ORPRD_CELLSRVR12: 16
ORPRD_CELLSRVR13: 16
ORPRD_CELLSRVR14: 16
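
Per-cell counts like these can be checked against an expected value instead of being compared by eye. The sketch below assumes "host: N" lines as shown above; the function name count_mismatch is hypothetical, and the expected count (16 here) depends on the hardware configuration.

```shell
#!/bin/sh
# Print cells whose reported count differs from the expected value.
# Reads "host: N" lines on stdin; $1 is the expected count.
# Exits non-zero if any cell differs.
count_mismatch() {
    awk -v want="$1" '
        NF == 2 && $2 + 0 != want + 0 { print; bad = 1 }
        END { exit bad + 0 }'
}
```

It could be used as: dcli -l root -g ~/cell_group 'cellcli -e list physicaldisk | grep normal | wc -l' | count_mismatch 16. The same filter would work for any other dcli command that ends in "wc -l".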

dcli command on “list griddisk attributes asmmodestatus” and output

[root@ORPRD_SRVR01 ~]# dcli -l root -g ~/cell_group 'cellcli -e list griddisk attributes asmmodestatus | grep ONLINE | wc -l'

ORPRD_CELLSRVR01: 34
ORPRD_CELLSRVR02: 34
ORPRD_CELLSRVR03: 34
ORPRD_CELLSRVR04: 34
ORPRD_CELLSRVR05: 34
ORPRD_CELLSRVR06: 34
ORPRD_CELLSRVR07: 34
ORPRD_CELLSRVR08: 34
ORPRD_CELLSRVR09: 34
ORPRD_CELLSRVR10: 34
ORPRD_CELLSRVR11: 34
ORPRD_CELLSRVR12: 34
ORPRD_CELLSRVR13: 34
ORPRD_CELLSRVR14: 34

dcli command on “list griddisk attributes asmdeactivationoutcome”

[root@ORPRD_SRVR01 ~]# dcli -l root -g ~/cell_group 'cellcli -e list griddisk attributes asmdeactivationoutcome | grep Yes | wc -l'

dcli command on “list metriccurrent” and output

[root@ORPRD_SRVR01 ~]# dcli -l root -g /root/cell_group "cellcli -e list metriccurrent | grep CL_MEMUT_MS | grep -v grep"

ORPRD_CELLSRVR01: CL_MEMUT_MS ORPRD_CELLSRVR01 0.5 %
ORPRD_CELLSRVR02: CL_MEMUT_MS ORPRD_CELLSRVR02 0.5 %
ORPRD_CELLSRVR03: CL_MEMUT_MS ORPRD_CELLSRVR03 0.5 %
ORPRD_CELLSRVR04: CL_MEMUT_MS ORPRD_CELLSRVR04 0.5 %
ORPRD_CELLSRVR05: CL_MEMUT_MS ORPRD_CELLSRVR05 0.5 %
ORPRD_CELLSRVR06: CL_MEMUT_MS ORPRD_CELLSRVR06 0.5 %
ORPRD_CELLSRVR07: CL_MEMUT_MS ORPRD_CELLSRVR07 0.5 %
ORPRD_CELLSRVR08: CL_MEMUT_MS ORPRD_CELLSRVR08 0.5 %
ORPRD_CELLSRVR09: CL_MEMUT_MS ORPRD_CELLSRVR09 0.5 %
ORPRD_CELLSRVR10: CL_MEMUT_MS ORPRD_CELLSRVR10 0.5 %
ORPRD_CELLSRVR11: CL_MEMUT_MS ORPRD_CELLSRVR11 0.5 %
ORPRD_CELLSRVR12: CL_MEMUT_MS ORPRD_CELLSRVR12 0.5 %
ORPRD_CELLSRVR13: CL_MEMUT_MS ORPRD_CELLSRVR13 0.5 %
ORPRD_CELLSRVR14: CL_MEMUT_MS ORPRD_CELLSRVR14 0.5 %
[root@ORPRD_SRVR01 ~]#
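
If MS memory utilization is a concern, the metric output can be filtered for cells above a chosen percentage. This is a minimal sketch that assumes the column layout shown above (cell, metric name, object, value, unit); the function name ms_mem_high and the default limit of 5 percent are assumptions for illustration only.

```shell
#!/bin/sh
# Print cells whose CL_MEMUT_MS value exceeds a limit (in percent).
# Reads dcli "cellcli -e list metriccurrent" output on stdin.
# The default limit of 5 is an illustrative assumption.
ms_mem_high() {
    awk -v limit="${1:-5}" '
        $2 == "CL_MEMUT_MS" && $4 + 0 > limit + 0 { print $1, $4 "%" }'
}
```

It could be used as: dcli -l root -g /root/cell_group "cellcli -e list metriccurrent" | ms_mem_high 5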

Action taken:

The MS service was stopped and then restarted automatically, as the alert log entries show, and all services are running again. The alert can be ignored.

The complete original message from the OEM alert is shown below.

Subject: EM Event: Fatal:ORPRD_CELLSRVR10 – ORPRD_CELLSRVR10 is down. MS Status is DOWN and Ping Status is SUCCESS.

Host=ORPRD_SRVR08
Target type=Oracle Exadata Storage Server
Target name=ORPRD_CELLSRVR10
Categories=Availability
Message=ORPRD_CELLSRVR10 is down. MS Status is DOWN and Ping Status is SUCCESS.
Severity=Fatal
Operating System=Linux
Platform=x86_64
Associated Incident Id=149882
Associated Incident Status=New
Associated Incident Owner=
Associated Incident Acknowledged By Owner=No
Associated Incident Priority=None
Associated Incident Escalation Level=0
Event Type=Target Availability
Event name=Status
Availability status=Down
Root Cause Analysis Status=Cause
Causal analysis result=Identified as a cause to 1 symptoms
Rule Name=Incident management rule set for all targets,Incident creation rule for a Target Down availability status
Rule Owner=System Generated


