Data hard disk entered predictive failure status

EM Alert: EM Event: Critical:excellnode02 – Data hard disk entered predictive failure status.

Details :

Cell Node : 02
Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE
Manufacturer :
Size : 4.0TB
Firmware : A3A0
Slot Number : 5
Cell Disk : CD_05_excellnode02
Grid Disk : DATAC1_CD_05_excellnode02, DBFS_DG_CD_05_excellnode02, RECOC1_CD_05_excellnode02

Verification

CELLCLI>list celldisk CD_05_excellnode02 detail
         name:                   CD_05_excellnode02
         comment:
         creationTime:           2015-07-29T20:03:26+00:00
         deviceName:             /dev/sdn
         devicePartition:        /dev/sdf
         diskType:               HardDisk
         errorCount:             212
         freeSpace:              0
         interleaving:           none
         lun:                    0_5
         physicalDisk:           E9LMRX
         raidLevel:              0
         size:                   3.637969970703125T
         status:                 proactive failure
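
If this check has to be repeated across cells, the detail output above can be parsed by a small script. The following is only a rough sketch: the function names are made up for illustration, and the rule "anything other than normal needs attention" is an assumption, not an Oracle-documented threshold. The sample text reuses a few lines from the output above.

```python
def parse_cellcli_detail(text):
    """Turn 'attribute: value' lines of CellCLI detail output into a dict."""
    attrs = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.strip().partition(":")
        if sep and key:
            attrs[key.strip()] = value.strip()
    return attrs


def disk_needs_attention(attrs):
    # Assumption: any status other than "normal" is worth a closer look.
    return attrs.get("status", "").lower() != "normal"


# A few lines copied from the celldisk detail output shown above.
SAMPLE = """\
         name:                   CD_05_excellnode02
         creationTime:           2015-07-29T20:03:26+00:00
         errorCount:             212
         status:                 proactive failure
"""

attrs = parse_cellcli_detail(SAMPLE)
print(attrs["name"], attrs["status"], disk_needs_attention(attrs))
```

The same parser can be pointed at the output of step 6 after the replacement, where status is expected to read normal.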

Actions taken :

Had the hard disk replaced by the field engineer (FE).

Post steps after replacing the hard disk

Followed the steps below after the FE replaced the failed disk in slot 5 of cell node 02.
  1. Log in to the compute node excompnode01.
  2. Switch to the root user: sudo su - root (it will not prompt for a password).
  3. cat /home/oracle/cell_group (lists the cell nodes)
  4. Log in to the 2nd cell node: ssh excellnode02
  5. Run cellcli to get the CellCLI prompt.
  6. list celldisk CD_05_excellnode02 detail
    • creationTime should show the current date and time, and status should show normal.
  7. list griddisk attributes name,asmmodestatus,size where celldisk=CD_05_excellnode02
    • All grid disks should show as ONLINE.
  8. list physicaldisk 8:5 detail
    • physicalInsertTime should show the current date and time, and status should show normal.
  9. Go back to the first compute node and log in to the ASM instance.
  10. sqlplus / as sysasm
  11. select * from gv$asm_operation;
    • This query should return rows showing that a rebalance is in progress.
Once the query shows the rebalance running, the field engineer can leave the DC.
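
The grid disk check in step 7 can also be scripted. This is a minimal sketch, assuming the output has one grid disk per line with whitespace-separated name, asmmodestatus and size columns. The function names are hypothetical, and the sample sizes and the SYNCING row are invented for illustration, not taken from this cell.

```python
def parse_griddisk_status(text):
    """Return {griddisk_name: asmmodestatus} from attribute-list output."""
    status = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect at least: name, asmmodestatus (size is optional here).
        if len(fields) >= 2:
            status[fields[0]] = fields[1].upper()
    return status


def disks_not_online(status):
    """Names of grid disks whose asmmodestatus is anything but ONLINE."""
    return sorted(name for name, state in status.items() if state != "ONLINE")


# Illustrative input only: the sizes and the SYNCING row are made up.
SAMPLE = """\
DATAC1_CD_05_excellnode02    ONLINE   100G
DBFS_DG_CD_05_excellnode02   SYNCING  10G
RECOC1_CD_05_excellnode02    ONLINE   50G
"""

pending = disks_not_online(parse_griddisk_status(SAMPLE))
print("Grid disks not yet ONLINE:", pending)
```

An empty list means every grid disk on the replaced cell disk reports ONLINE.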

A sample alert looks like this:

Subject: EM Event: Critical:excellnode02 – Data hard disk entered predictive failure status.
Host=excompnode08
Target type=Oracle Exadata Storage Server
Target name=excellnode02
Categories=Fault
Message=Data hard disk entered predictive failure status. Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE
Severity=Critical
Operating System=Linux
Platform=x86_64
Associated Incident Id=205877
Associated Incident Status=New
Associated Incident Owner=
Associated Incident Acknowledged By Owner=No
Associated Incident Priority=None
Associated Incident Escalation Level=0
Event Type=Metric Alert
Event name=Cell_Generated_Alert:alerttype
Metric Group=Cell Generated Alert
Metric=Alert Type
Metric value=Stateful
Key Column 1=Alert Name
Key Column 1 Value=Hardware
Key Column 2=Alert Sequence
Key Column 2 Value=6
Key Column 3 Value=
Key Column 4 Value=
Key Column 5 Value=
Key Column 6 Value=
Key Column 7 Value=
Rule Name=Incident management rule set for all targets,Create incident for critical metric alerts
Rule Owner=System Generated
Update Details:
Data hard disk entered predictive failure status. Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE
Manufacturer :
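
Since most of the alert body is Key=Value lines, it is easy to pull fields out of it, for example to route the incident by target name. This is a rough sketch only: parse_em_alert is a hypothetical helper, not part of Enterprise Manager, and it assumes one Key=Value pair per line. The sample reuses a few lines from the alert above.

```python
def parse_em_alert(body):
    """Collect Key=Value lines of an EM alert body into a dict."""
    fields = {}
    for line in body.splitlines():
        # Split on the first '=' only; lines without '=' are skipped.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Lines copied from the sample alert above.
SAMPLE = """\
Subject: EM Event: Critical:excellnode02
Host=excompnode08
Target type=Oracle Exadata Storage Server
Target name=excellnode02
Associated Incident Id=205877
Associated Incident Owner=
"""

alert = parse_em_alert(SAMPLE)
print(alert["Target name"], alert["Associated Incident Id"])
```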