Data hard disk entered predictive failure status

EM Alert: EM Event: Critical:excellnode02 – Data hard disk entered predictive failure status.
Details :
Cell Node : 02
Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE
Manufacturer :
Size : 4.0TB
Firmware : A3A0
Slot Number : 5
Cell Disk : CD_05_excellnode02
Grid Disk : DATAC1_CD_05_excellnode02, DBFS_DG_CD_05_excellnode02, RECOC1_CD_05_excellnode02
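The verification commands below take the cell disk name straight from the alert's Details block. The following is a minimal sketch, assuming the " : " field layout seen in this sample alert (not a documented Oracle format), that parses the block and builds the CellCLI commands. The helper names parse_alert_details, grid_disks and verification_commands are hypothetical.

```python
# Hypothetical helpers: parse the "Details" block of the EM alert and build the
# CellCLI commands this runbook runs. The " : " layout is copied from the sample
# alert above; it is an assumption, not a documented Oracle format.


def parse_alert_details(text: str) -> dict[str, str]:
    """Parse 'Key : Value' lines into a dict; fields with no value map to ''."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" :")
        if sep:  # skip lines without the " :" separator
            fields[key.strip()] = value.strip()
    return fields


def grid_disks(fields: dict[str, str]) -> list[str]:
    """Split the comma-separated 'Grid Disk' field into individual names."""
    raw = fields.get("Grid Disk", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def verification_commands(fields: dict[str, str]) -> list[str]:
    """CellCLI commands used to check the cell disk named in the alert."""
    cell_disk = fields["Cell Disk"]
    return [
        f"list celldisk {cell_disk} detail",
        f"list griddisk attributes name,asmmodestatus,size where celldisk={cell_disk}",
    ]


if __name__ == "__main__":
    details = """Details :
Cell Node : 02
Slot Number : 5
Cell Disk : CD_05_excellnode02
Grid Disk : DATAC1_CD_05_excellnode02, DBFS_DG_CD_05_excellnode02, RECOC1_CD_05_excellnode02"""
    for cmd in verification_commands(parse_alert_details(details)):
        print(cmd)
```

Splitting on " :" rather than ":" keeps header lines such as "EM Alert: EM Event: Critical:..." out of the result, because they contain no space before the colon.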
Verification
CellCLI> list celldisk CD_05_excellnode02 detail
name: CD_05_excellnode02
comment:
creationTime: 2015-07-29T20:03:26+00:00
deviceName: /dev/sdn
devicePartition: /dev/sdf
diskType: HardDisk
errorCount: 212
freeSpace: 0
interleaving: none
lun: 0_5
physicalDisk: E9LMRX
raidLevel: 0
size: 3.637969970703125T
status: proactive failure
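The detail output above is a simple "attribute: value" listing, so it can be checked mechanically. The following is a minimal sketch, assuming one attribute per line as shown here, that parses it and flags a status other than normal. Treating any nonzero errorCount as worth a look is an assumption of this sketch, and the function names are hypothetical.

```python
# Sketch: parse CellCLI "list celldisk ... detail" output and report anything
# that looks unhealthy. Assumes one "attribute: value" pair per line.


def parse_cellcli_detail(output: str) -> dict[str, str]:
    """Map attribute names to values. Split on the first ':' only, so
    timestamps such as 2015-07-29T20:03:26+00:00 stay intact."""
    attrs: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip lines with no ':' and any line whose "key" contains spaces.
        if sep and key and " " not in key:
            attrs[key] = value.strip()
    return attrs


def celldisk_problems(attrs: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: list[str] = []
    status = attrs.get("status", "<missing>")
    if status != "normal":
        problems.append(f"status is {status!r}, expected 'normal'")
    # Assumption: any nonzero errorCount deserves attention.
    error_count = int(attrs.get("errorCount", "0"))
    if error_count > 0:
        problems.append(f"errorCount is {error_count}")
    return problems
```

For the output shown above, celldisk_problems reports both the "proactive failure" status and the errorCount of 212.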
Actions taken:
The hard disk was replaced by the field engineer.
Post-replacement steps for the hard disk
The following steps were performed after the field engineer (FE) replaced the failed disk in slot 5 of the second cell node (excellnode02).
- Log in to the first compute node, excompnode01.
- Switch to the root user with sudo su - root (no password prompt is expected).
- Run cat /home/oracle/cell_group to see the list of cell nodes.
- Log in to the second cell node: ssh excellnode02
- Run cellcli to open the CellCLI prompt.
- list celldisk CD_05_excellnode02 detail
- The creationTime should show the current date and time, and the status should be normal.
- list griddisk attributes name,asmmodestatus,size where celldisk=CD_05_excellnode02
- All grid disks should show asmmodestatus as ONLINE.
- list physicaldisk 8:5 detail
- The physicalInsertTime should show the current date and time, and the status should be normal.
- Return to the first compute node, connect to the ASM instance, and run the query below.
- sqlplus / as sysasm
- select * from gv$asm_operation;
- The query should return rows showing that a rebalance is in progress (OPERATION = REBAL).
Once these rows appear, the field engineer can leave the data center (DC).
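The sign-off conditions in the steps above can be expressed as a few checks over captured command output. The following is a minimal sketch under stated assumptions: the griddisk listing has one whitespace-separated row per grid disk (name, asmmodestatus, size), and the OPERATION column of gv$asm_operation has already been fetched into a list of strings. All function names are hypothetical.

```python
# Sketch of the post-replacement checks over captured command output.
# Assumptions: griddisk rows are whitespace-separated (name, asmmodestatus,
# size), and gv$asm_operation OPERATION values were fetched into a list.
from datetime import date, datetime


def created_on(creation_time: str, expected: date) -> bool:
    """True if a CellCLI timestamp such as creationTime falls on `expected`.
    The date is taken in the timestamp's own UTC offset."""
    return datetime.fromisoformat(creation_time).date() == expected


def griddisks_not_online(listing: str) -> list[str]:
    """Names of grid disks whose asmmodestatus column is not ONLINE."""
    not_online: list[str] = []
    for line in listing.splitlines():
        cols = line.split()
        if len(cols) >= 2 and cols[1].upper() != "ONLINE":
            not_online.append(cols[0])
    return not_online


def rebalance_running(operations: list[str]) -> bool:
    """True if any gv$asm_operation row reports a REBAL operation."""
    return any(op.strip().upper() == "REBAL" for op in operations)


def field_engineer_can_leave(creation_time: str, listing: str,
                             operations: list[str], today: date) -> bool:
    """New cell disk created today, all grid disks ONLINE, rebalance visible."""
    return (
        created_on(creation_time, today)
        and not griddisks_not_online(listing)
        and rebalance_running(operations)
    )
```

Keeping each check as a separate function makes it easy to report which condition failed instead of a single yes/no.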
A sample alert looks like this:
Subject: EM Event: Critical:excellnode02 – Data hard disk entered predictive failure status.
Host=excompnode08
Target type=Oracle Exadata Storage Server
Target name=excellnode02
Categories=Fault
Message=Data hard disk entered predictive failure status. Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE
Severity=Critical
Operating System=Linux
Platform=x86_64
Associated Incident Id=205877
Associated Incident Status=New
Associated Incident Owner=
Associated Incident Acknowledged By Owner=No
Associated Incident Priority=None
Associated Incident Escalation Level=0
Event Type=Metric Alert
Event name=Cell_Generated_Alert:alerttype
Metric Group=Cell Generated Alert
Metric=Alert Type
Metric value=Stateful
Key Column 1=Alert Name
Key Column 1 Value=Hardware
Key Column 2=Alert Sequence
Key Column 2 Value=6
Key Column 3 Value=
Key Column 4 Value=
Key Column 5 Value=
Key Column 6 Value=
Key Column 7 Value=
Rule Name=Incident management rule set for all targets,Create incident for critical metric alerts
Rule Owner=System Generated
Update Details:
Data hard disk entered predictive failure status. Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE
Manufacturer :
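The alert body is a list of Key=Value lines, so the fields needed for triage can be pulled out programmatically. The following is a minimal sketch, assuming the field names seen in this sample (other EM rule sets or versions may word them differently), that parses the body and builds a one-line summary. The function names are hypothetical.

```python
# Sketch: parse the "Key=Value" body of an EM alert email like the sample
# above. Field names are copied from the sample and may differ elsewhere.


def parse_em_alert(body: str) -> dict[str, str]:
    """Split each line on its first '='; lines without '=' (e.g. Subject:) are skipped."""
    fields: dict[str, str] = {}
    for line in body.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_predictive_failure(fields: dict[str, str]) -> bool:
    """True if the Message field mentions a predictive failure."""
    return "predictive failure" in fields.get("Message", "").lower()


def alert_summary(fields: dict[str, str]) -> str:
    """One-line summary: target, incident id and severity."""
    return (
        f"{fields.get('Target name', '?')}: incident "
        f"{fields.get('Associated Incident Id', '?')} "
        f"({fields.get('Severity', 'severity unknown')})"
    )
```

For the sample alert, alert_summary would produce "excellnode02: incident 205877 (Critical)".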