Data hard disk entered predictive failure status.

EM Alert: EM Event: Critical:excellnode02 – Data hard disk entered predictive failure status.

Details :

Cell Node : 02

Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE

Manufacturer :

Size : 4.0TB

Firmware : A3A0

Slot Number : 5

Cell Disk : CD_05_excellnode02

Grid Disk : DATAC1_CD_05_excellnode02, DBFS_DG_CD_05_excellnode02, RECOC1_CD_05_excellnode02

Verification

CELLCLI>list celldisk CD_05_excellnode02 detail

name: CD_05_excellnode02

comment:

creationTime: 2015-07-29T20:03:26+00:00

deviceName: /dev/sdn

devicePartition: /dev/sdf

diskType: HardDisk

errorCount: 212

freeSpace: 0

interleaving: none

lun: 0_5

physicalDisk: E9LMRX

raidLevel: 0

size: 3.637969970703125T

status: proactive failure
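
When the same check has to be repeated across many disks, the detail output above can be read programmatically. The following is a minimal Python sketch, assuming the output has already been captured as text (for example from a saved session). The function names parse_cellcli_detail and disk_needs_attention are hypothetical helpers and are not part of CellCLI.

```python
def parse_cellcli_detail(text):
    """Parse 'name: value' lines from CellCLI 'detail' output into a dict."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so timestamps and device paths stay intact.
        key, _, value = line.partition(":")
        attrs[key.strip()] = value.strip()
    return attrs


def disk_needs_attention(attrs):
    """Return True when the cell disk status is anything other than 'normal'."""
    return attrs.get("status", "").lower() != "normal"


# A few attributes copied from the listing above.
sample = """\
name: CD_05_excellnode02
creationTime: 2015-07-29T20:03:26+00:00
errorCount: 212
status: proactive failure
"""

disk = parse_cellcli_detail(sample)
print(disk["name"], disk["status"], disk["errorCount"], disk_needs_attention(disk))
```

All values are kept as strings; convert errorCount with int() if it needs to be compared against a threshold.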

Actions taken :

The hard disk was replaced by a field engineer.

Post steps after replacing Harddisk

The following steps were performed after the field engineer (FE) replaced the failed disk in slot 5 of the second cell node.

Log in to the first compute node, excompnode01.
Switch to the root user: sudo su - root (it won't prompt for a password).
Check the cell node names: cat /home/oracle/cell_group
Log in to the second cell node: ssh excellnode02
Run cellcli to get to the CellCLI prompt.
list celldisk CD_05_excellnode02 detail
- =========== creationTime should show the current date and time, and status should show as normal
list griddisk attributes name,asmmodestatus,size where celldisk=CD_05_excellnode02
- ================== all grid disks should show as ONLINE
list physicaldisk 8:5 detail
- =========== physicalInsertTime should show the current date and time, and status should show as normal
Return to the first compute node, log in to the ASM instance, and run the query below:
sqlplus / as sysasm
select * from gv$asm_operation;

======> This query should return rows showing that a rebalance is in progress.

Once this result appears, the field engineer can leave the data center.
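
The post-replacement checks above can be sketched in Python as follows. This is only an illustration under stated assumptions: the grid disk listing is assumed to be captured as text with whitespace-separated columns in the order requested, and the statuses in the sample listing are invented for the example. The helper names offline_grid_disks and created_on are hypothetical.

```python
from datetime import date, datetime


def offline_grid_disks(listing):
    """Return names of grid disks whose asmmodestatus is not ONLINE.

    Assumes one grid disk per line with whitespace-separated columns in the
    order requested (name, asmmodestatus, ...); further columns are ignored.
    """
    not_online = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, asm_mode_status = fields[0], fields[1]
        if asm_mode_status.upper() != "ONLINE":
            not_online.append(name)
    return not_online


def created_on(creation_time, day):
    """True when an ISO 8601 creationTime falls on the given calendar day."""
    return datetime.fromisoformat(creation_time).date() == day


# Hypothetical listing: names from the alert, statuses invented for illustration.
listing = """\
DATAC1_CD_05_excellnode02   ONLINE
DBFS_DG_CD_05_excellnode02  SYNCING
RECOC1_CD_05_excellnode02   ONLINE
"""

print(offline_grid_disks(listing))
print(created_on("2015-07-29T20:03:26+00:00", date(2015, 7, 29)))
```

After a real replacement, pass date.today() as the second argument to created_on. The comparison uses the date as written in the timestamp, so a cell and a workstation in different time zones can disagree around midnight.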

Sample alert looks like below :

Subject: EM Event: Critical:excellnode02 – Data hard disk entered predictive failure status.

Host=excompnode08

Target type=Oracle Exadata Storage Server

Target name=excellnode02

Categories=Fault

Message=Data hard disk entered predictive failure status. Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE

Severity=Critical

Operating System=Linux

Platform=x86_64

Associated Incident Id=205877

Associated Incident Status=New

Associated Incident Owner=

Associated Incident Acknowledged By Owner=No

Associated Incident Priority=None

Associated Incident Escalation Level=0

Event Type=Metric Alert

Event name=Cell_Generated_Alert:alerttype

Metric Group=Cell Generated Alert

Metric=Alert Type

Metric value=Stateful

Key Column 1=Alert Name

Key Column 1 Value=Hardware

Key Column 2=Alert Sequence

Key Column 2 Value=6

Key Column 3 Value=

Key Column 4 Value=

Key Column 5 Value=

Key Column 6 Value=

Key Column 7 Value=

Rule Name=Incident management rule set for all targets,Create incident for critical metric alerts

Rule Owner=System Generated

Update Details:

Data hard disk entered predictive failure status. Status : WARNING – PREDICTIVE FAILURE, POOR PERFORMANCE

Manufacturer :
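
Because the alert body is mostly Key=Value lines, it is easy to pull out the target name or incident id when routing these mails. Below is a minimal Python sketch, assuming the mail body is available as plain text. The function name parse_em_alert is a hypothetical helper and is not part of Enterprise Manager.

```python
def parse_em_alert(body):
    """Parse 'Key=Value' lines of an EM notification body into a dict."""
    fields = {}
    for line in body.splitlines():
        # Split on the first '=' only; lines without '=' are skipped.
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# A few lines copied from the sample alert above.
sample = """\
Host=excompnode08
Target type=Oracle Exadata Storage Server
Target name=excellnode02
Categories=Fault
Associated Incident Id=205877
Associated Incident Owner=
Update Details:
"""

alert = parse_em_alert(sample)
print(alert["Target name"], alert["Associated Incident Id"])
```

Empty values such as the incident owner are kept as empty strings, and free-text lines like "Update Details:" are ignored.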
