Lately the machine has been going down at random. The server is hosted remotely and there is no KVM, so it was hard to get any meaningful data. After a hardware reboot by support, the logs showed no visible problems.
After the latest incident, entries finally appeared in the logs:
/var/log/messages
Code:
Oct 16 16:41:08 obra smartd[940]: Device: /dev/ad8, 4 Currently unreadable (pending) sectors
Oct 16 16:41:16 obra kernel: ad10: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=786373695
Oct 16 16:41:16 obra kernel: ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=786373695
Oct 16 16:41:16 obra kernel: g_vfs_done():ad10s1g[WRITE(offset=368813015040, length=16384)]error = 5
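The two kernel lines above describe the same failed write in different units: the ad10 line gives an absolute LBA on the disk, while the g_vfs_done line gives a byte offset inside ad10s1g. A rough Python sketch for relating them is below. It assumes 512-byte logical sectors and that the reported LBA is the first sector of the request; neither is confirmed by the log, so the "implied start" figure is only an estimate. The function names are made up for illustration.

```python
import re

# Lines copied from the /var/log/messages excerpt above.
LOG = """\
Oct 16 16:41:16 obra kernel: ad10: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=786373695
Oct 16 16:41:16 obra kernel: g_vfs_done():ad10s1g[WRITE(offset=368813015040, length=16384)]error = 5
"""

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def parse_failure_lba(text):
    """Return the LBA from the first 'FAILURE' line, or None if absent."""
    m = re.search(r"FAILURE - \S+ .*LBA=(\d+)", text)
    return int(m.group(1)) if m else None


def parse_vfs_request(text):
    """Return (offset_bytes, length_bytes) from the first g_vfs_done line, or None."""
    m = re.search(r"g_vfs_done\(\):\S+\[\w+\(offset=(\d+), length=(\d+)\)\]", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


lba = parse_failure_lba(LOG)
offset, length = parse_vfs_request(LOG)
rel_sector = offset // SECTOR_SIZE  # sector index inside ad10s1g

print("absolute LBA on ad10:       ", lba)
print("sector within ad10s1g:      ", rel_sector)
print("sectors in failed request:  ", length // SECTOR_SIZE)
# Only an estimate: valid if the LBA is the first sector of this request.
print("implied start of ad10s1g:   ", lba - rel_sector)
```

If the implied start matches what the partition table reports for ad10s1g, the two messages almost certainly refer to the same request.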
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   100   100   015    Pre-fail  Always       -       3328
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       7772
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   253   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       11157546
187 Reported_Uncorrect      0x0032   253   253   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   077   051   000    Old_age   Always       -       23
194 Temperature_Celsius     0x0022   166   088   000    Old_age   Always       -       24
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       11157546
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0
SMART Error Log Version: 1
ATA Error Count: 3
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 3 occurred at disk power-on lifetime: 7552 hours (314 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 80 9f 2a 1a e3 Error: ICRC, ABRT at LBA = 0x031a2a9f = 52046495
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 80 9f 2a 1a e3 00 40d+22:46:33.938 WRITE DMA
ca 00 80 1f 2a 1a e3 00 40d+22:46:33.938 WRITE DMA
ca 00 80 9f 29 1a e3 00 40d+22:46:33.938 WRITE DMA
ca 00 80 1f 29 1a e3 00 40d+22:46:33.938 WRITE DMA
ca 00 80 9f 28 1a e3 00 40d+22:46:33.938 WRITE DMA
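
For anyone checking the numbers in the error log above, here is a small Python sketch that reproduces them. It assumes the usual 28-bit LBA layout for the WRITE DMA (0xca) command: SN, CL and CH carry the low 24 bits, and the low nibble of DH carries bits 24-27. The 49.710-day wrap is consistent with a 32-bit millisecond counter, which is the assumption used in the last step. The helper name is made up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low
    nibble of DH holds bits 24-27 (the high nibble carries mode bits).
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values from the "Error 3" row: SN=9f CL=2a CH=1a DH=e3
lba = lba28_from_registers(0x9F, 0x2A, 0x1A, 0xE3)
print(f"LBA = 0x{lba:08x} = {lba}")

# Power-on lifetime at the time of the error, split into days and hours
days, hours = divmod(7552, 24)
print(f"7552 hours = {days} days + {hours} hours")

# Hours between that error and the Power_On_Hours value in the attribute table
print("hours since error:", 7772 - 7552)

# Assumption: Powered_Up_Time is a 32-bit millisecond counter
wrap_days = 2**32 / (1000 * 60 * 60 * 24)
print(f"counter wraps after about {wrap_days:.3f} days")
```

The decoded value agrees with the LBA that smartctl prints for Error 3, and the day/hour split agrees with the "314 days + 16 hours" figure.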