There are several problems:
1. The server has started rebooting frequently and spontaneously (sometimes several times a day).
2. The RAID keeps falling apart after the reboots.
3. SMART complains about ad10: "Currently unreadable (pending) sectors".
I have a feeling that all these problems are related...
Here is what the log shows after yet another reboot; note that the SMART warnings come right before it. A short self-test on this drive also fails, reporting the error 'Device: /dev/ad10, 2 Currently unreadable (pending) sectors':
# dmesg
Apr 22 05:19:28 srv smartd[847]: Device: /dev/ad10, 2 Currently unreadable (pending) sectors
Apr 22 05:24:28 srv smartd[847]: Device: /dev/ad10, 2 Currently unreadable (pending) sectors
Apr 22 05:29:28 srv smartd[847]: Device: /dev/ad10, 2 Currently unreadable (pending) sectors
Apr 22 05:39:27 srv last message repeated 2 times
Apr 22 05:49:28 srv last message repeated 2 times
Apr 22 05:59:29 srv last message repeated 2 times
Apr 22 06:20:54 srv syslogd: kernel boot file is /boot/kernel/kernel
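To check across the whole log whether the smartd warnings really precede every reboot, one can look at how long before each boot the last smartd message was written. A minimal Python sketch, assuming the BSD syslog line format shown above and using syslogd's "kernel boot file is" line as the marker of a fresh boot; the function name and the placeholder year are made up for illustration:

```python
from datetime import datetime


def smartd_before_boot(lines, year=2000):
    """For each boot marker, return (time of last smartd message, boot time).

    Assumes BSD syslog lines: 'Mon DD HH:MM:SS host prog: message'.
    The log has no year, so a placeholder year is used; only the
    differences between timestamps are meaningful.
    """
    results = []
    last_smartd = None      # timestamp of the most recent smartd message
    prev_is_smartd = False  # was the previous real message from smartd?
    for line in lines:
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        stamp = f"{year} {parts[0]} {parts[1]} {parts[2]}"
        try:
            ts = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a syslog-formatted line
        msg = parts[4]
        if msg.startswith("smartd["):
            last_smartd, prev_is_smartd = ts, True
        elif msg.startswith("last message repeated"):
            # syslogd folds duplicates; attribute them to the previous message
            if prev_is_smartd:
                last_smartd = ts
        elif "kernel boot file is" in msg:
            # in the log above this line marks the system coming back up
            results.append((last_smartd, ts))
            last_smartd, prev_is_smartd = None, False
        else:
            prev_is_smartd = False
    return results
```

For example, `with open("/var/log/messages") as f: print(smartd_before_boot(f))`. On the excerpt above it yields one pair, 05:59:29 and 06:20:54, i.e. the last (folded) smartd warning about 21 minutes before the boot. A consistently short gap would support the idea that the problems are related, but would not by itself show which one causes the other.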
Meanwhile the RAID predictably falls apart (with messages like "ad8 is stale"):
# dmesg
ad8: 305245MB <SAMSUNG HD321KJ CP100-12> at ata4-master SATA150
GEOM_MIRROR: Device gm0 created (id=1752941073).
GEOM_MIRROR: Device gm0: provider ad8 detected.
ad10: 305245MB <SAMSUNG HD321KJ CP100-12> at ata5-master SATA150
GEOM_MIRROR: Device gm0: provider ad10 detected.
GEOM_MIRROR: Device gm0: provider ad10 activated.
GEOM_MIRROR: Device gm0: provider ad8 is stale.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/mirror/gm0s1a
WARNING: / was not properly dismounted
# gmirror list
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NOAUTOSYNC
GenID: 4
SyncID: 17
ID: 1752941073
Providers:
1. Name: mirror/gm0
   Mediasize: 320072932864 (298G)
   Sectorsize: 512
   Mode: r6w6e7
Consumers:
1. Name: ad8
   Mediasize: 320072933376 (298G)
   Sectorsize: 512
   Mode: r1w1e1
   State: STALE
   Priority: 0
   Flags: SYNCHRONIZING
   GenID: 4
   SyncID: 16
   ID: 1021375928
2. Name: ad10
   Mediasize: 320072933376 (298G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 4
   SyncID: 17
   ID: 3314192503
What I really don't understand: if the problem is indeed with ad10, why is it ad8, and not ad10, that drops out of the RAID?
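One detail in the `gmirror list` output is relevant here: the mirror and ad10 have SyncID 17, while ad8 has SyncID 16. Assuming gmirror treats a component whose SyncID lags the mirror's as out of date (and, with the NOAUTOSYNC flag set, leaves it STALE instead of resynchronizing it automatically), ad8 would be the half considered stale regardless of which disk has bad sectors. This is an interpretation, not something the logs state. A minimal Python sketch (function names are hypothetical) that extracts State and SyncID per consumer from text like the above, so the comparison is explicit:

```python
import re


def parse_gmirror_list(text):
    """Parse `gmirror list` output.

    Returns (mirror SyncID, {consumer name: (State, SyncID)}).
    Indentation is ignored, so pasted text that lost it still works.
    """
    section = "geom"    # 'geom', 'providers' or 'consumers'
    mirror_syncid = None
    consumers = {}
    current = None      # consumer whose fields are being read
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("Providers:", "Consumers:"):
            section = line[:-1].lower()
            current = None
            continue
        m = re.match(r"\d+\.\s+Name:\s+(\S+)", line)
        if m:
            current = m.group(1) if section == "consumers" else None
            if current is not None:
                consumers[current] = {}
            continue
        if ":" not in line:
            continue
        key, value = (s.strip() for s in line.split(":", 1))
        if section == "geom" and key == "SyncID":
            mirror_syncid = int(value)
        elif current is not None and key in ("State", "SyncID"):
            consumers[current][key] = value
    return mirror_syncid, {
        name: (f.get("State"), int(f["SyncID"])) for name, f in consumers.items()
    }


def lagging(text):
    """Names of consumers whose SyncID is behind the mirror's SyncID."""
    syncid, consumers = parse_gmirror_list(text)
    return [name for name, (_, sid) in consumers.items() if sid < syncid]
```

Fed the output above, `lagging()` returns `["ad8"]`.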
# smartctl -A /dev/ad8
smartctl version 5.37 [i386-portbld-freebsd6.1] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 78
3 Spin_Up_Time 0x0007 100 100 015 Pre-fail Always - 2112
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 20
5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 253 253 015 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 4047
10 Spin_Retry_Count 0x0033 253 253 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 1
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 20
13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 464770596
187 Unknown_Attribute 0x0032 099 099 000 Old_age Always - 8716290
188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 3
190 Temperature_Celsius 0x0022 083 061 000 Old_age Always - 17
194 Temperature_Celsius 0x0022 187 121 000 Old_age Always - 17
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 464770596
196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 253 253 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
202 TA_Increase_Count 0x0032 253 253 000 Old_age Always - 0
smartctl version 5.37 [i386-portbld-freebsd6.1] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 108
3 Spin_Up_Time 0x0007 100 100 015 Pre-fail Always - 2112
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 17
5 Reallocated_Sector_Ct 0x0033 253 253 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 253 253 015 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 4045
10 Spin_Retry_Count 0x0033 253 253 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 253 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 17
13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 824253489
187 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 786433
188 Unknown_Attribute 0x0032 253 253 000 Old_age Always - 0
190 Temperature_Celsius 0x0022 083 068 000 Old_age Always - 17
194 Temperature_Celsius 0x0022 184 142 000 Old_age Always - 18
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 824253489
196 Reallocated_Event_Count 0x0032 253 253 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 2
198 Offline_Uncorrectable 0x0030 253 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
202 TA_Increase_Count 0x0032 253 253 000 Old_age Always - 0
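Comparing the two tables, the sector-health attributes differ only in Current_Pending_Sector: 0 in the first (ad8) and 2 in the second, which matches the smartd warnings about ad10, so the second table is presumably ad10. Reallocated_Sector_Ct, Reallocated_Event_Count and Offline_Uncorrectable are 0 on both. A small Python sketch (helper names are hypothetical) that pulls the RAW_VALUE column out of saved `smartctl -A` output and lines up selected attributes for two drives:

```python
def smart_raw_values(text):
    """Map attribute ID -> (name, RAW_VALUE) from `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have 10+ columns;
        # RAW_VALUE is the 10th column (may contain spaces on some drives).
        if len(parts) >= 10 and parts[0].isdigit():
            attrs[int(parts[0])] = (parts[1], " ".join(parts[9:]))
    return attrs


def compare_drives(text_a, text_b, ids=(5, 196, 197, 198)):
    """Rows of (ID, name, RAW on drive A, RAW on drive B); None if absent."""
    a, b = smart_raw_values(text_a), smart_raw_values(text_b)
    rows = []
    for i in ids:
        name = (a.get(i) or b.get(i) or ("?", None))[0]
        rows.append((i, name, a.get(i, (None, None))[1], b.get(i, (None, None))[1]))
    return rows
```

For example, with the two outputs saved to files (the file names are placeholders), `compare_drives(open("ad8.txt").read(), open("ad10.txt").read())` lists IDs 5, 196, 197 and 198 side by side.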