There are several problems:
1. The server has started rebooting frequently and spontaneously (sometimes several times a day).
2. The RAID keeps falling apart after the reboots.
3. SMART complains about ad10: "Currently unreadable (pending) sectors".
I have a feeling, though, that all of these problems are related.
Here is what the log shows around the latest reboot; note that the SMART complaints come before the reboot. The short self-test also fails on this drive, reporting the error 'Device: /dev/ad10, 2 Currently unreadable (pending) sectors'.
# dmesg
Apr 22 05:19:28 srv smartd[847]: Device: /dev/ad10, 2 Currently unreadable (pending) sectors
Apr 22 05:24:28 srv smartd[847]: Device: /dev/ad10, 2 Currently unreadable (pending) sectors
Apr 22 05:29:28 srv smartd[847]: Device: /dev/ad10, 2 Currently unreadable (pending) sectors
Apr 22 05:39:27 srv last message repeated 2 times
Apr 22 05:49:28 srv last message repeated 2 times
Apr 22 05:59:29 srv last message repeated 2 times
Apr 22 06:20:54 srv syslogd: kernel boot file is /boot/kernel/kernel
After that the RAID predictably falls apart (ad8 is reported as stale):
# dmesg:
ad8: 305245MB <SAMSUNG HD321KJ CP100-12> at ata4-master SATA150
GEOM_MIRROR: Device gm0 created (id=1752941073).
GEOM_MIRROR: Device gm0: provider ad8 detected.
ad10: 305245MB <SAMSUNG HD321KJ CP100-12> at ata5-master SATA150
GEOM_MIRROR: Device gm0: provider ad10 detected.
GEOM_MIRROR: Device gm0: provider ad10 activated.
GEOM_MIRROR: Device gm0: provider ad8 is stale.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/mirror/gm0s1a
WARNING: / was not properly dismounted
# gmirror list
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NOAUTOSYNC
GenID: 4
SyncID: 17
ID: 1752941073
Providers:
1. Name: mirror/gm0
   Mediasize: 320072932864 (298G)
   Sectorsize: 512
   Mode: r6w6e7
Consumers:
1. Name: ad8
   Mediasize: 320072933376 (298G)
   Sectorsize: 512
   Mode: r1w1e1
   State: STALE
   Priority: 0
   Flags: SYNCHRONIZING
   GenID: 4
   SyncID: 16
   ID: 1021375928
2. Name: ad10
   Mediasize: 320072933376 (298G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 4
   SyncID: 17
   ID: 3314192503
What I really don't understand is this: if the problem is with drive ad10, why is it ad8 that drops out of the RAID and not ad10?
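The only difference I can spot between the two consumers in the listing above is the SyncID: 16 on ad8 versus 17 on ad10. If gmirror treats the component with the lower SyncID as out of date (an assumption on my part, I have not verified it), that would at least explain the mechanics of why ad8 is the one marked stale. Below is a minimal Python sketch for pulling these fields out of saved output, assuming the text layout shown above. The function name `parse_consumers` and the shortened sample are my own, not part of any tool.

```python
import re


def parse_consumers(text):
    """Return {consumer name: {field: value}} from gmirror-style list output."""
    consumers = {}
    current = None
    in_consumers = False
    for line in text.splitlines():
        stripped = line.strip()
        # Leave the consumer section when another section or geom begins.
        if stripped.startswith("Geom name:") or stripped == "Providers:":
            in_consumers = False
            current = None
            continue
        if stripped == "Consumers:":
            in_consumers = True
            continue
        if not in_consumers or not stripped:
            continue
        # A new consumer starts with a line such as "1. Name: ad8".
        match = re.match(r"\d+\.\s+Name:\s+(\S+)", stripped)
        if match:
            current = consumers.setdefault(match.group(1), {})
            continue
        if current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return consumers


# Shortened sample in the same layout as the listing above.
SAMPLE = """\
Consumers:
1. Name: ad8
   State: STALE
   SyncID: 16
2. Name: ad10
   State: ACTIVE
   SyncID: 17
"""

if __name__ == "__main__":
    for name, fields in parse_consumers(SAMPLE).items():
        print(name, fields["State"], "SyncID", fields["SyncID"])
```

Run against the full listing, this would show ad8 as STALE with SyncID 16 and ad10 as ACTIVE with SyncID 17.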
# smartctl -A /dev/ad8
smartctl version 5.37 [i386-portbld-freebsd6.1] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       78
  3 Spin_Up_Time            0x0007   100   100   015    Pre-fail  Always       -       2112
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       20
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4047
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       20
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       464770596
187 Unknown_Attribute       0x0032   099   099   000    Old_age   Always       -       8716290
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       3
190 Temperature_Celsius     0x0022   083   061   000    Old_age   Always       -       17
194 Temperature_Celsius     0x0022   187   121   000    Old_age   Always       -       17
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       464770596
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0
# smartctl -A /dev/ad10
smartctl version 5.37 [i386-portbld-freebsd6.1] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       108
  3 Spin_Up_Time            0x0007   100   100   015    Pre-fail  Always       -       2112
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       4045
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   253   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       824253489
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       786433
188 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   083   068   000    Old_age   Always       -       17
194 Temperature_Celsius     0x0022   184   142   000    Old_age   Always       -       18
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       824253489
196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0
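To compare the two attribute tables above without eyeballing every row, here is a rough Python sketch. It assumes the ten-column layout printed by smartctl 5.37 as shown above. The helper names `parse_attributes` and `diff_raw` are hypothetical, and the sample rows are a small excerpt of the real output.

```python
def parse_attributes(text):
    """Map attribute ID -> (name, raw value) from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(parts) >= 10 and parts[0].isdigit():
            attrs[int(parts[0])] = (parts[1], parts[9])
    return attrs


def diff_raw(a, b):
    """Return {id: (name, raw_a, raw_b)} for attributes whose raw values differ."""
    return {
        attr_id: (a[attr_id][0], a[attr_id][1], b[attr_id][1])
        for attr_id in sorted(a.keys() & b.keys())
        if a[attr_id][1] != b[attr_id][1]
    }


# A few rows copied from the two tables above.
AD8 = """\
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       20
197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

AD10 = """\
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    changes = diff_raw(parse_attributes(AD8), parse_attributes(AD10))
    for attr_id, (name, raw_ad8, raw_ad10) in changes.items():
        print(f"{attr_id:3d} {name:24s} ad8={raw_ad8} ad10={raw_ad10}")
```

On the full tables this would also list attributes 11, 187 and 188, where ad8 has the larger raw values (1, 8716290 and 3) compared with ad10 (0, 786433 and 0), so ad10 is not the only drive with non-zero counters.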


