HDD problem: smartd reports Offline uncorrectable sectors

itrab
just passing by
Posts: 6
Joined: 2011-08-23 10:16:29

HDD problem: smartd reports Offline uncorrectable sectors

Post by itrab » 2011-08-23 10:26:35

On the backup server, the following started pouring into the log:

Code:

Aug 23 08:22:35 bkp-01 smartd[2324]: Sending warning via mail to root ...
Aug 23 08:22:35 bkp-01 smartd[2324]: Warning via mail to root: successful
Aug 23 08:22:35 bkp-01 smartd[2324]: Device: /dev/sda, 1 Offline uncorrectable sectors
Aug 23 08:52:36 bkp-01 smartd[2324]: Device: /dev/sda, 1 Currently unreadable (pending) sectors
Aug 23 08:52:36 bkp-01 smartd[2324]: Device: /dev/sda, 1 Offline uncorrectable sectors
Aug 23 09:22:35 bkp-01 smartd[2324]: Device: /dev/sda, 1 Currently unreadable (pending) sectors
Aug 23 09:22:35 bkp-01 smartd[2324]: Device: /dev/sda, 1 Offline uncorrectable sectors
Aug 23 09:52:35 bkp-01 smartd[2324]: Device: /dev/sda, 1 Currently unreadable (pending) sectors
Aug 23 09:52:35 bkp-01 smartd[2324]: Device: /dev/sda, 1 Offline uncorrectable sectors
I looked at the drive with smartctl and was somewhat horrified:

Code:

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA1785755
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Aug 23 10:21:58 2011 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (39600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   235   235   021    Pre-fail  Always       -       3208
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5113
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       7
193 Load_Cycle_Count        0x0032   005   005   000    Old_age   Always       -       585994
194 Temperature_Celsius     0x0022   114   109   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      5110         -
# 2  Short offline       Completed: read failure       80%      5086         1846349410
# 3  Extended offline    Completed: read failure       60%      5065         1846349408
# 4  Short offline       Completed: read failure       80%      5062         1846349412
# 5  Short offline       Completed: read failure       90%      5038         1846349408
# 6  Short offline       Completed: read failure       80%      5014         1846349408
# 7  Short offline       Completed: read failure       80%      4990         1846349408
# 8  Short offline       Completed: read failure       90%      4966         1846349408
# 9  Short offline       Completed: read failure       80%      4942         1846349408
#10  Short offline       Completed: read failure       10%      4918         1846349408
#11  Extended offline    Completed: read failure       60%      4898         1846349408
#12  Short offline       Completed: read failure       10%      4894         1846349408
#13  Short offline       Completed: read failure       10%      4870         1846349408
#14  Short offline       Completed without error       00%      4846         -
#15  Short offline       Completed without error       00%      4822         -
#16  Short offline       Completed without error       00%      4798         -
#17  Short offline       Completed: read failure       80%      4774         1846344544
#18  Short offline       Completed: read failure       80%      4750         1846344544
#19  Extended offline    Completed: read failure       90%      4727         1846344544
#20  Short offline       Completed: read failure       80%      4726         1846344544
#21  Short offline       Completed: read failure       80%      4702         1846344544

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Is the drive really done for? It is the only drive in that machine, a 2 TB one. Unfortunately I had the luck of inheriting it in this state, and the errors showed up while I was still figuring out the infrastructure.
Guys, please advise: should I back everything up to a separate drive right away and replace this one, or can this somehow be fixed so the drive can keep running?
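A side note on the self-test log above: the failing LBAs cluster tightly. Below is a minimal Python sketch that groups them by physical sector. It assumes 512-byte logical sectors on top of aligned 4096-byte physical sectors (the smartctl output above does not state sector sizes, so treat that as an assumption). The function name `failing_physical_sectors` and the `SAMPLE` excerpt are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: 512-byte logical sectors on aligned 4096-byte physical sectors.
LOGICAL_PER_PHYSICAL = 4096 // 512


def failing_physical_sectors(selftest_log):
    """Map physical sector index -> set of failing LBAs from a smartctl self-test log."""
    groups = defaultdict(set)
    for line in selftest_log.splitlines():
        # Failed entries end with the LBA of the first error; passed ones end with "-".
        m = re.search(r"read failure\s+\d+%\s+\d+\s+(\d+)\s*$", line)
        if m:
            lba = int(m.group(1))
            groups[lba // LOGICAL_PER_PHYSICAL].add(lba)
    return dict(groups)


# A few lines copied from the self-test log in this post.
SAMPLE = """\
# 1  Short offline       Completed without error       00%      5110         -
# 2  Short offline       Completed: read failure       80%      5086         1846349410
# 3  Extended offline    Completed: read failure       60%      5065         1846349408
# 4  Short offline       Completed: read failure       80%      5062         1846349412
#17  Short offline       Completed: read failure       80%      4774         1846344544
"""

if __name__ == "__main__":
    for phys, lbas in sorted(failing_physical_sectors(SAMPLE).items()):
        print(phys, sorted(lbas))
```

Under that assumption, LBAs 1846349408, 1846349410 and 1846349412 fall into one physical sector and 1846344544 into another, so the log would point at two bad spots rather than many.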

lap
lieutenant
Posts: 608
Joined: 2010-08-13 23:39:29
Location: Moscow

Re: HDD problem: smartd reports Offline uncorrectable sectors

Post by lap » 2011-08-23 11:00:37

If you need the data on it, make a backup. If you don't, let it live.

Code:

Jul 28 13:10:35 nfs smartd[550]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
Jul 28 13:10:35 nfs smartd[550]: Device: /dev/ad6, 4 Currently unreadable (pending) sectors
Jul 28 13:10:35 nfs smartd[550]: Device: /dev/ad6, 1 Offline uncorrectable sectors
Jul 28 13:40:35 nfs smartd[550]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
Jul 28 13:40:35 nfs smartd[550]: Device: /dev/ad6, 4 Currently unreadable (pending) sectors
Jul 28 13:40:35 nfs smartd[550]: Device: /dev/ad6, 1 Offline uncorrectable sectors
Jul 28 14:10:36 nfs smartd[550]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
Jul 28 14:10:36 nfs smartd[550]: Device: /dev/ad6, 4 Currently unreadable (pending) sectors
Jul 28 14:10:36 nfs smartd[550]: Device: /dev/ad6, 1 Offline uncorrectable sectors
Aug 23 11:56:52 nfs smartd[52947]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
Aug 23 11:56:52 nfs smartd[52947]: Device: /dev/ad6, 4 Currently unreadable (pending) sectors
Aug 23 11:56:52 nfs smartd[52947]: Device: /dev/ad6, 1 Offline uncorrectable sectors
Aug 23 11:56:52 nfs smartd[52947]: Device: /dev/ad6, previous self-test completed with error (read test element)
I've had this stuff pouring into my log for several years now, but the universe won't stop without the contents of those disks. (The gap from Jul 28 to Aug 23 is because smartd failed to start after an update =) )
If it ain't broke, don't fix it.

itrab
just passing by
Posts: 6
Joined: 2011-08-23 10:16:29

Re: HDD problem: smartd reports Offline uncorrectable sectors

Post by itrab » 2011-08-23 11:10:19

It holds backups of 2 sites from three virtual machines, plus the VM configs, and the VMs themselves once a week.

Is there maybe some other way to check this drive?
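
As one way to keep an eye on the drive, the attribute table from smartctl can be filtered for the counters that relate to bad sectors. A rough Python sketch, assuming the table layout shown earlier in the thread (ten whitespace-separated columns, raw value in the tenth). The set of watched IDs and the function name `nonzero_watched` are my own choice for illustration, not something smartd prescribes.

```python
# Attribute IDs related to reallocated or unreadable sectors (illustrative choice).
WATCHED_IDS = {5, 196, 197, 198}


def nonzero_watched(attr_table):
    """Return {attribute_name: raw_value} for watched attributes with a raw value > 0."""
    result = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id, name, raw = int(fields[0]), fields[1], fields[9]
            if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
                result[name] = int(raw)
    return result


# Rows copied from the smartctl output in the first post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
"""

if __name__ == "__main__":
    print(nonzero_watched(SAMPLE))
```

On the excerpt above it reports only the pending and offline-uncorrectable counts. Running something like this on fresh smartctl output from time to time and comparing the results would show whether the counters are growing.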