This command showed the same thing, except for the "scan: scrub" line. The scrub is a separate story. Not understanding what had happened, I decided to run a scrub, and here is what happened: it checked literally 102 MB and then stalled. After I cancelled the scrub, I couldn't even run "zpool status".
root@ds1:/home/xfile # zpool status -v arhat_storage
  pool: arhat_storage
 state: ONLINE
  scan: scrub repaired 0 in 3h11m with 0 errors on Thu Feb 12 03:32:51 2015
config:

        NAME           STATE     READ WRITE CKSUM
        arhat_storage  ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            ada1       ONLINE       0     0     0
            ada2       ONLINE       0     0     0

errors: No known data errors
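When a scrub appears to stall, as described above, it helps to watch its progress line over time. A minimal sketch, assuming POSIX sh and awk on the server; `scan_section` is a hypothetical helper name, not part of ZFS. It only filters the `scan:` line (and any indented lines that follow it, up to `config:`) out of ordinary `zpool status` output:

```shell
#!/bin/sh
# scan_section (hypothetical helper): read `zpool status` output on stdin
# and print the "scan:" line plus any continuation lines before "config:",
# with leading whitespace stripped.
scan_section() {
    awk '
        /^[ \t]*scan:/   { p = 1 }
        /^[ \t]*config:/ { p = 0 }
        p { sub(/^[ \t]+/, ""); print }
    '
}

# On the server (pool name taken from the post):
#   zpool status -v arhat_storage | scan_section
# Watch a running scrub once a minute:
#   while sleep 60; do date; zpool status arhat_storage | scan_section; done
```

The loop is only useful while `zpool status` itself responds; here it stopped responding after the scrub was cancelled, and the loop would then block as well. On FreeBSD, `procstat -kk <pid>` on a stuck `zpool` process prints the kernel stack it is waiting in, which may be worth saving before rebooting. A running scrub can be stopped with `zpool scrub -s arhat_storage`.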
Alex Keda wrote: Did this happen out of nowhere?
Well, not entirely out of nowhere... Before this, the server had been running untouched for more than a week, and yesterday a manager reported that he couldn't upload files to the site. When exactly the failure occurred is unknown. Interestingly, files from this pool could be read without any problems the whole time...
Electronik wrote: How much RAM does the server have?
What does SMART show for the disks?
root@ds1:/home/xfile # grep memory /var/run/dmesg.boot
real memory = 17179869184 (16384 MB)
avail memory = 16533565440 (15767 MB)
root@ds1:/home/xfile # smartctl -a /dev/ada1
smartctl 6.3 2014-07-26 r3976 [FreeBSD 10.1-STABLE amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Constellation ES.2 (SATA 6Gb/s)
Device Model: ST33000650NS
Serial Number: Z295QHWX
LU WWN Device Id: 5 000c50 04f84c459
Firmware Version: 0004
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Feb 12 17:27:39 2015 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
        was completed without error.
        Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
        without error or no self-test has ever
        been run.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new
        command.
        Offline surface scan supported.
        Self-test supported.
        Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
        power-saving mode.
        Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 440) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x10bd) SCT Status supported.
        SCT Error Recovery Control supported.
        SCT Feature Control supported.
        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   063   044    Pre-fail  Always       -       212248898
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       44
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   068   060   030    Pre-fail  Always       -       68826729363
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       17053
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       45
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   058   036   045    Old_age   Always   In_the_past 42 (Min/Max 41/46 #37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       44
194 Temperature_Celsius     0x0022   042   064   000    Old_age   Always       -       42 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   028   006   000    Old_age   Always       -       212248898
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@ds1:/home/xfile # smartctl -a /dev/ada2
smartctl 6.3 2014-07-26 r3976 [FreeBSD 10.1-STABLE amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Constellation ES.2 (SATA 6Gb/s)
Device Model: ST33000650NS
Serial Number: Z295SL6V
LU WWN Device Id: 5 000c50 04f81d686
Firmware Version: 0004
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Feb 12 17:28:27 2015 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
        was completed without error.
        Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
        without error or no self-test has ever
        been run.
Total time to complete Offline
data collection: ( 609) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new
        command.
        Offline surface scan supported.
        Self-test supported.
        Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
        power-saving mode.
        Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 450) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x10bd) SCT Status supported.
        SCT Error Recovery Control supported.
        SCT Feature Control supported.
        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   063   063   044    Pre-fail  Always       -       2619860
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       44
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail  Always       -       38761980084
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       17053
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       45
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   061   039   045    Old_age   Always   In_the_past 39 (Min/Max 38/42 #16)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       44
194 Temperature_Celsius     0x0022   039   061   000    Old_age   Always       -       39 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   024   010   000    Old_age   Always       -       2619860
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I don't fully understand SMART, but this output doesn't worry me much. I'm not too happy about Raw_Read_Error_Rate, though...
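About Raw_Read_Error_Rate: on Seagate drives, the 48-bit raw value of attributes 1 (Raw_Read_Error_Rate) and 7 (Seek_Error_Rate) is commonly reported to pack two counters, an error count in the upper 16 bits and an operation count in the lower 32 bits. Seagate does not officially document this, so treat the layout as an assumption. A small sketch that splits the raw values from the two tables above (`seagate_split` is a hypothetical helper name):

```shell
#!/bin/sh
# seagate_split RAW (hypothetical helper)
# ASSUMES the commonly reported Seagate layout:
#   upper 16 bits of the 48-bit raw value = error count
#   lower 32 bits                         = operation count
seagate_split() {
    raw=$1
    errors=$(( raw >> 32 ))
    ops=$(( raw & 0xFFFFFFFF ))
    echo "errors=$errors ops=$ops"
}

# Raw values copied from the smartctl output for ada1 and ada2.
for disk_attr_raw in \
    "ada1 Raw_Read_Error_Rate 212248898" \
    "ada1 Seek_Error_Rate 68826729363" \
    "ada2 Raw_Read_Error_Rate 2619860" \
    "ada2 Seek_Error_Rate 38761980084"
do
    set -- $disk_attr_raw
    echo "$1 $2: $(seagate_split "$3")"
done
```

Under that assumption, Raw_Read_Error_Rate shows errors=0 on both disks, and Seek_Error_Rate works out to 16 errors (ada1) and 9 errors (ada2) over roughly 107 million seeks each. The normalized VALUE columns for both attributes are also still above THRESH.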
As shown above, the disks are in a mirror.
The disks have been running for more than a year; I don't remember exactly how long. Data is read from them continuously... Reading pauses only when a live video broadcast is started, because then data is written to the disks. And sometimes video material is simply uploaded to the site.
By the way, I remembered one thing. When I switched this server to ZFS, I didn't do any sysctl tuning... Could that cause a problem like this?
Additionally, in case it helps:
root@ds1:/home/xfile # sysctl vfs.zfs
vfs.zfs.arc_max: 1073741824
vfs.zfs.arc_min: 134217728
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_shrink_shift: 5
vfs.zfs.arc_free_target: 28160
vfs.zfs.arc_meta_used: 142079552
vfs.zfs.arc_meta_limit: 268435456
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_norw: 1
vfs.zfs.anon_size: 18432
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_data_lsize: 0
vfs.zfs.mru_size: 169247232
vfs.zfs.mru_metadata_lsize: 124400128
vfs.zfs.mru_data_lsize: 42729472
vfs.zfs.mru_ghost_size: 902555648
vfs.zfs.mru_ghost_metadata_lsize: 152711168
vfs.zfs.mru_ghost_data_lsize: 749844480
vfs.zfs.mfu_size: 890765312
vfs.zfs.mfu_metadata_lsize: 0
vfs.zfs.mfu_data_lsize: 889061376
vfs.zfs.mfu_ghost_size: 169280512
vfs.zfs.mfu_ghost_metadata_lsize: 126551040
vfs.zfs.mfu_ghost_data_lsize: 42729472
vfs.zfs.l2c_only_size: 0
vfs.zfs.dedup.prefetch: 1
vfs.zfs.nopwrite_enabled: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.dirty_data_max: 1707272192
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.delay_scale: 500000
vfs.zfs.prefetch_disable: 1
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.top_maxinflight: 32
vfs.zfs.resilver_delay: 2
vfs.zfs.scrub_delay: 4
vfs.zfs.scan_idle: 50
vfs.zfs.scan_min_time_ms: 1000
vfs.zfs.free_min_time_ms: 1000
vfs.zfs.resilver_min_time_ms: 3000
vfs.zfs.no_scrub_io: 0
vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.free_max_blocks: 18446744073709551615
vfs.zfs.metaslab.gang_bang: 131073
vfs.zfs.metaslab.fragmentation_threshold: 70
vfs.zfs.metaslab.debug_load: 0
vfs.zfs.metaslab.debug_unload: 0
vfs.zfs.metaslab.df_alloc_threshold: 131072
vfs.zfs.metaslab.df_free_pct: 4
vfs.zfs.metaslab.min_alloc_size: 10485760
vfs.zfs.metaslab.load_pct: 50
vfs.zfs.metaslab.unload_delay: 8
vfs.zfs.metaslab.preload_limit: 3
vfs.zfs.metaslab.preload_enabled: 1
vfs.zfs.metaslab.fragmentation_factor_enabled: 1
vfs.zfs.metaslab.lba_weighting_enabled: 1
vfs.zfs.metaslab.bias_enabled: 1
vfs.zfs.condense_pct: 200
vfs.zfs.mg_noalloc_threshold: 0
vfs.zfs.mg_fragmentation_threshold: 85
vfs.zfs.check_hostid: 1
vfs.zfs.spa_load_verify_maxinflight: 10000
vfs.zfs.spa_load_verify_metadata: 1
vfs.zfs.spa_load_verify_data: 1
vfs.zfs.recover: 0
vfs.zfs.deadman_synctime_ms: 1000000
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_enabled: 1
vfs.zfs.spa_asize_inflation: 24
vfs.zfs.spa_slop_shift: 5
vfs.zfs.space_map_blksz: 4096
vfs.zfs.txg.timeout: 5
vfs.zfs.vdev.metaslabs_per_vdev: 200
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.cache.size: 20971520
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.trim_on_init: 1
vfs.zfs.vdev.mirror.rotating_inc: 0
vfs.zfs.vdev.mirror.rotating_seek_inc: 5
vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
vfs.zfs.vdev.mirror.non_rotating_inc: 0
vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
vfs.zfs.vdev.async_write_active_min_dirty_percent: 30
vfs.zfs.vdev.async_write_active_max_dirty_percent: 60
vfs.zfs.vdev.max_active: 1000
vfs.zfs.vdev.sync_read_min_active: 10
vfs.zfs.vdev.sync_read_max_active: 10
vfs.zfs.vdev.sync_write_min_active: 10
vfs.zfs.vdev.sync_write_max_active: 10
vfs.zfs.vdev.async_read_min_active: 1
vfs.zfs.vdev.async_read_max_active: 3
vfs.zfs.vdev.async_write_min_active: 1
vfs.zfs.vdev.async_write_max_active: 10
vfs.zfs.vdev.scrub_min_active: 1
vfs.zfs.vdev.scrub_max_active: 2
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.trim_max_bytes: 2147483648
vfs.zfs.vdev.trim_max_pending: 64
vfs.zfs.max_auto_ashift: 13
vfs.zfs.min_auto_ashift: 9
vfs.zfs.zil_replay_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zio.use_uma: 1
vfs.zfs.zio.exclude_metadata: 0
vfs.zfs.sync_pass_deferred_free: 2
vfs.zfs.sync_pass_dont_compress: 5
vfs.zfs.sync_pass_rewrite: 2
vfs.zfs.snapshot_list_prefetch: 0
vfs.zfs.super_owner: 0
vfs.zfs.debug: 0
vfs.zfs.version.ioctl: 4
vfs.zfs.version.acl: 1
vfs.zfs.version.spa: 5000
vfs.zfs.version.zpl: 5
vfs.zfs.vol.mode: 1
vfs.zfs.vol.unmap_enabled: 1
vfs.zfs.trim.enabled: 1
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.max_interval: 1
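To put the ARC values above in context against the installed RAM, a small sketch, assuming POSIX sh and awk; `arc_vs_ram` is a hypothetical helper, and the RAM size is passed in by hand:

```shell
#!/bin/sh
# arc_vs_ram RAM_BYTES (hypothetical helper, not a FreeBSD tool)
# Reads "name: value" lines as printed by `sysctl vfs.zfs` on stdin and
# prints arc_max in MiB and as a share of RAM_BYTES, plus how much of
# arc_meta_limit is currently used.
arc_vs_ram() {
    awk -v ram="$1" -F': ' '
        { v[$1] = $2 }
        END {
            max = v["vfs.zfs.arc_max"]
            printf "arc_max: %d MiB (%.2f%% of RAM)\n", max / 1048576, 100 * max / ram
            lim = v["vfs.zfs.arc_meta_limit"]
            if (lim > 0)
                printf "arc_meta_used: %.2f%% of arc_meta_limit\n", 100 * v["vfs.zfs.arc_meta_used"] / lim
        }
    '
}

# On the server (hw.physmem is FreeBSD's physical-memory sysctl):
#   sysctl vfs.zfs | arc_vs_ram "$(sysctl -n hw.physmem)"
```

With the values posted here and the 17179869184-byte "real memory" figure from dmesg.boot, this prints `arc_max: 1024 MiB (6.25% of RAM)` and `arc_meta_used: 52.93% of arc_meta_limit`. That alone does not explain the hang; it only shows that the ARC is capped at 1 GiB on a 16 GiB machine, so it may be worth checking where that limit is set (on FreeBSD 10, vfs.zfs.arc_max is usually set as a loader tunable in /boot/loader.conf).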
And just in case, I'll repeat:
after a reboot everything works normally. But I don't want this to happen again; I want to understand the cause and fix it...