Intel X520-DA2 and SFP+ Twinax cable (only 5 Gb/s gets through)

**pautina** » 2016-03-25 22:43:31

Good afternoon, dear forum members!
I have been struggling with this problem for several months now.
I have a Supermicro server running FreeBSD 10.3-PRERELEASE with an Intel X520-DA2 network card (2x SFP+), a ZTE network switch with four SFP+ ports, and a second, identical server. The servers act as routers.
The problem: I cannot push more than 5 Gb/s between these boxes with iperf3.
Here is a rough connection diagram:

Supermicro 1 (FreeBSD 10, Intel X520-DA2) ------> 5 Gb/s ---- ZTE ---- 5 Gb/s <------ Supermicro 2 (FreeBSD 10, Intel X520-DA2)

Some of the dev.ix sysctl values:

dev.ix.0.queue0.interrupt_rate: 500000
dev.ix.0.link_irq: 9
dev.ix.0.watchdog_events: 0
dev.ix.0.mbuf_defrag_failed: 0
dev.ix.0.dropped: 0
dev.ix.0.thermal_test: 0
dev.ix.0.advertise_speed: 0
dev.ix.0.enable_aim: 1
dev.ix.0.fc: 3
dev.ix.0.tx_processing_limit: 512
dev.ix.0.rx_processing_limit: 512
dev.ix.0.%parent: pci2
dev.ix.0.%pnpinfo: vendor=0x8086 device=0x10fb subvendor=0x8086 subdevice=0x7a11 class=0x020000
dev.ix.0.%location: pci0:2:0:0 handle=\_SB_.PCI0.BR3A.H000
dev.ix.0.%driver: ix
dev.ix.0.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k
dev.ix.%parent:

dmesg.boot

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xe020-0xe03f mem 0xfb280000-0xfb2fffff,0xfb304000-0xfb307fff irq 40 at device 0.0 on pci2
ix0: Using MSIX interrupts with 9 vectors
ix0: Advertised speed can only be set on copper or multispeed fiber media types.
ix0: Ethernet address: 00:1b:21:60:ff:c8
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xe000-0xe01f mem 0xfb180000-0xfb1fffff,0xfb300000-0xfb303fff irq 44 at device 0.1 on pci2
ix1: Using MSIX interrupts with 9 vectors
ix1: Advertised speed can only be set on copper or multispeed fiber media types.
ix1: Ethernet address: 00:1b:21:60:ff:c9
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
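The dmesg lines above report each port on a PCIe link at 5.0 GT/s, width x8 (PCIe 2.0), and both ports (device 0.0 and 0.1) sit on the same card and share it. A quick sanity check, sketched in Python under the assumption of PCIe 2.0's 8b/10b line coding and ignoring TLP/DLLP protocol overhead, suggests the slot itself should not cap throughput at 5 Gb/s:

```python
def pcie_usable_gbps(gt_per_s: float, lanes: int, enc_payload: int, enc_total: int) -> float:
    """Per-direction PCIe bandwidth after line coding, before protocol overhead."""
    return gt_per_s * lanes * enc_payload / enc_total

if __name__ == "__main__":
    # PCIe 2.0: 5.0 GT/s per lane with 8b/10b encoding (8 data bits per 10 line bits)
    link = pcie_usable_gbps(5.0, 8, 8, 10)
    ports_needed = 2 * 10.0  # both X520-DA2 ports share this x8 link
    print(f"{link:.1f} Gbit/s per direction vs {ports_needed:.1f} Gbit/s for two 10G ports")
```

Real PCIe efficiency is lower than this raw figure because of packet headers and flow control, but the margin over 10 Gbit/s per port is large.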

ifconfig

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c000b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TXCSUM_IPV6>
        ether e4:11:5b:9b:72:b5
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
        status: active
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c000b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TXCSUM_IPV6>
        ether e4:11:5b:9b:72:b4
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
        status: active

Has anyone run into something like this? Any thoughts or suggestions?

Alex Keda

And while the 5 Gb/s is being pushed, the CPU load isn't at 100% by any chance?

**pautina** » 2016-03-28 13:48:56

No, the CPU is fine.
At 2-3 Gb/s: 10-15% load.
At 5 Gb/s: 15-18% load.

**icb** » 2016-03-29 10:10:59

I asked a friend to run the test on the same card:
UDP: 1 Mbit/s; adding streams increases the speed.
TCP: 1.9-2.3 Gbit/s; with more streams he reaches a maximum of 4.7-6.2 Gbit/s, with the sending iperf using 68% CPU at a load average of 1.84 and the receiving one 718% at a load average of 19.21.
So iperf is clearly not designed for such speeds.
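The 1 Mbit/s UDP figure is consistent with iperf3's default UDP target bitrate of 1 Mbit/s, which has to be raised explicitly with `-b`; parallel TCP streams are requested with `-P`. The sketch below only assembles an iperf3 client command line in Python. The helper names and the server address 192.0.2.2 (a documentation address) are hypothetical, and running it assumes iperf3 is installed and a server was started on the far end with `iperf3 -s`.

```python
import shlex
import subprocess

def iperf3_client_args(server, streams=1, seconds=10, udp=False, udp_rate="10G"):
    """Hypothetical helper: build an iperf3 client argv list."""
    args = ["iperf3", "-c", server, "-t", str(seconds)]
    if streams > 1:
        args += ["-P", str(streams)]  # number of parallel client streams
    if udp:
        # without -b, iperf3 sends UDP at only 1 Mbit/s
        args += ["-u", "-b", udp_rate]
    return args

def run(args):
    """Run the test; raises CalledProcessError if iperf3 exits non-zero."""
    return subprocess.run(args, check=True)

if __name__ == "__main__":
    cmd = iperf3_client_args("192.0.2.2", streams=3)
    print(shlex.join(cmd))  # iperf3 -c 192.0.2.2 -t 10 -P 3
    # run(cmd)  # uncomment on a host that can reach the iperf3 server
```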

**Neus** » 2016-03-29 10:21:31

icb wrote: iperf is clearly not designed for such speeds

I tested between two FreeBSD instances running in KVM: iperf3 pushes 10 Gbit/s in both directions at once without problems.

**pautina** » 2016-03-29 13:21:40

Here are the results of a 3-stream test. The aggregate is now more acceptable to me. Maybe something really is off with iperf.

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  2.33 GBytes  2.00 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  2.33 GBytes  2.00 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  4.41 GBytes  3.79 Gbits/sec    0             sender
[  6]   0.00-10.00  sec  4.40 GBytes  3.78 Gbits/sec                  receiver
[  8]   0.00-10.00  sec  2.32 GBytes  1.99 Gbits/sec    0             sender
[  8]   0.00-10.00  sec  2.31 GBytes  1.99 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  9.05 GBytes  7.77 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  9.04 GBytes  7.76 Gbits/sec                  receiver
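The columns of this summary can be cross-checked against each other. The figures are consistent with iperf3 counting Transfer in binary gigabytes (2^30 bytes) and Bandwidth in decimal gigabits (10^9 bits); that reading is inferred from these numbers, not taken from the iperf3 documentation. Because Transfer is printed rounded to two decimals, the recomputed rate is only approximate:

```python
GIB = 2**30  # bytes per binary gigabyte, as inferred for iperf3's "GBytes"

def gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert an iperf3-style Transfer (binary GBytes) over an interval to decimal Gbit/s."""
    return gbytes * GIB * 8 / seconds / 1e9

if __name__ == "__main__":
    streams = {4: 2.33, 6: 4.41, 8: 2.32}  # sender Transfer per stream ID, from the run above
    for sid, gb in streams.items():
        print(f"[{sid}] {gbits_per_sec(gb, 10.0):.2f} Gbits/sec")
    print(f"[SUM] {gbits_per_sec(9.05, 10.0):.2f} Gbits/sec")
```

The recomputed values match the printed 2.00, 3.79, 1.99 and 7.77 Gbits/sec, so the [SUM] line is simply the aggregate of the three streams.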

Still, I have my doubts. What about jumbo frames: could they change the single-stream and aggregate speed?
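Part of the answer to the jumbo-frame question is packet rate: at a fixed 10 Gbit/s line rate, larger frames mean far fewer packets (and interrupts) per second for the CPU to handle. A back-of-the-envelope sketch, assuming 38 bytes of per-frame Ethernet overhead on the wire (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap) and that every element on the path, including the ZTE switch, accepts the larger MTU:

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap, bytes

def max_frames_per_sec(mtu: int, line_rate_bps: float = 10e9) -> float:
    """Full-size frames per second that fit on the wire at line rate."""
    return line_rate_bps / ((mtu + ETH_OVERHEAD) * 8)

if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {max_frames_per_sec(mtu):,.0f} frames/s")
```

The ifconfig output above lists JUMBO_MTU among the interface options, so the card advertises jumbo support; the MTU would be raised with `ifconfig ix0 mtu 9000` on both hosts.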

**icb** » 2016-03-30 12:57:41

I looked at the systat statistics here, and the numbers there are quite different.
Maybe iperf shows the information (payload) rate? That is, of course, lower than the line rate.
systat gives me 800 MB/s, which is practically 9 Gbit/s, i.e. the 10G link is almost fully utilized.
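The difference between payload rate and line rate can be quantified. Assuming IPv4, 20-byte IP and TCP headers, 12 bytes of TCP timestamp options, and 38 bytes of per-frame Ethernet overhead (header, FCS, preamble, inter-frame gap), the ceiling for TCP payload on a 10 Gbit/s link sits below 10 Gbit/s even with zero loss. The sketch also converts a byte rate to a bit rate; by this arithmetic 800 MB/s is about 6.4 Gbit/s if systat's MB are decimal (about 6.7 if they are 2^20 bytes), so the ~9 Gbit/s estimate depends on what systat was actually counting.

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12  # Ethernet header + FCS + preamble/SFD + inter-frame gap
IP_TCP_HEADERS = 20 + 20        # IPv4 and TCP headers without options
TCP_TIMESTAMPS = 12             # timestamp option (10 bytes) padded to 12

def tcp_goodput_gbps(mtu: int, line_rate_gbps: float = 10.0) -> float:
    """Upper bound on TCP payload rate for full-size segments at line rate."""
    payload = mtu - IP_TCP_HEADERS - TCP_TIMESTAMPS
    on_wire = mtu + ETH_OVERHEAD
    return line_rate_gbps * payload / on_wire

def bytes_rate_to_gbps(mbytes_per_s: float, mb: int = 10**6) -> float:
    """Convert MB/s to Gbit/s; pass mb=2**20 if the tool counts binary megabytes."""
    return mbytes_per_s * mb * 8 / 1e9

if __name__ == "__main__":
    print(f"MTU 1500: at most {tcp_goodput_gbps(1500):.2f} Gbit/s of TCP payload")
    print(f"MTU 9000: at most {tcp_goodput_gbps(9000):.2f} Gbit/s of TCP payload")
    print(f"800 MB/s  = {bytes_rate_to_gbps(800):.2f} Gbit/s")
    print(f"800 MiB/s = {bytes_rate_to_gbps(800, 2**20):.2f} Gbit/s")
```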

**pautina** » 2016-03-30 15:52:58

That's it, guys. Thanks, everyone. I managed to push 9.6 Gbit/s. The problem really was iperf. With 3 streams, with iperf loading its core at 30% and the NIC loading each core at 25%, throughput was a steady 9.6 Gbit/s.

**Neus** » 2016-03-30 16:02:32

iperf или iperf3?

**icb** » 2016-03-30 16:59:18

What exact parameters did you use on the server and client?
Was there any other tuning?

**pautina** » 2016-03-30 19:06:54

Neus wrote: iperf or iperf3?

iperf3

Sent after 9 minutes 9 seconds:

icb wrote: What exact parameters did you use on the server and client?
Was there any other tuning?

/boot/loader.conf

hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1

dev.igb.0.rx_processing_limit="4096"  # (default 100)
dev.igb.1.rx_processing_limit="4096"  # (default 100)

# FreeBSD Tuning https://wiki.freebsd.org/10gFreeBSD/Intel10G
hw.ix.tx_process_limit=512
hw.ix.rx_process_limit=512
hw.ix.rxd=4096
hw.ix.txd=4096

# H-TCP Congestion Control for a more aggressive increase in speed on higher
# latency, high bandwidth networks with some packet loss.
cc_htcp_load="YES"

net.link.ifqmaxlen="8192"  # (default 50)

# qlimit for igmp, arp, ether and ip6 queues only (netstat -Q) (default 256)
net.isr.defaultqlimit="4096" # (default 256)

# maximum number of interrupts per second generated by single igb(4) (default
# 8000). FreeBSD 10 supports the new drivers which reduce interrupts
# significantly.
hw.igb.max_interrupt_rate="32000" # (default 8000)

# Intel igb(4): The maximum number of packets to process at Receive End Of
# Frame (RxEOF). A frame is a data packet on Layer 2 of the OSI model and "the
# unit of transmission in a link layer protocol consisting of a link-layer
# header followed by a packet." "-1" means unlimited. The default of "100" can
# process around 500K pps, so only set it to -1 if you are accepting 500K
# packets per second or more. Test with "netstat -ihw 1" and look at packets
# received per second.
# http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_intel_82580
hw.igb.rx_process_limit="-1"  # (default 100)
# Intel igb(4): Intel PRO 1000 network chipsets support a maximum of 4096 Rx
# and 4096 Tx descriptors. Two cases when you could change the number of
# descriptors are: 1) low RAM and 2) CPU or bus saturation. If the system RAM
# is too low you can drop the number of descriptors to 128, but the system may
# drop packets if it cannot process the packets fast enough. If you have a
# large number of packets incoming and they are being processed too slowly then
# you can increase the descriptors up to 4096. Increasing descriptors is
# only a hack because the system is too slow to process the packets in a
# timely manner. You should look into getting a faster CPU with a wider bus or
# identifying why the receiving application is so slow. Use "netstat -ihw 1"
# and look for idrops. Note that each received packet requires one Receive
# Descriptor, and each descriptor uses 2 KB of memory.
# https://fasterdata.es.net/host-tuning/nic-tuning/
hw.igb.rxd="4096"  # (default 1024)
hw.igb.txd="4096"  # (default 1024)

# increase the number of network mbufs the system is willing to allocate. Each
# cluster represents approximately 2K of memory, so a value of 524288
# allows up to 1GB of kernel memory for network buffers. (default
# 492680)
kern.ipc.nmbclusters="5242880"
kern.ipc.nmbjumbop="2621440"

# maximum number of interrupts per second on any interrupt level (vmstat -i for
# total rate). If you still see Interrupt Storm detected messages, increase the
# limit to a higher number and look for the culprit. For 10gig NICs set to
# 9000 and use large MTU. (default 1000)
hw.intr_storm_threshold="9000"

# Size of the syncache hash table, must be a power of 2 (default 512)
net.inet.tcp.syncache.hashsize="1024"
# Limit the number of entries permitted in each bucket of the hash table. (default 30)
net.inet.tcp.syncache.bucketlimit="100"
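The memory figures in the mbuf comment can be checked directly. This sketch assumes 2 KiB standard clusters and 4 KiB page-size jumbo clusters (the amd64 page size). As far as we understand, these sysctls are caps on what the kernel may allocate rather than memory set aside at boot, so they are best compared against the 32767 MB of RAM that dmesg reports in this thread rather than subtracted from it.

```python
CLUSTER = 2048       # standard mbuf cluster size in bytes
JUMBO_PAGE = 4096    # page-size jumbo cluster on amd64 (assumption), bytes
GIB = 2**30

def cap_gib(count: int, size: int) -> float:
    """Upper bound on memory that a zone limit of `count` items of `size` bytes allows, in GiB."""
    return count * size / GIB

if __name__ == "__main__":
    print(cap_gib(524288, CLUSTER))      # 1.0  -> the comment's "1GB" example
    print(cap_gib(5242880, CLUSTER))     # 10.0 -> kern.ipc.nmbclusters as set here
    print(cap_gib(2621440, JUMBO_PAGE))  # 10.0 -> kern.ipc.nmbjumbop as set here
```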

/etc/sysctl.conf

# $FreeBSD: stable/10/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0

#dev.ix.0.rx_processing_limit=4096
#dev.ix.0.tx_processing_limit=4096

#dev.ix.1.rx_processing_limit=4096
#dev.ix.1.tx_processing_limit=4096

# FreeBSD Tuning https://wiki.freebsd.org/10gFreeBSD/Intel10G
net.inet.tcp.tso=0
net.inet.ip.fastforwarding=1

# The processing-limit, entropy-harvest, and TSO/LRO settings could ease CPU usage; since we are very light on inbound traffic, TSO is the likelier candidate. The rest are performance related.
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0

# https://calomel.org/freebsd_network_tuning.html
kern.ipc.maxsockbuf=33554432

# set auto tuning maximums to the same value as the kern.ipc.maxsockbuf above.
# Use at least 16MB for 10GE hosts with RTT of less than 100ms. For 10GE hosts
# with RTT of greater than 100ms set buf_max to 150MB.
net.inet.tcp.sendbuf_max=33554432  # (default 2097152)
net.inet.tcp.recvbuf_max=33554432  # (default 2097152)
# maximum segment size (MSS) specifies the largest payload of data in a single
# TCP segment, not including TCP headers or options. mssdflt is the default
# MSS used when the peer does not advertise one. With an interface MTU of 1500
# bytes we suggest a net.inet.tcp.mssdflt of 1460 bytes: 1500 MTU minus 20-byte
# IP header minus 20-byte TCP header is 1460. With net.inet.tcp.rfc1323
# enabled, TCP timestamps are added to the packets and the MSS is automatically
# reduced from 1460 bytes to 1448 bytes of total payload. Note: if you are using
# PF with an outgoing scrub rule then PF will re-package the data using an MTU
# of 1460 by default, thus overriding this mssdflt setting, and PF scrub might
# slow down the network.
# http://www.wand.net.nz/sites/default/files/mss_ict11.pdf
net.inet.tcp.mssdflt=1460  # (default 536)

# minimum maximum segment size (minmss) specifies the smallest payload of data
# in a single TCP segment our system will agree to send when negotiating with
# the client. By default, FreeBSD limits the maximum segment size to no lower
# than 216 bytes. RFC 791 defines the minimum IP packet size as 68 bytes, and
# RFC 1122 specifies a default MSS of 536 bytes (used when the peer sends no
# MSS option), which is the same value Windows Vista uses. The attack vector: a
# malicious client sets the negotiated MSS to a small value, which may cause a
# packet flood DoS attack from our server. The attack scales with the available
# bandwidth and quickly saturates the CPU and network interface with packet
# generation and transmission. By default, if the client asks for a one(1)
# megabyte file with an MSS of 216 we have to send back 4,630 packets. If the
# minimum MSS is set to 1300 we send back only 770 packets, about six times
# fewer. For standard Internet connections we suggest a minimum MSS of 1300
# bytes. 1300 will even work on networks making a VOIP (RTP) call using a TCP
# connection with TCP options over IPsec through a GRE tunnel on a mobile
# cellular network with the DF (don't fragment) bit set.
net.inet.tcp.minmss=1300   # (default 216)

# H-TCP congestion control: The Hamilton TCP (H-TCP) algorithm is a
# packet-loss-based congestion control that is more aggressive in pushing up to
# the maximum bandwidth (total BDP) and favors hosts with lower TTL / VARTTL
# than the default "newreno". Understand that "newreno" works well in most
# conditions and enabling H-TCP may gain you only a few percentage points of
# throughput.
# http://www.sigcomm.org/sites/default/files/ccr/papers/2008/July/1384609-1384613.pdf
# make sure to also add 'cc_htcp_load="YES"' to /boot/loader.conf then check
# available congestion control options with "sysctl net.inet.tcp.cc.available"
net.inet.tcp.cc.algorithm=htcp  # (default newreno)
# H-TCP congestion control: adaptive backoff will increase bandwidth
# utilization by adjusting the additive-increase/multiplicative-decrease (AIMD)
# backoff parameter according to the amount of buffers available on the path.
# adaptive backoff ensures no queue along the path will remain completely empty
# after a packet loss event which increases buffer efficiency.
net.inet.tcp.cc.htcp.adaptive_backoff=1 # (default 0 ; disabled)

# H-TCP congestion control: RTT scaling will increase the fairness between
# competing TCP flows traversing different RTT paths through a common
# bottleneck. rtt_scaling increases the Congestion Window Size (CWND)
# independent of path round-trip time (RTT) leading to lower latency for
# interactive sessions when the connection is saturated by bulk data
# transfers. Default is 0 (disabled)
net.inet.tcp.cc.htcp.rtt_scaling=1 # (default 0 ; disabled)

# Reduce the number of SYN/ACKs the server will re-transmit to an IP address
# that did not respond to the first SYN/ACK. On a client's initial connection
# our server will always send a SYN/ACK in response to the client's initial
# SYN. Limiting retransmitted SYN/ACKs reduces local syncache size and a "SYN
# flood" DoS attack's collateral damage by not sending SYN/ACKs back to spoofed
# IPs multiple times. If we do continue to send SYN/ACKs to spoofed IPs they
# may send RSTs back to us and an "amplification" attack would begin against
# our host. If you do not wish to send retransmits at all then set to zero(0),
# especially if you are under a SYN attack. If our first SYN/ACK gets dropped
# the client will re-send another SYN if they still want to connect. Also set
# "net.inet.tcp.msl" to two(2) times the average round trip time of a client,
# but no lower than 2000ms (2s). Test with "netstat -s -p tcp" and look under
# syncache entries.
# http://people.freebsd.org/~jlemon/papers/syncache.pdf
# http://www.ouah.org/spank.txt
net.inet.tcp.syncache.rexmtlimit=0  # (default 3)
# Spoofed packet attacks may be used to overload the kernel route cache. A
# spoofed packet attack uses random source IPs to cause the kernel to generate
# a temporary cached route in the route table. The route cache is an extra
# caching layer mapping interfaces to routes to IPs, and it saves a lookup in
# the Forward Information Base (FIB), a routing table within the network stack.
# The IPv4 routing cache was intended to eliminate a FIB lookup and increase
# performance. While a good idea in principle, unfortunately it provided a very
# small performance boost in less than 10% of connections and opens up the
# possibility of a DoS vector. Setting rtexpire and rtminexpire to ten(10)
# seconds should be sufficient to protect the route table from attack.
# http://www.es.freebsd.org/doc/handbook/securing-freebsd.html
net.inet.ip.rtexpire=10      # (default 3600)
#net.inet.ip.rtminexpire=10  # (default 10  )
#net.inet.ip.rtmaxcache=128  # (default 128 )

# Syncookies have a certain number of advantages and disadvantages. Syncookies
# are useful if you are being DoS attacked as this method helps filter the
# proper clients from the attack machines. But, since the TCP options from the
# initial SYN are not saved in syncookies, the tcp options are not applied to
# the connection, precluding use of features like window scale, timestamps, or
# exact MSS sizing. As the returning ACK establishes the connection, it may be
# possible for an attacker to ACK flood a machine in an attempt to create a
# connection. Another benefit to overflowing to the point of getting a valid
# SYN cookie is the attacker can include data payload. Now that the attacker
# can send data to a FreeBSD network daemon, even using a spoofed source IP
# address, they can have FreeBSD do processing on the data which is not
# something the attacker could do without having SYN cookies. Even though
# syncookies are helpful during a DoS, we are going to disable them at this
# time.
net.inet.tcp.syncookies=0  # (default 1)

# TCP segmentation offload (TSO), also called large segment offload (LSO),
# should be disabled on NAT firewalls and routers. TSO/LSO works by queuing up
# large buffers and letting the network interface card (NIC) split them into
# separate packets. The problem is the NIC can build a packet that is the wrong
# size and would be dropped by a switch or the receiving machine, like for NFS
# fragmented traffic. If the packet is dropped the overall sending bandwidth is
# reduced significantly. You can also disable TSO in /etc/rc.conf using the
# "-tso" directive after the network card configuration; for example,
# ifconfig_igb0="inet 10.10.10.1 netmask 255.255.255.0 -tso". Verify TSO is off
# on the hardware by making sure TSO4 and TSO6 are not seen in the "options="
# section using ifconfig.
# http://www.peerwisdom.org/2013/04/03/large-send-offload-and-network-performance/
net.inet.tcp.tso=0   # (default 1)

# General Security and DoS mitigation
#net.bpf.optimize_writers=0           # bpf are write-only unless program explicitly specifies the read filter (default 0)
#net.bpf.zerocopy_enable=0            # zero-copy BPF buffers, breaks dhcpd ! (default 0)
net.inet.ip.check_interface=1         # verify packet arrives on correct interface (default 0)
#net.inet.ip.portrange.randomized=1   # randomize outgoing upper ports (default 1)
net.inet.ip.process_options=0         # ignore IP options in the incoming packets (default 1)
net.inet.ip.random_id=1               # assign a random IP_ID to each packet leaving the system (default 0)
net.inet.ip.redirect=0                # do not send IP redirects (default 1)
#net.inet.ip.accept_sourceroute=0     # drop source routed packets since they can not be trusted (default 0)
#net.inet.ip.sourceroute=0            # if source routed packets are accepted the route data is ignored (default 0)
#net.inet.ip.stealth=1                # do not reduce the TTL by one(1) when a packet goes through the firewall (default 0)
#net.inet.icmp.bmcastecho=0           # do not respond to ICMP packets sent to IP broadcast addresses (default 0)
#net.inet.icmp.maskfake=0             # do not fake reply to ICMP Address Mask Request packets (default 0)
#net.inet.icmp.maskrepl=0             # replies are not sent for ICMP address mask requests (default 0)
#net.inet.icmp.log_redirect=0         # do not log redirected ICMP packet attempts (default 0)
net.inet.icmp.drop_redirect=1         # no redirected ICMP packets (default 0)
#net.inet.icmp.icmplim=200            # number of ICMP/TCP RST packets/sec, increase for bittorrent or many clients. (default 200)
#net.inet.icmp.icmplim_output=1       # show "Limiting open port RST response" messages (default 1)
#net.inet.tcp.abc_l_var=2             # increment the slow-start Congestion Window (cwnd) after two(2) segments (default 2)
net.inet.tcp.always_keepalive=0       # disable tcp keep alive detection for dead peers, keepalive can be spoofed (default 1)
net.inet.tcp.drop_synfin=1            # SYN/FIN packets get dropped on initial connection (default 0)
net.inet.tcp.ecn.enable=1             # explicit congestion notification (ecn) warning: some ISP routers may abuse ECN (default 0)
net.inet.tcp.fast_finwait2_recycle=1  # recycle FIN/WAIT states quickly (helps against DoS, but may cause false RST) (default 0)
net.inet.tcp.icmp_may_rst=0           # icmp may not send RST to avoid spoofed icmp/udp floods (default 1)
#net.inet.tcp.maxtcptw=50000          # max number of tcp time_wait states for closing connections (default ~27767)
net.inet.tcp.msl=5000                 # Maximum Segment Lifetime is the time a TCP segment can exist on the network and is
                                      # used to determine the TIME_WAIT interval, 2*MSL (default 30000 ms, i.e. a 60-second TIME_WAIT)
net.inet.tcp.path_mtu_discovery=0     # disable MTU discovery since many hosts drop ICMP type 3 packets (default 1)
#net.inet.tcp.rfc3042=1               # on packet loss trigger the fast retransmit algorithm instead of tcp timeout (default 1)
net.inet.udp.blackhole=1              # drop udp packets destined for closed sockets (default 0)
net.inet.tcp.blackhole=2              # drop tcp packets destined for closed ports (default 0)
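Several comments in this sysctl.conf rest on simple arithmetic that is easy to recheck. The sketch below recomputes the MSS figures, the packet counts behind net.inet.tcp.minmss (rounding up, since a partial last segment still needs a packet), the bandwidth-delay product that socket buffer maxima must cover, and the TIME_WAIT interval implied by net.inet.tcp.msl. The 10 Gbit/s rate and the two RTT values are illustrative assumptions. By this arithmetic a 32 MiB buffer covers RTTs up to roughly 27 ms at 10 Gbit/s, so the "16MB for RTT below 100ms" rule of thumb in the comment looks optimistic for long paths, while it is far more than a sub-millisecond LAN hop needs.

```python
import math

def tcp_mss(mtu: int, timestamps: bool = True) -> int:
    """Payload bytes per segment: MTU minus 20-byte IPv4 and 20-byte TCP headers,
    minus 12 bytes when the timestamp option (10 bytes + 2 NOP padding) is used."""
    return mtu - 20 - 20 - (12 if timestamps else 0)

def segments_needed(nbytes: int, mss: int) -> int:
    """A partial final segment still costs a whole packet, so round up."""
    return math.ceil(nbytes / mss)

def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return rate_bps * rtt_s / 8

def max_rtt_for_buffer(buf_bytes: int, rate_bps: float) -> float:
    """Largest RTT (seconds) a socket buffer of buf_bytes can sustain at rate_bps."""
    return buf_bytes * 8 / rate_bps

def time_wait_seconds(msl_ms: int) -> float:
    """TIME_WAIT lasts 2*MSL; net.inet.tcp.msl is given in milliseconds."""
    return 2 * msl_ms / 1000

if __name__ == "__main__":
    print(tcp_mss(1500, timestamps=False), tcp_mss(1500))  # 1460 1448
    print(segments_needed(1_000_000, 216), segments_needed(1_000_000, 1300))  # 4630 770
    for rtt_ms in (1, 100):  # illustrative RTTs: a LAN hop and a long-haul path
        print(f"BDP at 10 Gbit/s, {rtt_ms} ms RTT: {bdp_bytes(10e9, rtt_ms / 1000) / 1e6:.2f} MB")
    print(f"32 MiB buffer covers RTT up to {max_rtt_for_buffer(33554432, 10e9) * 1000:.1f} ms")
    print(time_wait_seconds(30000), time_wait_seconds(5000))  # 60.0 10.0
```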

dmesg

CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (2394.70-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xe020-0xe03f mem 0xfb280000-0xfb2fffff,0xfb304000-0xfb307fff irq 40 at device 0.0 on pci2
ix0: Using MSIX interrupts with 9 vectors
ix0: Advertised speed can only be set on copper or multispeed fiber media types.
ix0: Ethernet address: 00:1b:21:60:ff:c8
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xe000-0xe01f mem 0xfb180000-0xfb1fffff,0xfb300000-0xfb303fff irq 44 at device 0.1 on pci2
ix1: Using MSIX interrupts with 9 vectors
ix1: Advertised speed can only be set on copper or multispeed fiber media types.
ix1: Ethernet address: 00:1b:21:60:ff:c9
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
real memory  = 34358689792 (32767 MB)

Memory: LRDIMM DDR4
/etc/rc.conf

ifconfig_ix0="up -rxcsum -tso -txcsum -lro -vlanhwtso"
ifconfig_ix1="up -rxcsum -tso -txcsum -lro -vlanhwtso"
##########################################################################################################################
# POWER DAEMONS AND TUNING
##########################################################################################################################
powerd_enable="YES"
powerd_flags="-n hadp"
performance_cx_lowest="C2"
economy_cx_lowest="C2"
performance_cpu_freq="HIGH"

**FreeBSP** » 2016-04-01 1:38:51

If it's not a secret, what will the 10 Gbit/s be used for?

**pautina** » 2016-04-01 8:14:06

FreeBSP wrote: If it's not a secret, what will the 10 Gbit/s be used for?

A BGP router.

produmnet

Did you manage to push more than 5 Gb/s? What settings did you end up applying?

**pautina** » 2017-01-24 21:53:24

pautina писал(а):

Neus писал(а):iperf или iperf3?

iperf3

Отправлено спустя 9 минут 9 секунд:

icb писал(а):Какие конкретно параметры были у сервера и клиента?
Еще был какой-либо тюнинг?

/boot/loader.conf

Код: Выделить всё

hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1

dev.igb.0.rx_processing_limit="4096"  # (default 100)
dev.igb.1.rx_processing_limit="4096"  # (default 100)

# FreeBSD Tuning https://wiki.freebsd.org/10gFreeBSD/Intel10G
hw.ix.tx_process_limit=512
hw.ix.rx_process_limit=512
hw.ix.rxd=4096
hw.ix.txd=4096

# H-TCP Congestion Control for a more aggressive increase in speed on higher
# latency, high bandwidth networks with some packet loss.
cc_htcp_load="YES"

net.link.ifqmaxlen="8192"  # (default 50)

# qlimit for igmp, arp, ether and ip6 queues only (netstat -Q) (default 256)
net.isr.defaultqlimit="4096" # (default 256)

# maximum number of interrupts per second generated by single igb(4) (default
# 8000). FreeBSD 10 supports the new drivers which reduces interrupts
# significantly.
hw.igb.max_interrupt_rate="32000" # (default 8000)

# Intel igb(4): The maximum number of packets to process at Recieve End Of
# Frame (RxEOF). A frame is a data packet on Layer 2 of the OSI mode and "the
# unit of transmission in a link layer protocol consisting of a link-layer
# header followed by a packet." "-1" means unlimited. The default of "100" can
# processes around 500K pps, so only set to -1 is you are accepting 500K
# packets per second or more. Test with "netstat -ihw 1" and look at packets
# recieved per second.
# http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_intel_82580
hw.igb.rx_process_limit="-1"  # (default 100)
# Intel igb(4): Intel PRO 1000 network chipsets support a maximum of 4096 Rx
# and 4096 Tx descriptors. Two cases when you could change the amount of
# descriptors are: 1) Low RAM and 2) CPU or bus saturation. If the system RAM
# is too low you can drop the amount of descriptors to 128, but the system may
# drop packets if it can not proceeses the packets fast enough. If you have a
# large number of packets incoming and they are being processed too slowly then
# you can increase to the descriptors up to 4096. Increasing descriptors is
# only a hack because the system is too slow to processes the packets in a
# timely manner. You should look into getting a faster CPU with a wider bus or
# identifying why the recieving application is so slow. Use "netstat -ihw 1"
# and look for idrops. Note that each received packet requires one Receive
# Descriptor, and each descriptor uses 2 KB of memory.
# https://fasterdata.es.net/host-tuning/nic-tuning/
hw.igb.rxd="4096"  # (default 1024)
hw.igb.txd="4096"  # (default 1024)

# increase the number of network mbufs the system is willing to allocate.  Each
# cluster represents approximately 2K of memory, so a value of 524288
# represents 1GB of kernel memory reserved for network buffers. (default
# 492680)
kern.ipc.nmbclusters="5242880"
kern.ipc.nmbjumbop="2621440"

# maximum number of interrupts per second on any interrupt level (vmstat -i for
# total rate). If you still see Interrupt Storm detected messages, increase the
# limit to a higher number and look for the culprit.  For 10gig NIC's set to
# 9000 and use large MTU. (default 1000)
hw.intr_storm_threshold="9000"

# Size of the syncache hash table, must be a power of 2 (default 512)
net.inet.tcp.syncache.hashsize="1024"
# Limit the number of entries permitted in each bucket of the hash table. (default 30)
net.inet.tcp.syncache.bucketlimit="100"

/etc/sysctl.conf

Код: Выделить всё

# $FreeBSD: stable/10/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0

#dev.ix.0.rx_processing_limit=4096
#dev.ix.0.tx_processing_limit=4096

#dev.ix.1.rx_processing_limit=4096
#dev.ix.1.tx_processing_limit=4096

# FreeBSD Tuning https://wiki.freebsd.org/10gFreeBSD/Intel10G
net.inet.tcp.tso=0
net.inet.ip.fastforwarding=1

# The processing limit, harvests, and TSO/LRO settings are ones that could ease CPU usage, though we are very light on inbound so TSO is likely more of a possibility. The rest are performance related.
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0

# https://calomel.org/freebsd_network_tuning.html
kern.ipc.maxsockbuf=33554432

# set auto tuning maximums to the same value as the kern.ipc.maxsockbuf above.
# Use at least 16MB for 10GE hosts with RTT of less then 100ms. For 10GE hosts
# with RTT of greater then 100ms set buf_max to 150MB.
net.inet.tcp.sendbuf_max=33554432  # (default 2097152)
net.inet.tcp.recvbuf_max=33554432  # (default 2097152)
# maximum segment size (MSS) specifies the largest payload of data in a single
# TCP segment not including TCP headers or options. mssdflt is also called MSS
# clamping. With an interface MTU of 1500 bytes we suggest an
# net.inet.tcp.mssdflt of 1460 bytes. 1500 MTU minus 20 byte IP header minus 20
# byte TCP header is 1460. With net.inet.tcp.rfc1323 enabled, tcp timestamps
# are added to the packets and the mss is automatically reduced from 1460 bytes
# to 1448 bytes total payload. Note: if you are using PF with an outgoing scrub
# rule then PF will re-package the data using an MTU of 1460 by default, thus
# overriding this mssdflt setting and Pf scrub might slow down the network.
# http://www.wand.net.nz/sites/default/files/mss_ict11.pdf
net.inet.tcp.mssdflt=1460  # (default 536)

# minimum, maximum segment size (mMSS) specifies the smallest payload of data
# in a single TCP segment our system will agree to send when negotiating with
# the client. By default, FreeBSD limits the maximum segment size to no lower
# then 216 bytes. RFC 791 defines the minimum IP packet size as 68 bytes, but
# in RFC 793 the minimum MSS is specified to be 536 bytes which is the same
# value Windows Vista uses.  The attack vector is when a malicious client sets
# the negotiated MSS to a small value this may cause a packet flood DoS attack
# from our server. The attack scales with the available bandwidth and quickly
# saturates the CPU and network interface with packet generation and
# transmission.  By default, if the client asks for a one(1) megabyte file with
# an MSS of 216 we have to send back 4,630 packets. If the minimum MSS is set
# to 1300 we send back only 769 packets which is six times more efficient. For
# standard Internet connections we suggest a minimum mss of 1300 bytes. 1300
# will even work on networks making a VOIP (RTP) call using a TCP connection with
# TCP options over IPSEC though a GRE tunnel on a mobile cellular network with
# the DF (don't fragment) bit set.
net.inet.tcp.minmss=1300   # (default 216)

# H-TCP congestion control: The Hamilton TCP (HighSpeed-TCP) algorithm is a
# packet loss based congestion control and is more aggressive pushing up to max
# bandwidth (total BDP) and favors hosts with lower TTL / VARTTL then the
# default "newreno". Understand "newreno" works well in most conditions and
# enabling HTCP may only gain a you few percentage points of throughput.
# http://www.sigcomm.org/sites/default/files/ccr/papers/2008/July/1384609-1384613.pdf
# make sure to also add 'cc_htcp_load="YES"' to /boot/loader.conf then check
# available congestion control options with "sysctl net.inet.tcp.cc.available"
net.inet.tcp.cc.algorithm=htcp  # (default newreno)
# H-TCP congestion control: adaptive backoff increases bandwidth
# utilization by adjusting the additive-increase/multiplicative-decrease (AIMD)
# backoff parameter according to the amount of buffering available on the path.
# Adaptive backoff ensures that no queue along the path remains completely empty
# after a packet loss event, which increases buffer efficiency.
net.inet.tcp.cc.htcp.adaptive_backoff=1 # (default 0 ; disabled)

# H-TCP congestion control: RTT scaling increases the fairness between
# competing TCP flows that traverse paths with different RTTs through a common
# bottleneck. rtt_scaling increases the congestion window (CWND)
# independently of the path round-trip time (RTT), leading to lower latency for
# interactive sessions when the connection is saturated by bulk data
# transfers. Default is 0 (disabled).
net.inet.tcp.cc.htcp.rtt_scaling=1 # (default 0 ; disabled)

# Reduce the number of SYN/ACKs the server will retransmit to an IP address
# that did not respond to the first SYN/ACK. On a client's initial connection
# our server will always send a SYN/ACK in response to the client's initial
# SYN. Limiting retransmitted SYN/ACKs reduces the local syncache size and a
# "SYN flood" DoS attack's collateral damage by not sending SYN/ACKs back to
# spoofed IPs multiple times. If we do continue to send SYN/ACKs to spoofed IPs,
# they may send RSTs back to us and an "amplification" attack would begin
# against our host. If you do not wish to send retransmits at all, set this to
# zero(0), especially if you are under a SYN attack. If our first SYN/ACK gets
# dropped, the client will re-send another SYN if it still wants to connect.
# Also set "net.inet.tcp.msl" to two(2) times the average round trip time of a
# client, but no lower than 2000ms (2s). Test with "netstat -s -p tcp" and look
# under syncache entries.
# http://people.freebsd.org/~jlemon/papers/syncache.pdf
# http://www.ouah.org/spank.txt
net.inet.tcp.syncache.rexmtlimit=0  # (default 3)
# Spoofed packet attacks may be used to overload the kernel route cache. A
# spoofed packet attack uses random source IPs to cause the kernel to generate
# a temporary cached route in the route table. The route cache is an extraneous
# caching layer mapping interfaces to routes to IPs and saves a lookup to the
# Forwarding Information Base (FIB), a routing table within the network stack.
# The IPv4 routing cache was intended to eliminate a FIB lookup and increase
# performance. While a good idea in principle, it unfortunately provided a very
# small performance boost in less than 10% of connections and opens up the
# possibility of a DoS vector. Setting rtexpire and rtminexpire to ten(10)
# seconds should be sufficient to protect the route table from attack.
# http://www.es.freebsd.org/doc/handbook/securing-freebsd.html
net.inet.ip.rtexpire=10      # (default 3600)
#net.inet.ip.rtminexpire=10  # (default 10  )
#net.inet.ip.rtmaxcache=128  # (default 128 )

# Syncookies have a certain number of advantages and disadvantages. Syncookies
# are useful if you are being DoS attacked, as this method helps filter the
# proper clients from the attack machines. But, since the TCP options from the
# initial SYN are not saved in syncookies, the TCP options are not applied to
# the connection, precluding use of features like window scale, timestamps, or
# exact MSS sizing. As the returning ACK establishes the connection, it may be
# possible for an attacker to ACK flood a machine in an attempt to create a
# connection. Another advantage for the attacker of overflowing to the point of
# getting a valid SYN cookie is that it can include a data payload. Now that the
# attacker can send data to a FreeBSD network daemon, even using a spoofed
# source IP address, it can have FreeBSD do processing on the data, which is
# not something the attacker could do without SYN cookies. Even though
# syncookies are helpful during a DoS, we are going to disable them at this
# time.
net.inet.tcp.syncookies=0  # (default 1)

# TCP segmentation offload (TSO), also called large segment offload (LSO),
# should be disabled on NAT firewalls and routers. TSO/LSO works by queuing up
# large buffers and letting the network interface card (NIC) split them into
# separate packets. The problem is that the NIC can build a packet that is the
# wrong size and would be dropped by a switch or the receiving machine, as with
# fragmented NFS traffic. If the packet is dropped, the overall sending
# bandwidth is reduced significantly. You can also disable TSO in /etc/rc.conf using the
# "-tso" directive after the network card configuration; for example,
# ifconfig_igb0="inet 10.10.10.1 netmask 255.255.255.0 -tso". Verify TSO is off
# on the hardware by making sure TSO4 and TSO6 are not seen in the "options="
# section using ifconfig.
# http://www.peerwisdom.org/2013/04/03/large-send-offload-and-network-performance/
net.inet.tcp.tso=0   # (default 1)

# General Security and DoS mitigation
#net.bpf.optimize_writers=0           # bpf are write-only unless program explicitly specifies the read filter (default 0)
#net.bpf.zerocopy_enable=0            # zero-copy BPF buffers, breaks dhcpd! (default 0)
net.inet.ip.check_interface=1         # verify packet arrives on correct interface (default 0)
#net.inet.ip.portrange.randomized=1   # randomize outgoing upper ports (default 1)
net.inet.ip.process_options=0         # ignore IP options in the incoming packets (default 1)
net.inet.ip.random_id=1               # assign a random IP_ID to each packet leaving the system (default 0)
net.inet.ip.redirect=0                # do not send IP redirects (default 1)
#net.inet.ip.accept_sourceroute=0     # drop source routed packets since they can not be trusted (default 0)
#net.inet.ip.sourceroute=0            # if source routed packets are accepted the route data is ignored (default 0)
#net.inet.ip.stealth=1                # do not reduce the TTL by one(1) when a packet goes through the firewall (default 0)
#net.inet.icmp.bmcastecho=0           # do not respond to ICMP packets sent to IP broadcast addresses (default 0)
#net.inet.icmp.maskfake=0             # do not fake reply to ICMP Address Mask Request packets (default 0)
#net.inet.icmp.maskrepl=0             # replies are not sent for ICMP address mask requests (default 0)
#net.inet.icmp.log_redirect=0         # do not log redirected ICMP packet attempts (default 0)
net.inet.icmp.drop_redirect=1         # no redirected ICMP packets (default 0)
#net.inet.icmp.icmplim=200            # number of ICMP/TCP RST packets/sec, increase for bittorrent or many clients. (default 200)
#net.inet.icmp.icmplim_output=1       # show "Limiting open port RST response" messages (default 1)
#net.inet.tcp.abc_l_var=2             # increment the slow-start Congestion Window (cwnd) after two(2) segments (default 2)
net.inet.tcp.always_keepalive=0       # disable TCP keepalive detection of dead peers; keepalives can be spoofed (default 1)
net.inet.tcp.drop_synfin=1            # SYN/FIN packets get dropped on initial connection (default 0)
net.inet.tcp.ecn.enable=1             # explicit congestion notification (ECN); warning: some ISP routers may abuse ECN (default 0)
net.inet.tcp.fast_finwait2_recycle=1  # recycle FIN/WAIT states quickly (helps against DoS, but may cause false RST) (default 0)
net.inet.tcp.icmp_may_rst=0           # ICMP may not send RST, to avoid spoofed ICMP/UDP floods (default 1)
#net.inet.tcp.maxtcptw=50000          # max number of tcp time_wait states for closing connections (default ~27767)
net.inet.tcp.msl=5000                 # Maximum Segment Lifetime is the time a TCP segment can exist on the network and is
                                      # used to determine the TIME_WAIT interval, 2*MSL (default 30000 which is 60 seconds)
net.inet.tcp.path_mtu_discovery=0     # disable MTU discovery since many hosts drop ICMP type 3 packets (default 1)
#net.inet.tcp.rfc3042=1               # on packet loss trigger the fast retransmit algorithm instead of tcp timeout (default 1)
net.inet.udp.blackhole=1              # drop udp packets destined for closed sockets (default 0)
net.inet.tcp.blackhole=2              # drop tcp packets destined for closed ports (default 0)
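
The packet counts in the minmss comment and the TIME_WAIT interval implied by net.inet.tcp.msl can be re-checked with a few lines of arithmetic. This is a minimal sketch, assuming "one megabyte" means 10^6 bytes of payload and counting only full-size data segments (no ACKs, no header overhead); the helper names are made up for illustration.

```python
import math

ONE_MB = 1_000_000  # assumption: "one megabyte" in the comment means 10**6 bytes


def segments_needed(payload_bytes: int, mss: int) -> float:
    """Fractional number of full-size segments needed to carry the payload."""
    return payload_bytes / mss


def time_wait_seconds(msl_ms: int) -> float:
    """TIME_WAIT lasts 2 * MSL; net.inet.tcp.msl is expressed in milliseconds."""
    return 2 * msl_ms / 1000


for mss in (216, 1300):  # default net.inet.tcp.minmss vs. the suggested 1300
    exact = segments_needed(ONE_MB, mss)
    print(f"MSS {mss}: about {round(exact)} segments, at least {math.ceil(exact)} in practice")

ratio = round(segments_needed(ONE_MB, 216)) / round(segments_needed(ONE_MB, 1300))
print(f"MSS 216 sends {ratio:.1f}x as many segments as MSS 1300")

print("TIME_WAIT with default msl=30000:", time_wait_seconds(30000), "s")
print("TIME_WAIT with msl=5000:", time_wait_seconds(5000), "s")
```

The comment's 4,630 and 769 match rounding to the nearest whole packet; a sender really needs the ceiling, i.e. 770 segments at MSS 1300, but the ratio stays at roughly six. With msl=5000 the TIME_WAIT interval drops from 60 s to 10 s.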

dmesg

Code:

CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (2394.70-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xe020-0xe03f mem 0xfb280000-0xfb2fffff,0xfb304000-0xfb307fff irq 40 at device 0.0 on pci2
ix0: Using MSIX interrupts with 9 vectors
ix0: Advertised speed can only be set on copper or multispeed fiber media types.
ix0: Ethernet address: 00:1b:21:60:ff:c8
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xe000-0xe01f mem 0xfb180000-0xfb1fffff,0xfb300000-0xfb303fff irq 44 at device 0.1 on pci2
ix1: Using MSIX interrupts with 9 vectors
ix1: Advertised speed can only be set on copper or multispeed fiber media types.
ix1: Ethernet address: 00:1b:21:60:ff:c9
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
real memory  = 34358689792 (32767 MB)
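
Since the question is why throughput stops at 5 Gb/s, the "PCI Express Bus" line above can be compared with what two 10GbE ports need. This is a rough sketch, assuming PCIe 2.0 8b/10b line coding and ignoring TLP/DLLP protocol overhead, so the real ceiling is somewhat lower than computed; the function name is hypothetical.

```python
def pcie2_gbps_per_direction(gt_per_s: float, lanes: int) -> float:
    """Usable PCIe 1.x/2.x bandwidth per direction after 8b/10b line coding.

    TLP/DLLP protocol overhead is ignored, so real throughput is lower.
    """
    return gt_per_s * (8 / 10) * lanes


bus_gbps = pcie2_gbps_per_direction(5.0, 8)  # "Speed 5.0GT/s Width x8" from dmesg
ports_gbps = 2 * 10.0                         # two 10GbE SFP+ ports at line rate
print(f"PCIe link: ~{bus_gbps:.0f} Gb/s per direction; two ports need {ports_gbps:.0f} Gb/s")

real_memory = 34358689792                     # "real memory" line from dmesg, bytes
print("real memory:", real_memory // 2**20, "MB")  # dmesg reports MB as 2**20 bytes
```

Under these assumptions the x8 link offers about 32 Gb/s per direction, more than the 20 Gb/s both ports need, so the negotiated bus width alone does not appear to explain a 5 Gb/s limit.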

Memory: LRDIMM DDR4
/etc/rc.conf

Code:

ifconfig_ix0="up -rxcsum -tso -txcsum -lro -vlanhwtso"
ifconfig_ix1="up -rxcsum -tso -txcsum -lro -vlanhwtso"
##########################################################################################################################
# POWER DAEMONS AND TUNING
##########################################################################################################################
powerd_enable="YES"
powerd_flags="-n hadp"
performance_cx_lowest="C2"
economy_cx_lowest="C2"
performance_cpu_freq="HIGH"
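
The ifconfig_ix0/ifconfig_ix1 lines disable several offloads, and the TSO comment in the sysctl listing suggests confirming this in the "options=" field of ifconfig. A small parser can do that check. This is a sketch under assumptions: the mapping from rc.conf flag words to capability names follows FreeBSD ifconfig naming as assumed here (verify against ifconfig(8) on the actual system), and the sample output is hypothetical, not captured from the poster's server.

```python
import re

# Assumed mapping from rc.conf "-flag" words to the capability names that
# FreeBSD ifconfig prints inside "options=HEX<...>"; verify with ifconfig(8).
FLAG_TO_CAPS = {
    "rxcsum": {"RXCSUM"},
    "txcsum": {"TXCSUM"},
    "tso": {"TSO4", "TSO6"},
    "lro": {"LRO"},
    "vlanhwtso": {"VLAN_HWTSO"},
}


def parse_options(ifconfig_output: str) -> set[str]:
    """Return the capability names listed in the first options=HEX<...> field."""
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    if match is None:
        return set()
    return {name for name in match.group(1).split(",") if name}


def still_enabled(rc_value: str, ifconfig_output: str) -> set[str]:
    """Capabilities that rc.conf asked to disable but ifconfig still reports."""
    wanted_off: set[str] = set()
    for word in rc_value.split():
        if word.startswith("-") and word[1:] in FLAG_TO_CAPS:
            wanted_off |= FLAG_TO_CAPS[word[1:]]
    return wanted_off & parse_options(ifconfig_output)


rc_value = "up -rxcsum -tso -txcsum -lro -vlanhwtso"  # value from /etc/rc.conf
# Hypothetical sample output, not taken from the poster's server:
sample = (
    "ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=100b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER>\n"
)
print(sorted(still_enabled(rc_value, sample)))  # prints [] when every requested offload is off
```

On the router you would pass the real text of `ifconfig ix0` instead of the sample; any names printed are offloads that are still active despite the rc.conf flags.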

Those are the settings. The cable is Twinax.

forum.lissyara.su