[PFC] PFC configuration: H3C switch settings and host settings (draft)

Author: bandaoyu. Original post: "[PFC] PFC configuration: H3C switch settings and host settings (draft)", bandaoyu's notes, CSDN blog

Quick reference (ready to copy and use)

The settings below configure PFC at Layer 3 (traffic is classified by DSCP):

Switch settings

Configure PFC on switch port HundredGigE1/0/2

1. Set the priority trust mode to DSCP

<H3C>system-view

[H3C]interface HundredGigE1/0/2

[H3C-HundredGigE1/0/2] qos trust dscp

Verify: [H3C-HundredGigE1/0/2] display qos trust interface HundredGigE1/0/2

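2. Enable PFC on Ethernet interface HundredGigE1/0/2 and enable PFC for 802.1p priority 1:

(If you do not know which priority your traffic maps to, you can simply enable PFC on all priorities.)

<H3C>system-view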

[H3C]interface HundredGigE1/0/2

[H3C-HundredGigE1/0/2] priority-flow-control enable

[H3C-HundredGigE1/0/2] priority-flow-control no-drop dot1p 1

Configure several priorities at once:

[H3C-HundredGigE1/0/2] priority-flow-control no-drop dot1p 1,2,3

Verify: display priority-flow-control interface

(If you do not know which priority your traffic maps to, you can simply enable PFC on all priorities, 0-7.)

[H3C-HundredGigE1/0/2] priority-flow-control no-drop dot1p 0,1,2,3,4,5,6,7

Host settings

Mellanox NIC (driver)

Make the NIC do PFC at Layer 3 (trust DSCP):

mlnx_qos -i eth2 --trust=dscp

This prints:

DCBX mode: OS controlled
Priority trust state: dscp
dscp2prio mapping:
prio:0 dscp:07,06,05,04,03,02,01,00,
prio:1 dscp:15,14,13,12,11,10,09,08,
prio:2 dscp:23,22,21,20,19,18,17,16,
prio:3 dscp:31,30,29,28,27,26,25,24,
prio:4 dscp:39,38,37,36,35,34,33,32,
prio:5 dscp:47,46,45,44,43,42,41,40,
prio:6 dscp:55,54,53,52,51,50,49,48,
prio:7 dscp:63,62,61,60,59,58,57,56,
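The default table above follows a simple rule: each priority covers eight consecutive DSCP values, so the priority is the DSCP value shifted right by 3. Below is a minimal bash sketch of that rule, assuming the default mapping shown above is still in effect on the NIC. The function names `dscp_to_prio` and `tos_for_prio` are hypothetical helpers, and the ToS calculation assumes DSCP occupies the upper 6 bits of the ToS byte with the ECN bits left at 0.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that model the default dscp2prio table printed by
# `mlnx_qos -i eth2 --trust=dscp`: 8 consecutive DSCP values per priority.

# dscp_to_prio DSCP: print the priority (0-7) for a DSCP value (0-63).
dscp_to_prio() {
  local dscp=$1
  if (( dscp < 0 || dscp > 63 )); then
    echo "DSCP out of range: $dscp" >&2
    return 1
  fi
  echo $(( dscp >> 3 ))
}

# tos_for_prio PRIO: print a ToS byte whose DSCP lands on PRIO.
# It picks the lowest DSCP of the range (PRIO * 8). DSCP is the upper
# 6 bits of the ToS byte, so ToS = DSCP << 2 (ECN bits left at 0).
tos_for_prio() {
  local prio=$1
  if (( prio < 0 || prio > 7 )); then
    echo "priority out of range: $prio" >&2
    return 1
  fi
  echo $(( (prio * 8) << 2 ))
}
```

For example, `tos_for_prio 3` prints 96 (DSCP 24). Passing that value to an application's ToS option should put its traffic on priority 3, the priority enabled in the `-f 0,0,0,1,0,0,0,0` example.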

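Enable PFC on the priority that your traffic's DSCP value maps to. For example, to enable PFC on priority 3 only:

mlnx_qos -i eth2 -f 0,0,0,1,0,0,0,0     # positions = priorities 0-1-2-3-4-5-6-7

If you are unsure of the DSCP value and the priority it maps to, you can simply enable PFC on all priorities:

mlnx_qos -i eth2 -f 1,1,1,1,1,1,1,1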
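The `-f` argument is an 8-entry list in which position N turns PFC on (1) or off (0) for priority N. A small bash sketch that builds this list from the priorities you want; `pfc_vector` is a hypothetical helper that only prints the string and does not call mlnx_qos itself.

```bash
#!/usr/bin/env bash
# pfc_vector PRIO...: print the comma-separated 8-entry vector used by
# `mlnx_qos -f` (position N = priority N, 1 = PFC on, 0 = PFC off).
# Hypothetical helper: it only prints, it does not run mlnx_qos.
pfc_vector() {
  local -a v=(0 0 0 0 0 0 0 0)
  local p
  for p in "$@"; do
    if (( p < 0 || p > 7 )); then
      echo "priority out of range: $p" >&2
      return 1
    fi
    v[$p]=1
  done
  # Join the array elements with commas.
  local IFS=,
  echo "${v[*]}"
}
```

Usage: `mlnx_qos -i eth2 -f "$(pfc_vector 3)"` is equivalent to the priority-3 command above, and `pfc_vector 0 1 2 3 4 5 6 7` gives the all-priorities vector.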

Intel NIC (driver)

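Configuration details

H3C switch settings

(http://www.h3c.com/cn/d_202104/1397802_30005_0.htm)

Switch settings for PFC at Layer 3

For example, suppose our machines are connected to switch ports E1/0/2, E1/0/4 and E1/0/6.

Set the trust type of all three ports to dscp as follows:

Log in to the switch.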

1. Set the priority trust mode to DSCP

<H3C>system-view

[H3C]interface HundredGigE1/0/2

[H3C-HundredGigE1/0/2] qos trust dscp

Verify: [H3C-HundredGigE1/0/2] display qos trust interface HundredGigE1/0/2

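* The switch uses the DSCP value carried in the packet for priority mapping only when the trust mode is DSCP.
With DSCP trust, priority mapping of incoming packets involves three mapping tables:
dscp-dot1p    # maps the DSCP of an incoming packet to an 802.1p priority, which determines its LP (local priority) queue
dscp-dp       # maps the DSCP of an incoming packet to a DP (drop priority)
dscp-dscp     # rewrites the DSCP of an incoming packet to a new DSCP value for forwarding
(Priorities fall into two categories: priorities carried in packets and device scheduling priorities.
Device scheduling priorities are the priorities used while a packet is forwarded inside the device, and they are only meaningful on that device.
Device scheduling priorities include:
• Local priority (LP): a locally significant priority that the device assigns to a packet. Each local priority corresponds to a queue; the higher a packet's local priority, the higher the priority of the queue it enters, and the sooner it is scheduled.
• Drop priority (DP): a parameter consulted when dropping packets; packets with a higher drop priority are dropped first.)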

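Enable PFC and configure the 802.1p priorities that use PFC

The switch applies PFC per 802.1p priority, so we need to work out the mapping:

We run PFC on host priority 1 (in the default mlnx_qos table, priority 1 covers DSCP 8-15): prio 1 ====> DSCP 8 ====> dot1p 1.

So we only need to enable PFC for dot1p 1 on the switch port.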
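The calculation above can be sketched in bash for any DSCP value. This is a minimal illustration under loudly stated assumptions: the host uses the default mlnx_qos dscp2prio table, the switch's dscp-dot1p map groups DSCP values in the same blocks of 8 (check this on your own switch), and the NIC is eth2 as in the earlier examples. `pfc_plan_for_dscp` is a hypothetical helper that only prints the settings.

```bash
#!/usr/bin/env bash
# pfc_plan_for_dscp DSCP: print the host and switch PFC settings for traffic
# marked with DSCP. ASSUMPTIONS: the host uses the default mlnx_qos
# dscp2prio table (prio = DSCP >> 3), the switch's dscp-dot1p map groups
# DSCP values the same way (dot1p = DSCP >> 3), and the NIC is eth2.
# Hypothetical helper: it only prints commands, it runs nothing.
pfc_plan_for_dscp() {
  local dscp=$1
  if (( dscp < 0 || dscp > 63 )); then
    echo "DSCP out of range: $dscp" >&2
    return 1
  fi
  local prio=$(( dscp >> 3 ))
  local -a vec=(0 0 0 0 0 0 0 0)
  vec[$prio]=1
  # Join the PFC vector with commas for mlnx_qos -f.
  local IFS=,
  echo "host: mlnx_qos -i eth2 -f ${vec[*]}"
  echo "switch: priority-flow-control no-drop dot1p $prio"
}
```

With DSCP 8 this reproduces the chain above: PFC on host priority 1 and switch dot1p 1.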

Example:

# Enable PFC on Ethernet interface HundredGigE1/0/2 and enable PFC for 802.1p priority 1:

<H3C>system-view

[H3C]interface HundredGigE1/0/2

[H3C-HundredGigE1/0/2] priority-flow-control enable

[H3C-HundredGigE1/0/2] priority-flow-control no-drop dot1p 1

Configure several priorities at once:

[H3C-HundredGigE1/0/2] priority-flow-control no-drop dot1p 1,2,3

Verify: display priority-flow-control interface
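When several ports need the same settings (such as the three ports in the example above), the commands can be generated instead of typed per port. A minimal bash sketch, assuming the ports are HundredGigE interfaces (the text abbreviates them as E1/0/x) and that every port uses the same no-drop 802.1p list. `h3c_pfc_config` is a hypothetical helper that only prints the H3C CLI lines for pasting into the switch console.

```bash
#!/usr/bin/env bash
# h3c_pfc_config DOT1P_LIST PORT...: print an H3C CLI snippet that sets DSCP
# trust and enables PFC (no-drop) for DOT1P_LIST on each PORT.
# Hypothetical helper: it only prints commands, it does not log in anywhere.
h3c_pfc_config() {
  local dot1p=$1
  shift
  local port
  echo "system-view"
  for port in "$@"; do
    echo "interface $port"
    echo " qos trust dscp"
    echo " priority-flow-control enable"
    echo " priority-flow-control no-drop dot1p $dot1p"
    echo " quit"
  done
}
```

Usage: `h3c_pfc_config 1 HundredGigE1/0/2 HundredGigE1/0/4 HundredGigE1/0/6`, then verify on the switch with `display priority-flow-control interface`.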

Disable PFC on the switch

<H3C>system-view

[H3C]interface HundredGigE1/0/4

[H3C-HundredGigE1/0/4] undo priority-flow-control

Host settings

Mellanox RDMA NIC

https://blog.csdn.net/bandaoyu/article/details/117715099

Intel RDMA NIC

---------------------

Source document: Intel® Ethernet 800 Series Linux Flow Control

X722:
The X722 adapter supports only link-level flow control (LFC).

E810:
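The E810 controller supports both link-level flow control (LFC) and priority
flow control (PFC). Enabling flow control is strongly recommended when using the E810 in RoCEv2 mode.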

--- Link Level Flow Control (LFC) (E810 and X722)

To enable link-level flow control on E810 or X722, use "ethtool -A".
For example, to enable LFC in both directions (rx and tx):
    ethtool -A DEVNAME rx on tx on

Confirm the setting with "ethtool -a":
    ethtool -a DEVNAME

Sample output:
    Pause parameters for interface:
    Autonegotiate: on
    RX: on
    TX: on
    RX negotiated:  on
    TX negotiated:  on
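To check the result from a script rather than by eye, the "RX:" and "TX:" lines of the sample output above can be parsed. A minimal bash/awk sketch, assuming the `ethtool -a` output format shown above; `lfc_is_on` is a hypothetical helper that ignores the "negotiated" lines.

```bash
#!/usr/bin/env bash
# lfc_is_on: read `ethtool -a DEVNAME` output on stdin and succeed (exit 0)
# only if both the "RX:" and "TX:" lines say "on".
# Hypothetical helper; the "RX negotiated"/"TX negotiated" lines are ignored
# because their first field is "RX"/"TX", not "RX:"/"TX:".
lfc_is_on() {
  awk '
    $1 == "RX:" { rx = $2 }
    $1 == "TX:" { tx = $2 }
    END { exit !(rx == "on" && tx == "on") }
  '
}
```

Usage: `ethtool -a DEVNAME | lfc_is_on && echo "LFC enabled in both directions"`.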

Full enablement of LFC requires the switch or link partner be configured for
rx and tx pause frames. Refer to switch vendor documentation for more details.

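--- Priority Flow Control (PFC) (E810 only)

PFC on the E810 supports two modes: willing and non-willing.

The E810 also supports two Data Center Bridging (DCB) modes: software and firmware.

For more background on software and firmware modes, refer to the E810 ice driver README.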

- For PFC willing mode, firmware DCB is recommended.
- For PFC non-willing mode, software DCB must be used.

Note: E810 supports a maximum of 4 traffic classes (TCs), one of which may
      have PFC enabled.

*** PFC willing mode

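In willing mode, the E810 is "willing" to accept DCB settings from its link partner. DCB is configured on the link partner (typically a switch), and
the E810 automatically discovers the DCB settings and applies them to its own port. This simplifies DCB configuration in larger clusters and removes the need to configure DCB independently on both sides of the link.

To enable PFC in willing mode on the E810, use ethtool to enable firmware DCB.
Enabling firmware DCB automatically places the NIC in willing mode:
ethtool --set-priv-flags DEVNAME fw-lldp-agent on

To confirm the setting, use the following command:
ethtool --show-priv-flags DEVNAME

Expected output: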
  fw-lldp-agent     :on
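Because willing mode needs this flag on and non-willing mode needs it off, a script may want to read its current value. A minimal bash/awk sketch, assuming the `name : value` line format shown in the expected output (with or without spaces around the colon); `fw_lldp_agent_state` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# fw_lldp_agent_state: read `ethtool --show-priv-flags DEVNAME` output on
# stdin and print the value of the fw-lldp-agent flag ("on" or "off").
# Exits non-zero if the flag is not listed. Hypothetical helper.
fw_lldp_agent_state() {
  awk -F: '
    # Strip spaces/tabs so "fw-lldp-agent     : on" and ":on" both match.
    { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
    $1 == "fw-lldp-agent" { print $2; found = 1 }
    END { exit !found }
  '
}
```

Usage: `ethtool --show-priv-flags DEVNAME | fw_lldp_agent_state` should print `on` for willing mode and `off` before setting up non-willing mode.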

Note: When firmware DCB is enabled, the E810 NIC may experience an adapter-wide
      reset when a DCBX willing configuration change propagated from the link
      partner removes an RDMA-enabled traffic class (TC). This typically occurs
      when removing a TC associated with priority 0 (the default priority for
      RDMA). The reset results in a temporary loss of connectivity as the
      adapter re-initializes.

Switch DCB and PFC configuration syntax varies by vendor. Consult your switch
manual for details. Sample Arista switch configuration commands:

  - Example: enable PFC for priority 0 on switch port 21
    * Enter configuration mode for switch port 21:
         switch#configure
         switch(config)#interface ethernet 21/1
    * Turn on PFC:
         switch(config-if-Et21/1)#priority-flow-control mode on
    * Set priority 0 to "no-drop" (i.e., enable PFC on it):
         switch(config-if-Et21/1)#priority-flow-control priority 0 no-drop
    * Verify the switch port PFC configuration:
         switch(config-if-Et21/1)#show priority-flow-control
  - Example: enable DCBX on switch port 21
    * Enable DCBX in IEEE mode:
         switch(config-if-Et21/1)#dcbx mode ieee
    * Show DCBX settings (including neighbor port settings):
         switch(config-if-Et21/1)#show dcbx

*** PFC non-willing mode

In non-willing mode, DCB settings must be configured on both E810 and its link
partner. Non-willing mode is software-based. OpenLLDP (lldpad and lldptool) is
recommended.

To enable non-willing PFC on E810:
  1. Disable firmware DCB. Firmware DCB is always willing. If enabled, it
     will override any software settings.
         ethtool --set-priv-flags DEVNAME fw-lldp-agent off
  2. Install OpenLLDP
         yum install lldpad
  3. Start the Open LLDP daemon:
        lldpad -d
  4. Verify functionality by showing current DCB settings on the NIC:
        lldptool -t -i DEVNAME
  5. Configure your desired DCB settings, including traffic classes,
     bandwidth allocations, and PFC.
     The following example enables PFC on priority 0, maps all priorities to
     traffic class (TC) 0, and allocates all bandwidth to TC0.
     This simple configuration is suitable for enabling PFC for all traffic,
     which may be useful for back-to-back benchmarking. Datacenters will
     typically use a more complex configuration to ensure quality-of-service
     (QoS).
     a. Enable PFC for priority 0:
           lldptool -T -i DEVNAME -V PFC willing=no enabled=0
     b. Map all priorities to TC0 and allocate all bandwidth to TC0:
           lldptool -T -i DEVNAME -V ETS-CFG willing=no \
           up2tc=0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0 \
           tsa=0:ets,1:strict,2:strict,3:strict,4:strict,5:strict,6:strict,7:strict \
           tcbw=100,0,0,0,0,0,0,0
  6. Verify the output of "lldptool -t -i DEVNAME":
        Chassis ID TLV
            MAC: 68:05:ca:a3:89:78
        Port ID TLV
            MAC: 68:05:ca:a3:89:78
        Time to Live TLV
            120
        IEEE 8021QAZ ETS Configuration TLV
            Willing: no
            CBS: not supported
            MAX_TCS: 8
            PRIO_MAP: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
            TC Bandwidth: 100% 0% 0% 0% 0% 0% 0% 0%
            TSA_MAP: 0:ets 1:strict 2:strict 3:strict 4:strict 5:strict 6:strict 7:strict
        IEEE 8021QAZ PFC TLV
            Willing: no
            MACsec Bypass Capable: no
            PFC capable traffic classes: 8
            PFC enabled: 0
        End of LLDPDU TLV
  7. Configure the same settings on the link partner.
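For configurations more complex than "everything in TC0", the `up2tc` argument from step 5 can be built from one TC number per priority, and the `tcbw` percentages can be sanity-checked first. A minimal bash sketch; `build_up2tc` and `check_tcbw` are hypothetical helpers, and the check assumes, as in the example, that the eight bandwidth percentages must add up to 100.

```bash
#!/usr/bin/env bash
# build_up2tc TC0 ... TC7: print an "up2tc=" argument mapping priority N to
# the N-th TC given (8 values, one per priority 0-7). Hypothetical helper.
build_up2tc() {
  if (( $# != 8 )); then
    echo "need 8 TC values (priorities 0-7)" >&2
    return 1
  fi
  local out="" prio=0 tc
  for tc in "$@"; do
    # Add a comma before every entry except the first.
    out+="${out:+,}$prio:$tc"
    prio=$(( prio + 1 ))
  done
  echo "up2tc=$out"
}

# check_tcbw BW0 ... BW7: succeed only if the 8 percentages sum to 100
# (ASSUMPTION: required, as in the tcbw=100,0,0,0,0,0,0,0 example).
check_tcbw() {
  if (( $# != 8 )); then
    echo "need 8 bandwidth values" >&2
    return 1
  fi
  local sum=0 bw
  for bw in "$@"; do
    sum=$(( sum + bw ))
  done
  (( sum == 100 ))
}
```

Usage: `build_up2tc 0 0 0 0 0 0 0 0` prints the same `up2tc=` string as step 5b, and `check_tcbw 100 0 0 0 0 0 0 0` succeeds for its `tcbw` values.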

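Full enablement of PFC requires the switch or link partner to be configured for PFC pause frames. Refer to the switch vendor's documentation for more details.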
 

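--- Directing RDMA traffic to a traffic class

When using PFC, traffic may be directed to one or more traffic classes (TCs).
Because RDMA traffic bypasses the kernel, Linux traffic control methods such as tc, cgroups, or egress-qos-map cannot be used. Instead, set the Type of Service
(ToS) field on your application's command line. The ToS-to-priority mapping is hardcoded in Linux as follows: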

  ToS   Priority
  ---   --------
   0       0
   8       2
  24       4
  16       6
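The table above can be reproduced in bash. This sketch is modeled on the Linux kernel's `rt_tos2priority()`, which indexes a 16-entry table with `(ToS & 0x1E) >> 1`; treat it as an illustration of the four listed values rather than a guarantee for every ToS value on every kernel. `tos_to_prio` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# tos_to_prio TOS: print the Linux priority for a ToS byte, modeled on the
# kernel's rt_tos2priority(): index = (TOS & 0x1E) >> 1 into a 16-entry table
# (0 = best effort, 2 = bulk, 4 = interactive bulk, 6 = interactive).
# Hypothetical helper for illustration only.
tos_to_prio() {
  local tos=$1
  if (( tos < 0 || tos > 255 )); then
    echo "ToS out of range: $tos" >&2
    return 1
  fi
  local -a table=(0 0 0 0 2 2 2 2 6 6 6 6 4 4 4 4)
  local idx=$(( (tos & 0x1E) >> 1 ))
  echo "${table[$idx]}"
}
```

For instance, `tos_to_prio 16` prints 6, matching the last row of the table.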

Priorities are then mapped to traffic classes with ETS, using lldptool or switch utilities.

Examples of setting ToS 16 in an application:
  ucmatose -t 16
  ib_write_bw -T 16

Alternatively, for RoCEv2, ToS may be set for all RoCEv2 traffic using configfs. For example, to set ToS 16 on device rdma, port 1:
  mkdir /sys/kernel/config/rdma_cm/rdma
  echo 16 > /sys/kernel/config/rdma_cm/rdma/ports/1/default_roce_tos
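The two configfs commands above can be wrapped in a function that validates the ToS value and reports a wrong device or port instead of failing silently. A minimal bash sketch; `set_default_roce_tos` is a hypothetical helper, the device name (for example `mlx5_0`) is whatever your RDMA device is called, and the optional fourth argument that overrides the configfs root exists only so the function can be tried outside a real system. Writing the real attribute requires root and the rdma_cm configfs to be mounted.

```bash
#!/usr/bin/env bash
# set_default_roce_tos DEVICE PORT TOS [CONFIGFS_ROOT]
# Write TOS to rdma_cm's default_roce_tos attribute for one device/port,
# mirroring the mkdir + echo commands above. CONFIGFS_ROOT defaults to
# /sys/kernel/config/rdma_cm; overriding it is only meant for testing.
# Hypothetical helper.
set_default_roce_tos() {
  local dev=$1 port=$2 tos=$3
  local root=${4:-/sys/kernel/config/rdma_cm}
  if (( tos < 0 || tos > 255 )); then
    echo "ToS out of range: $tos" >&2
    return 1
  fi
  # In real configfs, mkdir creates the device directory and its ports/.
  mkdir -p "$root/$dev" || return 1
  local attr="$root/$dev/ports/$port/default_roce_tos"
  if [ ! -e "$attr" ]; then
    echo "missing $attr (wrong device or port?)" >&2
    return 1
  fi
  echo "$tos" > "$attr"
}
```

Usage (as root): `set_default_roce_tos mlx5_0 1 16`.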


Reference: https://blog.csdn.net/bandaoyu/article/details/116203690


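Copyright notice: This is an original article by bandaoyu, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.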