-
Notifications
You must be signed in to change notification settings - Fork 391
Smart-NIC support phase2 #2042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Smart-NIC support phase2 #2042
Conversation
799690f
to
0fb940a
Compare
0fb940a
to
b0cfffe
Compare
9c95ea7
to
f9048d6
Compare
f9e6ea9
to
74e36b4
Compare
74e36b4
to
e6588cf
Compare
07388bb
to
6332707
Compare
@trozet @girishmg @zshi-redhat I have added Gateway support for Smart-NICs on top of existing management port support in this PR. PTAL Tested locally on a Smart-NIC setup and flows are working as expected |
go-controller/vendor/github.com/Mellanox/sriovnet/sriovnet_switchdev.go
Outdated
Show resolved
Hide resolved
Tested this PR locally, hit the issue that
|
f3d0c77
to
7dcf357
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none of these comments are necessarily blocking
stdout, stderr, err := util.RunOVSVsctl( | ||
"--", "--may-exist", "add-port", "br-int", types.K8sMgmtIntfName, | ||
"--", "set", "interface", types.K8sMgmtIntfName, | ||
"external-ids:iface-id="+types.K8sPrefix+mp.nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically identical to the non-smart-nic version (and has to be, because lots of other code would fail if we didn't use exactly the same naming conventions). So it seems like this ought to be done by shared code, rather than by having an exact duplicate of the other RunOVSVsctl
invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed conceptually they are the same, but not exactly.
- it does not deal (and does not need to deal) with legacy interface names
- it is not an internal port
- MTU is set on the VF representor and not via ovs
here is the bit of code for the non-smart-nic version:
legacyMgmtIntfName := util.GetLegacyK8sMgmtIntfName(mp.nodeName)
stdout, stderr, err := util.RunOVSVsctl(
"--", "--if-exists", "del-port", "br-int", legacyMgmtIntfName,
"--", "--may-exist", "add-port", "br-int", types.K8sMgmtIntfName,
"--", "set", "interface", types.K8sMgmtIntfName,
"type=internal", "mtu_request="+fmt.Sprintf("%d", config.Default.MTU),
"external-ids:iface-id="+types.K8sPrefix+mp.nodeName)
we can probably make a generic enough method for the "add-port" part to serve both cases but i dont think it would benefit code readability much, so id prefer to leave it as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm... maybe there should just be more comments clarifying the ways that "smart-nic"+"smart-nic-host" is still different from "full"?
The main point of my comment is that I worry that someone will make a change to the "full" implementation and not notice that a very similar change may be needed in the "smart-nic" or "smart-nic-host" implementation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
longer term i think the best way is to have some CI running on this hardware.
Im also working on design doc which depicts the changes which im happy to then upload to docs
.
Reviewers would hopefully be aware of this to reduce chance of breakage.
I can add some comments explaining these implementations with another comment in management-port.go
saying something of the sort: "if you change stuff here, changes may also be needed in management-port-smartnic.go
as well"
WDYT ?
return err | ||
} | ||
|
||
if config.OvnKubeNode.Mode == types.NodeModeSmartNICHost { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in this file has gotten completely... nonlogical. There really needs to be a refactoring to get rid of all the constant config.OvnKubeNode.Mode
checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not too happy with this either, but lets address this in a separate PR.
once this gets merged ill open an issue and followup on it.
also open to suggestions on how to achieve this with a good balance between complexity and logic duplication.
This commit adds a new ovnkube node configuration to set a netdev which will back the management port. When Smart-NIC is used, to allow host to access pod and service network it is needed to back the management port by a VF (and a VF representor on the Smart-NIC) We introduce `ovnkube-node-mgmt-port-netdev` configuration exposed via flag to allow this information to be passed to ovnkube node. On Non-Smart-NIC hosts this configuration parameter carries no meaning. On Smart-NIC host it specifies the netdev to be used as management port. On Smart-NIC it specifies the represetnor netdev to be used as management port. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Executes a callable in a periodic manner to allow code reuse by ovnkube node to call healthcheck and monitoring methods: - node.checkForStaleOVSInterfaces() - ManagementPort.CheckManagementPortHealth() Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
This commit introduces a new ManagementPort interface which provides: - Create method to create and setup the management port - A healthcheck method to perform healthcheck on the management port Create 3 types of implementations for the ManagementPort interface 1. managementPort : management port creation and healthcheck as it is today. 2. managementPortSmartNIC: management port creation and health monitoring on smart-NIC side. consists of OVS configuration of a VF-representor backed management port. 3. managementPortSmartNICHost: VF backed manangement port creation and health monitoring for the host side management port. This split of logic is needed to support Smart-NICs where IPTables logic is required to be performed only on host and OVS logic is required to be performed only on the Smart-NIC embedded host. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
This commit adjusts ovnkube Node.Run() logic to use the newly defined management port interface according to the defined ovnkube-node-mode. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Update sriovnet lib to support new functionality needed for supporting shared gateway for smart-NICs The new sriovnet lib relies on new devlink functionality added to netlink. Run go mod tidy and go mod vendor Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Extend sriovnet ops with functionality required to support shared gateway in Smart-NICs - GetRepresentorPeerMacAddress(): Get The MAC address of the Host interface associated with the provided port representor. - GetRepresentorPortFlavour(): Returns the representor port flavour e.g whether the given interface is a VF or PF representor netdev Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
This method will list the link devices in the system. This is required as in Smart-NIC host we need to find the Interface which is associated with the K8s Node IP so the proper routes/nat may be programmed on the interface. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
This commit extends utils package with a method that will later be utilized in the series to support shared gateway in Smart-NICs. In addition add a helper method to allow code reuse. - GetSmartNICHostInterface(): returns the host representor netdev which is connected to the provided OVS bridge. - getBridgePortsInterfaces(): helper method to get an OVS bridge's ports and their associated interfaces Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
gateway-router-subnet configuration parameters allows to specify the subnet to be used for the GR when traffic Egresses OVN to External/Underlay. In Smart-NICs Nodes the GR is located on the Smart-NIC embedded CPU and may not share the same subnet as the host. This parameter allows to explicitly configure the GR subnet instead of derriving it form the machine where the Gateway is located. The subnet must match the kube node ip address. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
In Smart-NICs OVN/OVS runs on the NIC's embedded CPU. There are several changes that are required to support Shared Gateway For Smart-NIC nodes. 1. k8s node l3 gateway annotations should be updated with the host's IP address, subnet and MAC so that ovn-kubernetes master may create the GR with the correct external port information. Before the change this commit proposes, the GR external port was created with the smart-NIC uplink attributes instead of the hosts uplink (or in our case host PF) attributes 2. update OpenflowManager rules to create rules on the physical bridge based on the host's OVS port. while in non Smart-NICs its simply the bridge's LOCAL port. when Smart-NICs are used it is the hosts PF representor that is connected to the bridge. 3. iptable rules for node ports should be created on the host while openflow rules on the smart-nic 4. loadbalancer health check should run on the host This commit implements the above required changes. - Split shared gateway initialization logic based on Ovnkube node mode. - Initialize L3 Gateway annotation with host information. - Add method to initialize shared gateway on Smart-NIC hosts. This method performs a minimal Initialization on the host to enable gateway functionality. - OpenFlowManager rules for smart-NICs to use host PF representor and host MAC instead of LOCAL port when forwarding traffic to/from Smart-NIC host. - Move shared gateway NodePort watcher IPTables logic to its own object so it may be used separately. - Add method to initialize shared gateway on Smart-NIC hosts. This method performs minimal initalization on the host to enable gateway functionality. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
- Get K8s node IP address and pass it to gateway init functions - Initialize gateway on Smart-NIC host - Do not set up routes on Smart-NIC when creating gateway - Reuse Service route initialization method from gateway_init.go Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
prevent ovn syncNodeGateway from failing on missing host-addresses annotations. Instead, proceeed with syncing ovn gateway without additional addresses. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
- config tests - Fix OvnKubeNodeMode tests - Add tests for OvnKubeNode.MgmtPortNetdev config - Add tests for Gateway.Router config - gateway_init_linux tests - update call to newSharedGateway() with new param - add unit test for getSmartNICHostPrimaryIPAddresses() - add unit test for getInterfaceByIP() - add unit test for configureSvcRouteViaInterface() - add test for initGatewaySmartNicHost() - add test for newSharedGateway() in smart-nic mode - management-port tests - Add unit tests for managementPortSmartNIC() - Add unit tests for managementPortSmartNICHost() - Add unit tests for NewManagementPort() - util unit tests - Add test for RunPeriodicallyUntilStop() Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
During CI runs, when attempting to get the ofport of the patch, ofport is reported as "". gatewayReady method in this case reports gateway as ready. gateway init then fails as patch port is not yet available. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Remove util.RunPeriodicallyUntilStop() method as k8s.io/apimachinery/pkg/util/wait.Until provides an identical API. Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
4cb6486
to
f82d300
Compare
@trozet @danwinship rebased code and addressed comments PTAL. Hoping we can converge to merge this week. |
This commit exists in order to merge the External TrafficPolicy feature downstream without the [smartnicv2](ovn-kubernetes/ovn-kubernetes#2042) work Once the smartnicv2 work is pulled downstream this commit shoudl be dropped Signed-off-by: astoycos <astoycos@redhat.com>
This commit exists in order to merge the External TrafficPolicy feature downstream without the [smartnicv2](ovn-kubernetes/ovn-kubernetes#2042) work Once the smartnicv2 work is pulled downstream this commit shoudl be dropped Signed-off-by: astoycos <astoycos@redhat.com>
This commit exists in order to merge the External TrafficPolicy feature downstream without the [smartnicv2](ovn-kubernetes/ovn-kubernetes#2042) work Once the smartnicv2 work is pulled downstream this commit shoudl be dropped Signed-off-by: astoycos <astoycos@redhat.com>
This commit exists in order to merge the External TrafficPolicy feature downstream without the [smartnicv2](ovn-kubernetes/ovn-kubernetes#2042) work Once the smartnicv2 work is pulled downstream this commit shoudl be dropped Signed-off-by: astoycos <astoycos@redhat.com>
This commit exists in order to merge the External TrafficPolicy feature downstream without the [smartnicv2](ovn-kubernetes/ovn-kubernetes#2042) work Once the smartnicv2 work is pulled downstream this commit shoudl be dropped Signed-off-by: astoycos <astoycos@redhat.com>
This commit can be reverted when vishvananda/netlink#784 merges and we can re-vendor the next netlink release Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit ecd92a8) (cherry picked from commit 153bfe7) (cherry picked from commit 79544d5) (cherry picked from commit 5599844) Conflicts: go-controller/vendor/github.com/vishvananda/netlink/conntrack_linux.go because netlink vendoring hasn't been done (was done in ovn-kubernetes/ovn-kubernetes#2042 which isn't in 4.9)
This commit can be reverted when vishvananda/netlink#784 merges and we can re-vendor the next netlink release Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit ecd92a8) (cherry picked from commit 153bfe7) (cherry picked from commit 79544d5) (cherry picked from commit 5599844) Conflicts: go-controller/vendor/github.com/vishvananda/netlink/conntrack_linux.go because netlink vendoring hasn't been done (was done in ovn-kubernetes/ovn-kubernetes#2042 which isn't in 4.9) (cherry picked from commit d3cba2c)
This commit can be reverted when vishvananda/netlink#784 merges and we can re-vendor the next netlink release Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit ecd92a8) (cherry picked from commit 153bfe7) (cherry picked from commit 79544d5) (cherry picked from commit 5599844) Conflicts: go-controller/vendor/github.com/vishvananda/netlink/conntrack_linux.go because netlink vendoring hasn't been done (was done in ovn-kubernetes/ovn-kubernetes#2042 which isn't in 4.9) (cherry picked from commit d3cba2c) (cherry picked from commit 0eb102b)
…t/cherry-pick-2028-to-release-4.14 [release-4.14] OCPBUGS-28789: Fix LGW ETP=Local on IPv6
- What this PR does and why is it needed
This PR implements Phase-2 support for Smart-NICs as introduced in #2005 (and is built on top of it)
It enables host access to pod and service networks.
With this change HostNetwork pods will be able to access K8s services and other pods.
This is achieved by exposing a VF backed
ovn-k8s-mp0
port on the Smart-NIC hostwith its corresponding VF Representor configured as
ovn-k8s-mp0
in OVS on Smart-NIC sideIn addition this PR enables Shared Gateway functionality on Smart-NICs by:
RouterSubnet
parameter is provided as overrideby OpenflowManager.
- Node to reviewers
This change does not alter the default management port and gateway creation and handling in ovnkube node when
Smart-NIC is not present.
Commits 1-4: management port support for Smart-NICs
Commits 5-12: Gateway support for Smart-NICs
Commit 13: Unit tests
- Description for the changelog
Please refer to commit messages
- How to verify it
Unit-tests were added to ensure correctness.