Skip to content

Conversation

adrianchiris
Copy link
Contributor

@adrianchiris adrianchiris commented Feb 14, 2021

- What this PR does and why is it needed
This PR implements Phase-2 support for Smart-NICs as introduced in #2005 (and is built on top of it)
It enables host access to pod and service networks.
With this change HostNetwork pods will be able to access K8s services and other pods.

This is achieved by exposing a VF backed ovn-k8s-mp0 port on the Smart-NIC host
with its corresponding VF Representor configured as ovn-k8s-mp0 in OVS on Smart-NIC side

In addition this PR enables Shared Gateway functionality on Smart-NICs by:

  • Populating the correct L3 gateway config to K8 node object.
    • Gateway router external subnet is assumed to be the same as the smart-NIC unless RouterSubnet parameter is provided as override
  • Splitting some of its functionality between the Smart-NIC and its Host.
  • Taking into account the Host PF representor on Smart-NIC when constructing OpenFlow rules that are used
    by OpenflowManager.

- Node to reviewers
This change does not alter the default management port and gateway creation and handling in ovnkube node when
Smart-NIC is not present.

Commits 1-4: management port support for Smart-NICs
Commits 5-12: Gateway support for Smart-NICs
Commit 13: Unit tests

- Description for the changelog
Please refer to commit messages

- How to verify it
Unit-tests were added to ensure correctness.

@adrianchiris adrianchiris changed the title Bf support phase2 Smart-NIC support phase2 Feb 14, 2021
@adrianchiris
Copy link
Contributor Author

@moshe010 @girishmg @trozet

@coveralls
Copy link

coveralls commented Feb 14, 2021

Coverage Status

Coverage increased (+0.3%) to 53.184% when pulling f82d300 on adrianchiris:bf_support_phase2 into 9aa3a7b on ovn-org:master.

@adrianchiris adrianchiris force-pushed the bf_support_phase2 branch 3 times, most recently from 9c95ea7 to f9048d6 Compare March 16, 2021 08:19
@adrianchiris adrianchiris force-pushed the bf_support_phase2 branch 2 times, most recently from f9e6ea9 to 74e36b4 Compare March 31, 2021 07:55
@adrianchiris adrianchiris force-pushed the bf_support_phase2 branch 6 times, most recently from 07388bb to 6332707 Compare May 3, 2021 15:15
@adrianchiris
Copy link
Contributor Author

@trozet @girishmg @zshi-redhat

I have added Gateway support for Smart-NICs on top of existing management port support in this PR. PTAL
If the approach is OK i will work on UT.

Tested locally on a Smart-NIC setup and flows are working as expected

@zshi-redhat
Copy link
Contributor

zshi-redhat commented May 10, 2021

@trozet @girishmg @zshi-redhat

I have added Gateway support for Smart-NICs on top of existing management port support in this PR. PTAL
If the approach is OK i will work on UT.

Tested locally on a Smart-NIC setup and flows are working as expected

Tested this PR locally, hit the issue that /sys/class/net/p0/smart_nic/pf/mac doesn't exist on RHEL-8.3 kernel.
Then tried to customize the code to return Host PF MAC directly when calling GetRepresentorMacAddress and retested:

  1. The ovnkube-node can start successfully on SmartNIC,
  2. Accessing from hostnetwork (on x86 host) to API service worked (haven't compared this with old PR).
  3. The programmed flows in br-ex are using the right MAC address (Host PF MAC instead of SmartNIC gwInt MAC).

@adrianchiris adrianchiris force-pushed the bf_support_phase2 branch 6 times, most recently from f3d0c77 to 7dcf357 Compare May 25, 2021 16:45
Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of these comments are necessarily blocking

stdout, stderr, err := util.RunOVSVsctl(
"--", "--may-exist", "add-port", "br-int", types.K8sMgmtIntfName,
"--", "set", "interface", types.K8sMgmtIntfName,
"external-ids:iface-id="+types.K8sPrefix+mp.nodeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically identical to the non-smart-nic version (and has to be, because lots of other code would fail if we didn't use exactly the same naming conventions). So it seems like this ought to be done by shared code, rather than by having an exact duplicate of the other RunOVSVsctl invocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed conceptually they are the same, but not exactly.

  1. it does not deal (and does not need to deal) with legacy interface names
  2. it is not an internal port
  3. MTU is set on the VF representor and not via ovs

here is the bit of code for the non-smart-nic version:

	legacyMgmtIntfName := util.GetLegacyK8sMgmtIntfName(mp.nodeName)
	stdout, stderr, err := util.RunOVSVsctl(
		"--", "--if-exists", "del-port", "br-int", legacyMgmtIntfName,
		"--", "--may-exist", "add-port", "br-int", types.K8sMgmtIntfName,
		"--", "set", "interface", types.K8sMgmtIntfName,
		"type=internal", "mtu_request="+fmt.Sprintf("%d", config.Default.MTU),
		"external-ids:iface-id="+types.K8sPrefix+mp.nodeName)

we can probably make a generic enough method for the "add-port" part to serve both cases but i dont think it would benefit code readability much, so id prefer to leave it as is for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... maybe there should just be more comments clarifying the ways that "smart-nic"+"smart-nic-host" is still different from "full"?

The main point of my comment is that I worry that someone will make a change to the "full" implementation and not notice that a very similar change may be needed in the "smart-nic" or "smart-nic-host" implementation...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

longer term i think the best way is to have some CI running on this hardware.
Im also working on design doc which depicts the changes which im happy to then upload to docs.

Reviewers would hopefully be aware of this to reduce chance of breakage.

I can add some comments explaining these implementations with another comment in management-port.go saying something of the sort: "if you change stuff here, changes may also be needed in management-port-smartnic.go as well"

WDYT ?

return err
}

if config.OvnKubeNode.Mode == types.NodeModeSmartNICHost {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in this file has gotten completely... nonlogical. There really needs to be a refactoring to get rid of all the constant config.OvnKubeNode.Mode checks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not too happy with this either, but lets address this in a separate PR.
once this gets merged ill open an issue and followup on it.

also open to suggestions on how to achieve this with a good balance between complexity and logic duplication.

This commit adds a new ovnkube node configuration
to set a netdev which will back the management port.

When Smart-NIC is used, to allow host to access
pod and service network it is needed to back the
management port by a VF (and a VF representor on
the Smart-NIC)

We introduce `ovnkube-node-mgmt-port-netdev`
configuration exposed via flag to allow this
information to be passed to ovnkube node.

On Non-Smart-NIC hosts this configuration parameter
carries no meaning.

On Smart-NIC host it specifies the netdev to
be used as management port.

On Smart-NIC it specifies the represetnor netdev
to be used as management port.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Executes a callable in a periodic manner to allow code reuse
by ovnkube node to call healthcheck and monitoring methods:
 - node.checkForStaleOVSInterfaces()
 - ManagementPort.CheckManagementPortHealth()

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
This commit introduces a new ManagementPort interface
which provides:
 - Create method to create and setup the management port
 - A healthcheck method to perform healthcheck on the management port

Create 3 types of implementations for the ManagementPort interface

 1. managementPort : management port creation and healthcheck
    as it is today.
 2. managementPortSmartNIC: management port creation and health
    monitoring on smart-NIC side. consists of OVS configuration
    of a VF-representor backed management port.
 3. managementPortSmartNICHost: VF backed manangement port
    creation and health monitoring for the host side management port.

 This split of logic is needed to support Smart-NICs where IPTables
 logic is required to be performed only on host and OVS logic is
 required to be performed only on the Smart-NIC embedded host.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
This commit adjusts ovnkube Node.Run() logic to use
the newly defined management port interface according
to the defined ovnkube-node-mode.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Update sriovnet lib to support new functionality
needed for supporting shared gateway for smart-NICs

The new sriovnet lib relies on new devlink functionality
added to netlink.

Run go mod tidy and go mod vendor

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Extend sriovnet ops with functionality required
to support shared gateway in Smart-NICs

- GetRepresentorPeerMacAddress(): Get The MAC address of the
  Host interface associated with the provided port representor.

- GetRepresentorPortFlavour(): Returns the representor port flavour
  e.g whether the given interface is a VF or PF representor netdev

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
This method will list the link devices in the system.
This is required as in Smart-NIC host we need to find
the Interface which is associated with the K8s Node IP
so the proper routes/nat may be programmed on the interface.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
This commit extends utils package with a method
that will later be utilized in the series to support
shared gateway in Smart-NICs.

In addition add a helper method to allow code reuse.

- GetSmartNICHostInterface(): returns the host representor netdev
  which is connected to the provided OVS bridge.

- getBridgePortsInterfaces(): helper method to get an OVS bridge's
  ports and their associated interfaces

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
gateway-router-subnet configuration parameters allows
to specify the subnet to be used for the GR when traffic
Egresses OVN to External/Underlay.

In Smart-NICs Nodes the GR is located on the Smart-NIC
embedded CPU and may not share the same subnet as the host.

This parameter allows to explicitly configure the GR subnet
instead of derriving it form the machine where the Gateway
is located.

The subnet must match the kube node ip address.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
In Smart-NICs OVN/OVS runs on the NIC's embedded CPU.
There are several changes that are required to support Shared
Gateway For Smart-NIC nodes.

1. k8s node l3 gateway annotations should be updated with the host's
   IP address, subnet and MAC so that ovn-kubernetes master may create
   the GR with the correct external port information.
   Before the change this commit proposes, the GR external port was
   created with the smart-NIC uplink
   attributes instead of the hosts uplink (or in our case host PF)
   attributes
2. update OpenflowManager rules to create rules on the physical bridge
   based on the host's OVS port. while in non Smart-NICs its simply
   the bridge's LOCAL port. when Smart-NICs are used it is the hosts
   PF representor that is connected to the bridge.
3. iptable rules for node ports should be created on the host
   while openflow rules on the smart-nic
4. loadbalancer health check should run on the host

This commit implements the above required changes.

- Split shared gateway initialization logic based on
  Ovnkube node mode.
- Initialize L3 Gateway annotation with host information.
- Add method to initialize shared gateway on Smart-NIC hosts.
  This method performs a minimal Initialization on the host
  to enable gateway functionality.
- OpenFlowManager rules for smart-NICs to use host PF
  representor and host MAC instead of LOCAL port when forwarding
  traffic to/from Smart-NIC host.
- Move shared gateway NodePort watcher IPTables logic to its own
  object so it may be used separately.
- Add method to initialize shared gateway on Smart-NIC hosts.
  This method performs minimal initalization on the host to enable
  gateway functionality.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
- Get K8s node IP address and pass it to gateway init
  functions
- Initialize gateway on Smart-NIC host
- Do not set up routes on Smart-NIC when creating gateway
- Reuse Service route initialization method from gateway_init.go

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
prevent ovn syncNodeGateway from failing on missing
host-addresses annotations.

Instead, proceeed with syncing ovn gateway without
additional addresses.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
- config tests
  - Fix OvnKubeNodeMode tests
  - Add tests for OvnKubeNode.MgmtPortNetdev config
  - Add tests for Gateway.Router config
- gateway_init_linux tests
  - update call to newSharedGateway() with new param
  - add unit test for getSmartNICHostPrimaryIPAddresses()
  - add unit test for getInterfaceByIP()
  - add unit test for configureSvcRouteViaInterface()
  - add test for initGatewaySmartNicHost()
  - add test for newSharedGateway() in smart-nic mode
- management-port tests
  - Add unit tests for managementPortSmartNIC()
  - Add unit tests for managementPortSmartNICHost()
  - Add unit tests for NewManagementPort()
- util unit tests
  - Add test for RunPeriodicallyUntilStop()

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
During CI runs, when attempting to get the ofport of
the patch, ofport is reported as "". gatewayReady method
in this case reports gateway as ready. gateway init then fails
as patch port is not yet available.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
Remove util.RunPeriodicallyUntilStop() method
as k8s.io/apimachinery/pkg/util/wait.Until provides
an identical API.

Signed-off-by: Adrian Chiris <adrianc@nvidia.com>
@adrianchiris
Copy link
Contributor Author

@trozet @danwinship rebased code and addressed comments PTAL.

Hoping we can converge to merge this week.

@trozet trozet merged commit eb1b265 into ovn-kubernetes:master Jul 27, 2021
astoycos added a commit to astoycos/ovn-kubernetes-1 that referenced this pull request Aug 17, 2021
This commit exists in order to merge the External
TrafficPolicy feature downstream without the
[smartnicv2](ovn-kubernetes/ovn-kubernetes#2042)
work

Once the smartnicv2 work is pulled downstream
this commit shoudl be dropped

Signed-off-by: astoycos <astoycos@redhat.com>
astoycos added a commit to astoycos/ovn-kubernetes-1 that referenced this pull request Aug 25, 2021
This commit exists in order to merge the External
TrafficPolicy feature downstream without the
[smartnicv2](ovn-kubernetes/ovn-kubernetes#2042)
work

Once the smartnicv2 work is pulled downstream
this commit shoudl be dropped

Signed-off-by: astoycos <astoycos@redhat.com>
astoycos added a commit to astoycos/ovn-kubernetes-1 that referenced this pull request Aug 25, 2021
This commit exists in order to merge the External
TrafficPolicy feature downstream without the
[smartnicv2](ovn-kubernetes/ovn-kubernetes#2042)
work

Once the smartnicv2 work is pulled downstream
this commit shoudl be dropped

Signed-off-by: astoycos <astoycos@redhat.com>
astoycos added a commit to astoycos/ovn-kubernetes-1 that referenced this pull request Aug 25, 2021
This commit exists in order to merge the External
TrafficPolicy feature downstream without the
[smartnicv2](ovn-kubernetes/ovn-kubernetes#2042)
work

Once the smartnicv2 work is pulled downstream
this commit shoudl be dropped

Signed-off-by: astoycos <astoycos@redhat.com>
astoycos added a commit to astoycos/ovn-kubernetes-1 that referenced this pull request Aug 25, 2021
This commit exists in order to merge the External
TrafficPolicy feature downstream without the
[smartnicv2](ovn-kubernetes/ovn-kubernetes#2042)
work

Once the smartnicv2 work is pulled downstream
this commit shoudl be dropped

Signed-off-by: astoycos <astoycos@redhat.com>
tssurya added a commit to tssurya/ovn-kubernetes-1 that referenced this pull request Jul 13, 2022
This commit can be reverted when
vishvananda/netlink#784
merges and we can re-vendor the next netlink release

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
(cherry picked from commit ecd92a8)
(cherry picked from commit 153bfe7)
(cherry picked from commit 79544d5)
(cherry picked from commit 5599844)

 Conflicts:
	go-controller/vendor/github.com/vishvananda/netlink/conntrack_linux.go
 because netlink vendoring hasn't been done (was done in
ovn-kubernetes/ovn-kubernetes#2042 which isn't in 4.9)
tssurya added a commit to tssurya/ovn-kubernetes-1 that referenced this pull request Jul 14, 2022
This commit can be reverted when
vishvananda/netlink#784
merges and we can re-vendor the next netlink release

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
(cherry picked from commit ecd92a8)
(cherry picked from commit 153bfe7)
(cherry picked from commit 79544d5)
(cherry picked from commit 5599844)

 Conflicts:
	go-controller/vendor/github.com/vishvananda/netlink/conntrack_linux.go
 because netlink vendoring hasn't been done (was done in
ovn-kubernetes/ovn-kubernetes#2042 which isn't in 4.9)

(cherry picked from commit d3cba2c)
tssurya added a commit to tssurya/ovn-kubernetes-1 that referenced this pull request Jul 14, 2022
This commit can be reverted when
vishvananda/netlink#784
merges and we can re-vendor the next netlink release

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
(cherry picked from commit ecd92a8)
(cherry picked from commit 153bfe7)
(cherry picked from commit 79544d5)
(cherry picked from commit 5599844)

 Conflicts:
	go-controller/vendor/github.com/vishvananda/netlink/conntrack_linux.go
 because netlink vendoring hasn't been done (was done in
ovn-kubernetes/ovn-kubernetes#2042 which isn't in 4.9)

(cherry picked from commit d3cba2c)
(cherry picked from commit 0eb102b)
kyrtapz pushed a commit to kyrtapz/ovn-kubernetes that referenced this pull request Feb 8, 2024
…t/cherry-pick-2028-to-release-4.14

[release-4.14] OCPBUGS-28789: Fix LGW ETP=Local on IPv6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants