Description
Abnormal CPU utilization by the OpenShift node process.
We're seeing unusually high CPU utilization on a few of our OpenShift nodes, where the node process sometimes consumes more than half of the node's CPU (see the stats below for details).
There is no clear evidence of where the load comes from within the origin node container. Pods are spread evenly across nodes, so we would expect roughly the same utilization on all of them.
Our non-prod environments run OpenShift Origin 1.4.
Our prod environment runs OCP v3.4. We need to make sure this bug does not also affect OCP before we do any deployments there, so quick assistance would be appreciated.
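One way to narrow down where the load comes from inside the node process is to sample per-thread CPU counters from /proc on the affected host. The following is only a rough sketch, not something we have run on these nodes: the function names are made up for illustration, and `pid` is assumed to be the host PID of the node process (for example as reported by `docker inspect --format '{{.State.Pid}}' <container>`). Since the node binary is written in Go, thread IDs do not map to goroutines, so this can only show whether the load is concentrated in a few threads.

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' rather than on whitespace alone.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def thread_ticks(pid: int) -> dict:
    """Map thread id -> accumulated CPU ticks for every thread of `pid`."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = cpu_ticks(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and open()/read()
    return ticks


def busiest_threads(pid: int, interval: float = 5.0, top: int = 5) -> list:
    """Sample twice and return the `top` threads by CPU ticks used in between."""
    before = thread_ticks(pid)
    time.sleep(interval)
    after = thread_ticks(pid)
    used = {tid: after[tid] - before.get(tid, 0) for tid in after}
    return sorted(used.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Calling `busiest_threads(pid)` as root on the faulty node and on a healthy node would give two lists of (thread id, ticks) pairs to compare.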
Version
oc v1.4.0+208f053
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO
Steps To Reproduce
- Can't reproduce on other nodes
Current Result
Container name: openshift/node:v1.4.0
Docker stats from Node experiencing issue:
CONTAINER      CPU %     MEM USAGE / LIMIT       MEM %   NET I/O     BLOCK I/O             PIDS
edfa59a0981a   226.59%   661.9 MiB / 27.79 GiB   2.33%   0 B / 0 B   487.6 MB / 707.9 MB   31
Expected Result
Docker stats from a healthy node:
CONTAINER      CPU %     MEM USAGE / LIMIT       MEM %   NET I/O     BLOCK I/O             PIDS
9c5325114fe1   10.04%    231.4 MiB / 27.79 GiB   0.81%   0 B / 0 B   261.5 MB / 11.51 GB   36
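To put the two `docker stats` rows in perspective, the short calculation below converts the CPU % column into a share of total node capacity. It assumes that `docker stats` reports CPU % relative to a single core (so 100% is one full CPU) and uses the 4 CPUs per node listed under Additional Information; the helper name `parse_cpu_percent` is just for illustration.

```python
NODE_CPUS = 4  # from "Node Resources: 4 CPU" / "CPUs: 4" in docker info


def parse_cpu_percent(row: str) -> float:
    """Return the CPU % column (second field) of a docker stats row as a float."""
    return float(row.split()[1].rstrip("%"))


# Rows copied verbatim from the docker stats output in this report.
faulty_row = "edfa59a0981a 226.59% 661.9 MiB / 27.79 GiB 2.33% 0 B / 0 B 487.6 MB / 707.9 MB 31"
healthy_row = "9c5325114fe1 10.04% 231.4 MiB / 27.79 GiB 0.81% 0 B / 0 B 261.5 MB / 11.51 GB 36"

faulty = parse_cpu_percent(faulty_row)
healthy = parse_cpu_percent(healthy_row)

# Assumption: 100% in docker stats corresponds to one fully used core.
faulty_share = faulty / (NODE_CPUS * 100)
healthy_share = healthy / (NODE_CPUS * 100)

print(f"faulty node container:  {faulty_share:.1%} of node CPU")   # 56.6%
print(f"healthy node container: {healthy_share:.1%} of node CPU")  # 2.5%
print(f"ratio faulty/healthy:   {faulty / healthy:.1f}x")          # 22.6x
```

Under that assumption, the faulty node's container uses a bit more than half of the node's CPU capacity, which matches the description above, and roughly 22 times what the healthy node's container uses.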
Additional Information
Deployment type: Containerized
Docker version: 1.12.5
OS: Red Hat Enterprise Linux 7.3 (kernel 3.10.0-514.6.1.el7.x86_64)
Cluster Info: 3 Master + 12 Nodes
Node Resources: 4 CPU/30.5 GB of RAM
Average Pods/Node: ~50
Faulty node docker info:
docker info
Server Version: 1.12.5
Storage Driver: devicemapper
Pool Name: docker--vg-docker--pool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 25.48 GB
Data Space Total: 85.46 GB
Data Space Available: 59.98 GB
Metadata Space Used: 11.42 MB
Metadata Space Total: 218.1 MB
Metadata Space Available: 206.7 MB
Thin Pool Minimum Free Space: 8.546 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: host bridge null overlay
Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp selinux
Kernel Version: 3.10.0-514.16.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 4
Total Memory: 27.79 GiB
ID: W6NH:D2OP:JCBO:SLNW:Q4FX:FUOJ:2JIS:L5SU:IBNY:RZI2:ORCH:6KR2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://registry.access.redhat.com/v1/
Insecure Registries:
172.30.0.0/16
127.0.0.0/8
Registries: registry.access.redhat.com (secure), docker.io (secure)
Healthy node docker info:
Server Version: 1.12.5
Storage Driver: devicemapper
Pool Name: docker--vg-docker--pool
Pool Blocksize: 524.3 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 41.87 GB
Data Space Total: 214.5 GB
Data Space Available: 172.6 GB
Metadata Space Used: 19.18 MB
Metadata Space Total: 109.1 MB
Metadata Space Available: 89.87 MB
Thin Pool Minimum Free Space: 21.45 GB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge null host overlay
Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp selinux
Kernel Version: 3.10.0-514.10.2.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 4
Total Memory: 27.79 GiB
ID: AJCW:VLRA:Z3PB:VKGN:RMJ3:DKD6:32EN:NDRC:EEIW:4KII:DK6H:STPG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://registry.access.redhat.com/v1/
Insecure Registries:
172.30.0.0/16
127.0.0.0/8
Registries: registry.access.redhat.com (secure), docker.io (secure)
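Since the two `docker info` dumps are long and mostly identical, a small script can pull out only the keys whose values differ. This is a minimal sketch with hypothetical helper names (`parse_docker_info`, `diff_info`); the sample strings are a short excerpt of the two dumps above, and in practice the full output of each node would be passed in instead.

```python
def parse_docker_info(text: str) -> dict:
    """Parse 'Key: value' lines of docker info output into a dict.

    Lines without a colon (e.g. the entries under 'Insecure Registries:')
    are skipped in this simplified version.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def diff_info(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Excerpts copied from the faulty and healthy node dumps in this report.
faulty_info = """Server Version: 1.12.5
Data Space Total: 85.46 GB
Metadata Space Total: 218.1 MB
Kernel Version: 3.10.0-514.16.1.el7.x86_64"""

healthy_info = """Server Version: 1.12.5
Data Space Total: 214.5 GB
Metadata Space Total: 109.1 MB
Kernel Version: 3.10.0-514.10.2.el7.x86_64"""

diff = diff_info(parse_docker_info(faulty_info), parse_docker_info(healthy_info))
for key, (faulty_value, healthy_value) in diff.items():
    print(f"{key}: faulty={faulty_value} healthy={healthy_value}")
```

On this excerpt the script reports the thin pool sizes and the kernel version as differing, while the Docker server version is the same on both nodes. Run on the full dumps it would also flag the node ID and the ordering of the network plugins, which are probably noise.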