Check Cluster Health
Action:
To verify that all components are running correctly, run this short sanity check on your system. The healthcheck script is included in the OVF.
To execute the healthcheck script, navigate to the /root folder, then use the following command:
#./healthcheck
The expected results are:
- No pods with issues.
- No errors in the Redis pods.
- All Elastic pods are running.
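If any of these checks fail, it can help to keep a copy of the script output to share with support. A minimal sketch (the log file path is only an example):
#./healthcheck | tee /tmp/healthcheck.log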
You may proceed to open the UI when the following conditions are met:
- All pods are running.
- The Redis cluster is in its desired state.
- Disk speeds are at the required rates.
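The pod and disk-speed conditions above can be spot-checked from the command line. A minimal sketch (the namespace matches the commands used later in this article; the disk-test path and size are examples only, and the required rates depend on your sizing):
#kubectl get pods -n votiro --no-headers | grep -v Running
An empty result means all pods are in a Running state. For a rough sequential disk write-speed check:
#dd if=/dev/zero of=/root/ddtest bs=1M count=1024 oflag=direct && rm -f /root/ddtest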
Log in to the Management Dashboard and upload a file using the Test File feature on the Policies page.
Diagnosis:
- One of the nodes is in a Not Ready state.
Resolution:
Check for errors, using the following command:
#kubectl describe node <node-name>
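For an overview of the state of all nodes before drilling into a specific one, you can also run the standard command:
#kubectl get nodes -o wide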
Check Redis Cluster Health
Action:
Check the Redis cluster, using the healthcheck script.
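The status fields listed below can also be inspected directly from one of the Redis pods. A sketch (the pod name placeholder is hypothetical; adjust it to your deployment):
#kubectl exec -n votiro <redis-pod-name> -- redis-cli cluster info
#kubectl exec -n votiro <redis-pod-name> -- redis-cli cluster nodes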
Diagnosis:
- Redis cluster status: fail.
- Cluster known nodes: less than 6.
- Cluster size: less than 3.
Node details:
- Master: fail.
- Slave: fail.
- Redis nodes are in a CrashLoopBackOff state.
Resolution:
Reset Redis, using the following command:
#/root/reset-redis.sh
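After the reset, you can watch the Redis pods return to a Running state (a sketch; the name filter assumes the pod names contain "redis"):
#kubectl get pods -n votiro -w | grep redis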
Check Cluster Health - Pods not Running
Action:
Check whether any pods are not in a Running state after the installation / upgrade, using the following command:
#kubectl get pods -n votiro
Pod status: ImagePullBackOff / CrashLoopBackOff
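Pod states can keep changing for a few minutes while the cluster converges, so it can help to watch the list until it settles (a sketch; watch is available on most Linux distributions):
#watch kubectl get pods -n votiro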
Diagnosis:
To further understand what happened on the affected pod, use the following command:
#kubectl describe pod <pod-name> -n votiro | grep -A20 Events
Scroll down to see the Events.
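For CrashLoopBackOff specifically, the logs of the previous container instance often show the actual crash reason:
#kubectl logs <pod-name> -n votiro --previous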
Resolution:
If the error is clear, act accordingly to resolve the issue.
If not, contact support and provide the output of the describe command.
Check Cluster Health - Pods in ImagePullBackOff / ErrImagePull
Action:
To further understand the issue with the affected pod, use the following command:
#kubectl describe pod <pod-name> -n votiro | grep -A20 Events
Diagnosis:
The image the pod is attempting to pull does not exist locally. Verify this, using the following command:
#docker images | grep "<docker_image>"
Resolution:
Load the images manually.
Navigate to the Upgrade folder, and use the following command:
#docker load -i images.tar
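After the images are loaded, you can verify they are present and then delete the failing pod so that it is recreated and the pull is retried (a sketch; this assumes the pod is managed by a controller, such as a Deployment, that recreates it automatically):
#docker images | grep "<docker_image>"
#kubectl delete pod <pod-name> -n votiro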
Perform CPU Readiness Check
Action:
To monitor the VM host, perform a CPU readiness check to measure the percentage of time the VM was ready to run but could not be scheduled on the physical CPU.
To view information about the VM readiness, use the following steps:
1. Open ESXi.
2. Select the host where the VMs are deployed.
3. Navigate to the Monitor pane.
4. Select Performance > Advanced > Real-time.
5. Edit Chart Options.
a. For Counters, select Readiness.
b. For Chart Type, select Stacked Graph per VM.
c. For Select object for this chart, select the 5-node cluster.
6. Click OK.
A graph will be created showing Readiness per VM.
Diagnosis:
Assess the results shown on your graph against the analysis information in the table below. These details are based on a machine with 16 virtual cores.
Ref | CPU Summation | CPU Readiness | Status
---|---|---|---
1 | 15,000 ms | 4.69% | Green - no problem present
2 | 16,000 ms | 5.00% | Amber - application performance may be impacted
3 | 32,000 ms | 10.00% | Red - application performance may be impacted
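As a rule of thumb, the readiness percentage can be derived from the summation value by assuming the real-time chart's 20-second (20,000 ms) sample interval and dividing by the number of vCPUs. A sketch of the arithmetic for the first row (16 vCPUs):
#echo "15000 / (20000 * 16) * 100" | bc -l
This returns approximately 4.69, matching the table.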
Resolution:
If your status is not green, continue with other checks.
Co-Stop
Action:
Determine the amount of time the VM was ready to run, but unable to run due to co-scheduling constraints.
To view information about the VM readiness, use the following steps:
1. Open ESXi.
2. Select the host where the VMs are deployed.
3. Navigate to the Monitor pane.
4. Select Performance > Advanced > Real-time.
5. Edit Chart Options.
a. For Counters, select Co-Stop.
b. For Chart Type, select Stacked Graph per VM.
c. For Select object for this chart, select the 5-node cluster.
6. Click OK.
A graph will be created showing Co-Stop per VM.
Diagnosis:
The Co-Stop value is the amount of time an SMP VM was ready to run and incurred a delay due to co-vCPU scheduling contention.
Assess the results detailed on your graph.
Resolution:
When the Co-Stop value is less than 3% in ESXi, or less than 10 ms in the vSphere monitor, during the time period assessed, continue with other checks.
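If you have shell access to the host, esxtop also reports Co-Stop directly, as the %CSTP column in its CPU view (a sketch; press c for the CPU view after it opens):
#esxtop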