Automatic Diagnosis and Remediation Tool

Overview

The automatic remediation tool checks the health of the Kubernetes cluster after an installation, patch deployment, or upgrade. You can run a diagnosis only, or a diagnosis with remediation, of the various Kubernetes components and services.

In diagnosis-only mode, logs of the issues encountered are collected and saved in a file. In remediation mode, the utility also attempts to fix the issues found during the diagnostic process.

The tool can also be run before a patch or upgrade to determine whether you can proceed with the deployment.

Diagnosis and Remediation Process

To initiate the remediation process:

  1. Navigate to the appviewx_kubernetes/scripts folder and execute the command below.
    ./appviewx.sh --remediation

    You will be prompted to perform a diagnosis or diagnosis with remediation.

  2. Enter D for diagnosis only or R for diagnosis with remediation.
    Note: To check for infra readiness, use D only.
  3. Enter the password if prompted, then continue. Passwordless applications do not prompt for a password.
  4. After the process completes, the log files are stored in $installer_directory/appviewx_kubernetes/logs/auto_remediation_log_files.
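
To review the results of a run, it can help to locate the newest file in the log directory from step 4. The following is a minimal sketch, not part of the tool: the helper name latest_log_file is hypothetical, it assumes installer_directory is set in your shell, and it makes no assumption about how the tool names its log files.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the most recently modified
# file in the given directory. Prints nothing if the directory is
# missing or empty.
latest_log_file() {
    log_dir="$1"
    if [ ! -d "$log_dir" ]; then
        return 1
    fi
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$log_dir" | head -n 1)
    if [ -n "$newest" ]; then
        printf '%s\n' "$log_dir/$newest"
    fi
}

# Example call (assumes installer_directory is set in the current shell):
# latest_log_file "$installer_directory/appviewx_kubernetes/logs/auto_remediation_log_files"
```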

Auto Remediation Tool Validations

The auto-remediation tool performs the following validations:

  1. Firewalld status check
    The firewalld status is checked on all nodes. It should be disabled by default; if it is running on any node, the script throws an error.
  2. Containerd status check
    The containerd status is checked on all nodes. It should be running by default; if it is not running on any node, the script throws an error.
  3. Kubelet status check
    The kubelet status is checked on all nodes. It should be running by default; if it is not running on any node, the script throws an error.
  4. Kubectl command check
    The kubectl commands are checked on all nodes to verify that they work as expected.
  5. Hard disk space check
    The hard disk usage is checked on all nodes. If usage exceeds 70% on any node, the script shows a warning message asking you to free some space.
  6. Namespace check
    This check is used to find if the avx and dc namespaces are present in the cluster.
  7. Kube-system pod status check
    The pods of the kube-system namespace are checked to see if they are in the running state.
  8. Mongodb pod status check
    The mongodb pods are checked to see if they are in the running state.
  9. Istio-system pod status check
    The pods of the istio-system namespace are checked to see if they are in the running state.
  10. Config-server pod status check
    The config-server pods are checked to see if they are in the running state.
  11. Consul server status check
    The consul server pods are checked to see if they are up; if they are, the tool then checks whether a consul server leader is present.
  12. Active OpenBao status check
    The active vault pods are checked to see if they are up; if they are, the tool then checks whether the active vault is present.
  13. Ephemeral OpenBao status check
    The ephemeral vault pods are checked to see if they are up; if they are, the tool then checks whether the active vault is present.
  14. DC namespace’s pod status check
    This checks the pod status of all DC namespaces to see if the pods are up and running.
  15. Calico status check
    It checks to see if Calico is working as expected.
  16. Istio proxy status check
    It checks to see if the Istio proxy is working as expected.
  17. SELinux status check
    This checks to see if the SELinux status is as expected. It should be either permissive or disabled.
  18. Proxy check
    This checks for the proxy status. It should be disabled by default.
  19. Plugin helm chart check
    This checks if helm charts are present for the plugins that are listed in the ENABLED_PLUGINS parameter of the appviewx.conf file.
  20. Infra helm chart check
    It checks if helm charts are present for the required infra components.
  21. Mongo URL check
    It checks if the Mongo URLs are properly updated in the avx-common-config config map of the avx and DC namespaces. If they are not, they should be updated with the proper values.
  22. OpenBao URL check
    It checks if the Vault URLs are properly updated in the avx-common-config config map of the avx and DC namespaces.
  23. Database password check
    It checks if the database password is properly updated in the avx-common-config config map of the avx and DC namespaces.
  24. Super User password check
    It checks if the super user password is properly updated in the avx-common-config config map of the avx and DC namespaces.
  25. Collect TCPDUMP logs
    It collects the TCPDUMP logs for all the servers in the cluster.
  26. Checking Registered VMs in gateway config
    It checks if the VMs (Pod URL) are accurate in the Registered VMs parameter of the Gateway config map.
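
Some of the checks above can be spot-checked by hand on a single node. The sketch below is not the tool's own implementation. It shows one way to flag file systems above the 70% threshold and to list pods that are not in the Running state. The function names over_threshold and not_running are hypothetical, and the filters assume the standard column layout of df -P and kubectl get pods --no-headers. Note that the pod filter also lists pods in states such as Completed.

```shell
#!/bin/sh
# Threshold taken from the hard disk space check described above.
THRESHOLD=70

# Read `df -P` output on stdin and print "<mount point> <usage>" for
# every file system whose usage is strictly above the given limit.
# Column 5 is the capacity (for example 71%), column 6 the mount point.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit) print $6 " " $5
    }'
}

# Read `kubectl get pods --no-headers` output on stdin and print
# "<pod name> <status>" for every pod whose STATUS column (column 3)
# is not Running.
not_running() {
    awk '$3 != "Running" { print $1 " " $3 }'
}

# Example calls on a node with df and kubectl available:
# df -P | over_threshold "$THRESHOLD"
# kubectl get pods -n kube-system --no-headers | not_running
```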

The following checks can be automatically remediated with the command:
    appviewx.sh --remediation -R
  • Firewalld status check
  • Containerd status check
  • Kubelet status check
  • Kubectl command check
  • Mongo URL check
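
Before running a remediation, you may want to confirm the state of the three services in this list by hand. The sketch below only inspects and reports; it does not change anything and is not how the tool itself performs the check. The function name service_verdict is hypothetical. It takes a service name and the state printed by systemctl is-active, and applies the expectations stated in the checks above: firewalld should not be running, while containerd and kubelet should be running.

```shell
#!/bin/sh
# Hypothetical helper: compare a service state, as printed by
# `systemctl is-active`, with the expected state for that service.
service_verdict() {
    service="$1"
    state="$2"
    case "$service" in
        firewalld) expected_active=no ;;   # should be disabled
        *)         expected_active=yes ;;  # containerd, kubelet
    esac
    if [ "$state" = "active" ]; then
        is_active=yes
    else
        is_active=no
    fi
    if [ "$is_active" = "$expected_active" ]; then
        echo "OK $service ($state)"
    else
        echo "ERROR $service ($state)"
    fi
}

# Example loop to run on each node:
# for svc in firewalld containerd kubelet; do
#     service_verdict "$svc" "$(systemctl is-active "$svc")"
# done
```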