Removing Old Host Data from Ambari Server and Tuning the Database

Within the lifecycle of a cluster, the time will eventually come to decommission and delete some hosts from it. While Ambari Server can handle this process from the UI, host entries are sometimes left behind in the Ambari database on their way out. This can lead to false alerts, log message warnings, and inaccurate metrics in Ambari Server for non-existent hosts. The extra noise can also degrade Ambari's performance, so we will go over the tuning process for Ambari Server while we're at it.

Such log warnings may look like the following:

WARN [alert-event-bus-1] AlertReceivedListener:530 - Unable to process alert yarn_nodemanager_health for an invalid service YARN and component NODEMANAGER on host myDeletedHost.cloud.local

So, how do you delete a host from Ambari that technically doesn’t exist? Glad you asked; here’s the process:

Fixing these issues requires altering records in the Ambari database. Please be sure to have a valid backup of your Ambari database before proceeding.
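
For a MariaDB/MySQL-backed Ambari, as in the examples below, a dump along these lines is one way to take that backup. This is a sketch that assumes the common "ambari" user and database names, so adjust to your environment:

    mysqldump -u ambari -p ambari > ambari-backup-$(date +%F).sql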

  1. Stop Ambari Server and log in to your Ambari database. Find the host_id and host_name of the host in question:
    MariaDB [ambari]> select host_id, host_name from hosts where host_name='myDeletedHost.cloud.local';
    +---------+--------------------------+
    | host_id | host_name                |
    +---------+--------------------------+
    |    1002 | myDeletedHost.cloud.local|
    +---------+--------------------------+
  2. Using the values from above, execute the following SQL statements in this exact order to remove the host entries from the tables:
    delete from execution_command where task_id in (select task_id from host_role_command where host_id in (1002));
    delete from host_version where host_id in (1002);
    delete from host_role_command where host_id in (1002);
    delete from serviceconfighosts where host_id in (1002);
    delete from hoststate where host_id in (1002);
    delete from kerberos_principal_host where host_id in (1002);
    delete from hosts where host_name in ('myDeletedHost.cloud.local');
    delete from alert_current where history_id in ( select alert_id from alert_history where host_name in ('myDeletedHost.cloud.local'));

    The order of operations is important here because the tables use foreign key constraints to prevent orphaned entries. If you encounter issues where you cannot remove entries due to the foreign key checks, and you are CERTAIN you want to remove the entry anyway, you can disable them and re-enable them after your edits.

    Turn off Foreign Key Checks:

    SET foreign_key_checks = 0;

    Turn on Foreign Key Checks:

    SET foreign_key_checks = 1;
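
    Once the deletes have run, a quick sanity check helps confirm nothing was left behind. This is a minimal sketch reusing the host_id and host_name from step 1; every count should come back as 0:

    select count(*) from hosts where host_name in ('myDeletedHost.cloud.local');
    select count(*) from host_role_command where host_id in (1002);
    select count(*) from hoststate where host_id in (1002);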
  3. You can now restart Ambari Server and the alerts/warnings should no longer be present.

If you continue to see Ambari Server performance alerts after removing the old host entries, you can try tuning Ambari Server as recommended by Cloudera/Hortonworks. You will want to perform the following:

  1. Adjust the Ambari Server heap sizes in /var/lib/ambari-server/ambari-env.sh based on the table below:
    +--------------------+-----------+-----------+
    | # of Cluster Nodes | Xmx value | Xmn value |
    +--------------------+-----------+-----------+
    | 100 - 400          | 4 GB      | 2 GB      |
    | 400 - 800          | 4 GB      | 2 GB      |
    | 800 - 1200         | 8 GB      | 2 GB      |
    | 1200 - 1600        | 16 GB     | 2.4 GB    |
    +--------------------+-----------+-----------+

    In our experience with smaller clusters of ~50 nodes, we have found that 4 GB for both Xmx and Xmn works best to keep Ambari happy.
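
    The heap values live on the AMBARI_JVM_ARGS line in ambari-env.sh. As a minimal sketch, the 100 - 400 node row of the table above would translate to flags like these (keep any other JVM flags your existing line already carries):

    export AMBARI_JVM_ARGS="-Xmx4096m -Xmn2048m"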

  2. Calculate and set the Ambari Server cache size. You can calculate the cache size using this formula: server.ecCacheSize = 60 * [# of Cluster Nodes]. Add the server.ecCacheSize value to ambari.properties.

    Example for a 50 node cluster: 60 * 50 = 3600

    server.ecCacheSize=3600
  3. Configure the following Ambari Server JDBC properties:
    server.jdbc.connection-pool.acquisition-size=5
    server.jdbc.connection-pool.max-age=0
    server.jdbc.connection-pool.max-idle-time=14400
    server.jdbc.connection-pool.max-idle-time-excess=0
    server.jdbc.connection-pool.idle-test-interval=7200
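
    These properties, like server.ecCacheSize above, belong in ambari.properties (typically /etc/ambari-server/conf/ambari.properties; the path may differ in non-default installs). A quick way to confirm the entries are in place after editing:

    grep -E '^server\.(ecCacheSize|jdbc\.connection-pool)' /etc/ambari-server/conf/ambari.properties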
  4. Adjust MySQL timeout settings. Log in to the Ambari Server database and set the following variables:
    set global wait_timeout = 28800;
    set global interactive_timeout = 28800;
    set global max_connections = 128;

    The Ambari Server properties “server.jdbc.connection-pool.max-idle-time” and “server.jdbc.connection-pool.idle-test-interval” must be lower than “wait_timeout” and “interactive_timeout” respectively. The settings in step 3 reflect this.
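
    Keep in mind that variables set with "set global" do not survive a database restart. To make them permanent, a sketch of the equivalent settings in the MySQL/MariaDB configuration file (my.cnf; the exact location varies by distribution):

    [mysqld]
    wait_timeout=28800
    interactive_timeout=28800
    max_connections=128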

  5. Purge the Ambari DB of old historical data. A good rule of thumb is to keep three months' worth, or as much or as little as you are comfortable with.
    # ambari-server db-purge-history --cluster-name CLUSTER_NAME --from-date 2020-12-01
    Using python  /usr/bin/python
    Purge database history...
    Ambari Server configured for Embedded Postgres. Confirm you have made a backup of the Ambari Server database [y/n] y
    Ambari server is using db type Embedded Postgres. Cleanable database entries older than 2020-12-01 will be purged. Proceed [y/n] y
    Purging historical data from the database ...
    Purging historical data completed. Check the ambari-server.log for details.
    Ambari Server 'db-purge-history' completed successfully.
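
    If you follow the three-month rule of thumb above, the --from-date value can also be generated on the fly. A minimal sketch, assuming GNU date is available:

    # ambari-server db-purge-history --cluster-name CLUSTER_NAME --from-date $(date -d "3 months ago" +%F)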

    Restart Ambari Server and you should now be free of performance issues! If you found this article helpful, you can check out our other blog posts here.

Happy Hadooping!

Written by Ryan St. Louis