CML System Maintenance

To make system maintenance easier, especially in multi-user and cluster CML instances, administrators can put the CML system into Maintenance Mode. While Maintenance Mode is active, non-administrative user accounts are locked out of the UI and API until the administrators have finished their maintenance work.

Additionally, administrators can control the Admission State of individual compute hosts, whether the controller of a standalone instance or the compute member hosts of a cluster, to disable them temporarily or permanently. The degree of disablement ranges from disallowing node starts on a host to decommissioning and removing it.

Since release 2.6.0, enabling Maintenance Mode and partially disabling the compute hosts are both required before the CML system upgrade process can run. Upgrades from previously released versions put the upgraded system into this compliant state.

Maintenance Mode

Whenever administrators of a CML instance decide that they need to temporarily disallow use by regular users, they can enable Maintenance Mode.

If circumstances allow, schedule the work in advance and inform the users, for example by creating a CML Notice that explains the timeline and what the users need to prepare (e.g., save their work and stop their labs). Before proceeding, the administrator can check which users have read and acknowledged the notice.
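Deciding whether it is safe to proceed comes down to comparing two lists: the users who should have seen the notice and the users who acknowledged it. The Python sketch below assumes you have already collected both lists, for example from the CML UI; the function name `pending_acknowledgements` is hypothetical and not part of CML.

```python
def pending_acknowledgements(recipients, acknowledged):
    """Return, sorted, the users who have not yet acknowledged a notice.

    recipients: usernames the notice was addressed to (hypothetical input).
    acknowledged: usernames that have acknowledged it.
    """
    # Set difference: everyone addressed minus everyone who acknowledged.
    return sorted(set(recipients) - set(acknowledged))


print(pending_acknowledgements(["alice", "bob", "carol"], ["bob"]))
# ['alice', 'carol']
```

An empty result means every addressed user has acknowledged the notice.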

One such notice can be selected to appear on the UI Login page while Maintenance Mode is active.

Procedure


Log into the CML UI as a user with administrator privileges.

Click the Tools ‣ System Administration menu item.

On the System Administration page, click System Maintenance.

On the System Maintenance page, switch Maintenance Mode on or off.

(Optional) Select a Login Notice from the dropdown, or clear the field.

Click Save to apply changes.

Compute Hosts Admission States

In both Standalone All-in-One and Cluster CML instances, the Admission State of each host, including the controller, regulates the host’s availability. There are four states:

READY

The compute host is enabled for all operations and normal use. This is the goal state.

ONLINE

The compute host is ready but node starts are not allowed.

Use this state to prevent nodes from starting on the host, whether they are new nodes or stopped nodes that are still deployed on it. This is useful when the host is undergoing minor maintenance, or to help clear it of all deployed nodes.

REGISTERED

The compute host is registered but intentionally disconnected. Move it to the ONLINE state to verify its health, and to the READY state to use it.

No statistics, including resource consumption, are collected from this host. All lab nodes on this host are marked as DISCONNECTED. You can wipe these nodes to abandon them, in the same manner as nodes on hosts that disconnected due to a failure.

Abandoned nodes may be restarted on other available compute hosts; the original node data will be removed once the host is reconnected. Only the controller host can start External Connector and Unmanaged Switch nodes.

UNREGISTERED

The host is prepared for decommissioning. The controller host cannot be unregistered. A compute host in this state cannot be moved back to any other state and must go through the removal process, but it may be re-added to the cluster later.
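The node-start rules above can be summarized in a small lookup. The Python sketch below is a hypothetical model written for illustration, not CML code: it allows node starts only on READY hosts and adds the restriction that only the controller can start External Connector and Unmanaged Switch nodes. The node kind strings are illustrative labels, not CML node definition identifiers.

```python
from enum import Enum


class AdmissionState(Enum):
    """Hypothetical model of the four compute host admission states."""
    READY = "READY"                # all operations allowed; the goal state
    ONLINE = "ONLINE"              # connected, but no node starts
    REGISTERED = "REGISTERED"      # registered but intentionally disconnected
    UNREGISTERED = "UNREGISTERED"  # prepared for decommissioning and removal


# Node kinds that only the controller host can start (illustrative labels).
CONTROLLER_ONLY = {"external_connector", "unmanaged_switch"}


def can_start_node(state: AdmissionState, is_controller: bool, node_kind: str) -> bool:
    """Return True if a node of the given kind may be started on this host."""
    if state is not AdmissionState.READY:
        # Only READY hosts accept node starts, whether new or previously stopped.
        return False
    if node_kind in CONTROLLER_ONLY:
        return is_controller
    return True


print(can_start_node(AdmissionState.READY, False, "unmanaged_switch"))  # False
print(can_start_node(AdmissionState.ONLINE, True, "router"))            # False
print(can_start_node(AdmissionState.READY, False, "router"))            # True
```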

Note

The initial state for new compute hosts that join a CML cluster is READY, i.e., the hosts are made available for node starts as soon as they register. In some cases this is not desired, e.g., when further customization is required after the compute host is set up and registers itself with the controller. This initial state can be set to REGISTERED with the compute hosts configuration API call, which is documented, and can be tried out, in the API Documentation ‣ System section.
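As a hedged illustration of that API call, the Python sketch below only builds the HTTP request and does not send it. The `/api/v0/system/compute_hosts/configuration` path, the PATCH method, the `admission_state` field name, and the bearer-token header are all assumptions; confirm them in the System section of the API Documentation on your own instance before use. The host name `cml.example.com` and the token are placeholders.

```python
import json
import urllib.request


def build_initial_state_request(base_url: str, token: str,
                                state: str = "REGISTERED") -> urllib.request.Request:
    """Build (but do not send) a request that sets the initial admission state.

    ASSUMPTIONS: endpoint path, HTTP method, and payload field name are
    illustrative; verify them in the API Documentation before sending.
    """
    url = base_url.rstrip("/") + "/api/v0/system/compute_hosts/configuration"
    body = json.dumps({"admission_state": state}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # token from a prior login (assumed)
            "Content-Type": "application/json",
        },
    )


req = build_initial_state_request("https://cml.example.com/", "TOKEN")
print(req.get_method(), req.full_url)
# PATCH https://cml.example.com/api/v0/system/compute_hosts/configuration
```

Sending it would mean passing the request to `urllib.request.urlopen`; instances with self-signed certificates would also need a suitable SSL context.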

Procedure


Log into the CML UI as a user with administrator privileges.

Click the Tools ‣ System Administration menu item.

On the System Administration page, click Compute Hosts.

On the Compute Hosts page, check to select one or more Compute Hosts from the table.

Click on the Change Admission State dropdown and select the desired admission state.
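Before changing the state of several selected hosts at once, it can help to check each one against the rules from the state descriptions above: an UNREGISTERED host cannot return to any other state, and the controller host cannot be unregistered. The Python sketch below is a hypothetical pre-flight check; the `Host` fields mirror what the Compute Hosts table shows, but they are not a CML data structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

STATES = {"READY", "ONLINE", "REGISTERED", "UNREGISTERED"}


@dataclass
class Host:
    """Hypothetical view of one row in the Compute Hosts table."""
    hostname: str
    state: str
    is_controller: bool = False


def check_transition(host: Host, target: str) -> Optional[str]:
    """Return the reason a change is not allowed, or None if it is allowed."""
    if target not in STATES:
        return f"unknown admission state {target!r}"
    if host.state == "UNREGISTERED" and target != "UNREGISTERED":
        return "UNREGISTERED hosts can only be removed, not moved to another state"
    if host.is_controller and target == "UNREGISTERED":
        return "the controller host cannot be unregistered"
    return None


def plan_change(hosts: List[Host], target: str) -> Dict[str, Optional[str]]:
    """Validate the same target state for every selected host."""
    return {h.hostname: check_transition(h, target) for h in hosts}


hosts = [Host("controller", "READY", is_controller=True),
         Host("compute-1", "ONLINE"),
         Host("compute-2", "UNREGISTERED")]
for name, problem in plan_change(hosts, "READY").items():
    print(name, "ok" if problem is None else problem)
```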

Decommissioning a Compute Member Host

In a CML cluster instance, compute member hosts may be removed, e.g., when their underlying hardware is broken or being repurposed for other usage.

The controller host cannot be decommissioned or replaced in this manner; if the controller host is damaged, a new CML cluster must be deployed.

A decommissioned compute host may later rejoin the same cluster if needed, e.g., when the host has been repaired, to handle increased lab workloads, or if the host was removed by accident. If the compute host runs the same CML release version and its configuration is preserved, it rejoins automatically whenever it reconnects to the cluster network.

Any lab nodes on the hosts to be decommissioned must be wiped beforehand.

Procedure


Log into the CML UI as a user with administrator privileges.

Click the Tools ‣ System Administration menu item.

On the System Administration page, click Compute Hosts.

Verify that the intended compute hosts report zero nodes deployed on them. You can use the Node Administration subpage to review node deployments.

If the compute hosts are currently connected, you can put them into the ONLINE state to prevent any nodes from starting on them, then stop and wipe all lab nodes. Nodes on disconnected compute hosts must be wiped as well.

On the Compute Hosts page, check to select one or more Compute Hosts with zero deployed nodes from the table. Do not select the controller host.

Click the Decommission button. This puts the compute hosts into the UNREGISTERED Admission State.

(Optional) Shut down the compute hosts using any of the usual methods, e.g., console command sudo shutdown -h now, or by using the CIMC or VMware UI.

Disconnect or otherwise prevent the compute hosts from connecting to the controller over the cluster network, either physically or by using the CIMC or VMware UI.

On the Compute Hosts page, check to select one or more Compute Hosts in the UNREGISTERED Admission State from the table.

Click on the Remove button and confirm the removal.
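The selection rules in this procedure (never the controller, only hosts with zero deployed nodes for decommissioning, and only UNREGISTERED hosts for removal) can be expressed as two filters. This is a hypothetical Python model for illustration; the field names are assumptions, and the data would come from the Compute Hosts table rather than from any CML library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComputeHost:
    """Hypothetical snapshot of a Compute Hosts table row."""
    host_id: str
    hostname: str
    is_controller: bool
    deployed_nodes: int
    state: str


def decommission_candidates(hosts: List[ComputeHost]) -> List[str]:
    """IDs of hosts that may be decommissioned: not the controller, zero nodes."""
    return [h.host_id for h in hosts
            if not h.is_controller
            and h.deployed_nodes == 0
            and h.state != "UNREGISTERED"]  # already decommissioned hosts are skipped


def removal_candidates(hosts: List[ComputeHost]) -> List[str]:
    """IDs of hosts that may be removed: already in the UNREGISTERED state."""
    return [h.host_id for h in hosts
            if not h.is_controller and h.state == "UNREGISTERED"]


hosts = [
    ComputeHost("c0", "controller", True, 3, "READY"),
    ComputeHost("c1", "compute-1", False, 0, "ONLINE"),
    ComputeHost("c2", "compute-2", False, 2, "ONLINE"),
    ComputeHost("c3", "compute-3", False, 0, "UNREGISTERED"),
]
print(decommission_candidates(hosts))  # ['c1']
print(removal_candidates(hosts))       # ['c3']
```

Host `c2` is excluded until its two nodes are stopped and wiped.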

Troubleshooting Duplicate Compute Hosts

During CML cluster deployment, two compute hosts may accidentally be set up with the same hostname. This is not supported and results in a broken cluster.

One of these compute hosts must be decommissioned. To tell the individual compute hosts apart in the Compute Hosts table, use the ID column. The following procedure shows how to find a compute host’s ID value.
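Spotting the conflicting hosts in a large table amounts to grouping rows by hostname and keeping only the groups with more than one ID. The Python sketch below assumes you have copied (ID, hostname) pairs out of the Compute Hosts table; the input format and the function name are hypothetical. Hostnames are compared case-insensitively here, since DNS names are case-insensitive.

```python
from collections import defaultdict


def find_duplicate_hostnames(rows):
    """Group (host_id, hostname) pairs by hostname and keep only duplicates.

    Returns {lowercased hostname: [host_id, ...]} for hostnames seen twice or more.
    """
    by_name = defaultdict(list)
    for host_id, hostname in rows:
        by_name[hostname.lower()].append(host_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


rows = [("a1", "compute-1"), ("b2", "compute-2"), ("c3", "compute-1")]
print(find_duplicate_hostnames(rows))  # {'compute-1': ['a1', 'c3']}
```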

Note

A particular variant of this problem results from cloning a member host VM after it was first started. The ID value is assigned during the first boot, before the initial configuration dialogs are shown, so the cloned VM has the same ID value as the original. Other configuration is also created at that time, and interrupting these processes results in a broken instance as well. Changing the ID value in the configuration file mentioned in the procedure is unsupported, and that file is not the only place that contains this value.

In such cases, decommission both the original and cloned instances.

Procedure


Log into the System Administration Cockpit with the system administrator account. See Logging into the System Administration Cockpit.

On the System Administration page, click Compute Hosts.

Enter the command grep COMPUTE_ID /etc/default/virl2. When selecting compute hosts in the decommissioning procedure, compare the printed value with the ID column of the Compute Hosts table.
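The grep command prints the line of /etc/default/virl2 that contains COMPUTE_ID. If you collect that file from several hosts, a small parser can extract just the value for comparison. The Python sketch below assumes the file uses shell-style KEY=VALUE lines, with the value possibly quoted; this format is an assumption, so check the grep output on your host first.

```python
from typing import Optional


def read_compute_id(text: str) -> Optional[str]:
    """Return the COMPUTE_ID value from /etc/default/virl2-style content.

    ASSUMPTION: lines look like KEY=VALUE, optionally with quotes around VALUE.
    Returns None if no COMPUTE_ID line is found.
    """
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines without an assignment
        key, _, value = line.partition("=")
        if key.strip() == "COMPUTE_ID":
            return value.strip().strip("'\"")  # drop surrounding quotes, if any
    return None


sample = '# example content\nCOMPUTE_ID="3f2a9c"\nOTHER=1\n'
print(read_compute_id(sample))  # 3f2a9c
```

On a host, the same function could be applied to the file contents, e.g. `read_compute_id(pathlib.Path("/etc/default/virl2").read_text())`.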

Related Information