GreenOps Tracker is a project aimed at optimizing the use of computing resources. This not only reduces costs associated with server operations but also decreases CO2 emissions. Utilizing Performance Co-Pilot (PCP) and Ansible Event-Driven, this system monitors Linux servers, collects data in real-time when events occur, and automatically adjusts server configurations. Consequently, administrators can make data-driven decisions when selecting server resources. Moreover, since server resource management is automated by the system, manual intervention is minimized, preventing time loss and ensuring servers are always operated with optimal resources.
The initiative behind GreenOps Tracker is to empower server administrators with precise, real-time data to make informed decisions about resource allocation, thereby minimizing unnecessary power consumption and reducing the overall carbon footprint of data centers.
As digital infrastructures expand, the energy consumption of data centers has become a critical issue. GreenOps Tracker addresses this by providing tools that not only help in reducing operational costs but also support environmental sustainability goals.
GreenOps Tracker incorporates a dynamic system architecture featuring:
Here is the basic workflow of this system(Of course this configuration is scalable)
To further enhance GreenOps Tracker, potential future features might include:
Cisco Webex
account and a room that can receive notifications from the Ansible server.$ git clone https://github.com/mahoutukaisali/GreenOps-Tracker
Here are the steps to build Ansible Event-Driven server using a container image.
$ podman pull quay.io/ansible/ansible-rulebook
quay.io/ansible/ansible-rulebook
image, mounts a project directory, and listens to port 5000. You need to replace /home/vagrant/work/GreenOps-Tracker
with an actual project directory.$ sudo podman run -p 5000:5000 -v /home/vagrant/work/GreenOps-Tracker:/work:z quay.io/ansible/ansible-rulebook ansible-rulebook --rulebook /work/pcpEventRules.yml -i /work/inventory.yml --verbose
192.168.56.4
with your actual Ansible Event-Driven`s host address.$ curl -H 'Content-Type: application/json' -d "{\"message\": \"Ansible is super cool\"}" 192.168.56.4:5000
$ sudo podman run -p 5000:5000 -v /home/vagrant/work/GreenOps-Tracker:/work:z quay.io/ansible/ansible-rulebook ansible-rulebook --rulebook /work/pcpEventRules.yml -i /work/inventory.yml --verbose
2024-05-30 13:29:30,306 - ansible_rulebook.app - INFO - Starting sources
2024-05-30 13:29:30,306 - ansible_rulebook.app - INFO - Starting rules
2024-05-30 13:29:30,307 - drools.ruleset - INFO - Using jar: /app/.local/lib/python3.9/site-packages/drools/jars/drools-ansible-rulebook-integration-runtime-1.0.6-SNAPSHOT.jar
2024-05-30 13:29:31 039 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.AbstractRulesEvaluator - Start automatic pseudo clock with a tick every 100 milliseconds
2024-05-30 13:29:31,058 - ansible_rulebook.engine - INFO - load source ansible.eda.webhook
2024-05-30 13:29:31,486 - ansible_rulebook.engine - INFO - loading source filter eda.builtin.insert_meta_info
2024-05-30 13:29:31,881 - ansible_rulebook.engine - INFO - Waiting for all ruleset tasks to end
2024-05-30 13:29:31,881 - ansible_rulebook.rule_set_runner - INFO - Waiting for actions on events from Listen for RHEL Performance Co-Pilot events
2024-05-30 13:29:31,882 - ansible_rulebook.rule_set_runner - INFO - Waiting for events, ruleset: Listen for RHEL Performance Co-Pilot events
2024-05-30 13:29:31 881 [drools-async-evaluator-thread] INFO org.drools.ansible.rulebook.integration.api.io.RuleExecutorChannel - Async channel connected
2024-05-30 13:32:26,946 - aiohttp.access - INFO - 192.168.56.3 [30/May/2024:13:32:26 +0000] "POST / HTTP/1.1" 200 150 "-" "curl/8.2.1"
2024-05-30 13:32:26 949 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation threshold set to 90%
2024-05-30 13:32:26 949 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory check event count threshold set to 64
2024-05-30 13:32:26 950 [main] INFO org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Exit above memory occupation threshold set to false
** 2024-05-30 13:32:27.002094 [debug] ******************************************
Received: {'message': 'Ansible is super cool'}
********************************************************************************
The installation process follows the official document:
https://ansible.readthedocs.io/projects/rulebook/en/stable/installation.html
$ sudo dnf --assumeyes install java-17-openjdk python3-pip
$ export JAVA_HOME=/usr/lib/jvm/jre-17-openjdk
$ pip3 install ansible ansible-rulebook ansible-runner
Install ansible.eda collection:
$ ansible-galaxy collection install ansible.eda
Open inventory.yml
file of this project and modify IP addresses and sudo password
. Below is the current file contents. You can replace 192.168.57.10/192.168.57.11
with your pcp server's IP and replace ansible_become_pass: changeme
with your sudo password
.
---
all:
children:
pcp_servers:
hosts:
192.168.57.10:
ansible_become_pass: changeme
192.168.57.11:
ansible_become_pass: changeme
Next, open vars/webhook.ymlfile and replace
CHANGEME` part with your ansible server's actual IP address.
---
webhook_endpoint: "http://CHANGEME:5000/endpoint"
Next, open playbooks/vars/webex.yml
file and replace changeme
with the actual values. room_id
is your Cisco Webex room ID, token
is your Cisco Webex account's token.
---
room_id: changeme
token: changeme
For further information about Cisco Webex, please refer to the official documentation: https://developer.webex.com/docs
Now you can launch pre_setup.yml
playbook on the Ansible server. This will install all the necessary packages on the pcp servers to run this project.
$ ansible-playbook pre_setup.yml
Now you can try GreenOps Tracker
project.
$ ansible-rulebook --rulebook pcpEventRules.yml -i inventory.yml --verbose
We have already installed stress-ng
package in pcp servers using pre_setup.yml
playbook at the Installation
step, so in this example, we can use stress-ng
to put a load on the pcp server. You can load one of the pcp servers referring to the below example command.
example command:
$ stress-ng --cpu $(nproc) --vm 2 --fork 8 --switch 4 --timeout 120s -v
Note: If the above command is not enough parameter to load a pcp server, try increasing the numbers to match the specifications of your server.
Once the Ansible server receives a webhook from a pcp server, one of a playbooks located under playbooks
directory will be launched. In the following execution output example, a playbook cpu.load_average.yml
is launched by Ansible Event-Driven to a pcp host 192.168.57.10
because Ansible Event-Driven can regocnize the source of a received webhook and a webhook includes High 1-minute load average
as a message.
2024-05-04 14:28:34,577 - aiohttp.access - INFO - 192.168.57.10 [04/May/2024:14:28:34 +0000] "POST /endpoint HTTP/1.1" 200 159 "-" "curl/8.2.1"
PLAY [EDA response to PMIE rule cpu.load_average] ******************************
TASK [Gathering Facts] *********************************************************
ok: [192.168.57.10]
TASK [ansible.builtin.include_vars] ********************************************
ok: [192.168.57.10]
TASK [Display ansible_eda.event.payload.pcp.pmie.message variable] *************
ok: [192.168.57.10] => {
"msg": "ansible_eda.event.payload.pcp.pmie.message value: 11.2load@192.168.57.10"
}
TASK [Display ansible_eda.event.payload.pcp.pmie.rule variable] ****************
ok: [192.168.57.10] => {
"msg": "ansible_eda.event.payload.pcp.pmie.rule value: High 1-minute load average"
}
TASK [Collect uptime data to check CPU load average over the past minutes] *****
changed: [192.168.57.10]
TASK [Make /opt writable directory] ********************************************
ok: [192.168.57.10]
TASK [Save uptime result] ******************************************************
changed: [192.168.57.10]
TASK [Cisco Webex Teams - Text Message to a Room] ******************************
ok: [192.168.57.10]
PLAY RECAP *********************************************************************
192.168.57.10 : ok=8 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Additionally, you can see a message sent from Ansible Playbook in Cisco Webex room. In this screenshot, you can see uptime
command result executed on a pcp host 192.168.57.10
. It's executed by Ansible Playbook. This allows you to check the state of the uptime when the event occurred. You won't miss real-time data.
In the above Usage
section, cpu.load_average.yml
playbook is launched by Ansible Event-Driven because an event matching one of the pcp rules cpu.load_average
. In this project, there are more playbooks. Let me explain more 2 playbook files.
playbooks/filesys.filling.yml
This file is executed by Ansible EventDriven when an event defined in the PCP rule "filesys.filling" occurs. This playbook not only collects the state of the file system but also performs tasks to delete files in the /tmp directory of the monitored server (PCP server) if they haven't been used in the past 7 days. This allows for automatic resolution of a full file system state. Not only does it prevent troubles, but it also saves resources by removing unnecessary files.
Applicable parts in playbooks/filesys.filling.yml
:
- name: Collect file system space usage
command: df -h
register: result
- name: Make /opt writable directory
ansible.builtin.file:
path: /opt
state: directory
mode: '0777'
- name: Save `df -h` command result
ansible.builtin.copy:
content: "{{ result.stdout }}"
dest: "/opt/filesys.filling_{{ lookup('pipe', 'date +%Y%m%d%H%M%S') }}.txt"
- name: Delete unused files for 7 days
command: find /tmp -type f -atime +7 -delete
- name: Cisco Webex Teams - Text Message to a Room
community.general.cisco_webex:
recipient_type: roomId
recipient_id: "{{ room_id }}"
msg_type: markdown
personal_token: "{{ token }}"
msg: "pcp `filesys.filling` rule has been launced from the managed host {{ ansible_eda.event.payload.pcp.pmie.hostname }}. Please check this host.\nHere is `df -h` result.\n{{ result.stdout }}"
playbooks/memory.exhausted.yml
In cases where performance-related issues are not consistently occurring but rather sporadic, there may be a need to gather information by executing commands immediately when the issue arises. For example, to identify a process consuming excessive memory. This playbook will run ps aux
command and send that result to Cisco Webex room.
Applicable parts in playbooks/memory.exhausted.yml
.
- name: Investigate which processes are consuming high memory
command: ps aux --sort=-%mem | head
register: processResult
- name: Cisco Webex Teams - Text Message to a Room
community.general.cisco_webex:
recipient_type: roomId
recipient_id: "{{ room_id }}"
msg_type: markdown
personal_token: "{{ token }}"
msg: "pcp `memory.exhausted` rule has been launced from the managed host {{ ansible_eda.event.payload.pcp.pmie.hostname }}. Please check this host.\nHere is `ps aux` command result.\n{{ processResult.stdout }}"
GreenOps Tracker is not just a monitoring tool; it is a strategic asset for sustainable IT management. By providing real-time insights and automated responses, it plays a pivotal role in optimizing server performance and advancing environmental sustainability in the tech industry.
Here are the steps what's going on this video:
ansible-rulebook --rulebook pcpEventRules.yml -i inventory.yml --verbose
in the Ansible nodestress-ng --cpu $(nproc) --vm 2 --fork 8 --switch 4 --timeout 120s -v
in the managed nodeOwner
Contributors
Categories
Programming Languages
License
Code Exchange Community
Get help, share code, and collaborate with other developers in the Code Exchange community.View Community