Monitoring Windows servers using Prometheus — wmi_exporter
A DevOps engineer or a Site Reliability Engineer needs to spend a lot of time monitoring their Windows servers.
And doing RCA (root cause analysis) on a Windows server when it goes down is not an easy task. Is it because of high CPU usage by one of the processes?
Is the server having memory issues? Is too much RAM being used on my Windows server?
Let’s start learning how to configure wmi_exporter in order to monitor Windows servers with Prometheus.
Prerequisites
To follow this tutorial you’ll need:
- One Linux server set up
- Prometheus 2.x installed on your server, including the Prometheus Web UI. You can find out your Prometheus version by running the
prometheus --version
command. The output contains your Prometheus version as well as build information.
- Please take care of YAML file indentation :)
Windows Server Monitoring Architecture
The WMI exporter runs as a Windows service and is responsible for gathering metrics about your system. It exposes them over HTTP, and Prometheus, running on the Linux server, scrapes them.
Installing the WMI Exporter
Download the wmi_exporter MSI installer from the project's GitHub releases page.
We are using the latest version at the time of writing, 0.11.1: https://github.com/martinlindhe/wmi_exporter/releases/download/v0.11.1/wmi_exporter-0.11.1-amd64.msi
Running the WMI installer
Run the msi installer and accept firewall exceptions if any.
To verify the installation, open the Services panel in Windows and look for the “WMI exporter” entry in the list. Make sure the service is running properly.
Access the Windows Server metrics
Now that wmi_exporter is running, it should start exposing metrics on http://localhost:9182/metrics
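To get a feel for what that endpoint returns, here is a minimal Python sketch that parses the Prometheus text format into a dictionary. The sample text is made up for illustration (the HELP string and the values are assumptions, not real exporter output), and `parse_metrics` and `fetch_metrics` are hypothetical helper names, not part of any library.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text-format lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field. This sketch
        # assumes the exporter emits no trailing timestamp.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:9182/metrics"):
    """Download and parse the metrics page (run this on the Windows host)."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Illustrative sample only; real output has many more series.
SAMPLE = """\
# HELP wmi_os_physical_memory_free_bytes Free physical memory (made-up help text)
# TYPE wmi_os_physical_memory_free_bytes gauge
wmi_os_physical_memory_free_bytes 2.147483648e+09
wmi_cs_physical_memory_bytes 8.589934592e+09
wmi_cpu_time_total{core="0,0",mode="idle"} 12345.5
"""

metrics = parse_metrics(SAMPLE)
for name, value in metrics.items():
    print(name, value)
```

Calling `fetch_metrics()` on the Windows server itself should return the live values, assuming the exporter listens on the default port 9182.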
Configure Prometheus for the WMI exporter
Open your Prometheus configuration file (mine is located at /etc/prometheus/prometheus.yml) and add the following entry under scrape_configs at the end of the file.
# cat /etc/prometheus/prometheus.yml
  - file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/wmi.yml
    job_name: wmi_exporter
    metrics_path: /metrics
    scrape_interval: 5s
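Add your Windows targets to the file referenced above. The file must contain a list of target groups, so the first key starts with a dash:
# cat /etc/prometheus/file_sd/wmi.yml
- labels:
    project: my_windows_hosts
  targets:
    - 192.168.0.100:9182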
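If you have many Windows hosts, you may prefer to generate the target file instead of editing it by hand. The sketch below builds the same target group structure in Python and prints it as JSON, which Prometheus file-based discovery also accepts. Note the assumptions: `build_target_groups` is a hypothetical helper, and to use the output you would save it with a .json extension and point the files entry in prometheus.yml at that file instead of wmi.yml.

```python
import json


def build_target_groups(hosts, project, port=9182):
    """Return a file_sd target group list for the given Windows hosts."""
    return [
        {
            # One "host:port" entry per Windows server running wmi_exporter.
            "targets": [f"{host}:{port}" for host in hosts],
            # Labels attached to every target in this group.
            "labels": {"project": project},
        }
    ]


groups = build_target_groups(["192.168.0.100"], "my_windows_hosts")
print(json.dumps(groups, indent=2))
```

Keeping the host list in one place like this makes it harder to break the indentation that the YAML version depends on.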
Configure alert rules by adding the following entries to the rules list of a group in a rule file that Prometheus loads through rule_files:
- alert: WinHostOutOfMemory
  expr: wmi_os_physical_memory_free_bytes{job="wmi_exporter"} / wmi_cs_physical_memory_bytes{job="wmi_exporter"} < 0.1
  labels:
    severity: warning
  annotations:
    summary: "Windows Host out of memory (instance {{ $labels.instance }})"
    description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
- alert: WinHostCpuUsage
  expr: 100 - (avg by (instance) (irate(wmi_cpu_time_total{job="wmi_exporter",mode="idle"}[2m])) * 100) > 80
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "CPU Usage (instance {{ $labels.instance }})"
    description: "CPU Usage is more than 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
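To make the arithmetic behind the two expressions concrete, here is a small Python sketch that reproduces it on made-up numbers. The function names and sample values are assumptions for illustration; Prometheus evaluates the real expressions itself.

```python
def memory_free_ratio(free_bytes, total_bytes):
    """Mirror of free_bytes / total_bytes in the WinHostOutOfMemory rule."""
    return free_bytes / total_bytes


def cpu_usage_percent(idle_rates):
    """Mirror of 100 - (avg(idle rate) * 100) in the WinHostCpuUsage rule.

    idle_rates holds one value per core: the fraction of each second the
    core spent idle, which is what irate() over the idle counter yields.
    """
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Made-up host: 0.5 GiB free out of 8 GiB of physical memory.
ratio = memory_free_ratio(0.5 * 2**30, 8 * 2**30)
memory_alert = ratio < 0.1

# Made-up host with four cores, mostly busy.
usage = cpu_usage_percent([0.125, 0.25, 0.125, 0.0])
cpu_alert = usage > 80

print(ratio, memory_alert)
print(usage, cpu_alert)
```

Keep in mind that the CPU rule also carries for: 10m, so the condition has to hold for ten minutes before the alert fires, while the memory rule has no such delay.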
Save your file, then make Prometheus reload its configuration by sending a SIGHUP to the Prometheus process.
# kill -HUP process_id
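Verify your configuration by opening the Prometheus Web UI and checking that the wmi_exporter target is listed as UP on the Targets page (under Status).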
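The same check can be scripted against the Prometheus HTTP API, which reports scrape targets at /api/v1/targets. The sketch below only shows how to read such a response; the sample payload is hand-written and trimmed for illustration, the second host (192.168.0.101) is hypothetical, and `unhealthy_targets` is not a Prometheus function.

```python
import json


def unhealthy_targets(payload, job="wmi_exporter"):
    """Return scrape URLs of targets in the given job that are not 'up'."""
    return [
        target["scrapeUrl"]
        for target in payload["data"]["activeTargets"]
        if target["labels"].get("job") == job and target["health"] != "up"
    ]


# Hand-written, trimmed example of a /api/v1/targets response.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "labels": {"job": "wmi_exporter", "instance": "192.168.0.100:9182"},
        "scrapeUrl": "http://192.168.0.100:9182/metrics",
        "health": "up"
      },
      {
        "labels": {"job": "wmi_exporter", "instance": "192.168.0.101:9182"},
        "scrapeUrl": "http://192.168.0.101:9182/metrics",
        "health": "down"
      }
    ]
  }
}
""")

down = unhealthy_targets(SAMPLE_RESPONSE)
print(down)
```

An empty list means every target in the job is being scraped successfully.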
Now you can create a Grafana dashboard as well and monitor the metrics.
The Windows Node dashboard can be imported into Grafana using dashboard ID 2129.
Hope you like the tutorial. Please let me know your feedback in the response section.
Happy learning!