Grafana Alerting Basics
Grafana is open source visualization and analytics software. It allows you to query, visualize, alert on, and explore your metrics, logs, and traces no matter where they are stored, and it provides tools to turn your time-series database (TSDB) data into insightful graphs and visualizations.
Grafana 8.0 introduces new and improved alerting, which groups alerting data into a single, searchable display. For all new OSS instances, it is activated by default.
It allows you to:
- Create and manage Grafana alerts
- Create and manage Grafana Mimir and Loki managed alerts
- View alerting information from Prometheus and Alertmanager compatible data sources
In this blog, we will look at the basics of Grafana Alerting and how to set up alerts.
Alerts allow you to learn about problems in your systems moments after they occur. Robust and actionable alerts help you identify and resolve issues quickly, minimizing disruption to your services.
Alerting System
The Grafana alerting system has two main components: a Scheduler and an internal Alertmanager. The Scheduler is responsible for evaluating your alert rules, while the internal Alertmanager takes care of routing and grouping.
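To make the split between the two components concrete, here is a minimal sketch in Python. It is not Grafana's implementation: the names `Rule`, `run_scheduler`, and `group_alerts`, the sample rules, and the `team` label are all hypothetical and only illustrate "evaluate first, then group".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    # Hypothetical stand-in for a query plus condition: returns the labels
    # of every series that currently breaches the condition.
    evaluate: Callable[[], List[Dict[str, str]]]


def run_scheduler(rules: List[Rule]) -> List[Dict[str, str]]:
    """Scheduler role: evaluate every rule and collect the resulting alerts."""
    alerts = []
    for rule in rules:
        for labels in rule.evaluate():
            # The rule name becomes the alertname label of each alert.
            alerts.append({"alertname": rule.name, **labels})
    return alerts


def group_alerts(alerts: List[Dict[str, str]], group_by: str) -> Dict[str, List[Dict[str, str]]]:
    """Alertmanager role: bucket alerts by one label so each bucket can be notified together."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.get(group_by, "")].append(alert)
    return dict(groups)


# Hypothetical rules whose "queries" return fixed results.
rules = [
    Rule("HighCPU", lambda: [{"host": "web-1", "team": "ops"}]),
    Rule("DiskFull", lambda: [{"host": "db-1", "team": "ops"}, {"host": "db-2", "team": "data"}]),
]
grouped = group_alerts(run_scheduler(rules), group_by="team")
```

In this sketch, `grouped` holds one bucket per team, which mirrors the idea that evaluation and notification grouping are handled by separate parts of the system.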
Key Components
Grafana alerting has four key components:
- Alerting rule – Evaluation criteria that determine whether an alert will fire. It consists of one or more queries and expressions, a condition, the frequency of evaluation, and optionally, the duration over which the condition is met.
- Contact point – Channel for sending notifications when the conditions of an alerting rule are met.
- Notification policy – Set of matching and grouping criteria used to determine where and how frequently to send notifications.
- Silences – Date and matching criteria used to silence notifications.
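The relationship between a notification policy and a contact point can be sketched as a label match. This is a simplified illustration, assuming equality-only matchers and a flat list of policies; `NotificationPolicy`, `pick_contact_point`, and the contact point names are hypothetical and are not part of any Grafana API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NotificationPolicy:
    matchers: Dict[str, str]  # label name -> required value (equality only, by assumption)
    contact_point: str        # name of the channel to notify


def pick_contact_point(
    labels: Dict[str, str],
    policies: List[NotificationPolicy],
    default: str = "default-email",
) -> str:
    """Return the contact point of the first policy whose matchers all match the alert labels."""
    for policy in policies:
        if all(labels.get(name) == value for name, value in policy.matchers.items()):
            return policy.contact_point
    # No policy matched: fall back to a default contact point.
    return default


policies = [
    NotificationPolicy({"team": "ops", "severity": "critical"}, "ops-pager"),
    NotificationPolicy({"team": "ops"}, "ops-slack"),
]
print(pick_contact_point({"team": "ops", "severity": "critical"}, policies))
```

Ordering the more specific policy first is a design choice of this sketch, so that a critical alert reaches the pager before the broader team-wide policy is considered.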
Alerting Fundamentals
Alertmanager
The Alertmanager helps both group and manage alert rules, adding a layer of orchestration on top of the alerting engines.
- Grafana includes built-in support for Prometheus Alertmanager.
- By default, notifications for Grafana managed alerts are handled by the embedded Alertmanager that is part of core Grafana.
- You can configure the Alertmanager’s contact points, notification policies, silences, and templates from the alerting UI by selecting the Grafana option from the Alertmanager drop-down.
You can also set up an external Alertmanager from the “Admin” tab of the Grafana v8 alerting UI.
Add a new external Alertmanager
- In the Grafana menu, click the Alerting (bell) icon to open the Alerting page listing existing alerts.
- Click Admin and then scroll down to the External Alertmanager section.
- Click Add Alertmanager and a modal opens.
- Add the URL and the port for the external Alertmanager. You do not need to specify the path suffix, for example, /api/v(1|2)/alerts. Grafana automatically adds this.
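As an illustration of why only the base URL and port are needed, the following Python sketch builds the full alerts endpoint from a base URL. The function name `alerts_endpoint` and the host `alertmanager.example` are hypothetical, and this is not Grafana's own code; it only shows the idea of appending the path suffix automatically.

```python
from urllib.parse import urlsplit


def alerts_endpoint(base_url: str, api_version: int = 2) -> str:
    """Append the alerts path suffix to an Alertmanager base URL."""
    parts = urlsplit(base_url)
    # Require an explicit scheme and host, e.g. http://host:9093
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected an http(s) URL with a host, got {base_url!r}")
    # Drop any trailing slash so the suffix is not doubled up.
    return base_url.rstrip("/") + f"/api/v{api_version}/alerts"


print(alerts_endpoint("http://alertmanager.example:9093/"))
```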
State and health of alerting rules
The state and health of alerting rules help you understand several key status indicators about your alerts. There are three key components: alert state, alerting rule state, and alerting rule health.
Alerting rule state
- Normal: None of the time series returned by the evaluation engine is in a Pending or Firing state.
- Pending: At least one time series returned by the evaluation engine is Pending.
- Firing: At least one time series returned by the evaluation engine is Firing.
Alert state
- Normal: Condition for the alerting rule is false for every time series returned by the evaluation engine.
- Alerting: Condition of the alerting rule is true for at least one time series returned by the evaluation engine. The duration for which the condition must be true before an alert fires, if set, has been met or exceeded.
- Pending: Condition of the alerting rule is true for at least one time series returned by the evaluation engine. The duration for which the condition must be true before an alert fires, if set, has not been met.
- NoData: The alerting rule has not returned a time series, all values for the time series are null, or all values for the time series are zero.
- Error: Error when attempting to evaluate an alerting rule.
Alerting rule health
- Ok: No error when evaluating an alerting rule.
- Error: Error when evaluating an alerting rule.
- NoData: The absence of data in at least one time series returned during a rule evaluation.
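The way a rule's state follows from the states of its individual alerts can be sketched as below. This is a minimal Python illustration, not Grafana's code: the names `InstanceState` and `rule_state` are hypothetical, and giving Firing precedence over Pending when both are present is an assumption of this sketch.

```python
from enum import Enum
from typing import Iterable


class InstanceState(Enum):
    NORMAL = "Normal"
    PENDING = "Pending"
    ALERTING = "Alerting"
    NO_DATA = "NoData"
    ERROR = "Error"


def rule_state(instances: Iterable[InstanceState]) -> str:
    """Derive the rule-level state from the states of its alert instances."""
    states = set(instances)
    # Assumption: a firing instance outranks a pending one.
    if InstanceState.ALERTING in states:
        return "Firing"
    if InstanceState.PENDING in states:
        return "Pending"
    # No instance is pending or firing.
    return "Normal"


print(rule_state([InstanceState.NORMAL, InstanceState.PENDING]))
```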
Alert evaluation
Grafana managed alerts query the following backend data sources that have alerting enabled:
- Built-in data sources or those developed and maintained by Grafana: Graphite, Prometheus, Loki, InfluxDB, Elasticsearch, Google Cloud Monitoring, CloudWatch, Azure Monitor, MySQL, PostgreSQL, MSSQL, OpenTSDB, and Oracle.
Create and manage Grafana alerting rules
An alerting rule is a collection of criteria used to determine whether or not an alert will be triggered. One or more queries and expressions, a condition, the frequency of evaluation, and, optionally, the time over which the condition is met comprise the rule.
Add Grafana managed rule
- In the Grafana menu, click the Alerting (bell) icon to open the Alerting page listing existing alerts.
- Click New alert rule.
- In Step 1, add the rule name, type, and storage location.
- In Rule name, add a descriptive name. This name is displayed in the alert rule list. It is also the alertname label for every alert instance that is created from this rule.
- From the Rule type drop-down, select Grafana managed alert.
- From the Folder drop-down, select the folder where you want to store the rule. If you do not select a folder, the rule is stored in the General folder. To create a new folder, click the drop-down and enter the new folder name.
- In Step 2, add queries and expressions to evaluate.
- Keep the default name or hover over and click the edit icon to change the name.
- For queries, select a data source from the drop-down.
- Add one or more queries or expressions.
- Click Run queries to verify that the query is successful.
- In Step 3, add conditions.
- From the Condition drop-down, select the query or expression to trigger the alert rule.
- For Evaluate every, specify the frequency of evaluation. It must be a multiple of 10 seconds, for example, 1m or 30s.
- For Evaluate for, specify the duration for which the condition must be true before an alert fires. Note: Once a condition is breached, the alert goes into the Pending state. If the condition remains breached for the duration specified, the alert transitions to the Firing state; otherwise, it reverts to the Normal state.
- In Configure no data and error handling, configure alerting behavior in the absence of data. Use the guidelines in No data and error handling.
- Click Preview alerts to check the result of running the query at this moment. Preview excludes no data and error handling.
- In Step 4, add additional metadata associated with the rule.
- Add a description and summary to customize alert messages.
- Add Runbook URL, panel, dashboard, and alert IDs.
- Add custom labels.
- Click Save to save the rule or Save and exit to save the rule and go back to the Alerting page.
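The Evaluate every and Evaluate for settings from Step 3 can be illustrated with a small Python sketch. It checks that the interval is a multiple of 10 seconds and walks an alert through Normal, Pending, and Firing. All function names are hypothetical, the duration parser only handles a single number followed by s, m, or h, and the exact tick at which Grafana itself switches from Pending to Firing may differ from this simplified model.

```python
import re
from typing import List

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a simple duration such as '30s' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def validate_interval(text: str) -> int:
    """Evaluate every must be a positive multiple of 10 seconds."""
    seconds = parse_duration(text)
    if seconds == 0 or seconds % 10 != 0:
        raise ValueError("Evaluate every must be a multiple of 10 seconds")
    return seconds


def next_state(breached: bool, breached_for: int, evaluate_for: int) -> str:
    """Pending while the breach is younger than Evaluate for, Firing afterwards."""
    if not breached:
        return "Normal"
    return "Firing" if breached_for >= evaluate_for else "Pending"


def simulate(readings: List[bool], every: str = "30s", hold: str = "1m") -> List[str]:
    """Replay one condition result per evaluation and return the state after each."""
    interval = validate_interval(every)
    evaluate_for = parse_duration(hold)
    states, breached_for = [], 0
    for breached in readings:
        states.append(next_state(breached, breached_for, evaluate_for))
        # Track how long the condition has been continuously breached.
        breached_for = breached_for + interval if breached else 0
    return states


print(simulate([False, True, True, True, False]))
```

With a 30s interval and a 1m hold, the alert in this model stays Pending for two evaluations before it fires, and returns to Normal as soon as the condition clears.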
Silences
Silences stop notifications from one or more alerting rules. They do not prevent alert rules from being evaluated, and they do not stop alerting instances from appearing in the user interface. Silences only prevent notifications from being created, and each silence lasts for a specified window of time.
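A silence can be thought of as a set of label matchers plus a time window. The Python sketch below is a simplified model, assuming equality-only matchers; the `Silence` class, its fields, and the sample labels are hypothetical and do not reflect Grafana's internal data structures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class Silence:
    matchers: Dict[str, str]  # label name -> required value (equality only, by assumption)
    starts_at: datetime
    ends_at: datetime

    def suppresses(self, labels: Dict[str, str], now: datetime) -> bool:
        """True if the notification for an alert with these labels should be dropped at 'now'."""
        in_window = self.starts_at <= now < self.ends_at
        matches = all(labels.get(name) == value for name, value in self.matchers.items())
        return in_window and matches


start = datetime(2022, 1, 1, 12, 0)
silence = Silence({"team": "ops"}, starts_at=start, ends_at=start + timedelta(hours=2))

# The alert is still evaluated and still visible; only its notification is suppressed.
print(silence.suppresses({"team": "ops", "host": "web-1"}, start + timedelta(minutes=30)))
```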
Conclusion
In this article, we covered the basic architecture of alerting in Grafana, the components involved, and how to set up alerts.