Quick Start Guide
A Grafana/Prometheus/InfluxDB installation has been set up to monitor the DAFNE Control System (DCS).
- URL: https://dashboard.lnf.infn.it/prod/grafana/
- username: <aai user>
- password: <aai password>
Fig. 1 - Login page
Below is the main page of Grafana:
Fig. 2 - main page
From the main page, click the "Home" button in the top-left corner to display the project folders and the recently opened dashboards.
Below is the home page:
Fig. 3 - home page
The red circles mark the project folders and the recently opened dashboards.
Fig. 4 - Example of dashboard
Configuration Guide
Since the systems to monitor run different OSes (Solaris 9, CentOS Linux 3-8, Windows), a performance tool/agent or methodology had to be found for monitoring each kind of device.
Prometheus "collector"
Recent Linux OSes, such as CentOS 6-8, and Windows can use Prometheus to store metric data. Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus may generate temporary derived time series as the result of queries (see https://prometheus.io/docs/introduction/overview/).
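Concretely, each exporter serves its metrics over HTTP in the Prometheus text exposition format, and Prometheus scrapes that page and stores the values as time series. A minimal sketch of what a scrape returns (the metric values below are illustrative, not taken from a real machine):

```shell
# Simulated output of: curl -s http://<host>:9100/metrics
# (sample values only; a real node_exporter exposes many more metrics)
cat <<'EOF' > /tmp/metrics_sample.txt
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_memory_MemFree_bytes Memory information field MemFree_bytes.
# TYPE node_memory_MemFree_bytes gauge
node_memory_MemFree_bytes 1073741824
EOF

# Extract a single metric value, as a quick shell check would:
awk '$1 == "node_load1" {print $2}' /tmp/metrics_sample.txt
```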
CentOS 6-8
For CentOS 6-8 the "node_exporter" agent exists (see https://prometheus.io/docs/guides/node-exporter/). As the guide shows, the agent can be installed manually, but with the current configuration-management infrastructure (Foreman/Puppet/Ansible) the process can be automated by enabling the specific module:
prometheus::node_exporter:
  proxy_server: http://squid.lnf.infn.it:3128
  service_enable: true
Windows
For Windows there is a Prometheus agent similar to node_exporter for Linux, called windows_exporter (see https://github.com/prometheus-community/windows_exporter/releases).
Prometheus
node_exporter and windows_exporter require a Prometheus server. One is already installed on vldantemon001.lnf.infn.it.
Prometheus configuration
To read the machine metrics, Prometheus must be enabled to scrape them. The Foreman machine is ready to do so. Below is the configuration in Foreman/Puppet:
scrape_configs:
  - job_name: mongodb_prod
    scrape_interval: 10s
    scrape_timeout: 10s
    static_configs:
      - targets:
          - mongo01.chaos.lnf.infn.it:9100
  - job_name: DCS
    scrape_interval: 10s
    scrape_timeout: 10s
    static_configs:
      - labels:
          instance: vldantedev014
        targets:
          - 192.168.198.114:9100
      - labels:
          instance: vldantedev001
        targets:
          - 192.168.198.101:9100
  - job_name: devil_win
    scrape_interval: 10s
    scrape_timeout: 10s
    static_configs:
      - labels:
          instance: vwdantedev002
        targets:
          - 192.168.198.160:9182
.......
Alias
In this Foreman YAML code, a "labels" entry is used for each target so that the machine is shown in the Grafana dashboard by name instead of as "ip:port".
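The effect can be checked from the Prometheus targets API: with the label set, results carry the readable instance name rather than the scrape address. A sketch against a sample response (the JSON below is illustrative, not a live query):

```shell
# Simulated fragment of: curl -s http://vldantemon001.lnf.infn.it:9090/api/v1/targets
cat <<'EOF' > /tmp/targets_sample.json
{"data":{"activeTargets":[{"labels":{"instance":"vwdantedev002","job":"devil_win"},"scrapeUrl":"http://192.168.198.160:9182/metrics","health":"up"}]}}
EOF

# Pull out the instance label without needing jq:
grep -o '"instance":"[^"]*"' /tmp/targets_sample.json
```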
Influxdb "collector"
node_exporter is not compatible with older operating systems, such as CentOS 3-5 or Solaris 9. In this case, it was decided to use a curl "call" to send metric data to InfluxDB. This method is reliable and very easy to implement on many old systems, or on raw devices where only the curl command is present. InfluxDB, similarly to Prometheus, is the open-source time series database that is part of the TICK (Telegraf, InfluxDB, Chronograf, Kapacitor) stack. It is designed to handle high write and query loads and provides a SQL-like query language called InfluxQL for interacting with data.
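The curl-based write uses InfluxDB's line protocol: a measurement name, optional tags, and a field value, posted as a single line to the /write endpoint. A minimal sketch (the measurement name cpu_load and the value are made up for illustration):

```shell
# Build one line-protocol point: <measurement>,<tag>=<value> <field>=<value>
host=myhost   # on a real machine: host=$(hostname)
point="cpu_load,host=${host} value=0.42"
echo "$point"

# Sending it to the InfluxDB used in this guide would then be:
# curl -i -XPOST 'http://vldantedbn001.lnf.infn.it:8086/write?db=metrics' \
#      --data-binary "$point"
```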
The metrics are collected with metrics.sh (https://github.com/pstadler/metrics.sh). pstadler/metrics.sh is a discontinued project, but it offers a basis for building a simple metrics collector on older systems. metrics.sh is a lightweight metrics collection and forwarding daemon implemented in portable, POSIX-compliant shell scripts. A transparent interface based on hooks enables writing custom collectors and reporters in an elegant way.
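As a sketch of that hook interface: a custom metric is a small shell file that defines a collect() function and hands the value to the report helper provided by metrics.sh. The collector below (process count) is hypothetical, and report is stubbed here so the sketch runs standalone; check the fork's own examples for the exact layout.

```shell
# Hypothetical custom collector in the style of metrics.sh (e.g. metrics/process_count.sh).
# In metrics.sh, report() is provided by the framework; stubbed here for a standalone run.
report () {
  echo "process_count: $1"
}

collect () {
  # Number of running processes (header line of ps excluded)
  report "$(ps ax | tail -n +2 | wc -l)"
}

collect
```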
metrics.sh works, with a few fixes, on CentOS 3-5, and needs some more fixes for Solaris 9. On Solaris machines, metrics.sh can only be installed manually, because the configuration-manager agent is not compatible. On CentOS 3-5 the process could be automated with Foreman/Puppet/Ansible, but currently no role (Ansible) or module (Puppet) has been created to do so.
The forks of metrics.sh for Solaris 9 and CentOS 3 are maintained at:
Solaris 9: https://baltig.infn.it/chaos-lnf-control/metrics.sh-solaris9
CentOS 3: https://baltig.infn.it/chaos-lnf-control/metrics.sh-centos3
Solaris 9
Install metrics.sh on Solaris 9 (see: https://baltig.infn.it/chaos-lnf-control/metrics.sh-solaris9):
# Install metrics.sh at /opt/metrics.sh
$ mkdir /opt; cd /opt
$ git clone https://baltig.infn.it/chaos-lnf-control/metrics.sh-solaris9.git metrics.sh
$ cd metrics.sh
# Without git, git clone into other machine and copy directory in solaris 9 machine
# Install the service
$ ln $PWD/init.d/metrics.sh /etc/init.d/metrics.sh
$ ln /etc/init.d/metrics.sh /etc/rc3.d/Smetrics.sh
$ ln /etc/init.d/metrics.sh /etc/rc0.d/Kmetrics.sh
# Copy config file
$ mkdir /etc/metrics.sh;
$ cp /opt/metrics.sh/metrics.ini /etc/metrics.sh/metrics.ini
# At this point you should edit your config file at
# /etc/metrics.sh/metrics.ini
# Start service
$ /etc/init.d/metrics.sh start
# If run with the default configuration where reporter is 'stdout', metrics
# will be written to /var/log/metrics.sh.log. Be aware that this file will
# grow fast.
$ tail -f /var/log/metrics.sh.log
# Stop service
$ /etc/init.d/metrics.sh stop
# Check service status
$ /etc/init.d/metrics.sh status
CentOS 3
Install metrics.sh on CentOS 3:
# Install metrics.sh at /opt/metrics.sh
$ mkdir /opt; cd /opt
$ git clone https://baltig.infn.it/chaos-lnf-control/metrics.sh-centos3.git metrics.sh
$ cd metrics.sh
# Install the service
$ ln -s $PWD/init.d/metrics.sh /etc/init.d/metrics.sh
# Create a config file
$ mkdir /etc/metrics.sh && chmod 700 /etc/metrics.sh
$ ./metrics.sh -C > /etc/metrics.sh/metrics.ini
# At this point you should edit your config file at
# /etc/metrics.sh/metrics.ini
# Start service
$ service metrics.sh start
# If run with the default configuration where reporter is 'stdout', metrics
# will be written to /var/log/metrics.sh.log. Be aware that this file will
# grow fast.
$ tail -f /var/log/metrics.sh.log
# Stop service
$ service metrics.sh stop
# Check service status
$ service metrics.sh status
Create the database in InfluxDB:
curl -i -XPOST http://vldantedbn001.lnf.infn.it:8086/query --data-urlencode "q=CREATE DATABASE metrics"
Customize the metrics.ini file to write data to InfluxDB:
.......
[reporter influxdb]
; Send data to InfluxDB.
INFLUXDB_API_ENDPOINT=vldantedbn001.lnf.infn.it:8086/write?db=metrics
INFLUXDB_SEND_HOSTNAME=true
.......
Solaris metrics.ini
On Solaris, use the IP instead of the FQDN as the InfluxDB endpoint (e.g. 192.168.192.15:8086/write?db=metrics).
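To find the IP to hard-code, the FQDN can be resolved once from any machine with working name resolution; a sketch (shown with localhost so it runs anywhere; substitute vldantedbn001.lnf.infn.it for the real lookup):

```shell
# Resolve a hostname to its first address
getent hosts localhost | awk '{print $1; exit}'
```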
Grafana
Before importing a dashboard, you must configure the Data Source where the metrics are stored. In this case, two Data Sources are needed:
1) Prometheus (for new machines)
2) InfluxDB (for old machines or raw devices)
Click on Configuration → Data Sources and find InfluxDB and/or Prometheus:
Fig 5: Configuration → Data Sources
Search for the data source:
Fig 6: Search data source
Select and install the one you need.
Configure Data Source:
Fig 7: Configure Influxdb
Click "Save and Test". If everything is OK, Grafana shows:
Fig 8: Answer: Data Source is ok
Follow the same procedure for the Prometheus Data Source:
Fig 9: Configure Prometheus
Prometheus URL
Beware: the Prometheus server listens on localhost:9090.
Now it is possible to import a dashboard corresponding to your scope: Create → Import
Fig 10: Import Dashboard
And choose the method to load it:
Fig 10: Load Dashboard
The dashboards for Solaris or VMIC (CentOS 3.9) are inside the git repositories:
VMIC: https://baltig.infn.it/chaos-lnf-control/metrics.sh-centos3/-/blob/master/VMIC-metrics.json
Solaris: https://baltig.infn.it/chaos-lnf-control/metrics.sh-solaris9/-/blob/master/Solaris-metrics.json
Dashboards for node-exporter or other types of systems should be ready at:
https://grafana.com/grafana/dashboards
Change the "datasource" variable inside the JSON file from "InfluxDBMetrics" to the name of your InfluxDB Data Source ("datasource": "<name of InfluxDB Data Source>"):
{
  "aliasColors": {},
  "bars": false,
  "dashLength": 10,
  "dashes": false,
  "datasource": "InfluxDB-DCS",
  "fieldConfig": {
    "defaults": { "custom": {} },
    "overrides": []
  },
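If editing the JSON by hand is error-prone, the datasource name can be swapped with a one-line sed; the file name dashboard.json and the target data-source name below are placeholders for your own:

```shell
# Create a tiny stand-in dashboard file just for demonstration
printf '%s\n' '"datasource": "InfluxDBMetrics",' > /tmp/dashboard.json

# Replace the datasource name with the one configured in Grafana
sed -i 's/"datasource": "InfluxDBMetrics"/"datasource": "InfluxDB-DCS"/' /tmp/dashboard.json
cat /tmp/dashboard.json
```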
Click "Save Dashboard".
Open the dashboard:
Fig 11: Dashboard running
Attention
In this figure, only dante057 had been migrated to the new InfluxDB.