Quick Start Guide
A Grafana/Prometheus/InfluxDB installation has been set up to monitor the DAFNE Control System (DCS).
- URL: https://dashboard.lnf.infn.it/prod/grafana/
- username: <aai user>
- password: <aai password>
Fig. 1 - Login page
Below is the main page of Grafana:
Fig. 2 - main page
From the main page, click the "Home" button in the top-left corner to display the project folders and the recently opened dashboards.
Below is the home page:
Fig. 3 - home page
The red circles mark the project folders and the recently opened dashboards.
Fig. 4 - Example of dashboard
Configuration Guide
Since the systems to monitor run different OSes (Solaris 9, CentOS Linux 3-8, Windows), a performance tool/agent or methodology had to be found for monitoring each kind of device.
Prometheus "collector"
Recent Linux OSes, such as CentOS 6-8, and Windows can use Prometheus to store metric data. Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus may generate temporary derived time series as the result of queries (see https://prometheus.io/docs/introduction/overview/).
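Concretely, each exporter serves its metrics over HTTP in the Prometheus text exposition format, and Prometheus scrapes that page and stores the values as time series. A minimal sketch of what a scrape returns (the metric values below are illustrative, not taken from a real machine):

```shell
# Simulated output of: curl -s http://<host>:9100/metrics
# (sample values only; a real node_exporter exposes many more metrics)
cat <<'EOF' > /tmp/metrics_sample.txt
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_memory_MemFree_bytes Memory information field MemFree_bytes.
# TYPE node_memory_MemFree_bytes gauge
node_memory_MemFree_bytes 1073741824
EOF

# Extract a single metric value, as a quick shell check would:
awk '$1 == "node_load1" {print $2}' /tmp/metrics_sample.txt
```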
CentOS 6-8
For CentOS 6-8 the "node_exporter" agent exists (see https://prometheus.io/docs/guides/node-exporter/). As the guide shows, the agent can be installed manually, but with the current configuration-management infrastructure (Foreman/Puppet/Ansible) the process can be automated by enabling the specific module:
prometheus::node_exporter:
  proxy_server: http://squid.lnf.infn.it:3128
  service_enable: true
Windows
For Windows there is a Prometheus agent similar to node_exporter for Linux, called windows_exporter (see https://github.com/prometheus-community/windows_exporter/releases).
Prometheus
node_exporter and windows_exporter require a Prometheus server. One is already installed on vldantemon001.lnf.infn.it.
Prometheus configuration
To read the machine metrics, Prometheus must be enabled to scrape them. The Foreman machine is ready to do so. Below is the configuration in Foreman/Puppet:
scrape_configs:
  - job_name: mongodb_prod
    scrape_interval: 10s
    scrape_timeout: 10s
    static_configs:
      - targets:
          - mongo01.chaos.lnf.infn.it:9100
  - job_name: DCS
    scrape_interval: 10s
    scrape_timeout: 10s
    static_configs:
      - labels:
          instance: vldantedev014
        targets:
          - 192.168.198.114:9100
      - labels:
          instance: vldantedev001
        targets:
          - 192.168.198.101:9100
  - job_name: devil_win
    scrape_interval: 10s
    scrape_timeout: 10s
    static_configs:
      - labels:
          instance: vwdantedev002
        targets:
          - 192.168.198.160:9182
.......
Alias
In this Foreman YAML code, a "labels" entry is used for each target so that the machine is shown in the Grafana dashboard by name instead of as "ip:port".
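The effect can be checked from the Prometheus targets API: with the label set, results carry the readable instance name rather than the scrape address. A sketch against a sample response (the JSON below is illustrative, not a live query):

```shell
# Simulated fragment of: curl -s http://vldantemon001.lnf.infn.it:9090/api/v1/targets
cat <<'EOF' > /tmp/targets_sample.json
{"data":{"activeTargets":[{"labels":{"instance":"vwdantedev002","job":"devil_win"},"scrapeUrl":"http://192.168.198.160:9182/metrics","health":"up"}]}}
EOF

# Pull out the instance label without needing jq:
grep -o '"instance":"[^"]*"' /tmp/targets_sample.json
```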
Influxdb "collector"
node_exporter is not compatible with older operating systems, such as CentOS 3-5 or Solaris 9. In this case, it was decided to use a curl "call" to send metric data to InfluxDB. This method is reliable and very easy to implement on many old systems, or on raw devices where only the curl command is present. InfluxDB, similarly to Prometheus, is the open-source time series database that is part of the TICK (Telegraf, InfluxDB, Chronograf, Kapacitor) stack. It is designed to handle high write and query loads and provides a SQL-like query language called InfluxQL for interacting with data.
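The curl-based write uses InfluxDB's line protocol: a measurement name, optional tags, and a field value, posted as a single line to the /write endpoint. A minimal sketch (the measurement name cpu_load and the value are made up for illustration):

```shell
# Build one line-protocol point: <measurement>,<tag>=<value> <field>=<value>
host=myhost   # on a real machine: host=$(hostname)
point="cpu_load,host=${host} value=0.42"
echo "$point"

# Sending it to the InfluxDB used in this guide would then be:
# curl -i -XPOST 'http://vldantedbn001.lnf.infn.it:8086/write?db=metrics' \
#      --data-binary "$point"
```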
The metrics are collected with metrics.sh (https://github.com/pstadler/metrics.sh). pstadler/metrics.sh is a discontinued project, but it offers a basis for building a simple metrics collector on older systems. metrics.sh is a lightweight metrics collection and forwarding daemon implemented in portable, POSIX-compliant shell scripts. A transparent interface based on hooks enables writing custom collectors and reporters in an elegant way.
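As a sketch of that hook interface: a custom metric is a small shell file that defines a collect() function and hands the value to the report helper provided by metrics.sh. The collector below (process count) is hypothetical, and report is stubbed here so the sketch runs standalone; check the fork's own examples for the exact layout.

```shell
# Hypothetical custom collector in the style of metrics.sh (e.g. metrics/process_count.sh).
# In metrics.sh, report() is provided by the framework; stubbed here for a standalone run.
report () {
  echo "process_count: $1"
}

collect () {
  # Number of running processes (header line of ps excluded)
  report "$(ps ax | tail -n +2 | wc -l)"
}

collect
```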
metrics.sh works, with a few fixes, on CentOS 3-5, and needs some more fixes for Solaris 9. On Solaris machines, metrics.sh can only be installed manually, because the configuration-manager agent is not compatible. On CentOS 3-5 the process could be automated with Foreman/Puppet/Ansible, but currently no role (Ansible) or module (Puppet) has been created to do so.
The forks of metrics.sh for Solaris 9 and CentOS 3 are maintained at:
Solaris 9: https://baltig.infn.it/chaos-lnf-control/metrics.sh-solaris9
CentOS 3: https://baltig.infn.it/chaos-lnf-control/metrics.sh-centos3
Solaris 9
Install metrics.sh on Solaris 9 (see: https://baltig.infn.it/chaos-lnf-control/metrics.sh-solaris9):
# Install metrics.sh at /opt/metrics.sh
$ mkdir /opt; cd /opt
$ git clone https://baltig.infn.it/chaos-lnf-control/metrics.sh-solaris9.git metrics.sh
$ cd metrics.sh
# Without git, git clone into other machine and copy directory in solaris 9 machine
# Install the service
$ ln $PWD/init.d/metrics.sh /etc/init.d/metrics.sh
$ ln /etc/init.d/metrics.sh /etc/rc3.d/Smetrics.sh
$ ln /etc/init.d/metrics.sh /etc/rc0.d/Kmetrics.sh
# Copy config file
$ mkdir /etc/metrics.sh;
$ cp /opt/metrics.sh/metrics.ini /etc/metrics.sh/metrics.ini
# At this point you should edit your config file at
# /etc/metrics.sh/metrics.ini
# Start service
$ /etc/init.d/metrics.sh start
# If run with the default configuration where reporter is 'stdout', metrics
# will be written to /var/log/metrics.sh.log. Be aware that this file will
# grow fast.
$ tail -f /var/log/metrics.sh.log
# Stop service
$ /etc/init.d/metrics.sh stop
# Check service status
$ /etc/init.d/metrics.sh status
CentOS 3
Install metrics.sh on CentOS 3:
# Install metrics.sh at /opt/metrics.sh
$ mkdir /opt; cd /opt
$ git clone https://baltig.infn.it/chaos-lnf-control/metrics.sh-centos3.git metrics.sh
$ cd metrics.sh
# Install the service
$ ln -s $PWD/init.d/metrics.sh /etc/init.d/metrics.sh
# Create a config file
$ mkdir /etc/metrics.sh && chmod 700 /etc/metrics.sh
$ ./metrics.sh -C > /etc/metrics.sh/metrics.ini
# At this point you should edit your config file at
# /etc/metrics.sh/metrics.ini
# Start service
$ service metrics.sh start
# If run with the default configuration where reporter is 'stdout', metrics
# will be written to /var/log/metrics.sh.log. Be aware that this file will
# grow fast.
$ tail -f /var/log/metrics.sh.log
# Stop service
$ service metrics.sh stop
# Check service status
$ service metrics.sh status
Create the database in InfluxDB:
curl -i -XPOST http://vldantedbn001.lnf.infn.it:8086/query --data-urlencode "q=CREATE DATABASE metrics"
Customize the metrics.ini file to write data to InfluxDB:
.......
[reporter influxdb]
; Send data to InfluxDB.
INFLUXDB_API_ENDPOINT=vldantedbn001.lnf.infn.it:8086/write?db=metrics
INFLUXDB_SEND_HOSTNAME=true
.......
Solaris metrics.ini
On Solaris, use the IP instead of the FQDN as the InfluxDB endpoint (e.g. 192.168.192.15:8086/write?db=metrics).
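To find the IP to hard-code, the FQDN can be resolved once from any machine with working name resolution; a sketch (shown with localhost so it runs anywhere; substitute vldantedbn001.lnf.infn.it for the real lookup):

```shell
# Resolve a hostname to its first address
getent hosts localhost | awk '{print $1; exit}'
```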
Grafana
Before importing a dashboard, you must configure the Data Source where the metrics are stored. In this case, two Data Sources are needed:
1) Prometheus (for new machines)
2) InfluxDB (for old machines or raw devices)
Click on Configuration → Data Sources and find InfluxDB and/or Prometheus:
Fig 5: Configuration → Data Sources
Search for the data source:
Fig 6: Search data source
Select and install the one you need.
Configure Data Source:
Fig 7: Configure Influxdb
Click "Save and Test". If everything is OK, Grafana shows:
Fig 8: Answer: Data Source is ok
Follow the same procedure for the Prometheus Data Source:
Fig 9: Configure Prometheus
Prometheus URL
Beware: the Prometheus server listens on localhost:9090.
Now it is possible to import a dashboard corresponding to your scope: Create → Import
Fig 10: Import Dashboard
And choose the method to load it:
Fig 10: Load Dashboard
The dashboards for Solaris or VMIC (CentOS 3.9) are inside the git repositories:
VMIC: https://baltig.infn.it/chaos-lnf-control/metrics.sh-centos3/-/blob/master/VMIC-metrics.json
Solaris: https://baltig.infn.it/chaos-lnf-control/metrics.sh-solaris9/-/blob/master/Solaris-metrics.json
Dashboards for node-exporter or other types of systems should be ready at:
https://grafana.com/grafana/dashboards
Change the "datasource" variable inside the JSON file from "InfluxDBMetrics" to the name of your InfluxDB Data Source ("datasource": "<name of InfluxDB Data Source>"):
{
  "aliasColors": {},
  "bars": false,
  "dashLength": 10,
  "dashes": false,
  "datasource": "InfluxDB-DCS",
  "fieldConfig": {
    "defaults": { "custom": {} },
    "overrides": []
  },
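If editing the JSON by hand is error-prone, the datasource name can be swapped with a one-line sed; the file name dashboard.json and the target data-source name below are placeholders for your own:

```shell
# Create a tiny stand-in dashboard file just for demonstration
printf '%s\n' '"datasource": "InfluxDBMetrics",' > /tmp/dashboard.json

# Replace the datasource name with the one configured in Grafana
sed -i 's/"datasource": "InfluxDBMetrics"/"datasource": "InfluxDB-DCS"/' /tmp/dashboard.json
cat /tmp/dashboard.json
```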
Click "Save Dashboard".
Open the dashboard:
Fig 11: Dashboard running
Attention
In this figure, only dante057 had been migrated to the new InfluxDB.