message_example.zip

Author(s)

Name | Institution | Mail Address | Social Contacts
Federica Legger | INFN Sezione di Torino | federica.legger@to.infn.it | N/A
Micol Olocco | Università di Torino | micol.olocco@edu.unito.it | N/A

...

ML/DL Technologies | NLP
Science Fields | High Energy Physics, Computing
Difficulty | Medium
Language | English
Type | fully annotated / runnable / external resource

...

If a transfer fails, an error message is generated and stored. Failed transfers number a few hundred thousand per day. Understanding and, where possible, fixing their cause is part of the duties of the experiment operation teams, but because of the large volume not all failures can be addressed. We developed a pipeline to discover failure patterns by analysing FTS error logs. Error messages are read in, stripped of uninformative parts (file paths, host names), and the remaining text is analysed with NLP (Natural Language Processing) techniques such as word2vec. Finally, the messages are grouped into clusters based on the similarity of their text, using either the Levenshtein distance or unsupervised ML clustering algorithms such as DBSCAN. The largest clusters, and their relationship with the host names that have the largest numbers of failing transfers, are presented in a dedicated dashboard for the CMS experiment (access to the dashboard requires login with CERN SSO). Operation teams can use the clusters to quickly identify anomalies in user activities, tackle site issues related to the backlog of data transfers and, in the future, implement automatic recovery procedures for the most common error types.
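The cleaning and Levenshtein-based grouping steps described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the pipeline's actual code: the regular expressions, the placeholder tokens, the greedy first-match grouping and the 0.2 threshold are all assumptions chosen for illustration, and the word2vec and DBSCAN stages are left out.

```python
import re

# Hypothetical patterns for the "uninformative parts" named in the text.
# The real pipeline's cleaning rules are not given on this page.
URL_RE = re.compile(r"\b[a-z][a-z0-9+.-]*://\S+", re.IGNORECASE)
PATH_RE = re.compile(r"(?<!\w)/[\w.\-/]+")
HOST_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def clean_message(message):
    """Replace URLs, file paths and host names with fixed placeholder tokens."""
    text = URL_RE.sub("<url>", message)
    text = PATH_RE.sub("<path>", text)
    text = HOST_RE.sub("<host>", text)
    # Collapse runs of whitespace and normalise case.
    return re.sub(r"\s+", " ", text).strip().lower()


def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster_messages(messages, threshold=0.2):
    """Greedy grouping: a message joins the first cluster whose representative
    is within `threshold` normalised edit distance, else it starts a new one.
    Returns (representative, raw_members) pairs, largest cluster first."""
    clusters = []
    for raw in messages:
        text = clean_message(raw)
        for representative, members in clusters:
            longest = max(len(representative), len(text)) or 1
            if levenshtein(representative, text) / longest <= threshold:
                members.append(raw)
                break
        else:
            clusters.append((text, [raw]))
    return sorted(clusters, key=lambda c: len(c[1]), reverse=True)
```

Calling `cluster_messages(list_of_error_strings)` returns the groups sorted by size, so the first entries correspond to the most frequent failure patterns. The greedy scheme depends on message order and compares every message against every cluster representative, so for the daily volumes quoted above a density-based method such as DBSCAN on vector embeddings is the more realistic choice.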


How to execute it

Way #1: Use Jupyter notebooks from an INFN Cloud virtual machine

Once you have a deployment with JupyterHub on INFN Cloud, open a terminal and clone this GitHub repo. Then execute the notebook at:

TestsINFNCloud/test_clusterLogs/NLP_example.ipynb

The input file is available in zipped form in the GitHub repo (message_example.zip); alternatively, it can be read in from MinIO.
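As a rough sketch of how the zipped input could be inspected before running the notebook, the helper below lists the archive members and reads the first one as text. The internal layout of message_example.zip is not documented here, so the function name, the choice of the first member and the UTF-8 encoding are assumptions; the notebook may load the data differently.

```python
import zipfile


def peek_zip(zip_path, encoding="utf-8"):
    """Return (member_names, lines_of_first_member) for a zip archive.

    Assumes the first member is a text file; undecodable bytes are replaced.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        if not names:
            raise ValueError(f"{zip_path} contains no files")
        with archive.open(names[0]) as handle:
            text = handle.read().decode(encoding, errors="replace")
    return names, text.splitlines()
```

For example, `names, lines = peek_zip("message_example.zip")` followed by `print(names, lines[:5])` shows which files the archive holds and the first few lines of the first one, assuming the archive sits in the current working directory.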

Annotated Description

References

Attachments

message_example.zip