...

Way #1: Use Jupyter notebooks from an INFN Cloud virtual machine

Once you have a deployment with JupyterHub on INFN Cloud, open a terminal and clone this GitHub repo. Then execute the notebook in

...

The input file is available in zipped form in the GitHub repo (message_example.zip), or it can be read from MinIO.
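As a minimal sketch of how the zipped input could be inspected before running the notebook, the helper below lists the members of an archive and reads one of them as text. It uses only the Python standard library. The function names are hypothetical, and the layout of message_example.zip (member names, encoding) is not stated here, so the UTF-8 default is an assumption.

```python
import zipfile


def list_zip_members(zip_path):
    """Return the names of the files stored in the archive at zip_path."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


def read_zip_member(zip_path, member, encoding="utf-8"):
    """Read one archive member and decode it as text.

    The encoding is an assumption; adjust it to match the actual file.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as handle:
            return handle.read().decode(encoding)


# Example usage (assumes the archive sits in the current directory):
# members = list_zip_members("message_example.zip")
# text = read_zip_member("message_example.zip", members[0])
```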

Way #2: Execute from any machine with JupyterHub

Open a terminal and clone this GitHub repo. Then execute the notebook in

TestsINFNCloud/test_clusterLogs/NLP_example.ipynb

Annotated Description

Error message analysis, like any text analysis, can be divided into two main phases: pre-processing, which consists of data preparation and tokenization, and processing, which is specific to the analysis. Before describing each phase in more detail, we have to define what we mean by similarity. This is a key concept for our purpose, since it is the metric that drives the clustering. In our approach we map words to numeric vectors, so that the similarity of two words x and y can be expressed, for instance, by the cosine of the angle between the corresponding vectors.
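To make the similarity measure concrete, here is a minimal sketch of cosine similarity between two word vectors, written in plain Python. The function name and the toy vectors are illustrative assumptions, not the vectors or the implementation used in the notebook.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical 3-dimensional word vectors, for illustration only.
toy_vectors = {
    "timeout": [0.9, 0.1, 0.0],
    "expired": [0.8, 0.2, 0.1],
    "checksum": [0.0, 0.1, 0.9],
}

sim_close = cosine_similarity(toy_vectors["timeout"], toy_vectors["expired"])
sim_far = cosine_similarity(toy_vectors["timeout"], toy_vectors["checksum"])
```

The value ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). Because it depends only on the angle, two vectors that differ only in length are treated as identical, and in this toy example `sim_close` is larger than `sim_far`.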

...