Author(s)

Name: Leonardo Giannini
Institution: INFN Sezione di Pisa / UCSD
Mail Address: leonardo.giannini@cern.ch

Name: Tommaso Boccali
Institution: INFN Sezione di Pisa
Mail Address: tommaso.boccali@pi.infn.it
Social Contacts: Skype: tomboc73; Hangouts: tommaso.boccali@gmail.com

How to Obtain Support

General Information

ML/DL Technologies: LSTM
Science Fields: High Energy Physics
Difficulty: Low
Language: Italian
Type: fully annotated and runnable

Software and Tools

Programming Language: Python
ML Toolset: Keras + Tensorflow
Additional libraries: uproot
Suggested Environments: INFN-Cloud VM, bare Linux Node, Google CoLab

Needed datasets

Data Creator: CMS Experiment
Data Type: Simulation
Data Size: 1 GB
Data Source: INFN Pandora

Short Description of the Use Case

Jets originating from b quarks have peculiar characteristics that can be exploited to discriminate them from jets originating from light quarks and gluons, and to better reconstruct their momentum. Both tasks have previously been addressed with ML and are now tackled with Deep Learning techniques.

Two original Deep Learning applications, both involving b quark jets, are described in this chapter.

The first application is the b-jet momentum regression; the second is a b-tagging algorithm that processes lower-level data and lets a DNN learn the secondary vertex information.

A combination of this tagger, called "DeepVertex", with another state-of-the-art tagger, called "DeepJet", which describes the jet in terms of single particles and secondary vertices, is also presented.

Both the regression and the DeepVertex tagger improve on the previously developed benchmark algorithms applied in Physics Analysis by the CMS collaboration.

Furthermore, the combination of "DeepVertex" and "DeepJet" reaches unprecedented performance in simulation.

The DNN regression was developed together with ETH collaborators working on the search for Higgs pairs, but was deployed in data specifically for the VH(b \bar b) analysis. The DeepJet algorithm was developed in parallel to DeepVertex by other groups, while DeepVertex and the combinations are presented in the thesis for the first time.

This hands-on is largely inherited from a hands-on by Leonardo Giannini (SNS-CMS) for the Scientific Data Analysis School at Scuola Normale, November 2019.

A complete explanation of the tutorial is available for download here.

How to execute it

Way #1: Use Google Colab

Google's Colaboratory (https://colab.research.google.com/) is a tool that provides a Jupyter-like environment on a Google-hosted machine, with some added features, such as the possibility to attach a GPU or a TPU if needed.
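Since the tutorial relies on Keras + TensorFlow, a quick check along the following lines can confirm whether a GPU is actually visible to the runtime. This is only a minimal sketch and is not part of the original notebooks: the helper name `list_accelerators` is hypothetical, and TPU detection is not covered.

```python
def list_accelerators():
    """Return the names of the GPUs TensorFlow can see (empty list if none).

    Hypothetical helper, not taken from the tutorial notebooks.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return []
    # tf.config.list_physical_devices is available in TensorFlow 2.1 and later.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_accelerators()
    if gpus:
        print("GPUs visible to TensorFlow:", ", ".join(gpus))
    else:
        print("No GPU visible; the code will run on the CPU.")
```

If no GPU is reported in Colab, the runtime type can usually be changed from the notebook's runtime settings before re-running the check.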

You can open Colab directly by clicking on a .ipynb (Python Notebook) file. The notebooks for this tutorial can be found here. The .ipynb files are also available in the following attachment.

drive-download-20200317T155726Z-001.zip

The input files are on Google Drive and are loaded automatically by the notebooks. If needed, they are also available here and here.

In the following the most important excerpts are described.

Way #2: Use Python from a ML-INFN Virtual machine

Another option is to use, instead of Colab, a real VM made available by ML-INFN (LINK MISSING), or any properly configured host.

In that case, you can run the Python code available from the following link. You will also need the input files, available here and here.

btagging_cms.tar.gz


The tar file, to be unpacked via

tar zxvf btagging_cms.tar.gz

contains 5 Python scripts with the same names as the notebooks. It can be downloaded to the virtual machine via

wget https://confluence.infn.it/download/attachments/33161478/btagging_cms.tar.gz


The 2 input files can be downloaded with
davix-get https://pandora.infn.it/public/307caa/dl/test93_0_20000.npz test93_0_20000.npz
davix-get https://pandora.infn.it/public/ec5e6a/dl/test93_20000_40000.npz test93_20000_40000.npz
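Once the two .npz files are downloaded, it can be useful to check what they contain before running the scripts. The following is a minimal sketch, assuming the files are standard NumPy archives readable without pickling; the names of the arrays stored inside are not documented on this page, so the code only lists whatever it finds. The helper name `summarize_npz` is hypothetical.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file.

    Hypothetical helper: it makes no assumption about the array names.
    """
    summary = {}
    # np.load on an .npz file returns a lazy archive; close it when done.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


if __name__ == "__main__":
    # File names as used in the download commands above.
    for filename in ("test93_0_20000.npz", "test93_20000_40000.npz"):
        if not os.path.exists(filename):
            print(f"{filename} not found; download it first")
            continue
        for name, (shape, dtype) in summarize_npz(filename).items():
            print(f"{filename}: {name} shape={shape} dtype={dtype}")
```

Listing shapes and dtypes first is a cheap way to verify that a download completed and that both files have the same layout.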

Way #3: Use Jupyter notebooks from a ML-INFN Virtual machine

This way is intermediate between #1 and #2: it still uses a browser environment and a Jupyter notebook, but the processing engine runs on an ML-INFN machine instead of Google's public cloud.

You can get access to a Jupyter environment from XXX (the same machine as in #2), but instead of logging in via ssh, you connect to hostname:888 and provide the password you selected at creation time. At that point, you can upload the .ipynb notebooks from drive-download-20200317T155726Z-001.zip.


Annotated Description

References
