...
```python
# Plot the dijet invariant mass for signal, background and NN/RF-selected events
plt.rcParams['figure.figsize'] = (10, 5)  # set before the figure is created so it takes effect
X = np.linspace(0.0, 1000.0, 100)
df_all.loc[df_all['isSignal'] == 0, 'f_massjj'].plot.hist(bins=X, label='bkg', histtype='step', density=True)
df_all.loc[df_all['isSignal'] == 1, 'f_massjj'].plot.hist(bins=X, label='signal', histtype='step', density=True)
df_sel['f_massjj'].plot.hist(bins=X, label='NN', histtype='step', density=True)
df_sel_rf['f_massjj'].plot.hist(bins=X, label='RF', histtype='step', density=True)
plt.xlabel('massjj (GeV)')
plt.title('$m_{jj}$ normalized distribution', fontsize=12, fontweight='bold', color='r')
plt.legend(loc='upper right')
plt.xlim(0, 1000)
plt.show()
```
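The background and signal samples contain very different numbers of events, which is why every histogram here is drawn with the `density` option: each curve is rescaled to unit area so that only the shapes are compared. The sketch below reproduces that normalization with plain NumPy on toy data (the samples are made-up stand-ins, not the notebook's DataFrames). Two details are worth noting: `np.linspace(0.0, 1000.0, 100)` gives 100 bin edges, i.e. 99 bins, and events outside the binning range (for example m_jj above 1000 GeV) are left out of the normalization.

```python
import numpy as np

def normalized_hist(values, bins):
    """Bin `values` and rescale so the histogram has unit area (what density=True does)."""
    counts, edges = np.histogram(values, bins=bins)
    widths = np.diff(edges)
    # Only entries inside [bins[0], bins[-1]] are counted, so overflow events are ignored
    return counts / (counts.sum() * widths), edges

rng = np.random.default_rng(0)
# Toy samples of very different size (stand-ins, not the notebook's data)
bkg = rng.exponential(scale=200.0, size=50_000)
sig = rng.normal(loc=500.0, scale=100.0, size=2_000)
X = np.linspace(0.0, 1000.0, 100)

for sample in (bkg, sig):
    dens, edges = normalized_hist(sample, X)
    # Same result as NumPy's own density normalization
    assert np.allclose(dens, np.histogram(sample, bins=X, density=True)[0])
    # Unit area, independent of the sample size
    assert np.isclose(np.sum(dens * np.diff(edges)), 1.0)
```

Because the area, not the sum of bin contents, is normalized to 1, the bin heights depend on the bin width; the curves are only directly comparable because all of them share the same `X`.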
```python
# Plot the four-lepton invariant mass for signal, background and NN/RF-selected events
plt.rcParams['figure.figsize'] = (10, 5)
X = np.linspace(50, 400, 100)
df_all.loc[df_all['isSignal'] == 0, 'f_mass4l'].plot.hist(bins=X, label='bkg', histtype='step', log=True, density=True)
df_all.loc[df_all['isSignal'] == 1, 'f_mass4l'].plot.hist(bins=X, label='signal', histtype='step', log=True, density=True)
df_sel['f_mass4l'].plot.hist(bins=X, label='NN', histtype='step', log=True, density=True)
df_sel_rf['f_mass4l'].plot.hist(bins=X, label='RF', histtype='step', log=True, density=True)
plt.xlabel('f_mass4l (GeV)')
# Raw string so that "\m" is not treated as a (invalid) Python escape sequence
plt.title(r'$mass(4\mu)$ normalized distribution', fontsize=12, fontweight='bold', color='r')
plt.legend(loc='upper right')
plt.xlim(50, 400)
plt.show()
```
```python
# Plot the Z1 candidate mass for signal, background and NN/RF-selected events
plt.rcParams['figure.figsize'] = (10, 5)
X = np.linspace(20, 150, 100)
df_all.loc[df_all['isSignal'] == 0, 'f_Z1mass'].plot.hist(bins=X, label='bkg', histtype='step', log=True, density=True)
df_all.loc[df_all['isSignal'] == 1, 'f_Z1mass'].plot.hist(bins=X, label='signal', histtype='step', log=True, density=True)
df_sel['f_Z1mass'].plot.hist(bins=X, label='NN', histtype='step', log=True, density=True)
df_sel_rf['f_Z1mass'].plot.hist(bins=X, label='RF', histtype='step', log=True, density=True)
plt.xlabel('f_Z1mass (GeV)')
plt.title('$mass(Z_{1})$ normalized distribution', fontsize=12, fontweight='bold', color='r')
plt.legend(loc='upper right')
plt.xlim(20, 150)
plt.show()
```
```python
# Plot the Z2 candidate mass for signal, background and NN/RF-selected events
plt.rcParams['figure.figsize'] = (10, 5)
X = np.linspace(0.0, 150, 100)
df_all.loc[df_all['isSignal'] == 0, 'f_Z2mass'].plot.hist(bins=X, label='bkg', histtype='step', density=True)
df_all.loc[df_all['isSignal'] == 1, 'f_Z2mass'].plot.hist(bins=X, label='signal', histtype='step', density=True)
df_sel['f_Z2mass'].plot.hist(bins=X, label='NN', histtype='step', density=True)
df_sel_rf['f_Z2mass'].plot.hist(bins=X, label='RF', histtype='step', density=True)
plt.xlabel('f_Z2mass (GeV)')
plt.title('$mass(Z_{2})$ normalized distribution', fontsize=12, fontweight='bold', color='r')
plt.legend(loc='upper right')
plt.xlim(0.0, 150)
plt.show()
```
Let's do the same for some variables which we have not used during the training phase. What can you say about them?
```python
# Plot the leading-jet (Jet1) pseudorapidity for signal, background and NN/RF-selected events
plt.rcParams['figure.figsize'] = (10, 5)
X = np.linspace(-5.0, 5.0, 100)
df_all.loc[df_all['isSignal'] == 0, 'f_jet1_eta'].plot.hist(bins=X, label='bkg', histtype='step', density=True)
df_all.loc[df_all['isSignal'] == 1, 'f_jet1_eta'].plot.hist(bins=X, label='signal', histtype='step', density=True)
df_sel['f_jet1_eta'].plot.hist(bins=X, label='NN', histtype='step', density=True)
df_sel_rf['f_jet1_eta'].plot.hist(bins=X, label='RF', histtype='step', density=True)
# Raw strings so that "\e" is passed to mathtext instead of being read as a Python escape
plt.xlabel(r'$\eta$(Jet1)')
plt.title(r'$jet1(\eta)$ normalized distribution', fontsize=12, fontweight='bold', color='r')
plt.legend(loc='upper right')
plt.xlim(-5, 5)
plt.show()
```
```python
# Plot the second-jet (Jet2) pseudorapidity for signal, background and NN/RF-selected events
plt.rcParams['figure.figsize'] = (10, 5)
X = np.linspace(-5.0, 5.0, 100)
df_all.loc[df_all['isSignal'] == 0, 'f_jet2_eta'].plot.hist(bins=X, label='bkg', histtype='step', density=True)
df_all.loc[df_all['isSignal'] == 1, 'f_jet2_eta'].plot.hist(bins=X, label='signal', histtype='step', density=True)
df_sel['f_jet2_eta'].plot.hist(bins=X, label='NN', histtype='step', density=True)
df_sel_rf['f_jet2_eta'].plot.hist(bins=X, label='RF', histtype='step', density=True)
plt.xlabel(r'$\eta$(Jet2)')
plt.title(r'$jet2(\eta)$ normalized distribution', fontsize=12, fontweight='bold', color='r')
plt.legend(loc='upper right')
plt.xlim(-5, 5)
plt.show()
```
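The comparison of variables not used in training is done by eye from the plots. As an optional, quantitative complement (not part of the original exercise), the two-sample Kolmogorov-Smirnov statistic D measures the largest vertical gap between two empirical cumulative distributions: D close to 0 means the two shapes are similar, D close to 1 means they barely overlap. The sketch below computes D with NumPy only; `scipy.stats.ks_2samp` returns the same statistic together with a p-value. The toy eta samples are made up, and the commented usage line assumes the notebook's `df_all` and `df_sel` DataFrames.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Both empirical CDFs are step functions that only jump at observed values,
    # so evaluating them at every pooled data point is enough to find the maximum gap
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(1)
narrow = rng.normal(0.0, 1.5, size=5_000)  # toy eta distribution
wide = rng.normal(0.0, 2.5, size=5_000)    # toy eta distribution with larger spread
print(ks_statistic(narrow, narrow[:2_500]), ks_statistic(narrow, wide))

# Hypothetical usage with the notebook's DataFrames:
# ks_statistic(df_sel['f_jet1_eta'], df_all.loc[df_all['isSignal'] == 1, 'f_jet1_eta'])
```

A large D between the NN-selected events and the true signal for a variable the network never saw would hint that the selection sculpts, or fails to reproduce, that variable; a small D suggests it is left largely unbiased.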
Optional Exercise 1 - Change the decay channel
Question to students: What happens if you switch to a different decay channel? You can submit your model (see the ML challenge below) for this physical process as well!
Optional Exercise 2 - Merge the backgrounds
Question to students: Merge the backgrounds used so far to train our ML algorithms with the ROOT file named ttH_HToZZ_4L.root. In this case you will also use the QCD irreducible background. Uncomment the correct lines of code to proceed!
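Once every sample (including the one read from ttH_HToZZ_4L.root) has been loaded into a pandas DataFrame by the notebook's own reading code, which is not shown here, merging them is essentially a labelled concatenation. The sketch below is a hypothetical helper, not the notebook's actual cells: the frame names and values are made up, and it ignores any per-sample event weights (for example cross-section normalization) that the real training may need.

```python
import pandas as pd

def merge_backgrounds(background_frames, signal_frame, seed=42):
    """Stack several background samples (isSignal=0) with the signal sample (isSignal=1)."""
    bkg = pd.concat(background_frames, ignore_index=True)
    bkg["isSignal"] = 0
    sig = signal_frame.copy()
    sig["isSignal"] = 1
    merged = pd.concat([sig, bkg], ignore_index=True)
    # Shuffle so that signal and background events are mixed before any train/test split
    return merged.sample(frac=1.0, random_state=seed).reset_index(drop=True)

# Toy frames standing in for the notebook's samples (values are made up)
bkg_old = pd.DataFrame({"f_mass4l": [210.0, 250.0], "f_massjj": [80.0, 95.0]})
bkg_tth = pd.DataFrame({"f_mass4l": [124.0, 126.0, 125.5], "f_massjj": [300.0, 150.0, 220.0]})
sig = pd.DataFrame({"f_mass4l": [125.0, 124.8], "f_massjj": [700.0, 900.0]})

merged = merge_backgrounds([bkg_old, bkg_tth], sig)
print(merged["isSignal"].value_counts())
```

Adding a background sample changes the signal-to-background ratio of the training set, so it is worth checking the class balance (as the last line does) before retraining.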
Machine Learning challenge
Once you manage to improve the performance of the network (or of the random forest), you can submit your results and participate in our ML challenge. The challenge samples are available in this workspace, but the true labels (`isSignal`) have been removed, so you cannot compute the AUC yourself.
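The challenge is ranked by the area under the ROC curve (AUC). The organizers' scoring script is not shown here; in scikit-learn the standard function is `sklearn.metrics.roc_auc_score`. As a dependency-free sketch of the same quantity, the AUC equals the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event (ties count one half), which can be computed from ranks via the Mann-Whitney U statistic:

```python
import numpy as np

def roc_auc(y_true, y_score):
    """ROC AUC via the Mann-Whitney U statistic, with tied scores given average ranks."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    y_score = np.asarray(y_score, dtype=float).ravel()
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both signal and background events")
    # Average rank (1-based) of each distinct score value
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    ranks = ((ends - counts + 1 + ends) / 2.0)[inverse]
    # U = sum of signal ranks minus its smallest possible value
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Since only the ordering of the scores matters, any monotonic transformation of the network output leaves the AUC unchanged; this is also why the labels must be hidden for the challenge to be meaningful.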
- You can participate individually or as a team.
- The winner is the participant scoring the best AUC on the challenge samples!
- In the next cell you will find some lines of code for preparing an output csv file containing your y_pred for this new data set.
- Choose a meaningful name for your result csv file (e.g. your name or your team name, the model used for the training phase, and the decay channel, such as 4mu), but avoid submitting `results.csv`.
- Download the csv file and upload it here: https://recascloud.ba.infn.it/index.php/s/CnoZuNrlr3x7uPI
- You can submit multiple results, paying attention to name them accordingly (add the version number, such as `v1`, `v34`, etc.).
- You can use this exercise as a starting point (train over constituents).
- We will consider your best result for the final score.
- The winner will be asked to present the ML architecture!
Have fun!
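The naming advice in the list above (participant or team name, model, decay channel, version number) can be applied consistently with a small helper. This is a hypothetical convenience function, not something the challenge requires; the output pattern simply extends the notebook's own example name `mario_rossi_rf_4mu.csv` with a version suffix.

```python
import re

def _slug(text):
    """Lower-case `text` and collapse anything that is not a letter or digit into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", str(text).strip().lower()).strip("_")

def submission_filename(participant, model, channel, version):
    """Hypothetical helper: ('Mario Rossi', 'rf', '4mu', 2) -> 'mario_rossi_rf_4mu_v2.csv'."""
    name = "_".join(_slug(part) for part in (participant, model, channel))
    return f"{name}_v{int(version)}.csv"

print(submission_filename("Mario Rossi", "rf", "4mu", 2))  # → mario_rossi_rf_4mu_v2.csv
```

The result can be passed straight to `df_answer.to_csv(...)`; because every name carries an explicit version, a later submission never overwrites an earlier one.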
```python
### Evaluate performance on an independent sample
# DO NOT CHANGE BELOW!
import os

from google.colab import files  # Colab-only helper module (not used further in this cell)

# Renamed from `files` so the dictionary no longer shadows the google.colab module
challenge_files = {
    "input_hl.csv": "dBHt9vsvKDUkJNt",  # high-level features
}

!rm -f *.root  # IPython shell escape (notebook only)

for name, token in challenge_files.items():
    if not os.path.exists(name):
        exit_code = os.system(
            "wget -O %s --no-check-certificate 'https://recascloud.ba.infn.it/index.php/s/%s/download'" % (name, token)
        )
        if exit_code:
            raise IOError("Error in downloading the file %s : (%s)" % (name, token))

filename = {}
df_challenge = {}

# Open the data set without y_true (only the features used to train the previous NN model)
filename['input'] = 'input_hl.csv'
df_challenge['input'] = pd.read_csv(filename['input'])
print(df_challenge['input'].shape)
df_challenge['input'].columns = NN_VARS

X_challenge = np.asarray(df_challenge['input'].values).astype(np.float32)
ret = model.predict(X_challenge[:, 0:NDIM])
print(ret.shape)
print(ret)

# Convert y_pred into a DataFrame
df_answer = pd.DataFrame(ret)
df_answer.head()
```
```
(164560, 5)
(164560, 1)
[[1.7398037e-05]
 [3.2408145e-01]
 [1.1487612e-04]
 ...
 [2.4130943e-01]
 [1.4921818e-05]
 [8.3920550e-01]]
```
| | 0 |
|---|---|
| 0 | 1.739804e-05 |
| 1 | 3.240815e-01 |
| 2 | 1.148761e-04 |
| 3 | 6.713818e-10 |
| 4 | 4.403101e-01 |
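The printed `ret` has shape (164560, 1): one score per challenge event, stored as a column vector. Before writing the answer file it can be worth flattening the scores and checking that there is exactly one finite value per input row. The helper below is a hypothetical sketch; the column name `y_pred` is an assumption, since the challenge's expected header is not stated (the cell above writes an unnamed column `0`). For a scikit-learn random forest, `predict_proba(X)` returns one column per class, so the signal score would be `predict_proba(X)[:, 1]` when the classes are 0 and 1; passing the full two-column array here would be caught by the size check.

```python
import numpy as np
import pandas as pd

def predictions_to_frame(scores, n_events, column="y_pred"):
    """Flatten model scores to shape (N,), check them, and wrap them in a one-column DataFrame."""
    y = np.asarray(scores, dtype=float).reshape(-1)  # (N, 1) -> (N,)
    if y.size != n_events:
        raise ValueError(f"expected {n_events} scores, got {y.size}")
    if not np.all(np.isfinite(y)):
        raise ValueError("scores contain NaN or inf")
    return pd.DataFrame({column: y})

# Toy stand-in for `ret`, shaped like the printed output above
ret_toy = np.array([[1.7398037e-05], [3.2408145e-01], [8.3920550e-01]], dtype=np.float32)
df_toy = predictions_to_frame(ret_toy, n_events=3)
print(df_toy)
```

In the notebook, the corresponding call would presumably be `predictions_to_frame(ret, n_events=len(df_challenge['input']))`.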
```python
# Check the input data set (no y_true column)
df_challenge['input'].head()
```
| | f_massjj | f_deltajj | f_mass4l | f_Z1mass | f_Z2mass |
|---|---|---|---|---|---|
| 455391 | 37.84924 | 0.265574 | 250.71579 | 90.258550 | 92.479004 |
| 420 | 184.81750 | 2.526370 | 124.09136 | 51.867190 | 19.181890 |
| 577983 | 98.07868 | 1.013983 | 242.51971 | 90.871605 | 84.223770 |
| 588664 | 311.60022 | 3.564375 | 445.38763 | 96.455420 | 85.431280 |
| 694879 | 314.22964 | 1.997842 | 87.87457 | 60.768753 | 17.779630 |
```python
# Convert the DataFrame into a csv file.
# Replace the file name in the line below with your name (or team name), the ML model
# trained (rf or nn) and the decay channel. Example: df_answer.to_csv('mario_rossi_rf_4mu.csv')
df_answer.to_csv('answer_2017_trial.csv')
print('Your y_pred has been created! Download it from your Drive directory!\n')
!ls -l
```
```
Your y_pred has been created! Download it from your Drive directory!

total 12884
-rw-r--r-- 1 root root   37013 Apr 22 18:11 05.08-decision-tree.png
-rw-r--r-- 1 root root   46864 Apr 22 18:10 ANN_model.h5
-rw-r--r-- 1 root root 2981984 Apr 22 18:13 answer_2017_trial.csv
-rw-r--r-- 1 root root 9160896 Apr 22 18:13 input_hl.csv
-rw-r--r-- 1 root root   44288 Apr 22 18:08 model.png
-rw-r--r-- 1 root root  902391 Apr 22 18:12 rf_model.pkl
drwxr-xr-x 1 root root    4096 Apr 21 13:39 sample_data
```
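Before uploading, a quick round-trip check can confirm that the csv file really contains one score per event and that the values survived the conversion. This is an optional sketch, not part of the challenge instructions: it writes the DataFrame exactly as the cell above does (`to_csv` with the index as first column), reads it back with `index_col=0`, and compares shapes and values. The toy frame and temporary path are stand-ins for `df_answer` and your own file name.

```python
import os
import tempfile

import numpy as np
import pandas as pd

def write_and_check(df, path):
    """Write `df` to csv (index included) and verify that the file reads back unchanged."""
    df.to_csv(path)
    back = pd.read_csv(path, index_col=0)  # the first csv column holds the written index
    if back.shape != df.shape:
        raise ValueError(f"shape mismatch: wrote {df.shape}, read {back.shape}")
    if not np.allclose(back.to_numpy(dtype=float), df.to_numpy(dtype=float)):
        raise ValueError("values changed on round trip")
    return back

with tempfile.TemporaryDirectory() as tmp:
    toy = pd.DataFrame(np.array([[1.739804e-05], [3.240815e-01], [8.392055e-01]]))
    back = write_and_check(toy, os.path.join(tmp, "mario_rossi_rf_4mu.csv"))
    print(back.shape)
```

In the notebook, the call would be `write_and_check(df_answer, 'answer_2017_trial.csv')` (or your renamed file), which rewrites the same file and then validates it.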
Upload your results here:
https://recascloud.ba.infn.it/index.php/s/CnoZuNrlr3x7uPI