...
- They are computed on statistics derived from the training dataset and therefore do not necessarily tell us which features are most important for making good predictions on held-out data.
- They favor high-cardinality features, that is, features with many unique values.

Permutation feature importance is an alternative to impurity-based feature importance that does not suffer from these flaws. Because of this added complexity, we will not show it in this exercise. Learn more here.
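For reference, permutation feature importance can be computed with scikit-learn's `permutation_importance`. This is a minimal sketch on a toy dataset; the variable names and synthetic data below are assumptions for illustration, not part of this exercise:

```python
# Sketch: permutation feature importance evaluated on HELD-OUT data,
# which avoids the two flaws of impurity-based importance listed above.
# The toy dataset and model here are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time in the TEST set and record how much
# the test score drops; a large drop means the feature matters.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, mean_drop in enumerate(result.importances_mean):
    print(f"feature {i}: {mean_drop:.4f}")
```

Because the score drop is measured on the test split, the ranking reflects usefulness for generalization rather than training-set statistics.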
...
A machine learning model has two types of parameters. The first type consists of the parameters that are learned during training, while the second type consists of the hyperparameters that we pass to the machine learning model.
Normally we set the values of these hyperparameters by hand, as we did for our ANN, and see which values give the best performance. However, manually trying out hyperparameter values can be exhausting.
It is also hard to compare the performance of different algorithms by setting their hyperparameters randomly, because one algorithm may perform better than another with one set of hyperparameters and worse with a different set.
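A systematic alternative to hand-tuning is an exhaustive grid search with cross-validation, e.g. scikit-learn's `GridSearchCV`. This is a minimal sketch; the toy dataset and the grid values below are assumptions for illustration:

```python
# Sketch: systematic hyperparameter tuning with GridSearchCV instead of
# setting hyperparameters by hand. Grid values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}

# Every combination in the grid is scored with 3-fold cross-validation,
# so different settings (and different algorithms) are compared fairly.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Because every model sees the same cross-validation folds, the comparison between hyperparameter settings is consistent rather than dependent on a lucky random choice.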
...
You can always achieve high accuracy on skewed/unbalanced datasets by predicting the same output (the most common one) for every input. Thus another metric, F1, can be used when the classes are imbalanced. It is defined in terms of the precision and recall as (2 × precision × recall) / (precision + recall). In our case, we will use a simplification of this metric: the product signal × efficiency.
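The definition above can be checked directly against scikit-learn's `f1_score`. The toy labels below are assumptions for illustration only:

```python
# Sketch: F1 computed from its definition, cross-checked with sklearn.
# y_true / y_pred are made-up toy labels for illustration.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

# F1 is the harmonic mean of precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)
print(f1_manual, f1_score(y_true, y_pred))  # both print 0.8
```

Unlike accuracy, F1 stays low for a classifier that ignores the minority class, since such a classifier has zero recall on it.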
...
(164560, 5) (164560, 1)
[[1.7398037e-05]
 [3.2408145e-01]
 [1.1487612e-04]
 ...
 [2.4130943e-01]
 [1.4921818e-05]
 [8.3920550e-01]]
Out[ ]:
| | 0 |
|---|---|
| 0 | 1.739804e-05 |
| 1 | 3.240815e-01 |
| 2 | 1.148761e-04 |
| 3 | 6.713818e-10 |
| 4 | 4.403101e-01 |
# Check of the input dataset without y_true
df_challenge['input'].head()
Out[ ]:
| | f_massjj | f_deltajj | f_mass4l | f_Z1mass | f_Z2mass |
|---|---|---|---|---|---|
| 455391 | 37.84924 | 0.265574 | 250.71579 | 90.258550 | 92.479004 |
| 420 | 184.81750 | 2.526370 | 124.09136 | 51.867190 | 19.181890 |
| 577983 | 98.07868 | 1.013983 | 242.51971 | 90.871605 | 84.223770 |
| 588664 | 311.60022 | 3.564375 | 445.38763 | 96.455420 | 85.431280 |
| 694879 | 314.22964 | 1.997842 | 87.87457 | 60.768753 | 17.779630 |
# Converting the dataframe into a csv file
# Modify the 'answer.csv' string in the line of code below and insert your name and the ML model trained (rf or nn)!
# Example: df_answer.to_csv('mario_rossi_rf_4mu.csv')
df_answer.to_csv('answer_2017_trial.csv')
print('Your y_pred has been created! Download it from your Drive directory!\n')
!ls -l
...