...

  • They are computed from statistics derived from the training dataset and therefore do not necessarily tell us which features matter most for making good predictions on the held-out dataset.
  • They favor high-cardinality features, that is, features with many unique values.

Permutation feature importance is an alternative to impurity-based feature importance that does not suffer from these flaws.

Because of this complexity, we will not show it in this exercise. Learn more here.
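For the curious, permutation importance is straightforward to try with scikit-learn. The sketch below uses a synthetic dataset (all names and values hypothetical) rather than the tutorial's data: each feature's importance is the drop in held-out score observed when that feature's column is shuffled.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the tutorial's dataset (hypothetical)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Importance of a feature = how much the held-out score drops when
# that feature's values are randomly shuffled (repeated 10 times)
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)
```

Because the score is computed on held-out data, this avoids both flaws listed above.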

...

A machine learning model has two types of parameters. The first type is the parameters learned during training, while the second type is the hyperparameters that we pass to the machine learning model.

Normally we set the values of these hyperparameters by hand, as we did for our ANN, and see which values give the best performance. However, trying out parameter values for the algorithm at random can be exhausting.

It is also hard to compare the performance of different algorithms fairly when the hyperparameters are set by hand, because one algorithm may outperform another only for a particular set of parameters; change the parameters, and it may perform worse than the other algorithms.
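A systematic alternative to hand-tuning is a grid search with cross-validation, which scikit-learn provides out of the box. The sketch below uses a toy dataset and an arbitrary parameter grid (both hypothetical), not the tutorial's actual data or model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the tutorial's training set (hypothetical)
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Try every combination of the listed hyperparameter values, scoring
# each combination by 3-fold cross-validation
param_grid = {'n_estimators': [50, 100], 'max_depth': [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)  # best combination found
print(search.best_score_)   # its mean cross-validated accuracy
```

Because every candidate is scored the same way, this also makes comparisons between algorithms reproducible.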

...

On skewed/unbalanced datasets you can always achieve high accuracy by predicting the same output (the most common class) for every input. Thus another metric, F1, can be used when the classes are unbalanced, for example when there are more positive examples than negative ones. It is defined in terms of the precision and recall as (2 · precision · recall) / (precision + recall). In our case, we will use a simplification of this metric: the product signal × efficiency.
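As a quick check of the formula, the sketch below computes precision and recall on toy labels (hypothetical values) and verifies that the harmonic-mean formula matches scikit-learn's `f1_score`:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels for an unbalanced problem: 6 positives, 2 negatives (hypothetical)
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 0]

p = precision_score(y_true, y_pred)   # 4 true positives / 5 predicted positives = 0.8
r = recall_score(y_true, y_pred)      # 4 true positives / 6 actual positives ~ 0.667
f1 = 2 * p * r / (p + r)              # harmonic mean of precision and recall

assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
print(p, r, f1)
```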

...

(164560, 5)
(164560, 1)
[[1.7398037e-05]
 [3.2408145e-01]
 [1.1487612e-04]
 ...
 [2.4130943e-01]
 [1.4921818e-05]
 [8.3920550e-01]]
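Note that the model outputs probabilities, not class labels. To turn them into signal/background decisions you can threshold them, as sketched below on three of the values printed above (the 0.5 cutoff and the 1 = signal convention are assumptions, not part of the original):

```python
import numpy as np

# Three of the predicted probabilities from the output above
y_prob = np.array([1.7398037e-05, 3.2408145e-01, 8.3920550e-01])

# Threshold at 0.5: 1 = signal, 0 = background (assumed convention)
y_pred = (y_prob > 0.5).astype(int)
print(y_pred)  # -> [0 0 1]
```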

Out[ ]:



              0
0  1.739804e-05
1  3.240815e-01
2  1.148761e-04
3  6.713818e-10
4  4.403101e-01
# Check the input dataset (without y_true)
df_challenge['input'].head()


Out[ ]:


         f_massjj  f_deltajj    f_mass4l  f_Z1mass   f_Z2mass
455391   37.84924   0.265574  250.715799  0.258550  92.479004
42018     4.81750   2.526370  124.091365  1.867190  19.181890
577983   98.07868   1.013983  242.519719  0.871605  84.223770
588664  311.60022   3.564375  445.387639  6.455420  85.431280
694879  314.22964   1.997842   87.874576  0.768753  17.779630
# Convert the dataframe into a csv file
# Modify the 'answer.csv' string in the line of code below and insert your name and the ML model you trained (rf or nn)!
# Example: df_answer.to_csv('mario_rossi_rf_4mu.csv')
df_answer.to_csv('answer_2017_trial.csv')
print('Your y_pred has been created! Download it from your Drive directory!\n')
!ls -l

...