...
- They are computed on statistics derived from the training dataset and therefore do not necessarily tell us which features are most important for making good predictions on held-out data.
- They favor high-cardinality features, that is, features with many unique values.

Permutation feature importance is an alternative to impurity-based feature importance that does not suffer from these flaws. Because of this added complexity, we will not show it in this exercise. Learn more here.
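For reference, permutation feature importance can be computed with scikit-learn's `permutation_importance`. This is a minimal sketch on a toy dataset; the variable names and synthetic data below are assumptions for illustration, not part of this exercise:

```python
# Sketch: permutation feature importance evaluated on HELD-OUT data,
# which avoids the two flaws of impurity-based importance listed above.
# The toy dataset and model here are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time in the TEST set and record how much
# the test score drops; a large drop means the feature matters.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, mean_drop in enumerate(result.importances_mean):
    print(f"feature {i}: {mean_drop:.4f}")
```

Because the score drop is measured on the test split, the ranking reflects usefulness for generalization rather than training-set statistics.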
...
A machine learning model has two types of parameters. The first type consists of the parameters that are learned during training, while the second type consists of the hyperparameters that we pass to the machine learning model.
Normally we set the values of these hyperparameters by hand, as we did for our ANN, and see which values give the best performance. However, manually trying out hyperparameter values can be exhausting.
It is also hard to compare the performance of different algorithms by setting their hyperparameters randomly, because one algorithm may perform better than another with one set of hyperparameters and worse with a different set.
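A systematic alternative to hand-tuning is an exhaustive grid search with cross-validation, e.g. scikit-learn's `GridSearchCV`. This is a minimal sketch; the toy dataset and the grid values below are assumptions for illustration:

```python
# Sketch: systematic hyperparameter tuning with GridSearchCV instead of
# setting hyperparameters by hand. Grid values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}

# Every combination in the grid is scored with 3-fold cross-validation,
# so different settings (and different algorithms) are compared fairly.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Because every model sees the same cross-validation folds, the comparison between hyperparameter settings is consistent rather than dependent on a lucky random choice.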
...
You can always achieve high accuracy on skewed/unbalanced datasets by predicting the same output (the most common one) for every input. Thus another metric, F1, can be used when the classes are imbalanced. It is defined in terms of the precision and recall as (2 × precision × recall) / (precision + recall). In our case, we will use a simplification of this metric: the product signal × efficiency.
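The definition above can be checked directly against scikit-learn's `f1_score`. The toy labels below are assumptions for illustration only:

```python
# Sketch: F1 computed from its definition, cross-checked with sklearn.
# y_true / y_pred are made-up toy labels for illustration.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

# F1 is the harmonic mean of precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)
print(f1_manual, f1_score(y_true, y_pred))  # both print 0.8
```

Unlike accuracy, F1 stays low for a classifier that ignores the minority class, since such a classifier has zero recall on it.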
...
(164560, 5) (164560, 1)
[[1.7398037e-05]
 [3.2408145e-01]
 [1.1487612e-04]
 ...
 [2.4130943e-01]
 [1.4921818e-05]
 [8.3920550e-01]]
Out[ ]:
| | 0 |
|---|---|
| 0 | 1.739804e-05 |
| 1 | 3.240815e-01 |
| 2 | 1.148761e-04 |
| 3 | 6.713818e-10 |
| 4 | 4.403101e-01 |
# Check of the input dataset without y_true
df_challenge['input'].head()
Out[ ]:
| | f_massjj | f_deltajj | f_mass4l | f_Z1mass | f_Z2mass |
|---|---|---|---|---|---|
| 455391 | 37.84924 | 0.265574 | 250.71579 | 90.258550 | 92.479004 |
| 420 | 184.81750 | 2.526370 | 124.09136 | 51.867190 | 19.181890 |
| 577983 | 98.07868 | 1.013983 | 242.51971 | 90.871605 | 84.223770 |
| 588664 | 311.60022 | 3.564375 | 445.38763 | 96.455420 | 85.431280 |
| 694879 | 314.22964 | 1.997842 | 87.87457 | 60.768753 | 17.779630 |
# Converting the dataframe into a csv file
# Modify the 'answer.csv' string in the line of code below and insert your name and the ML model trained (rf or nn)!
# Example: df_answer.to_csv('mario_rossi_rf_4mu.csv')
df_answer.to_csv('answer_2017_trial.csv')
print('Your y_pred has been created! Download it from your Drive directory!\n')
!ls -l
...