Commit d81da020 authored by Antoine Guillaume's avatar Antoine Guillaume

Applying scikit-learn standards to custom wrappers

parent 0569d929
@@ -31,8 +31,8 @@ from sklearn.preprocessing import MinMaxScaler
 # In[3]:
 #Base path, all necessary folders are supposed to be contained in this one.
-base_path = r"!! REPLACE BY YOUR PATH !!"
+#base_path = r"!! REPLACE BY YOUR PATH !!"
+base_path = r"C:/Utilisateurs/A694772/Documents/ECMLPKDD_datacopy/"
 #Path to the life cycles CSV files.
 dataset_path = base_path+r"datasets/"
@@ -40,7 +40,8 @@ dataset_path = base_path+r"datasets/"
 result_path = base_path+r"results/"
 #If not None, CSV files containing data used by the TS-CHIEF java program will be outputed
-TSCHIEF_path = dataset_path+r"TSCHIEF/"
+#TSCHIEF_path = dataset_path+r"TSCHIEF/"
+TSCHIEF_path = None
 #If True, perform cross validation of all defined pipelines
 do_cross_validation = True
@@ -58,14 +59,14 @@ predictive_padding_hours = 48
 extended_infected_interval_hours = 24
 #Size of the PAA transform output
-size=1000
+size=500
 #Number of cross validation splits
-n_splits=10
+n_splits=2
 # Number of process to launch in parallel for cross validation of each pipeline.
 # Set to None if you don't have the setup to allow such speedups.
-n_cv_jobs=-1
+n_cv_jobs=None
 if dataset_path is not None and not exists(dataset_path):
     mkdir(dataset_path)
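For context on the `size` parameter above: it sets the output length of the PAA step. Here is a minimal sketch of Piecewise Aggregate Approximation (an illustration only, not the implementation used by CV_script), which reduces a series to `output_size` points by averaging roughly equal-width segments:

```python
def paa(series, output_size):
    """Piecewise Aggregate Approximation: average a series over
    output_size roughly equal-width segments."""
    n = len(series)
    result = []
    for i in range(output_size):
        start = i * n // output_size        # segment boundaries
        end = (i + 1) * n // output_size
        segment = series[start:end]
        result.append(sum(segment) / len(segment))
    return result
```

So with `size=500`, each input series is reduced to 500 averaged values before the downstream steps.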
@@ -34,8 +34,6 @@ Configuration parameters are located at the beginning of CV_script, you MUST cha
To change or check the algorithms' parameters: they are all redefined in custom wrapper classes to avoid errors, and any parameter not specified in a wrapper's constructor is left at its default value.
The representation methods are defined inside utils.representations and the classification methods inside utils.classifications.
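The wrapper convention described above follows the scikit-learn estimator contract. A hedged sketch of that contract (the class below is a hypothetical example, not one of the repository's actual wrappers): every constructor argument is stored untouched on an attribute of the same name, and the underlying model is only built in `fit`, so `get_params`, `set_params`, and cloning work inside cross-validation:

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import RandomForestClassifier

class RandomForestWrapper(BaseEstimator, ClassifierMixin):
    # Hypothetical wrapper: the parameters we want to control are
    # redefined here; anything not listed keeps the library default.
    def __init__(self, n_estimators=300, max_depth=None, random_state=None):
        self.n_estimators = n_estimators    # stored as-is, never transformed
        self.max_depth = max_depth
        self.random_state = random_state

    def fit(self, X, y):
        # Build the underlying estimator in fit(), not in __init__,
        # so that clone() and grid search behave correctly.
        self.model_ = RandomForestClassifier(
            n_estimators=self.n_estimators,
            max_depth=self.max_depth,
            random_state=self.random_state,
        )
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)
```

Parameters omitted from `__init__` (e.g. `criterion` here) stay at the library default, which is the behaviour described above.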
ResNet is left commented in the code, so you can run the other algorithms without a Tensorflow installation or a GPU without any impact.
## Usage
Extract the files of the dataset archive located in ~/datasets into the dataset folder
@@ -58,6 +56,25 @@ The runtime of this script is extremely long, one iteration take about 4 hours,
```bash
python TSCHIEF_results_to_csv.py
```
## Note on using sktime-dl for InceptionTime and ResNet
Both InceptionTime and ResNet are left commented in the code, so you can run the other algorithms without a Tensorflow installation or a GPU without any impact.
Depending on your installation, you might run into errors when fitting tensorflow models inside a scikit-learn cross-validation pipeline. Some of those issues can be fixed by making the wrappers for those models, defined in utils.classifications, inherit from the KerasClassifier wrapper of tensorflow.
To make those two algorithms part of the experiments, uncomment both their declarations in utils.classifications and the associated pipelines in CV_script.
About InceptionTime: sktime-dl is the package dedicated to deep learning built by the sktime authors. As it was still in active development at the time of writing, we had to make some modifications to its source code to be able to run InceptionTime.
From the latest version available on GitHub, we applied the following modifications:
* Fix import error from sktime utils: in sktime_dl/utils/_data.py, line 6, replace
```
from sktime.utils.data_container import tabularize, from_3d_numpy_to_nested
```
by
```
from sktime.utils.data_container import tabularize, from_3d_numpy_to_nested
```
* We also modified InceptionTime to use binary_crossentropy (change the loss name and use a sigmoid layer with a single output neuron) and weighted accuracy for early stopping. This is not mandatory but is better suited to our problem.
## Contributing