Commit 93289677 authored by Antoine Guillaume

Minor fixes & typos

parent 362f5931
......@@ -25,11 +25,11 @@ from sklearn.metrics import f1_score, balanced_accuracy_score, make_scorer
from sklearn.preprocessing import MinMaxScaler
# # Params
#
# In[3]:
# # Parameters
#
#Base path; all necessary folders are expected to be located inside it.
base_path = r"/home/prof/guillaume/"
......@@ -40,8 +40,7 @@ dataset_path = base_path+r"datasets/"
result_path = base_path+r"results/"
#If not None, CSV files containing data used by the TS-CHIEF java program will be output
#dataset_path+r"TSCHIEF/"
TSCHIEF_path = None
TSCHIEF_path = dataset_path+r"TSCHIEF/"
#If True, perform cross validation of all defined pipelines
do_cross_validation = True
......@@ -172,8 +171,6 @@ codes = np.unique(codes) #Unique event codes present in the data, in increasing
# In[6]:
##
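#R1 encoding: map each unique event code to its integer index (codes are sorted in increasing order by np.unique above)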
def get_R1_dict(codes):
return {x : i for i,x in enumerate(codes)}
......@@ -240,6 +237,8 @@ def apply_code_dict(df, code_dic, code_column='cod_evt'):
# # Define pipelines
# We now define the pipelines that we will use for cross-validation
#Number of features kept when feature selection based on extra random trees is performed
max_features=100
pipeline_dict = {}
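#Illustrative sketch (not part of the original script): a pipeline entry is assumed
#to chain scaling, feature selection from extra random trees (using max_features above)
#and a classifier; the key name and the estimators below are assumptions.
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
pipeline_dict["Example: MinMax + ETC selection + RF"] = Pipeline([
    ("scaler", MinMaxScaler()),
    #threshold=-np.inf keeps exactly max_features features, ranked by importance
    ("selector", SelectFromModel(ExtraTreesClassifier(n_estimators=300),
                                 threshold=-np.inf, max_features=max_features)),
    ("classifier", RandomForestClassifier(n_estimators=300)),
])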
......
......@@ -7,13 +7,15 @@ This is the companion repository of the "Time series classification for predicti
Time series classification (TSC) has gained a lot of attention in the past decade, and a number of methods for representing and classifying time series have been proposed.
Nowadays, methods based on convolutional networks and ensemble techniques represent the state of the art for time series classification. Techniques transforming time series into images or text also provide reliable ways to extract meaningful features or representations of time series. We compare the state-of-the-art representation and classification methods on a specific application, namely predictive maintenance from sequences of event logs. The contributions of this paper are twofold: introducing a new data set for predictive maintenance on automatic teller machine (ATM) log data and comparing the performance of different representation methods for predicting the occurrence of a breakdown. The problem is difficult since, unlike the classic case of predictive maintenance via signals from sensors, we have sequences of discrete event logs occurring at any time, and the length of the sequences, corresponding to life cycles, varies a lot.
When using this repository or the ATM dataset, please cite:
When using this repository or the ATM dataset, please cite (to be modified with official link to paper later):
**Link to paper**
```
https://arxiv.org/abs/2011.10996
```
## Required packages
The experiment were conducted with python 3.8, the following packages are required to run the script:
The experiments were conducted with Python 3.8; the following packages are required to run the script:
* numpy
* scikit-learn
......@@ -24,21 +26,23 @@ The experiment were conducted with python 3.8, the following packages are requir
If you wish to run ResNet for image classification, you will also need TensorFlow 2.x, and sktime-dl for InceptionTime.
## Parameters & Configuration
Configuration parameters are located at the beginning of CV_script; you MUST change base_path to match the directory of this project. The other parameters can be left as-is to reproduce the results of the paper.
To check or change the algorithm parameters, note that they are all redefined in custom wrapper classes to avoid errors; if a parameter is not specified in the wrapper's constructor, it is left at its default value (see the sketch below).
The representations methods are defined inside utils.representations and the classifications methods inside utils.classifications.
The representation methods are defined inside utils.representations and the classification methods inside utils.classifications.
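A hypothetical sketch of such a wrapper (the actual classes live in utils.representations and utils.classifications and may differ; the class name, the wrapped estimator and the exposed parameters below are assumptions):

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import RandomForestClassifier

class RandomForestWrapper(BaseEstimator, ClassifierMixin):
    """Expose only the parameters we want to control; the rest keep scikit-learn defaults."""
    def __init__(self, n_estimators=300, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth

    def fit(self, X, y):
        # Build and fit the underlying scikit-learn estimator with the exposed parameters
        self.model_ = RandomForestClassifier(n_estimators=self.n_estimators,
                                             max_depth=self.max_depth)
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)
```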
To change the parameters of TS-CHIEF, you can modify the values of the following arguments in the ts-chief script:
```bash
-trees="300" -s="ee:4,boss:50,rise:50"
```
If you want to give more predictive power to this algorithm, increasing the number of trees and the number of random split generated by each method (boss, rise, ...) is the way to go. We used those value to avoid memory errors, the shorter the input time series, the higher those values can be without causing trouble.
If you want to give more predictive power to this algorithm, increasing the number of trees and the number of random splits generated by each method (boss, rise, ...) is the way to go. We used those values to avoid memory errors; the shorter the input time series, the higher those values can be without causing trouble.
## How to get the ATM dataset
As the ATM dataset is the property of equensWorldline, you must first send an email to "intellectual-property-team-worldline@worldline.com" and "antoine.guillaume@equensworldline.com" to ask for authorization.
The compressed archive weighs around 50 MB, for a total uncompressed size of 575 MB. The dictionary of event codes will be supplied at the same time.
By default, the dataset is expected to be unzipped into a folder named "dataset" inside this project folder. This path can be changed in the parameter section of the CV_script file.
## Usage
......
......@@ -10,4 +10,3 @@ do
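#Run TS-CHIEF (tschief.jar) on each exported train/test CSV pair; the $size, $id_cv and $id_r loop variables select which files from datasets/TSCHIEF/ are used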
jdk/jdk-15/bin/java -Xms6G -Xmx12G -jar tschief.jar -train="datasets/TSCHIEF/data_Train_"$size"_"$id_cv"_R"$id_r".csv" -test="datasets/TSCHIEF/data_Test_"$size"_"$id_cv"_R"$id_r".csv" -out="results/TSCHIEF/" -repeats="1" -trees="300" -s="ee:4,boss:50,rise:50" -export="1" -verbosity="1" -shuffle="True" -target_column="last"
done
done