Commit 93289677 authored by Antoine Guillaume

Minor fixes & typos

parent 362f5931
......@@ -25,11 +25,11 @@ from sklearn.metrics import f1_score, balanced_accuracy_score, make_scorer
from sklearn.preprocessing import MinMaxScaler
# # Params
#
# In[3]:
# # Parameters
#
#Base path; all necessary folders are expected to be located inside it.
base_path = r"/home/prof/guillaume/"
......@@ -40,8 +40,7 @@ dataset_path = base_path+r"datasets/"
result_path = base_path+r"results/"
#If not None, CSV files containing data used by the TS-CHIEF java program will be output
#dataset_path+r"TSCHIEF/"
TSCHIEF_path = None
TSCHIEF_path = dataset_path+r"TSCHIEF/"
#If True, perform cross validation of all defined pipelines
do_cross_validation = True
......@@ -172,8 +171,6 @@ codes = np.unique(codes) #Unique event codes present in the data, in increasing
# In[6]:
##
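#R1 encoding: map each unique event code to its integer index (codes are sorted in increasing order by np.unique above)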
def get_R1_dict(codes):
return {x : i for i,x in enumerate(codes)}
......@@ -240,6 +237,8 @@ def apply_code_dict(df, code_dic, code_column='cod_evt'):
# # Define pipelines
# We now define the pipelines that we will use for cross-validation
#Number of features kept when feature selection based on extra random trees is performed
max_features=100
pipeline_dict = {}
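#Illustrative sketch (not part of the original script): a pipeline entry is assumed
#to chain scaling, feature selection from extra random trees (using max_features above)
#and a classifier; the key name and the estimators below are assumptions.
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
pipeline_dict["Example: MinMax + ETC selection + RF"] = Pipeline([
    ("scaler", MinMaxScaler()),
    #threshold=-np.inf keeps exactly max_features features, ranked by importance
    ("selector", SelectFromModel(ExtraTreesClassifier(n_estimators=300),
                                 threshold=-np.inf, max_features=max_features)),
    ("classifier", RandomForestClassifier(n_estimators=300)),
])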
......
......@@ -7,13 +7,15 @@ This is the companion repository of the "Time series classification for predicti
Time series classification (TSC) has gained a lot of attention in the past decade, and a number of methods for representing and classifying time series have been proposed.
Nowadays, methods based on convolutional networks and ensemble techniques represent the state of the art for time series classification. Techniques transforming time series into images or text also provide reliable ways to extract meaningful features or representations of time series. We compare the state-of-the-art representation and classification methods on a specific application, namely predictive maintenance from sequences of event logs. The contributions of this paper are twofold: introducing a new data set for predictive maintenance on automatic teller machine (ATM) log data and comparing the performance of different representation methods for predicting the occurrence of a breakdown. The problem is difficult since, unlike the classic case of predictive maintenance via signals from sensors, we have sequences of discrete event logs occurring at any time, and the length of the sequences, corresponding to life cycles, varies a lot.
When using this repository or the ATM dataset, please cite:
When using this repository or the ATM dataset, please cite (to be modified with official link to paper later):
**Link to paper**
```
https://arxiv.org/abs/2011.10996
```
## Required packages
The experiment were conducted with python 3.8, the following packages are required to run the script:
The experiments were conducted with Python 3.8; the following packages are required to run the script:
* numpy
* scikit-learn
......@@ -24,21 +26,23 @@ The experiment were conducted with python 3.8, the following packages are requir
If you wish to run ResNet for image classification, you will also need TensorFlow 2.x, and sktime-dl for InceptionTime.
## Parameters & Configuration
Configuration parameters are located at the beginning of CV_script; you MUST change base_path to match the directory of this project. The other parameters can be left as-is to reproduce the results of the paper.
To check or change the algorithm parameters, note that they are all redefined in custom wrapper classes to avoid errors; if a parameter is not specified in the wrapper's constructor, it is left at its default value (see the sketch below).
The representations methods are defined inside utils.representations and the classifications methods inside utils.classifications.
The representation methods are defined inside utils.representations and the classification methods inside utils.classifications.
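A hypothetical sketch of such a wrapper (the actual classes live in utils.representations and utils.classifications and may differ; the class name, the wrapped estimator and the exposed parameters below are assumptions):

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import RandomForestClassifier

class RandomForestWrapper(BaseEstimator, ClassifierMixin):
    """Expose only the parameters we want to control; the rest keep scikit-learn defaults."""
    def __init__(self, n_estimators=300, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth

    def fit(self, X, y):
        # Build and fit the underlying scikit-learn estimator with the exposed parameters
        self.model_ = RandomForestClassifier(n_estimators=self.n_estimators,
                                             max_depth=self.max_depth)
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)
```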
To change the parameters of TS-CHIEF, you can modify the values of the following arguments in the ts-chief script:
```bash
-trees="300" -s="ee:4,boss:50,rise:50"
```
If you want to give more predictive power to this algorithm, increasing the number of trees and the number of random split generated by each method (boss, rise, ...) is the way to go. We used those value to avoid memory errors, the shorter the input time series, the higher those values can be without causing trouble.
If you want to give more predictive power to this algorithm, increasing the number of trees and the number of random splits generated by each method (boss, rise, ...) is the way to go. We used those values to avoid memory errors; the shorter the input time series, the higher those values can be without causing trouble.
## How to get the ATM dataset
As the ATM dataset is the property of equensWorldline, you must first send an email to "intellectual-property-team-worldline@worldline.com" and "antoine.guillaume@equensworldline.com" to ask for authorization.
The compressed archive weighs around 50 MB, for a total uncompressed size of 575 MB. The dictionary of event codes will be supplied at the same time.
By default, the dataset is expected to be unzipped into a folder named "dataset" inside this project folder. This path can be changed in the parameter section of the CV_script file.
## Usage
......
......@@ -10,4 +10,3 @@ do
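#Run TS-CHIEF (tschief.jar) on each exported train/test CSV pair; the $size, $id_cv and $id_r loop variables select which files from datasets/TSCHIEF/ are used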
jdk/jdk-15/bin/java -Xms6G -Xmx12G -jar tschief.jar -train="datasets/TSCHIEF/data_Train_"$size"_"$id_cv"_R"$id_r".csv" -test="datasets/TSCHIEF/data_Test_"$size"_"$id_cv"_R"$id_r".csv" -out="results/TSCHIEF/" -repeats="1" -trees="300" -s="ee:4,boss:50,rise:50" -export="1" -verbosity="1" -shuffle="True" -target_column="last"
done
done