Installation¶
We recommend installing pyprocessta in a dedicated virtual environment or conda environment. Note that we tested the code on Python 3.8.
The latest version of pyprocessta can be installed from GitHub using
pip install git+https://github.com/kjappelbaum/pyprocessta.git
Preprocessing¶
The pyprocessta.preprocess module provides basic preprocessing functions.
Aligning two dataframes¶
To align two dataframes, use
from pyprocessta.preprocess.align import align_two_dfs
aligned_dataframe = align_two_dfs(dataframe_a, dataframe_b)
Filtering and smoothing¶
To perform basic filtering operations you can use
from pyprocessta.preprocess.smooth import z_score_filter, exponential_window_smoothing
dataframe_no_spikes = z_score_filter(dataframe)
dataframe_smoothed = exponential_window_smoothing(dataframe)
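To make the two operations concrete, here is a minimal, dependency-free sketch of the underlying ideas on a plain list of numbers. It is not the pyprocessta implementation: the helper names z_score_mask and exponential_smoothing, the threshold of 2.0, and the smoothing factor alpha are all assumptions chosen for illustration, and the library's window-based smoothing may differ from the simple recursive form shown here.

```python
from statistics import mean, pstdev


def z_score_mask(values, threshold=2.0):
    """Return True for every point whose absolute z-score is <= threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no outliers by this criterion.
        return [True] * len(values)
    return [abs((v - mu) / sigma) <= threshold for v in values]


def exponential_smoothing(values, alpha=0.3):
    """Recursive smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


# Toy signal with a single spike (hypothetical data).
signal = [1.0, 1.1, 0.9, 1.0, 25.0, 1.05, 0.95, 1.0]
keep = z_score_mask(signal)
cleaned = [v for v, k in zip(signal, keep) if k]
smoothed = exponential_smoothing(cleaned)
```

Note that for short series the largest attainable z-score is bounded, so a threshold that works on long process data can fail to flag anything on a handful of points.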
Detrending¶
Often, it can be useful to remove trend components from time series data. One can distinguish stochastic and deterministic trend components, and we provide utilities to remove both
from pyprocessta.detrend import detrend_stochastic, detrend_linear_deterministc
dataframe_no_linear_trend = detrend_linear_deterministc(input_dataframe)
dataframe_no_stochastic_trend = detrend_stochastic(input_dataframe)
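The difference between the two kinds of trend can be illustrated with a small sketch. A deterministic linear trend is commonly removed by subtracting a least-squares line, while a stochastic trend is commonly removed by differencing. Whether pyprocessta uses exactly these procedures is an assumption; the helper names detrend_linear and first_difference are hypothetical.

```python
def detrend_linear(values):
    """Subtract the ordinary-least-squares line fitted against the sample index."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(values)]


def first_difference(values):
    """Return x_t - x_{t-1}; the result is one element shorter than the input."""
    return [b - a for a, b in zip(values, values[1:])]


# A purely linear series (hypothetical data): residuals vanish after detrending,
# and the first difference is constant and equal to the slope.
trend = [2.0 + 0.5 * t for t in range(10)]
residuals = detrend_linear(trend)
steps = first_difference(trend)
```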
Resampling¶
For many applications it is important to have data sampled on a regular grid. To resample data onto such a grid you can use
from pyprocessta.resample import resample_regular
data_resampled = resample_regular(input_dataframe, interval='2min')
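As an illustration of what resampling onto a regular grid means, the following sketch linearly interpolates irregularly spaced readings onto a fixed interval using only the standard library. The function name resample_linear and the choice of linear interpolation are assumptions for this example; resample_regular may aggregate or interpolate differently.

```python
from datetime import datetime, timedelta


def resample_linear(timestamps, values, interval):
    """Linearly interpolate onto a regular grid starting at the first timestamp.

    Assumes at least two timestamps, sorted in ascending order.
    """
    grid, out = [], []
    t = timestamps[0]
    i = 0
    while t <= timestamps[-1]:
        # Advance to the segment [timestamps[i], timestamps[i + 1]] that contains t.
        while i < len(timestamps) - 2 and timestamps[i + 1] < t:
            i += 1
        t0, t1 = timestamps[i], timestamps[i + 1]
        weight = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
        grid.append(t)
        out.append(values[i] + weight * (values[i + 1] - values[i]))
        t += interval
    return grid, out


# Irregular readings at 0 s, 90 s, 200 s and 360 s (hypothetical data).
start = datetime(2010, 1, 18, 12, 0, 0)
times = [start + timedelta(seconds=s) for s in (0, 90, 200, 360)]
readings = [0.0, 9.0, 20.0, 36.0]
grid, resampled = resample_linear(times, readings, timedelta(minutes=2))
```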
EDA¶
Test for stationarity¶
One of the most important tests before modeling time series data is to check for stationarity (since many of the “simple” time series models assume stationarity).
from pyprocessta.eda.statistics import check_stationarity
test_results = check_stationarity(input_dataseries)
This performs the augmented Dickey–Fuller (ADF) test and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test.
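The two tests have opposite null hypotheses: ADF assumes a unit root (non-stationarity), whereas KPSS assumes stationarity. Reading them together is therefore more informative than reading either alone. The sketch below encodes the usual four-way interpretation. It takes two p-values directly, because the structure of the object returned by check_stationarity is not described here; the function name interpret_stationarity and the 0.05 significance level are assumptions.

```python
def interpret_stationarity(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values into a single verdict."""
    adf_rejects_unit_root = adf_pvalue < alpha        # evidence for stationarity
    kpss_rejects_stationarity = kpss_pvalue < alpha   # evidence against stationarity

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    if adf_rejects_unit_root and kpss_rejects_stationarity:
        # The tests disagree: differencing is the usual remedy.
        return "difference-stationary"
    # Neither test rejects: removing a deterministic trend is the usual remedy.
    return "trend-stationary"


verdict = interpret_stationarity(adf_pvalue=0.01, kpss_pvalue=0.20)
```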
Granger causality¶
One interesting analysis is to check for “correlations” between different time series, that is, whether past values of one series help to predict another. In time series terms, this means testing for Granger causality. To perform this analysis, you can use
from pyprocessta.eda.statistics import compute_granger_causality_matrix
causality_matrix = compute_granger_causality_matrix(input_dataframe)
The matrix can, for example, be plotted as a heatmap. It highlights the maximum “correlation” between two series (up to some maximum lag).
Training a TCN model¶
The temporal convolutional network (TCN) implementation uses the darts library. The only change is that dropout can also be enabled at inference time.
from pyprocessta.model.tcn import run_model, get_train_test_data, transform_data, get_data
x_timeseries, y_timeseries = get_data(my_dataframe, targets=my_targets, features=my_features)
train_tuple, test_tuple = get_train_test_data(x_timeseries, y_timeseries, split_date="2010-01-18 12:59:15")
train_tuple, test_tuples, transformers = transform_data(train_tuple, [test_tuple])
model = run_model(train_tuple)
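Keeping dropout active at inference time is what allows Monte Carlo dropout uncertainty estimates: the same input is passed through the model many times with different random dropout masks, and the spread of the predictions serves as an uncertainty measure. The toy sketch below shows the mechanism on a single linear layer with dropout applied to its inputs. It does not use darts or pyprocessta, and every name in it (dropout_forward, mc_dropout_predict, the weights and inputs) is hypothetical; in a real TCN the masks act on hidden activations.

```python
import random
from statistics import mean, stdev


def dropout_forward(inputs, weights, drop_prob, rng):
    """One stochastic pass of a toy linear model with inverted dropout."""
    keep_prob = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(inputs, weights):
        if rng.random() < keep_prob:
            # Rescale kept units so the expected output matches the no-dropout output.
            total += x * w / keep_prob
    return total


def mc_dropout_predict(inputs, weights, drop_prob=0.2, n_samples=500, seed=0):
    """Return the mean and standard deviation over repeated stochastic passes."""
    rng = random.Random(seed)
    draws = [dropout_forward(inputs, weights, drop_prob, rng) for _ in range(n_samples)]
    return mean(draws), stdev(draws)


weights = [0.5, -0.2, 0.8, 0.1]
inputs = [1.0, 2.0, 3.0, 4.0]
prediction, uncertainty = mc_dropout_predict(inputs, weights)
```

With drop_prob set to 0.0 every pass is identical, so the standard deviation collapses to zero, which is the behavior of a network whose dropout is disabled at inference.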
Causal impact analysis¶
Causal impact analysis allows one to estimate the effect of an intervention in the absence of a control experiment. To do so, one builds a model of how the system would have behaved without the intervention. The approach used in the original causal impact paper relies on Bayesian structural time series models which, simply speaking, model time series via two key equations: an observation equation that connects a latent, unobserved state to the observations, and a state equation that describes the transition between states. The model is then defined by the choice of state components and their transitions (e.g., local level and seasonality). An efficient Python implementation of this is provided by the tfcausalimpact package. We provide a wrapper for this
from pyprocessta.causalimpact import run_causal_impact_analysis
ci = run_causal_impact_analysis(
df=data,
x_columns=["a", "b", "c"],
intervention_column="a",
y_column="e",
start=[s_0, s_1],
end=[e_0, e_1],
)
Here, ci is the object returned by the analysis.
In our work, we used the causal impact framework with TCN models with Monte Carlo dropout uncertainty estimates. You can find the code for this in the paper directory.
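The two equations of a structural time series model can be illustrated with the simplest case, the local level model, in which the latent level follows a random walk and each observation is the level plus noise. The sketch below simulates such a series and then computes a pointwise and cumulative effect as the difference between an observed series and a counterfactual. It is a toy under stated assumptions, not what tfcausalimpact or run_causal_impact_analysis does internally: the function names, noise levels, and the artificial lift of 2.0 are hypothetical, and the simulated baseline stands in for a model prediction.

```python
import random


def simulate_local_level(n_steps, level_sd, obs_sd, start_level=0.0, seed=0):
    """Simulate a local level model.

    Observation equation: y_t      = mu_t + eps_t,  eps_t ~ N(0, obs_sd^2)
    State equation:       mu_{t+1} = mu_t + eta_t,  eta_t ~ N(0, level_sd^2)
    """
    rng = random.Random(seed)
    level = start_level
    levels, observations = [], []
    for _ in range(n_steps):
        levels.append(level)
        observations.append(level + rng.gauss(0.0, obs_sd))
        level = level + rng.gauss(0.0, level_sd)
    return levels, observations


def pointwise_effect(observed, counterfactual):
    """Return per-step differences and their sum (the cumulative effect)."""
    effects = [o - c for o, c in zip(observed, counterfactual)]
    return effects, sum(effects)


levels, baseline = simulate_local_level(100, level_sd=0.1, obs_sd=0.5)

# Add a known lift after step 60 to mimic an intervention.
intervention_start = 60
observed = [y + (2.0 if t >= intervention_start else 0.0) for t, y in enumerate(baseline)]

effects, cumulative = pointwise_effect(
    observed[intervention_start:], baseline[intervention_start:]
)
```

In a real analysis the counterfactual is not known and must be predicted from the pre-intervention period and the covariates, which is where the uncertainty of the estimated effect comes from.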