Installation

We recommend installing pyprocessta in a dedicated virtual environment or conda environment. Note that we tested the code on Python 3.8.

The latest version of pyprocessta can be installed from GitHub using

pip install git+https://github.com/kjappelbaum/pyprocessta.git
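As a sketch, setting up a dedicated conda environment before installing could look like this (the environment name is arbitrary):

```shell
# create and activate a fresh environment with the tested Python version
conda create -n pyprocessta python=3.8 -y
conda activate pyprocessta

# install the latest version from GitHub
pip install git+https://github.com/kjappelbaum/pyprocessta.git
```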

Preprocessing

For basic preprocessing functions the pyprocessta.preprocess module can be used.

Aligning to dataframes

To align two dataframes, use

from pyprocessta.preprocess.align import align_two_dfs

aligned_dataframe = align_two_dfs(dataframe_a, dataframe_b)
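To illustrate what alignment means conceptually (the sketch below is plain Python on {timestamp: value} mappings and is not the align_two_dfs implementation, which operates on pandas DataFrames), aligning two series amounts to restricting both to the timestamps they share:

```python
def align_two_series(series_a, series_b):
    """Align two {timestamp: value} series on their common timestamps.

    Conceptual sketch only; the library function works on DataFrames.
    """
    common = sorted(set(series_a) & set(series_b))
    return (
        {t: series_a[t] for t in common},
        {t: series_b[t] for t in common},
    )

a = {0: 1.0, 1: 2.0, 2: 3.0}
b = {1: 10.0, 2: 20.0, 3: 30.0}
a_aligned, b_aligned = align_two_series(a, b)
```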

Filtering and smoothing

To perform basic filtering operations you can use

from pyprocessta.preprocess.smooth import z_score_filter, exponential_window_smoothing

dataframe_no_spikes = z_score_filter(dataframe)
dataframe_smoothed = exponential_window_smoothing(dataframe)
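Conceptually (the library functions operate on pandas DataFrames and their exact parameters may differ), z-score filtering drops points that lie far from the mean, and exponential window smoothing blends each point with the running average of its history:

```python
from statistics import mean, stdev

def z_score_filter(values, threshold=3.0):
    """Drop points whose z-score exceeds the threshold (spike removal)."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) <= threshold * sigma]

def exponential_smoothing(values, alpha=0.3):
    """Exponentially weighted smoothing: mix each point with the running average."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed
```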

Detrending

Often, it can be useful to remove trend components from time series data. One can distinguish stochastic and deterministic trend components, and we provide utilities to remove both.

from pyprocessta.detrend import detrend_stochastic, detrend_linear_deterministc

dataframe_no_linear_trend = detrend_linear_deterministc(input_dataframe)
dataframe_no_stochastic_trend = detrend_stochastic(input_dataframe)
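To illustrate the two notions of trend (a sketch in plain Python, not the library implementation): a deterministic linear trend can be removed by fitting and subtracting a straight line, while a stochastic trend (e.g., a random walk) is removed by differencing:

```python
def detrend_linear(values):
    """Fit y = a + b*t by least squares and subtract the fitted line."""
    n = len(values)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(values) / n
    b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    a = y_mean - b * t_mean
    return [yi - (a + b * ti) for ti, yi in zip(t, values)]

def difference(values):
    """First differences remove a stochastic (random-walk) trend."""
    return [y1 - y0 for y0, y1 in zip(values, values[1:])]
```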

Resampling

For many applications it is important to have data sampled on a regular grid. To resample data onto such a grid you can use

from pyprocessta.resample import resample_regular

data_resampled = resample_regular(input_dataframe, interval='2min')
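Under the hood, resampling onto a regular grid amounts to interpolating the irregularly sampled observations at evenly spaced times. A minimal sketch with linear interpolation and numeric timestamps (the actual resample_regular works with pandas time indices and interval strings such as '2min'):

```python
from bisect import bisect_left

def resample_to_grid(times, values, interval):
    """Linearly interpolate (times, values) onto a regular grid."""
    grid = []
    t = times[0]
    while t <= times[-1]:
        grid.append(t)
        t += interval
    resampled = []
    for g in grid:
        i = bisect_left(times, g)
        if times[i] == g:
            # grid point coincides with an observation
            resampled.append(values[i])
        else:
            # interpolate between the two surrounding observations
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            resampled.append(v0 + (v1 - v0) * (g - t0) / (t1 - t0))
    return grid, resampled
```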

EDA

Test for stationarity

One of the most important tests before modeling time series data is to check for stationarity, since many of the “simple” time series models assume it.

from pyprocessta.eda.statistics import check_stationarity

test_results = check_stationarity(input_dataseries)

This will perform the Augmented Dickey-Fuller (ADF) and Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests.
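The two tests are complementary: the ADF null hypothesis is non-stationarity (a unit root), whereas the KPSS null hypothesis is stationarity. A sketch of how the two p-values are commonly combined into a verdict (the thresholds and return format are illustrative, not the actual output of check_stationarity):

```python
def interpret_stationarity(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into a rough verdict.

    ADF: null = unit root (non-stationary); a small p rejects non-stationarity.
    KPSS: null = stationary; a small p rejects stationarity.
    """
    adf_stationary = adf_p < alpha       # reject the unit-root null
    kpss_stationary = kpss_p >= alpha    # fail to reject the stationarity null
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    return "inconclusive"
```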

Granger causality

One interesting analysis is to check for “correlations” between different time series. In time series terms, this means testing for Granger causality. To perform this analysis, you can use

from pyprocessta.eda.statistics import compute_granger_causality_matrix

causality_matrix = compute_granger_causality_matrix(input_dataframe)

The matrix can, for example, be plotted as a heatmap; it highlights the maximum “correlation” between two series (up to some maximum lag).
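As a rough intuition for such a matrix (Granger causality itself is a regression-based test, not a plain correlation, so this sketch is only an analogy), one can compute, for every pair of series, the maximum absolute correlation over lags up to some maximum:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def max_lagged_correlation(x, y, max_lag=3):
    """Max |corr(x[t - lag], y[t])| over lags 0..max_lag."""
    best = 0.0
    for lag in range(max_lag + 1):
        if lag == 0:
            c = pearson(x, y)
        else:
            c = pearson(x[:-lag], y[lag:])
        best = max(best, abs(c))
    return best
```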

Training a TCN model

The temporal convolutional network (TCN) implementation uses the darts library. The only change is that we also make it possible to enable dropout during inference.

from pyprocessta.model.tcn import run_model, get_train_test_data, transform_data, get_data

x_timeseries, y_timeseries = get_data(my_dataframe, targets=my_targets, features=my_features)
train_tuple, test_tuple = get_train_test_data(x_timeseries, y_timeseries, split_date="2010-01-18 12:59:15")
train_tuple, test_tuples, transformers = transform_data(train_tuple, [test_tuple])

model = run_model(train_tuple)
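The point of keeping dropout active at inference (Monte Carlo dropout) is that repeating a stochastic forward pass yields a distribution of predictions whose spread serves as an uncertainty estimate. A toy sketch with a fixed linear “model” and random dropout masks (the actual TCN lives in darts; this only illustrates the sampling idea):

```python
import random

def mc_dropout_predict(weights, inputs, n_samples=200, p_drop=0.2, seed=0):
    """Repeat a stochastic forward pass with dropout; return mean and spread."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_samples):
        # randomly drop each weight with probability p_drop, rescale the rest
        pred = sum(
            (0.0 if rng.random() < p_drop else w / (1 - p_drop)) * x
            for w, x in zip(weights, inputs)
        )
        predictions.append(pred)
    mean = sum(predictions) / n_samples
    var = sum((p - mean) ** 2 for p in predictions) / n_samples
    return mean, var ** 0.5
```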

Causal impact analysis

Causal impact analysis allows one to estimate the effect of an intervention in the absence of a control experiment. To do so, one builds a model of how the system would have behaved without the intervention. The approach used in the original causal impact paper is based on Bayesian structural time series models which, simply speaking, describe a time series via two key equations: an observation equation that connects a latent, unobserved state to the observations, and a state equation that describes the transitions between states. A concrete model is then defined by choosing components for the state and its transitions (e.g., local level and seasonality). An efficient Python implementation of this approach is provided by the tfcausalimpact package, for which we provide a wrapper

from pyprocessta.causalimpact import run_causal_impact_analysis

ci = run_causal_impact_analysis(
    df=data,
    x_columns=["a", "b", "c"],
    intervention_column="a",
    y_column="e",
    start=[s_0, s_1],
    end=[e_0, e_1],
)

Here, ci is the object returned by the causal impact analysis, which holds its results.
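Conceptually, the analysis compares the observed post-intervention series with the model's counterfactual prediction: the pointwise difference is the estimated effect, and its running sum the cumulative effect. A sketch of that comparison (not the tfcausalimpact API):

```python
def causal_effect(observed, counterfactual):
    """Pointwise and cumulative effect of an intervention."""
    pointwise = [o - c for o, c in zip(observed, counterfactual)]
    cumulative = []
    total = 0.0
    for e in pointwise:
        total += e
        cumulative.append(total)
    return pointwise, cumulative
```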

In our work, we used the causal impact framework with TCN models with Monte-Carlo dropout uncertainty estimates. You can find the code for this in the paper directory.