assert(type(ts.index) is pd.core.indexes.datetimes.DatetimeIndex)
```
%% Cell type:code id:c5271112 tags:
```python
# MS: month start frequency
ts.index.freq = "MS"
```
%% Cell type:code id:919229a4 tags:
```python
plt.plot(ts.V1)
```
%% Cell type:markdown id:615221ca tags:
By plotting the data, we can see that the mean and the standard deviation do not seem to be constant, so the time series is probably not stationary.
Still, we perform an augmented Dickey-Fuller test to check whether the time series is stationary.
%% Cell type:code id:61b00901 tags:
```python
from statsmodels.tsa.stattools import adfuller
# Perform the augmented Dickey-Fuller test (null hypothesis: unit root, i.e. not stationary)
test = adfuller(ts.V1, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:3c24d32d tags:
The p-value is approximately 0.78, much higher than the usual 0.05 significance threshold, so we cannot reject the null hypothesis of a unit root: the data does not appear to be stationary.
%% Cell type:markdown id:e19bf18c tags:
We use a Box-Cox transformation in order to obtain a stationary time series.
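As a rough sketch of what such a transformation could look like (the notebook's actual cell is not shown here), the following uses `scipy.stats.boxcox` on a synthetic positive series. The names `demo`, `demo_boxcox` and `demo_stationary` are hypothetical, and the differencing step is an assumption: Box-Cox mainly stabilises the variance, so a trend usually still has to be removed separately.

```python
import numpy as np
import pandas as pd
from scipy.special import inv_boxcox
from scipy.stats import boxcox

# Synthetic, strictly positive monthly series with growing level and variance
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
rng = np.random.default_rng(0)
values = np.exp(0.02 * np.arange(120) + 0.1 * rng.normal(size=120))
demo = pd.Series(values, index=idx)

# Box-Cox requires positive data; lambda is estimated by maximum likelihood
transformed, lam = boxcox(demo.to_numpy())
demo_boxcox = pd.Series(transformed, index=demo.index)

# Assumption: first-order differencing removes the remaining trend
demo_stationary = demo_boxcox.diff().dropna()

# The transformation can be inverted to map forecasts back to the original scale
restored = inv_boxcox(transformed, lam)
print(lam, len(demo_stationary))
```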
test = adfuller(ts_stationary_train, autolag='AIC')
pvalue = test[1]
print(pvalue)
test = adfuller(ts_stationary_test, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:66abd1a8 tags:
If we split the data in two parts, the first part seems to be stationary because the p-value of the ADF test is reported as 0.0, well below the usual 0.05 threshold, so the unit-root null hypothesis is rejected.
The second part, however, does not appear to be stationary: its p-value is almost 1.
%% Cell type:code id:4b35ea92 tags:
```python
print(type(ts_stationary_train))
plt.plot(ts_stationary_train)
```
%% Cell type:code id:70235c46 tags:
```python
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.graphics.tsaplots import plot_acf
"assert(type(ts.index) is pd.core.indexes.datetimes.DatetimeIndex)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5271112",
"metadata": {},
"outputs": [],
"source": [
"# MS: month start frequency\n",
"ts.index.freq = \"MS\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "919229a4",
"metadata": {},
"outputs": [],
"source": [
"plt.plot(ts.V1)"
]
},
{
"cell_type": "markdown",
"id": "615221ca",
"metadata": {},
"source": [
"By plotting the data, we can see that the mean and the standard deviation do not seem to be constant, so the time series is probably not stationary.\n",
"Still, we perform an augmented Dickey-Fuller test to check whether the time series is stationary."
"The p-value is 0.79, so we cannot reject the null hypothesis of a unit root: the data does not appear to be stationary, as we expected."
]
},
{
"cell_type": "markdown",
"id": "3d4ddcfd",
"metadata": {},
"source": [
"By inspecting the data, we first see a trend (debit card usage increases over time).\n",
"\n",
"We also see regular peaks.\n",
"We will \"zoom\" in on the data to see when those peaks happen."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05987f75",
"metadata": {},
"outputs": [],
"source": [
"plt.rcParams['figure.figsize'] = [12, 5]\n",
"ts_zoom = ts['2000-01-01':'2003-01-01']\n",
"plt.plot(ts_zoom)"
]
},
{
"cell_type": "markdown",
"id": "fdc7d106",
"metadata": {},
"source": [
"We can see that the peaks seem to appear annually in December (which is quite logical).\n",
"We will thus assume a seasonality of 12 months in the data.\n",
"\n",
"We thus have:\n",
"- A global increasing trend over time\n",
"- A seasonal effect with a period of twelve months\n",
"- A (possibly) stationary remainder once trend and seasonality are removed"
]
},
{
"cell_type": "markdown",
"id": "d7839e3a",
"metadata": {},
"source": [
"We will first assume a constant increase (a linear trend).\n",
"We will thus use differencing of order 1 (the \"I\" in ARIMA) to remove this effect.\n",
"\n",
"By reading [the documentation on SARIMAX](https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_stata.html#ARIMA-Example-2:-Arima-with-additive-seasonal-effects) we decided to try the following:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2d59724",
"metadata": {},
"outputs": [],
"source": [
"from statsmodels.tsa.statespace.sarimax import SARIMAX as sarimax\n",
"\n",
"ar = 1  # Degree of the autoregressive polynomial\n",
"ma = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)  # MA terms at lags 1 and 12: additive seasonal effect over twelve months\n",
"i = 1  # Order of differencing\n",
"\n",
"model = sarimax(ts.V1, trend='c', order=(ar, i, ma))\n",
"res = model.fit()\n",
"\n",
"plt.rcParams['figure.figsize'] = [10, 10]\n",
"_ = res.plot_diagnostics()"
]
},
{
"cell_type": "markdown",
"id": "46a5c37e",
"metadata": {},
"source": [
"These diagnostics show that the residuals are not really normally distributed and that some autocorrelation remains in them."
]
},
{
"cell_type": "markdown",
"id": "d9eed6c7",
"metadata": {},
"source": [
"By re-inspecting our data, we see that the variance might not be constant.\n",
"To counter this effect, we will try to use the same ARIMA model on the log of the data."
assert(type(ts.index) is pd.core.indexes.datetimes.DatetimeIndex)
```
%% Cell type:code id:c5271112 tags:
```
# MS: month start frequency
ts.index.freq = "MS"
```
%% Cell type:code id:919229a4 tags:
```
plt.plot(ts.V1)
```
%% Cell type:markdown id:615221ca tags:
By plotting the data, we can see that the mean and the standard deviation do not seem to be constant, so the time series is probably not stationary.
Still, we perform an augmented Dickey-Fuller test to check whether the time series is stationary.
%% Cell type:code id:61b00901 tags:
```
from statsmodels.tsa.stattools import adfuller
# Perform the augmented Dickey-Fuller test (null hypothesis: unit root, i.e. not stationary)
test = adfuller(ts.V1, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:a195677c tags:
The p-value is 0.79, so we cannot reject the null hypothesis of a unit root: the data does not appear to be stationary, as we expected.
%% Cell type:markdown id:3d4ddcfd tags:
By inspecting the data, we first see a trend (debit card usage increases over time).
We also see regular peaks.
We will "zoom" in on the data to see when those peaks happen.
%% Cell type:code id:05987f75 tags:
```
plt.rcParams['figure.figsize'] = [12, 5]
ts_zoom = ts['2000-01-01':'2003-01-01']
plt.plot(ts_zoom)
```
%% Cell type:markdown id:fdc7d106 tags:
We can see that the peaks seem to appear annually in December (which is quite logical).
We will thus assume a seasonality of 12 months in the data.
We thus have:
- A global increasing trend over time
- A seasonal effect with a period of twelve months
- A (possibly) stationary remainder once trend and seasonality are removed
%% Cell type:markdown id:d7839e3a tags:
We will first assume a constant increase (a linear trend).
We will thus use differencing of order 1 (the "I" in ARIMA) to remove this effect.
By reading [the documentation on SARIMAX](https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_stata.html#ARIMA-Example-2:-Arima-with-additive-seasonal-effects) we decided to try the following: