"- The amount in february is a little less than the amount in january\n",
"- The amount grows after february"
]
},
{
"cell_type": "markdown",
"id": "b88f48fe",
"metadata": {},
"source": [
"Another method consists of using the [LOESS](https://fr.wikipedia.org/wiki/R%C3%A9gression_locale)"
]
},
{
"cell_type": "markdown",
"id": "6cca3ef7",
"metadata": {},
"source": [
"We decompose the data with the STL function\n",
"\n",
"According to the [manual](https://www.statsmodels.org/v0.11.0/generated/statsmodels.tsa.seasonal.STL.html) ,\"The period (12) is automatically detected from the data’s frequency (‘M’)\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0316887d",
"metadata": {},
"outputs": [],
"source": [
"from statsmodels.tsa.seasonal import STL\n",
"res = STL(ts).fit()\n",
"res.plot()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "4af9c502",
"metadata": {},
"source": [
"We obtain a decomposition with the trend, the seasonal component and a stationary time series (the resid)\n",
"\n",
"We have to verify if is the resid is really stationary"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58dd2521",
"metadata": {},
"outputs": [],
"source": [
"test = adfuller(res.resid, autolag='AIC')\n",
"pvalue = test[1]\n",
"print(pvalue)"
]
},
{
"cell_type": "markdown",
"id": "3fbbfdfa",
"metadata": {},
"source": [
"The p-value is approximately 7.5e-11 so we can conclude that the resid is a stationary time series (as expected)"
]
},
{
"cell_type": "markdown",
"id": "46547d97",
"metadata": {},
"source": [
"We divide the data into two data sets: a training data set and a test dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fb793a3",
"metadata": {},
"outputs": [],
"source": [
"ts_train=ts[:\"2009-01-01\"]\n",
"ts_test=ts[\"2009-01-01\":]"
]
},
{
"cell_type": "markdown",
"id": "bc7757d9",
"metadata": {},
"source": [
"We use the `forecast` function of STL on the training data set in order to compare the result with the test data set and we plot them together\n",
"\n",
"According to the [manual] (https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima.model.ARIMA.html), we use an ARIMA model because we are working with integration models"
By plotting the data, we can see that the mean and the standard deviation do not seem to be constant, so the time series is probably not stationary.
To confirm this, we perform an augmented Dickey-Fuller test.
%% Cell type:code id:61b00901 tags:
``` python
from statsmodels.tsa.stattools import adfuller

# perform an augmented Dickey-Fuller test
test = adfuller(ts.V1, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:a49775f2 tags:
The p-value is 0.79, so we are highly confident that the data is not stationary, as expected.
%% Cell type:markdown id:bf951e7a tags:
By inspecting the data, we first see a trend (debit card usage increases over time).
We also see regular peaks.
We will "zoom" in on the data to see when those peaks happen.
%% Cell type:code id:68947c59 tags:
``` python
plt.rcParams['figure.figsize'] = [12, 5]
# zoom in on three years to locate the peaks
ts_zoom = ts['2000-01-01':'2003-01-01']
plt.plot(ts_zoom)
```
%% Cell type:markdown id:701d104d tags:
We can see that the peaks seem to appear annually, in December (which is quite logical).
We will thus presume a seasonality of 12 months in the data.
We thus have:
- A global increasing trend over time
- A seasonal effect with a period of twelve months
- A (maybe) stationary time series
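The presumed 12-month seasonality can also be checked numerically by averaging the series by calendar month; a minimal sketch (not in the original notebook), shown here on a synthetic monthly series standing in for `ts`:

```python
import numpy as np
import pandas as pd

# synthetic stand-in for `ts`: upward trend plus a December spike
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
values = 100.0 + np.arange(48) + np.where(idx.month == 12, 50.0, 0.0)
ts_demo = pd.Series(values, index=idx)

# average by calendar month; the peak month should be December (12)
monthly_means = ts_demo.groupby(ts_demo.index.month).mean()
peak_month = monthly_means.idxmax()
print(peak_month)
```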
%% Cell type:markdown id:066f28ff tags:
We will first assume a roughly constant growth.
We will thus use an integration of order 1 to reduce this effect.
By reading [the documentation on SARIMAX](https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_stata.html#ARIMA-Example-2:-Arima-with-additive-seasonal-effects) we decided to try the following:
The residuals seem to be very close to a normal distribution (especially on the Q-Q plot), but we see some correlation between them.
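The correlation seen in the residual diagnostics could be quantified; a minimal sketch (not from the notebook) computing the lag-1 sample autocorrelation, on synthetic white noise standing in for `res.resid`:

```python
import numpy as np

# synthetic white noise standing in for the model residuals
rng = np.random.default_rng(42)
resid_demo = rng.normal(size=500)

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# rule of thumb: |r1| above ~2/sqrt(n) suggests residual correlation
r1 = lag1_autocorr(resid_demo)
print(r1)
```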
%% Cell type:markdown id:979b8d56 tags:
Using this model, we can try to predict the cumulated debit card usage for the 4 first months of 2013.
%% Cell type:code id:a23c5733 tags:
``` python
# map the forecast back from the log scale used when fitting
forecast = np.exp(res.forecast(4))
ts.plot(label='Data', legend=True)
forecast.plot(label='Forecast', legend=True)
```
%% Cell type:markdown id:b83cccc1 tags:
The obtained predictions seem coherent with our data:
- The amount in January is far less than the amount of the December peak
- The amount in February is a little less than the amount in January
- The amount grows after February
%% Cell type:markdown id:b88f48fe tags:
Another method consists of using [LOESS](https://fr.wikipedia.org/wiki/R%C3%A9gression_locale) (locally estimated scatterplot smoothing).
%% Cell type:markdown id:6cca3ef7 tags:
We decompose the data with the STL function.
According to the [manual](https://www.statsmodels.org/v0.11.0/generated/statsmodels.tsa.seasonal.STL.html), "The period (12) is automatically detected from the data’s frequency (‘M’)".
%% Cell type:code id:0316887d tags:
``` python
from statsmodels.tsa.seasonal import STL
res = STL(ts).fit()
res.plot()
plt.show()
```
%% Cell type:markdown id:4af9c502 tags:
We obtain a decomposition with the trend, the seasonal component and a residual that should be a stationary time series.
We have to verify that the residual is really stationary.
%% Cell type:code id:58dd2521 tags:
``` python
test = adfuller(res.resid, autolag='AIC')
pvalue = test[1]
print(pvalue)
```
%% Cell type:markdown id:3fbbfdfa tags:
The p-value is approximately 7.5e-11, so we can conclude that the residual is a stationary time series (as expected).
%% Cell type:markdown id:46547d97 tags:
We divide the data into two sets: a training set and a test set.
%% Cell type:code id:0fb793a3 tags:
``` python
ts_train = ts[:"2009-01-01"]
ts_test = ts["2009-01-01":]
```
%% Cell type:markdown id:bc7757d9 tags:
We use the `forecast` function of STL on the training set in order to compare the result with the test set, and we plot them together.
According to the [manual](https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima.model.ARIMA.html), we use an ARIMA model because our series requires integration.