Photo by Aron Visuals on Unsplash

TIME SERIES I: FUNDAMENTAL CONCEPTS

Nebile Kodaz
Analytics Vidhya
6 min read · Jul 25, 2021


Time series is one of the main topics in data science. Machine learning and deep learning algorithms can predict business, finance, and science measures over a period of time; we call this "time series" analysis. Time series analysis keeps finding new real-life applications as related technologies such as IoT mature. For example, in a smart factory or an energy farm, there is sensor data that changes over time. More traditionally, sales forecasting is a time series problem too.

In my previous job, I tried to forecast the company's sales. As I said above, it is a very broad topic, and nobody becomes a time series expert in a week. Over time I am going deeper into the topic, and I want to share what I learn on my journey of analyzing time series. In this article, I will talk about the principal concepts of time series, and in the following articles, I will build a Python tutorial for a prediction algorithm.

Time Series

To begin with, the first concept is the time series itself: observations, or data, that change over time. The series contains past data and real-time data, and in a data science project we want to predict future data according to our goal. The data may come at different frequencies, such as hourly, daily, weekly, monthly, or annually. For example, stocks might be analyzed hourly or daily, while sales might be analyzed weekly or monthly.
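As a quick illustration (the sales numbers here are made up, just to show the mechanics), here is a minimal sketch of aggregating a daily series to weekly and monthly granularity with pandas:

```python
import numpy as np
import pandas as pd

# One year of synthetic daily sales, indexed by date
idx = pd.date_range("2021-01-01", periods=365, freq="D")
daily_sales = pd.Series(np.random.default_rng(0).poisson(100, size=365), index=idx)

# The same series aggregated to weekly and monthly totals
weekly_sales = daily_sales.resample("W").sum()
monthly_sales = daily_sales.resample("M").sum()

print(weekly_sales.head())
print(monthly_sales.head())
```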

The Three Components of Time Series: Seasonality, Trend, and Residuals

Seasonality

Seasonality can be seen on the graph when a time series is plotted. If the series shows periodic behavior, it has a seasonality component. In the figure, the number of airline passengers from a well-known time series dataset is plotted. In the middle of each year, during the summer, the number of passengers skyrockets, then it falls back in the winter, periodically.
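If you want to reproduce this kind of plot yourself, here is a minimal sketch that pulls the classic AirPassengers dataset through statsmodels (get_rdataset downloads the data, so internet access is needed) and plots it:

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# get_rdataset fetches the classic AirPassengers data over the network
data = sm.datasets.get_rdataset("AirPassengers").data  # columns: time, value
plt.plot(data["time"], data["value"])
plt.xlabel("Year")
plt.ylabel("Monthly airline passengers (thousands)")
plt.show()
```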

Trend

The trend can be explained on the same time series above. The variable can increase or decrease over time, which changes the mean of the variable over time as well. The blue line shows an upward trend, meaning that interest in air transportation is increasing overall. For some prediction models, we may need to remove the trend effect for the sake of stationarity.

Residuals/Noise

After the seasonality and trend components are removed from a time series, the noise, or residual, is the uncorrelated random part that remains.
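As a small preview of that decomposition (the next article covers it in depth), here is a minimal sketch using seasonal_decompose from statsmodels on the airline passengers data:

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose

series = sm.datasets.get_rdataset("AirPassengers").data["value"]

# Multiplicative model, because the seasonal swings grow with the trend;
# period=12 since the data is monthly with a yearly cycle
result = seasonal_decompose(series, model="multiplicative", period=12)
result.plot()  # four panels: observed, trend, seasonal, residual
plt.show()
```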

Summary Stats

When we study time series, we use some basic statistics. First of all, we need the mean of the series. A constant mean is one requirement for the stationarity of a series, another term that will be explained soon.

Another important term is the standard deviation. The standard deviation is used when computing the autocorrelation of a time series with the Pearson correlation coefficient method. A constant standard deviation is also a precondition for the stationarity of a time series.
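One quick, informal way to eyeball whether the mean and standard deviation stay constant is to plot rolling statistics; a sketch on the same airline passengers series (the 12-month window is my assumption, matching one seasonal cycle):

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

series = sm.datasets.get_rdataset("AirPassengers").data["value"]

window = 12  # one full seasonal cycle for monthly data (an assumed choice)
plt.plot(series, label="series")
plt.plot(series.rolling(window).mean(), label="rolling mean")
plt.plot(series.rolling(window).std(), label="rolling std")
plt.legend()
plt.show()
```

Rolling statistics that drift over time hint at non-stationarity; for the airline data, both the rolling mean and the rolling standard deviation climb steadily.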

Stationarity

To predict future periods of a time series, stationarity should be checked for some prediction models. Stationarity means that the statistics of the series do not change over time, which makes the series more predictable.

Three preconditions need to be satisfied for stationarity: a constant mean, a constant standard deviation, and no seasonality effect. The Dickey-Fuller test is used for testing stationarity. In Python, we call the adfuller() method from the statsmodels library, then check the p-value for the significance of the Dickey-Fuller test and reject or fail to reject the relevant hypothesis.
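Here is a minimal sketch of that check with adfuller(); the 0.05 significance threshold is a common convention, an assumption on my part rather than a rule from the test itself:

```python
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

series = sm.datasets.get_rdataset("AirPassengers").data["value"]

# Null hypothesis of the ADF test: the series has a unit root (is non-stationary)
adf_stat, p_value, *_ = adfuller(series)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")

if p_value < 0.05:  # conventional 5% significance level (assumed)
    print("Reject the null hypothesis: the series looks stationary.")
else:
    print("Fail to reject the null: the series looks non-stationary.")
```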

Likewise, we can recognize a non-stationary series visually after plotting it. For example, in the figure below, the series in (a) has a mean that increases day by day, so we can say it is non-stationary over the 200 days. In (g) and (h) we can easily see the seasonality effect, so both of those series are non-stationary too. In the next article, I will also show how to decompose a series into its seasonality, trend, and residual components, and how to test it for stationarity with the Dickey-Fuller method.

Figure: Non-stationary time series examples. (Source: otexts.com)

Autocorrelation (ACF) and Partial Autocorrelation (PACF)

Correlation statistically explains the relation between two variables. The two variables' standard deviations and means enter into the computation of the correlation coefficient. We need to check these correlational concepts to determine the predictability of the series: if there is no correlation and the observations in the series are random, we will not be able to model the series mathematically to predict future periods. Autocorrelation checks the relation among the observations within a single time series. The relation that exists among observations over time is described by the autocorrelation function (ACF); the "auto" prefix adds a reflexive sense to the correlation term. The conventional Pearson correlation method is used to compute it.

Figure: How do lags work? (Source: Matt Dancho, https://www.business-science.io/timeseries-analysis/2017/08/30/tidy-timeseries-analysis-pt-4.html)

A lag is a delay in the series used to compute these correlations. A lagged copy is created by shifting the series, and then the lagged series is compared with the original, unshifted series. At lag 1, we compare each element with the element one step back. At lag 2, we go two steps back to compare the elements, and so on. To make it concrete, look at the figure above: in a monthly time series with k = 1, lag 1 means the correlation between the previous month and the current month. Say the original series starts in January; at lag 1 we compare it against the copy shifted to start in February. At lag 2, we check the correlation after shifting the series two months, so the shifted series starts in March. Lag 3 means we check the correlation after a three-month delay, with the shifted series starting in April.
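A short sketch of this shifting in pandas: Series.shift(k) produces the lag-k copy, and correlating it with the original via Pearson correlation is exactly what Series.autocorr(lag=k) does, so the two numbers below should match:

```python
import statsmodels.api as sm

series = sm.datasets.get_rdataset("AirPassengers").data["value"]

for k in (1, 2, 3):
    lagged = series.shift(k)           # lag-k copy: every value moved k steps forward
    manual = series.corr(lagged)       # Pearson correlation, NaN pairs dropped
    shortcut = series.autocorr(lag=k)  # pandas' built-in for the same computation
    print(f"lag {k}: corr={manual:.3f}, autocorr={shortcut:.3f}")
```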

Partial autocorrelation is a bit different from the ACF. It is also computed using lags and k values, but a simple regression function with an error term is used to calculate and plot the partial autocorrelation function (PACF), so each coefficient measures the direct relation at lag k after the effects of the shorter lags are removed.

To interpret the ACF and PACF plots, I added the figures below. On the x-axis, we see the lags. Both functions (ACF and PACF) have a correlation coefficient of 1 at lag zero by definition. The blue band shows the statistical significance threshold for the correlation coefficients: lag values inside the blue band are not significant, while values outside it are. Below, we observe that the time series is not autocorrelated, because all the coefficients are inside the blue band and therefore not significant. The PACF is significant only at lags 21, 29, 31, and 37. We can use these lags as differencing levels to make the series stationary in later steps of the analysis.

Another example of reading the ACF and PACF plots is below. Almost all the autocorrelation coefficients are significant, so we can conclude that the series is autocorrelated. Moreover, the PACF of this series is significant at lags 1, 2, 3, 4, 6, and 7, so we can use these lags to make the series stationary.
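Plots like these can be produced for your own series with plot_acf and plot_pacf from statsmodels; a minimal sketch (showing 40 lags is an arbitrary choice of mine):

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

series = sm.datasets.get_rdataset("AirPassengers").data["value"]

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=40, ax=axes[0])   # bars outside the blue band are significant
plot_pacf(series, lags=40, ax=axes[1])
plt.tight_layout()
plt.show()
```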

CONCLUSION

In my first article about time series, I tried to explain the fundamental concepts of time series. When we have a time series dataset, these are the perspectives from which we would read and interpret the data at the first stage. Before we build a model for prediction, we need to explore and prepare the time series for analysis using these concepts.

The next article will include a Jupyter notebook to practice all these concepts in a Python time series analysis.

Originally published at https://www.linkedin.com.
