Computational Tools for Macroeconometrics

Assignment 1

Introduction

This assignment introduces students to practical and theoretical aspects of macroeconometrics, focusing on forecasting using the FRED-MD dataset. Students will learn to handle macroeconomic data, perform the necessary transformations, apply univariate models to predict key economic indicators, and evaluate these forecasts.

The FRED-MD dataset

The FRED-MD dataset is a comprehensive monthly database for macroeconomic research compiled by the Federal Reserve Bank of St. Louis. It features a wide array of economic indicators. The list of economic indicators can be obtained from the paper accompanying the data (pdf).

The data can be downloaded here. The page contains all the different vintages of the data.

Let us start by downloading the current.csv file and loading it:

import pandas as pd

# Load the dataset
df = pd.read_csv('~/Downloads/current.csv')

# Clean the DataFrame by removing the row with transformation codes
df_cleaned = df.drop(index=0)
df_cleaned.reset_index(drop=True, inplace=True)
df_cleaned['sasdate'] = pd.to_datetime(df_cleaned['sasdate'], format='%m/%d/%Y')
df_cleaned
sasdate RPI W875RX1 DPCERA3M086SBEA CMRMTSPLx RETAILx INDPRO IPFPNSS IPFINAL IPCONGD ... DNDGRG3M086SBEA DSERRG3M086SBEA CES0600000008 CES2000000008 CES3000000008 UMCSENTx DTCOLNVHFNM DTCTHFNM INVEST VIXCLSx
0 1959-01-01 2583.560 2426.0 15.188 2.766768e+05 18235.77392 21.9665 23.3891 22.2688 31.7011 ... 18.294 10.152 2.13 2.45 2.04 NaN 6476.00 12298.00 84.2043 NaN
1 1959-02-01 2593.596 2434.8 15.346 2.787140e+05 18369.56308 22.3966 23.7048 22.4617 31.9337 ... 18.302 10.167 2.14 2.46 2.05 NaN 6476.00 12298.00 83.5280 NaN
2 1959-03-01 2610.396 2452.7 15.491 2.777753e+05 18523.05762 22.7193 23.8483 22.5719 31.9337 ... 18.289 10.185 2.15 2.45 2.07 NaN 6508.00 12349.00 81.6405 NaN
3 1959-04-01 2627.446 2470.0 15.435 2.833627e+05 18534.46600 23.2032 24.1927 22.9026 32.4374 ... 18.300 10.221 2.16 2.47 2.08 NaN 6620.00 12484.00 81.8099 NaN
4 1959-05-01 2642.720 2486.4 15.622 2.853072e+05 18679.66354 23.5528 24.3936 23.1231 32.5925 ... 18.280 10.238 2.17 2.48 2.08 95.3 6753.00 12646.00 80.7315 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
776 2023-09-01 19111.748 15741.9 116.594 1.507530e+06 705304.00000 103.2096 101.0935 101.3665 102.1034 ... 120.395 123.976 29.90 34.55 26.62 67.9 508808.61 913938.95 5074.6108 15.0424
777 2023-10-01 19145.402 15784.6 116.663 1.505477e+06 703528.00000 102.3722 100.5292 100.5527 101.1664 ... 120.040 124.228 29.97 34.67 26.65 63.8 513229.64 918210.64 5015.5456 19.0462
778 2023-11-01 19213.108 15859.9 117.127 1.514733e+06 703336.00000 102.6710 100.9362 101.2159 101.8557 ... 119.325 124.551 30.26 34.96 26.89 61.3 517434.30 922552.40 4999.7208 13.8563
779 2023-12-01 19251.946 15899.0 117.773 1.530296e+06 706180.00000 102.6715 100.8332 101.2843 101.9884 ... 119.193 124.917 30.45 35.01 27.14 69.7 522366.13 928336.14 5077.4222 12.6960
780 2024-01-01 19377.558 15948.8 117.639 NaN 700291.00000 102.5739 100.9984 101.7258 102.6235 ... 118.745 125.662 30.56 35.21 27.22 NaN NaN NaN 5105.3504 13.3453

781 rows × 128 columns

# Extract transformation codes
transformation_codes = df.iloc[0, 1:].to_frame().reset_index()
transformation_codes.columns = ['Series', 'Transformation_Code']

The transformation codes map each variable to the transformation we must apply to render it (approximately) stationary. The data frame transformation_codes has the variable’s name (Series) and its transformation (Transformation_Code). There are seven possible transformations (\(x_t\) denotes the variable to which the transformation is to be applied):

  • transformation_code=1: no transformation
  • transformation_code=2: \(\Delta x_t\)
  • transformation_code=3: \(\Delta^2 x_t\)
  • transformation_code=4: \(log(x_t)\)
  • transformation_code=5: \(\Delta log(x_t)\)
  • transformation_code=6: \(\Delta^2 log(x_t)\)
  • transformation_code=7: \(\Delta (x_t/x_{t-1} - 1)\)

We can apply these transformations using the following code:

import numpy as np

# Function to apply transformations based on the transformation code
def apply_transformation(series, code):
    if code == 1:
        # No transformation
        return series
    elif code == 2:
        # First difference
        return series.diff()
    elif code == 3:
        # Second difference
        return series.diff().diff()
    elif code == 4:
        # Log
        return np.log(series)
    elif code == 5:
        # First difference of log
        return np.log(series).diff()
    elif code == 6:
        # Second difference of log
        return np.log(series).diff().diff()
    elif code == 7:
        # Delta (x_t/x_{t-1} - 1): first difference of the growth rate
        return series.pct_change().diff()
    else:
        raise ValueError("Invalid transformation code")

# Applying the transformations to each column in df_cleaned based on transformation_codes
for series_name, code in transformation_codes.values:
    df_cleaned[series_name] = apply_transformation(df_cleaned[series_name].astype(float), float(code))


df_cleaned = df_cleaned[2:]  # (1)
df_cleaned.reset_index(drop=True, inplace=True)  # (2)
df_cleaned.head()

  1. Since some transformations induce missing values, we drop the first two observations of the dataset.
  2. We reset the index so that the first observation of the dataset has index 0.
sasdate RPI W875RX1 DPCERA3M086SBEA CMRMTSPLx RETAILx INDPRO IPFPNSS IPFINAL IPCONGD ... DNDGRG3M086SBEA DSERRG3M086SBEA CES0600000008 CES2000000008 CES3000000008 UMCSENTx DTCOLNVHFNM DTCTHFNM INVEST VIXCLSx
0 1959-03-01 0.006457 0.007325 0.009404 -0.003374 0.008321 0.014306 0.006035 0.004894 0.000000 ... -0.001148 0.000292 -0.000022 -0.008147 0.004819 NaN 0.004929 0.004138 -0.014792 NaN
1 1959-04-01 0.006510 0.007029 -0.003622 0.019915 0.000616 0.021075 0.014338 0.014545 0.015650 ... 0.001312 0.001760 -0.000022 0.012203 -0.004890 NaN 0.012134 0.006734 0.024929 NaN
2 1959-05-01 0.005796 0.006618 0.012043 0.006839 0.007803 0.014955 0.008270 0.009582 0.004770 ... -0.001695 -0.001867 -0.000021 -0.004090 -0.004819 NaN 0.002828 0.002020 -0.015342 NaN
3 1959-06-01 0.003068 0.003012 0.003642 -0.000097 0.009064 0.001141 0.007034 0.007128 -0.004767 ... 0.003334 0.001946 -0.004619 0.003992 0.004796 NaN 0.009726 0.009007 -0.012252 NaN
4 1959-07-01 -0.000580 -0.000762 -0.003386 0.012155 -0.000330 -0.024240 0.001168 0.008249 0.013054 ... -0.001204 -0.000013 0.000000 -0.004040 -0.004796 NaN -0.004631 -0.001000 0.029341 NaN

5 rows × 128 columns
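
As a small check of why exactly two observations are dropped, the sketch below applies the differencing transformations to a made-up series (the values are illustrative, not FRED-MD data) and counts the missing values each one creates. Code 7 is written out as the first difference of the growth rate, following the definition in the list of transformations above.

```python
import numpy as np
import pandas as pd

# Illustrative toy series (not FRED-MD data)
x = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0])

# Transformations that involve differencing, keyed by transformation code
transforms = {
    2: x.diff(),                      # first difference
    3: x.diff().diff(),               # second difference
    5: np.log(x).diff(),              # first difference of log
    6: np.log(x).diff().diff(),       # second difference of log
    7: (x / x.shift(1) - 1).diff(),   # first difference of the growth rate
}

# Count the missing values created at the start of each transformed series
leading_nans = {code: int(s.isna().sum()) for code, s in transforms.items()}
print(leading_nans)
```

No transformation loses more than two observations, which is why dropping the first two rows is enough to remove the missing values induced by the transformations (series that start later in the sample, such as UMCSENTx, still have missing values of their own).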

import matplotlib.pyplot as plt  # (1)
import matplotlib.dates as mdates

series_to_plot = ['INDPRO', 'CPIAUCSL', 'TB3MS']  # (2)
series_names = ['Industrial Production',
                'Inflation (CPI)',
                '3-month Treasury Bill rate']


# Create a figure and a grid of subplots
fig, axs = plt.subplots(len(series_to_plot), 1, figsize=(8, 15))  # (3)

# Iterate over the selected series and plot each one
for ax, series_name, plot_title in zip(axs, series_to_plot, series_names):
    if series_name in df_cleaned.columns:  # (4)
        dates = pd.to_datetime(df_cleaned['sasdate'], format='%m/%d/%Y')  # (5)
        ax.plot(dates, df_cleaned[series_name], label=plot_title)  # (6)
        ax.xaxis.set_major_locator(mdates.YearLocator(base=5))  # (7)
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
        ax.set_title(plot_title)  # (8)
        ax.set_xlabel('Year')  # (9)
        ax.set_ylabel('Transformed Value')
        ax.legend(loc='upper left')  # (10)
        plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, ha='right')  # (11)
    else:
        ax.set_visible(False)  # Hide plots for which the data is not available

plt.tight_layout()  # (12)
plt.show()  # (13)

  1. We use the matplotlib library to plot.
  2. We consider three series (INDPRO, CPIAUCSL, TB3MS) and assign them human-readable names (“Industrial Production”, “Inflation (CPI)”, “3-month Treasury Bill rate”).
  3. We create a figure with three (len(series_to_plot)) subplots arranged vertically. The figure size is 8x15 inches.
  4. We check whether each series exists among the columns of the df_cleaned DataFrame.
  5. We convert the sasdate column to datetime format (not necessary, since sasdate was converted earlier).
  6. We plot each series against sasdate on the corresponding subplot, labeling the plot with its human-readable name.
  7. We format the x-axis so that ticks are placed every five years and labeled with the year.
  8. Each subplot is titled with the name of the economic indicator.
  9. We label the x-axis “Year” and the y-axis “Transformed Value” to indicate that the data was transformed before plotting.
  10. A legend is added to the upper left of each subplot for clarity.
  11. We rotate the x-axis labels by 45 degrees to prevent overlap and improve legibility.
  12. plt.tight_layout() automatically adjusts subplot parameters to give specified padding and avoid overlap.
  13. plt.show() displays the figure with its subplots.

Forecasting in Time Series

Forecasting in time series analysis involves using historical data to predict future values. The objective is to model the conditional expectation of a time series based on past observations.

Direct Forecasts

Direct forecasting involves modeling the target variable directly at the desired forecast horizon. Unlike iterative approaches, which forecast one step ahead and then use those forecasts as inputs for subsequent steps, direct forecasting models the relationship between past observations and the value of the target \(h\) periods ahead.
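
The difference between the two approaches can be sketched on simulated data. The example below is only an illustration under stated assumptions: the data come from a simulated AR(1) process (not from FRED-MD), and the names phi, h, iterated, and direct are chosen here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi, h = 500, 0.6, 3   # sample size, AR(1) coefficient, horizon (all assumed)

# Simulate an AR(1): y_t = phi * y_{t-1} + e_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

# Iterated forecast: fit the one-step model, then apply it h times
X1 = np.column_stack([np.ones(n - 1), y[:-1]])
a1, b1 = np.linalg.lstsq(X1, y[1:], rcond=None)[0]
iterated = y[-1]
for _ in range(h):
    iterated = a1 + b1 * iterated

# Direct forecast: regress y_{t+h} on y_t, then apply the fitted model once
Xh = np.column_stack([np.ones(n - h), y[:-h]])
ah, bh = np.linalg.lstsq(Xh, y[h:], rcond=None)[0]
direct = ah + bh * y[-1]

print(f"iterated: {iterated:.4f}, direct: {direct:.4f}")
```

The direct approach needs a separate regression for each horizon, but it never feeds a forecast back into the model.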

ARX Models

Autoregressive models with exogenous predictors (ARX) are a class of univariate time series models that extend autoregressive (AR) models by incorporating exogenous (independent) variables. These models are formulated as follows:

\[ \begin{aligned} Y_{t+h} &= \alpha + \phi_0 Y_t + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \theta_{0,1} X_{t,1} + \theta_{1,1} X_{t-1,1} + \dots + \theta_{p,1} X_{t-p,1} + \dots + \theta_{0,k} X_{t,k} + \dots + \theta_{p,k} X_{t-p,k} + \epsilon_{t+h}\\ &= \alpha + \sum_{i=0}^p \phi_i Y_{t-i} + \sum_{j=1}^k\sum_{s=0}^p \theta_{s,j} X_{t-s,j} + \epsilon_{t+h} \end{aligned} \tag{1}\]

  • \(Y_{t+h}\): The target variable at time \(t+h\).
  • \(X_{t,j}\): Predictors (variable \(j=1,\ldots,k\) at time \(t\)).
  • \(p\): number of lags of the target and the predictors (see footnote 1).
  • \(\phi_i\), \(i=0,\dots,p\), and \(\theta_{s,j}\), \(s=0,\dots,p\), \(j=1,\dots,k\): Parameters of the model.
  • \(\epsilon_{t+h}\): error term.

For instance, to predict Industrial Production using inflation and the 3-month T-bill rate as predictors, the target variable is INDPRO, and the predictors are CPIAUCSL and TB3MS. Notice that the target and the predictors are the transformed variables. Thus, if we use INDPRO as the target, we are predicting the log-difference of industrial production, which is a good approximation of its month-to-month percentage change.
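
The quality of the log-difference approximation can be checked numerically. The sketch below uses made-up index levels (not actual INDPRO values) and compares the log-difference with the exact month-to-month growth rate.

```python
import numpy as np
import pandas as pd

# Illustrative index levels (not actual INDPRO data)
levels = pd.Series([100.0, 101.0, 102.5, 102.0])

log_diff = np.log(levels).diff()              # log(x_t) - log(x_{t-1})
growth = levels / levels.shift(1) - 1         # exact growth rate x_t/x_{t-1} - 1

# Largest absolute gap between the two measures
max_gap = (log_diff - growth).abs().max()
print(pd.DataFrame({'log_diff': log_diff, 'growth': growth}))
print(f"largest gap: {max_gap:.6f}")
```

For monthly changes of one or two percent the two measures differ only in the fourth decimal place; the gap grows with the size of the change.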

By convention, the data range over \(t=1,\ldots,T\), where \(T\) is the last period for which we have data (for the df_cleaned dataset, \(T\) corresponds to January 2024).

Forecasting with ARX

Suppose, for the moment, that we know the parameters of the model. To obtain a forecast for \(Y_{T+h}\), the \(h\)-step ahead forecast, we calculate \[ \begin{aligned} \hat{Y}_{T+h} &= \alpha + \phi_0 Y_T + \phi_1 Y_{T-1} + \dots + \phi_p Y_{T-p} \\ &\,\,\quad \quad + \theta_{0,1} X_{T,1} + \theta_{1,1} X_{T-1,1} + \dots + \theta_{p,1} X_{T-p,1} \\ &\,\,\quad \quad + \dots + \theta_{0,k} X_{T,k} + \dots + \theta_{p,k} X_{T-p,k}\\ &= \alpha + \sum_{i=0}^p \phi_i Y_{T-i} + \sum_{j=1}^k\sum_{s=0}^p \theta_{s,j} X_{T-s,j} \end{aligned} \]

While this is conceptually easy, implementing the steps needed to calculate the forecast is error-prone, and care must be taken to ensure that we are calculating the correct forecast.

To start, it is convenient to rewrite the model in Equation 1 as a linear model \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \] where \(\boldsymbol{\beta}\) is the vector (of size \(1+(1+p)(1+k)\)) \[ \boldsymbol{\beta}=\begin{pmatrix}\alpha\\ \phi_{0}\\ \vdots\\ \phi_{p}\\ \theta_{0,1}\\ \vdots\\ \theta_{p,1}\\ \vdots\\ \theta_{0,k}\\ \vdots\\ \theta_{p,k} \end{pmatrix}, \] \(\mathbf{y}\) and \(\mathbf{X}\) are respectively given by \[ \mathbf{y} = \begin{pmatrix} Y_{p+h+1} \\ Y_{p+h+2}\\ \vdots \\ Y_{T} \end{pmatrix} \] and \[ \mathbf{X} = \begin{pmatrix}1 & Y_{p+1} & Y_{p} & \cdots & Y_{1} & X_{p+1,1} & X_{p,1} & \cdots & X_{1,1} & \cdots & X_{p+1,k} & X_{p,k} & \cdots & X_{1,k}\\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots\\ 1 & Y_{T-h-1} & Y_{T-h-2} & \cdots & Y_{T-h-p-1} & X_{T-h-1,1} & X_{T-h-2,1} & \cdots & X_{T-h-p-1,1} & \cdots & X_{T-h-1,k} & X_{T-h-2,k} & \cdots & X_{T-h-p-1,k}\\ 1 & Y_{T-h} & Y_{T-h-1} & \cdots & Y_{T-h-p} & X_{T-h,1} & X_{T-h-1,1} & \cdots & X_{T-h-p,1} & \cdots & X_{T-h,k} & X_{T-h-1,k} & \cdots & X_{T-h-p,k} \end{pmatrix}. \] The size of \(\mathbf{X}\) is \((T-p-h)\times (1+(1+k)(1+p))\) and that of \(\mathbf{y}\) is \((T-p-h)\times 1\).

The matrix \(\mathbf{X}\) can be obtained in the following way:

Yraw = df_cleaned['INDPRO']
Xraw = df_cleaned[['CPIAUCSL', 'TB3MS']]

num_lags  = 4  ## this is p
num_leads = 1  ## this is h
X = pd.DataFrame()
## Add the lagged values of Y
col = 'INDPRO'
for lag in range(0, num_lags+1):
    # Shift the target and name the column with a lag suffix
    X[f'{col}_lag{lag}'] = Yraw.shift(lag)

## Add the lagged values of the predictors
for col in Xraw.columns:
    for lag in range(0, num_lags+1):
        # Shift each column in the DataFrame and name it with a lag suffix
        X[f'{col}_lag{lag}'] = Xraw[col].shift(lag)
## Add a column of ones (for the intercept)
X.insert(0, 'Ones', np.ones(len(X)))


## X is now a DataFrame
X.head()
Ones INDPRO_lag0 INDPRO_lag1 INDPRO_lag2 INDPRO_lag3 INDPRO_lag4 CPIAUCSL_lag0 CPIAUCSL_lag1 CPIAUCSL_lag2 CPIAUCSL_lag3 CPIAUCSL_lag4 TB3MS_lag0 TB3MS_lag1 TB3MS_lag2 TB3MS_lag3 TB3MS_lag4
0 1.0 0.014306 NaN NaN NaN NaN -0.000690 NaN NaN NaN NaN 0.10 NaN NaN NaN NaN
1 1.0 0.021075 0.014306 NaN NaN NaN 0.001380 -0.000690 NaN NaN NaN 0.15 0.10 NaN NaN NaN
2 1.0 0.014955 0.021075 0.014306 NaN NaN 0.001723 0.001380 -0.000690 NaN NaN -0.11 0.15 0.10 NaN NaN
3 1.0 0.001141 0.014955 0.021075 0.014306 NaN 0.000339 0.001723 0.001380 -0.00069 NaN 0.37 -0.11 0.15 0.10 NaN
4 1.0 -0.024240 0.001141 0.014955 0.021075 0.014306 -0.001034 0.000339 0.001723 0.00138 -0.00069 -0.01 0.37 -0.11 0.15 0.1

Note that the first \(p=4\) rows of X have missing values.

The vector \(\mathbf{y}\) can be similarly created as

y = Yraw.shift(-num_leads)
y
0      0.021075
1      0.014955
2      0.001141
3     -0.024240
4     -0.034465
         ...   
774   -0.008147
775    0.002915
776    0.000005
777   -0.000951
778         NaN
Name: INDPRO, Length: 779, dtype: float64

The variable y has missing values in the last h positions (it is not possible to lead the target beyond \(T\)).

Notice also that we must keep the last row of X for constructing the forecast.

Now we create two numpy arrays with the missing values stripped:

## Save last row of X (converted to numpy)
X_T = X.iloc[-1:].values
## Keep only the rows of X and y without missing values,
## that is, drop the first p rows and the last h rows,
## and convert to numpy arrays
y = y.iloc[num_lags:-num_leads].values
X = X.iloc[num_lags:-num_leads].values
X_T
array([[ 1.00000000e+00, -9.51056709e-04,  4.86991246e-06,
         2.91450984e-03, -8.14668061e-03,  9.25729878e-04,
         7.21400503e-04,  7.26467817e-04,  8.11330254e-04,
        -2.79891559e-03, -1.51527417e-03, -2.00000000e-02,
        -3.00000000e-02, -7.00000000e-02,  2.00000000e-02,
         2.00000000e-02]])
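
As a check on the dimensions stated for \(\mathbf{X}\) and \(\mathbf{y}\), the sketch below repeats the same construction on simulated data. The helper build_design and the toy inputs y_toy and X_toy are hypothetical names introduced only for this illustration; they are not part of the assignment's starter code.

```python
import numpy as np
import pandas as pd

def build_design(y, X, p, h):
    """Hypothetical helper: build the regressor matrix, the led target,
    and the last row of regressors used for the forecast."""
    cols = {'Ones': np.ones(len(y))}
    for lag in range(p + 1):                 # lags 0, ..., p of the target
        cols[f'y_lag{lag}'] = y.shift(lag)
    for name in X.columns:                   # lags 0, ..., p of each predictor
        for lag in range(p + 1):
            cols[f'{name}_lag{lag}'] = X[name].shift(lag)
    design = pd.DataFrame(cols)
    target = y.shift(-h)                     # lead the target by h periods
    x_last = design.iloc[-1:].to_numpy()     # row used to form the forecast
    # Drop the first p rows (missing lags) and the last h rows (missing leads)
    return design.iloc[p:-h].to_numpy(), target.iloc[p:-h].to_numpy(), x_last

# Simulated data with T observations and k predictors (assumed values)
T, k, p, h = 30, 2, 4, 1
rng = np.random.default_rng(1)
y_toy = pd.Series(rng.normal(size=T))
X_toy = pd.DataFrame(rng.normal(size=(T, k)), columns=['x1', 'x2'])

X_mat, y_vec, x_last = build_design(y_toy, X_toy, p, h)
print(X_mat.shape, y_vec.shape, x_last.shape)
```

With \(T=30\), \(k=2\), \(p=4\), and \(h=1\), the regressor matrix has \(T-p-h=25\) rows and \(1+(1+k)(1+p)=16\) columns, matching the formula.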

Now, we have to estimate the parameters and obtain the forecast.

Estimation

The parameters of the model can be estimated by OLS (OLS estimates the coefficients of the linear projection of \(Y_{t+h}\) on the current and lagged values of \(Y_t\) and of the predictors).

The OLS estimator of \(\boldsymbol{\beta}\) is \[ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}. \]

While this is the formula used to describe the OLS estimator, from a computational point of view it is much better to define the estimator as the solution of the system of linear equations \[ (\mathbf{X}'\mathbf{X})\boldsymbol{\beta} = \mathbf{X}'\mathbf{y}, \] because solving the system avoids computing the inverse explicitly, which is slower and numerically less stable.

The function solve from numpy.linalg can be used to solve this system of linear equations.

from numpy.linalg import solve
# Solving for the OLS estimator beta: (X'X)^{-1} X'Y
beta_ols = solve(X.T @ X, X.T @ y)

## Produce the One step ahead forecast
## % change month-to-month INDPRO
forecast = X_T@beta_ols*100
forecast
array([0.08445815])

The variable forecast now contains the one-step-ahead (\(h=1\)) forecast of INDPRO. Since INDPRO has been transformed to log differences, we are forecasting its (approximate) month-to-month percentage change; multiplying by 100 expresses the forecast in percentage points.
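
To illustrate the computational point about the OLS estimator, the sketch below compares three ways of computing it on simulated data: solving the normal equations, forming the explicit inverse, and NumPy's least-squares routine. The simulated regressors and the names X_sim, y_sim, and beta_true are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Simulated regressors (with intercept) and target; coefficients are assumed
X_sim = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([0.5, 1.0, -2.0, 0.3])
y_sim = X_sim @ beta_true + 0.1 * rng.normal(size=n)

# 1. Solve the normal equations (X'X) beta = X'y
beta_solve = np.linalg.solve(X_sim.T @ X_sim, X_sim.T @ y_sim)

# 2. Explicit inverse (shown only for comparison; generally discouraged)
beta_inv = np.linalg.inv(X_sim.T @ X_sim) @ X_sim.T @ y_sim

# 3. Least-squares routine, which works on X directly without forming X'X
beta_lstsq, *_ = np.linalg.lstsq(X_sim, y_sim, rcond=None)

print(beta_solve)
print(np.max(np.abs(beta_solve - beta_lstsq)))
```

On a well-conditioned problem like this one the three estimates agree to many digits; the differences matter when the columns of the regressor matrix are close to collinear.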

To obtain the \(h\)-step ahead forecast, we must repeat all the above steps using a different h.

Forecasting Exercise

How good is the forecast that the model is producing? One thing we could do to assess the forecast’s quality is to wait for the new data on industrial production and see how big the forecasting error is. However, a single error would not be an appropriate evaluation: we need to evaluate the model as if it were used repeatedly to forecast future values of the target variable. To properly assess the model and its ability to forecast INDPRO, we must keep producing forecasts and calculating the errors as new data arrive. This procedure would take time, as we would have to wait many months to collect a series of errors that is long enough.

A different approach is to do what is called a Real-time evaluation. A Real-time evaluation procedure consists of putting ourselves in the shoes of a forecaster who has been using the forecasting model for a long time.

In practice, these are the steps to follow for a real-time evaluation of the model:

  1. Set \(T\) such that the last observation of df coincides with December 1999;

  2. Estimate the model using the data up to \(T\)

  3. Produce \(\hat{Y}_{T+1}, \hat{Y}_{T+2}, \dots, \hat{Y}_{T+H}\)

  4. Since we have the actual data for January, February, …, we can calculate the forecasting errors of our model \[ \hat{e}_{T+h} = Y_{T+h} - \hat{Y}_{T+h}, \,\, h = 1,\ldots, H. \]

  5. Set \(T = T+1\) and repeat all the steps above.

The process results in a series of forecasting errors that we can evaluate using several metrics. The most commonly used is the mean squared forecast error (MSFE), which is defined as \[ MSFE_h = \frac{1}{J}\sum_{j=1}^J \hat{e}_{T+j+h}^2, \] where \(J\) is the number of errors collected for horizon \(h\) through the real-time evaluation.
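
The MSFE formula amounts to averaging squared errors separately for each horizon. A minimal sketch, using a made-up error matrix (rows are forecast origins, columns are the horizons \(h=1,4,8\); the numbers are illustrative and are not results from the model):

```python
import numpy as np

# Illustrative forecast errors: one row per forecast origin,
# one column per horizon (h = 1, 4, 8). Not actual model output.
errors = np.array([
    [ 0.2, -0.5,  0.9],
    [-0.1,  0.4, -0.6],
    [ 0.3, -0.2,  0.3],
])

# MSFE_h: average of the squared errors over the J origins, horizon by horizon
msfe = (errors ** 2).mean(axis=0)

# RMSFE_h: square root of the MSFE, in the same units as the target
rmsfe = np.sqrt(msfe)

print(msfe)
print(rmsfe)
```

Because the errors are squared, the sign convention used to define them does not affect the MSFE.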

This assignment asks you to perform a real-time evaluation of our simple forecasting model and calculate the MSFE for horizons \(h=1,4,8\).

As a bonus, we can evaluate different models and see how they perform differently. For instance, you might consider different numbers of lags and/or different variables in the model.

Hint

A sensible way to structure the code for real-time evaluation is to use several functions. For instance, you can define a function that calculates the forecast given the DataFrame.

def calculate_forecast(df_cleaned, p = 4, H = [1,4,8], end_date = '12/1/1999',target = 'INDPRO', xvars = ['CPIAUCSL', 'TB3MS']):

    ## Subset df_cleaned to use only data up to end_date
    rt_df = df_cleaned[df_cleaned['sasdate'] <= pd.Timestamp(end_date)]
    ## Get the actual values of target at different steps ahead
    Y_actual = []
    for h in H:
        target_date = pd.Timestamp(end_date) + pd.DateOffset(months=h)
        Y_actual.append(df_cleaned[df_cleaned['sasdate'] == target_date][target]*100)
        ## Now Y_actual contains the true values at T+h, h in H (multiplied by 100)

    Yraw = rt_df[target]
    Xraw = rt_df[xvars]

    X = pd.DataFrame()
    ## Add the lagged values of Y
    ## Note: range(0, p) gives lags 0,...,p-1, one fewer than the lags 0,...,p
    ## of Equation 1; use range(0, p+1) here and below to match it exactly
    for lag in range(0,p):
        # Shift the target and name the column with a lag suffix
        X[f'{target}_lag{lag}'] = Yraw.shift(lag)

    ## Add the lagged values of the predictors
    for col in Xraw.columns:
        for lag in range(0,p):
            X[f'{col}_lag{lag}'] = Xraw[col].shift(lag)
    
    ## Add a column of ones (for the intercept)
    X.insert(0, 'Ones', np.ones(len(X)))
    
    ## Save last row of X (converted to numpy)
    X_T = X.iloc[-1:].values

    ## While X is the same for every horizon, Y must be led by a different h
    Yhat = []
    for h in H:
        y_h = Yraw.shift(-h)
        ## Drop the first p rows (missing lags) and the last h rows (missing leads)
        y = y_h.iloc[p:-h].values
        X_ = X.iloc[p:-h].values
        # Solving for the OLS estimator beta: (X'X)^{-1} X'Y
        beta_ols = solve(X_.T @ X_, X_.T @ y)
        ## Produce the h-step ahead forecast
        ## (% change month-to-month of the target)
        Yhat.append(X_T@beta_ols*100)

    ## Now calculate the forecasting errors (actual minus forecast) and return

    return np.array(Y_actual) - np.array(Yhat)

With this function, you can calculate real-time errors by looping over end_date. Make sure the loop ends at the right time: the actual values at every horizon must still be available in the data.

t0 = pd.Timestamp('12/1/1999')
e = []
T = []
for j in range(0, 10):
    t0 = t0 + pd.DateOffset(months=1)
    print(f'Using data up to {t0}')
    ehat = calculate_forecast(df_cleaned, p = 4, H = [1,4,8], end_date = t0)
    e.append(ehat.flatten())
    T.append(t0)

## Create a pandas DataFrame from the list
edf = pd.DataFrame(e)
## Calculate the RMSFE, that is, the square root of the MSFE
np.sqrt(edf.apply(np.square).mean())
Using data up to 2000-01-01 00:00:00
Using data up to 2000-02-01 00:00:00
Using data up to 2000-03-01 00:00:00
Using data up to 2000-04-01 00:00:00
Using data up to 2000-05-01 00:00:00
Using data up to 2000-06-01 00:00:00
Using data up to 2000-07-01 00:00:00
Using data up to 2000-08-01 00:00:00
Using data up to 2000-09-01 00:00:00
Using data up to 2000-10-01 00:00:00
0    0.337110
1    0.512690
2    0.624035
dtype: float64
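
One way to decide where the loop over end dates should stop is to generate the admissible end dates up front. The sketch below assumes monthly data whose last observation is January 2024 (as in the table shown earlier) and a largest horizon of 8; the names first_end, last_obs, max_h, and end_dates are hypothetical and chosen only for this example.

```python
import pandas as pd

first_end = pd.Timestamp('1999-12-01')   # first estimation sample ends here
last_obs = pd.Timestamp('2024-01-01')    # assumed last observation in the data
max_h = 8                                # largest forecast horizon

# The last end date must leave max_h months of actual data to compare against
last_end = last_obs - pd.DateOffset(months=max_h)

# Monthly ('MS' = month start) end dates for the real-time evaluation
end_dates = pd.date_range(first_end, last_end, freq='MS')

print(end_dates[0], end_dates[-1], len(end_dates))
```

Each element of end_dates could then be passed as end_date to a function such as calculate_forecast. Note that some series may have missing values in the most recent months, so the actual values should still be checked for missing data.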

You may change the function calculate_forecast so that it also returns the actual data and the forecast; you can then, for instance, construct a plot.

Working with github

The https://github.com/uniroma/comptools-assignments repository contains four files:

  1. comptools_ass1.qmd
  2. assignment1_julia.jl
  3. assignment1_python.py
  4. assignment1_r.r

The comptools_ass1.qmd is this file (in quarto format). The repository also contains the pdf and the html version of this file.

The other files, assignment1_julia.jl, assignment1_python.py, and assignment1_r.r, are the starter kits for the code you have to write in Julia, Python, and R. You can use them to start your work.

Using Visual Studio Code

Unless you are familiar with the command line and you are using Linux or MacOS, the best way to interact with github is through Visual Studio Code. Instructions on how to install Visual Studio Code on Windows are here. For MacOS the instructions are here.

Visual Studio Code has an extension system. The extensions extend VSCode adding features that simplify writing and interacting with code.

The extensions you should install are

There are many other extensions that you might find useful. For those, google is your friend.

Cloning the repository

Cloning a repository from GitHub into Visual Studio Code (VSCode) allows you to work on projects directly from your local machine. Here’s a detailed step-by-step guide on how to clone the repository https://github.com/uniroma/comptools-assignments into VSCode:

  1. Open Visual Studio Code
    • Start by opening Visual Studio Code on your computer.
  2. Access the Command Palette
    • With VSCode open, access the Command Palette by pressing Ctrl+Shift+P on Windows/Linux or Cmd+Shift+P on macOS. This is where you can quickly access various commands in VSCode.
  3. Clone Repository
    • In the Command Palette, type “Git: Clone” and select the option Git: Clone from the list that appears. This action will prompt VSCode to clone a repository.
  4. Enter the Repository URL
    • A text box asking for the repository URL will appear at the top of the VSCode window. Enter https://github.com/uniroma/comptools-assignments and press Enter. (This is the URL of the assignment 1 repository).
  5. Choose a Directory
    • Next, VSCode will ask you to select a directory where you want to clone the repository. Navigate through your file system and choose a directory that will be the local storage place for the repository. The directory should exist. Create it if it doesn’t. Once selected, the cloning process will start.
  6. Open the Cloned Repository
    • After the repository has been successfully cloned, a notification will pop up in the bottom right corner of VSCode with the option to Open Repository. Click on it. If you missed the notification, you can navigate to the directory where you cloned the repository and open it manually from within VSCode by going to File > Open Folder.
  7. Start Working
    • Now that the repository is cloned and opened in VSCode, you can start working on the project. You can edit files, commit changes, and manage branches directly from VSCode.
Tip
  • Ensure you have Git installed on your computer to use the Git features in VSCode. If you do not have Git installed, you can download it from the official Git website. Instructions to install Git.
  • If you are working with GitHub repositories frequently, consider authenticating with GitHub in VSCode to streamline your workflow. This can be done through the Command Palette by finding the GitHub: Sign in command.

Make changes and commit them to the repository

  1. Make Your Changes
    • Open the repository you have cloned in VSCode.
    • Navigate to the file(s) you wish to change within the VSCode Explorer pane.
    • Make the necessary modifications or additions to the file(s). These changes can be anything from fixing a bug to adding new features.
  2. Review Your Changes
    • After making changes, you can see which files have been modified by looking at the Source Control panel. You can access this panel by clicking on the Source Control icon (it looks like a branch or a fork) on the sidebar, by pressing Ctrl+Shift+G (Windows/Linux) or Cmd+Shift+G (macOS), or by searching for it in the Command Palette.
    • Modified files are listed within the Source Control panel. Click on a file to view the changes (differences) between your working version and the last commit. Lines added are highlighted in green, and lines removed are highlighted in red.
  3. Stage Your Changes
    • Before committing, you need to stage your changes. Staging lets you prepare and review exactly which changes will go into the commit, without making the commit final.
    • You can stage changes by right-clicking on a modified file in the Source Control panel and selecting Stage Changes. Alternatively, you can stage all changes at once by clicking the + icon next to the “Changes” header.
  4. Commit Your Changes
    • After staging your changes, type a commit message in the message box at the top of the Source Control panel. The message should briefly describe the changes you’ve made. Commit messages are essential for making the project’s history understandable for yourself and the other collaborators.
    • Press Ctrl+Enter (Windows/Linux) or Cmd+Enter (macOS) to commit the staged changes. Alternatively, you can click the checkmark icon (Commit) at the top of the Source Control panel, or search for Git: Commit in the Command Palette.
  5. Push Your Changes
    • If you’re working with a remote repository (like one hosted on GitHub), you must push your commits to update the remote repository with your local changes.
    • You can push changes by clicking on the three dots (...) menu in the Source Control panel and selecting Push. If you’re using Git in VSCode for the first time, you might be prompted to enter your GitHub credentials or authenticate in another way.
Tip

It’s a good practice to pull changes from the remote repository before starting your work session (to ensure you’re working with the latest version) and before pushing your changes (to ensure no conflicts). You can pull changes by clicking on the three dots (...) menu in the Source Control panel and selecting Pull.

The following video explores in more detail how to use git in VSCode.

Footnotes

  1. Theoretically, the number of lags for the target variable and the predictors could be different. Here, we consider the simpler case in which both are equal.