Computational Tools for Macroeconometrics

Assignment 1

Introduction

This assignment introduces students to practical and theoretical aspects of macroeconometrics, focusing on forecasting using the FRED-MD dataset. Students will learn to handle macroeconomic data, perform necessary transformations, apply univariate models to predict key economic indicators and to evaluate these forecasts.

The FRED-MD dataset

The FRED-MD dataset is a comprehensive monthly database for macroeconomic research compiled by the Federal Reserve Bank of St. Louis. It features a wide array of economic indicators. The list of economic indicators can be obtained from the paper accompanying the data pdf.

The data can be downloaded here. The page contains all the different vintages of the data.

Let us start to download the current.csv file:

import pandas as pd

# Load the dataset
df = pd.read_csv('~/Downloads/current.csv')

# Clean the DataFrame by removing the row with transformation codes
df_cleaned = df.drop(index=0)
df_cleaned.reset_index(drop=True, inplace=True)
df_cleaned['sasdate'] = pd.to_datetime(df_cleaned['sasdate'], format='%m/%d/%Y')
df_cleaned

	sasdate	RPI	W875RX1	DPCERA3M086SBEA	CMRMTSPLx	RETAILx	INDPRO	IPFPNSS	IPFINAL	IPCONGD	...	DNDGRG3M086SBEA	DSERRG3M086SBEA	CES0600000008	CES2000000008	CES3000000008	UMCSENTx	DTCOLNVHFNM	DTCTHFNM	INVEST	VIXCLSx
0	1959-01-01	2583.560	2426.0	15.188	2.766768e+05	18235.77392	21.9665	23.3891	22.2688	31.7011	...	18.294	10.152	2.13	2.45	2.04	NaN	6476.00	12298.00	84.2043	NaN
1	1959-02-01	2593.596	2434.8	15.346	2.787140e+05	18369.56308	22.3966	23.7048	22.4617	31.9337	...	18.302	10.167	2.14	2.46	2.05	NaN	6476.00	12298.00	83.5280	NaN
2	1959-03-01	2610.396	2452.7	15.491	2.777753e+05	18523.05762	22.7193	23.8483	22.5719	31.9337	...	18.289	10.185	2.15	2.45	2.07	NaN	6508.00	12349.00	81.6405	NaN
3	1959-04-01	2627.446	2470.0	15.435	2.833627e+05	18534.46600	23.2032	24.1927	22.9026	32.4374	...	18.300	10.221	2.16	2.47	2.08	NaN	6620.00	12484.00	81.8099	NaN
4	1959-05-01	2642.720	2486.4	15.622	2.853072e+05	18679.66354	23.5528	24.3936	23.1231	32.5925	...	18.280	10.238	2.17	2.48	2.08	95.3	6753.00	12646.00	80.7315	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
776	2023-09-01	19111.748	15741.9	116.594	1.507530e+06	705304.00000	103.2096	101.0935	101.3665	102.1034	...	120.395	123.976	29.90	34.55	26.62	67.9	508808.61	913938.95	5074.6108	15.0424
777	2023-10-01	19145.402	15784.6	116.663	1.505477e+06	703528.00000	102.3722	100.5292	100.5527	101.1664	...	120.040	124.228	29.97	34.67	26.65	63.8	513229.64	918210.64	5015.5456	19.0462
778	2023-11-01	19213.108	15859.9	117.127	1.514733e+06	703336.00000	102.6710	100.9362	101.2159	101.8557	...	119.325	124.551	30.26	34.96	26.89	61.3	517434.30	922552.40	4999.7208	13.8563
779	2023-12-01	19251.946	15899.0	117.773	1.530296e+06	706180.00000	102.6715	100.8332	101.2843	101.9884	...	119.193	124.917	30.45	35.01	27.14	69.7	522366.13	928336.14	5077.4222	12.6960
780	2024-01-01	19377.558	15948.8	117.639	NaN	700291.00000	102.5739	100.9984	101.7258	102.6235	...	118.745	125.662	30.56	35.21	27.22	NaN	NaN	NaN	5105.3504	13.3453

781 rows × 128 columns

# Extract transformation codes
transformation_codes = df.iloc[0, 1:].to_frame().reset_index()
transformation_codes.columns = ['Series', 'Transformation_Code']

The transformation codes map variables to the transformations we must apply to each variable to render them (approximately) stationary. The data frame transformation_codes has the variable’s name (Series) and its transformation (Transformation_Code). There are six possible transformations (\(x_t\) denotes the variable to which the transformation is to be applied):

transformation_code=1: no trasformation
transformation_code=2: \(\Delta x_t\)
transformation_code=3: \(\Delta^2 x_t\)
transformation_code=4: \(log(x_t)\)
transformation_code=5: \(\Delta log(x_t)\)
transformation_code=6: \(\Delta^2 log(x_t)\)
transformation_code=7: \(\Delta (x_t/x_{t-1} - 1)\)

We can apply these transformations using the following code:

import numpy as np

# Function to apply transformations based on the transformation code
def apply_transformation(series, code):
    if code == 1:
        # No transformation
        return series
    elif code == 2:
        # First difference
        return series.diff()
    elif code == 3:
        # Second difference
        return series.diff().diff()
    elif code == 4:
        # Log
        return np.log(series)
    elif code == 5:
        # First difference of log
        return np.log(series).diff()
    elif code == 6:
        # Second difference of log
        return np.log(series).diff().diff()
    elif code == 7:
        # Delta (x_t/x_{t-1} - 1)
        return series.pct_change()
    else:
        raise ValueError("Invalid transformation code")

# Applying the transformations to each column in df_cleaned based on transformation_codes
for series_name, code in transformation_codes.values:
    df_cleaned[series_name] = apply_transformation(df_cleaned[series_name].astype(float), float(code))


1df_cleaned = df_cleaned[2:]
2df_cleaned.reset_index(drop=True, inplace=True)
df_cleaned.head()

1: Since some transformations induce missing values, we drop the first two observations of the dataset
2: We reset the index so that the first observation of the dataset has index 0

	sasdate	RPI	W875RX1	DPCERA3M086SBEA	CMRMTSPLx	RETAILx	INDPRO	IPFPNSS	IPFINAL	IPCONGD	...	DNDGRG3M086SBEA	DSERRG3M086SBEA	CES0600000008	CES2000000008	CES3000000008	UMCSENTx	DTCOLNVHFNM	DTCTHFNM	INVEST	VIXCLSx
0	1959-03-01	0.006457	0.007325	0.009404	-0.003374	0.008321	0.014306	0.006035	0.004894	0.000000	...	-0.001148	0.000292	-0.000022	-0.008147	0.004819	NaN	0.004929	0.004138	-0.014792	NaN
1	1959-04-01	0.006510	0.007029	-0.003622	0.019915	0.000616	0.021075	0.014338	0.014545	0.015650	...	0.001312	0.001760	-0.000022	0.012203	-0.004890	NaN	0.012134	0.006734	0.024929	NaN
2	1959-05-01	0.005796	0.006618	0.012043	0.006839	0.007803	0.014955	0.008270	0.009582	0.004770	...	-0.001695	-0.001867	-0.000021	-0.004090	-0.004819	NaN	0.002828	0.002020	-0.015342	NaN
3	1959-06-01	0.003068	0.003012	0.003642	-0.000097	0.009064	0.001141	0.007034	0.007128	-0.004767	...	0.003334	0.001946	-0.004619	0.003992	0.004796	NaN	0.009726	0.009007	-0.012252	NaN
4	1959-07-01	-0.000580	-0.000762	-0.003386	0.012155	-0.000330	-0.024240	0.001168	0.008249	0.013054	...	-0.001204	-0.000013	0.000000	-0.004040	-0.004796	NaN	-0.004631	-0.001000	0.029341	NaN

5 rows × 128 columns

1import matplotlib.pyplot as plt
import matplotlib.dates as mdates

2series_to_plot = ['INDPRO', 'CPIAUCSL', 'TB3MS']
series_names = ['Industrial Production',
                'Inflation (CPI)',
                '3-month Treasury Bill rate']


# Create a figure and a grid of subplots
3fig, axs = plt.subplots(len(series_to_plot), 1, figsize=(8, 15))

# Iterate over the selected series and plot each one
for ax, series_name, plot_title in zip(axs, series_to_plot, series_names):
4    if series_name in df_cleaned.columns:
5        dates = pd.to_datetime(df_cleaned['sasdate'], format='%m/%d/%Y')
6        ax.plot(dates, df_cleaned[series_name], label=plot_title)
7        ax.xaxis.set_major_locator(mdates.YearLocator(base=5))
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
8        ax.set_title(plot_title)
9        ax.set_xlabel('Year')
        ax.set_ylabel('Transformed Value')
10        ax.legend(loc='upper left')
11        plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, ha='right')
    else:
        ax.set_visible(False)  # Hide plots for which the data is not available

12plt.tight_layout()
13plt.show()

1: We use library matplotlib to plot
2: We consider three series (INDPRO, CPIAUCSL, TB3MS) and assign them human-readable names (“Industrial Production”, “Inflation (CPI)”, “3-month Treasury Bill rate.”).
3: We create a figure with three (len(series_to_plot)) subplots arranged vertically. The figure size is 8x15 inches.
4: We check if the series exists in each series df_cleaned DataFrame columns.
5: We convert the sasdate column to datetime format (not necessary, since sasdate was converter earlier)
6: We plot each series against the sasdate on the corresponding subplot, labeling the plot with its human-readable name.
7: We format the x-axis to display ticks and label the x-axis with dates taken every five years.
8: Each subplot is titled with the name of the economic indicator.
9: We label the x-axis “Year,” and the y-axis “Transformed Value,” to indicate that the data was transformed before plotting.
10: A legend is added to the upper left of each subplot for clarity.
11: We rotate the x-axis labels by 45 degrees to prevent overlap and improve legibility.
12: plt.tight_layout() automatically adjusts subplot parameters to give specified padding and avoid overlap.
13: plt.show() displays the figure with its subplots.

Forecasting in Time Series

Forecasting in time series analysis involves using historical data to predict future values. The objective is to model the conditional expectation of a time series based on past observations.

Direct Forecasts

Direct forecasting involves modeling the target variable directly at the desired forecast horizon. Unlike iterative approaches, which forecast one step ahead and then use those forecasts as inputs for subsequent steps, direct forecasting directly models the relationship between past observations and future value.

ARX Models

Autoregressive Moving with predictors (ARX) models are a class of univariate time series models that extend ARMA models by incorporating exogenous (independent) variables. These models are formulated as follows:

\[ \begin{aligned} Y_{t+h} &= \alpha + \phi_0 Y_t + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + \theta_{0,1} X_{t,1} + \theta_{1,1} X_{t-1,1} + \dots + \theta_{p,1} X_{t-p,1} + \dots + \theta_{0,k} X_{t,k} + \dots + \theta_{p,k} X_{t-p,k} + u_{t+h}\\ &= \alpha + \sum_{i=0}^p \phi_i Y_{t-i} + \sum_{j=1}^k\sum_{s=0}^p \theta_{s,j} X_{t-s,j} + \epsilon_{t+h} \end{aligned} \tag{1}\]

\(Y_{t+h}\): The target variable at time \(t+h\).
\(X_{t,j}\): Predictors (variable \(j=1,\ldots,k\) at time \(t\)).
\(p\) number of lags of the target and the predictors.¹
\(\phi_i\), \(i=0,\dots,p\), and \(\theta_{j,s}\), \(j=1,\dots,k\), \(s=1,\ldots,r\): Parameters of the model.
\(\epsilon_{t+h}\): error term.

For instance, to predict Industrial Prediction using as predictor inflation and the 3-month t-bill, the target variable is INDPRO, and the predictors are CPIAUSL and TB3MS. Notice that the target and the predictors are the transformed variables. Thus, if we use INDPRO as the target, we are predicting the log-difference of industrial production, which is a good approximation for its month-to-month percentage change.

By convention, the data ranges from \(t=1,\ldots,T\), where \(T\) is the last period, we have data (for the df_cleaned dataset, \(T\) corresponds to January 2024).

Forecasting with ARX

Suppose that we know the parameters of the model for the moment. To obtain a forecast for \(Y_{T+h}\), the \(h\)-step ahead forecast, we calculate \[ \begin{aligned} \hat{Y}_{T+h} &= \alpha + \phi_0 Y_T + \phi_1 Y_{T-1} + \dots + \phi_p Y_{T-p} \\ &\,\,\quad \quad + \theta_{0,1} X_{T,1} + \theta_{1,1} X_{T-1,1} + \dots + \theta_{p,1} X_{T-p,1} \\ &\,\,\quad \quad + \dots + \theta_{0,k} X_{T,k} + \dots + \theta_{p,k} X_{T-p,k}\\ &= \alpha + \sum_{i=0}^p \phi_i Y_{T-i} + \sum_{j=1}^k\sum_{s=0}^p \theta_{s,j} X_{T-s,j} \end{aligned} \]

While this is conceptually easy, implementing the steps needed to calculate the forecast is insidious, and care must be taken to ensure we are calculating the correct forecast.

To start, it is convenient to rewrite the model in Equation 1 as a linear model \[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}, \] where \(\boldsymbol{\beta}\) is the vector (of size \(1+(1+p)(1+k)\)) \[ \boldsymbol{\beta}=\begin{pmatrix}\alpha\\ \phi_{0}\\ \vdots\\ \phi_{p}\\ \theta_{0,1}\\ \vdots\\ \theta_{p,1}\\ \vdots\\ \theta_{1,k}\\ \vdots\\ \theta_{p,k} \end{pmatrix}, \] \(\mathbf{y}\) and \(\mathbf{X}\) are respectively given by \[ \mathbf{y} = \begin{pmatrix} y_{p+h+1} \\ y_{p+h+2}\\ \vdots \\ y_{T} \end{pmatrix} \] and \[ \mathbf{X} = \begin{pmatrix}1 & Y_{p+1} & Y_{p} & \cdots & Y_{1} & X_{p+1,1} & X_{p,1} & \cdots & X_{1,1} & X_{p+1,k} & X_{p,k} & \cdots & X_{1,k}\\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\ 1 & Y_{T-h-1} & Y_{T-h-2} & \cdots & Y_{T-h-p-1} & X_{T-h-1,1} & X_{T-h-2,1} & \cdots & X_{T-h-p-1,1} & X_{T-h-1,k} & X_{T-h-2,k} & \cdots & X_{T-h-p-1,k}\\ 1 & Y_{T-h} & Y_{T-h-1} & \cdots & Y_{T-h-p} & X_{T-h,1} & X_{T-h-1,1} & \cdots & X_{T-h-p,1} & X_{T-h,k} & X_{T-h-1,k} & & X_{T-h-p,k} \end{pmatrix}. \] The size of \(\mathbf{X}\) is \((T-p-h)\times 1+(1+k)(1+p)\) and that of \(\mathbf{y}\) is \(T-h-p\).

The matrix \(\mathbf{X}\) can be obtained in the following way:

Yraw = df_cleaned['INDPRO']
Xraw = df_cleaned[['CPIAUCSL', 'TB3MS']]

num_lags  = 4  ## this is p
num_leads = 1  ## this is h
X = pd.DataFrame()
## Add the lagged values of Y
col = 'INDPRO'
for lag in range(0,num_lags+1):
        # Shift each column in the DataFrame and name it with a lag suffix
        X[f'{col}_lag{lag}'] = Yraw.shift(lag)

for col in Xraw.columns:
    for lag in range(0,num_lags+1):
        # Shift each column in the DataFrame and name it with a lag suffix
        X[f'{col}_lag{lag}'] = Xraw[col].shift(lag)
## Add a column on ones (for the intercept)
X.insert(0, 'Ones', np.ones(len(X)))


## X is now a DataFrame
X.head()

	Ones	INDPRO_lag0	INDPRO_lag1	INDPRO_lag2	INDPRO_lag3	INDPRO_lag4	CPIAUCSL_lag0	CPIAUCSL_lag1	CPIAUCSL_lag2	CPIAUCSL_lag3	CPIAUCSL_lag4	TB3MS_lag0	TB3MS_lag1	TB3MS_lag2	TB3MS_lag3	TB3MS_lag4
0	1.0	0.014306	NaN	NaN	NaN	NaN	-0.000690	NaN	NaN	NaN	NaN	0.10	NaN	NaN	NaN	NaN
1	1.0	0.021075	0.014306	NaN	NaN	NaN	0.001380	-0.000690	NaN	NaN	NaN	0.15	0.10	NaN	NaN	NaN
2	1.0	0.014955	0.021075	0.014306	NaN	NaN	0.001723	0.001380	-0.000690	NaN	NaN	-0.11	0.15	0.10	NaN	NaN
3	1.0	0.001141	0.014955	0.021075	0.014306	NaN	0.000339	0.001723	0.001380	-0.00069	NaN	0.37	-0.11	0.15	0.10	NaN
4	1.0	-0.024240	0.001141	0.014955	0.021075	0.014306	-0.001034	0.000339	0.001723	0.00138	-0.00069	-0.01	0.37	-0.11	0.15	0.1

Note that the first \(p=\)4 rows of X have missing values.

The vector \(\mathbf{y}\) can be similarly created as

y = Yraw.shift(-num_leads)
y

0      0.021075
1      0.014955
2      0.001141
3     -0.024240
4     -0.034465
         ...   
774   -0.008147
775    0.002915
776    0.000005
777   -0.000951
778         NaN
Name: INDPRO, Length: 779, dtype: float64

The variable y has missing values in the last h positions (it is not possible to lead the target beyond \(T\)).

Notice also that we must keep the last row of X for constructing the forecast.

Now we create two numpy arrays with the missing values stripped:

## Save last row of X (converted to numpy)
X_T = X.iloc[-1:].values
## Subset getting only rows of X and y from p+1 to h-1
## and convert to numpy array
y = y.iloc[num_lags:-num_leads].values
X = X.iloc[num_lags:-num_leads].values

X_T

array([[ 1.00000000e+00, -9.51056709e-04,  4.86991246e-06,
         2.91450984e-03, -8.14668061e-03,  9.25729878e-04,
         7.21400503e-04,  7.26467817e-04,  8.11330254e-04,
        -2.79891559e-03, -1.51527417e-03, -2.00000000e-02,
        -3.00000000e-02, -7.00000000e-02,  2.00000000e-02,
         2.00000000e-02]])

Now, we have to estimate the parameters and obtain the forecast.

Estimation

The parameters of the model can be estimated by OLS (the OLS estimates the coefficient of the linear projection of \(Y_{t+h}\) on its lags and the lags of \(X_t\)).

The OLS estimator of \(\boldsymbol{\beta}\) is \[ \hat{\boldsymbol{\beta}} = (X'X)^{-1}X'Y. \]

While this is the formula used to describe the OLS estimator, from a computational poijnt of view is much better to define the estimator as the solution of the set of linear equations: \[ (X'X)\boldsymbol{\beta} = X'Y \]

The function solve can be used to solve this linear system of equation.

from numpy.linalg import solve
# Solving for the OLS estimator beta: (X'X)^{-1} X'Y
beta_ols = solve(X.T @ X, X.T @ y)

## Produce the One step ahead forecast
## % change month-to-month INDPRO
forecast = X_T@beta_ols*100
forecast

array([0.08445815])

The variable forecast contains now the one-step ahead (\(h=1\) forecast) of INDPRO. Since INDPRO has been transformed in logarithmic differences, we are forecasting the percentage change (and multiplying by 100 gives the forecast in percentage points).

To obtain the \(h\)-step ahead forecast, we must repeat all the above steps using a different h.

Forecasting Exercise

How good is the forecast that the model is producing? One thing we could do to assess the forecast’s quality is to wait for the new data on industrial production and see how big the forecasting error is. However, this evaluation would not be appropriate because we need to evaluate the forecast as if it were repeatedly used to forecast future values of the target variables. To properly assess the model and its ability to forecast INDPRO, we must keep producing forecasts and calculating the errors as new data arrive. This procedure would take time as we must wait for many months to have a series of errors that is large enough.

A different approach is to do what is called a Real-time evaluation. A Real-time evaluation procedure consists of putting ourselves in the shoes of a forecaster who has been using the forecasting model for a long time.

In practice, that is what are the steps to follow to do a Real-time evaluation of the model:

Set \(T\) such that the last observation of df coincides with December 1999;
Estimate the model using the data up to \(T\)
Produce \(\hat{Y}_{T+1}, \hat{Y}_{T+2}, \dots, \hat{Y}_{T+H}\)
Since we have the actual data for January, February, …, we can calculate the forecasting errors of our model \[ \hat{e}_{T+h} = \hat{Y}_{T+h} - Y_{T+h}, \,\, h = 1,\ldots, H. \]
Set \(T = T+1\) and do all the steps above.

The process results are a series of forecasting errors we can evaluate using several metrics. The most commonly used is the MSFE, which is defined as \[ MSFE_h = \frac{1}{J}\sum_{j=1}^J \hat{e}_{T+j+h}^2, \] where \(J\) is the number of errors we collected through our real-time evaluation.

This assignment asks you to perform a real-time evaluation assessment of our simple forecasting model and calculate the MSFE for steps \(h=1,4,8\).

As a bonus, we can evaluate different models and see how they perform differently. For instance, you might consider different numbers of lags and/or different variables in the model.

Hint

A sensible way to structure the code for real-time evaluation is to use several functions. For instance, you can define a function that calculates the forecast given the DataFrame.

def calculate_forecast(df_cleaned, p = 4, H = [1,4,8], end_date = '12/1/1999',target = 'INDPRO', xvars = ['CPIAUCSL', 'TB3MS']):

    ## Subset df_cleaned to use only data up to end_date
    rt_df = df_cleaned[df_cleaned['sasdate'] <= pd.Timestamp(end_date)]
    ## Get the actual values of target at different steps ahead
    Y_actual = []
    for h in H:
        os = pd.Timestamp(end_date) + pd.DateOffset(months=h)
        Y_actual.append(df_cleaned[df_cleaned['sasdate'] == os][target]*100)
        ## Now Y contains the true values at T+H (multiplying * 100)

    Yraw = rt_df[target]
    Xraw = rt_df[xvars]

    X = pd.DataFrame()
    ## Add the lagged values of Y
    for lag in range(0,p):
        # Shift each column in the DataFrame and name it with a lag suffix
        X[f'{target}_lag{lag}'] = Yraw.shift(lag)

    for col in Xraw.columns:
        for lag in range(0,p):
            X[f'{col}_lag{lag}'] = Xraw[col].shift(lag)
    
    ## Add a column on ones (for the intercept)
    X.insert(0, 'Ones', np.ones(len(X)))
    
    ## Save last row of X (converted to numpy)
    X_T = X.iloc[-1:].values

    ## While the X will be the same, Y needs to be leaded differently
    Yhat = []
    for h in H:
        y_h = Yraw.shift(-h)
        ## Subset getting only rows of X and y from p+1 to h-1
        y = y_h.iloc[p:-h].values
        X_ = X.iloc[p:-h].values
        # Solving for the OLS estimator beta: (X'X)^{-1} X'Y
        beta_ols = solve(X_.T @ X_, X_.T @ y)
        ## Produce the One step ahead forecast
        ## % change month-to-month INDPRO
        Yhat.append(X_T@beta_ols*100)

    ## Now calculate the forecasting error and return

    return np.array(Y_actual) - np.array(Yhat)

With this function, you can calculate real-time errors by looping over the end_date to ensure you end the loop at the right time.

t0 = pd.Timestamp('12/1/1999')
e = []
T = []
for j in range(0, 10):
    t0 = t0 + pd.DateOffset(months=1)
    print(f'Using data up to {t0}')
    ehat = calculate_forecast(df_cleaned, p = 4, H = [1,4,8], end_date = t0)
    e.append(ehat.flatten())
    T.append(t0)

## Create a pandas DataFrame from the list
edf = pd.DataFrame(e)
## Calculate the RMSFE, that is, the square root of the MSFE
np.sqrt(edf.apply(np.square).mean())

Using data up to 2000-01-01 00:00:00
Using data up to 2000-02-01 00:00:00
Using data up to 2000-03-01 00:00:00
Using data up to 2000-04-01 00:00:00
Using data up to 2000-05-01 00:00:00
Using data up to 2000-06-01 00:00:00
Using data up to 2000-07-01 00:00:00
Using data up to 2000-08-01 00:00:00
Using data up to 2000-09-01 00:00:00
Using data up to 2000-10-01 00:00:00

0    0.337110
1    0.512690
2    0.624035
dtype: float64

You may change the function calculate_forecast to output also the actual data end the forecast, so you can, for instance, construct a plot.

Working with github

The https://github.com/uniroma/comptools-assignments repository contains four files:

comptools_ass1.qmd
assignment1_julia.jl
assignment1_python.py
assignment1_r.r

The comptools_ass1.qmd is this file (in quarto format). The repository also contains the pdf and the html version of this file.

The other files, assignment1_julia.jl, assignment1_julia.py, and assignment1_julia.py, are the starter kit of the code you have to write in Julia, R, and Python. You can use them to start your work.

Using Visual Studio Code

Unless you are familiar with the command line and you are using Linux or MacOS, the best way to interact with github is through Visual Studio Code. Instructions on how to install Visual Studio Code on Windows are here. For MacOS the instructions are here.

Visual Studio Code has an extension system. The extensions extend VSCode adding features that simplify writing and interacting with code.

The extensions you should install are

Julia Extension Instructions
Python Extension Instruction
R Extension Instructions

There are many other extensions that you might find useful. For those, google is your friend.

Cloning the repository

Cloning a repository from GitHub into Visual Studio Code (VSCode) allows you to work on projects directly from your local machine. Here’s a detailed step-by-step guide on how to clone the repository https://github.com/uniroma/comptools-assignments into VSCode:

Open Visual Studio Code

Start by opening Visual Studio Code on your computer.

Access the Command Palette

With VSCode open, access the Command Palette by pressing Ctrl+Shift+P on Windows/Linux or Cmd+Shift+P on macOS. This is where you can quickly access various commands in VSCode.

Clone Repository

In the Command Palette, type “Git: Clone” and select the option Git: Clone from the list that appears. This action will prompt VSCode to clone a repository.

Enter the Repository URL

A text box asking for the repository URL will appear at the top of the VSCode window. Enter https://github.com/uniroma/comptools-assignments and press Enter. (This is the URL of the assignment 1 repository).

Choose a Directory

Next, VSCode will ask you to select a directory where you want to clone the repository. Navigate through your file system and choose a directory that will be the local storage place for the repository. The directory should exist. Create it if it doesn’t. Once selected, the cloning process will start.

Open the Cloned Repository

After the repository has been successfully cloned, a notification will pop up in the bottom right corner of VSCode with the option to Open Repository. Click on it. If you missed the notification, you can navigate to the directory where you cloned the repository and open it manually from within VSCode by going to File > Open Folder.

Start Working

Now that the repository is cloned and opened in VSCode, you can start working on the project. You can edit files, commit changes, and manage branches directly from VSCode.

Tip

Ensure you have Git installed on your computer to use the Git features in VSCode. If you do not have Git installed, you can download it from the official Git website. Instructions to install Git.
If you are working with GitHub repositories frequently, consider authenticating with GitHub in VSCode to streamline your workflow. This can be done through the Command Palette by finding the GitHub: Sign in command.

Make changes and commit them to the repository

Make Your Changes
- Open the repository you have cloned in VSCode.
- Navigate to the file(s) you wish to change within the VSCode Explorer pane.
- Make the necessary modifications or additions to the file(s). These changes can be anything from fixing a bug to adding new features.
Review Your Changes

After making changes, you can see which files have been modified by looking at the Source Control panel. You can access this panel by clicking on the Source Control icon (it looks like a branch or a fork) on the sidebar or by pressing Ctrl+Shift+G (Windows/Linux) or Cmd+Shift+G (macOS) and searching for Show control panel.
Modified files are listed within the Source Control panel. Click on a file to view the changes (differences) between your working version and the last commit. Lines added are highlighted in green, and lines removed are highlighted in red.

Stage Your Changes

Before committing, you need to stage your changes. Staging is like preparing and reviewing exactly what changes you will commit to without making the commit final.
You can stage changes by right-clicking on a modified file in the Source Control panel and selecting Stage Changes. Alternatively, you can stage all changes at once by clicking the + icon next to the “Changes” header.

Commit Your Changes

After staging your changes, commit them to the repository. To do this, type a commit message in the message box at the top of the Source Control panel. This message should briefly describe the changes you’ve made.
Press Ctrl+Enter (Windows/Linux) or Cmd+Enter (macOS) to commit the staged changes (search for Git: Commit). Alternatively, you can click the checkmark icon (Commit) at the top of the Source Control panel.
Before committing, you should enter a commit message that briefly describes the changes that you have made. Commit messages are essential for making the project’s history understandable for yourself and the other collaborators.

Push Your Changes

If you’re working with a remote repository (like one hosted on GitHub), you must push your commits to update the remote repository with your local changes.
You can push changes by clicking on the three dots (...) menu in the Source Control panel, navigating to Push and selecting it. If you’re using Git in VSCode for the first time, you might be prompted to enter your GitHub credentials or authenticate in another way.

Tip

It’s a good practice to pull changes from the remote repository before starting your work session (to ensure you’re working with the latest version) and before pushing your changes (to ensure no conflicts). You can pull changes by clicking on the three dots (...) menu in the Source Control panel and selecting Pull.

The following video explores in more detail how to use git in VSCode.

Footnotes

Theoretically, the number of lags for the target variables and the predictors could be different. Here, we consider the simpler case in which both are equal.↩︎