{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Packages\n",
"To run this notebook the following packages are needed:\n",
"\n",
"- `DataFrames.jl`\n",
"- `Distributions.jl`\n",
"- `Plots.jl`\n",
"- `StatPlots.jl`\n",
"- `PlotlyJS.jl`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Pkg.update();\n",
"#Pkg.add(\"DataFrames\")\n",
"#Pkg.add(\"Distributions\")\n",
"#Pkg.add(\"Plots\")\n",
"#Pkg.add(\"StatPlots\")\n",
"using DataFrames, Distributions, StatPlots;\n",
"pyplot();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# The linear model\n",
"Consider the linear model:\n",
"$$\n",
"y_i = \\beta^{0}_{0} + \\beta^{0}_{1} x_{i1} + \\beta^{0}_{2} x_{i2} + u_i = x_i\\beta^{0} + u_i \\quad i=1,\\ldots,n.\n",
"$$\n",
"Usually we make assumptions about the joint distribution of $\\{u_i,x_i\\}$. Here, instead, we will simulate $\\{u_i,x_i\\}$ from different distributions. The choice of these distributions determines the stochastic properties of $\\{u_i,x_i\\}$. This will hopefully clarify the relationship between the assumptions and the properties of the OLS estimator. \n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Case 1: $u_i$ and $x_i$ are independent. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"using Distributions\n",
"srand(1) ## Set seed \n",
"n = 25; ## number of obs\n",
"β⁰ = [1.0, 0.3, 0.8]\n",
"χ² = Chisq(2);\n",
"N = Normal(0,1)\n",
"X = [ones(n) rand(N, n) rand(N, n)];\n",
"u = (rand(χ², n)-mean(χ²))/√var(χ²);\n",
"y = X*β⁰ + u;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code generates a vector $u$ of dimension $(n\\times 1)$, a matrix of regressors, $X$, of dimension $(n\\times 3)$, and a vector of dependent variables, $y$, of dimension $(n\\times 1)$. The first column of $X$ consists of ones, as we are including an intercept. \n",
"\n",
"We will let $x_i$ denote the $i$-th row of $X$.\n",
"\n",
"We write $\\beta^0$ to stress that this is the true value used to generate $y$, the value we want to recover. Throughout, `β⁰ = [1.0, 0.3, 0.8]`. \n",
"\n",
"Since $u$ and $x$ are simulated independently, $u_i$ and $x_i$ are independent; that is, $(u_i, x_i)$ is an independent sequence along $i$, and moreover $u_i$ and $x_i$ are independent of each other for each $i$. This implies that $$E[u_i | x_i] = 0.$$\n",
"\n",
"In this case $\\beta^0$ is the partial effect which can be consistently estimated by OLS. \n"
]
},
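{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (illustrative only, using the single sample just drawn), the sample analogue of $E[x_i'u_i]$, namely $X'u/n$, should be close to zero up to sampling noise of order $1/\\sqrt{n}$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Sample analogue of E[xᵢ'uᵢ]; each entry should be near zero\n",
"X'u/n"
]
},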
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### OLS Estimator"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"\n",
"## Mimic (X'X)^-1 X'y via explicit inversion\n",
"βhat = inv(X'X)X'y \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## More efficient: solve (X'X)β = X'y\n",
"βhat = X'X\\X'y"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## But you can simply do\n",
"βhat = X\\y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, they all give the same estimate of $\\beta^0$, but `X\\y` is both faster and numerically more reliable. \n",
"\n",
"Arguably, $\\hat{\\beta}_0$, $\\hat{\\beta}_1$, and $\\hat{\\beta}_2$ are _relatively_ close to $\\beta^0_{0}$, $\\beta^0_{1}$, and $\\beta^0_{2}$. But \"closeness\" is a probabilistic concept: we need to take the sampling variation into account. We know that under the assumptions of this section, $\\hat{\\beta}$ is unbiased for $\\beta^0$. Let's see whether this is the case. "
]
},
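{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to see why `X\\y` (a QR-based solve) is numerically more reliable: the condition number of $X'X$ is the square of that of $X$, so forming and inverting $X'X$ can lose roughly twice as many digits of precision. A quick illustration on our design matrix:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## cond(X'X) = cond(X)^2, so inverting X'X amplifies rounding errors\n",
"cond(X), cond(X'X)"
]
},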
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"simulations = 100_000 ## This should be infinity, but 100_000 works just fine\n",
"βhat = Array(Float64, simulations, 3);\n",
"for j = 1:simulations\n",
" X = [ones(n) rand(N, n) rand(N, n)];\n",
" u = (rand(χ², n)-mean(χ²))/√var(χ²);\n",
" y = X*β⁰ + u;\n",
" βhat[j, 1:3] = X\\y;\n",
"end\n",
"## the mean over simulations approximates E[\\hat{β}], which equals β⁰ under unbiasedness\n",
"mean(βhat,1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Excellent! On average the OLS estimator equals $\\beta^0$ (up to simulation error). What about the distribution of $\\hat{\\beta}$?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"using Plots\n",
"Plots.histogram(βhat, layout = 3, normalize = true)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"As suggested by the theory, the histograms of $\\hat{\\beta}$ look approximately normal. Let's see how good this approximation is. To do so, recall the statement of the CLT:\n",
"$$\n",
"\\sqrt{n}(\\hat{\\beta} - \\beta^0) \\xrightarrow{d} N(0, \\Sigma).\n",
"$$\n",
"A consistent estimator of $\\Sigma$ is\n",
"$$\n",
"\\hat{\\Sigma} = \\left(\\frac{X'X}{n}\\right)^{-1} \\left(\\frac{1}{n}\\sum_{i=1}^n \\hat{u}^2_i x_i'x_i\\right) \\left(\\frac{X'X}{n}\\right)^{-1}.\n",
"$$\n",
"Since we have simulated the data, we know that $var(u_i|x_i) = \\sigma^2$, so there is no conditional heteroskedasticity. But we will pretend for the moment that we do not know this, and accordingly use a heteroskedasticity-robust variance estimator. This is not a problem: the estimator above is consistent whether or not $u_i$ is conditionally heteroskedastic. \n",
"\n",
"By consistency of $\\hat{\\Sigma}$ we have that\n",
"$$\n",
"\\sqrt{n}\\hat{\\Sigma}^{-1/2}(\\hat{\\beta} - \\beta^0) \\xrightarrow{d} N(0, I_k),\n",
"$$\n",
"where $\\hat{\\Sigma}^{-1/2}$ is the Cholesky factor of $\\hat{\\Sigma}^{-1}$. The joint normality above implies that each element of the vector $\\hat{\\beta}$ is asymptotically normal:\n",
"$$\n",
"\\sqrt{n}\\frac{(\\hat{\\beta}_j - \\beta^0_j)}{\\sqrt{\\hat{\\Sigma}_{jj}}} \\xrightarrow{d} N(0, 1).\n",
"$$\n",
"\n",
"Let's now code $\\hat{\\Sigma}$. We will write a function taking two arguments: $X$ and $\\hat{u}$. We will also write a function that, given $y$, $X$, and $\\hat{\\beta}$, returns $\\hat{u}$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"function Sigmahat(X, uhat)\n",
" n = size(X,1) ## Number of observations\n",
" XXi = inv(X'X/n) ## (∑xᵢ'xᵢ/n)⁻¹\n",
" Xu = X.*uhat ## Xu = xᵢ ûᵢ\n",
" XuuX= Xu'Xu/n ## XuuX = ∑ûᵢ² xᵢ'xᵢ/n\n",
" XXi*XuuX*XXi ## Σ̂\n",
"end\n",
"\n",
"function uh(y, X, βhat)\n",
" y-X*βhat\n",
"end"
]
},
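{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the full simulation, here is an illustrative use of these two functions on the single sample generated earlier. The heteroskedasticity-robust standard errors are $\\sqrt{\\hat{\\Sigma}_{jj}/n}$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Estimates and robust standard errors for one sample (illustrative)\n",
"b = X\\y;\n",
"Σhat = Sigmahat(X, uh(y, X, b));\n",
"[b sqrt(diag(Σhat)/n)]"
]
},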
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"n = 25\n",
"simulations = 100_000\n",
"βhat = Array(Float64, simulations, 3);\n",
"σ²hat = Array(Float64, simulations, 3);\n",
"for j = 1:simulations\n",
" X = [ones(n) rand(N, n) rand(N, n)];\n",
" u = (rand(χ², n)-mean(χ²))/√var(χ²);\n",
" y = X*β⁰ + u;\n",
" βhat[j, 1:3] = X\\y;\n",
" uhat = uh(y, X, βhat[j, 1:3])\n",
" σ²hat[j, 1:3] = diag(Sigmahat(X, uhat)) \n",
"end\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"The preceding code produces two matrices, βhat and σ²hat. Each row of σ²hat contains the diagonal elements of $\\hat{\\Sigma}$ computed from the sample drawn in the corresponding simulation. This means that we can standardize βhat and compare the resulting histograms with the density of a standard normal. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"z = √n*(βhat.-β⁰')./√σ²hat;\n",
"Plots.histogram(z, normalize = true, layout = 3)\n",
"Plots.plot!(Normal(0,1), subplot = 1)\n",
"Plots.plot!(Normal(0,1), subplot = 2)\n",
"Plots.plot!(Normal(0,1), subplot = 3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We should have expected somewhat large divergences between the densities of $\\hat{\\beta}_0$, $\\hat{\\beta}_1$, and $\\hat{\\beta}_2$ and that of the $N(0,1)$: we are using an asymptotic approximation while the actual sample size is only $n=25$.\n",
"\n",
"Let's do the same exercise, but with $n=200$. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"n = 200; ## <- two hundred observations\n",
"simulations = 100_000;\n",
"βhat = Array(Float64, simulations, 3);\n",
"σ²hat = Array(Float64, simulations, 3);\n",
"for j = 1:simulations\n",
" X = [ones(n) rand(N, n) rand(N, n)];\n",
" u = (rand(χ², n)-mean(χ²))/√var(χ²);\n",
" y = X*β⁰ + u;\n",
" βhat[j, 1:3] = X\\y;\n",
" uhat = uh(y, X, βhat[j, 1:3])\n",
" σ²hat[j, 1:3] = diag(Sigmahat(X, uhat)) \n",
"end\n",
"z = √n*(βhat.-β⁰')./√σ²hat;\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"Plots.histogram(z, normalize = true, layout = 3)\n",
"Plots.plot!(Normal(0,1), subplot = 1)\n",
"Plots.plot!(Normal(0,1), subplot = 2)\n",
"Plots.plot!(Normal(0,1), subplot = 3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the densities of $\\hat{\\beta}_0$, $\\hat{\\beta}_1$, and $\\hat{\\beta}_2$ are much closer to that of a standard normal. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Case 2: $E[x_i'u_i]=0$, but $E[u_i|x_i]\\neq 0$\n",
"We now generate the data in such a way that $E[x_i'u_i]=0$ but $u_i$ and $x_i$ are mean dependent. In particular, the data are generated as follows\n",
"$$\n",
"y_i = x_i\\beta^0 + u_i\n",
"$$\n",
"where $u_i = x_{i1}^2 - 1 + \\epsilon_i$, $\\epsilon_i \\sim N(0,1)$, and $x_{i1} \\sim N(0,1)$. \n",
"It should be clear that $E[u_i|x_i] \\neq 0$, because\n",
"$$\n",
"E[u_i|x_i] = E[x_{i1}^2-1+\\epsilon_i|x_i] = x_{i1}^2 - 1 + E[\\epsilon_i|x_i] = x_{i1}^2 - 1.\n",
"$$\n",
"On the other hand, $E[x_{i}'u_{i}] = \\pmatrix{0 \\\\ Ex_{i1}^3 \\\\ 0}$, which is zero because the odd moments of a standard normal are zero and the regressors are independent of each other. \n",
"The partial effect of $x_{i1}$ is given by\n",
"$$\n",
"\\frac{\\partial E[y_i|x_i]}{\\partial x_{i1}} = \\beta^0_{1} + 2x_{i1},\n",
"$$\n",
"which, as expected, is not equal to $\\beta^0_{1}$.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"n = 25\n",
"simulations = 100_000\n",
"βhat = Array(Float64, simulations, 3);\n",
"σ²hat = Array(Float64, simulations, 3);\n",
"for j = 1:simulations\n",
" X = [ones(n) rand(N, n) rand(N, n)];\n",
" u = X[:,2].^2 + rand(N,n)-1;\n",
" y = X*β⁰ + u;\n",
" βhat[j, 1:3] = X\\y;\n",
" uhat = uh(y, X, βhat[j, 1:3])\n",
" σ²hat[j, 1:3] = diag(Sigmahat(X, uhat)) \n",
"end\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"mean(βhat,1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a surprise! It seems that $\\hat{\\beta}_0$, $\\hat{\\beta}_1$, and $\\hat{\\beta}_2$ are _nearly_ unbiased. They are also consistent; that is, as $n\\to\\infty$ the difference between $\\hat{\\beta}$ and $\\beta^0$ gets smaller and smaller. \n",
"\n",
"So we can recover $\\beta^0_1$ and $\\beta^0_2$ very well, but they are useless for estimating the partial effect of $x_{i1}$ on the conditional expectation. "
]
},
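{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consistency can be illustrated (though of course not proved) with a single large sample: with $n$ large, the OLS estimate should land very close to $\\beta^0 = (1.0, 0.3, 0.8)$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## One large sample from the Case 2 design (illustrative)\n",
"nbig = 1_000_000;\n",
"Xbig = [ones(nbig) rand(N, nbig) rand(N, nbig)];\n",
"ubig = Xbig[:,2].^2 + rand(N, nbig) - 1;\n",
"ybig = Xbig*β⁰ + ubig;\n",
"Xbig\\ybig  ## should be close to β⁰"
]
},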
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Case 3: $E[x_i'u_i]\\neq0$ and $E[u_i|x_i]\\neq 0$\n",
"We now generate the data in such a way that both assumptions are violated. In particular, the data are generated as follows\n",
"$$\n",
"y_i = x_i\\beta^0 + u_i\n",
"$$\n",
"where $u_i = x_{i1}^3 + \\epsilon_i$, $\\epsilon_i \\sim N(0,1)$, and $x_{i1} \\sim N(0,1)$. \n",
"It should be clear that $E[u_i|x_i] \\neq 0$, because\n",
"$$\n",
"E[u_i|x_i] = E[x_{i1}^3+\\epsilon_i|x_i] = x_{i1}^3 + E[\\epsilon_i|x_i] = x_{i1}^3,\n",
"$$\n",
"and also $E[x_{i}'u_{i}] = \\pmatrix{0 \\\\ Ex_{i1}^4 \\\\ 0}$, which is not zero because the fourth moment of a standard normal is $3 \\neq 0$. \n",
"The partial effect is given by\n",
"$$\n",
"\\frac{\\partial E[y_i|x_i]}{\\partial x_{i1}} = \\beta^0_{1} + 3x_{i1}^2.\n",
"$$\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"n = 25\n",
"simulations = 100_000\n",
"βhat = Array(Float64, simulations, 3);\n",
"σ²hat = Array(Float64, simulations, 3);\n",
"for j = 1:simulations\n",
" X = [ones(n) rand(N, n) rand(N, n)];\n",
" u = X[:,2].^3 + rand(N,n);\n",
" y = X*β⁰ + u;\n",
" βhat[j, 1:3] = X\\y;\n",
" uhat = uh(y, X, βhat[j, 1:3])\n",
" σ²hat[j, 1:3] = diag(Sigmahat(X, uhat)) \n",
"end"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"mean(βhat,1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, $\\hat{\\beta}$ estimates the coefficient of the linear projection, which here is $\\beta = (1.0, 3.3, 0.8)$: since $E[x_{i1}^4] = 3$ and the regressors are independent, the projection slope on $x_{i1}$ is $\\beta^0_1 + 3 = 0.3 + 3 = 3.3$. The linear projection coefficient is equal neither to $\\beta^0$ nor to the partial effect. "
]
}
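,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the regressors are independent standard normals and $E[x_{i1}^4] = 3$, the slope of the linear projection on the second column of $X$ works out to $0.3 + 3 = 3.3$. We can check this with one large simulated sample (an illustrative sketch):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## One large sample from the Case 3 design (illustrative)\n",
"nbig = 1_000_000;\n",
"Xbig = [ones(nbig) rand(N, nbig) rand(N, nbig)];\n",
"ubig = Xbig[:,2].^3 + rand(N, nbig);\n",
"ybig = Xbig*β⁰ + ubig;\n",
"Xbig\\ybig  ## should be close to (1.0, 3.3, 0.8)"
]
}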
],
"metadata": {
"kernelspec": {
"display_name": "Julia 0.5.0",
"language": "julia",
"name": "julia-0.5"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "0.5.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}