Jinja #7

Merged: 4 commits merged on Apr 13, 2025

2,970 changes: 1,620 additions & 1,350 deletions Ch02-statlearn-lab.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions Ch03-linreg-lab.Rmd
@@ -343,7 +343,7 @@ As mentioned above, there is an existing function to add a line to a plot --- `a


Next we examine some diagnostic plots, several of which were discussed
in Section~\ref{Ch3:problems.sec}.
in Section 3.3.3.
We can find the fitted values and residuals
of the fit as attributes of the `results` object.
Various influence measures describing the regression model
@@ -440,7 +440,7 @@ We can access the individual components of `results` by name
and
`np.sqrt(results.scale)` gives us the RSE.

Variance inflation factors (section~\ref{Ch3:problems.sec}) are sometimes useful
Variance inflation factors (section 3.3.3) are sometimes useful
to assess the effect of collinearity in the model matrix of a regression model.
We will compute the VIFs in our multiple regression fit, and use the opportunity to introduce the idea of *list comprehension*.
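The VIF code itself lies outside this hunk; for readers following along, a hedged sketch of the list-comprehension approach the text describes (assuming the model matrix `X`, with the intercept in column 0, from the multiple regression fit earlier in the lab) might look like:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor as VIF

# One VIF per non-intercept column of the model matrix X.
vals = [VIF(X, i) for i in range(1, X.shape[1])]
vif = pd.DataFrame({'vif': vals}, index=X.columns[1:])
vif
```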

4 changes: 2 additions & 2 deletions Ch03-linreg-lab.ipynb
@@ -1533,7 +1533,7 @@
"metadata": {},
"source": [
"Next we examine some diagnostic plots, several of which were discussed\n",
"in Section~\\ref{Ch3:problems.sec}.\n",
"in Section 3.3.3.\n",
"We can find the fitted values and residuals\n",
"of the fit as attributes of the `results` object.\n",
"Various influence measures describing the regression model\n",
@@ -2142,7 +2142,7 @@
"and\n",
"`np.sqrt(results.scale)` gives us the RSE.\n",
"\n",
"Variance inflation factors (section~\\ref{Ch3:problems.sec}) are sometimes useful\n",
"Variance inflation factors (section 3.3.3) are sometimes useful\n",
"to assess the effect of collinearity in the model matrix of a regression model.\n",
"We will compute the VIFs in our multiple regression fit, and use the opportunity to introduce the idea of *list comprehension*.\n",
"\n",
24 changes: 12 additions & 12 deletions Ch04-classification-lab.Rmd
@@ -405,7 +405,7 @@ lda.fit(X_train, L_train)

```
Here we have used the list comprehensions introduced
in Section~\ref{Ch3-linreg-lab:multivariate-goodness-of-fit}. Looking at our first line above, we see that the right-hand side is a list
in Section 3.6.4. Looking at our first line above, we see that the right-hand side is a list
of length two. This is because the code `for M in [X_train, X_test]` iterates over a list
of length two. While here we loop over a list,
the list comprehension method works when looping over any iterable object.
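As an aside for readers skimming the diff, a tiny self-contained illustration of the pattern being described (toy data; the column names simply mirror the lab's `Smarket` matrices):

```python
import pandas as pd

# Stand-ins for the lab's design matrices.
X_train = pd.DataFrame({'intercept': 1.0, 'Lag1': [0.4, -1.2], 'Lag2': [0.1, 0.5]})
X_test = pd.DataFrame({'intercept': 1.0, 'Lag1': [0.9], 'Lag2': [-0.3]})

# The right-hand side is a list of length two -- one entry per element of
# [X_train, X_test] -- so tuple unpacking assigns the results back in order.
X_train, X_test = [M.drop(columns=['intercept']) for M in [X_train, X_test]]
```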
@@ -454,7 +454,7 @@ lda.scalings_

```

These values provide the linear combination of `Lag1` and `Lag2` that are used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (\ref{Ch4:bayes.multi}).
These values provide the linear combination of `Lag1` and `Lag2` that are used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (4.24).
If $-0.64\times `Lag1` - 0.51 \times `Lag2` $ is large, then the LDA classifier will predict a market increase, and if it is small, then the LDA classifier will predict a market decline.
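A hedged one-liner (assuming the fitted `lda` object and the two-column `X_test` from the surrounding cells) showing how those multipliers are applied:

```python
import numpy as np

# Project each test observation onto the discriminant direction; per the text,
# roughly -0.64 * Lag1 - 0.51 * Lag2 for each row.
scores = np.asarray(X_test) @ lda.scalings_
```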

```{python}
@@ -463,7 +463,7 @@ lda_pred = lda.predict(X_test)
```

As we observed in our comparison of classification methods
(Section~\ref{Ch4:comparison.sec}), the LDA and logistic
(Section 4.5), the LDA and logistic
regression predictions are almost identical.

```{python}
@@ -522,7 +522,7 @@ The LDA classifier above is the first classifier from the
`sklearn` library. We will use several other objects
from this library. The objects
follow a common structure that simplifies tasks such as cross-validation,
which we will see in Chapter~\ref{Ch5:resample}. Specifically,
which we will see in Chapter 5. Specifically,
the methods first create a generic classifier without
referring to any data. This classifier is then fit
to data with the `fit()` method and predictions are
@@ -808,7 +808,7 @@ feature_std.std()

```

Notice that the standard deviations are not quite $1$ here; this is again due to some procedures using the $1/n$ convention for variances (in this case `scaler()`), while others use $1/(n-1)$ (the `std()` method). See the footnote on page~\pageref{Ch4-varformula}.
Notice that the standard deviations are not quite $1$ here; this is again due to some procedures using the $1/n$ convention for variances (in this case `scaler()`), while others use $1/(n-1)$ (the `std()` method). See the footnote on page 183.
In this case it does not matter, as long as the variables are all on the same scale.
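A small self-contained illustration of the two conventions (made-up numbers, not the `Caravan` data):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

x = pd.DataFrame({'feature': [1.0, 2.0, 3.0, 4.0]})
z = StandardScaler().fit_transform(x)   # scales by the 1/n (population) standard deviation

print(pd.DataFrame(z).std())            # .std() uses 1/(n-1), so this is ~1.155, not 1
print(np.sqrt(len(x) / (len(x) - 1)))   # exactly the ratio between the two conventions
```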

Using the function `train_test_split()` we now split the observations into a test set,
@@ -875,7 +875,7 @@ This is double the rate that one would obtain from random guessing.
The number of neighbors in KNN is referred to as a *tuning parameter*, also referred to as a *hyperparameter*.
We do not know *a priori* what value to use. It is therefore of interest
to see how the classifier performs on test data as we vary these
parameters. This can be achieved with a `for` loop, described in Section~\ref{Ch2-statlearn-lab:for-loops}.
parameters. This can be achieved with a `for` loop, described in Section 2.3.8.
Here we use a for loop to look at the accuracy of our classifier in the group predicted to purchase
insurance as we vary the number of neighbors from 1 to 5:
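That loop sits outside this hunk; a hedged sketch of what it might look like (assuming the `X_train`, `X_test`, `y_train`, `y_test` split from the `Caravan` example above, and `confusion_table` from `ISLP`):

```python
from ISLP import confusion_table
from sklearn.neighbors import KNeighborsClassifier

for K in range(1, 6):
    knn = KNeighborsClassifier(n_neighbors=K)
    knn_labels = knn.fit(X_train, y_train).predict(X_test)
    C = confusion_table(knn_labels, y_test)   # rows: predicted label, columns: truth
    n_pred = C.loc['Yes'].sum()               # predicted to purchase insurance
    n_correct = C.loc['Yes', 'Yes']           # of those, how many actually purchased
    print(f'K={K}: accuracy among predicted purchasers = {n_correct / n_pred:.1%}')
```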

@@ -902,7 +902,7 @@ As a comparison, we can also fit a logistic regression model to the
data. This can also be done
with `sklearn`, though by default it fits
something like the *ridge regression* version
of logistic regression, which we introduce in Chapter~\ref{Ch6:varselect}. This can
of logistic regression, which we introduce in Chapter 6. This can
be modified by appropriately setting the argument `C` below. Its default
value is 1 but by setting it to a very large number, the algorithm converges to the same solution as the usual (unregularized)
logistic regression estimator discussed above.
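A minimal sketch of that approach, assuming the same `Caravan` split as above (the `liblinear` solver choice is an assumption, not something this diff shows):

```python
from sklearn.linear_model import LogisticRegression

# A very large C makes sklearn's default ridge-style penalty negligible, so the
# fit approximates ordinary (unregularized) logistic regression.
logit = LogisticRegression(C=1e10, solver='liblinear')
logit_labels = logit.fit(X_train, y_train).predict(X_test)
```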
@@ -946,7 +946,7 @@ confusion_table(logit_labels, y_test)

```
## Linear and Poisson Regression on the Bikeshare Data
Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section~\ref{Ch4:sec:pois}.
Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section 4.6.
The response `bikers` measures the number of bike rentals per hour
in Washington, DC in the period 2010--2012.

@@ -987,7 +987,7 @@ variables constant, there are on average about 7 more riders in
February than in January. Similarly there are about 16.5 more riders
in March than in January.

The results seen in Section~\ref{sec:bikeshare.linear}
The results seen in Section 4.6.1
used a slightly different coding of the variables `hr` and `mnth`, as follows:

```{python}
@@ -1041,7 +1041,7 @@ np.allclose(M_lm.fittedvalues, M2_lm.fittedvalues)
```


To reproduce the left-hand side of Figure~\ref{Ch4:bikeshare}
To reproduce the left-hand side of Figure 4.13
we must first obtain the coefficient estimates associated with
`mnth`. The coefficients for January through November can be obtained
directly from the `M2_lm` object. The coefficient for December
@@ -1081,7 +1081,7 @@ ax_month.set_ylabel('Coefficient', fontsize=20);

```

Reproducing the right-hand plot in Figure~\ref{Ch4:bikeshare} follows a similar process.
Reproducing the right-hand plot in Figure 4.13 follows a similar process.

```{python}
coef_hr = S2[S2.index.str.contains('hr')]['coef']
@@ -1116,7 +1116,7 @@ M_pois = sm.GLM(Y, X2, family=sm.families.Poisson()).fit()

```

We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure~\ref{Ch4:bikeshare.pois}. We first complete these coefficients as before.
We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure 4.15. We first complete these coefficients as before.

```{python}
S_pois = summarize(M_pois)
24 changes: 12 additions & 12 deletions Ch04-classification-lab.ipynb
@@ -2007,7 +2007,7 @@
"metadata": {},
"source": [
"Here we have used the list comprehensions introduced\n",
"in Section~\\ref{Ch3-linreg-lab:multivariate-goodness-of-fit}. Looking at our first line above, we see that the right-hand side is a list\n",
"in Section 3.6.4. Looking at our first line above, we see that the right-hand side is a list\n",
"of length two. This is because the code `for M in [X_train, X_test]` iterates over a list\n",
"of length two. While here we loop over a list,\n",
"the list comprehension method works when looping over any iterable object.\n",
@@ -2173,7 +2173,7 @@
"id": "f0a4abaf",
"metadata": {},
"source": [
"These values provide the linear combination of `Lag1` and `Lag2` that are used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (\\ref{Ch4:bayes.multi}).\n",
"These values provide the linear combination of `Lag1` and `Lag2` that are used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (4.24).\n",
" If $-0.64\\times `Lag1` - 0.51 \\times `Lag2` $ is large, then the LDA classifier will predict a market increase, and if it is small, then the LDA classifier will predict a market decline."
]
},
@@ -2200,7 +2200,7 @@
"metadata": {},
"source": [
"As we observed in our comparison of classification methods\n",
" (Section~\\ref{Ch4:comparison.sec}), the LDA and logistic\n",
" (Section 4.5), the LDA and logistic\n",
"regression predictions are almost identical."
]
},
@@ -2421,7 +2421,7 @@
"`sklearn` library. We will use several other objects\n",
"from this library. The objects\n",
"follow a common structure that simplifies tasks such as cross-validation,\n",
"which we will see in Chapter~\\ref{Ch5:resample}. Specifically,\n",
"which we will see in Chapter 5. Specifically,\n",
"the methods first create a generic classifier without\n",
"referring to any data. This classifier is then fit\n",
"to data with the `fit()` method and predictions are\n",
@@ -4349,7 +4349,7 @@
"id": "c225f2b2",
"metadata": {},
"source": [
"Notice that the standard deviations are not quite $1$ here; this is again due to some procedures using the $1/n$ convention for variances (in this case `scaler()`), while others use $1/(n-1)$ (the `std()` method). See the footnote on page~\\pageref{Ch4-varformula}.\n",
"Notice that the standard deviations are not quite $1$ here; this is again due to some procedures using the $1/n$ convention for variances (in this case `scaler()`), while others use $1/(n-1)$ (the `std()` method). See the footnote on page 183.\n",
"In this case it does not matter, as long as the variables are all on the same scale.\n",
"\n",
"Using the function `train_test_split()` we now split the observations into a test set,\n",
@@ -4570,7 +4570,7 @@
"The number of neighbors in KNN is referred to as a *tuning parameter*, also referred to as a *hyperparameter*.\n",
"We do not know *a priori* what value to use. It is therefore of interest\n",
"to see how the classifier performs on test data as we vary these\n",
"parameters. This can be achieved with a `for` loop, described in Section~\\ref{Ch2-statlearn-lab:for-loops}.\n",
"parameters. This can be achieved with a `for` loop, described in Section 2.3.8.\n",
"Here we use a for loop to look at the accuracy of our classifier in the group predicted to purchase\n",
"insurance as we vary the number of neighbors from 1 to 5:"
]
@@ -4629,7 +4629,7 @@
"data. This can also be done\n",
"with `sklearn`, though by default it fits\n",
"something like the *ridge regression* version\n",
"of logistic regression, which we introduce in Chapter~\\ref{Ch6:varselect}. This can\n",
"of logistic regression, which we introduce in Chapter 6. This can\n",
"be modified by appropriately setting the argument `C` below. Its default\n",
"value is 1 but by setting it to a very large number, the algorithm converges to the same solution as the usual (unregularized)\n",
"logistic regression estimator discussed above.\n",
@@ -4849,7 +4849,7 @@
"metadata": {},
"source": [
"## Linear and Poisson Regression on the Bikeshare Data\n",
"Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section~\\ref{Ch4:sec:pois}.\n",
"Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section 4.6.\n",
"The response `bikers` measures the number of bike rentals per hour\n",
"in Washington, DC in the period 2010--2012."
]
@@ -5322,7 +5322,7 @@
"February than in January. Similarly there are about 16.5 more riders\n",
"in March than in January.\n",
"\n",
"The results seen in Section~\\ref{sec:bikeshare.linear}\n",
"The results seen in Section 4.6.1\n",
"used a slightly different coding of the variables `hr` and `mnth`, as follows:"
]
},
@@ -5834,7 +5834,7 @@
"id": "41fb2787",
"metadata": {},
"source": [
"To reproduce the left-hand side of Figure~\\ref{Ch4:bikeshare}\n",
"To reproduce the left-hand side of Figure 4.13\n",
"we must first obtain the coefficient estimates associated with\n",
"`mnth`. The coefficients for January through November can be obtained\n",
"directly from the `M2_lm` object. The coefficient for December\n",
@@ -5988,7 +5988,7 @@
"id": "6c68761a",
"metadata": {},
"source": [
"Reproducing the right-hand plot in Figure~\\ref{Ch4:bikeshare} follows a similar process."
"Reproducing the right-hand plot in Figure 4.13 follows a similar process."
]
},
{
@@ -6088,7 +6088,7 @@
"id": "8552fb8b",
"metadata": {},
"source": [
"We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure~\\ref{Ch4:bikeshare.pois}. We first complete these coefficients as before."
"We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure 4.15. We first complete these coefficients as before."
]
},
{
24 changes: 12 additions & 12 deletions Ch05-resample-lab.Rmd
@@ -237,7 +237,7 @@ for i, d in enumerate(range(1,6)):
cv_error

```
As in Figure~\ref{Ch5:cvplot}, we see a sharp drop in the estimated test MSE between the linear and
As in Figure 5.4, we see a sharp drop in the estimated test MSE between the linear and
quadratic fits, but then no clear improvement from using higher-degree polynomials.
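For context on the loop visible in this hunk's header, a hedged reconstruction of how the degree-1 through degree-5 LOOCV errors might be computed (assumes `Auto` from `ISLP.load_data` and the `sklearn_sm` wrapper from `ISLP.models`):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import cross_validate
from ISLP import load_data
from ISLP.models import sklearn_sm

Auto = load_data('Auto')
H = np.array(Auto['horsepower'])
Y = Auto['mpg']
M = sklearn_sm(sm.OLS)

cv_error = np.zeros(5)
for i, d in enumerate(range(1, 6)):
    X = np.power.outer(H, np.arange(d + 1))            # columns 1, H, H^2, ..., H^d
    M_CV = cross_validate(M, X, Y, cv=Auto.shape[0])   # one fold per observation = LOOCV
    cv_error[i] = np.mean(M_CV['test_score'])
cv_error
```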

Above we introduced the `outer()` method of the `np.power()`
@@ -278,7 +278,7 @@ cv_error
Notice that the computation time is much shorter than that of LOOCV.
(In principle, the computation time for LOOCV for a least squares
linear model should be faster than for $k$-fold CV, due to the
availability of the formula~(\ref{Ch5:eq:LOOCVform}) for LOOCV;
availability of the formula~(5.2) for LOOCV;
however, the generic `cross_validate()` function does not make
use of this formula.) We still see little evidence that using cubic
or higher-degree polynomial terms leads to a lower test error than simply
@@ -325,7 +325,7 @@ incurred by picking different random folds.

## The Bootstrap
We illustrate the use of the bootstrap in the simple example
{of Section~\ref{Ch5:sec:bootstrap},} as well as on an example involving
{of Section 5.2,} as well as on an example involving
estimating the accuracy of the linear regression model on the `Auto`
data set.
### Estimating the Accuracy of a Statistic of Interest
Expand All @@ -340,8 +340,8 @@ in a dataframe.
To illustrate the bootstrap, we
start with a simple example.
The `Portfolio` data set in the `ISLP` package is described
in Section~\ref{Ch5:sec:bootstrap}. The goal is to estimate the
sampling variance of the parameter $\alpha$ given in formula~(\ref{Ch5:min.var}). We will
in Section 5.2. The goal is to estimate the
sampling variance of the parameter $\alpha$ given in formula~(5.7). We will
create a function
`alpha_func()`, which takes as input a dataframe `D` assumed
to have columns `X` and `Y`, as well as a
@@ -360,7 +360,7 @@ def alpha_func(D, idx):
```
This function returns an estimate for $\alpha$
based on applying the minimum
variance formula (\ref{Ch5:min.var}) to the observations indexed by
variance formula (5.7) to the observations indexed by
the argument `idx`. For instance, the following command
estimates $\alpha$ using all 100 observations.
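The body of `alpha_func()` lies outside this hunk; one way it might be written, applying formula (5.7) to the rows selected by `idx`:

```python
import numpy as np

def alpha_func(D, idx):
    # Covariance matrix of the selected rows of the X and Y columns.
    cov_ = np.cov(D[['X', 'Y']].loc[idx], rowvar=False)
    # alpha = (sigma_Y^2 - sigma_XY) / (sigma_X^2 + sigma_Y^2 - 2 sigma_XY)
    return ((cov_[1, 1] - cov_[0, 1]) /
            (cov_[0, 0] + cov_[1, 1] - 2 * cov_[0, 1]))
```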

@@ -430,7 +430,7 @@ intercept and slope terms for the linear regression model that uses
`horsepower` to predict `mpg` in the `Auto` data set. We
will compare the estimates obtained using the bootstrap to those
obtained using the formulas for ${\rm SE}(\hat{\beta}_0)$ and
${\rm SE}(\hat{\beta}_1)$ described in Section~\ref{Ch3:secoefsec}.
${\rm SE}(\hat{\beta}_1)$ described in Section 3.1.2.

To use our `boot_SE()` function, we must write a function (its
first argument)
@@ -499,7 +499,7 @@ This indicates that the bootstrap estimate for ${\rm SE}(\hat{\beta}_0)$ is
0.85, and that the bootstrap
estimate for ${\rm SE}(\hat{\beta}_1)$ is
0.0074. As discussed in
Section~\ref{Ch3:secoefsec}, standard formulas can be used to compute
Section 3.1.2, standard formulas can be used to compute
the standard errors for the regression coefficients in a linear
model. These can be obtained using the `summarize()` function
from `ISLP.sm`.
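For reference, the analytic formulas being invoked here (Equation 3.8 in the text) are
$$
{\rm SE}(\hat{\beta}_0)^2 = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right],
\qquad
{\rm SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2},
$$
with $\sigma^2$ estimated in practice from the RSS; the discussion below turns on the assumptions behind that estimate.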
@@ -513,21 +513,21 @@ model_se


The standard error estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$
obtained using the formulas from Section~\ref{Ch3:secoefsec} are
obtained using the formulas from Section 3.1.2 are
0.717 for the
intercept and
0.006 for the
slope. Interestingly, these are somewhat different from the estimates
obtained using the bootstrap. Does this indicate a problem with the
bootstrap? In fact, it suggests the opposite. Recall that the
standard formulas given in
{Equation~\ref{Ch3:se.eqn} on page~\pageref{Ch3:se.eqn}}
{Equation 3.8 on page 75}
rely on certain assumptions. For example,
they depend on the unknown parameter $\sigma^2$, the noise
variance. We then estimate $\sigma^2$ using the RSS. Now although the
formulas for the standard errors do not rely on the linear model being
correct, the estimate for $\sigma^2$ does. We see
{in Figure~\ref{Ch3:polyplot} on page~\pageref{Ch3:polyplot}} that there is
{in Figure 3.8 on page 99} that there is
a non-linear relationship in the data, and so the residuals from a
linear fit will be inflated, and so will $\hat{\sigma}^2$. Secondly,
the standard formulas assume (somewhat unrealistically) that the $x_i$
@@ -540,7 +540,7 @@ the results from `sm.OLS`.
Below we compute the bootstrap standard error estimates and the
standard linear regression estimates that result from fitting the
quadratic model to the data. Since this model provides a good fit to
the data (Figure~\ref{Ch3:polyplot}), there is now a better
the data (Figure 3.8), there is now a better
correspondence between the bootstrap estimates and the standard
estimates of ${\rm SE}(\hat{\beta}_0)$, ${\rm SE}(\hat{\beta}_1)$ and
${\rm SE}(\hat{\beta}_2)$.
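A hedged sketch of the analytic half of that comparison (the bootstrap half would reuse the `boot_SE()` helper from earlier in the lab, which this diff does not show):

```python
import statsmodels.api as sm
from ISLP import load_data
from ISLP.models import ModelSpec as MS, poly, summarize

Auto = load_data('Auto')
quad_model = MS([poly('horsepower', degree=2)])
X = quad_model.fit_transform(Auto)
M = sm.OLS(Auto['mpg'], X).fit()
summarize(M)['std err']   # analytic SEs for the intercept and both polynomial terms
```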