cvCovEst() identifies the optimal covariance matrix estimator from among a set of candidate estimators.
dat - A numeric data.frame, matrix, or similar object.
estimators - A list of estimator functions to be considered in the cross-validated estimator selection procedure.
estimator_params - A named list of arguments corresponding to the hyperparameters of the covariance matrix estimators in estimators. The name of each list element should match the name of an estimator passed to estimators. Each element of estimator_params is itself a named list, with names corresponding to that estimator's hyperparameter(s). Each hyperparameter may be given as a single numeric value or as a numeric vector. If an estimator needs no hyperparameter, it need not be listed.
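As a minimal sketch of the nested structure described above, the list below supplies a vector of candidate values for the gamma hyperparameter of thresholdingEst (the same setting used in this page's example) and leaves out sampleCovEst, which needs no hyperparameter. Only base R is used to build and inspect the list; the candidate values are illustrative.

```r
# Outer names match estimator names; inner names match hyperparameter names.
estimator_params <- list(
  thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
)

# Each element is itself a named list holding a numeric value or vector.
names(estimator_params)
str(estimator_params$thresholdingEst)
```

Each value in a hyperparameter vector is evaluated as a separate candidate, which is why the example output lists one thresholdingEst row per gamma value.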
cv_loss - A function indicating the loss function to be used. This defaults to the matrix-based Frobenius loss, cvMatrixFrobeniusLoss(). An observation-based version, cvFrobeniusLoss(), is also available. Additionally, cvScaledMatrixFrobeniusLoss() is included for situations in which the variables in dat are on different scales.
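To give a feel for the quantity a Frobenius-type loss measures, the sketch below computes the squared Frobenius distance between a covariance estimate fit on one half of mtcars and the sample covariance of the other half. This is base R only and is not the package's implementation of cvMatrixFrobeniusLoss(); the split, the seed, and the variable names are assumptions made for illustration.

```r
# Illustrative only: squared Frobenius distance between two covariance
# matrices. This is NOT the package's cvMatrixFrobeniusLoss() implementation.
set.seed(1)
dat <- as.matrix(mtcars)
idx <- sample(nrow(dat), size = floor(nrow(dat) / 2))

train_cov <- cov(dat[idx, ])   # estimate fit on the training half
valid_cov <- cov(dat[-idx, ])  # sample covariance of the validation half

# Sum of squared entrywise differences between the two matrices.
frob_sq <- norm(train_cov - valid_cov, type = "F")^2
frob_sq
```

Because the entries are compared on their raw scale, high-variance variables such as disp and hp dominate the sum, which is the situation a scaled loss is meant to address.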
cv_scheme - A character indicating the cross-validation scheme to be employed. There are two options: (1) V-fold cross-validation, via "v_fold"; and (2) Monte Carlo cross-validation, via "mc". Defaults to V-fold cross-validation.
mc_split - A numeric between 0 and 1 indicating the proportion of observations to be included in the validation set of each Monte Carlo cross-validation fold.
v_folds - An integer larger than or equal to 1 indicating the number of folds to use for cross-validation. The default is 10, regardless of the choice of cross-validation scheme.
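The difference between the two schemes can be sketched in base R. The code below builds validation index sets by hand for 10 folds on the 32 rows of mtcars; it is an illustration of the general idea, and the package's internal fold construction may differ. The variable names and the seed are assumptions.

```r
# Illustrative fold construction in base R; the package's internals may differ.
set.seed(123)
n <- nrow(mtcars)
v_folds <- 10
mc_split <- 0.5

# V-fold: each observation lands in exactly one validation fold.
fold_id <- sample(rep(seq_len(v_folds), length.out = n))
v_fold_valid <- split(seq_len(n), fold_id)

# Monte Carlo: every fold draws a fresh validation set of size mc_split * n,
# so an observation may appear in several validation sets or in none.
mc_valid <- replicate(
  v_folds,
  sample(n, size = floor(mc_split * n)),
  simplify = FALSE
)

lengths(v_fold_valid)  # fold sizes differ by at most one
lengths(mc_valid)      # every fold holds floor(mc_split * n) observations
```

Under V-fold the validation size is determined by the number of folds, whereas under Monte Carlo it is set by the split proportion and the number of folds only controls how many random splits are drawn.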
parallel - A logical option indicating whether to run the main cross-validation loop with future_lapply(). This is passed directly to cross_validate().
Not currently used. Permits backward compatibility.
A list of results containing the following elements:
estimate - A matrix containing the covariance matrix estimate produced by the optimal estimator.
estimator - A character indicating the optimal estimator and corresponding hyperparameters, if any.
risk_df - A tibble providing the cross-validated risk estimates of each estimator.
cv_df - A tibble providing each estimator's loss over the folds of the cross-validated procedure.
args - A named list containing arguments passed to cvCovEst.
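How the estimator element relates to risk_df can be checked by hand. The sketch below re-enters the rounded risks printed in this page's example output into a plain data.frame, rather than calling the package, and picks the row with the smallest cross-validated risk. The tie-breaking shown (first row wins) is a property of which.min() on the rounded values, not a statement about how the package resolves ties.

```r
# Risks copied from the printed risk_df in the example output (rounded as shown).
risk_df <- data.frame(
  estimator = c("sampleCovEst", "thresholdingEst", "thresholdingEst",
                "thresholdingEst", "linearShrinkLWEst"),
  hyperparameters = c("hyperparameters = NA", "gamma = 0.1", "gamma = 0.2",
                      "gamma = 0.3", "hyperparameters = NA"),
  cv_risk = c(252072263, 252072263, 252072263, 252072265, 255542897)
)

# The selected estimator is the row with the smallest cross-validated risk.
best <- risk_df[which.min(risk_df$cv_risk), ]
best_label <- paste(best$estimator, best$hyperparameters, sep = ", ")
best_label
```

The resulting label matches the estimator element reported in the example output.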
cvCovEst(
  dat = mtcars,
  estimators = c(
    linearShrinkLWEst, thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
  )
)
#> $estimate
#>              mpg         cyl        disp          hp         drat          wt
#> mpg    36.324103  -9.1723790  -633.09721 -320.732056   2.19506351  -5.1166847
#> cyl    -9.172379   3.1895161   199.66028  101.931452  -0.66836694   1.3673710
#> disp -633.097208 199.6602823 15360.79983 6721.158669 -47.06401915 107.6842040
#> hp   -320.732056 101.9314516  6721.15867 4700.866935 -16.45110887  44.1926613
#> drat    2.195064  -0.6683669   -47.06402  -16.451109   0.28588135  -0.3727207
#> wt     -5.116685   1.3673710   107.68420   44.192661  -0.37272073   0.9573790
#> qsec    4.509149  -1.8868548   -96.05168  -86.770081   0.08714073  -0.3054816
#> vs      2.017137  -0.7298387   -44.37762  -24.987903   0.11864919  -0.2736613
#> am      1.803931  -0.4657258   -36.56401   -8.320565   0.19015121  -0.3381048
#> gear    2.135685  -0.6491935   -50.80262   -6.358871   0.27598790  -0.4210806
#> carb   -5.363105   1.5201613    79.06875   83.036290  -0.07840726   0.6757903
#>              qsec           vs           am        gear        carb
#> mpg    4.50914919   2.01713710   1.80393145   2.1356855 -5.36310484
#> cyl   -1.88685484  -0.72983871  -0.46572581  -0.6491935  1.52016129
#> disp -96.05168145 -44.37762097 -36.56401210 -50.8026210 79.06875000
#> hp   -86.77008065 -24.98790323  -8.32056452  -6.3588710 83.03629032
#> drat   0.08714073   0.11864919   0.19015121   0.2759879 -0.07840726
#> wt    -0.30548161  -0.27366129  -0.33810484  -0.4210806  0.67579032
#> qsec   3.19316613   0.67056452  -0.20495968  -0.2804032 -1.89411290
#> vs     0.67056452   0.25403226   0.04233871   0.0766129 -0.46370968
#> am    -0.20495968   0.04233871   0.24899194   0.2923387  0.04637097
#> gear  -0.28040323   0.07661290   0.29233871   0.5443548  0.32661290
#> carb  -1.89411290  -0.46370968   0.04637097   0.3266129  2.60887097
#>
#> $estimator
#> [1] "sampleCovEst, hyperparameters = NA"
#>
#> $risk_df
#> # A tibble: 5 × 3
#>   estimator         hyperparameters         cv_risk
#>   <chr>             <chr>                     <dbl>
#> 1 sampleCovEst      hyperparameters = NA 252072263.
#> 2 thresholdingEst   gamma = 0.1          252072263.
#> 3 thresholdingEst   gamma = 0.2          252072263.
#> 4 thresholdingEst   gamma = 0.3          252072265.
#> 5 linearShrinkLWEst hyperparameters = NA 255542897.
#>
#> $cv_df
#> # A tibble: 50 × 4
#>    estimator         hyperparameters            loss  fold
#>    <chr>             <chr>                     <dbl> <int>
#>  1 linearShrinkLWEst hyperparameters = NA  40551934.     1
#>  2 thresholdingEst   gamma = 0.1           46314151.     1
#>  3 thresholdingEst   gamma = 0.2           46314152.     1
#>  4 thresholdingEst   gamma = 0.3           46314154.     1
#>  5 sampleCovEst      hyperparameters = NA  46314152.     1
#>  6 linearShrinkLWEst hyperparameters = NA 100116843.     2
#>  7 thresholdingEst   gamma = 0.1           90248341.     2
#>  8 thresholdingEst   gamma = 0.2           90248341.     2
#>  9 thresholdingEst   gamma = 0.3           90248341.     2
#> 10 sampleCovEst      hyperparameters = NA  90248341.     2
#> # … with 40 more rows
#>
#> $args
#> $args$cv_loss
#> <quosure>
#> expr: ^cvMatrixFrobeniusLoss
#> env: 0x7fa1851f8128
#>
#> $args$cv_scheme
#> [1] "v_fold"
#>
#> $args$mc_split
#> [1] 0.5
#>
#> $args$v_folds
#> [1] 10
#>
#> $args$parallel
#> [1] FALSE
#>
#>
#> attr(,"class")
#> [1] "cvCovEst"