cvCovEst() identifies the optimal covariance matrix estimator from among a set of candidate estimators.

cvCovEst(
  dat,
  estimators = c(linearShrinkEst, thresholdingEst, sampleCovEst),
  estimator_params = list(
    linearShrinkEst = list(alpha = 0),
    thresholdingEst = list(gamma = 0)
  ),
  cv_loss = cvMatrixFrobeniusLoss,
  cv_scheme = "v_fold",
  mc_split = 0.5,
  v_folds = 10L,
  parallel = FALSE,
  ...
)

Arguments

dat

A numeric data.frame, matrix, or similar object.

estimators

A list of estimator functions to be considered in the cross-validated estimator selection procedure.

estimator_params

A named list of arguments corresponding to the hyperparameters of the covariance matrix estimators in estimators. The name of each list element should match the name of an estimator passed to estimators. Each element of estimator_params is itself a named list, with the names corresponding to a given estimator's hyperparameter(s). Each hyperparameter may be given as a single numeric or as a numeric vector. If an estimator needs no hyperparameters, it need not be listed.
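
As an illustration of the naming rule above, the following base R sketch builds an estimator_params list and counts the candidates it implies. The character vector estimator_names is a hypothetical stand-in for the functions passed to estimators, the alpha and gamma values are purely illustrative, and the use of expand.grid() to count combinations is an assumption about how a grid could be enumerated, not a description of the package's internals.

```r
# Hypothetical stand-in for the functions passed to `estimators`
estimator_names <- c("linearShrinkEst", "thresholdingEst", "sampleCovEst")

# One named entry per estimator that has hyperparameters;
# sampleCovEst has none, so it is simply left out.
estimator_params <- list(
  linearShrinkEst = list(alpha = c(0.1, 0.5, 0.9)),   # illustrative values
  thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))  # illustrative values
)

# Every name in estimator_params should match an estimator name
stopifnot(all(names(estimator_params) %in% estimator_names))

# Count candidates per estimator: one per hyperparameter combination,
# or a single candidate when no hyperparameters are listed.
n_candidates <- vapply(estimator_names, function(nm) {
  params <- estimator_params[[nm]]
  if (is.null(params)) 1L else nrow(expand.grid(params))
}, integer(1))

n_candidates
```

Each hyperparameter value is then treated as its own candidate, which is consistent with the risk_df in the Examples section listing thresholdingEst once per gamma value.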

cv_loss

A function indicating the loss function to be used. This defaults to the Frobenius loss, cvMatrixFrobeniusLoss(). An observation-based version, cvFrobeniusLoss(), is also available. Additionally, cvScaledMatrixFrobeniusLoss() is included for situations in which dat's variables are on different scales.
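
To give a feel for what a matrix-based Frobenius loss measures, here is a minimal base R sketch for a single train/validation split. It assumes the loss is the squared Frobenius distance between an estimate fitted on the training rows and the sample covariance matrix of the validation rows; this is a simplified stand-in, and the exact definition used by cvMatrixFrobeniusLoss() may differ. The fixed half-and-half split of mtcars is chosen only for illustration.

```r
# Illustrative split: first half for training, second half for validation
train <- mtcars[1:16, ]
valid <- mtcars[17:32, ]

# Candidate estimate fitted on the training rows (here: the sample covariance)
est_train <- cov(train)

# Reference matrix computed from the validation rows
cov_valid <- cov(valid)

# Squared Frobenius distance: sum of squared entrywise differences
frob_loss <- sum((est_train - cov_valid)^2)

frob_loss
```

Because entries are squared without rescaling, variables with large variances (such as disp and hp in mtcars) dominate this quantity, which is the situation the scaled variant is described as addressing.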

cv_scheme

A character indicating the cross-validation scheme to be employed. There are two options: (1) V-fold cross-validation, via "v_fold"; and (2) Monte Carlo cross-validation, via "mc". Defaults to V-fold cross-validation.

mc_split

A numeric between 0 and 1 indicating the proportion of observations to be included in the validation set of each Monte Carlo cross-validation fold.

v_folds

An integer larger than or equal to 1 indicating the number of folds to use for cross-validation. The default is 10, regardless of the choice of cross-validation scheme.
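
The difference between the two schemes, and the roles of mc_split and v_folds, can be sketched in base R as follows. This is an illustration of the general techniques only, not the package's internal fold construction; the variable names fold_id, v_fold_valid, and mc_valid are hypothetical.

```r
set.seed(123)

n        <- nrow(mtcars)  # 32 observations
v_folds  <- 10L           # number of folds (or Monte Carlo splits)
mc_split <- 0.5           # proportion of observations in each validation set

# V-fold: assign every observation to exactly one validation fold
fold_id      <- sample(rep(seq_len(v_folds), length.out = n))
v_fold_valid <- split(seq_len(n), fold_id)

# Monte Carlo: each split draws a fresh random validation set,
# so the same observation can appear in several validation sets
n_valid  <- round(mc_split * n)
mc_valid <- replicate(v_folds, sample(n, n_valid), simplify = FALSE)

lengths(v_fold_valid)  # fold sizes differ by at most one
lengths(mc_valid)      # every validation set has n_valid observations
```

Under V-fold cross-validation the validation sets partition the data, whereas under Monte Carlo cross-validation their size is controlled by mc_split and they may overlap.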

parallel

A logical option indicating whether to run the main cross-validation loop with future_lapply(). This is passed directly to cross_validate().

...

Not currently used. Permits backward compatibility.

Value

A list of results containing the following elements:

  • estimate - A matrix corresponding to the estimate of the optimal covariance matrix estimator.

  • estimator - A character indicating the optimal estimator and corresponding hyperparameters, if any.

  • risk_df - A tibble providing the cross-validated risk estimates of each estimator.

  • cv_df - A tibble providing each estimator's loss over the folds of the cross-validation procedure.

  • args - A named list containing arguments passed to cvCovEst.
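
The elements listed above can be inspected with ordinary list and data frame operations. The sketch below assumes the cvCovEst package is installed (it is skipped otherwise) and uses only the arguments and return elements documented on this page; the seed and the gamma values are arbitrary choices for illustration.

```r
if (requireNamespace("cvCovEst", quietly = TRUE)) {
  library(cvCovEst)

  set.seed(1)  # fold assignment is random, so fix the seed for reproducibility
  res <- cvCovEst(
    dat = mtcars,
    estimators = c(thresholdingEst, sampleCovEst),
    estimator_params = list(
      thresholdingEst = list(gamma = c(0.1, 0.2))
    )
  )

  # Label of the selected estimator and its hyperparameters
  print(res$estimator)

  # Row of risk_df with the smallest cross-validated risk
  print(res$risk_df[which.min(res$risk_df$cv_risk), ])

  # The selected estimate is a square matrix, one row and column per variable
  print(dim(res$estimate))
}
```

Comparing the cv_risk column across rows shows how close the candidates are; in the example output on this page, several candidates differ only in the last digits of their estimated risk.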

Examples

cvCovEst(
  dat = mtcars,
  estimators = c(
    linearShrinkLWEst, thresholdingEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = seq(0.1, 0.3, 0.1))
  )
)
#> $estimate
#>              mpg         cyl        disp          hp         drat          wt
#> mpg    36.324103  -9.1723790  -633.09721 -320.732056   2.19506351  -5.1166847
#> cyl    -9.172379   3.1895161   199.66028  101.931452  -0.66836694   1.3673710
#> disp -633.097208 199.6602823 15360.79983 6721.158669 -47.06401915 107.6842040
#> hp   -320.732056 101.9314516  6721.15867 4700.866935 -16.45110887  44.1926613
#> drat    2.195064  -0.6683669   -47.06402  -16.451109   0.28588135  -0.3727207
#> wt     -5.116685   1.3673710   107.68420   44.192661  -0.37272073   0.9573790
#> qsec    4.509149  -1.8868548   -96.05168  -86.770081   0.08714073  -0.3054816
#> vs      2.017137  -0.7298387   -44.37762  -24.987903   0.11864919  -0.2736613
#> am      1.803931  -0.4657258   -36.56401   -8.320565   0.19015121  -0.3381048
#> gear    2.135685  -0.6491935   -50.80262   -6.358871   0.27598790  -0.4210806
#> carb   -5.363105   1.5201613    79.06875   83.036290  -0.07840726   0.6757903
#>              qsec           vs           am        gear        carb
#> mpg    4.50914919   2.01713710   1.80393145   2.1356855 -5.36310484
#> cyl   -1.88685484  -0.72983871  -0.46572581  -0.6491935  1.52016129
#> disp -96.05168145 -44.37762097 -36.56401210 -50.8026210 79.06875000
#> hp   -86.77008065 -24.98790323  -8.32056452  -6.3588710 83.03629032
#> drat   0.08714073   0.11864919   0.19015121   0.2759879 -0.07840726
#> wt    -0.30548161  -0.27366129  -0.33810484  -0.4210806  0.67579032
#> qsec   3.19316613   0.67056452  -0.20495968  -0.2804032 -1.89411290
#> vs     0.67056452   0.25403226   0.04233871   0.0766129 -0.46370968
#> am    -0.20495968   0.04233871   0.24899194   0.2923387  0.04637097
#> gear  -0.28040323   0.07661290   0.29233871   0.5443548  0.32661290
#> carb  -1.89411290  -0.46370968   0.04637097   0.3266129  2.60887097
#> 
#> $estimator
#> [1] "sampleCovEst, hyperparameters = NA"
#> 
#> $risk_df
#> # A tibble: 5 × 3
#>   estimator         hyperparameters         cv_risk
#>   <chr>             <chr>                     <dbl>
#> 1 sampleCovEst      hyperparameters = NA 252072263.
#> 2 thresholdingEst   gamma = 0.1          252072263.
#> 3 thresholdingEst   gamma = 0.2          252072263.
#> 4 thresholdingEst   gamma = 0.3          252072265.
#> 5 linearShrinkLWEst hyperparameters = NA 255542897.
#> 
#> $cv_df
#> # A tibble: 50 × 4
#>    estimator         hyperparameters            loss  fold
#>    <chr>             <chr>                     <dbl> <int>
#>  1 linearShrinkLWEst hyperparameters = NA  40551934.     1
#>  2 thresholdingEst   gamma = 0.1           46314151.     1
#>  3 thresholdingEst   gamma = 0.2           46314152.     1
#>  4 thresholdingEst   gamma = 0.3           46314154.     1
#>  5 sampleCovEst      hyperparameters = NA  46314152.     1
#>  6 linearShrinkLWEst hyperparameters = NA 100116843.     2
#>  7 thresholdingEst   gamma = 0.1           90248341.     2
#>  8 thresholdingEst   gamma = 0.2           90248341.     2
#>  9 thresholdingEst   gamma = 0.3           90248341.     2
#> 10 sampleCovEst      hyperparameters = NA  90248341.     2
#> # … with 40 more rows
#> 
#> $args
#> $args$cv_loss
#> <quosure>
#> expr: ^cvMatrixFrobeniusLoss
#> env:  0x7fa1851f8128
#> 
#> $args$cv_scheme
#> [1] "v_fold"
#> 
#> $args$mc_split
#> [1] 0.5
#> 
#> $args$v_folds
#> [1] 10
#> 
#> $args$parallel
#> [1] FALSE
#> 
#> 
#> attr(,"class")
#> [1] "cvCovEst"