Cross-Validated Covariance Matrix Estimation

Authors: Philippe Boileau, Brian Collica, and Nima Hejazi


What’s cvCovEst?

cvCovEst implements an efficient cross-validated procedure for covariance matrix estimation that is particularly useful in high-dimensional settings. The methodology uses cross-validation to data-adaptively select the optimal estimator of the covariance matrix from a prespecified set of candidate estimators. An overview of the framework is provided in the package vignette. For a more detailed description, see Boileau et al. (2021). A suite of plotting and diagnostic tools is also included.
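To make the selection idea concrete, here is a minimal base-R sketch of cross-validated estimator selection. It is not the package's implementation: the three candidate estimators, the simulated data, and the names `candidates`, `fold_loss`, and `cv_risk` are assumptions chosen for illustration, and the loss is a simplified squared Frobenius distance between the candidate's training-fold estimate and the validation-fold sample covariance matrix.

```r
set.seed(1)
n <- 60
p <- 10

# illustrative data: n observations of p independent standard normal variables
dat <- matrix(rnorm(n * p), nrow = n, ncol = p)

# hypothetical candidates: each maps a data matrix to a p x p covariance estimate
candidates <- list(
  sample_cov  = function(x) cov(x),
  diagonal    = function(x) diag(diag(cov(x))),
  shrink_half = function(x) {
    s <- cov(x)
    # shrink halfway toward a scaled identity matrix
    0.5 * s + 0.5 * diag(mean(diag(s)), ncol(x))
  }
)

# randomly assign each observation to one of V folds
v_folds <- 5
fold_id <- sample(rep(seq_len(v_folds), length.out = n))

# fit each candidate on the training folds and score it on the held-out fold;
# the result is a v_folds x length(candidates) matrix of losses
fold_loss <- sapply(candidates, function(est) {
  sapply(seq_len(v_folds), function(v) {
    train <- dat[fold_id != v, , drop = FALSE]
    valid <- dat[fold_id == v, , drop = FALSE]
    sum((cov(valid) - est(train))^2)
  })
})

# average over folds, pick the candidate with the smallest estimated risk,
# and refit it on the full data set
cv_risk <- colMeans(fold_loss)
selected <- names(which.min(cv_risk))
final_estimate <- candidates[[selected]](dat)
```

The package generalizes this pattern to a library of estimators with hyperparameter grids and several loss functions and cross-validation schemes; see the vignette for the actual procedure.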


Installation

For standard use, install cvCovEst from CRAN:

install.packages("cvCovEst")

The development version of the package may be installed from GitHub using the remotes package, which must itself be installed first:

remotes::install_github("PhilBoileau/cvCovEst")

Example

To illustrate how cvCovEst may be used to select an optimal covariance matrix estimator via cross-validation, consider the following toy example:

library(MASS)
library(cvCovEst)
set.seed(1584)

# generate a 50x50 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 50, ncol = 50) + diag(0.5, nrow = 50)

# sample 50 observations from a multivariate normal distribution with
# mean 0 and covariance matrix Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 50), Sigma = Sigma)

# run CV-selector
cv_cov_est_out <- cvCovEst(
    dat = dat,
    estimators = c(linearShrinkLWEst, denseLinearShrinkEst,
                   thresholdingEst, poetEst, sampleCovEst),
    estimator_params = list(
      thresholdingEst = list(gamma = c(0.2, 2)),
      poetEst = list(lambda = c(0.1, 0.2), k = c(1L, 2L))
    ),
    cv_loss = cvMatrixFrobeniusLoss,
    cv_scheme = "v_fold",
    v_folds = 5
)

# print the table of risk estimates
# NOTE: the estimated covariance matrix is accessible via the `$estimate` element
cv_cov_est_out$risk_df
#> # A tibble: 9 x 3
#>   estimator            hyperparameters      empirical_risk
#>   <chr>                <chr>                         <dbl>
#> 1 linearShrinkLWEst    hyperparameters = NA           357.
#> 2 poetEst              lambda = 0.2, k = 1            369.
#> 3 poetEst              lambda = 0.2, k = 2            372.
#> 4 poetEst              lambda = 0.1, k = 2            375.
#> 5 poetEst              lambda = 0.1, k = 1            376.
#> 6 denseLinearShrinkEst hyperparameters = NA           379.
#> 7 sampleCovEst         hyperparameters = NA           379.
#> 8 thresholdingEst      gamma = 0.2                    384.
#> 9 thresholdingEst      gamma = 2                      826.
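
As a follow-up, the sketch below shows one way to pull the lowest-risk row out of the risk table and retrieve the selected estimate. It repeats the call from the example so that it runs on its own; `best_row` and `sigma_hat` are illustrative names, and only the `$risk_df` and `$estimate` elements shown in the example are used.

```r
library(MASS)
library(cvCovEst)
set.seed(1584)

# same simulated data and call as in the example above
Sigma <- matrix(0.5, nrow = 50, ncol = 50) + diag(0.5, nrow = 50)
dat <- mvrnorm(n = 50, mu = rep(0, 50), Sigma = Sigma)

cv_cov_est_out <- cvCovEst(
    dat = dat,
    estimators = c(linearShrinkLWEst, denseLinearShrinkEst,
                   thresholdingEst, poetEst, sampleCovEst),
    estimator_params = list(
      thresholdingEst = list(gamma = c(0.2, 2)),
      poetEst = list(lambda = c(0.1, 0.2), k = c(1L, 2L))
    ),
    cv_loss = cvMatrixFrobeniusLoss,
    cv_scheme = "v_fold",
    v_folds = 5
)

# row of the risk table with the smallest cross-validated risk estimate
risk_df <- cv_cov_est_out$risk_df
best_row <- risk_df[which.min(risk_df$empirical_risk), ]

# covariance matrix estimate produced by the selected estimator
sigma_hat <- cv_cov_est_out$estimate
dim(sigma_hat)
```

Using `which.min()` rather than taking the first row avoids relying on the table being sorted by risk.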

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.


Contributions

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.


Citation

Please cite the following paper when using the cvCovEst R software package.

@misc{boileau2021,
      title={Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions}, 
      author={Philippe Boileau and Nima S. Hejazi and Mark J. van der Laan and Sandrine Dudoit},
      year={2021},
      eprint={2102.09715},
      archivePrefix={arXiv},
      primaryClass={stat.ME}
}

License

© 2020-2021 Philippe Boileau

The contents of this repository are distributed under the MIT license. See file LICENSE.md for details.


References

Boileau, Philippe, Nima S. Hejazi, Mark J. van der Laan, and Sandrine Dudoit. 2021. “Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions.” https://arxiv.org/abs/2102.09715.