Cross-Validated Covariance Matrix Estimation

Authors: Philippe Boileau, Brian Collica, and Nima Hejazi

## What’s cvCovEst?

cvCovEst implements an efficient cross-validated procedure for covariance matrix estimation, particularly useful in high-dimensional settings. The methodology uses cross-validation to data-adaptively select the optimal estimator of the covariance matrix from a prespecified set of candidate estimators. An overview of the framework is provided in the package vignette. For a more detailed description, see Boileau et al. (2021). A suite of plotting and diagnostic tools is also included.
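The selection idea described above can be sketched in base R. This is only an illustration of the general technique, not the package's implementation: the candidate estimators, the fixed shrinkage weight, and every helper name (`candidates`, `frobenius_loss`, `fold_id`) are assumptions made for this example. Each candidate is fit on the training folds and scored by its squared Frobenius distance to the sample covariance matrix of the held-out fold; the candidate with the smallest average loss is selected.

```r
# Minimal base-R sketch of cross-validated covariance estimator selection.
# All helper names below are hypothetical and are not part of cvCovEst.
set.seed(123)

n <- 60
p <- 10
# Toy data: independent standard normal columns, so the true covariance is I
dat <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Candidate estimators: each maps a data matrix to a p x p covariance estimate
candidates <- list(
  sample_cov = function(x) cov(x),
  # keep only the variances, zeroing out every covariance
  diagonal = function(x) diag(diag(cov(x))),
  # shrink halfway toward the diagonal target (fixed weight, for illustration)
  half_shrink = function(x) {
    s <- cov(x)
    0.5 * s + 0.5 * diag(diag(s))
  }
)

# Squared Frobenius distance between two matrices
frobenius_loss <- function(a, b) sum((a - b)^2)

# Assign each observation to one of v folds at random
v <- 5
fold_id <- sample(rep(seq_len(v), length.out = n))

# For each fold: fit every candidate on the training rows, then compare it
# with the sample covariance matrix of the held-out rows
fold_losses <- sapply(seq_len(v), function(fold) {
  train <- dat[fold_id != fold, , drop = FALSE]
  valid <- dat[fold_id == fold, , drop = FALSE]
  valid_cov <- cov(valid)
  sapply(candidates, function(est) frobenius_loss(est(train), valid_cov))
})

# Average over folds to get one cross-validated risk per candidate
cv_risk <- rowMeans(fold_losses)
selected <- names(which.min(cv_risk))

print(sort(cv_risk))
print(selected)
```

The sketch fixes the shrinkage weight by hand to stay short; in practice, hyperparameter values would themselves be treated as separate candidates and compared by the same cross-validated risk.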

## Installation

For standard use, install cvCovEst from CRAN:

install.packages("cvCovEst")

The development version of the package may be installed from GitHub using remotes:

remotes::install_github("PhilBoileau/cvCovEst")

## Example

To illustrate how cvCovEst may be used to select an optimal covariance matrix estimator via cross-validation, consider the following toy example:

library(MASS)
library(cvCovEst)
set.seed(1584)

# generate a 50x50 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 50, ncol = 50) + diag(0.5, nrow = 50)

# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 50), Sigma = Sigma)

# run CV-selector
cv_cov_est_out <- cvCovEst(
  dat = dat,
  estimators = c(linearShrinkLWEst, denseLinearShrinkEst,
                 thresholdingEst, poetEst, sampleCovEst),
  estimator_params = list(
    thresholdingEst = list(gamma = c(0.2, 2)),
    poetEst = list(lambda = c(0.1, 0.2), k = c(1L, 2L))
  ),
  cv_loss = cvMatrixFrobeniusLoss,
  cv_scheme = "v_fold",
  v_folds = 5
)

# print the table of risk estimates
# NOTE: the estimated covariance matrix is accessible via the $estimate slot
cv_cov_est_out$risk_df
#> # A tibble: 9 x 3
#>   estimator            hyperparameters      empirical_risk
#>   <chr>                <chr>                         <dbl>
#> 1 linearShrinkLWEst    hyperparameters = NA           357.
#> 2 poetEst              lambda = 0.2, k = 1            369.
#> 3 poetEst              lambda = 0.2, k = 2            372.
#> 4 poetEst              lambda = 0.1, k = 2            375.
#> 5 poetEst              lambda = 0.1, k = 1            376.
#> 6 denseLinearShrinkEst hyperparameters = NA           379.
#> 7 sampleCovEst         hyperparameters = NA           379.
#> 8 thresholdingEst      gamma = 0.2                    384.
#> 9 thresholdingEst      gamma = 2                      826.
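
In the toy example above, the number of observations equals the number of variables (both 50), which is the regime where the plain sample covariance matrix becomes unreliable. The following base-R sketch illustrates one symptom: the sample covariance matrix is rank deficient and therefore not invertible. It assumes a Cholesky-based sampler as a stand-in for `MASS::mvrnorm`, so the simulated draws differ from those in the example; the variable names are chosen for illustration only.

```r
set.seed(1584)
p <- 50
n <- 50

# Same covariance structure as the toy example: unit variances,
# off-diagonal elements equal to 0.5
Sigma <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)

# Draw n observations from N(0, Sigma) using the Cholesky factor of Sigma
# (base-R stand-in for MASS::mvrnorm, so the draws differ from the example)
z <- matrix(rnorm(n * p), nrow = n, ncol = p)
dat <- z %*% chol(Sigma)

sample_cov <- cov(dat)

# Centering the data removes one degree of freedom, so the rank of the
# sample covariance matrix is at most n - 1, which is below p here
cov_rank <- qr(sample_cov)$rank
eigenvalues <- eigen(sample_cov, symmetric = TRUE, only.values = TRUE)$values

print(cov_rank)
print(min(eigenvalues))
```

A singular estimate cannot be inverted, which is one reason regularized candidates such as shrinkage and thresholding estimators are worth comparing against the sample covariance matrix when the dimension is close to or exceeds the sample size.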

## Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

## Contributions

Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.

## Citation

Please cite the following paper when using the cvCovEst R software package.

@misc{boileau2021,
  title={Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions},
  author={Philippe Boileau and Nima S. Hejazi and Mark J. van der Laan and Sandrine Dudoit},
  year={2021},
  eprint={2102.09715},
  archivePrefix={arXiv},
  primaryClass={stat.ME}
}

## License

The contents of this repository are distributed under the MIT license. See file LICENSE.md for details.