...
 
Commits (2)
-## cornet 0.0.4 (2020-03-17)
+## cornet 0.0.4 (2020-03-18)
 * updated documentation
...
@@ -96,8 +96,8 @@
 #' \code{\link[=predict.cornet]{predict}}.
 #'
 #' @references
-#' Armin Rauschenberger and Enrico Glaab (2019).
-#' "Lasso and ridge regression for dichotomised outcomes".
+#' Armin Rauschenberger and Enrico Glaab (2020).
+#' "Predicting artificial binary outcomes from high-dimensional data".
 #' \emph{Manuscript in preparation}.
 #'
 #' @examples
...
@@ -40,7 +40,7 @@ devtools::install_github("rauschenberger/cornet")
 ## Reference
-Armin Rauschenberger and Enrico Glaab (2019). "Predicting artificial binary outcomes from high-dimensional data". *Submitted.*
+Armin Rauschenberger and Enrico Glaab (2020). "Predicting artificial binary outcomes from high-dimensional data". *Manuscript in preparation.*
 [![CRAN version](https://www.r-pkg.org/badges/version/cornet)](https://CRAN.R-project.org/package=cornet)
 [![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/cornet)](https://CRAN.R-project.org/package=cornet)
...
@@ -32,8 +32,9 @@ devtools::install_github("rauschenberger/cornet")
 ## Reference
-Armin Rauschenberger and Enrico Glaab (2019). “Predicting artificial
-binary outcomes from high-dimensional data”. *Submitted.*
+Armin Rauschenberger and Enrico Glaab (2020). “Predicting artificial
+binary outcomes from high-dimensional data”. *Manuscript in
+preparation.*
 [![CRAN
 version](https://www.r-pkg.org/badges/version/cornet)](https://CRAN.R-project.org/package=cornet)
...
 # Notes
-- new author email for palasso, cornet and joinet
\ No newline at end of file
+- reason for resubmission: updated documentation
+- package runs without errors with palasso 0.0.6
\ No newline at end of file
@@ -96,7 +96,7 @@
 <div id="reference" class="section level2">
 <h2 class="hasAnchor">
 <a href="#reference" class="anchor"></a>Reference</h2>
-<p>Armin Rauschenberger and Enrico Glaab (2019). “Predicting artificial binary outcomes from high-dimensional data”. <em>Manuscript in preparation.</em></p>
+<p>Armin Rauschenberger and Enrico Glaab (2020). “Predicting artificial binary outcomes from high-dimensional data”. <em>Manuscript in preparation.</em></p>
 </div>
 </div>
...
@@ -92,6 +92,11 @@
+<div id="introduction" class="section level2">
+<h2 class="hasAnchor">
+<a href="#introduction" class="anchor"></a>Introduction</h2>
+<p>It is considered bad statistical practice to dichotomise continuous outcomes, but some applications require predicted probabilities rather than predicted values. To obtain predicted values, we recommend modelling the original continuous outcome with <em>linear regression</em>. To obtain predicted probabilities, we recommend not modelling the artificial binary outcome with <em>logistic regression</em>, but modelling both the original continuous outcome and the artificial binary outcome with <em>combined regression</em>.</p>
+</div>
 <div id="installation" class="section level2">
 <h2 class="hasAnchor">
 <a href="#installation" class="anchor"></a>Installation</h2>
@@ -103,21 +108,16 @@
 <p>Then load and attach the package:</p>
 <div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" title="1"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span>(cornet)</a></code></pre></div>
 </div>
-<div id="simulation" class="section level2">
+<div id="example" class="section level2">
 <h2 class="hasAnchor">
-<a href="#simulation" class="anchor"></a>Simulation</h2>
-<p>We simulate data for <span class="math inline">\(n=100\)</span> samples and <span class="math inline">\(p=500\)</span> features. The vector <span class="math inline">\(\boldsymbol{y}\)</span> of length <span class="math inline">\(n\)</span> represents the outcome, and the matrix <span class="math inline">\(\boldsymbol{X}\)</span> with <span class="math inline">\(n\)</span> rows and <span class="math inline">\(p\)</span> columns represents the features. As outlined in the manuscript, it is considered bad statistical practice to dichotomise continuous outcomes, but there might be practical reasons to do so. We recommend to model the original continuous outcome with <em>linear regression</em> (default), or the artificial binary outcome with <em>combined regression</em> (exception).</p>
+<a href="#example" class="anchor"></a>Example</h2>
+<p>We simulate data for <span class="math inline">\(n\)</span> samples and <span class="math inline">\(p\)</span> features, in a high-dimensional setting (<span class="math inline">\(p \gg n\)</span>). The matrix <span class="math inline">\(\boldsymbol{X}\)</span> with <span class="math inline">\(n\)</span> rows and <span class="math inline">\(p\)</span> columns represents the features, and the vector <span class="math inline">\(\boldsymbol{y}\)</span> of length <span class="math inline">\(n\)</span> represents the continuous outcome.</p>
 <div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" title="1"><span class="kw"><a href="https://rdrr.io/r/base/Random.html">set.seed</a></span>(<span class="dv">1</span>)</a>
 <a class="sourceLine" id="cb4-2" title="2">n &lt;-<span class="st"> </span><span class="dv">100</span>; p &lt;-<span class="st"> </span><span class="dv">500</span></a>
 <a class="sourceLine" id="cb4-3" title="3">X &lt;-<span class="st"> </span><span class="kw"><a href="https://rdrr.io/r/base/matrix.html">matrix</a></span>(<span class="kw"><a href="https://rdrr.io/r/stats/Normal.html">rnorm</a></span>(n<span class="op">*</span>p),<span class="dt">nrow=</span>n,<span class="dt">ncol=</span>p)</a>
 <a class="sourceLine" id="cb4-4" title="4">beta &lt;-<span class="st"> </span><span class="kw"><a href="https://rdrr.io/r/stats/Binomial.html">rbinom</a></span>(<span class="dt">n=</span>p,<span class="dt">size=</span><span class="dv">1</span>,<span class="dt">prob=</span><span class="fl">0.05</span>)</a>
-<a class="sourceLine" id="cb4-5" title="5">mean &lt;-<span class="st"> </span>X <span class="op">%*%</span><span class="st"> </span>beta</a>
-<a class="sourceLine" id="cb4-6" title="6">y &lt;-<span class="st"> </span><span class="kw"><a href="https://rdrr.io/r/stats/Normal.html">rnorm</a></span>(<span class="dt">n=</span>n,<span class="dt">mean=</span>mean)</a></code></pre></div>
-</div>
-<div id="application" class="section level2">
-<h2 class="hasAnchor">
-<a href="#application" class="anchor"></a>Application</h2>
-<p>We use the function <code>cornet</code> for modelling the underlying continuous outcome and the artifial binary outcome. The argument <code>cutoff</code> splits the samples into two groups, those with an outcome less than or equal to the cutoff, and those with an outcome greater than the cutoff.</p>
+<a class="sourceLine" id="cb4-5" title="5">y &lt;-<span class="st"> </span><span class="kw"><a href="https://rdrr.io/r/stats/Normal.html">rnorm</a></span>(<span class="dt">n=</span>n,<span class="dt">mean=</span>X<span class="op">%*%</span>beta)</a></code></pre></div>
+<p>We use the function <code>cornet</code> for modelling the original continuous outcome and the artificial binary outcome. The argument <code>cutoff</code> splits the samples into two groups: those with an outcome less than or equal to the cutoff, and those with an outcome greater than the cutoff.</p>
 <div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" title="1">model &lt;-<span class="st"> </span><span class="kw"><a href="../reference/cornet.html">cornet</a></span>(<span class="dt">y=</span>y,<span class="dt">cutoff=</span><span class="dv">0</span>,<span class="dt">X=</span>X)</a>
 <a class="sourceLine" id="cb5-2" title="2">model</a></code></pre></div>
 <p>The function <code>coef</code> returns the estimated coefficients. The first column is for the linear model (beta), and the second column is for the logistic model (gamma). The first row includes the estimated intercepts, and the other rows include the estimated slopes.</p>
@@ -126,12 +126,12 @@
 <div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb7-1" title="1">predict &lt;-<span class="st"> </span><span class="kw"><a href="https://rdrr.io/r/stats/predict.html">predict</a></span>(model,<span class="dt">newx=</span>X)</a></code></pre></div>
 <p>The function <code>cv.cornet</code> measures the predictive performance of combined regression by nested cross-validation, in comparison with logistic regression.</p>
 <div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb8-1" title="1"><span class="kw"><a href="../reference/cv.cornet.html">cv.cornet</a></span>(<span class="dt">y=</span>y,<span class="dt">cutoff=</span><span class="dv">0</span>,<span class="dt">X=</span>X)</a></code></pre></div>
-<p>Here we observe that combined regression outperforms logistic regression (lower logistic deviance). Logistic regression is only slightly better than the intercept-only model.</p>
+<p>Here we observe that combined regression outperforms logistic regression (lower logistic deviance), and that logistic regression is only slightly better than the intercept-only model.</p>
 </div>
 <div id="references" class="section level1">
 <h1 class="hasAnchor">
 <a href="#references" class="anchor"></a>References</h1>
-<p>Rauschenberger A, and Glaab E (2019). “Predicting artificial binary outcomes from high-dimensional data”. <em>Manuscript in preparation.</em></p>
+<p>Rauschenberger A, and Glaab E (2020). “Predicting artificial binary outcomes from high-dimensional data”. <em>Manuscript in preparation.</em></p>
 <!--
 # Example
@@ -257,9 +257,9 @@ sapply(loss,function(x) x$deviance)
 <h2 class="hasAnchor">
 <a href="#tocnav" class="anchor"></a>Contents</h2>
 <ul class="nav nav-pills nav-stacked">
-<li><a href="#installation">Installation</a></li>
-<li><a href="#simulation">Simulation</a></li>
-<li><a href="#application">Application</a></li>
+<li><a href="#introduction">Introduction</a></li>
+<li><a href="#installation">Installation</a></li>
+<li><a href="#example">Example</a></li>
 <li><a href="#references">References</a></li>
 </ul>
 </div>
...
@@ -100,7 +100,7 @@
 <div id="reference" class="section level2">
 <h2 class="hasAnchor">
 <a href="#reference" class="anchor"></a>Reference</h2>
-<p>Armin Rauschenberger and Enrico Glaab (2019). “Predicting artificial binary outcomes from high-dimensional data”. <em>Submitted.</em></p>
+<p>Armin Rauschenberger and Enrico Glaab (2020). “Predicting artificial binary outcomes from high-dimensional data”. <em>Manuscript in preparation.</em></p>
 <p><a href="https://CRAN.R-project.org/package=cornet"><img src="https://www.r-pkg.org/badges/version/cornet" alt="CRAN version"></a> <a href="https://CRAN.R-project.org/package=cornet"><img src="https://cranlogs.r-pkg.org/badges/cornet" alt="CRAN RStudio mirror downloads"></a> <a href="https://CRAN.R-project.org/package=cornet"><img src="https://cranlogs.r-pkg.org/badges/grand-total/cornet" alt="Total CRAN downloads"></a></p>
 </div>
...
@@ -122,9 +122,9 @@
 <small>Source: <a href='https://github.com/rauschenberger/cornet/blob/master/NEWS.md'><code>NEWS.md</code></a></small>
 </div>
-<div id="cornet-004-2020-03-17" class="section level2">
+<div id="cornet-004-2020-03-18" class="section level2">
 <h2 class="hasAnchor">
-<a href="#cornet-004-2020-03-17" class="anchor"></a>cornet 0.0.4 (2020-03-17)</h2>
+<a href="#cornet-004-2020-03-18" class="anchor"></a>cornet 0.0.4 (2020-03-18)</h2>
 <ul>
 <li>updated documentation</li>
 </ul>
@@ -165,7 +165,7 @@
 <div id="tocnav">
 <h2>Contents</h2>
 <ul class="nav nav-pills nav-stacked">
-<li><a href="#cornet-004-2020-03-17">0.0.4</a></li>
+<li><a href="#cornet-004-2020-03-18">0.0.4</a></li>
 <li><a href="#cornet-003-2019-11-12">0.0.3</a></li>
 <li><a href="#cornet-003-2019-10-02">0.0.3</a></li>
 <li><a href="#cornet-002-2019-09-26">0.0.2</a></li>
...
@@ -233,8 +233,8 @@ but the loss is incomparable between linear and logistic regression.</p>
 If at all, use <code>"auc"</code> for external cross-validation only.</p>
 <h2 class="hasAnchor" id="references"><a class="anchor" href="#references"></a>References</h2>
-<p>Armin Rauschenberger and Enrico Glaab (2019).
-"Lasso and ridge regression for dichotomised outcomes".
+<p>Armin Rauschenberger and Enrico Glaab (2020).
+"Predicting artificial binary outcomes from high-dimensional data".
 <em>Manuscript in preparation</em>.</p>
 <h2 class="hasAnchor" id="see-also"><a class="anchor" href="#see-also"></a>See also</h2>
...
@@ -91,8 +91,8 @@ net
 }
 \references{
-Armin Rauschenberger and Enrico Glaab (2019).
-"Lasso and ridge regression for dichotomised outcomes".
+Armin Rauschenberger and Enrico Glaab (2020).
+"Predicting artificial binary outcomes from high-dimensional data".
 \emph{Manuscript in preparation}.
 }
 \seealso{
...
@@ -11,6 +11,6 @@ The `cornet` manuscript is in preparation. Click [here](https://CRAN.R-project.o
 ## Reference
-Armin Rauschenberger and Enrico Glaab (2019).
+Armin Rauschenberger and Enrico Glaab (2020).
 "Predicting artificial binary outcomes from high-dimensional data".
 *Manuscript in preparation.*
@@ -13,6 +13,10 @@ editor_options:
 knitr::opts_chunk$set(echo = TRUE)
 ```
+## Introduction
+It is considered bad statistical practice to dichotomise continuous outcomes, but some applications require predicted probabilities rather than predicted values. To obtain predicted values, we recommend modelling the original continuous outcome with *linear regression*. To obtain predicted probabilities, we recommend not modelling the artificial binary outcome with *logistic regression*, but modelling both the original continuous outcome and the artificial binary outcome with *combined regression*.
 ## Installation
 Install the current release from [CRAN](https://CRAN.R-project.org/package=cornet):
-## Simulation
-We simulate data for $n=100$ samples and $p=500$ features. The vector $\boldsymbol{y}$ of length $n$ represents the outcome, and the matrix $\boldsymbol{X}$ with $n$ rows and $p$ columns represents the features. As outlined in the manuscript, it is considered bad statistical practice to dichotomise continuous outcomes, but there might be practical reasons to do so. We recommend to model the original continuous outcome with *linear regression* (default), or the artificial binary outcome with *combined regression* (exception).
+## Example
+We simulate data for $n$ samples and $p$ features, in a high-dimensional setting ($p \gg n$). The matrix $\boldsymbol{X}$ with $n$ rows and $p$ columns represents the features, and the vector $\boldsymbol{y}$ of length $n$ represents the continuous outcome.
 ```{r,eval=FALSE}
 set.seed(1)
 n <- 100; p <- 500
 X <- matrix(rnorm(n*p),nrow=n,ncol=p)
 beta <- rbinom(n=p,size=1,prob=0.05)
-mean <- X %*% beta
-y <- rnorm(n=n,mean=mean)
+y <- rnorm(n=n,mean=X%*%beta)
 ```
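The split that `cutoff` performs can be sketched in a few lines of base R. This is a minimal illustration under the simulation settings above with a cutoff of zero; the variable `z` is our own illustrative name, not part of the package (cornet derives the binary outcome internally):

```r
# Minimal sketch (base R only, illustration): the artificial binary outcome
# corresponds to thresholding the continuous outcome y at the cutoff (here 0).
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # feature matrix
beta <- rbinom(n = p, size = 1, prob = 0.05)   # sparse coefficients
y <- rnorm(n = n, mean = X %*% beta)           # continuous outcome
z <- as.integer(y > 0)  # 1 = outcome greater than the cutoff, 0 = at or below
table(z)                # group sizes on either side of the cutoff
```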
-## Application
-We use the function `cornet` for modelling the underlying continuous outcome and the artifial binary outcome. The argument `cutoff` splits the samples into two groups, those with an outcome less than or equal to the cutoff, and those with an outcome greater than the cutoff.
+We use the function `cornet` for modelling the original continuous outcome and the artificial binary outcome. The argument `cutoff` splits the samples into two groups: those with an outcome less than or equal to the cutoff, and those with an outcome greater than the cutoff.
 ```{r,eval=FALSE}
 model <- cornet(y=y,cutoff=0,X=X)
@@ -74,11 +75,11 @@ The function `cv.cornet` measures the predictive performance of combined regress
 cv.cornet(y=y,cutoff=0,X=X)
 ```
-Here we observe that combined regression outperforms logistic regression (lower logistic deviance). Logistic regression is only slightly better than the intercept-only model.
+Here we observe that combined regression outperforms logistic regression (lower logistic deviance), and that logistic regression is only slightly better than the intercept-only model.
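The deviance comparison can be made concrete with a small base-R sketch. The helper `logistic_deviance` and the probabilities below are our own illustration, not cornet code; the helper computes the binomial deviance under which the intercept-only, logistic, and combined models are compared (lower is better):

```r
# Illustrative helper (not part of cornet): binomial deviance of predicted
# probabilities p_hat for a binary outcome z; lower values mean better fit.
logistic_deviance <- function(z, p_hat) {
  -2 * sum(z * log(p_hat) + (1 - z) * log(1 - p_hat))
}
z <- c(0, 0, 1, 1)                 # toy binary outcome
p_null <- rep(mean(z), length(z))  # intercept-only model: constant probability
p_fit <- c(0.2, 0.3, 0.7, 0.8)     # hypothetical fitted probabilities
logistic_deviance(z, p_null)       # deviance of the intercept-only model
logistic_deviance(z, p_fit)        # lower deviance: better predictions
```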
 # References
-Rauschenberger A, and Glaab E (2019). "Predicting artificial binary outcomes from high-dimensional data". *Manuscript in preparation.*
+Rauschenberger A, and Glaab E (2020). "Predicting artificial binary outcomes from high-dimensional data". *Manuscript in preparation.*
 <!--
 # Example
...
@@ -304,6 +304,10 @@ code > span.fu { color: #900; font-weight: bold; } code > span.er { color: #a61
+<div id="introduction" class="section level2">
+<h2>Introduction</h2>
+<p>It is considered bad statistical practice to dichotomise continuous outcomes, but some applications require predicted probabilities rather than predicted values. To obtain predicted values, we recommend modelling the original continuous outcome with <em>linear regression</em>. To obtain predicted probabilities, we recommend not modelling the artificial binary outcome with <em>logistic regression</em>, but modelling both the original continuous outcome and the artificial binary outcome with <em>combined regression</em>.</p>
+</div>
 <div id="installation" class="section level2">
 <h2>Installation</h2>
 <p>Install the current release from <a href="https://CRAN.R-project.org/package=cornet">CRAN</a>:</p>
@@ -314,19 +318,15 @@ code > span.fu { color: #900; font-weight: bold; } code > span.er { color: #a61
 <p>Then load and attach the package:</p>
 <div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" title="1"><span class="kw">library</span>(cornet)</a></code></pre></div>
 </div>
-<div id="simulation" class="section level2">
-<h2>Simulation</h2>
-<p>We simulate data for <span class="math inline">\(n=100\)</span> samples and <span class="math inline">\(p=500\)</span> features. The vector <span class="math inline">\(\boldsymbol{y}\)</span> of length <span class="math inline">\(n\)</span> represents the outcome, and the matrix <span class="math inline">\(\boldsymbol{X}\)</span> with <span class="math inline">\(n\)</span> rows and <span class="math inline">\(p\)</span> columns represents the features. As outlined in the manuscript, it is considered bad statistical practice to dichotomise continuous outcomes, but there might be practical reasons to do so. We recommend to model the original continuous outcome with <em>linear regression</em> (default), or the artificial binary outcome with <em>combined regression</em> (exception).</p>
+<div id="example" class="section level2">
+<h2>Example</h2>
+<p>We simulate data for <span class="math inline">\(n\)</span> samples and <span class="math inline">\(p\)</span> features, in a high-dimensional setting (<span class="math inline">\(p \gg n\)</span>). The matrix <span class="math inline">\(\boldsymbol{X}\)</span> with <span class="math inline">\(n\)</span> rows and <span class="math inline">\(p\)</span> columns represents the features, and the vector <span class="math inline">\(\boldsymbol{y}\)</span> of length <span class="math inline">\(n\)</span> represents the continuous outcome.</p>
 <div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" title="1"><span class="kw">set.seed</span>(<span class="dv">1</span>)</a>
 <a class="sourceLine" id="cb4-2" title="2">n &lt;-<span class="st"> </span><span class="dv">100</span>; p &lt;-<span class="st"> </span><span class="dv">500</span></a>
 <a class="sourceLine" id="cb4-3" title="3">X &lt;-<span class="st"> </span><span class="kw">matrix</span>(<span class="kw">rnorm</span>(n<span class="op">*</span>p),<span class="dt">nrow=</span>n,<span class="dt">ncol=</span>p)</a>
 <a class="sourceLine" id="cb4-4" title="4">beta &lt;-<span class="st"> </span><span class="kw">rbinom</span>(<span class="dt">n=</span>p,<span class="dt">size=</span><span class="dv">1</span>,<span class="dt">prob=</span><span class="fl">0.05</span>)</a>
-<a class="sourceLine" id="cb4-5" title="5">mean &lt;-<span class="st"> </span>X <span class="op">%*%</span><span class="st"> </span>beta</a>
-<a class="sourceLine" id="cb4-6" title="6">y &lt;-<span class="st"> </span><span class="kw">rnorm</span>(<span class="dt">n=</span>n,<span class="dt">mean=</span>mean)</a></code></pre></div>
-</div>
-<div id="application" class="section level2">
-<h2>Application</h2>
-<p>We use the function <code>cornet</code> for modelling the underlying continuous outcome and the artifial binary outcome. The argument <code>cutoff</code> splits the samples into two groups, those with an outcome less than or equal to the cutoff, and those with an outcome greater than the cutoff.</p>
+<a class="sourceLine" id="cb4-5" title="5">y &lt;-<span class="st"> </span><span class="kw">rnorm</span>(<span class="dt">n=</span>n,<span class="dt">mean=</span>X<span class="op">%*%</span>beta)</a></code></pre></div>
+<p>We use the function <code>cornet</code> for modelling the original continuous outcome and the artificial binary outcome. The argument <code>cutoff</code> splits the samples into two groups: those with an outcome less than or equal to the cutoff, and those with an outcome greater than the cutoff.</p>
 <div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" title="1">model &lt;-<span class="st"> </span><span class="kw">cornet</span>(<span class="dt">y=</span>y,<span class="dt">cutoff=</span><span class="dv">0</span>,<span class="dt">X=</span>X)</a>
 <a class="sourceLine" id="cb5-2" title="2">model</a></code></pre></div>
 <p>The function <code>coef</code> returns the estimated coefficients. The first column is for the linear model (beta), and the second column is for the logistic model (gamma). The first row includes the estimated intercepts, and the other rows include the estimated slopes.</p>
@@ -335,11 +335,11 @@ code > span.fu { color: #900; font-weight: bold; } code > span.er { color: #a61
 <div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb7-1" title="1">predict &lt;-<span class="st"> </span><span class="kw">predict</span>(model,<span class="dt">newx=</span>X)</a></code></pre></div>
 <p>The function <code>cv.cornet</code> measures the predictive performance of combined regression by nested cross-validation, in comparison with logistic regression.</p>
 <div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb8-1" title="1"><span class="kw">cv.cornet</span>(<span class="dt">y=</span>y,<span class="dt">cutoff=</span><span class="dv">0</span>,<span class="dt">X=</span>X)</a></code></pre></div>
-<p>Here we observe that combined regression outperforms logistic regression (lower logistic deviance). Logistic regression is only slightly better than the intercept-only model.</p>
+<p>Here we observe that combined regression outperforms logistic regression (lower logistic deviance), and that logistic regression is only slightly better than the intercept-only model.</p>
 </div>
 <div id="references" class="section level1">
 <h1>References</h1>
-<p>Rauschenberger A, and Glaab E (2019). “Predicting artificial binary outcomes from high-dimensional data”. <em>Manuscript in preparation.</em></p>
+<p>Rauschenberger A, and Glaab E (2020). “Predicting artificial binary outcomes from high-dimensional data”. <em>Manuscript in preparation.</em></p>
 <!--
 # Example
...