<p>It is considered bad statistical practice to dichotomise continuous outcomes, but some applications require predicted probabilities rather than predicted values. To obtain predicted values, we recommend modelling the original continuous outcome with <em>linear regression</em>. To obtain predicted probabilities, we recommend modelling the original continuous outcome together with the artificial binary outcome using <em>combined regression</em>, rather than modelling the artificial binary outcome alone with <em>logistic regression</em>.</p>
</div>
<div id="installation" class="section level2">
<h2>Installation</h2>
<p>Install the current release from <a href="https://CRAN.R-project.org/package=cornet">CRAN</a>:</p>
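The standard CRAN installation command, run inside an R session:

```r
install.packages("cornet")
```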
<p>We simulate data for <span class="math inline">\(n=100\)</span> samples and <span class="math inline">\(p=500\)</span> features, in a high-dimensional setting (<span class="math inline">\(p \gg n\)</span>). The matrix <span class="math inline">\(\boldsymbol{X}\)</span> with <span class="math inline">\(n\)</span> rows and <span class="math inline">\(p\)</span> columns represents the features, and the vector <span class="math inline">\(\boldsymbol{y}\)</span> of length <span class="math inline">\(n\)</span> represents the continuous outcome.</p>
<pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-2" title="2">n <- <span class="dv">100</span>; p <- <span class="dv">500</span></a></code></pre>
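The line above only sets the dimensions. A minimal sketch of a full simulation, assuming Gaussian features and a sparse linear signal (the exact design in the original script may differ):

```r
set.seed(1)
n <- 100; p <- 500

# feature matrix: n samples (rows) times p features (columns)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# sparse coefficient vector: most effects are exactly zero
beta <- rnorm(p) * rbinom(p, size = 1, prob = 0.05)

# continuous outcome: linear signal plus noise
y <- as.numeric(X %*% beta + rnorm(n))
```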
</div>
<div id="application" class="section level2">
<h2>Application</h2>
<p>We use the function <code>cornet</code> for modelling the original continuous outcome and the artificial binary outcome. The argument <code>cutoff</code> splits the samples into two groups: those with an outcome less than or equal to the cutoff, and those with an outcome greater than the cutoff.</p>
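A sketch of the call, using freshly simulated data so the snippet stands alone; the cutoff of zero is an illustrative choice (any value within the range of the outcome works):

```r
library(cornet)

set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n * p), nrow = n)  # feature matrix
y <- rnorm(n)                        # continuous outcome

# fit combined regression: cutoff = 0 dichotomises y at zero
net <- cornet(y = y, cutoff = 0, X = X)
net
```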
<p>The function <code>coef</code> returns the estimated coefficients. The first column is for the linear model (beta), and the second column is for the logistic model (gamma). The first row includes the estimated intercepts, and the other rows include the estimated slopes.</p>
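Extracting the coefficients from a fitted object (refitted here so the snippet stands alone); the comments describe the structure stated above, not verified output:

```r
library(cornet)

set.seed(1)
X <- matrix(rnorm(100 * 500), nrow = 100)
y <- rnorm(100)
net <- cornet(y = y, cutoff = 0, X = X)

# first row: intercepts; remaining rows: slopes
# first column: linear model (beta); second column: logistic model (gamma)
coefs <- coef(net)
head(coefs)
```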
<p>The function <code>cv.cornet</code> measures the predictive performance of combined regression by nested cross-validation, in comparison with logistic regression.</p>
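A sketch of the performance comparison on simulated data (nested cross-validation refits the models many times, so this is computationally intensive):

```r
library(cornet)

set.seed(1)
X <- matrix(rnorm(100 * 500), nrow = 100)
y <- rnorm(100)

# nested cross-validation: compares combined regression with
# logistic regression by the deviance of the predicted probabilities
perf <- cv.cornet(y = y, cutoff = 0, X = X)
perf
```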
<p>Here we observe that combined regression outperforms logistic regression (lower logistic deviance), and that logistic regression is only slightly better than the intercept-only model.</p>
</div>
<div id="references" class="section level1">
<h1>References</h1>
<p>Rauschenberger A, and Glaab E (2020). “Predicting artificial binary outcomes from high-dimensional data”. <em>Manuscript in preparation.</em></p>