Training@Staburo: How to use cross-validation to obtain reliable subgroup effects

Nicole Krämer, project leader Translational Medicine and Biomarker, explained how we can use cross-validation to obtain reliable subgroup effects.

The identification of patient subgroups that benefit from a new treatment is of crucial importance in precision medicine. Many data-driven subgroup identification algorithms exist; in general, they differ in the underlying estimand and in the way the subgroup is identified. However, because the subgroup is selected and evaluated on the same data, the estimated relative treatment effect (e.g. the hazard ratio or risk difference) within the identified subgroup is almost always too good to be true and cannot be reproduced in a new trial.

Nicole explained how to apply cross-validation to obtain more reproducible subgroup effects. In each cross-validation split, the subgroup identification algorithm is applied to the training set. The resulting rule is then used to assign each patient in the test set to the subgroup or its complement. Because every patient is assigned by a rule that was derived without that patient's data, this cross-validated assignment can be used to estimate the subgroup effect.
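The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method presented in the training: the data are simulated with no true treatment effect, the treatment effect is a simple difference in means, and `find_cutoff` is a hypothetical toy subgroup finder that picks a biomarker cutoff. Any real subgroup identification algorithm could take its place.

```python
import numpy as np

def effect(y, trt):
    """Treatment effect as difference in mean outcome (treated minus control)."""
    return y[trt == 1].mean() - y[trt == 0].mean()

def find_cutoff(x, y, trt, min_arm=5):
    """Hypothetical toy subgroup finder: choose the biomarker cutoff c
    that maximizes the estimated effect in the subgroup x > c."""
    best_c, best_eff = np.inf, -np.inf  # np.inf encodes an empty subgroup
    for c in np.quantile(x, np.linspace(0.1, 0.9, 17)):
        sub = x > c
        n_trt, n_ctl = (trt[sub] == 1).sum(), (trt[sub] == 0).sum()
        if min(n_trt, n_ctl) < min_arm:
            continue  # skip cutoffs that leave an arm too small
        eff = effect(y[sub], trt[sub])
        if eff > best_eff:
            best_c, best_eff = c, eff
    return best_c

def cv_subgroup(x, y, trt, n_folds=5, seed=0):
    """Assign each patient to the subgroup using a rule that was
    derived without that patient's data."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(x)) % n_folds  # balanced random fold labels
    in_sub = np.zeros(len(x), dtype=bool)
    for k in range(n_folds):
        train, test = fold != k, fold == k
        c = find_cutoff(x[train], y[train], trt[train])  # rule from the training set
        in_sub[test] = x[test] > c                       # applied to the test set
    return in_sub

# Simulated trial with NO true treatment effect in any subgroup (assumption).
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)            # biomarker
trt = rng.integers(0, 2, size=n)  # 1 = new treatment, 0 = control
y = rng.normal(size=n)            # outcome, independent of x and trt

naive_sub = x > find_cutoff(x, y, trt)  # rule found and evaluated on the same data
cv_sub = cv_subgroup(x, y, trt)         # cross-validated assignment

naive_effect = effect(y[naive_sub], trt[naive_sub])
cv_effect = effect(y[cv_sub], trt[cv_sub])
print(f"naive subgroup effect:           {naive_effect:.3f}")
print(f"cross-validated subgroup effect: {cv_effect:.3f}")
```

The naive estimate is the maximum over all candidate cutoffs and is therefore optimistic by construction. The cross-validated estimate has no such selection advantage, because the cutoff applied to each patient was chosen on the other folds.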

Data analysis, clinical biostatistics and more.

Staburo @ EMA scientific advice

Staburo @ EMA scientific advice meeting in London 

Staburo Managing Director Josef Höfler visited the European Medicines Agency (EMA) in London for a scientific advice meeting on a client’s clinical development project.

EMA’s experts commented on statistical concepts for adaptive designs with sample size re-assessment during the meeting.

Furthermore, the pivotal study design was discussed with respect to the statistical analyses. The input from the EMA experts was very valuable, as it increases the chance of fewer questions from the authorities during the marketing authorization process, provided that the trial results are positive.

Staburo @ R meet-up at Roche Diagnostics

Staburo @ the R Winter Edition meetup event 

Staburo attended the Applied R Winter Edition meetup, hosted by Roche, to hear talks about the world’s largest R Shiny app and component-wise boosting in machine learning.

The world’s largest R Shiny app (bioWARP) was presented by Sebastian Wolf of Roche, who showed the complexity and large-scale deployment of a truly impressive tool. The application enables employees at Roche Diagnostics to create validated reports for submissions to regulatory authorities. bioWARP enables people who cannot program in R to use advanced statistical methods. It connects the validated R packages developed at Roche with an elegant, easy-to-use user interface. Its modular environment can host an unlimited number of such interfaces. One of its main features is a module that tests the homogeneity of a production process using equivalence tests developed in-house.

bioWARP’s most important feature is the ability to move all statistical evaluations directly into PDF reports. These reports are validated and can be used directly for submission to regulatory authorities. bioWARP is described as the “largest shiny application in the world”: it already consists of 16 modules/tools, has over 100,000 lines of code and more than 500 buttons and interaction items, and it keeps growing.

The second talk of the event covered an interesting approach to building useful statistical models with machine learning techniques. Component-wise boosting applies the boosting framework to statistical models, e.g. generalized additive models with component-wise smoothing splines. In each iteration, only the single component that best fits the current residuals is updated. Boosting these kinds of models therefore maintains interpretability and enables unbiased model selection in high-dimensional feature spaces.
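To illustrate the idea, here is a minimal sketch of component-wise boosting in Python. It rests on simplifying assumptions that are not from the talk: squared-error loss, one linear base learner per feature instead of smoothing splines, and simulated data. The function name `componentwise_boost` is hypothetical and does not reflect the API of any package.

```python
import numpy as np

def componentwise_boost(X, y, n_iter=100, learning_rate=0.1):
    """Component-wise L2 boosting with one linear base learner per feature.

    Returns (intercept, coef) of the resulting additive linear model."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                  # centered features
    sq_norm = (Xc ** 2).sum(axis=0)  # squared norm of each centered feature
    offset = y.mean()
    coef = np.zeros(X.shape[1])
    resid = y - offset               # residuals of the current model
    for _ in range(n_iter):
        # Least-squares slope of the residuals on each single feature.
        beta = Xc.T @ resid / sq_norm
        # Reduction in residual sum of squares each base learner would achieve.
        gain = beta ** 2 * sq_norm
        j = int(np.argmax(gain))     # best-fitting component
        # Update only that component, shrunken by the learning rate.
        coef[j] += learning_rate * beta[j]
        resid -= learning_rate * beta[j] * Xc[:, j]
    return offset - x_mean @ coef, coef

# Simulated data (assumption): 50 features, only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=200)

intercept, coef = componentwise_boost(X, y)
selected = np.flatnonzero(coef)      # features that were picked at least once
print("selected features:", selected)
print("largest coefficients:", np.argsort(-np.abs(coef))[:2])
```

Features that are never selected keep a coefficient of exactly zero, so the number of iterations acts as the model selection parameter, and the fitted model stays a readable sum of per-feature effects.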

The R package compboost is an implementation of component-wise boosting written in C++ to obtain high runtime performance and full memory control. The main idea is to provide a modular class system which can be extended without editing the source code.
