For any project enquiries phone +49 (0)761 76 999 422
Introduction

Our research

Continually producing top-quality project deliverables requires staying up to date with new processes and methodologies.
All companies that strive to be leaders rather than followers have research at their heart, and Coreva Scientific is no different.
On this page, you can discover some of the projects that we are working on or have completed.

Sampling of commonly used population characteristics: Is a normal approximation valid?

Objectives

Health economic models use a base case that is generally representative of a subpopulation rather than the whole population. During sensitivity analysis, extrapolation of the model to other subpopulations or to the whole population is estimated via sampling. Sampling uses summary statistics (e.g. mean and standard deviation) to generate a distribution from which values are drawn at random. Key population characteristics for healthcare include age, height, weight, and body mass index (BMI), all of which are commonly assumed to be approximately normally distributed. Here, the plausibility of this common assumption is tested.
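As an illustration of sampling from summary statistics, the following is a minimal Python sketch, not the code used in the study. The function name `sample_normal`, the BMI mean and standard deviation, and the plausibility cut-off of 12 kg/m^2 are all assumptions chosen for the example.

```python
import random
import statistics


def sample_normal(mean, sd, n, seed=0):
    """Draw n values from a normal distribution defined by its mean and SD."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [rng.gauss(mean, sd) for _ in range(n)]


# Hypothetical summary statistics for BMI in kg/m^2 (illustrative, not BRFSS values).
bmi_draws = sample_normal(mean=27.0, sd=6.0, n=10_000)

# The sample reproduces the summary statistics it was generated from...
print(statistics.mean(bmi_draws), statistics.stdev(bmi_draws))

# ...but a normal distribution is symmetric and unbounded, so some draws
# can fall below any plausibility threshold (12 kg/m^2 is an assumed cut-off).
share_implausible = sum(b < 12.0 for b in bmi_draws) / len(bmi_draws)
print(share_implausible)
```

The sketch shows why the normality assumption matters: the draws match the mean and standard deviation, yet the shape of the distribution, including its tails, is imposed entirely by the choice of distribution.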

Methods

Full data (N=451,075) were obtained from the 2010 Behavioral Risk Factor Surveillance System (BRFSS), a national, health-related telephone survey in the US. Data collected include age, gender, height, and weight, with BMI being a calculated variable. Summary statistics and distributions were produced from the whole population. A sample of 2,500 records was extracted for in-depth analysis. Of these, 2,365 had complete data for age, gender, height, and weight. Analyses performed in R and Microsoft Excel® included subsampling, normality tests, and Cullen-Frey plots.
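A Cullen-Frey plot locates a dataset by its squared skewness and its kurtosis and compares that point with the positions of common theoretical distributions. In R, the `descdist` function of the fitdistrplus package draws such a plot; whether that function was used here is not stated. The Python sketch below only computes the two coordinates, using synthetic data and a hypothetical helper named `skewness_kurtosis`.

```python
import random


def skewness_kurtosis(values):
    """Return moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


rng = random.Random(0)
# Synthetic right-skewed data standing in for a variable such as weight.
skewed = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
skew, kurt = skewness_kurtosis(skewed)

# Cullen-Frey coordinates: x = squared skewness, y = kurtosis.
# A normal distribution sits at the single point (0, 3).
print(skew ** 2, kurt)
```

Data whose coordinates lie far from (0, 3) are poorly described by a normal distribution, which is the kind of evidence the plots provide.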

Results

None of the variables assessed were normally distributed. Cullen-Frey plots indicate that the distributions best approximating the data are Beta for age, log-normal for weight, Beta for height, and log-normal for BMI. Across 1,000 subsamples of 300 patients each, 67% had a mean age falling outside the 99% confidence interval for the population. For BMI, the figure was 62%. Progressively smaller subsamples represented the population progressively less well.
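The subsampling experiment can be sketched in Python as follows. This is an assumption-laden illustration rather than the original analysis: the population is synthetic, the function name `fraction_outside_ci` is hypothetical, and the 99% interval is assumed to be the population mean plus or minus 2.576 standard errors, since the exact interval construction is not stated.

```python
import math
import random
import statistics


def fraction_outside_ci(population, n_subsamples=1000, subsample_size=300,
                        z=2.576, seed=1):
    """Share of subsample means falling outside the population's 99% CI."""
    rng = random.Random(seed)
    pop_mean = statistics.mean(population)
    # Assumed CI: mean +/- z * SD / sqrt(N), with z = 2.576 for 99% coverage.
    half_width = z * statistics.stdev(population) / math.sqrt(len(population))
    lower, upper = pop_mean - half_width, pop_mean + half_width
    outside = 0
    for _ in range(n_subsamples):
        # Sample without replacement, then compare the subsample mean.
        sub_mean = statistics.mean(rng.sample(population, subsample_size))
        if not lower <= sub_mean <= upper:
            outside += 1
    return outside / n_subsamples


# Synthetic ages (18 to 90) from a Beta shape; NOT BRFSS data.
rng = random.Random(0)
synthetic_ages = [18 + 72 * rng.betavariate(2.5, 2.0) for _ in range(2365)]

for size in (1000, 300, 50):
    print(size, fraction_outside_ci(synthetic_ages, subsample_size=size))
```

Because the interval narrows as the population grows while the spread of subsample means depends on the subsample size, smaller subsamples fall outside the interval more often.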

Conclusions

Many population characteristics of interest to healthcare do not follow a normal distribution. In the BRFSS dataset, the most descriptive distributions are the log-normal for BMI and the Beta distribution with negative skew for age. The skew in the age distribution may reflect the aging US population.
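Where a log-normal distribution is preferred, its parameters can be derived from the same mean and standard deviation by moment matching: sigma^2 = ln(1 + SD^2 / mean^2) and mu = ln(mean) - sigma^2 / 2. The Python sketch below applies these formulas; the helper name `lognormal_params` and the BMI values are assumptions for illustration only.

```python
import math
import random
import statistics


def lognormal_params(mean, sd):
    """Convert a mean and SD on the natural scale to log-normal mu and sigma."""
    sigma_sq = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)


# Hypothetical BMI summary statistics (illustrative, not BRFSS values).
mu, sigma = lognormal_params(mean=27.0, sd=6.0)

rng = random.Random(0)
bmi_draws = [rng.lognormvariate(mu, sigma) for _ in range(10_000)]

# Draws are strictly positive and right-skewed, with the requested mean.
print(statistics.mean(bmi_draws), min(bmi_draws))
```

Unlike a normal distribution with the same summary statistics, these draws cannot be negative, and the right tail is longer than the left.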

Men are predictable: modelling cardiovascular disease prevalence from population survey data (BRFSS)

Objectives

Calculating the economic burden of disease requires data regarding disease prevalence. National estimates can be derived from surveys of the general population, which may also reach individuals not actively participating in the healthcare system. The Behavioral Risk Factor Surveillance System (BRFSS) is the largest annual country-wide population sampling of health and risk factors. The fidelity of these data, however, may be questionable because it relies on accurate self-reporting. Cardiovascular disease (CVD) prevalence was examined by gender to assess the feasibility of predicting future trends.

Methods

BRFSS data were trimmed to complete cases for nine CVD risk factors: gender, age, race, overweight, physical activity, diabetes, high blood pressure, smoking, and alcohol consumption. Bayesian and tree-based algorithms were trained on data from 2011 and 2013, and their performance was evaluated on unseen data from the subsequent survey years (2013 and 2015, respectively) by comparing predicted with reported prevalence.
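To make the train-then-compare workflow concrete, here is a minimal Python sketch of a categorical Naive Bayes classifier with Laplace smoothing. It is not the study's implementation: the toy records, the two binary risk factors, the smoothing constant, the choice to estimate prevalence as the mean predicted probability, and all function names are assumptions for illustration.

```python
from collections import defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)  # (class, feature index, value) -> count
    feature_values = defaultdict(set)  # feature index -> values seen in training
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for j, value in enumerate(row):
            feature_counts[(label, j, value)] += 1
            feature_values[j].add(value)
    return class_counts, feature_counts, feature_values, alpha


def predict_proba(model, row, positive=1):
    """Posterior probability that a record belongs to the positive class."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # class prior
        for j, value in enumerate(row):
            numerator = feature_counts.get((label, j, value), 0) + alpha
            denominator = count + alpha * len(feature_values[j])
            score *= numerator / denominator  # smoothed conditional likelihood
        scores[label] = score
    return scores.get(positive, 0.0) / sum(scores.values())


def predicted_prevalence(model, rows):
    """Assumed estimator: mean predicted probability across records."""
    return sum(predict_proba(model, row) for row in rows) / len(rows)


def percent_difference(predicted, reported):
    """Absolute difference as a percentage of the reported prevalence."""
    return 100.0 * abs(predicted - reported) / reported


# Toy records: (high blood pressure, smoker); label 1 = reports CVD.
train_rows = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0), (0, 0), (1, 0)]
train_labels = [1, 1, 0, 0, 1, 0, 0, 0]
model = train_naive_bayes(train_rows, train_labels)

later_rows = [(1, 1), (0, 0), (0, 1), (1, 0), (0, 0)]  # hypothetical later year
later_reported = 0.40                                   # hypothetical prevalence
estimate = predicted_prevalence(model, later_rows)
print(estimate, percent_difference(estimate, later_reported))
```

Training separate models per gender, as opposed to one combined model, would amount to calling `train_naive_bayes` once per subgroup.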

Results

For the algorithms used, predictions of future prevalence were significantly better for males than for females (p < 0.001, Šidák multiple testing correction). In the best performing algorithm (Naïve Bayes), the mean percent difference from the actual prevalence was 3.8±2.5% for males and 151±62% for females (p < 0.05, two-tailed t-test). For women, data from 2013 yielded better 2-year predictions (for 2015) than 2011 data did over the same time span (2011 to 2013; p < 0.05, two-tailed t-test), while for men there was no significant difference (p = 0.54, two-tailed t-test). Models trained on the genders combined resulted in underestimates of prevalence (p < 0.001, Z-test).
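For reference, the Šidák correction controls the family-wise error rate across m independent tests by using a per-test significance level of 1 - (1 - alpha)^(1/m), or equivalently by adjusting each p-value to 1 - (1 - p)^m. The Python sketch below implements both forms; the function names are hypothetical, and the number of comparisons (5) is an assumed value because the study's count is not given.

```python
def sidak_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def sidak_adjust(p_value, n_tests):
    """Šidák-adjusted p-value for one of n_tests independent comparisons."""
    return 1.0 - (1.0 - p_value) ** n_tests


n_comparisons = 5  # assumed for illustration; not stated in the source
print(sidak_alpha(0.05, n_comparisons))
print(sidak_adjust(0.0001, n_comparisons))
```

A raw p-value is significant after correction when it is below `sidak_alpha(alpha, m)`, which is the same as its adjusted value being below `alpha`.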

Conclusions

Patient-reported survey data can be used to predict cardiovascular disease prevalence. Accuracy of estimation is better for males than for females. Given that BRFSS data are retrospective, our findings may reflect more substantial lifestyle changes in females, or may prompt discussion of how survey data from female respondents are collected.