Tips for Reporting P Values, Confidence Intervals, and Power Analyses in Health Professions Education Research: Just Do It!


By: Colin P. West, MD, PhD, Eduardo F. Abbott, MD, and David A. Cook, MD, MHPE

Basic statistical results, including P values, confidence intervals, and power analyses, are variably reported in scientific publications and frequently misunderstood or misapplied. In our current article (Abbott et al), we examined the prevalence of, and changes over time in, the reporting of P values, confidence intervals, and power analyses in health professions education research (HPER) publications.

We found reporting of P values and confidence intervals in HPER publications increased from the 1970s to 2015, and in 2015, P values were reported in most HPER abstracts and main texts of published research papers. However, reporting of confidence intervals and power analyses remained uncommon and lagged behind reporting in general biomedical research. In addition, most reported P values were statistically significant according to the standard threshold of P ≤ .05, which seems likely to reflect selective (biased) reporting.

Several general recommendations stem from these results. First, more detailed quantitative reporting of key statistical results is needed in both abstracts and main texts of HPER publications. Basic descriptive results (which, depending on the situation, might include group means, proportions, and/or effect size measures, such as differences between means, relative risks, odds ratios, regression parameter estimates, and correlation coefficients) should be reported to allow readers to evaluate educational or clinical significance, which is not automatically conferred by statistical significance. Notably, it remains common for P values to be reported without these basic results, especially in abstracts. In addition, confidence intervals offer far more information to readers than P values alone, and the fact that only a minority of HPER publications report confidence intervals represents a methodological limitation the field must address. Confidence intervals around effect size measures are particularly important (i.e., more important than confidence intervals around the group means or proportions). As the abstract may be the only part of a publication many readers will review, it is important that these core quantitative results be included there.
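As a rough illustration of reporting an effect size with its confidence interval, the sketch below computes a difference between two group means and a 95% confidence interval around it. It is not taken from our article: the function name `mean_difference_ci` and the exam scores are hypothetical, and the interval uses a large-sample normal approximation with unpooled variances. For small samples such as these, an interval based on the t distribution would be somewhat wider and is generally preferable.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Return (difference, lower, upper) for mean(group_a) - mean(group_b).

    Assumption: a large-sample normal approximation with unpooled
    variances. This is an illustrative sketch, not a full analysis.
    """
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference between two independent means.
    se = sqrt(stdev(group_a) ** 2 / len(group_a)
              + stdev(group_b) ** 2 / len(group_b))
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical exam scores, invented for illustration only.
intervention = [78, 85, 82, 90, 74, 88, 81, 79, 86, 83]
control = [75, 80, 72, 84, 70, 79, 77, 73, 81, 76]

diff, lower, upper = mean_difference_ci(intervention, control)
print(f"Mean difference: {diff:.1f} points "
      f"(95% CI, {lower:.1f} to {upper:.1f})")
```

Reporting the result in this form (estimate plus interval) lets readers judge both the size of the difference and the precision with which it was estimated, which a P value alone cannot convey.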

Second, selective reporting of (typically) positive results provides a biased view of scientific investigations and their results. We recommend all analyses planned according to the research protocol be reported, ideally in the main text but at least in supplementary files. Of course, this requires that a research protocol exists to guide the study and analyses in the first place. To further reduce this bias, authors, reviewers, and editors should avoid automatically dismissing “negative” studies, and base their appraisal on a study’s scientific relevance and methodological rigor, including reporting of confidence intervals.

Third, sample size and power considerations remain quite uncommon in HPER publications. These should be integral elements of research protocols and reports. The fact that most reported P values are statistically significant, despite low power in the majority of published HPER studies, further highlights the degree of publication bias likely affecting the field.
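To make the idea of an a priori sample size calculation concrete, here is a minimal sketch for comparing two group means. It is our illustration rather than a method from the article: the function name `sample_size_per_group` is hypothetical, and the calculation assumes a two-sided test, equal group sizes, and the standard normal-approximation formula n = 2(z_{1-α/2} + z_{1-β})² / d², where d is the standardized difference between means. Methods based on the t distribution give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-group comparison.

    Assumptions: two-sided test, equal group sizes, normal approximation.
    effect_size is the standardized mean difference expected in advance.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Planned before data collection, for several plausible effect sizes.
for d in (0.2, 0.5, 0.8):
    print(f"Standardized difference {d}: "
          f"{sample_size_per_group(d)} participants per group")
```

Under these assumptions, detecting a standardized difference of 0.5 with 80% power requires roughly 63 participants per group, and smaller expected differences require far more. The calculation belongs in the research protocol, before any data are collected.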

Although there is progress to be made in HPER reporting, it is notable that the reporting of these basic statistical results has generally improved over the last several decades. With continued attention to these issues, we are optimistic that HPER publications can match or even exceed the reporting quality of other biomedical research. We summarize our key recommendations below.

DO:

Report basic descriptive results summarizing study data (e.g., group means, proportions, measures of variability).
Report effect sizes (e.g., differences between means, relative risks, odds ratios, regression parameter estimates, correlation coefficients).
Report confidence intervals, especially around effect sizes.
Thoughtfully plan hypothesis tests, and account for all planned analyses in the final report.
Distinguish statistical from educational significance.

DON’T:

Rely on P values alone to report study results.
“Cherry-pick” statistically significant P values for reporting.
Dismiss statistically nonsignificant P values from methodologically sound and adequately powered studies.
Conduct and report power analyses after data have been collected.
