Statistical light bulb
From time to time, I post my unstructured thoughts about statistics and epidemiology on this page.
Finding evidence in the context of uncertainty is challenging.
Population and sample (July 4, 2022)
Collinearity and confounding bias (February 11, 2022)
Role of variables in epidemiological studies (February 7, 2022)
Number converter between risks, incidences, percentages and decimals (February 4, 2022)
Surprisal S-value: let's toss a coin to understand the value of a p-value (February 4, 2022)
Avoid categorization of a continuous variable (February 15, 2018)
Confusion caused by the p-value (November 17, 2017)
Principles of statistical analysis (November 15, 2017)
Avoid categorization of a continuous variable
February 15, 2018
Selected literature
- Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Heidelberg: Springer; 2015.
- Greenland S. Dose-response and trend analysis in epidemiology: alternatives to categorical analysis. Epidemiology. 1995;6(4):356-65.
- Bennette C, Vickers A. Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents. BMC Med Res Methodol. 2012;12:21.
- Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006;25(1):127-41.
- Schmidt CO, Ittermann T, Schulz A, Grabe HJ, Baumeister SE. Linear, nonlinear or categorical: how to treat complex associations? Splines and nonparametric approaches. Int J Public Health. 2013;58(1):161-5.
- Schmidt CO, Ittermann T, Schulz A, Grabe HJ, Baumeister SE. Linear, nonlinear or categorical: how to treat complex associations in regression analyses? Polynomial transformations and fractional polynomials. Int J Public Health. 2013;58(1):157-60.
Conclusions
Avoid categorizing a continuous variable, for the following reasons:
- Categorization introduces steps in the effect for which there is usually no "biological" reason.
- The choice of the number of categories and their boundaries is subjective and often arbitrary.
- Categorization discards statistical information.
- It reduces statistical power.
- It increases the number of parameters to estimate (k categories require k - 1 coefficients instead of a single slope).
- A test for trend across categories is not equivalent to a test of non-linear effects on the continuous scale (e.g. in a regression model).
- Categorization does not reduce measurement error.
- In particular, dichotomization cannot capture non-linear effects.
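A small simulation can make the loss of information and the problem with non-linear effects concrete. The sketch below is only an illustration under assumed conditions and is not taken from the literature above: the exposure x is standard normal, one outcome depends linearly on x (slope 0.5), and a second outcome depends on x² (a U-shape). The helper names (pearson, median_split, simulate) are hypothetical. For a normally distributed x, a median split is expected to shrink the correlation by a factor of about √(2/π) ≈ 0.80; when the association is weak, this corresponds to an efficiency of about 2/π ≈ 0.64, roughly like discarding a third of the sample. For the U-shaped outcome, the mean outcome above and below the median of x should be nearly identical, while a quadratic term on the continuous scale picks up the association.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_split(x):
    """Dichotomize x at its sample median: 1.0 above the median, 0.0 otherwise."""
    cut = statistics.median(x)
    return [1.0 if v > cut else 0.0 for v in x]


def simulate(n=20_000, seed=1):
    """Simulate one exposure and two outcomes under an ASSUMED model (illustration only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]      # exposure: standard normal
    y = [0.5 * v + rng.gauss(0.0, 1.0) for v in x]   # linear effect, slope 0.5
    z = [v * v + rng.gauss(0.0, 1.0) for v in x]     # U-shaped effect via x**2
    x_bin = median_split(x)

    # Mean of the U-shaped outcome above vs. at-or-below the median of x.
    z_high = [b for a, b in zip(x_bin, z) if a == 1.0]
    z_low = [b for a, b in zip(x_bin, z) if a == 0.0]

    return {
        "r_continuous": pearson(x, y),        # x kept on its continuous scale
        "r_dichotomized": pearson(x_bin, y),  # x after a median split
        "z_diff_dichotomized": statistics.fmean(z_high) - statistics.fmean(z_low),
        "r_quadratic": pearson([v * v for v in x], z),  # quadratic term, continuous x
    }


if __name__ == "__main__":
    res = simulate()
    ratio = res["r_dichotomized"] / res["r_continuous"]
    print(f"r with continuous x:    {res['r_continuous']:.3f}")
    print(f"r with median-split x:  {res['r_dichotomized']:.3f}")
    print(f"attenuation ratio:      {ratio:.3f} "
          f"(theory for normal x: {math.sqrt(2 / math.pi):.3f})")
    print(f"U-shape, mean difference after split: {res['z_diff_dichotomized']:.3f}")
    print(f"U-shape, r with x**2 (continuous):     {res['r_quadratic']:.3f}")
```

The quadratic term is used here only because it is simple; in a real analysis, splines or fractional polynomials (see the Schmidt et al. references above) are more flexible ways to model a non-linear effect on the continuous scale.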