The BMJ article (https://www.bmj.com/content/338/bmj.b2393) raises the usual complexities of pinning down the definitions of the types of missingness.
The article offers this definition: "Missing completely at random-There are no systematic differences between the missing values and the observed values. For example, blood pressure measurements may be missing because of breakdown of an automatic sphygmomanometer."
I believe this is a good approximation but not 100% correct. It is possible to imagine scenarios in which there is no systematic difference between the observed and missing values, yet the fact of missingness is itself predictable, which moves the data from missing completely at random (MCAR) to missing at random (MAR).
For example, imagine that patients who are seen on Monday, Wednesday, or Friday always have their blood pressure taken, whereas Tuesday/Thursday patients only have it taken 50% of the time. There are no other differences between the two groups and the mean blood pressures do not differ by weekday.
There would be no systematic differences between the missing values and the observed values, which the article states as the definition of MCAR. *But* the fact of missingness is highly predictable from the day of the week. In my usual interpretation, that makes these data MAR, not MCAR.
The Little test for MCAR asks neither about systematic differences nor about whether the missing values could be predicted, but only about whether missingness itself is predictable. In my example, a chi-square test would show that weekday (coded 1 for MWF and 2 for Tu/Th) strongly predicts missingness, so MCAR fails.
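To make the point concrete, here is a small Python simulation of the weekday scenario. The cohort size, BP distribution, and 50% Tu/Th measurement rate are invented for illustration; the chi-square statistic is computed by hand so the sketch needs only the standard library.

```python
import random
import statistics

# Illustrative simulation: blood pressure (bp) has the SAME distribution
# every day; MWF patients are always measured, Tu/Th patients only half
# the time. (All numbers here are made up for the example.)
random.seed(42)
patients = []
for _ in range(1000):
    day = random.choice(["Mon", "Tue", "Wed", "Thu", "Fri"])
    bp = random.gauss(120, 15)             # identical distribution each day
    if day in ("Mon", "Wed", "Fri"):
        missing = False                    # always measured
    else:
        missing = random.random() < 0.5    # measured only 50% of the time
    patients.append((day, bp, missing))

# No systematic difference between observed and missing values...
observed = [bp for _, bp, m in patients if not m]
unseen = [bp for _, bp, m in patients if m]
print(f"mean bp, observed: {statistics.mean(observed):.1f}")
print(f"mean bp, missing:  {statistics.mean(unseen):.1f}")

# ...yet missingness itself is highly predictable from weekday.
# 2x2 table: rows = weekday group (MWF, Tu/Th), cols = (observed, missing)
groups = [("Mon", "Wed", "Fri"), ("Tue", "Thu")]
table = [
    [sum(1 for d, _, m in patients if d in g and m == miss)
     for miss in (False, True)]
    for g in groups
]

# Pearson chi-square statistic for the 2x2 table (df = 1)
row = [sum(r) for r in table]
col = [table[0][j] + table[1][j] for j in range(2)]
n = sum(row)
chi2 = sum(
    (table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
    for i in range(2)
    for j in range(2)
)
print(f"chi-square = {chi2:.1f} (critical value 3.84 at df = 1, alpha = 0.05)")
```

The two means come out close (the small gap is pure sampling noise), while the chi-square statistic is far above the 3.84 critical value, so the data fail the MCAR test even though observed and missing values do not differ systematically.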
For MAR, a predictable systematic difference between the observed and missing data (the article's definition) appears to be a sufficient but not a necessary condition. I'm having a hard time thinking of a counterexample: a case in which the observed and missing values differ in a way that is predictable from other variables, but in which missingness itself is not also predictable. Normally the two go hand in hand. Hence MAR.
Personally, I think most missing data are at least partly missing not at random (MNAR), and there isn't much we can do about that unless we can collect more data on the missing subjects to test for it. So we do analyses that assume the data aren't MNAR and wring our hands about the fact that they probably are.
Ed
------------------------------
Edward Gracely
Associate Professor
Drexel University
------------------------------
Original Message:
Sent: 09-13-2023 06:52
From: Sarah Haile
Subject: Missing Data
My usual references for this are Sterne et al. (2009), "Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls," and White et al. (2011), http://dx.doi.org/10.1002/sim.4067
If you're working in R, this is a good reference for the mice package: https://stefvanbuuren.name/fimd/
Hope that helps!
Sarah
------------------------------
Sarah Haile
Zurich, Switzerland
Original Message:
Sent: 09-12-2023 16:36
From: David Zucker
Subject: Missing Data
Hi,
I'm looking for resources that discuss the practical aspects of handling missing data. I'm especially interested in what are considered best practices for regression problems with scattered missing values in the explanatory variables. I'd be grateful for any leads.
With thanks and regards,
David
------------------------------
David Zucker
Department of Statistics and Data Science
Hebrew University of Jerusalem
------------------------------