ASA Connect


Missing Data

1. Missing Data

David Zucker
Posted 09-12-2023 16:37
Hi,

I'm looking for resources that discuss the practical aspects of handling missing data. I'm especially interested in hearing what are considered best practices for regression problems with scattered missing data in the explanatory variables. I'll be happy for any leads.

With thanks and regards,
David

------------------------------
David Zucker
Department of Statistics and Data Science
Hebrew University of Jerusalem
------------------------------
2. RE: Missing Data

Sarah Haile
Posted 09-13-2023 06:52
My usual references for this are Sterne et al 2009 Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls

and White et al 2011 http://dx.doi.org/10.1002/sim.4067

If you're working in R, this is a good reference for the mice package: Flexible Imputation of Missing Data, Second Edition (https://stefvanbuuren.name/fimd/)
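For readers new to multiple imputation, the pooling step that follows the imputations can be sketched in a few lines. This is only an illustration of Rubin's rules for a single regression coefficient, written in Python rather than R; the function name `pool_rubin` and the numbers are made up for the example and do not come from the references above.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error of that coefficient on each dataset
    (pool_rubin is a hypothetical helper name, not part of any package.)
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total


# Made-up slope estimates and squared standard errors from m = 3 imputations
estimate, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print("pooled estimate:", estimate)
print("pooled standard error:", total_var ** 0.5)
```

The between-imputation term is what distinguishes this from simply averaging the standard errors: it carries the extra uncertainty due to the missing values into the final interval.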

Hope that helps!

Sarah

------------------------------
Sarah Haile
Zurich, Switzerland
------------------------------

3. RE: Missing Data

Philip Waggoner
Posted 09-14-2023 10:03
I recently developed a method similar to MICE that chains random forests and runs them over batched subsets of the data for gains in computational efficiency. It's a pretty simple extension of missForest (https://academic.oup.com/bioinformatics/article/28/1/112/219101).

Here are a few resources:

the paper: https://link.springer.com/article/10.1007/s00180-023-01325-9

the method / software: https://cran.r-project.org/web/packages/hdImpute/index.html

some vignettes / source code: https://github.com/pdwaggoner/hdImpute

I hope this helps! Good luck.

------------------------------
Philip Waggoner
YouGov // Columbia
https://pdwaggoner.github.io
------------------------------

4. RE: Missing Data

Edward Gracely
Posted 09-14-2023 14:03
The BMJ article (https://www.bmj.com/content/338/bmj.b2393) raises the usual complexities of understanding the definitions of the types of missingness.

The article offers this definition: "Missing completely at random: There are no systematic differences between the missing values and the observed values. For example, blood pressure measurements may be missing because of breakdown of an automatic sphygmomanometer."

I believe that this is a good approximation but not 100% correct. It is possible to imagine scenarios in which there is no systematic difference between observed and missing data but the fact of missingness is itself predictable, which moves the data from MCAR to MAR.

For example, imagine that patients who are seen on Monday, Wednesday, or Friday always have their blood pressure taken, whereas Tuesday/Thursday patients only have it taken 50% of the time. There are no other differences between the two groups and the mean blood pressures do not differ by weekday.

There would be no systematic differences between the missing values and the observed values, which is stated in the article as the definition of MCAR. *But* the fact of missingness is highly predictable from day of the week. In my usual interpretation, that makes this data MAR, not MCAR.

The Little test for MCAR asks neither about systematic differences nor about whether missing values could be predicted, but only whether missingness itself is predictable. In my example, a chi-square test using weekday (coded as 1 for MWF and 2 for Tu/Th) would strongly predict missingness, so MCAR fails.
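The weekday scenario can be checked with a small simulation. What follows is a rough sketch under stated assumptions, not Little's test itself: it draws blood pressures from the same normal distribution on every day (mean 120, SD 15, both invented for illustration), assumes patients are spread evenly over the five weekdays, drops half of the Tuesday/Thursday readings, and then computes a plain Pearson chi-square statistic for the 2x2 table of day group by missingness. The function names are hypothetical.

```python
import random
from statistics import mean


def simulate_clinic(n=10000, seed=0):
    """Return (is_tu_th, blood_pressure, is_missing) for n simulated patients."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tu_th = rng.random() < 0.4               # 2 of 5 weekdays (assumed even spread)
        bp = rng.gauss(120.0, 15.0)              # same distribution on every day
        missing = tu_th and rng.random() < 0.5   # only Tu/Th readings can be missing
        rows.append((tu_th, bp, missing))
    return rows


def chi_square_2x2(rows):
    """Pearson chi-square statistic for day group (MWF vs Tu/Th) by missingness."""
    counts = [[0, 0], [0, 0]]                    # counts[day_group][missing]
    for tu_th, _, missing in rows:
        counts[int(tu_th)][int(missing)] += 1
    n = len(rows)
    row_tot = [sum(r) for r in counts]
    col_tot = [counts[0][j] + counts[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            stat += (counts[i][j] - expected) ** 2 / expected
    return stat


rows = simulate_clinic()
stat = chi_square_2x2(rows)
observed_mean = mean(bp for _, bp, m in rows if not m)
missing_mean = mean(bp for _, bp, m in rows if m)  # knowable only in a simulation

print("chi-square (1 df):", round(stat, 1), "vs. 5% critical value 3.841")
print("mean BP, observed:", round(observed_mean, 2))
print("mean BP, missing: ", round(missing_mean, 2))
```

In a run like this the chi-square statistic should be far above the 1-df critical value while the two means stay close, which is the combination described above: missingness is predictable from weekday even though the missing and observed values do not differ systematically.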

For MAR, the presence of a predictable systematic difference between observed and missing data (the article's definition) appears to be a sufficient but not necessary condition. I'm having a hard time thinking of a counterexample in which there is such a difference between observed and missing that is predictable from other variables but in which missingness is not also predictable. Normally they would go hand in hand. Hence MAR.

Personally I think most missing data is at least partly MNAR, and there isn't anything we can do about that unless we are able to collect more data on the missing subjects to test for it. So we do analyses that assume the data isn't MNAR and wring our hands about the fact that it probably is.

Ed

------------------------------
Edward Gracely
Associate Professor
Drexel University
------------------------------

5. RE: Missing Data

Fridtjof Thomas
Posted 09-13-2023 10:15
The Panel on Handling Missing Data in Clinical Trials by the U.S. National Research Council provides a very good overview with recommendations: The Prevention and Treatment of Missing Data in Clinical Trials (http://www.nap.edu/catalog/12955.html)

There are aspects specific to clinical trials, but a lot of information is much more general.

Fridtjof

------------------------------
Fridtjof Thomas, Ph.D.
Professor, Division of Biostatistics
Department of Preventive Medicine, College of Medicine
University of Tennessee Health Science Center
fthomas4@uthsc.edu
------------------------------


6. RE: Missing Data


Torsten Neilands

Posted 09-13-2023 11:40

Hi,

In addition to the excellent recommendations you have received so far, I highly recommend Applied Missing Data Analysis by Craig Enders. A second edition was published last year, so it is pretty up to date with advances in this area of statistics. It is comprehensive, rigorous, and readable. The book also has a companion website, Applied Missing Data.

Best wishes,

Tor Neilands

------------------------------
Torsten Neilands
Professor of Medicine
UCSF Center for AIDS Prevention Studies
------------------------------


7. RE: Missing Data

Stephen Elston
Posted 09-13-2023 21:50
I am no expert on this subject. I have been preparing some course material on it, which has led me to explore resources. First, I second the suggestions made so far. I would add the book by Little and Rubin, which has helped me understand the theory.

------------------------------
Stephen Elston
Principal Consultant, Quantia Analytics, LLC
Instructor, Harvard Extension School
------------------------------
