ASA Connect

  • 1.  Missing Data

    Posted 09-12-2023 16:37

    Hi,

    I'm looking for resources that discuss the practical aspects of handling missing data. I'm especially interested in hearing what are considered best practices for regression problems with scattered missing data in the explanatory variables. I'll be happy for any leads.

    With thanks and regards,
    David



    ------------------------------
    David Zucker
    Department of Statistics and Data Science
    Hebrew University of Jerusalem
    ------------------------------


  • 2.  RE: Missing Data

    Posted 09-13-2023 06:52

    My usual references for this are Sterne et al. 2009, "Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls", and White et al. 2011 (http://dx.doi.org/10.1002/sim.4067).

    If you're working in R, this is a good reference for the mice package: https://stefvanbuuren.name/fimd/

    Flexible Imputation of Missing Data, Second Edition

    Hope that helps!

    Sarah



    ------------------------------
    Sarah Haile
    Zurich, Switzerland
    ------------------------------



  • 3.  RE: Missing Data

    Posted 09-14-2023 10:03

    I recently developed a method similar to MICE, but chaining random forests and doing so for batched subsets of data for gains in computational efficiency. It's a pretty simple extension of missForest (https://academic.oup.com/bioinformatics/article/28/1/112/219101). 

    Here are a few resources: 

    I hope this helps! Good luck. 



    ------------------------------
    Philip Waggoner
    YouGov // Columbia
    https://pdwaggoner.github.io
    ------------------------------



  • 4.  RE: Missing Data

    Posted 09-14-2023 14:03

    The BMJ article (https://www.bmj.com/content/338/bmj.b2393) raises the usual complexities of understanding the definitions of the types of missing.

    The article offers this definition: "Missing completely at random-There are no systematic differences between the missing values and the observed values. For example, blood pressure measurements may be missing because of breakdown of an automatic sphygmomanometer."

    I believe that this is a good approximation but not 100% correct. It is possible to imagine scenarios in which there is no systematic difference between observed and missing data but the fact of missingness is itself predictable, which moves the data from MCAR to MAR.

    For example, imagine that patients who are seen on Monday, Wednesday, or Friday always have their blood pressure taken, whereas Tuesday/Thursday patients only have it taken 50% of the time.  There are no other differences between the two groups and the mean blood pressures do not differ by weekday.

    There would be no systematic differences between the missing values and the observed values, which is stated in the article as the definition of MCAR. *But* the fact of missingness is highly predictable from day of the week.  In my usual interpretation, that makes this data MAR, not MCAR. 

    Little's test for MCAR does not ask about systematic differences nor whether missing values could be predicted, but only whether missingness itself is predictable.  In my example, a chi-square test using weekday (coded as 1 for MWF and 2 for Tu/Th) would show that weekday strongly predicts missingness, so MCAR is rejected.
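
    The weekday example can be checked with a small simulation. This is only a sketch under stated assumptions: blood pressure is drawn from a normal distribution (mean 120, SD 15) regardless of weekday, each weekday is equally likely, and the Pearson chi-square statistic is computed by hand on the 2x2 table of day group by missingness. The function names and parameter values are illustrative, not taken from the thread or the BMJ article.

```python
import random


def simulate(n=20000, seed=0):
    """Weekday example: BP does not depend on weekday,
    but Tu/Th readings are missing about half the time."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        day = rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri"])
        bp = rng.gauss(120.0, 15.0)  # assumed distribution, same every day
        tu_th = day in ("Tue", "Thu")
        missing = tu_th and rng.random() < 0.5
        rows.append((tu_th, bp, missing))
    return rows


def chi_square_2x2(rows):
    """Pearson chi-square statistic for day group (MWF vs Tu/Th) by missingness."""
    counts = [[0, 0], [0, 0]]
    for tu_th, _, missing in rows:
        counts[int(tu_th)][int(missing)] += 1
    n = sum(sum(row) for row in counts)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            row_total = sum(counts[i])
            col_total = counts[0][j] + counts[1][j]
            expected = row_total * col_total / n
            stat += (counts[i][j] - expected) ** 2 / expected
    return stat


def mean(values):
    return sum(values) / len(values)


rows = simulate()
chi2 = chi_square_2x2(rows)
mean_observed = mean([bp for _, bp, missing in rows if not missing])
mean_missing = mean([bp for _, bp, missing in rows if missing])

print(f"chi-square (1 df), weekday group vs missingness: {chi2:.1f}")
print(f"mean BP where the reading was recorded: {mean_observed:.2f}")
print(f"mean BP where the reading went missing: {mean_missing:.2f}")
```

    In a run like this, the chi-square statistic should sit far above the 3.84 critical value for 1 degree of freedom, while the two means should be close to each other. That is the pattern described above: no systematic difference between missing and observed values, yet missingness is predictable from weekday.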

    For MAR, the presence of a predictable systematic difference between observed and missing data (the article's definition) appears to be a sufficient but not necessary condition. I'm having a hard time thinking of a counterexample in which there is such a difference between observed and missing that is predictable from other variables but in which missingness is not also predictable. Normally they would go hand in hand. Hence MAR.

    Personally I think most missing data is at least partly MNAR, and there isn't anything we can do about that unless we are able to collect more data on the missing subjects to test for it. So we do analyses that assume the data isn't MNAR and wring our hands about the fact that it probably is.

    Ed



    ------------------------------
    Edward Gracely
    Associate Professor
    Drexel University
    ------------------------------



  • 5.  RE: Missing Data

    Posted 09-13-2023 10:15

    The Panel on Handling Missing Data in Clinical Trials by the U.S. National Research Council provides a very good overview with recommendations: The Prevention and Treatment of Missing Data in Clinical Trials (http://www.nap.edu/catalog/12955.html)

    There are aspects specific to clinical trials, but a lot of information is much more general.

    Fridtjof



    ------------------------------
    Fridtjof Thomas, Ph.D.
    Professor, Division of Biostatistics
    Department of Preventive Medicine, College of Medicine
    University of Tennessee Health Science Center
    fthomas4@uthsc.edu
    ------------------------------



  • 6.  RE: Missing Data

    Posted 09-13-2023 11:40

    Hi, 

    In addition to the excellent recommendations you have received so far, I highly recommend Applied Missing Data Analysis by Craig Enders. A second edition was published last year, so that edition is pretty up to date with advances in this area of statistics. It is comprehensive, rigorous, and readable. Its companion website is Applied Missing Data.


    Best wishes,

    Tor Neilands



    ------------------------------
    Torsten Neilands
    Professor of Medicine
    UCSF Center for AIDS Prevention Studies
    ------------------------------



  • 7.  RE: Missing Data

    Posted 09-13-2023 21:50

    I am no expert on this subject, but I have been preparing some course material on it, which has led me to explore resources. First, I second the suggestions made so far. I would add the book by Little and Rubin, which has helped me understand the theory.



    ------------------------------
    Stephen Elston
    Principal Consultant, Quantia Analytics, LLC
    Instructor, Harvard Extension School
    ------------------------------