Researchers today use a variety of techniques to address missingness in longitudinal data. Some techniques are similar, while others differ markedly. Researchers generally use simulation studies and real data to assess how well these techniques perform. Examining the techniques clarifies where each is appropriate and what its limitations are. The following literature review explores these techniques.
Jakobsen et al. (2017) investigated best practices for dealing with missing data when analyzing the results of randomized clinical trials. The authors suggested that researchers can validly ignore missing data when the proportion missing is below 5 percent, because bias mostly arises in analyses with more than 10 percent missingness. In this case, mixed models can be used. Multiple imputation is appropriate only when the proportion of missing data is not very large; above 40 percent, the results should be regarded as hypothesis-generating only. The number of imputations must be 5 or higher.
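The thresholds summarized above can be written as a simple decision rule. The sketch below is only an illustration of the cut-offs as reported in this review (5 percent and 40 percent); the function name and the returned labels are hypothetical and are not taken from Jakobsen et al. (2017).

```python
def missing_data_strategy(n_missing: int, n_total: int) -> str:
    """Map a proportion of missing values to the strategy summarized in this review.

    The 5% and 40% cut-offs are the ones reported above; the labels are illustrative.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_missing <= n_total:
        raise ValueError("n_missing must lie between 0 and n_total")

    proportion = n_missing / n_total
    if proportion < 0.05:
        # Small amounts of missing data can be ignored.
        return "ignore missing data"
    if proportion > 0.40:
        # Too much is missing: treat any result as hypothesis-generating only.
        return "hypothesis-generating only"
    # In between, multiple imputation with at least 5 imputations.
    return "multiple imputation (m >= 5)"


print(missing_data_strategy(12, 100))
```

A rule like this is only a starting point; the mechanism behind the missingness matters as much as the proportion.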
Vidotto et al. (2019) focused on the imputation of longitudinal data and proposed a Bayesian mixture latent Markov (BMLM) model. The study used two simulations and a single real-data application to demonstrate the performance of the model. The model assumed a first-order Markov chain for the latent state distribution to capture lagged associations in time-varying variables. The researchers found the model to be effective. In general, the mixture model offered a flexible imputation technique that can capture complex associations in data with a simple model specification. The technique can also be tailored to the sampling method used during data collection, which enables it to represent all the variability required to carry out the imputations.
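To make the idea of a first-order Markov chain for latent states concrete, the sketch below draws one latent-state sequence in which each state depends only on the state at the previous time point. It is not the BMLM model itself: the two-state setup, the probabilities, and the function name are assumptions chosen for illustration.

```python
import random


def simulate_latent_states(initial, transition, n_times, rng):
    """Draw one latent-state path from a first-order Markov chain.

    initial:    probabilities of each state at the first time point
    transition: transition[i][j] = P(state j at t | state i at t - 1)
    """
    states = list(range(len(initial)))
    current = rng.choices(states, weights=initial)[0]
    path = [current]
    for _ in range(n_times - 1):
        # The next state depends only on the current one (first-order property).
        current = rng.choices(states, weights=transition[current])[0]
        path.append(current)
    return path


rng = random.Random(0)            # fixed seed so the run is reproducible
initial = [0.6, 0.4]              # hypothetical starting probabilities
transition = [[0.9, 0.1],         # hypothetical "sticky" transition matrix
              [0.2, 0.8]]
path = simulate_latent_states(initial, transition, n_times=6, rng=rng)
print(path)
```

Large diagonal entries in the transition matrix make states persist over time, which is how a chain of this kind can carry lagged associations between measurement occasions.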
De Silva et al. (2020) investigated three multiple imputation techniques for handling missing longitudinal data in analyses that include sampling weights: multivariate normal imputation (MVNI), fully conditional specification (FCS), and twofold fully conditional specification (twofold FCS). The study comprised a longitudinal case study and a simulation study. The researchers found that twofold FCS and FCS had significant convergence issues when imputing missing longitudinal data for analyses with sampling weights. MVNI performed better, except when imputation was carried out separately within quintile groups of the sampling weights. MVNI was recommended as an effective approach.
Kombo et al. (2016) compared two imputation techniques, multivariate normal imputation (MVNI) and fully conditional specification (FCS), in a simulation study of longitudinal ordinal data with monotone missing data patterns. The study used a robust version of generalized estimating equations (GEE) and considered only a response or a covariate that was missing at random (MAR). The two techniques were found to be comparable, and either could be used for multiple imputation. While the study considered missingness in the outcome variable, the authors stated that the two approaches can be extended to cases with missing data in both covariates and outcomes.
Yamaguchi et al. (2019) conducted simulation analyses to compare the performance of separate imputation and simultaneous imputation in the presence of heteroscedasticity between two treatment groups in a randomized controlled trial (RCT). The simulations considered binary and continuous outcome variables and varied how the incomplete longitudinal continuous data and the missing values were generated, the sampling design for the complete data, and the sample size. Separate imputation was effective both with and without heteroscedasticity and was recommended for use in RCTs with longitudinal data. Simultaneous imputation, however, led to serious biases with very large or very small standard errors.
Shin et al. (2017) investigated how multiple imputation (MI) and maximum likelihood (ML) procedures for missing data performed in longitudinal settings with latent growth models (LGM). The researchers designed a simulation study with three intermittent missingness processes (MNAR, MAR, MCAR), different levels of multivariate non-normality, and small samples. The study demonstrated that multiple imputation performed similarly to ML in small samples with non-normality. The study also suggested that maximum likelihood approaches, such as those used in random-effects models and the EM algorithm, were less subjective than MI in dealing with missing data, particularly when the missingness was MAR or MCAR.
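The three missingness processes differ in what the probability of a missing value depends on. The sketch below illustrates the distinction on simulated pairs (x, y), where x is always observed and y may be missing. It is not the simulation design used by Shin et al. (2017); the logistic form, the constants, and the function names are assumptions made for illustration.

```python
import math
import random


def logistic(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def apply_missingness(rows, mechanism, rng, base_rate=0.3):
    """Return a copy of rows where y is replaced by None according to the mechanism.

    MCAR: missingness is unrelated to the data (constant probability).
    MAR:  missingness depends only on the fully observed x.
    MNAR: missingness depends on the value of y itself.
    """
    result = []
    for x, y in rows:
        if mechanism == "MCAR":
            p_missing = base_rate
        elif mechanism == "MAR":
            p_missing = logistic(x - 1.0)
        elif mechanism == "MNAR":
            p_missing = logistic(y - 1.0)
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        result.append((x, None if rng.random() < p_missing else y))
    return result


rng = random.Random(42)
# Hypothetical data: y is x plus standard normal noise.
rows = []
for _ in range(1000):
    x = rng.gauss(0.0, 1.0)
    rows.append((x, x + rng.gauss(0.0, 1.0)))

for mechanism in ("MCAR", "MAR", "MNAR"):
    incomplete = apply_missingness(rows, mechanism, rng)
    observed = [y for _, y in incomplete if y is not None]
    print(mechanism, len(observed), sum(observed) / len(observed))
```

Comparing the mean of the observed y values with the mean of the complete data shows why the mechanism matters: under MCAR the observed cases remain representative, whereas under MAR and MNAR as defined here, larger values are more likely to be missing.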
Tan et al. (2018) sought to offer practical recommendations on handling missing repeated-measurements data in observational research. The authors used a simulation study with data from a real care setting to demonstrate how different models worked. They suggested that studies should comply with the restriction rule for the imputation technique so that overparameterization does not crash the statistical software. When applying the rule under a given imputation model, researchers should preserve the correlation structure of the data. Additionally, multiple imputation with FCS can be performed to obtain imputed data sets, with five iterations of the sampler per data set.
Erler et al. (2016) described different approaches for including a longitudinal outcome in a sequential full Bayesian (SFB) model and compared them with multiple imputation with chained equations (MICE). The researchers assessed both techniques using simulated and real data with missing values. They used observed values of other variables to predict the missing values of any given variable. The study found that imputation and analysis can be performed simultaneously by specifying the analysis model jointly with the models for the incomplete covariates. This approach also ensures compatibility between sub-models. The Bayesian model allows researchers to specify the joint distribution of the entire data explicitly, so that all available information about the outcome is used when imputing the incomplete covariates.
Given the importance of rigorous sensitivity analysis for missing data, Iddrisu and Gumedze (2019) considered three analysis models that make different assumptions about missing data. The researchers then used these models to perform a sensitivity analysis for the missing CD4 count data in the IMPI trial. They carried out a simulation study to assess how the shared-parameter models performed. Specifically, the researchers examined the performance of the pattern-mixture model with multiple imputation (PM-MI) and found it effective for dealing with missing data in the IMPI trial and in trials with similar settings. The results also showed that the PM-MI model can generate subjective estimates of some parameters as the rate of missingness increases.
Lee et al. (2018) imputed missing values using a machine learning technique. The authors argued that, in a longitudinal study, biomarker data may not be collected for specific patients in specific periods and that biomarker data can be censored because of limits of detection (LOD). To address this, the authors designed a weighted censored quantile regression (CQR) as their MI approach to account for both missing data and censoring. They evaluated the model in a simulation study and applied it to a real study. They found that the model accounted for the uncertainty in estimating the unknown missing data and supported more valid statistical conclusions than other MI techniques.
Huque et al. (2018) compared 12 multiple imputation models for imputing incomplete longitudinal data, focusing on an LMM analysis model with a subject-specific random intercept only. The study found that models based on both LMM (FCS-LMM-LN, FCS-LMM, JM-MLMM-LN, and JM-MLMM) and MI (FCS standard and JM-MVN) consistently estimated the regression and variance component parameters. The comparisons were, however, empirical, with no theoretical basis to justify the results. The study outcomes were also not generalizable to analysis models with both random intercepts and random slopes.
De Silva et al. (2017) examined the imputation of longitudinal data with missing values. The study evaluated how different MI techniques performed when an incomplete exposure had a non-linear relationship with time. The techniques examined were two-fold FCS, FCS, and MVNI, and the authors compared them in a simulation study. The results revealed that two-fold FCS was less effective than MVNI and FCS based on adjacent time points. The authors suggested that using algorithms that consider adjacent time points overcomes the challenge of reproducing the variability of the activity of interest, even when the covariates involved include the variables that generate the missing data.
References
De Silva, A. P., De Livera, A. M., Lee, K. J., Moreno-Betancur, M., & Simpson, J. A. (2020). Multiple Imputation Methods for Handling Missing Values in Longitudinal Studies with Sampling Weights: Comparison of Methods Implemented in Stata. Biometrical Journal. https://doi.org/10.1002/bimj.201900360
De Silva, A. P., Moreno-Betancur, M., De Livera, A. M., Lee, K. J., & Simpson, J. A. (2017). A Comparison of Multiple Imputation Methods for Handling Missing Values in Longitudinal Data in the Presence of a Time-varying Covariate with a Non-linear Association with Time: a Simulation Study. BMC Medical Research Methodology, 17(1). https://doi.org/10.1186/s12874-017-0372-y
Erler, N. S., Rizopoulos, D., Rosmalen, J. van, Jaddoe, V. W. V., Franco, O. H., & Lesaffre, E. M. E. H. (2016). Dealing with Missing Covariates in Epidemiologic Studies: a Comparison between Multiple Imputation and a Full Bayesian Approach. Statistics in Medicine, 35(17), 2955–2974. https://doi.org/10.1002/sim.6944
Huque, M. H., Carlin, J. B., Simpson, J. A., & Lee, K. J. (2018). A Comparison of Multiple Imputation Methods for Missing Data in Longitudinal Studies. BMC Medical Research Methodology, 18(1). https://doi.org/10.1186/s12874-018-0615-6
Iddrisu, A.-K., & Gumedze, F. (2019). An Application of a Pattern-Mixture Model with Multiple Imputation for the Analysis of Longitudinal Trials with Protocol Deviations. BMC Medical Research Methodology, 19(1). https://doi.org/10.1186/s12874-018-0639-y
Jakobsen, J. C., Gluud, C., Wetterslev, J., & Winkel, P. (2017). When and How Should Multiple Imputation Be Used for Handling Missing Data in Randomised Clinical Trials – a Practical Guide with Flowcharts. BMC Medical Research Methodology, 17(1). https://doi.org/10.1186/s12874-017-0442-1
Kombo, A. Y., Mwambi, H., & Molenberghs, G. (2016). Multiple Imputation for Ordinal Longitudinal Data with Monotone Missing Data Patterns. Journal of Applied Statistics, 44(2), 270–287. https://doi.org/10.1080/02664763.2016.1168370
Lee, M., Rahbar, M. H., Brown, M., Gensler, L., Weisman, M., Diekman, L., & Reveille, J. D. (2018). A Multiple Imputation Method Based on Weighted Quantile Regression Models for Longitudinal Censored Biomarker Data with Missing Values at Early Visits. BMC Medical Research Methodology, 18(1). https://doi.org/10.1186/s12874-017-0463-9
Shin, T., Davison, M. L., & Long, J. D. (2017). Maximum Likelihood versus Multiple Imputation for Missing Data in Small Longitudinal Samples with Nonnormality. Psychological Methods, 22(3), 426–449. https://doi.org/10.1037/met0000094
Tan, F. E. S., Jolani, S., & Verbeek, H. (2018). Guidelines for Multiple Imputations in Repeated Measurements with Time-Dependent Covariates: a Case Study. Journal of Clinical Epidemiology, 102, 107–114. https://doi.org/10.1016/j.jclinepi.2018.06.006
Vidotto, D., Vermunt, J. K., & Van Deun, K. (2019). Multiple Imputation of Longitudinal Categorical Data through Bayesian Mixture Latent Markov Models. Journal of Applied Statistics, 47(10), 1720–1738. https://doi.org/10.1080/02664763.2019.1692794
Yamaguchi, Y., Ueno, M., Maruo, K., & Gosho, M. (2019). Multiple Imputation for Longitudinal Data in the Presence of Heteroscedasticity between Treatment Groups. Journal of Biopharmaceutical Statistics, 30(1), 178–196. https://doi.org/10.1080/10543406.2019.1632878