Partly Censored Data, Cox Hazard Regression and Their Application

Introduction 

Statistical problems emerge when examining the occurrence of events and the time until they occur in a population. An event in this context is a qualitative transformation of the observed person that occurs at a specific time (Emmert-Streib & Dehmer, 2019). In health care settings, the event can be death or cure, with time computed from a specific treatment or from disease onset. Statistical analysis in these contexts entails survival analysis, which is used when examiners are interested in the time until an event occurs (Emmert-Streib & Dehmer, 2019). Survival analysis includes different techniques for examining data with time as the outcome variable, where time corresponds to the period until a specific event occurs. Examples of events include heart attack, death, product wear-out, or parole violation. 

Based on these examples, it is apparent that many different fields, such as the behavioral sciences, social sciences, marketing, engineering, medicine, and biology, use survival analysis (Zhang et al., 2013). For example, survival analysis can be used to determine the sample size in a clinical trial when the test requires the comparison of the mean or a specific percentile of the survival distribution (Emmert-Streib & Dehmer, 2019). This approach is based on the accelerated failure time model, which can be applied directly to the design of reliability studies when comparing the reliability of differently manufactured products. Survival analysis can also be used to examine the failure of mechanical devices and to model the time to pregnancy among couples treated for fertility issues. Another application is in engineering, where it can be used to test the durability of electrical or mechanical components: researchers use the method to track items and the life span of materials to predict the reliability of the product (Fauzi, Elfaki & Ali, 2015, p.48). These examples show that survival analysis explores and models the time for an event to happen and the changes in survival probability over time. Estimation is based on data from participants offering information about the event time. Exact calendar starting and ending points are not required, since observations do not always begin at zero; a participant can enter the study at any time. Time is measured relative to a common initial point at which time is zero and every participant has a survival probability equal to one (Emmert-Streib & Dehmer, 2019). 


The uniqueness of survival data lies in the fact that not all participants experience the event (such as a heart attack) by the end of the observation time, which means that the real survival times of some patients remain unknown. This creates the phenomenon of censoring, which must be considered during the analysis to ensure valid inferences. Censoring complicates estimation because it produces incomplete information. It nevertheless allows the examiner to use the partial lifetimes of participants who have not experienced the targeted event. Notably, participants who did not experience the targeted event must remain part of the investigation, because eliminating them biases the outcomes toward participants who experienced the event. They must be included and distinguished from those who experienced the targeted event through a variable indicating censorship. 

Survival analysis distinguishes several types of censoring. It is crucial to note that censoring is assumed to be non-informative, that is, independent of the future risk of the event for a specific participant (Schober & Vetter, 2018). Right censoring occurs when a participant enters at the beginning of the examination and observation terminates before the targeted event happens; the participant may survive beyond the examination period without experiencing the event or may leave the study early without experiencing it. Left censoring occurs when the event is known to have happened before the first observation time, so that only an upper bound on the event time is available. It is also worth mentioning length-biased sampling, which happens when the study objective is to analyze participants who have already experienced the event in order to examine whether they will undergo it again. Interval censoring arises when the follow-up is discontinuous, for example quarterly, monthly, or weekly visits, so that the event is only known to have occurred between two observation times. Left truncation, or late entry, happens when participants enter the study after the time origin and are observed only on the condition that they have not experienced the targeted event before entry. 
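
As a concrete illustration, these censoring types can be encoded with the Surv() function of the R survival package; the sketch below uses small made-up vectors of times and event indicators purely for demonstration.

```r
library(survival)

# Right censoring: the event time is only known to exceed the follow-up time
Surv(time = c(5, 8, 8), event = c(1, 0, 1))            # prints "8+" for the censored subject

# Left censoring: the event is known to have occurred before the recorded time
Surv(time = c(3, 6), event = c(0, 1), type = "left")

# Interval censoring: the event occurred somewhere between two visit times
Surv(time = c(2, 4), time2 = c(5, 7), type = "interval2")
```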

Partly interval-censored data entails a mixture of interval-censored observations and exactly observed event times, which occurs mostly in health studies and clinical trials requiring periodic follow-ups with patients (Guure, Ibrahim & Adam, 2006). Here, the failure time is observed exactly for a proportion of participants, while for the remaining participants it is only known to fall within a specific interval. 
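
A minimal sketch of how such a partly interval-censored sample might be represented in R, again using the survival package's Surv() function (hypothetical values; under the "interval2" coding, equal interval bounds encode an exactly observed event):

```r
library(survival)

# Rows 1-2 are exact event times; rows 3-4 are only known to lie in an interval.
left  <- c(4.0, 7.5, 2.0, 6.0)
right <- c(4.0, 7.5, 5.0, 9.0)

Surv(time = left, time2 = right, type = "interval2")
```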

It is also vital to note that most survival times are skewed, which limits the effectiveness of analysis techniques based on a normal data distribution. In turn, this emphasizes the importance of examining statistical techniques developed specifically for time-to-event data. Examples of these techniques include parametric methods (exponential, Weibull, log-logistic, log-normal, and Gompertz) (George, Seals & Aban, 2014), non-parametric methods (Kaplan-Meier, Nelson-Aalen, life table), and semi-parametric methods (Cox proportional hazards) (Abbas et al., 2019). These models impose varying distributional assumptions on the hazard. The final decision regarding their application, nevertheless, is based on the specific research question, how well the model fits the actual data, and other practical matters such as interpretability and the available software.

Parametric models, for example, assume that the survival function follows a parametric distribution such as a Weibull or an exponential distribution. The advantage of parametric models is that the estimated survival functions are smooth, and it is easier to characterize the behavior of these models than to smooth the functions after estimating them non-parametrically. Covariates and inference methods can also be integrated easily in a parametric framework (Abbas et al., 2019). The drawback is that the chosen parametric form must describe the data adequately, which may or may not be the case, so methods such as visualization or hypothesis testing may be required to check the model (Abbas et al., 2019). Non-parametric models entail non-parametric density estimation in the presence of censoring. The benefit of this approach lies in its flexibility and the ability of its complexity to grow with the number of observations. Its main drawback concerns the difficulty of integrating covariates, which makes it challenging to explain how the survival functions of different people differ; another disadvantage is that the estimated survival functions are not smooth. The semi-parametric model deals with the problem of integrating covariates. The model splits the instantaneous risk, or hazard, into a non-parametric baseline that all participants share and a relative risk term that explains how each covariate influences risk (Abbas et al., 2019). In turn, this allows a time-varying baseline risk and enables patients to possess different survival functions within the same fitted model. The drawbacks of the model are that the survival function is not smooth and that, for correct inferences and good predictions, two assumptions must be satisfied: proportional hazards and linearity between the log-hazard and the covariates. 
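
The three families can be contrasted on the same data set; the sketch below is a minimal illustration in R with the survival package, using its bundled lung data purely as an example.

```r
library(survival)   # the 'lung' data set ships with the package

# Non-parametric: Kaplan-Meier estimate of the survival function
km  <- survfit(Surv(time, status) ~ 1, data = lung)

# Semi-parametric: Cox proportional hazards model with covariates
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Parametric: Weibull model fitted by maximum likelihood
wei <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

summary(cox)
summary(wei)
```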

Problem statement and research objectives 

To simplify the Cox proportional hazard model for partly interval-censored data 

Many practical cases exhibit partly interval censoring, in which the data comprise interval-censored observations alongside exact observations. The theory for analyzing partly interval-censored data has been examined over the past decades in several reviews. It is nevertheless common for studies to simplify the structure of partly interval-censored data into a standard censoring case, for example by imputing the midpoint of each censoring interval. The availability of software for standard censoring might be the main contributing factor for this simplifying practice. Several techniques based on the Cox proportional hazards model have also been created to handle partly interval-censored data and to make the procedures feasible for use by scientists. While these models have been useful in interpreting data in practice, they pose various challenges due to the existence of two parameters for each covariate, and they are complex. The objective of this study is to present a simplified Cox proportional hazards procedure for examining partly interval-censored data. The study covers different estimating procedures. 
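
For concreteness, the midpoint-imputation simplification referred to above can be sketched as follows in R (hypothetical data; the point is only to show how an interval-censored row is reduced to a single time so that standard right-censoring software such as coxph() can be reused).

```r
library(survival)

# Hypothetical partly interval-censored data: exact rows have left == right.
d <- data.frame(left  = c(3.2, 5.0, 2.0, 6.0, 1.5, 7.0),
                right = c(3.2, 5.0, 4.5, 9.0, 3.0, 8.0),
                x     = c(0, 1, 0, 1, 1, 0))

# Midpoint imputation: replace every censoring interval by its midpoint.
d$time  <- ifelse(d$left == d$right, d$left, (d$left + d$right) / 2)
d$event <- 1   # after imputation every time is treated as an observed event

fit <- coxph(Surv(time, event) ~ x, data = d)
summary(fit)
```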

To compare the Cox proportional hazard model with the Kaplan-Meier method 

Survival analysis offers different methods that can be used to compare the risks of specific events between groups when the risk varies over time. It is vital to define the beginning and end points clearly and to note censored observations when measuring survival time. The dominant techniques for analyzing survival times are the Cox proportional hazards model and the Kaplan-Meier method. The Cox model enables the inclusion of additional covariates, while the Kaplan-Meier method estimates the survival curve and, together with the log-rank test, allows two groups to be compared statistically. Both techniques assume that the hazard ratio relating the two groups is constant over time. It is vital to compare the two techniques to discover their weaknesses and strengths and to find ways of enhancing them. The second objective is therefore to compare the Cox proportional hazards model with the Kaplan-Meier technique concerning their application in analyzing partly interval-censored data. 
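
A small illustration of this comparison in R, again using the survival package's bundled lung data as a stand-in example: the Kaplan-Meier curves and the log-rank test compare two groups without covariate adjustment, while the Cox model expresses the same contrast as a hazard ratio and can adjust for further covariates.

```r
library(survival)

# Kaplan-Meier curves by sex, with a log-rank test of the group difference
km <- survfit(Surv(time, status) ~ sex, data = lung)
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# Cox model for the same contrast, with age as an additional covariate
cox <- coxph(Surv(time, status) ~ sex + age, data = lung)

plot(km, col = c(1, 2), xlab = "Days", ylab = "Survival probability")
lr
summary(cox)
```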

To estimate the parameter in the Cox proportional hazard model using the maximum likelihood estimator 

The design of many large studies focuses on learning about the effects of covariates on different events. The covariates involved are costly to obtain, which increases the cost of such studies. Researchers use the Cox proportional hazards model with time-constant covariates to examine survival data. It is, therefore, vital to estimate the parameter in the model with the maximum likelihood estimator (MLE), since the MLE usually has better finite-sample properties than standard partial likelihood estimation procedures. The MLE also offers increased efficiency when the effects of the covariates are large. 
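
As a minimal sketch of full maximum likelihood estimation in this setting, the snippet below assumes an exponential baseline hazard and right-censored data (all values simulated for illustration) and maximizes the log-likelihood numerically; it is not the estimator proposed later in this study, only a demonstration of the principle.

```r
# Negative log-likelihood for a proportional hazards model with an exponential
# baseline hazard and right-censored observations.
negloglik <- function(par, time, status, x) {
  log_lambda0 <- par[1]                 # log of the baseline rate
  beta        <- par[2]                 # covariate effect
  rate <- exp(log_lambda0 + beta * x)   # subject-specific hazard rate
  -sum(status * log(rate) - rate * time)
}

set.seed(1)
n      <- 200
x      <- rbinom(n, 1, 0.5)
t_evt  <- rexp(n, rate = 0.10 * exp(0.5 * x))   # true log-baseline = log(0.10), beta = 0.5
t_cen  <- rexp(n, rate = 0.05)                  # independent censoring times
status <- as.numeric(t_evt <= t_cen)
time   <- pmin(t_evt, t_cen)

fit <- optim(c(0, 0), negloglik, time = time, status = status, x = x, hessian = TRUE)
fit$par                          # maximum likelihood estimates
sqrt(diag(solve(fit$hessian)))   # asymptotic standard errors
```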

To use the above models in a simulation study and a real application 

After presenting the above-mentioned models, the paper will use them to simulate partly interval-censored data and to analyze real applications. 

Literature Review 

Partly interval-censored data 

Peto and Peto (1972) mentioned and analyzed data comprising either exact or interval-censored observations when deriving asymptotically efficient rank invariant test techniques to detect differences between two sets of independent observations. With such data, the failure times of some participants are observed exactly, while for the remaining participants the failure times are only known to fall within a specific assessment interval (Nesi et al., 2015). Partly interval-censored data thus comprises both exactly observed and interval-censored event times (Wu, Chambers & Xu, 2019, p.3), meaning that some targeted events are observed exactly while the remaining events are only known to lie within intervals (Fauzi, Elfaki & Ali, 2015, p.48). Partly interval-censored data occurs mostly in situations that entail periodic assessment (Nesi et al., 2015). The Weibull distribution is commonly used to model and generate partly interval-censored data (Fauzi et al., 2015, p.48). Factors such as the estimated size of a sample also affect the accuracy of analyses of partly interval-censored data (Nesi et al., 2015). In a study comparing treatment survival functions using an imputation procedure (the MI method) for partly interval-censored data under a Weibull model, Nesi et al. (2015) established that the accuracy of the estimated sample size affects the statistical power. 
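
A sketch of how Weibull-based partly interval-censored data can be generated in R (the parameter values and the 40% exact-observation rate are arbitrary choices for illustration): event times are drawn from a Weibull proportional hazards model and a portion of them is coarsened to the inspection interval in which the event fell.

```r
set.seed(42)
n      <- 100
x      <- rbinom(n, 1, 0.5)
shape  <- 1.5
scale0 <- 10
# Scaling by exp(-beta * x / shape) makes the covariate act multiplicatively
# on the hazard (proportional hazards form), with true log hazard ratio 0.7.
t_true <- rweibull(n, shape = shape, scale = scale0 * exp(-0.7 * x / shape))

# Keep roughly 40% of the event times exactly observed; coarsen the rest to the
# inspection interval (visits every 2 time units) in which the event occurred.
exact <- runif(n) < 0.4
left  <- ifelse(exact, t_true, 2 * floor(t_true / 2))
right <- ifelse(exact, t_true, 2 * ceiling(t_true / 2))

head(data.frame(left, right, exact, x))
```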

Partly interval-censored data models also require the identification of a study sample and a control sample so that observations on a vector of parameters can be made for both samples over time. The proportional hazards model is mostly used in these cases (Lane, Looney & Wansley, 1986). The model then produces a survivor function that estimates the probability of surviving beyond future time points (Lane et al., 1986). The term with exactly observed events dominates the likelihood function for partly interval-censored data (Wu et al., 2019, p.4). Disregarding the interval-censored observations from the data set, nevertheless, leads to enlarged standard errors and estimation bias (Wu et al., 2019, p.4). 

It is challenging to fit the correct model to partly interval-censored data because of factors such as violations of independence and linearity and the omission of vital covariates, which in turn affect the biases, variances, and variance estimates of the parameter estimates. These factors compel researchers to approximate the model being fitted. Besides, fitting a model can add both analytic and descriptive value, which emphasizes the importance of not violating the assumptions so that the truth value of the model remains high (Binder, 1992, p.140).

Imputation methods combined with non-parametric estimation can also be used for partly interval-censored data (Zyoud, Elfaki & Hrairi, 2016, p.883). The best approaches in this regard are mean, median, and random imputation, as they produce better outcomes than other techniques (Zyoud, Elfaki & Hrairi, 2016, p.883). Other techniques that can be used to examine partly interval-censored data include maximum likelihood estimation for the Weibull distribution and Weibull probability plotting. Maximum likelihood estimation for the Weibull distribution is a vital parameter estimation technique due to its flexibility in usage, its invariance property, and the ability to approximate its variance routinely by inverting the observed information matrix (Fauzi et al., 2015, p.48). Weibull probability plotting is a graphical technique and is beneficial for presenting data, even though it is less accurate than maximum likelihood estimation (Fauzi et al., 2015, p.49). The variety of techniques used to examine partly interval-censored data improves the outcomes of the analysis. Another proposed approach to estimation for partly interval-censored data is the semi-parametric Cox proportional hazards regression framework comprising the weighting technique model and the censoring complete model (Elfaki, Abobakar, Azram & Usman, 2013, pp.347-350); both models were found to be efficient (Elfaki et al., 2013, p.350). Studies also highlight the importance of the generalized missing data principle in the context of semiparametric models and the application of the generalized profile likelihood for non-identically distributed samples (Elfaki et al., 2013, p.350). 

When examining a failure time distribution, it is vital to ensure that the sample comprises both items with known failure times and items for which only a lower bound of the failure time is known. The latter items have censored survival times. Observations that have not failed by the end of the examination, or that leave the study for reasons other than failure, are treated as censored (Lane et al., 1986, p.518). When designing the required sample, it is crucial to consider the presence of ties among the observed survival times, which affects the selection of a model that fits the data, and the time dependence of the variables, that is, whether the independent variable values stay constant over the study time interval (Lane et al., 1986, p.518). 

Cox proportional hazards model 

Cox (1972) developed the proportional hazards model to handle continuous-time survival data. The Cox proportional hazards model is a technique for examining the effect of several variables on the time a specific event takes to occur (Liu, 2017). The assumption behind the Cox proportional hazards model is that the covariates and independent variables act on an underlying hazard rate rather than on the survival time itself. The formula for the model is 

h(t | x) = h_0(t) exp(β_1x_1 + β_2x_2 + … + β_kx_k) (Liu, 2017), in which 

the hazard function h(t | x) refers to the probability of the targeted event occurring at time t, given that the participant has survived up to time t. The baseline hazard h_0(t) refers to the hazard for a participant whose independent variables are all equal to zero. The terms x_1, x_2, …, x_k represent covariates, while β_1, β_2, …, β_k represent the corresponding regression coefficients (Liu, 2017). The hazard ratio (HR) is used to interpret the Cox model; it is the ratio of the projected hazard functions for two different values of a predictor variable (George, Seals & Aban, 2014). For instance, an event is more likely to occur if the HR is greater than 1 and less likely to occur if the HR is less than 1 (George, Seals & Aban, 2014). The covariate vector enters the model through β, the unknown parameter vector that defines the covariates' effects (Kumar & Klefsjö, 1994). The assumption that the covariates act multiplicatively on the baseline hazard rate means that the ratio of the hazard rates for two participants with different covariate values stays constant over time, that is, the two hazards remain proportional to each other, which demonstrates why the model is referred to as the Cox proportional hazards model (Kumar & Klefsjö, 1994). 
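
In R, the hazard ratios are obtained by exponentiating the fitted coefficients of coxph(); a minimal example on the package's lung data (used only as an illustration):

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
exp(coef(fit))       # estimated hazard ratios, HR = exp(beta)
exp(confint(fit))    # 95% confidence intervals on the hazard-ratio scale
```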

Cox (1972) formulated the proportional hazards model 

λ(t | Z) = λ_0(t) exp(Z'β) (Cox, 1972) 

in which the likelihood used for inference is termed partial because it is a function of β (the vector of regression parameters) alone. In turn, the researcher is not compelled to specify the baseline hazard function, which leads to efficiency, as the resulting estimator of β is asymptotically equivalent to the estimator offered by the complete likelihood function. The model also avoids distributional assumptions about the baseline hazard and can estimate the possible failure time, which makes it useful in predicting failures (Lane et al., 1986). The explanatory variables affect the model by multiplying the hazard λ_0(t) by the function exp(Z'β) of the explanatory variables' deviations from their mean values, which is the underlying assumption of the model (Lane et al., 1986). The exponential form exp(Z'β) also simplifies the estimation of the vector of regression parameters (Lane et al., 1986). The model can nevertheless be misspecified in several ways, for example when the true model differs from the assumed one through missing covariates, dependent observations, a nonlinear exponential argument, hazard functions that are not proportional, or violation of the assumption that the process producing the censoring in right-censored data is independent of the remaining lifetime (Binder, 1992, p.139). The proportional hazards model is not a truly non-parametric model because of its reliance on the vector of regression parameters. Its baseline hazard function λ_0(t), nevertheless, is left arbitrary, and no distributional assumptions are needed to estimate it or β. The model is semi-parametric: exp(Z'β) is the parametric part and λ_0(t) is the non-parametric part, with exp(Z'β) being the function of the independent variables. The main assumption of this model is that the independent variables do not change over the time interval being used, that is, they remain constant over time (Liu, 2017). The model has other assumptions. For example, the proportional hazards assumption states that, in a regression setting, the hazard functions representing the survival curves of two or more strata (identified through specific value selections of the variables of interest) must be proportional over time (constant relative hazard). Under this assumption, the baseline hazard function is common to all participants in a study, which means that all participants share the same baseline risk (Liu, 2017). Equivalently, the baseline hazard function does not depend on the independent variables; in addition, different participants in a study are assumed to be independent, and an adequate number of participants is needed for making inferences (more participants means improved precision) (Liu, 2017). 
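
For reference, a standard statement of this partial likelihood (not reproduced in the source above, but consistent with Cox's formulation) is:

```latex
L(\beta) = \prod_{i:\,\delta_i = 1} \frac{\exp(z_i'\beta)}{\sum_{j \in R(t_i)} \exp(z_j'\beta)}
```

where δ_i indicates an observed (uncensored) event and R(t_i) is the set of participants still at risk just before the i-th event time. Because λ_0(t) cancels from every factor, the baseline hazard does not need to be specified in order to estimate β.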

Compared to other discriminant analysis procedures, the PHM offers extra information about the possible time to an event (Lane et al., 1986, p.527). This extra information is contained in the estimated survivor function for a given item with independent variable vector z. The estimate offered by the PHM is 

S(t | z) = [S_0(t)]^exp(z'β) (Lane et al., 1986) 

in which T refers to the failure time of the event while t refers to the time of interest, so that S(t | z) estimates P(T > t | z) (Lane et al., 1986, p.527). The PHM should be fitted with a design-based approach, because violating the assumptions of the model will otherwise lead to misleading estimates of β. The design-based approach produces consistent estimates of the underlying parameters with only small efficiency losses, compared with a purely model-based approach, when the model is universally true for all participants. The model-based approach, however, may lead to misleading outcomes if the model fails in certain respects (Binder, 1992, p.140). In the design-based framework β is model free, which means that the PHM assumptions do not limit it. When the underlying study participants follow the PHM the usefulness of β is enhanced; this, in turn, does not exempt the researcher from fitting and selecting the applicable explanatory variables (Binder, 1992, p.141). Because it works through the hazard function, the Cox proportional hazards model does not require the analyst to assume a specific survival distribution for the data (George, Seals & Aban, 2014). The model also uses the Aalen-Breslow estimator to estimate the baseline hazard function (George, Seals & Aban, 2014). Studies find the baseline hazard to carry useful information, because the baseline hazard rate is the reference in a survival model and shifts as a function of time (Royston & Lambert, 2011, p.6). The absolute effect of an exposure depends on the time since the origin and the size of the underlying hazard rate, even when the proportional hazards assumption is reasonable (Royston & Lambert, 2011, p.6). Survival analysis analyzes the association between the survival distribution and covariates (Fox, 2002), which can be written as a linear model for the log hazard, 

log h(t | x) = α + β_1x_1 + β_2x_2 + … + β_kx_k (Fox, 2002) 

Based on the exponential distribution, a model can be derived as 

h(t | x) = exp(α + β_1x_1 + β_2x_2 + … + β_kx_k) (Fox, 2002) 

That is, a constant hazard corresponds to an exponential distribution of survival times. 

The Cox proportional hazards model, nevertheless, leaves the baseline hazard function α(t) = log(h_0(t)) unspecified, with 

log h(t | x) = α(t) + β_1x_1 + β_2x_2 + … + β_kx_k (Fox, 2002) 

in which the baseline hazard function h_0(t) can assume any form while the covariates enter the model linearly, making the model semi-parametric. Because the non-parametric part that depends on time is common to all observations, any two observations with different covariate values x have a hazard ratio that is independent of t (Fox, 2002). 

The Cox proportional hazards model is fitted in R using the coxph() function, in which the right-hand side of the model formula is similar to that of a linear model, while the left-hand side is a survival object created by the Surv() function. The fitted Cox model in R can then be used to estimate the distribution of survival times, with the survfit() function approximating S(t). The returned object can be plotted to graph the estimated survival function (Fox, 2002). 
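
A minimal sketch of this workflow (using the survival package's lung data and an arbitrary covariate profile purely as an illustration):

```r
library(survival)

# Fit the Cox model with Surv() on the left-hand side of the formula
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Estimate the survival function for a hypothetical covariate profile
sf <- survfit(fit, newdata = data.frame(age = 60, sex = 2))
plot(sf, xlab = "Days", ylab = "Estimated S(t)")
```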

The exponential distribution is used extensively to generate survival times in most simulation studies, which leads to the underutilization of other distributions (Bender et al., 2005, p.1721). The main reason for this is that it is not obvious how to generate survival times that follow a Cox model with pre-specified regression coefficients (Bender et al., 2005, p.1722). Bender et al. (2005) developed the general relation linking the hazard and the survival time of the Cox proportional hazards model. The relation is based on the formula 

T = H_0^(-1)( -log(U) exp(-β'x) ) (Bender et al., 2005), where 

U refers to a random variable that is uniformly distributed on [0, 1] and H_0^(-1) is the inverse of the cumulative baseline hazard function. The formula allows the analyst to transform uniformly distributed random numbers into survival times that follow the Cox model by inserting the inverse of the cumulative baseline hazard function into the formula. The formula can generate survival times for the Cox-exponential, Cox-Gompertz, and Cox-Weibull models. In all three models, the survival times follow the same type of distribution that is used for the baseline hazard, even though the parameters depend on the covariates x (Bender et al., 2005, p.1715). The technique offers an extensive evaluation of the features of the Cox proportional hazards model, and the developed relation can generate survival times for any distribution compatible with proportional hazards, such as the Gompertz, Weibull, and exponential distributions (Royston & Lambert, 2011). Simulation studies also tend to neglect the choice of distribution for the generated survival times because the traditional Cox model's partial likelihood is independent of the baseline hazard (Bender et al., 2005, p.1721). Several practical situations, nevertheless, may necessitate using more flexible distributions than the exponential when examining the features of the Cox proportional hazards model (Bender et al., 2005, p.1722). 
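
As a sketch of how this inversion is used in practice, the R snippet below generates Cox-Weibull survival times with cumulative baseline hazard H_0(t) = λt^ν (hypothetical parameter values) and then checks that a fitted Cox model recovers the specified hazard ratio.

```r
library(survival)

set.seed(7)
n      <- 1000
x      <- rbinom(n, 1, 0.5)
beta   <- log(2)      # true hazard ratio of 2
lambda <- 0.01        # scale of the cumulative baseline hazard H0(t) = lambda * t^nu
nu     <- 1.5         # shape

# Inversion formula: T = ( -log(U) / (lambda * exp(beta * x)) )^(1 / nu)
u     <- runif(n)
t_sim <- (-log(u) / (lambda * exp(beta * x)))^(1 / nu)

# With no censoring added, the fitted hazard ratio should be close to 2
exp(coef(coxph(Surv(t_sim, rep(1, n)) ~ x)))
```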

The outcomes of a study may significantly rely on the distribution of the developed survival times when the study violates the basic Cox model assumptions, such as when measurement errors are present. In such situations, using exponential distribution can lead to inaccurate conclusions regarding the attenuation level because of the presence of measurement error (Bender et al., 2005, p.1722). The baseline hazard rate is therefore vital, which requires the use of different distributions besides the exponential distribution to avoid restrictions when examining the features of Cox model estimators. For example, the Weibull parameters can be selected in such a way that the hazards are proportional to enable the computation of the true hazard ratio to compare groups from the parameters before using log (HR) to obtain the true regression coefficient for the Cox proportional hazards model (Bender et al., 2005, p.1714). 

Analysts must also check the Cox proportional hazards assumption because the model is based entirely on it; if the assumption is invalid for a group of predictors in a specific dataset, the model may lead to questionable outcomes (George, Seals & Aban, 2014). Solving this issue entails fitting a stratified Cox model that accommodates a different baseline hazard in each stratum, or fitting a model that entails time-varying covariates (Schober & Vetter, 2018, p.14). While the latter handles many cases, the resulting model may not be directly interpretable. A further issue, particularly in simulation work, concerns establishing the true coefficients: the impact of the covariates must be translated from the hazard to the survival times, because software packages for the Cox model require individual survival time data rather than a hazard function. It is easy to translate the coefficients from the hazard to the survival time when the baseline hazard function is constant, which is why the exponential distribution is widely used (Bender et al., 2005, p.1714). 
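
A brief sketch of both remedies in R (the grouping variable and the form of the time interaction are illustrative choices, not prescriptions from the cited sources):

```r
library(survival)

# Stratified Cox model: each stratum keeps its own baseline hazard,
# while the covariate effects are shared across strata.
fit_strat <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

# Time-varying effect via the tt() mechanism: here age is allowed to interact
# with log(time), one common way to relax proportional hazards for a covariate.
fit_tt <- coxph(Surv(time, status) ~ age + tt(age), data = lung,
                tt = function(x, t, ...) x * log(t))

summary(fit_strat)
summary(fit_tt)
```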

Vaida and Xu (2000, pp.3309-3324) presented a proportional hazards model with random effects in the log relative risk, in which the random effects enter through a general design matrix. The model is useful for examining clustered survival data and is based on the formula: 

λ_ij(t) = λ_0(t) exp(β'w_ij + b_i'z_ij) (Vaida & Xu, 2000) 

in which λ_ij(t) refers to the hazard function of the jth observation in the ith cluster (i = 1, …, n; j = 1, …, n_i), b_i refers to the ith cluster's random effect, while z_ij and w_ij refer to the covariate vectors for the random and fixed effects, respectively. The formula is based on the proposition that certain regression parameters of the proportional hazards model depend on clusters and can be considered random (Vaida & Xu, 2000, p.3310). It enables analysts to examine mixed effects in the proportional hazards model and extends the regression space compared to the conventional frailty assumption, in which only the baseline hazards carry the random effects (Vaida & Xu, 2000, p.3322). 
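
Conceptually related random-effect (frailty) terms can be added to a Cox fit in R; the sketch below uses the enrolling institution in the lung data purely as an illustrative clustering variable and a gamma frailty as a simple stand-in for the cluster-level effect b_i (the commented-out coxme call is closer to a log-normal random effect in the log relative risk).

```r
library(survival)

# Shared gamma frailty per cluster as a simple random-effect term
fit_frail <- coxph(Surv(time, status) ~ age + sex + frailty(inst), data = lung)
summary(fit_frail)

# A mixed-effects Cox model with a random intercept per cluster could be fitted
# with the add-on coxme package, e.g.:
# library(coxme)
# coxme(Surv(time, status) ~ age + sex + (1 | inst), data = lung)
```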

Studies also show that clustered survival data may emerge from matched pairs in which the log relative risk uses additive random effects, and the formulation covers various special cases such as the shared frailty model, the twin model, and the over-dispersion model (Vaida & Xu, 2000, pp.3309-3324). The stratified Cox model is used in some practical cases in which the baseline hazards are assumed to vary among strata. The approach of Vaida and Xu (2000) extends immediately to stratified models, in which the number of parameters does not increase despite the different baseline hazards, provided that no ties exist, since the non-parametric maximum likelihood estimate (NPMLE) of the baseline hazards has mass at the observed event times only (Vaida & Xu, 2000, p.3322). Fitting the stratified model does not increase the computational burden because the ties that exist in practice are usually not substantial. While an estimate of the variance of the random effects' variance parameters can be obtained (Vaida & Xu, 2000, pp.3316-3321), this estimate cannot be used directly to examine whether random effects exist. Alternatives include the homogeneity score test and the likelihood ratio test with a corrected null distribution, which is no longer chi-square (Vaida & Xu, 2000, p.3323). Graphical representations of the hazard rate also offer a clearer understanding of the subjects being studied than quoting a hazard ratio alone (Royston & Lambert, 2011, p.8). 

Examining the underlying assumptions of the Cox proportional hazards model for all predictors in the model is vital to ensure accuracy. For example, studies recommend plotting the Schoenfeld residuals versus time to evaluate the proportional hazards assumption for a continuous predictor; if the Schoenfeld residuals scatter randomly around zero, the assumption is supported (Schober & Vetter, 2018, p.15). For categorical predictors, the log-log transformations of the Kaplan-Meier survival curves for the various categories can be compared. Under the Cox proportional hazards model, the curves will be nearly parallel and will not intersect after separating (George et al., 2014). It is also vital to consider that crossing may occur at early time points due to noise in the survival estimates, which does not violate the proportional hazards assumption (George et al., 2014). Significance testing then verifies whether or not a specific covariate has any substantial impact on the time to failure; the outcomes of the test lead to the removal of insignificant covariates and the recalculation of β using the significant covariates only (Kumar & Klefsjö, 1994). The model does not assume a specific distribution, even though it is not truly non-parametric, since it assumes that the effects of the predictor variables on survival are constant over time and additive on a single scale. Another important idea regarding the Cox proportional hazards model is that covariate values can change with time, particularly in follow-up situations. There are, therefore, two types of covariates, time-dependent and fixed. Fixed covariates are those whose values do not change with time, for instance race or sex. Time-dependent covariates are those for which the difference between the values of two separate participants changes with time, for instance serum cholesterol. In practice, some observations may occur simultaneously (tied event times), which the classical proportional hazards model cannot handle; in such cases, alternative models can be used. The Cox proportional hazards model also faces the issue of collinearity. Fan and Li (2002) developed a smoothly clipped absolute deviation (SCAD) penalty for the Cox proportional hazards model to solve such issues. Fan and Li (2002) extend the nonconcave penalized likelihood method to the Cox proportional hazards model and the Cox proportional hazards frailty model by proposing new variable selection techniques for these models. The technique selects the tuning parameter by generalized cross-validation: 
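
The two diagnostic checks described above can be sketched in R as follows (lung data used purely for illustration):

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Schoenfeld-residual test and plot: a roughly flat, random scatter around zero
# is consistent with proportional hazards.
zp <- cox.zph(fit)
zp
plot(zp)

# Log-log Kaplan-Meier curves for a categorical predictor: roughly parallel
# curves (after the early, noisy portion) support the PH assumption.
plot(survfit(Surv(time, status) ~ sex, data = lung), fun = "cloglog",
     col = c(1, 2), xlab = "Days (log scale)", ylab = "log(-log S(t))")
```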

λ_GCV = argmin_λ GCV(λ) (Fan & Li, 2002), in which GCV(λ) is a generalized cross-validation statistic based on the penalized partial likelihood and the effective degrees of freedom e(λ), the number of non-zero penalized partial likelihood estimates corresponding to the tuning parameter λ. According to Fan and Li (2002), the LASSO technique can produce coefficients with significant bias, particularly if the chosen λ is too large. Based on this criticism, Fan and Li (2002) suggest the smoothly clipped absolute deviation (SCAD) penalty, which offers enhanced theoretical properties compared to the LASSO. 
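
Penalized Cox regression of this kind is available in R; the sketch below uses the glmnet package's lasso-penalized Cox fit as an illustration (SCAD-penalized Cox fits are provided by separate add-on packages such as ncvreg, which is an assumption about third-party software rather than something taken from Fan and Li's paper).

```r
library(survival)
library(glmnet)

# Candidate predictors from the lung data, used purely for illustration
X <- as.matrix(lung[, c("age", "ph.ecog", "ph.karno", "wt.loss")])
keep <- complete.cases(X, lung$time, lung$status)
y <- Surv(lung$time[keep], lung$status[keep])

# Cross-validated lasso-penalized Cox regression
cvfit <- cv.glmnet(X[keep, ], y, family = "cox")
coef(cvfit, s = "lambda.min")   # coefficients at the selected tuning parameter
```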

Applications 

Tripodi, Kim, and Bender (2010) examined whether being employed is related to criminal behavior among people released from prison, particularly concerning the amount of time between release from prison and reincarceration. The study sought to examine the relationship between employment and recidivism for parolees released from prisons in Texas, whether being employed after release is related to a reduced potential for re-incarceration, and whether being employed is related to more time to re-incarceration. The researchers analyzed administrative data from a random sample of 250 male parolees released from prisons in Texas between 2001 and 2005. They obtained pre-prison and in-prison information from the statewide data of the Executive Service Department of the Texas Department of Criminal Justice (TDCJ), and post-prison information from the Parole Services Department of TDCJ using the case files of the parolees. The researchers then analyzed the combined data for the selected participants. They chose the Cox proportional hazards model to analyze the impact of employment on re-incarceration over time. The model was suitable for the study because recidivism had not occurred for a portion of the participants before data collection ended, which meant that the data were censored. The study found that while being employed is not related to a substantial reduction in the potential for re-incarceration, being employed is related to a substantially longer time to re-incarceration: re-incarcerated parolees who are employed spend more time away from crime in the community before going back to prison. 

The study by Benda, Toombs, and Peacock (2002) was about environmental factors that predict the survival of inmates in the community without experiencing recidivism. The objective of the study was to identify the factors that predict the length of time graduates of boot camps remain in the community without being re-incarcerated. Specifically, the study sought to determine the dynamic and static factors that predict re-incarceration or recidivism among boot camp graduates in the Department of Correction. The researchers selected 480 male participants in a boot camp in one southern state through questionnaires administered by a psychologist. The researchers used the questionnaires to collect additional data for the study such as race, marital status, committed offences, return offenses, incarceration time, and age. Besides, the researchers obtained the ratio measurement level of recidivism concerning the survived days in the community. The study used the Cox proportional hazards model to assess the relative recidivism (hazard function) rate throughout the follow-up interval of three years based on the predictors. The researchers chose the model because of its flexibility concerning the reliance of the re-incarceration hazard on time and the ability to allow them to examine the impact of predictors on recidivism. The study found that factors such as the perceptions of inmates regarding boot camps as just an expedient place to early release, resilience, future success expectations, peer association and influence, past criminal history, socio-demographic features, personal attributes, personality, and age at first arrest strongly predicted recidivism. 

The study by Benda (2003) was about recidivism among boot camp graduates who were male non-violent offenders. The researchers sought to determine whether adult early starters and late starters in adult boot camps experienced different rates of criminal recidivism, to examine crucial caregiving factors to understand the extent to which they predict criminal recidivism among boot camp graduates, and to explore differences in effects on criminal recidivism between early and late starters in criminal behavior. The researchers involved 601 male graduates as participants in the study from the only boot camp in one southern state. The researchers obtained various features of the participants, such as age, number of children, legal annual income, education, race, marital status, employment status, family structure, gun carriage, drug selling, and recidivism, through questionnaires. The researchers also determined the ratio-level measurement of recidivism as the number of days survived in the community. The study used the Cox proportional hazards model to examine the recidivism hazard rate (parole violation or arrest) for different aspects of developmental and general models. The analysis was based on the age at which the participants began engaging in illegal acts. The study found caregiver factors to be inversely related to the recidivism hazard, while carrying weapons, drug sales and use, gang membership, peer relationships with criminals, social skills deficits, and low self-esteem were positively related to the recidivism hazard. The results were observed irrespective of the age at which participants began engaging in criminal activities. 

Another study by Benda, Harm, and Toombs (2005) was about the life-course theory factors that predict recidivism, gender differences in the predictors, and issues concerning the impact of boot camp. The aim of the study was to investigate potential gender differences in the components of life-course theory that predict recidivism; to determine whether there are gender differences in how views of the boot camp program predict recidivism; to explore gender differences in how abuses occurring at various life-span stages predict recidivism; and to open discussions regarding the potential detrimental impacts of boot camp. The researchers selected 601 male and 120 female graduates from the only boot camp in one of the southern states. The researchers used two questionnaires to obtain information such as age, education, age at first arrest, race, childhood physical and sexual abuse, current sexual and physical abuse, current living status, job status, gang membership, weapon carriage, and drug selling. The researchers used the Cox proportional hazards model to analyze the gender differences, while a non-parametric examination of survival curves using standard life table techniques was used to explore the time until the first parole violation or felony arrest. The study found that specific positive views about the boot camp program were related to low recidivism hazard rates. Present sexual assaults, adolescent physical and sexual maltreatment, and sexual abuse during childhood were associated with high recidivism hazard rates. Ameliorating experiences such as full-time jobs and the presence of a conventional partner substantially reduced the hazard rates associated with many examined predictors. 

Cloyes, Wong, Latimer, and Abarca (2010) studied the rates of recidivism among offenders suffering from a mental illness who were returning to prison. The researchers engaged in the study to explore whether a specific population of prisoners with serious mental illness exists at the state prison in Utah, the criteria to be used in identifying this population, and how it compares with other prisoners. The objective of the study was to identify, measure, and describe the part of the prison population in Utah State Prison between 1998 and 2002 that met the severe mental illness (SMI) criteria and to compute the time from prison release to re-incarceration for SMI offenders compared to non-SMI offenders. The researchers involved all individuals released from the Utah State Prison from January 1, 1998, to December 31, 2002, covering 14,621 release events and 9,245 unique cases. The researchers also conducted a systematic review of the records of all identified SMI cases and gathered data concerning mental health intervention in prison, prison resource use and management, and demographics. The study used Kaplan-Meier techniques to perform the survival analysis, in which the time from prison release to re-incarceration for the SMI group was compared to that of the non-SMI group. The study found that substantial differences between the non-SMI and SMI groups were due to factors associated with resource use and clinical symptoms, not demographics, release conditions, or offense features. The study also found that SMI offenders had a higher rate of recidivism. 

Hill et al. (2008) engaged in a study to identify criminal risk factors by examining forensic psychiatric reports about sexual homicide perpetrators in Germany. The study sought to collect data about the risk factors that predict future sexual homicide; to explore the legal outcomes of the sexual homicide, assess the factors that affect release from prison or a forensic hospital, evaluate the rates of criminal recidivism, and determine the risk factors for violent nonsexual and sexual reoffending. The researchers assessed court reports on 166 men who had been involved in a sexual homicide for the period between 1945 and 1991 to identify clinical, criminal, and socio-demographic factors. The researchers also examined the German federal criminal records for follow-up information regarding the incarceration duration in a forensic hospital or prison following the last sexual homicide and regarding reconvictions and further detentions for 139 offenders. The study used the Kaplan-Meier technique for survival analysis to evaluate the influence of risk factors on the potential to be released and to measure rates of recidivism following release as a function of time at risk. The main findings of the study were that high sexual recidivism was associated with young age at the sexual homicide period while past nonsexual and sexual delinquency, high scores in risk evaluation tools, and psychopathic symptoms led to increased non-sexual violent recidivism. The study also found that high recidivism rate with violent re-offenses was related to age-based factors such as young age during the first sexual offense, at homicide, and during release and detention duration. 

The study by Jung, Spjeldnes, and Yamatani (2010) was about the rates of recidivism and survival time among male ex-inmates released from the Allegheny County Jail in 2003. The study objective was to examine racial disparity in recidivism among ex-inmates and to explore the relationship between recidivism and race. The researchers compared recidivism rates across race by first generating inmate historical information from entry and release date documentation. A sample of 12,545 participants was included, of which 46.9 per cent were black and 53.1 per cent were white. The study used the Kaplan-Meier method with log-rank tests and the Cox proportional hazards model to explore whether black ex-inmates recidivated within a shorter period than white ex-inmates. The Kaplan-Meier technique compared the survival curves across race, while the log-rank test assessed the statistical significance of the differences. The Cox model investigated racial differences in the risk of recidivism. The study found that the three-year rate of recidivism stood at 55.9 per cent, and black men were found to experience recidivism at a higher rate than white men. The survival analysis also demonstrated the existence of racial disparity in recidivism, with black males recidivating within a shorter period than white men. The study also found the covariate and interaction effects of race to be substantial. 

Mackie et al. (2001) studied post-transplantation alcohol consumption and the risk factors related to recidivism. The objective of the study was to compare survival rates for participants who underwent transplantation for alcoholic liver disease (ALD) with participants who underwent transplantation for other kinds of chronic liver illness. The study also sought to evaluate post-transplantation consumption of alcohol, assess the existing screening procedure, and evaluate potential risk factors that can be used to identify patients at a higher risk of recidivism. The researchers used a self-report questionnaire to evaluate pre- and post-transplantation alcohol consumption and patient notes to examine recidivism risk factors. The study sample comprised 49 participants who underwent transplantation for ALD between May 1996 and November 1999 and 49 participants who underwent transplantation due to non-alcohol-induced chronic liver illness for comparison purposes. The study used the Kaplan-Meier technique to determine 1- and 2-year survival rates, while the log-rank test compared the rates. The study found high rates of recidivism, even though most participants did not drink heavily at a damaging level. The study also found that participants in the ALD group who consumed alcohol took a longer time to do so in comparison to participants outside the ALD group, even though participants who returned to heavy drinking in both groups did so rapidly. Women were also found to experience lower recidivism rates than men, while age and socioeconomic status had no significant effect. Divorce was the only social risk factor that significantly influenced recidivism rates. 

The study by Ostermann (2015) was about the post-release life of former inmates, using existing information for those released from prison in New Jersey in 2006. The study sought to examine the performance of former inmates in their transition back into the community. The researchers used three recidivism indicators: technical parole violations, conviction for new crimes, and arrest for new crimes. The researchers grouped participants into sets based on the release mechanism experienced, such as unconditional release, mandatory parole, and discretionary release. The study used the Cox proportional hazards model to isolate the impact of parole supervision while controlling for identified post-release recidivism predictors. The study found that inmates released to supervision engaged less in new offenses over a three-year follow-up than those released unconditionally, and that a high percentage of paroled inmates recidivated immediately after being released. 

Rainforth, Alexander, and Cavanaugh (2003) examined recidivism rates among former inmates who learned the Transcendental Meditation (TM) technique in a prison in California. The study followed up participants from the Bleick and Abrams study who had been incarcerated at Folsom Prison by tracking their re-offending for 15 years following their release. In total, 120 inmates at Folsom Prison learned the TM technique between 1975 and 1982 and had been paroled by October 1982; the researchers selected 128 non-meditating participants as the control group. The researchers also obtained additional background and demographic data for both groups, including rule violations before entering the study, time served during the considered term, prior commitment record, age at parole, age at first commitment, age at first arrest, drug abuse history, military discharge and service, employment history, educational achievement, IQ, marital status, and ethnicity. The Cox proportional hazards model was used to estimate the relative reduction in recidivism risk due to treatment and thereby measure the size of the treatment impact. The study also used a split-population model based on the Weibull distribution to describe the data for both groups. The study found that TM led to lasting rehabilitation rather than merely postponing the commencement of re-offending. The TM group also experienced less severe re-offending compared to the control group, and TM combined with group therapy significantly reduced recidivism compared to TM alone and group therapy alone. 

References 

Abbas, S. A., Subramanian, S., Ravi, P., Ramamoorthy, S., & Munikrishnan, V. (2019). An Introduction to Survival Analytics, Types, and Its Applications. Biomechanics, 33. 

Benda, B. B. (2003). Survival analysis of criminal recidivism of boot camp graduates using elements from general and developmental explanatory models.  International Journal of Offender Therapy and Comparative Criminology 47 (1), 89-110. 

Benda, B. B., Harm, N. J., & Toombs, N. J. (2005). Survival analysis of recidivism of male and female boot camp graduates using life-course theory.  Journal of Offender Rehabilitation 40 (3-4), 87-113. 

Benda, B. B., Toombs, N. J., & Peacock, M. (2002). Ecological factors in recidivism: A survival analysis of boot camp graduates after three years.  Journal of Offender Rehabilitation 35 (1), 63-85. 

Bender, R., Augustin, T., & Blettner, M. (2005). Generating survival times to simulate Cox proportional hazards models.  Statistics in medicine 24 (11), 1713-1723. 

Binder, D. A. (1992). Fitting Cox's proportional hazards models from survey data.  Biometrika 79 (1), 139-147. 

Cloyes, K. G., Wong, B., Latimer, S., & Abarca, J. (2010). Time to prison return for offenders with serious mental illness released from prison: A survival analysis.  Criminal Justice and Behavior 37 (2), 175-187. 

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-202. 

Elfaki, F. A. M., Abobakar, A., Azram, M., & Usman, M. (2013). Survival Model for Partly Interval-Censored Data with Application to Anti D in Rhesus D Negative Studies.  World Academy of Science, Engineering and Technology, International Journal of Biological, Biomolecular, Agricultural, Food and Biotechnological Engineering 7 (5), 347-350. 

Emmert-Streib, F., & Dehmer, M. (2019). Introduction to Survival Analysis in Practice.  Machine Learning and Knowledge Extraction 1 (3), 1013-1038. 

Fauzi, N. A. M., Elfaki, F. A. M., & Ali, Y. (2015). Some Method On Survival Analysis Via Weibull Model In the Present of Partly Interval Censored: A Short Review.  International Journal of Computer Science and Network Security (IJCSNS) 15 (4), 48. 

Fan, J., & Li, R. (2002). Variable selection for Cox's proportional hazards model and frailty model.  The Annals of Statistics 30 (1), 74-99. 

Fox, J. (2002). Cox proportional-hazards regression for survival data. An R and S-PLUS companion to applied regression, 2002. 

George, B., Seals, S., & Aban, I. (2014). Survival analysis and regression models.  Journal of Nuclear Cardiology 21 (4), 686-694. 

Guure, C. B., Ibrahim, N. A., & Adam, M. B. (2006). On partly censored data with the Weibull distribution. 

Hill, A., Habermann, N., Klusmann, D., Berner, W., & Briken, P. (2008). Criminal recidivism in sexual homicide perpetrators.  International Journal of Offender Therapy and Comparative Criminology 52 (1), 5-20. 

Jung, H., Spjeldnes, S., & Yamatani, H. (2010). Recidivism and survival time: Racial disparity among jail ex-inmates.  Social Work Research 34 (3), 181-189. 

Kumar, D., & Klefsjö, B. (1994). Proportional hazards model: a review.  Reliability Engineering & System Safety 44 (2), 177-188. 

Lane, W. R., Looney, S. W., & Wansley, J. W. (1986). An application of the Cox proportional hazards model to bank failure.  Journal of Banking & Finance 10 (4), 511-531. 

Liu, L. (2017).  Heart Failure: Epidemiology and research methods . Elsevier Health Sciences. 

Mackie, J., Groves, K., Hoyle, A., Garcia, C., Garcia, R., Gunson, B., & Neuberger, J. (2001). Orthotopic liver transplantation for alcoholic liver disease: a retrospective analysis of survival, recidivism, and risk factors predisposing to recidivism.  Liver Transplantation 7 (5), 418-427. 

Nesi, C. N., Shimakura, S. E., Ribeiro Junior, P. J., & Mio, L. L. M. D. (2015). Survival analysis: a tool in the study of post-harvest diseases in peaches.  Revista Ceres 62 (1), 52-61. 

Ostermann, M. (2015). How do former inmates perform in the community? A survival analysis of rearrests, reconvictions, and technical parole violations.  Crime & Delinquency 61 (2), 163-187. 

Peto, R., & Peto, J. (1972). Asymptotically Efficient Rank Invariant Test Procedures.  Journal of the Royal Statistical Society: Series A (General) 135 (2), 185-198. 

Rainforth, M. V., Alexander, C. N., & Cavanaugh, K. L. (2003). Effects of the transcendental meditation program on recidivism among former inmates of Folsom Prison: Survival analysis of 15-year follow-up data.  Journal of Offender Rehabilitation 36 (1-4), 181-203. 

Royston, P., & Lambert, P. C. (2011). Flexible parametric survival analysis using Stata: beyond the Cox model. 

Schober, P., & Vetter, T. R. (2018). Survival analysis and interpretation of time-to-event data: The tortoise and the hare.  Anesthesia and analgesia 127 (3), 792. 

Tripodi, S. J., Kim, J. S., & Bender, K. (2010). Is employment associated with reduced recidivism? The complex relationship between employment and crime.  International Journal of Offender Therapy and Comparative Criminology 54 (5), 706-720. 

Vaida, F., & Xu, R. (2000). Proportional hazards model with random effects.  Statistics in medicine 19 (24), 3309-3324. 

Wu, Y., Chambers, C. D., & Xu, R. (2019). Semiparametric sieve maximum likelihood estimation under cure model with partly interval censored and left truncated data for application to spontaneous abortion.  Lifetime data analysis 25 (3), 507-528. 

Zhang, W., Ota, T., Shridhar, V., Chien, J., Wu, B., & Kuang, R. (2013). Network-based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment.  PLoS computational biology 9 (3), e1002975. 

Zyoud, A., Elfaki, F. A., & Hrairi, M. (2016). Nonparametric estimate based in imputations techniques for interval and partly interval censored data. Science International (Lahore), 28(2), 879-884. 
