Nonparametric Estimation of Cumulative Cause Specific Reversed Hazard Rates under Masked Causes of Failure

Abstract
In the analysis of competing risks data, it is common that the exact cause of failure for certain study subjects is missing. This problem of missing failure type may be due to inadequacy in the diagnostic mechanism or reluctance to report the exact cause of failure. In the present paper, we consider the nonparametric estimation of cumulative cause specific reversed hazard rates for left censored competing risks data under masked causes of failure. We first develop maximum likelihood estimators of cumulative cause specific reversed hazard rates. We then consider least squares type estimators for cumulative cause specific reversed hazard rates, for use when information about the conditional probability of the exact failure type given a set of possible failure types is available. Simulation studies are conducted to assess the performance of the proposed estimators. We illustrate the applicability of the proposed methods using a data set.


Sankaran PG * and Anjana S
Department of Statistics, Cochin University of Science and Technology, India
There are situations in the analysis of competing risks data where the exact cause of failure for certain subjects is missing. For example, due to inadequacy in the diagnostic mechanism, experimentalists are quite often uncertain about the true failure type, or are reluctant to report a specific value of the failure type $J$ for some subjects. Dinse was among the first to discuss uncertainty in the information on failure types while estimating survival due to different failure types [4]. In such contexts, information on failure type is either completely available or not available at all. This problem with two failure types was studied subsequently by Miyakawa (1984), Racine-Poon and Hoel (1984), Lo (1991), Mukerjee and Wang (1993), Goetghebeur and Ryan (1990, 1995), Dewanji (1992) and Lu and Tsiatis (2001) [5-12]. Flehinger, et al. (1998) considered a general pattern of missing failure types for the purpose of estimating survival due to different types, under the strong assumption of proportional hazards due to different types [13]. Flehinger, et al. subsequently emphasized parametric modelling for the more general case, where the competing risks are not assumed to have proportional hazard functions [14]. Later, Dewanji and Sengupta (2003) developed a nonparametric maximum likelihood estimator for the cause specific hazard rate $\lambda_j(t)$ under the missing at random assumption and also proposed a nonparametric estimator for cumulative cause specific hazard rates using a counting process approach [15]. Recently, Sen, et al. proposed a semiparametric Bayesian approach for analyzing competing risks survival data with masked cause of death [16]. Hyun, et al. developed a semiparametric proportional hazards model for the cause specific hazard function in the analysis of competing risks data with missing cause of failure [17-20].
In survival studies, it is likely that some study subjects encounter the event of interest before the start of the study. For example, in Alzheimer's studies, an elderly person is at risk of multiple events such as dementia and death. Some subjects, however, are already demented before the start of the study, which is an instance of left censoring. Such studies therefore give rise to left censored competing risks data. The existing models and methodologies for the analysis of competing risks data become inadequate in the presence of left censored observations.

Journal of Biostatistics and Biometric Applications
Sankaran and Anjana presented the analysis of competing risks data using cause specific reversed hazard rates under left censoring [29]. Recently, Sankaran and Anjana introduced a proportional cause specific reversed hazards model for modeling and analysis of left censored competing risks data in the presence of covariates [30]. The problem of missing failure types has been studied in the literature by various researchers under right censoring [11,12,18-20]. Very recently, Dewanji, et al. considered the regression problem, in which the cause specific hazard rates may depend on some covariates, and considered estimation of the regression coefficients and the cause specific baseline hazards under the general missing pattern using some semiparametric models [31]. On many occasions, the problem of missing failure type may also arise under left censoring. Motivated by this, in this paper we present nonparametric inference procedures for left censored competing risks data when causes of failure are masked.
The concept of reversed hazard rate, defined as
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t - \Delta t < T \le t \mid T \le t)}{\Delta t} = \frac{f(t)}{F(t)}, \tag{1.2}$$
where $f(t)$ and $F(t)$ are the density and distribution functions of $T$, plays a pivotal role in modeling and analysis of left censored failure time data. The function $h(t)$, which was proposed as a dual to the hazard rate by Barlow (1963), is used in many contexts. In parallel systems of independent and identically distributed components, the hazard rate of the system lifetime is not proportional to the hazard rate of the lifetime of each component; however, since the distribution function of a system of $n$ such components is $F^n(t)$, the reversed hazard rate of the system lifetime is $n\,h(t)$, which is proportional to the reversed hazard rate of the lifetime of each component [21]. For various properties and applications of (1.2), see [3,22-28].
The present paper is organized as follows. We define the cause specific reversed hazard rates and study their properties, and then discuss the nonparametric estimation of the cumulative cause specific reversed hazard rates under masking. We first formulate the likelihood function and consider the maximum likelihood estimation procedure for the cause specific reversed hazard rates. We then give a least squares type estimator for the cumulative cause specific reversed hazard rates. Simulation studies are then carried out to investigate the performance of the estimators, and the proposed procedures are applied to a data set. Finally, the conclusion summarizes the present work.

Cause specific reversed hazard rates
Let $(T, J)$ be a pair of random variables as described in the introduction, where $T$ is the failure time and $J \in \{1, 2, \ldots, k\}$ is the cause of failure. Let $F(t)$ be the distribution function of $T$. The cause specific reversed hazard rate of $T$ is defined as
$$h_j(t) = \lim_{\Delta t \to 0} \frac{P(t - \Delta t < T \le t, J = j \mid T \le t)}{\Delta t}, \quad j = 1, 2, \ldots, k. \tag{2.1}$$
The function $h_j(t)$ specifies the instantaneous rate of failure of a subject at time $t$ due to cause $j$, given that it failed before time $t$ [29]. Denote by $F_j(t) = P[T \le t, J = j]$ the cumulative incidence function of $T$. We can write (2.1) as
$$h_j(t) = \frac{f_j(t)}{F(t)},$$
where $f_j(t) = dF_j(t)/dt$ is the cause specific density of $T$. We assume that the $k$ failure types are mutually exclusive and exhaustive, so that a subject can have at most one realized failure time with an identifiable cause. Then the marginal reversed hazard rate for $T$ is given by
$$h(t) = \sum_{j=1}^{k} h_j(t).$$
Now the distribution function of $T$ can be expressed in terms of the cause specific reversed hazard rates as
$$F(t) = \exp\left\{-\int_t^{\infty} \sum_{j=1}^{k} h_j(u)\,du\right\}.$$
The functions $h_j(t)$, $j = 1, 2, \ldots, k$, fully describe the distribution of $(T, J)$ in multiple failure mode settings. For further details on (2.1), see [29].
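The relation between $F(t)$ and the cause specific reversed hazard rates can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-cause toy model that is not taken from the paper: $F_j(t) = p_j t^a$ on $(0, 1]$, so that $F(t) = t^a$ and $h_j(t) = p_j a / t$. The function names and parameter values are illustrative only.

```python
import math

def cause_specific_rhr(t, a, p):
    """h_j(t) = f_j(t) / F(t) for the toy model F_j(t) = p_j * t**a on (0, 1]."""
    return [pj * a / t for pj in p]

def distribution_from_rhr(t, a, p, steps=20000):
    """Recover F(t) = exp(-integral_t^1 sum_j h_j(u) du) with the midpoint rule.

    The upper limit is 1 because F(1) = 1 in the toy model, so the
    reversed hazard rate vanishes beyond that point.
    """
    width = (1.0 - t) / steps
    integral = 0.0
    for i in range(steps):
        u = t + (i + 0.5) * width
        integral += sum(cause_specific_rhr(u, a, p)) * width
    return math.exp(-integral)

a, p = 2.0, [0.3, 0.7]   # hypothetical shape parameter and cause proportions
t = 0.4
recovered = distribution_from_rhr(t, a, p)
exact = t ** a           # closed form F(t) = t**a
```

In this toy model `recovered` and `exact` agree up to numerical integration error, which illustrates that the $h_j(t)$'s jointly determine $F(t)$.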
Our objective is to develop nonparametric estimators for cumulative cause specific reversed hazard rates and cumulative incidence functions for left censored competing risks data, when some of the causes of failure are masked.
Let $X$ be a non-negative random variable with distribution function $F(t)$, which is left censored by the random variable $C$. Under left censoring one observes $(T, \delta)$, where $T = \max(X, C)$ and $\delta$ is the censoring indicator (1 for failure and 0 for censoring).
In addition, we observe the set $G \subseteq \{1, 2, \ldots, k\}$ representing the possible failure types when $\delta = 1$. If failure occurs, $G$ gives partial information about the failure type. This information is complete when $G$ is a singleton set.
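The observation scheme $(T, \delta, G)$ can be illustrated with a small data generator. This is a sketch under assumptions that are not part of the paper: the latent failure time has $F(t) = t^2$ on $(0, 1]$, the cause is drawn independently of the failure time, the censoring time is uniform on $(0, c_{\max})$, and a failure is masked to $\{1, 2\}$ with a fixed probability. All names and default values are hypothetical.

```python
import random

def simulate_masked_left_censored(n, mask_prob=0.5, censor_upper=0.5,
                                  p1=0.4, seed=0):
    """Return a list of (t, delta, g) records for two causes of failure."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.random() ** 0.5              # latent failure time, F(t) = t**2
        cause = 1 if rng.random() < p1 else 2
        c = rng.uniform(0.0, censor_upper)   # left censoring time
        t = max(x, c)                        # observed time T = max(X, C)
        delta = 1 if x >= c else 0           # 1 = failure observed, 0 = left censored
        if delta == 0:
            g = frozenset()                  # no cause information when censored
        elif rng.random() < mask_prob:
            g = frozenset({1, 2})            # masked: both causes remain possible
        else:
            g = frozenset({cause})           # exact cause observed
        sample.append((t, delta, g))
    return sample

sample = simulate_masked_left_censored(200)
```

Each record carries a singleton set when the cause is known, the full set when it is masked, and an empty set when the observation is left censored.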
Nonparametric estimation
In this section we discuss the nonparametric estimation of cumulative cause specific reversed hazard rates and cumulative incidence functions. In the subsection Maximum likelihood estimator, we develop an EM algorithm for nonparametric maximum likelihood estimation. In the subsection Weighted least squares estimator, we suggest another procedure for the estimation of cumulative cause specific reversed hazard rates using the method of weighted least squares. Often, information about the probability that a particular cause is responsible for the failure, given a set of possible causes, is available to us, and the weighted least squares method incorporates this information. Throughout, we suppose that, for each individual, we observe a failure time or censoring time and a set $g$ representing the possible causes of failure.

Maximum likelihood estimator
We adopt the missing at random assumption for observing $g_i$ (Little and Rubin (1987)); that is, given the failure time and failure type, the probability of observing $g$ is the same for all the types contained in $g$ (Dewanji and Sengupta (2003)) [15,32]. The likelihood function for the observed data can be written as
$$L \propto \prod_{i} \prod_{r \in A_i} \left[\sum_{j \in g_r} h_j(t_i)\right]^{\delta_r} \exp\left\{-\int_{t_i}^{\infty} \sum_{j=1}^{k} h_j(u)\,du\right\}, \tag{3.1}$$
where $A_i$ is the set of individuals failed or censored at time $t_i$. From (3.1) we can see that the nonparametric maximum likelihood estimators of the cause specific reversed hazard rates have masses at the $m$ distinct observed failure times $s_1 < s_2 < \cdots < s_m$. We can then write $h_j(s_i)$ for the discrete cause specific reversed hazard rate of type $j$ at time $s_i$. Thus, using the identity
$$F(t) = \prod_{i:\, s_i > t} \left\{1 - \sum_{j=1}^{k} h_j(s_i)\right\},$$
the likelihood becomes
$$L \propto \prod_{i=1}^{m} \left[\prod_{r \in D_i} \sum_{j \in g_r} h_j(s_i)\right] \left\{1 - \sum_{j=1}^{k} h_j(s_i)\right\}^{n_i - d_i}, \tag{3.2}$$
where $D_i$ is the set of individuals failed at time $s_i$, $d_i$ is the number of individuals failed at time $s_i$ and $n_i$ is the number of individuals failed up to time $s_i$, counting those left censored at or before $s_i$.
We use the EM algorithm to find the maximum likelihood estimator of $h_j(t)$. Assuming that the cause of failure of each individual is available, the complete data likelihood can be written as
$$L_c \propto \prod_{i=1}^{m} \left[\prod_{j=1}^{k} h_j(s_i)^{d_{ji}}\right] \left\{1 - \sum_{j=1}^{k} h_j(s_i)\right\}^{n_i - d_i},$$
where $d_{ji} = \sum_{r \in D_i} x_{jr}$ is the number of individuals failed due to cause $j$ at time $s_i$ and $x_{jr}$ equals 1 if individual $r$ failed due to cause $j$ and 0 otherwise.
The E step of the algorithm takes the conditional expectation of the $d_{ji}$'s or $x_{jr}$'s, given the current estimates of the $h_j(t_i)$'s and the incomplete observed data. Thus the conditional expectation of $x_{jr}$, denoted by $\hat{x}_{jr}$, is given as
$$\hat{x}_{jr} = \frac{h_j(t_r)}{\sum_{l \in g_r} h_l(t_r)}$$
for $j \in g_r$ and 0 otherwise. Then the conditional expectation of $d_{ji}$ is obtained as $\hat{d}_{ji} = \sum_{r \in D_i} \hat{x}_{jr}$. The M step maximizes the conditional expectation of $\log L_c$ with respect to the $h_j(s_i)$'s to get the improved estimates $\hat{h}_j(s_i) = \hat{d}_{ji}/n_i$. The process is repeated until the estimates converge.
Following the steps of Dewanji and Sengupta (2003), we can show that the observed information matrix corresponding to (3.2) is positive definite, so that the log-likelihood corresponding to (3.2) is concave and has a unique maximum [15]. Thus the EM algorithm described above converges to this unique maximum (Dempster et al. (1977), Wu (1983)) [33,34].
The EM algorithm gives the nonparametric estimators $\hat{h}_j(s_i)$ of the cause specific reversed hazard rates. With $H_j(t) = \int_t^{\infty} h_j(u)\,du$ denoting the cumulative cause specific reversed hazard rate, its nonparametric estimator is obtained as
$$\hat{H}_j(t) = \sum_{i:\, s_i > t} \hat{h}_j(s_i), \quad j = 1, 2, \ldots, k.$$
The asymptotic variance of the estimator can be obtained from the observed information matrix using the technique given in Louis (1982), as mentioned in Dewanji and Sengupta (2003) [15,35].
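The E and M steps of the maximum likelihood procedure can be sketched in a few lines. This is an illustrative implementation rather than the authors' code, and it rests on stated assumptions: each record is a hypothetical tuple `(t, delta, g)`, the reversed-time risk set size $n_i$ is taken as the number of individuals with observed time at most $s_i$, the starting values split $d_i/n_i$ equally among the causes, and $\hat{H}_j(t)$ sums the estimated masses at failure times strictly greater than $t$.

```python
def em_reversed_hazards(data, causes, tol=1e-10, max_iter=1000):
    """EM estimates of discrete cause specific reversed hazard rates h_j(s_i).

    data: list of (t, delta, g); g is the set of possible causes when delta == 1.
    Returns (times, h) with h[j][i] the estimate of h_j at times[i].
    """
    times = sorted({t for t, delta, _ in data if delta == 1})
    m = len(times)
    position = {s: i for i, s in enumerate(times)}
    # n_i: individuals with observed time <= s_i (an assumption, see lead-in)
    n = [sum(1 for t, _, _ in data if t <= s) for s in times]
    failures = [(position[t], g) for t, delta, g in data if delta == 1]
    d = [0] * m
    for i, _ in failures:
        d[i] += 1
    # Starting values: split the total mass d_i / n_i equally among causes
    h = {j: [d[i] / (n[i] * len(causes)) for i in range(m)] for j in causes}
    if m == 0:
        return times, h
    for _ in range(max_iter):
        # E step: expected cause indicators given the current h
        d_hat = {j: [0.0] * m for j in causes}
        for i, g in failures:
            total = sum(h[j][i] for j in g)
            for j in g:
                d_hat[j][i] += h[j][i] / total
        # M step: h_j(s_i) = expected d_ji / n_i
        new_h = {j: [d_hat[j][i] / n[i] for i in range(m)] for j in causes}
        change = max(abs(new_h[j][i] - h[j][i]) for j in causes for i in range(m))
        h = new_h
        if change < tol:
            break
    return times, h

def cumulative_rhr(times, h_j, t):
    """H_j(t): sum of the estimated masses at failure times greater than t."""
    return sum(value for s, value in zip(times, h_j) if s > t)

example = [
    (1.0, 1, {1}), (1.5, 0, set()), (2.0, 1, {2}),
    (2.0, 1, {1}), (3.0, 1, {1, 2}),
]
times, h = em_reversed_hazards(example, causes=[1, 2])
H1_at_1 = cumulative_rhr(times, h[1], 1.0)
```

When every cause is observed exactly, the first M step already returns $d_{ji}/n_i$. A failure time at which all failures are masked, as at $t = 3$ in the example, keeps whatever split the starting values impose, which shows why some unmasked information is needed to separate the causes.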
Weighted least squares estimator
We define the conditional probability of observing $g \ni j$ as the set of possible causes, given the failure of the component at time $t$ due to cause $j$, as
$$P_{gj}(t) = P[G = g \mid T = t, J = j, \delta = 1]. \tag{3.5}$$
When $j \notin g$, $P_{gj}(t) = 0$, and for fixed $j$, $\sum_{g} P_{gj}(t) = 1$. Assume that the censoring time and the missing mechanism are independent. Thus (3.5) can be written as
$$P_{gj}(t) = P[G = g \mid X = t, J = j]. \tag{3.6}$$
We now define the reversed hazard rate for failure due to cause $j$ with $g$ observed as the set of possible causes as
$$h_{gj}(t) = \lim_{\Delta t \to 0} \frac{P(t - \Delta t < X \le t, J = j, G = g \mid X \le t)}{\Delta t}, \tag{3.7}$$
so that $h_{gj}(t)$ is the product of $h_j(t)$ and $P_{gj}(t)$. Thus (3.7) becomes
$$h_{gj}(t) = P_{gj}(t)\, h_j(t).$$
Then the reversed hazard rate at time $t$ with $g$ observed as the set of possible causes can be written as
$$h_g(t) = \sum_{j \in g} h_{gj}(t) = \sum_{j \in g} P_{gj}(t)\, h_j(t). \tag{3.9}$$
Taking the summation of (3.9) over all $g$, we get
$$\sum_{g} h_g(t) = \sum_{j=1}^{k} h_j(t) = h(t).$$
Then the cumulative reversed hazard rate at time $t$ with $g$ observed as the set of possible causes is
$$H_g(t) = \int_t^{\infty} h_g(u)\,du.$$
Note that the probabilities $P_{gj}(t)$ are to be estimated in practice. To do so, we make the convenient assumption that $P_{gj}(t)$ is independent of time $t$ and henceforth denote it by $P_{gj}$. Using (3.9) we have
$$H^*_g(t) = P\, H^*(t), \tag{3.12}$$
where $H^*_g(t)$ is the $(2^k - 1) \times 1$ vector of $H_g(t)$'s, $H^*(t)$ is the $k \times 1$ vector of $H_j(t)$'s and $P$ is the $(2^k - 1) \times k$ matrix of $P_{gj}$'s.
Annex Publishers | www.annexpublishers.com Volume 1 | Issue 2
To estimate $H^*_g(t)$, we observe $n$ independent and identically distributed observations $(T_i, \delta_i, \delta_i g_i)$, $i = 1, 2, \ldots, n$, where $T_i = \max(X_i, C_i)$, $\delta_i = I(X_i = T_i)$ and $g_i$ is the set of possible causes associated with the failure of the $i$th individual. Consider the $(2^k - 1)$ dimensional counting process $\{N_g(t)\}_{g \in G}$, where $G$ contains all the non-empty subsets of $\{1, 2, \ldots, k\}$ and $N_g(t)$ represents the number of events occurring in $(t, \tau)$ with $g$ as the observed set of possible causes. Assume that the point of reference $\tau$ is far away from the time span of interest. Now we define
$$M_g(t) = N_g(t) - \int_t^{\tau} Y(u)\, h_g(u)\,du.$$
Following Andersen, et al. (2003) we show in Appendix A that the $M_g(t)$'s are local square integrable martingales [36]. Then, the nonparametric estimator of $H_g(t)$ is obtained as
$$\hat{H}_g(t) = \int_t^{\tau} \frac{C(u)}{Y(u)}\,dN_g(u), \tag{3.14}$$
where $Y(t)$ is the number of failures up to time $t$, $C(u) = I(Y(u) > 0)$ and $t_0 = \inf(t; F(t) < 1)$. The details are given in Appendix A.
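An estimator of the form (3.14) reduces to a sum of $1/Y(s)$ over the failure times $s > t$ at which the set $g$ was observed. The sketch below illustrates this under assumptions: records are hypothetical tuples `(t, delta, g)` with `g` a `frozenset`, $Y(s)$ is taken as the number of individuals with observed time at most $s$, and the accompanying variance estimate $\sum 1/Y^2(s)$ is the usual Nelson-Aalen type form, used here as an assumption.

```python
def cumulative_rhr_by_masked_set(data, t):
    """Return ({g: H_g_hat(t)}, {g: variance estimate}) for each observed set g.

    Only failures (delta == 1) at times greater than t contribute.
    """
    h_hat, var_hat = {}, {}
    for s, delta, g in data:
        if delta == 1 and s > t:
            # Y(s): individuals with observed time <= s (assumed risk set)
            y = sum(1 for u, _, _ in data if u <= s)
            h_hat[g] = h_hat.get(g, 0.0) + 1.0 / y
            var_hat[g] = var_hat.get(g, 0.0) + 1.0 / y ** 2
    return h_hat, var_hat

records = [
    (1.0, 1, frozenset({1})),
    (1.5, 0, frozenset()),
    (2.0, 1, frozenset({1, 2})),
    (3.0, 1, frozenset({1})),
]
h_hat, var_hat = cumulative_rhr_by_masked_set(records, 0.5)
```

Sets that were never observed after time `t` are simply absent from the returned dictionaries and correspond to an estimate of zero.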
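Now, using the martingale central limit theorem, we see that (3.14) converges to a Gaussian process with mean $H_g(t)$ and variance $\sigma_g^2(t)$, which is consistently estimated by
$$\hat{\sigma}_g^2(t) = \int_t^{\tau} \frac{C(u)}{Y^2(u)}\,dN_g(u).$$
Using (3.12) and (3.14) we get
$$\hat{H}^*_g(t) = P\, H^*(t) + \epsilon(t), \tag{3.16}$$
where $\hat{H}^*_g(t)$ is the vector of $\hat{H}_g(t)$'s and $\epsilon(t)$ is a vector of Gaussian martingales whose variance is consistently estimated by the matrix $\mathrm{diag}(\hat{\sigma}_g^2(t))$. Now (3.16) is in the form of a linear model with $P$ to be estimated. Let $\hat{P}$ be a consistent estimator of $P$. Then, by the principle of weighted least squares, a consistent estimator of $H^*(t)$ is
$$\hat{H}^*(t) = \left(\hat{P}' V(t) \hat{P}\right)^{-1} \hat{P}' V(t)\, \hat{H}^*_g(t),$$
where $V(t)$ is the inverse of the estimated diagonal covariance matrix of $\hat{H}^*_g(t)$. The probability $P_{gj}$ is given by
$$P_{gj} = \frac{q_{jg}\, f_g}{\sum_{g' \in G} q_{jg'}\, f_{g'}}, \tag{3.18}$$
where $G$ contains all the non-empty subsets of $\{1, 2, \ldots, k\}$, $f_g = P[G = g]$ and $q_{jg} = P[J = j \mid G = g]$. Thus $P_{gj}$ can be estimated using (3.18) from the values of $f_g$ and $q_{jg}$. Information about $q_{jg}$ may be available, which can be utilized to estimate $P_{gj}$.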
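The two ingredients of this subsection, obtaining $P_{gj}$ from $f_g$ and $q_{jg}$ by Bayes' theorem and solving the weighted least squares problem, can be sketched as follows. The code is a plain-Python illustration under assumptions: the function names are hypothetical, the weights are the diagonal entries of $V(t)$ (reciprocals of the estimated variances, which the caller must keep finite), and the normal equations are solved by Gauss-Jordan elimination.

```python
def masking_probabilities(f, q, causes):
    """P_gj = q_jg * f_g / sum_g' q_jg' * f_g' (Bayes' theorem).

    f: {g: P[G = g]}, q: {g: {j: P[J = j | G = g]}}.
    """
    p = {}
    for j in causes:
        denom = sum(q[g].get(j, 0.0) * f[g] for g in f)
        for g in f:
            p[(g, j)] = q[g].get(j, 0.0) * f[g] / denom
    return p

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def weighted_least_squares(p_matrix, weights, h_g):
    """Return (P' V P)^{-1} P' V h_g with V = diag(weights)."""
    k = len(p_matrix[0])
    normal = [[sum(w * row[a] * row[b] for row, w in zip(p_matrix, weights))
               for b in range(k)] for a in range(k)]
    rhs = [sum(w * row[a] * y for row, w, y in zip(p_matrix, weights, h_g))
           for a in range(k)]
    return solve(normal, rhs)

g1, g2, g12 = frozenset({1}), frozenset({2}), frozenset({1, 2})
f = {g1: 0.3, g2: 0.2, g12: 0.5}                        # hypothetical values
q = {g1: {1: 1.0}, g2: {2: 1.0}, g12: {1: 0.5, 2: 0.5}}
p = masking_probabilities(f, q, causes=[1, 2])

# Rows ordered as {1}, {2}, {1, 2}; columns as causes 1 and 2.
p_matrix = [[0.5, 0.0], [0.0, 0.6], [0.5, 0.4]]
h_true = [1.2, 0.7]
h_g = [sum(r * h for r, h in zip(row, h_true)) for row in p_matrix]
h_star = weighted_least_squares(p_matrix, [1.0, 2.0, 3.0], h_g)
```

When `h_g` equals $P H^*(t)$ exactly, the solution reproduces `h_true` for any positive weights; with noisy $\hat{H}_g(t)$ the weights determine how much each masked set contributes.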
For fixed $t$, $\hat{H}^*_g(t)$ converges in distribution to a $(2^k - 1)$ variate normal with mean $H^*_g(t)$. Then $\hat{H}^*(t)$ converges in distribution to a $k$ variate normal with mean $H^*(t)$, and its variance covariance matrix is estimated consistently by $(\hat{P}' V(t) \hat{P})^{-1}$. This asymptotic variance can be used for constructing large sample confidence limits.
The nonparametric estimator of $F_j(t)$ is given by
$$\hat{F}_j(t) = \int_0^t \exp\{-\hat{H}(u)\}\, d\{-\hat{H}_j(u)\},$$
where $\hat{H}(t) = \sum_{j=1}^{k} \hat{H}_j(t)$ is the estimator of the cumulative reversed hazard rate for $T$. Note that the least squares estimator of $H_j(t)$ may not be monotone and may violate the non-increasing property at some points. To obtain an estimator that is monotonically non-increasing, one can use the pooling-the-adjacent-violators algorithm.
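The monotonicity correction can be sketched with a short pooling-the-adjacent-violators routine. This is a generic implementation for a non-increasing fit, not code from the paper; the function name and the optional weights are assumptions, and the input is taken to be the estimates evaluated on an increasing grid of time points.

```python
def pava_nonincreasing(values, weights=None):
    """Weighted least squares fit of a non-increasing sequence to `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # An increase between adjacent blocks violates the ordering: pool them.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

smoothed = pava_nonincreasing([3.0, 1.0, 2.0, 0.0])
```

In the example the adjacent values 1.0 and 2.0 violate the ordering and are replaced by their average, while the remaining points are left unchanged.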
Simulation study
To assess the performance of the estimators, we carried out a simulation study. Suppose there are two causes of failure. We generate random samples from the parametric family of sub-distribution functions proposed by Dewan and Kulathinal [37].
The constant $b$ appearing in the simulation design is chosen in such a way that approximately 20% of the observations are left censored. We generate random samples of sizes $n = 100$ and $250$. The masked set $g = \{1, 2\}$ is randomly allocated to the observed lifetimes so that the chance for an observed lifetime to be masked is 0.5. We evaluated the estimates of $H_1(t)$ and $H_2(t)$ using the two methods described in the section Nonparametric estimation.
We first computed the maximum likelihood estimates of $H_j(t)$, $j = 1, 2$. Based on 1000 simulations, we computed the absolute bias and mean squared error (MSE) of the estimates for different values of the parameters $\lambda$, $a$ and $\phi$. The simulation study shows that the bias and MSE of the estimates do not vary much with different values of $\lambda$, $a$ and $\phi$. We therefore present results for two parametric combinations of $\lambda$, $a$ and $\phi$, which are given in Tables 1 and 2.
For the least squares estimates, we consider three choices of $q_{jg}$ for the masked set $g = \{1, 2\}$: (i) $q_{1g} = q_{2g} = 0.5$ (equal probabilities), (ii) $q_{1g} = 0.992$ and $q_{2g} = 0.008$ (greater probability for cause 1) and (iii) $q_{1g} = 0.008$ and $q_{2g} = 0.992$ (greater probability for cause 2). These $q_{jg}$'s are used to estimate the $P_{gj}$'s. Based on 1000 replications, we computed the absolute bias and MSE of the estimates of $H_j(t)$, $j = 1, 2$. Tables 3-5 provide the bias and MSE of the least squares estimates for different sample sizes. The simulation study shows that both bias and MSE decrease as the sample size increases and increase slightly as the censoring percentage increases. Moreover, the bias reduces as time increases. This may be due to the fact that the left tail of the observations is more affected by the censored observations. It may be noted that, in the case of the least squares estimator, the bias is small and nearly equal for $\hat{H}_1(t)$ and $\hat{H}_2(t)$ when the $q_{jg}$'s are equal. When the probability of failure due to cause 1 is larger than that of cause 2 ($q_{1g} > q_{2g}$), the bias is slightly smaller for $\hat{H}_1(t)$ than for $\hat{H}_2(t)$, as expected. Similarly, the bias is smaller for $\hat{H}_2(t)$ if the probability of failure due to cause 2 is larger than that of cause 1 ($q_{2g} > q_{1g}$).
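The two summary measures reported in the tables can be computed from the replicated estimates at a fixed time point as in the sketch below. The function name and the toy numbers are hypothetical and are not results from the simulation study.

```python
def absolute_bias_and_mse(estimates, truth):
    """Absolute bias and mean squared error of replicated estimates of `truth`."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = abs(mean_estimate - truth)                      # |E[estimate] - truth|
    mse = sum((e - truth) ** 2 for e in estimates) / n     # average squared error
    return bias, mse

# Toy replicates of an estimate whose true value is 2.0
bias, mse = absolute_bias_and_mse([1.0, 2.0, 3.0], truth=2.0)
```

In a study of this kind, `estimates` would hold the 1000 replicated values of $\hat{H}_j(t)$ at a given $t$ and `truth` the corresponding $H_j(t)$ under the simulation model.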

Data analysis
To illustrate the proposed methodology, we consider the hard drives data given in Flehinger, et al. (1998) [13]. The data provide the failure times of computer hard drives and the corresponding causes of failure. There are 3 causes of failure, denoted as 1, 2 and 3. We assume these causes act independently. Altogether, 172 failures are reported in the study period, some of which are masked. The data were obtained from a two stage experimental procedure; we combine the data of the two stages and apply the methods given in the section Nonparametric estimation. The exact cause of failure is available for 66 out of the 172 hard drives. The only observed masked groups were {1, 2, 3} and {1, 3}.
The data set is presented in the first four columns of the table given in Appendix A of Flehinger, et al. (1998) [13]. In that table, the second column represents the failure time. The third column (outcome) gives the cause of failure if it is identified correctly or resolved in Stage 2.
Here -1 in the third column indicates an unresolved problem. The fourth column gives the information about masking. We randomly select 20% of the observations to be left censored. We first consider the method of the subsection Maximum likelihood estimator and compute the maximum likelihood estimates of $H_j(t)$, $j = 1, 2, 3$.
To illustrate the method of the subsection Weighted least squares estimator for calculating the least squares estimates of $H_j(t)$, $j = 1, 2, 3$, we require the probabilities $P_{gj}$. We take these probabilities from [13]. We then compute the least squares estimates of $H_j(t)$, $j = 1, 2, 3$, and use the pooling-the-adjacent-violators algorithm to achieve monotonicity of the least squares estimates. Figure 1 depicts the plots of the maximum likelihood estimates and least squares estimates of $H_j(t)$, $j = 1, 2, 3$, along with 95% confidence limits. The plots of the estimates of the cumulative incidence functions for the three causes are given in Figures 2 and 3. Figures 2 and 3 show that, at early stages, the majority of failures are due to cause 1, and after a certain time period (around $t = 2.7$), failure due to cause 3 dominates.

Conclusion
Acknowledgement
We thank the editor and reviewer for their valuable comments and suggestions. The second author would like to thank the Department of Science and Technology, Government of India, for providing financial support for this work under the INSPIRE fellowship.