ESTIMATE gives population-based estimates of the risk of death for hormone receptor-positive (HR+) and triple-negative (TNBC) breast cancer patients based on specific characteristics. Estimates are based on data from the Surveillance, Epidemiology, and End Results (SEER) program. This tool should not be used to make risk predictions for a given individual, but rather can be used to learn about mortality risks among given subgroups. For more information about this tool please visit the About section.
Patients using this tool should consult their doctors.
Click below to explore BCSM and non-BCSM among 264,237 patients with HR+ breast cancer diagnosed 1990-2006. Mortality estimation is available up to 20 years post-diagnosis.
Click below to explore BCSM and non-BCSM among 37,293 patients with TNBC diagnosed 2010-2017. Mortality estimation is available up to 7 years post-diagnosis.
Select among the following characteristics, and specify the timeframe for estimation. Click “Calculate” when ready. This button must be used to re-estimate.
The ESTIMATE (ESTImating Mortality in breAsT cancEr) tool provides a population-based, non-parametric estimate of the cumulative risk of breast cancer-specific mortality (BCSM), non-BCSM, and all-cause mortality for women with HR+ or triple-negative (TNBC) breast cancer based on specific characteristics, and for flexible periods of time after initial diagnosis. The user interface (UI) enables selection from prognostic variables including age at diagnosis, tumor size (T), nodal status (N), and tumor grade, as well as the desired estimation timeframe and the option to include 95% confidence intervals. To define the estimation timeframe, the user inputs the number of years the patient has survived since initial diagnosis and the number of subsequent years over which to estimate risk. These inputs are interdependent: together they cannot extend beyond 20 years post-diagnosis for HR+ patients, or 7 years for TNBC patients. Based on the specified inputs, ESTIMATE subsets the full study cohort and estimates the cumulative incidence of death in the chosen subgroup. Point estimates and plots of the cumulative incidence functions for BCSM, non-BCSM, and all-cause mortality are provided, each with optional confidence intervals. Brief interpretive descriptions are also included with each set of output. This tool was developed using the shiny package in R.
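The constraint on the two timeframe inputs can be illustrated with a short sketch. This is not the tool's code (the tool is written in R with shiny); it is a minimal Python illustration, and the function name, argument names, and error messages are hypothetical. Only the 20-year and 7-year limits come from the description above.

```python
# Hypothetical helper: names are illustrative, not taken from the tool's source.
# The limits below are the follow-up horizons stated in the text.
MAX_YEARS = {"HR+": 20, "TNBC": 7}


def estimation_window(cohort, years_survived, years_ahead):
    """Return (start, end) in years since diagnosis, or raise ValueError.

    years_survived: years the patient has already survived since diagnosis
    years_ahead:    number of subsequent years over which to estimate risk
    """
    if cohort not in MAX_YEARS:
        raise ValueError(f"unknown cohort: {cohort!r}")
    if years_survived < 0 or years_ahead <= 0:
        raise ValueError("years_survived must be >= 0 and years_ahead > 0")
    end = years_survived + years_ahead
    if end > MAX_YEARS[cohort]:
        raise ValueError(
            f"window ends at year {end}, beyond the {MAX_YEARS[cohort]}-year "
            f"limit for {cohort}"
        )
    return years_survived, end
```

Under these assumptions, a request for an HR+ patient who has survived 5 years, with risk estimated over the following 10, maps to the window from year 5 to year 15; the same inputs with 5 and 5 years would be rejected for TNBC because year 10 exceeds the 7-year limit.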
Data for the HR+ cohort were obtained from the National Cancer Institute’s SEER 18-registry (1973-2016) database. The cohort includes women with a microscopically confirmed HR+ (estrogen- or progesterone receptor-positive) breast cancer diagnosed between 1990 and 2006, such that there is a minimum of 10 years of potential follow-up time. Women with any cancer diagnosed prior to the eligible HR+ breast cancer were excluded, because cause of death in this SEER database is attributed to the first cancer only. If patients with prior cancers were included, any cancer-related death would be linked to the first cancer diagnosis in SEER, regardless of whether that cancer was the true cause of death. In such a situation, if the actual cause of death was the HR+ breast cancer but attribution went to a previous cancer, the breast cancer-related death would not be captured. Restricting the cohort to patients whose first lifetime cancer diagnosis was the HR+ breast cancer of interest was therefore required in order to capture deaths caused by HR+ breast cancer. However, patients may still have had any number of subsequent cancers, and we acknowledge the potential bias this poses to accurate attribution of cause of death. Please see Limitations for additional discussion. The figure to the right diagrams selection of the HR+ study cohort.
Data for the TNBC cohort were obtained from the SEER 18-registry (2000-2017) database. The cohort includes women with a microscopically confirmed TNBC diagnosed between 2010 and 2017. Women may have had prior non-breast cancer diagnoses, but patients with breast cancer diagnosed prior to the eligible TNBC were excluded. This is because, when a patient dies of breast cancer, this SEER database attributes the cause of death to all of that patient's breast cancer diagnoses. If a patient had a breast cancer diagnosed prior to the TNBC and died of any breast cancer, attribution would therefore be given to both diagnoses. By excluding patients with breast cancer diagnoses prior to the TNBC of interest, we hope to reduce the bias caused by this fact. However, we acknowledge that potential bias remains due to inclusion of patients with breast cancer diagnoses subsequent to the TNBC of interest, as those cancers too will receive attribution in the case of any breast cancer-related death. Please see Limitations for additional discussion. The figure to the right diagrams selection of the TNBC study cohort.
BCSM is defined as the interval from initial breast cancer diagnosis to death from breast cancer, and patients still alive are censored at last follow-up. Similarly, non-BCSM and all-cause mortality are defined as the intervals from initial diagnosis to death from causes other than breast cancer, or any cause (respectively), and patients still alive are censored at last follow-up. The cumulative incidence of BCSM and non-BCSM is estimated non-parametrically via the method of Gray (1988), implemented in the cmprsk package in R. All-cause mortality is estimated by taking the complement of the Kaplan-Meier survival function, as implemented in the survival package. Although age at diagnosis is entered as a continuous value in the tool, the age entered is categorized into one of the following age groups (in years) for estimation: <40, 40-49, 50-59, 60-69, 70-79, and 80+. Such categorization is necessary so that the subgroups defined can be reasonably sized (and can capture a sufficient number of deaths to yield reasonable estimates). However, due to the non-parametric approach used here, it is still possible to specify a subgroup and estimation timeframe for which there are too few patients (and too few deaths) in SEER to yield reasonable estimates. For this reason, estimates are displayed only when there are at least 20 total deaths within the specified estimation timeframe in the subgroup of interest; when there are fewer than 20 deaths, estimates are withheld and an “Inestimable” message is shown.
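The estimation steps described above (age grouping, non-parametric cumulative incidence with two competing causes of death, and the 20-death display rule) can be sketched as follows. The tool itself relies on the cmprsk and survival packages in R; this Python version is only an illustration of a textbook non-parametric estimator of the Aalen-Johansen form, and all function names, the cause coding (0 = censored, 1 = BCSM, 2 = non-BCSM), and the choice to return None for the “Inestimable” case are assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter

AGE_CUTS = [40, 50, 60, 70, 80]  # lower bounds of the age groups after "<40"
AGE_LABELS = ["<40", "40-49", "50-59", "60-69", "70-79", "80+"]
MIN_DEATHS = 20  # display threshold stated in the text


def age_group(age):
    """Map a continuous age at diagnosis to the age group used for estimation."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


def cumulative_incidence(times, causes, horizon):
    """Cumulative incidence of each cause of death by `horizon` years.

    times:  follow-up time in years for each patient in the subgroup
    causes: 0 = censored, 1 = BCSM, 2 = non-BCSM (assumed coding)
    Returns {1: ..., 2: ...}, or None when fewer than MIN_DEATHS deaths
    occur by `horizon` (the "Inestimable" case).
    """
    deaths = sum(1 for t, c in zip(times, causes) if c != 0 and t <= horizon)
    if deaths < MIN_DEATHS:
        return None

    events = Counter()   # (time, cause) -> number of deaths
    leaving = Counter()  # time -> patients leaving the risk set (death or censoring)
    for t, c in zip(times, causes):
        leaving[t] += 1
        if c != 0:
            events[(t, c)] += 1

    at_risk = len(times)
    surv = 1.0  # all-cause survival just before the current time
    cif = {1: 0.0, 2: 0.0}
    for t in sorted(leaving):
        if t > horizon:
            break
        d1, d2 = events[(t, 1)], events[(t, 2)]
        cif[1] += surv * d1 / at_risk  # P(alive just before t) * hazard of cause 1
        cif[2] += surv * d2 / at_risk
        surv *= 1 - (d1 + d2) / at_risk  # Kaplan-Meier update for all-cause survival
        at_risk -= leaving[t]
    return cif
```

In this sketch the two cause-specific values sum to one minus the all-cause Kaplan-Meier survival at the same horizon, which matches the description of all-cause mortality as the complement of the Kaplan-Meier function. Confidence intervals are omitted for brevity.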
HR+ breast cancer accounts for up to 78% of all female breast cancers nationally (1). As a result, this cohort is large and relatively diverse with respect to the patient and tumor characteristics included in the tool, so subgroup mortality estimation is widely available. Moreover, estimation for this cohort is particularly flexible as it is available for up to 20 years post-diagnosis. Refer to the demographics table below for descriptive statistics of the HR+ cohort.
In contrast to HR+ disease, TNBC accounts for only about 10% of all female breast cancers nationally (1). As such, the TNBC sample size is considerably smaller than that of the HR+ cohort, and subgroup mortality estimation is not as widely available. TNBC sample size is also limited because SEER only began capturing HER2 status in 2010, and so TNBC status cannot be verified for patients diagnosed before then. In addition to the smaller total sample size, there is less diversity in patient and tumor characteristics in TNBC disease compared to HR+ (due to disease biology), which further limits the availability of estimates for certain subgroups. In particular, grade I and II disease is uncommon in TNBC (comprising only about 2% of this study cohort), as is a T stage outside of T1b-T2. As such, the default settings for Tumor Grade and T have been set to “Grade III/IV” and ‘T2’ – the largest subgroups for each respective characteristic. Refer to the demographics table below for descriptive statistics of the TNBC cohort.
ESTIMATE has a number of important strengths, particularly when compared to other risk tools. First, the use of a large, nationally representative database like SEER allows for greater statistical confidence in estimates, as well as enhanced generalizability. Notably, 26% of the HR+ cohort was aged ≥70 years at diagnosis (N = 68,786), which is an important group of patients who have been historically underrepresented in clinical trials (2). Another important strength is the tool’s accommodation of user-defined time periods within 20 (HR+) or 7 (TNBC) years, as opposed to providing rigid 0-20 or 0-7 year estimates. This allows for estimation of residual risks of BCSM and non-BCSM after a given amount of time a patient has already survived since initial diagnosis, which is useful for understanding how risks may change over time. Finally, the tool can be valuable in regions of the world where genomic-based prognostic tools are either unavailable or difficult to access.
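To make the idea of residual risk concrete, the sketch below shows one standard way of expressing it: the probability of dying from a given cause between year s and year s + t, among patients still alive at year s. The text states only that the tool subsets the cohort according to the inputs, so this formula is an assumption about an equivalent formulation rather than a description of the tool's code; the function name and the numbers in the example are hypothetical and are not SEER results.

```python
def residual_risk(cif_start, cif_end, all_cause_start):
    """Risk of death from one cause in (s, s + t], given survival to year s.

    cif_start:       cumulative incidence of the cause at year s
    cif_end:         cumulative incidence of the cause at year s + t
    all_cause_start: cumulative all-cause mortality at year s
    """
    alive_at_start = 1.0 - all_cause_start
    if alive_at_start <= 0:
        raise ValueError("no survivors at the start of the window")
    return (cif_end - cif_start) / alive_at_start


# Made-up illustration: 4% BCSM by year 5, 10% BCSM by year 15,
# and 10% all-cause mortality by year 5.
risk = residual_risk(cif_start=0.04, cif_end=0.10, all_cause_start=0.10)
print(f"Residual 10-year BCSM risk after surviving 5 years: {risk:.1%}")
```

Dividing by the proportion alive at year s is what distinguishes a residual estimate from simply subtracting two values read off a single 0-20 year curve.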
There are several limitations to this tool that are important to consider. First, SEER does not collect data on treatments received, so treatment is not accounted for when estimating mortality. Instead, estimates can be thought of as averaged over all of the types of treatment being utilized for the given disease type during the period for which data are available, with the caveat that the treatments used today are somewhat different from those used during some of the time periods available for estimation; this is one of the primary reasons that the tool should not be used to make predictions for given individuals. Second, there is potential bias induced by the attribution of cause of death in these SEER databases. As mentioned above, the database used to extract the HR+ cohort attributes cancer-related deaths to the first cancer diagnosis only, and the database used to extract the TNBC cohort attributes breast cancer-related deaths to all breast cancer diagnoses when multiple diagnoses are present; both methods of attribution are imperfect, and it is impossible to know with certainty that the recorded cause of death is correct. Cohort selection was tailored to mitigate these biases, but they likely remain to some extent. A third limitation is that, like treatment received, comorbidities and disease recurrences are not captured in SEER, and therefore could not be accounted for in mortality estimation even though both are likely to influence mortality. With regard to the HR+ cohort specifically, an additional limitation is that all patients were diagnosed before 2010, so their HER2 status and receipt of HER2-directed therapy are unknown. However, a prior study evaluating tumor subtypes reported that among all HR-positive breast cancers in SEER, only 13% were HER2+ and 87% were HER2- (3), a distribution that is likely to be similar in the present cohort. Moreover, HER2 is unlikely to play a large role in late recurrence given the low risk of recurrence seen in years 5 to 10 for HR+, HER2+ breast cancer (4). Finally, there are several TNBC cohort-specific limitations, including a limited amount of follow-up time to date (7 years), a modest total sample size, and especially small sample sizes for certain patient subgroups.