Introduction

This is an annotated copy of the .Rmd file that generated my slides for the talk at the Clinical Trials Conference at the University of Pennsylvania, April 17 2023. It is intended to be useful for anyone who wants to replicate the analyses done there.

Backdrop

  • “A single yes/no is the least amount of information that can be gleaned from a patient”. Charlie Odoroff
  • “A single yes/no p-value is the least amount that can be learned from a research study”.
  • Learn as much as you can

Multi-state models

  • Natural extension of the PH model
  • They help me to glean more understanding about the disease process under study
  • Disease processes are multi-factorial, the path is much more interesting than any single way station.
  • Easy to use

The colon cancer data set is found in the survival package. Each subject has 2 rows, one for time to recurrence, the other for time to death. Split this into two data sets, and then create the progression free survival time.

Colon cancer data

  • 929 patients: control, levamiole, levamisole + 5FU
  • time to recurrence and death
  • HR for progression free survival (PFS): 1, .97, .62 (.50-.78)

Make the multi-state data set. Each subject is described by a set of rows with variable of subject id, time1, time2, state, and covariates. The primary rule for such a data set is that each the rows for each subject describe a physically possible path through the states. - no overlapping intervals (can’t be two places at once) - no gaps in the time course (have to be somewhere) - if you enter a state, the time in state is >0 (no teleporting)

The tmerge routine is a tool to help create such data sets, but it is not necessary. Other than the state or “status” variable, the data has the same form as timed-dependent covariate data sets; many have their own process for creating those. The survcheck routine checks above conditions, and gives a table of the observed transitions.

Call:
survcheck(formula = Surv(time1, time2, state) ~ 1, data = cdata, 
    id = id)

Unique identifiers       Observations        Transitions 
               929               1390                915 

Transitions table:
       to
from    recur death (censored)
  (s0)    463    43        423
  recur     0   409         52
  death     0     0          0

Number of subjects with 0, 1, ... transitions to each state:
       count
state     0   1   2
  recur 466 463   0
  death 477 452   0
  (any) 423  97 409

The first table tells the observed transtions in the data, each row of the second how many subjects made 0, 1, … transitions into that state.

The statefig routine draws simple state space figures. Those who want prettier ones can employ one of the many routines for DAGs. (The annotation below is about as fancy as you can get with this routine.)

Fit the multistate model, and display the coefficients as a graph. (Fitting is easy, all the work was setting up the data set.) For those who know ggplot: feel free to make a much nicer figure.

Call:
coxph(formula = Surv(time1, time2, state) ~ lev5 + age10 + extent + 
    node4, data = cdata, id = id)

        
1:2          coef exp(coef) se(coef) robust se      z        p
  lev5   -0.52113   0.59385  0.10739   0.10935 -4.766 1.88e-06
  age10  -0.04278   0.95813  0.03915   0.04199 -1.019    0.308
  extent  0.53143   1.70137  0.11399   0.12263  4.334 1.47e-05
  node4   0.82599   2.28413  0.09671   0.10028  8.237  < 2e-16

        
1:3         coef exp(coef) se(coef) robust se     z        p
  lev5   0.08992   1.09409  0.31145   0.31975 0.281   0.7785
  age10  0.95268   2.59264  0.18940   0.18190 5.237 1.63e-07
  extent 0.47976   1.61568  0.36679   0.46724 1.027   0.3045
  node4  0.80634   2.23969  0.33753   0.35185 2.292   0.0219

        
2:3         coef exp(coef) se(coef) robust se     z        p
  lev5   0.23120   1.26011  0.11413   0.12111 1.909   0.0563
  age10  0.09618   1.10096  0.03989   0.04391 2.190   0.0285
  extent 0.09959   1.10472  0.10598   0.11439 0.871   0.3839
  node4  0.40838   1.50438  0.10455   0.10256 3.982 6.83e-05

 States:  1= (s0), 2= recur, 3= death 

Likelihood ratio test=188.9  on 12 df, p=< 2.2e-16
n= 1390, number of events= 915 

Absolute risk

  • Hazard ratios are not enough
  • \(p_k(t; x)\) = probability in state \(k\) at time \(t\)
  • \(E(N_k(t); x)\) = number of visits to state \(k\)
    • P(ever visit state \(k\))
  • E(sojourn time in each state) up to time \(t\)
  • E(sojourn, per vist)

The set of probability in state curves for a fitted multistate PH model can be generated for any target set of covariates. (Predictions are always for a hypothetical someone). Below, I dropped age to make things a bit simpler. Then there is a trick: if we exclude any observation time after recurrence (lines with td.recur==1), then the model is a competing risks model and the probability in state for recurrence will be “ever recurred”. It is an easy way to get the probability of ever recurring.

The objects csurv2 and csurv will each have the curves for 16 subjects, the number of rows in data set dummy; each curve has information on 3 states of (s0), recurrence and death. (If you don’t provide an explicit name for the initial state, and I didn’t, it uses (s0) as a generic label.) The intermediate printing below was not shown on the slides.

  data states 
    16      3 
             n nevent     rmean
1, (s0)   1390      0 6.5680067
2, (s0)   1390      0 7.0445011
3, (s0)   1390      0 5.7484689
4, (s0)   1390      0 6.4688788
5, (s0)   1390      0 4.6183596
6, (s0)   1390      0 5.6179260
7, (s0)   1390      0 3.2591642
8, (s0)   1390      0 4.4647544
9, (s0)   1390      0 5.1645521
10, (s0)  1390      0 6.0429096
11, (s0)  1390      0 3.8857981
12, (s0)  1390      0 5.0234830
13, (s0)  1390      0 2.5047500
14, (s0)  1390      0 3.7340445
15, (s0)  1390      0 1.3681868
16, (s0)  1390      0 2.3763783
1, recur  1390    463 0.4864469
2, recur  1390    463 0.2532152
3, recur  1390    463 0.6983250
4, recur  1390    463 0.3716350
5, recur  1390    463 0.9418232
6, recur  1390    463 0.5227841
7, recur  1390    463 1.1579902
8, recur  1390    463 0.6891639
9, recur  1390    463 0.6936849
10, recur 1390    463 0.3729134
11, recur 1390    463 0.8824530
12, recur 1390    463 0.5039921
13, recur 1390    463 1.0037567
14, recur 1390    463 0.6281135
15, recur 1390    463 0.9977164
16, recur 1390    463 0.6989545
1, death  1390    452 0.9455464
2, death  1390    452 0.7022837
3, death  1390    452 1.5532061
4, death  1390    452 1.1594862
5, death  1390    452 2.4398172
6, death  1390    452 1.8592899
7, death  1390    452 3.5828457
8, death  1390    452 2.8460817
9, death  1390    452 2.1417630
10, death 1390    452 1.5841771
11, death 1390    452 3.2317489
12, death 1390    452 2.4725250
13, death 1390    452 4.4914933
14, death 1390    452 3.6378420
15, death 1390    452 5.6340969
16, death 1390    452 4.9246672
[1] 0.02260495 0.11410118 0.81700753 0.04628633

Now draw the barplot without collapsing over extent of disease.