This is an annotated copy of the .Rmd file that generated my slides for the talk at the Clinical Trials Conference at the University of Pennsylvania, April 17 2023. It is intended to be useful for anyone who wants to replicate the analyses done there.

- “A single yes/no is the least amount of information that can be gleaned from a patient”. Charlie Odoroff
- “A single yes/no p-value is the least amount that can be learned from a research study”.
- Learn as much as you can

- Natural extension of the PH model
- They help me to glean more understanding about the disease process under study
- Disease processes are multi-factorial, the path is much more interesting than any single way station.
- Easy to use

The colon cancer data set is found in the survival package. Each subject has 2 rows, one for time to recurrence, the other for time to death. Split this into two data sets, and then create the progression free survival time.

- 929 patients: control, levamiole, levamisole + 5FU
- time to recurrence and death
- HR for progression free survival (PFS): 1, .97, .62 (.50-.78)

Make the multi-state data set. Each subject is described by a set of rows with variable of subject id, time1, time2, state, and covariates. The primary rule for such a data set is that each the rows for each subject describe a physically possible path through the states. - no overlapping intervals (can’t be two places at once) - no gaps in the time course (have to be somewhere) - if you enter a state, the time in state is >0 (no teleporting)

The tmerge routine is a tool to help create such data sets, but it is not necessary. Other than the state or “status” variable, the data has the same form as timed-dependent covariate data sets; many have their own process for creating those. The survcheck routine checks above conditions, and gives a table of the observed transitions.

```
Call:
survcheck(formula = Surv(time1, time2, state) ~ 1, data = cdata,
id = id)
Unique identifiers Observations Transitions
929 1390 915
Transitions table:
to
from recur death (censored)
(s0) 463 43 423
recur 0 409 52
death 0 0 0
Number of subjects with 0, 1, ... transitions to each state:
count
state 0 1 2
recur 466 463 0
death 477 452 0
(any) 423 97 409
```

The first table tells the observed transtions in the data, each row of the second how many subjects made 0, 1, … transitions into that state.

The statefig routine draws simple state space figures. Those who want prettier ones can employ one of the many routines for DAGs. (The annotation below is about as fancy as you can get with this routine.)

Fit the multistate model, and display the coefficients as a graph. (Fitting is easy, all the work was setting up the data set.) For those who know ggplot: feel free to make a much nicer figure.

```
Call:
coxph(formula = Surv(time1, time2, state) ~ lev5 + age10 + extent +
node4, data = cdata, id = id)
1:2 coef exp(coef) se(coef) robust se z p
lev5 -0.52113 0.59385 0.10739 0.10935 -4.766 1.88e-06
age10 -0.04278 0.95813 0.03915 0.04199 -1.019 0.308
extent 0.53143 1.70137 0.11399 0.12263 4.334 1.47e-05
node4 0.82599 2.28413 0.09671 0.10028 8.237 < 2e-16
1:3 coef exp(coef) se(coef) robust se z p
lev5 0.08992 1.09409 0.31145 0.31975 0.281 0.7785
age10 0.95268 2.59264 0.18940 0.18190 5.237 1.63e-07
extent 0.47976 1.61568 0.36679 0.46724 1.027 0.3045
node4 0.80634 2.23969 0.33753 0.35185 2.292 0.0219
2:3 coef exp(coef) se(coef) robust se z p
lev5 0.23120 1.26011 0.11413 0.12111 1.909 0.0563
age10 0.09618 1.10096 0.03989 0.04391 2.190 0.0285
extent 0.09959 1.10472 0.10598 0.11439 0.871 0.3839
node4 0.40838 1.50438 0.10455 0.10256 3.982 6.83e-05
States: 1= (s0), 2= recur, 3= death
Likelihood ratio test=188.9 on 12 df, p=< 2.2e-16
n= 1390, number of events= 915
```

- Hazard ratios are not enough
- \(p_k(t; x)\) = probability in state \(k\) at time \(t\)
- \(E(N_k(t); x)\) = number of visits
to state \(k\)
- P(ever visit state \(k\))

- E(sojourn time in each state) up to time \(t\)
- E(sojourn, per vist)

The set of probability in state curves for a fitted multistate PH
model can be generated for any target set of covariates. (Predictions
are always for a hypothetical *someone*). Below, I dropped age to
make things a bit simpler. Then there is a trick: if we exclude any
observation time after recurrence (lines with td.recur==1), then the
model is a competing risks model and the probability in state for
recurrence will be “ever recurred”. It is an easy way to get the
probability of ever recurring.

The objects csurv2 and csurv will each have the curves for 16 subjects, the number of rows in data set dummy; each curve has information on 3 states of (s0), recurrence and death. (If you don’t provide an explicit name for the initial state, and I didn’t, it uses (s0) as a generic label.) The intermediate printing below was not shown on the slides.

```
data states
16 3
```

```
n nevent rmean
1, (s0) 1390 0 6.5680067
2, (s0) 1390 0 7.0445011
3, (s0) 1390 0 5.7484689
4, (s0) 1390 0 6.4688788
5, (s0) 1390 0 4.6183596
6, (s0) 1390 0 5.6179260
7, (s0) 1390 0 3.2591642
8, (s0) 1390 0 4.4647544
9, (s0) 1390 0 5.1645521
10, (s0) 1390 0 6.0429096
11, (s0) 1390 0 3.8857981
12, (s0) 1390 0 5.0234830
13, (s0) 1390 0 2.5047500
14, (s0) 1390 0 3.7340445
15, (s0) 1390 0 1.3681868
16, (s0) 1390 0 2.3763783
1, recur 1390 463 0.4864469
2, recur 1390 463 0.2532152
3, recur 1390 463 0.6983250
4, recur 1390 463 0.3716350
5, recur 1390 463 0.9418232
6, recur 1390 463 0.5227841
7, recur 1390 463 1.1579902
8, recur 1390 463 0.6891639
9, recur 1390 463 0.6936849
10, recur 1390 463 0.3729134
11, recur 1390 463 0.8824530
12, recur 1390 463 0.5039921
13, recur 1390 463 1.0037567
14, recur 1390 463 0.6281135
15, recur 1390 463 0.9977164
16, recur 1390 463 0.6989545
1, death 1390 452 0.9455464
2, death 1390 452 0.7022837
3, death 1390 452 1.5532061
4, death 1390 452 1.1594862
5, death 1390 452 2.4398172
6, death 1390 452 1.8592899
7, death 1390 452 3.5828457
8, death 1390 452 2.8460817
9, death 1390 452 2.1417630
10, death 1390 452 1.5841771
11, death 1390 452 3.2317489
12, death 1390 452 2.4725250
13, death 1390 452 4.4914933
14, death 1390 452 3.6378420
15, death 1390 452 5.6340969
16, death 1390 452 4.9246672
```

`[1] 0.02260495 0.11410118 0.81700753 0.04628633`

Now draw the barplot without collapsing over extent of disease.