Survival Analysis with Cox Model Implementation using Adaptive COVID-19 Treatment Trial (ACTT) Data¶

Data Source: NIAID Clinical Trials Data Commons¶

Related study publication: https://pubmed.ncbi.nlm.nih.gov/32445440/¶

Related clinical trial: https://clinicaltrials.gov/ct2/show/NCT04280705¶

Fan Wang¶

June 20 2022¶

In this notebook, we explore clinical data from the Adaptive COVID-19 Treatment Trial (ACTT) to evaluate the clinical efficacy of remdesivir relative to the control arm in patients hospitalized with COVID-19, as assessed by time to recovery.

This notebook showcases exploratory analysis within the NIAID Clinical Trials Data Commons. The analysis is not intended to constitute advice, nor is it a substitute for professional decision making. The data used in this notebook is CONTROLLED ACCESS DATA: running the notebook requires access to the ACTT project in the NIAID Clinical Trials Data Commons, where the data can be shared securely with users who have been granted access.

Table of contents¶

  1. Study design and data description.

  2. Demographic and clinical characteristics of the patients at baseline.

  3. Kaplan–Meier estimates of cumulative recoveries.

  4. Recovery rate ratios and hazard ratios calculated from the stratified Cox model.


Study design and data description¶

1. Study design¶

ACTT-1 (Adaptive COVID-19 Treatment Trial) is an adaptive, randomized, double-blind, placebo-controlled trial to evaluate the safety and efficacy of remdesivir (200 mg loading dose on day 1, followed by 100 mg daily for up to 9 additional days) in hospitalized adults diagnosed with COVID-19. Subjects will be assessed daily while hospitalized. If the subjects are discharged from the hospital, they will have a study visit at Days 15, 22, and 29.

The primary outcome is time to recovery by Day 29. The primary analysis will include data from both severity groups using a stratified log-rank test. A key secondary outcome evaluates treatment-related improvements in the 8-point ordinal scale at Day 15. As little is known about the clinical course of COVID-19, an evaluation of the pooled (i.e., blinded to treatment assignment) proportion recovered will be used to gauge whether the targeted total number of subjects in the recovered categories of the ordinal scale will be achieved with the planned sample size. The analysis of the pilot data will be blinded, allowing for the pilot data to be included in subsequent analyses.
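The stratified log-rank test mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the trial's analysis code: the function name stratified_logrank and the toy arrays are hypothetical, and the p-value is taken from a chi-square distribution with one degree of freedom. Within each stratum the sketch accumulates observed minus expected events for one arm together with the hypergeometric variance, then pools both across strata.

```python
import math
import numpy as np


def stratified_logrank(time, event, group, stratum):
    """Two-arm stratified log-rank test (illustrative sketch).

    time    : follow-up time per subject
    event   : 1 if the event (e.g. recovery) was observed, 0 if censored
    group   : 1 for the arm of interest, 0 for the comparator
    stratum : stratum label per subject (e.g. disease severity)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    stratum = np.asarray(stratum)

    observed_minus_expected = 0.0
    variance = 0.0
    for s in np.unique(stratum):
        in_s = stratum == s
        t, e, g = time[in_s], event[in_s], group[in_s]
        for tj in np.unique(t[e]):           # distinct event times in this stratum
            at_risk = t >= tj
            n = at_risk.sum()                # subjects at risk (both arms)
            n1 = (at_risk & g).sum()         # subjects at risk in the arm of interest
            failing = e & (t == tj)
            d = failing.sum()                # events at tj (both arms)
            d1 = (failing & g).sum()         # events at tj in the arm of interest
            observed_minus_expected += d1 - d * n1 / n
            if n > 1:
                variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    chi2 = observed_minus_expected**2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: two strata, arm 1 tends to have earlier events
time = [1, 2, 4, 6, 3, 5, 7, 8, 2, 3, 5, 9, 4, 6, 8, 10]
event = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1]
group = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
stratum = ["mild"] * 8 + ["severe"] * 8
chi2, p_value = stratified_logrank(time, event, group, stratum)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

Pooling the numerator and variance before forming the statistic is what distinguishes this from running a separate test per stratum.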

2. Import packages¶

We’ll first import all the packages needed for this notebook:

  • gen3 is the Gen3 SDK and command-line client, used here to download the data.
  • numpy is the fundamental package for scientific computing in Python.
  • pandas is what we’ll use to manipulate our data.
  • matplotlib is a plotting library.
  • tableone is a package for creating summary statistics for a patient population.
  • lifelines is an open-source survival analysis library.
  • patsy builds design matrices from model formulas.
In [1]:
! pip install gen3 -U
! pip install tableone
! pip install --upgrade scipy
! pip install --upgrade lifelines
#! pip install patsy==0.5.2 --user
import zipfile
import pandas as pd
import numpy as np
import warnings
warnings.simplefilter("ignore")
%config InlineBackend.figure_format = 'svg'
%matplotlib inline
from tableone import TableOne, load_dataset
import matplotlib.pyplot as plt
import statistics
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
from lifelines.plotting import add_at_risk_counts
from patsy import dmatrices
from scipy import stats
import string
Successfully installed drsclient-0.2.1 gen3-4.10.1
Successfully installed tableone-0.7.10 tabulate-0.8.10
Successfully installed astor-0.8.1 autograd-1.4 autograd-gamma-0.5.0 formulaic-0.3.4 future-0.18.2 interface-meta-1.3.0 lifelines-0.27.1 wrapt-1.14.1

Please download credentials.json from https://accessclinicaldata.niaid.nih.gov/identity and upload it to the working directory of this notebook. The file name passed to --auth in the next cell must match the uploaded file.

In [2]:
!gen3 --endpoint accessclinicaldata.niaid.nih.gov --auth credentials.json drs-pull object dg.NACD/e77ab887-7dfc-4c3b-879f-5e2c2dc0bc5f --no-unpack-packages
In [3]:
path_to_zip_file = "ACTT1_Datasets.zip"

with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall()

Load the Dataset:

In [4]:
data = pd.read_csv("ACTT1_Datasets/ACTT_1_original/ACTT1.csv")
data.head()
Out[4]:
| | USUBJID | TRTP | AGE | SEX | RACE | ETHNIC | BMI | REGION | STRATUM | ORDSCRG | ... | CANCERFL | IMMDFL | ASTHMAFL | COMORB1 | COMORB2 | ORDSCR15 | TTRECOV | RECCNSR | TTDEATH | DTHCNSR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | COV.00701 | Placebo | 74 | M | ASIAN | NOT HISPANIC OR LATINO | NaN | North America | Severe Disease | Baseline Clinical Status Score 5 | ... | N | N | N | No Comorbidities | No Comorbidities | 1 | 5 | 0 | 26 | 1 |
| 1 | COV.00702 | Remdesivir | 80 | M | WHITE | NOT HISPANIC OR LATINO | NaN | North America | Severe Disease | Baseline Clinical Status Score 5 | ... | N | N | N | Any Comorbidities | 1 Comorbidity | 2 | 6 | 0 | 28 | 1 |
| 2 | COV.00703 | Placebo | 36 | F | WHITE | NOT HISPANIC OR LATINO | 40.7 | North America | Severe Disease | Baseline Clinical Status Score 7 | ... | N | N | Y | Any Comorbidities | 2 or more Comorbidities | 7 | 25 | 1 | 26 | 1 |
| 3 | COV.00704 | Placebo | 75 | F | WHITE | NOT HISPANIC OR LATINO | 27.6 | North America | Severe Disease | Baseline Clinical Status Score 6 | ... | N | N | N | Any Comorbidities | 1 Comorbidity | 2 | 11 | 0 | 27 | 1 |
| 4 | COV.00705 | Placebo | 62 | F | WHITE | NOT HISPANIC OR LATINO | 32.5 | North America | Severe Disease | Baseline Clinical Status Score 6 | ... | N | N | N | Any Comorbidities | 2 or more Comorbidities | 5 | 26 | 0 | 28 | 1 |

5 rows × 31 columns

3. Data description¶

Demographic characteristics:¶

  • USUBJID: Subject ID.
  • TRTP: Treatment group.
  • AGE: Age in years (grouped in the analysis below as 18 to <40; 40 to <65; ≥65).
  • SEX: Sex (Female; Male).
  • RACE: Race (White; Black/African American; Asian; Other).
  • ETHNIC: Ethnicity.
  • BMI: Body mass index.
  • REGION: Geographic region (North American sites; Asian sites; European sites).

Prior and concurrent medical conditions:¶

  • HYPFL: Hypertension.
  • CADFL: Coronary artery disease.
  • CHFFL: Congestive heart failure.
  • CRDFL: Chronic respiratory disease.
  • CORFL: Chronic oxygen requirement.
  • CLDFL: Chronic liver disease.
  • CKDFL: Chronic kidney disease.
  • DIAB1FL: Type 1 diabetes.
  • DIAB2FL: Type 2 diabetes.
  • OBESIFL: Obesity.
  • CANCERFL: Cancer.
  • IMMDFL: Immune deficiency.
  • ASTHMAFL: Asthma.
  • COMORB1: Comorbidity presence (None; Any).
  • COMORB2: Comorbidity number (None, One, Two or more).

Baseline characteristics:¶

  • STRATUM: Disease severity. "Severe disease" was defined as participants meeting one or more of the following criteria: requiring invasive or non-invasive mechanical ventilation, requiring supplemental oxygen, an SpO2 ≤ 94% on room air, or tachypnea (respiratory rate ≥ 24 breaths per minute). "Mild / moderate disease" was defined by a SpO2 > 94% and respiratory rate < 24 breaths per minute without supplemental oxygen requirement.
  • ORDSCRG: Baseline ordinal scale category (4; 5; 6; 7).
  • BDURSYMP: Duration of symptoms prior to enrollment.
  • ORDSCR15: The 8-point ordinal clinical status scale at Day 15.

Primary measures:¶

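  • TTRECOV: Time to recovery (days).
  • RECCNSR: Recovery censoring indicator (1 = censored, 0 = recovery observed).
  • TTDEATH: Time to death (days).
  • DTHCNSR: Death censoring indicator (1 = censored, 0 = death observed).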
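Survival libraries such as lifelines expect an event indicator (1 = event observed), whereas RECCNSR is a censoring flag. A minimal sketch of the conversion, using the first five TTRECOV / RECCNSR pairs from the data preview above; the column name RECOVERED is hypothetical and only used for illustration:

```python
import pandas as pd

# First five TTRECOV / RECCNSR pairs shown in the data preview
df = pd.DataFrame(
    {"TTRECOV": [5, 6, 25, 11, 26], "RECCNSR": [0, 0, 1, 0, 0]}
)

# Assumption: RECCNSR == 1 marks a censored observation, which is consistent
# with the Kaplan-Meier cells passing 1 - RECCNSR as the event indicator.
df["RECOVERED"] = 1 - df["RECCNSR"]  # hypothetical column name

n_recovered = int(df["RECOVERED"].sum())
print(df)
print("Recoveries observed:", n_recovered)
```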

Demographic and Baseline Characteristics by Treatment Group¶

Subject screening begins with a brief discussion with study staff. Some candidates are excluded based on demographic data and medical history (e.g., pregnancy, age < 18 years, or renal failure). To be eligible for the study, a patient must meet all of the required criteria. Additional details on study procedures are available in the related clinical trial record listed above.

The table below summarizes age, sex, race, ethnicity, comorbidities, and baseline clinical status score by treatment group.

In [5]:
columns = [
    "AGE",
    "SEX",
    "RACE",
    "ETHNIC",
    "COMORB2",
    "DIAB2FL",
    "HYPFL",
    "OBESIFL",
    "ORDSCRG",
]
categorical = [
    "SEX",
    "RACE",
    "ETHNIC",
    "COMORB2",
    "DIAB2FL",
    "HYPFL",
    "OBESIFL",
    "ORDSCRG",
]
groupby = ["TRTP"]
labels = {
    "AGE": "Age",
    "SEX": "Sex",
    "RACE": "Race",
    "ETHNIC": "Ethnic group",
    "COMORB2": "No. of coexisting conditions",
    "DIAB2FL": "Type 2 diabetes",
    "HYPFL": "Hypertension",
    "OBESIFL": "Obesity",
    "ORDSCRG": "Score on ordinal scale",
}
mytable = TableOne(
    data,
    columns=columns,
    categorical=categorical,
    groupby=groupby,
    rename=labels,
    pval=False,
)
mytable
Out[5]:
Grouped by TRTP

| Variable | Level | Missing | Overall | Placebo | Remdesivir |
|---|---|---|---|---|---|
| n | | | 1062 | 521 | 541 |
| Age, mean (SD) | | 0 | 58.9 (15.0) | 59.2 (15.4) | 58.6 (14.6) |
| Sex, n (%) | F | 0 | 378 (35.6) | 189 (36.3) | 189 (34.9) |
| | M | | 684 (64.4) | 332 (63.7) | 352 (65.1) |
| Race, n (%) | AMERICAN INDIAN OR ALASKA NATIVE | 0 | 7 (0.7) | 3 (0.6) | 4 (0.7) |
| | ASIAN | | 135 (12.7) | 56 (10.7) | 79 (14.6) |
| | BLACK OR AFRICAN AMERICAN | | 226 (21.3) | 117 (22.5) | 109 (20.1) |
| | MULTIPLE | | 3 (0.3) | 1 (0.2) | 2 (0.4) |
| | NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER | | 4 (0.4) | 2 (0.4) | 2 (0.4) |
| | UNKNOWN | | 121 (11.4) | 55 (10.6) | 66 (12.2) |
| | WHITE | | 566 (53.3) | 287 (55.1) | 279 (51.6) |
| Ethnic group, n (%) | HISPANIC OR LATINO | 0 | 250 (23.5) | 116 (22.3) | 134 (24.8) |
| | NOT HISPANIC OR LATINO | | 755 (71.1) | 373 (71.6) | 382 (70.6) |
| | NOT REPORTED | | 29 (2.7) | 14 (2.7) | 15 (2.8) |
| | UNKNOWN | | 28 (2.6) | 18 (3.5) | 10 (1.8) |
| No. of coexisting conditions, n (%) | 1 Comorbidity | 0 | 275 (25.9) | 137 (26.3) | 138 (25.5) |
| | 2 or more Comorbidities | | 579 (54.5) | 283 (54.3) | 296 (54.7) |
| | No Comorbidities | | 194 (18.3) | 97 (18.6) | 97 (17.9) |
| | Unknown | | 14 (1.3) | 4 (0.8) | 10 (1.8) |
| Type 2 diabetes, n (%) | N | 11 | 729 (69.4) | 361 (69.6) | 368 (69.2) |
| | Y | | 322 (30.6) | 158 (30.4) | 164 (30.8) |
| Hypertension, n (%) | N | 11 | 518 (49.3) | 255 (49.1) | 263 (49.4) |
| | Y | | 533 (50.7) | 264 (50.9) | 269 (50.6) |
| Obesity, n (%) | N | 13 | 573 (54.6) | 284 (54.8) | 289 (54.4) |
| | Y | | 476 (45.4) | 234 (45.2) | 242 (45.6) |
| Score on ordinal scale, n (%) | Baseline Clinical Status Score 4 | 11 | 138 (13.1) | 63 (12.2) | 75 (14.1) |
| | Baseline Clinical Status Score 5 | | 435 (41.4) | 203 (39.2) | 232 (43.5) |
| | Baseline Clinical Status Score 6 | | 193 (18.4) | 98 (18.9) | 95 (17.8) |
| | Baseline Clinical Status Score 7 | | 285 (27.1) | 154 (29.7) | 131 (24.6) |
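For readers who want to see what such a grouped summary involves, the same kind of statistics can be approximated with plain pandas. This is only a sketch on a hypothetical four-row frame (the values are made up for illustration and are not trial data); note that pandas' std uses the sample standard deviation (ddof=1).

```python
import pandas as pd

# Hypothetical toy frame with the same column names as the trial data
toy = pd.DataFrame(
    {
        "TRTP": ["Placebo", "Placebo", "Remdesivir", "Remdesivir"],
        "AGE": [74, 36, 80, 62],
        "SEX": ["M", "F", "M", "F"],
    }
)

# Continuous variable: mean and SD per treatment group
age_summary = toy.groupby("TRTP")["AGE"].agg(["mean", "std"])

# Categorical variable: counts and column percentages per treatment group
sex_counts = pd.crosstab(toy["SEX"], toy["TRTP"])
sex_percent = pd.crosstab(toy["SEX"], toy["TRTP"], normalize="columns") * 100

print(age_summary)
print(sex_counts)
print(sex_percent)
```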

Summary of findings:¶

  1. The mean age of the patients was 58.9 years, and 64.4% were male.

  2. On the basis of the evolving epidemiology of COVID-19 during the trial, 79.8% of patients were enrolled at sites in North America, 15.3% in Europe, and 4.9% in Asia.

  3. Overall, 53.3% of the patients were White, 21.3% were Black, 12.7% were Asian, and 12.7% were designated as other or not reported; 250 (23.5%) were Hispanic or Latino.

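  4. Most patients had either one (25.9%) or two or more (54.5%) of the prespecified coexisting conditions at enrollment, most commonly hypertension (50.2%), obesity (44.8%), and type 2 diabetes mellitus (30.3%). These three percentages are computed over all 1062 patients, so they differ slightly from the table above, which excludes patients with missing values.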

  5. A total of 957 patients (90.1%) had severe disease at enrollment; 285 patients (26.8%) met category 7 criteria on the ordinal scale, 193 (18.2%) category 6, 435 (41.0%) category 5, and 138 (13.0%) category 4. Eleven patients (1.0%) had missing ordinal scale data at enrollment.
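The percentages in item 5 use all 1062 enrolled patients as the denominator, whereas the table above reports percentages among the 1051 patients with a non-missing baseline score. A short sketch of the arithmetic, with counts copied from the table (the variable names are illustrative):

```python
# Counts by baseline ordinal score, taken from the table above
counts = {4: 138, 5: 435, 6: 193, 7: 285}

n_enrolled = 1062                      # all randomized patients
n_with_score = sum(counts.values())    # patients with a non-missing score

pct_of_enrolled = {k: round(100 * v / n_enrolled, 1) for k, v in counts.items()}
pct_of_nonmissing = {k: round(100 * v / n_with_score, 1) for k, v in counts.items()}

print("Non-missing scores:", n_with_score)
print("Percent of all enrolled:", pct_of_enrolled)
print("Percent of non-missing:", pct_of_nonmissing)
```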

Kaplan–Meier Estimates of Cumulative Recoveries¶

1. Kaplan–Meier Estimates (and 95% confidence bands) of Cumulative Recoveries in the Overall Population¶

The Kaplan–Meier estimator is a non-parametric estimate of the survival function S(t), the probability that the event of interest has not yet occurred by time t. Because the event here is recovery, we plot the cumulative incidence 1 − S(t), i.e. the estimated proportion of patients who have recovered by day t, for the remdesivir and placebo groups in the overall population.
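To make the estimator concrete, here is a minimal hand-rolled sketch on hypothetical toy data (the function name kaplan_meier_cumulative is illustrative and not part of lifelines). At each distinct event time, the survival estimate is multiplied by one minus the fraction of at-risk subjects who had the event; the cumulative proportion recovered is one minus that product.

```python
import numpy as np


def kaplan_meier_cumulative(time, event):
    """Return event times and the Kaplan-Meier cumulative incidence 1 - S(t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    event_times = np.unique(time[event])   # sorted distinct event times
    survival = 1.0
    cumulative = []
    for t in event_times:
        n_at_risk = (time >= t).sum()             # still under observation at t
        n_events = (event & (time == t)).sum()    # events occurring exactly at t
        survival *= 1.0 - n_events / n_at_risk
        cumulative.append(1.0 - survival)
    return event_times, np.array(cumulative)


# Hypothetical toy data: days to recovery, with 1 = recovered, 0 = censored
days = [1, 2, 2, 3, 4]
recovered = [1, 1, 0, 1, 0]
times, proportion_recovered = kaplan_meier_cumulative(days, recovered)
for t, p in zip(times, proportion_recovered):
    print(f"day {t:.0f}: {p:.2f} recovered")
```

Censored subjects never count as events, but they remain in the at-risk denominator until their censoring time.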

In [6]:
# RECCNSR is a censoring flag (1 = censored), so the event indicator
# passed to lifelines is 1 - RECCNSR.
is_placebo = data["TRTP"] == "Placebo"

ax = plt.subplot(111)

# Remdesivir arm
kmf_exp = KaplanMeierFitter()
kmf_exp.fit(
    data.loc[~is_placebo, "TTRECOV"],
    1 - data.loc[~is_placebo, "RECCNSR"],
    label="Remdesivir",
    timeline=range(0, 29),
).plot_cumulative_density(ax=ax)

# Placebo arm
kmf_control = KaplanMeierFitter()
kmf_control.fit(
    data.loc[is_placebo, "TTRECOV"],
    1 - data.loc[is_placebo, "RECCNSR"],
    label="Placebo",
    timeline=range(0, 29),
).plot_cumulative_density(ax=ax)

ax.set_title("Overall Kaplan-Meier Estimates of Cumulative Recoveries")
ax.set_xlabel("Days", fontsize=10)
ax.set_ylabel("Proportion Recovered", fontsize=10)
ax.xaxis.set_ticks(np.arange(0, 29, 2))
ax.set_ylim(0, 1)

# Add the at-risk table below the plot
add_at_risk_counts(kmf_exp, kmf_control, ax=ax)
Out[6]:
<AxesSubplot:title={'center':'Overall Kaplan-Meier Estimates of Cumulative Recoveries'}, xlabel='Days', ylabel='Proportion Recovered'>
[Figure: Overall Kaplan–Meier estimates of cumulative recoveries, Remdesivir vs. Placebo; x-axis: Days, y-axis: Proportion Recovered.]

2. Kaplan–Meier Estimates (and 95% confidence bands) of Cumulative Recoveries by Baseline Ordinal Scale¶

Panel A shows the estimates (and 95% confidence bands) in the population with baseline ordinal scale = 4; Panel B in those with baseline ordinal scale = 5; Panel C in those with baseline ordinal scale = 6; Panel D in those with baseline ordinal scale = 7.

In [7]:
# RECCNSR is a censoring flag (1 = censored), so the event indicator is 1 - RECCNSR.
clinical_status_list = sorted(data["ORDSCRG"].dropna().unique())
fig, axes = plt.subplots(2, 2, figsize=(9, 7), constrained_layout=True)
fig.suptitle(
    "Kaplan–Meier Estimates of Cumulative Recoveries in Patients with Various Baseline Scores",
    fontsize=12,
)
axes = axes.ravel()

for i, clinical_status in enumerate(clinical_status_list):
    ax = axes[i]
    subset = data[data["ORDSCRG"] == clinical_status]
    # Fit and plot one curve per treatment arm within this baseline category
    for arm in ["Remdesivir", "Placebo"]:
        arm_data = subset[subset["TRTP"] == arm]
        KaplanMeierFitter().fit(
            arm_data["TTRECOV"],
            1 - arm_data["RECCNSR"],
            label=arm,
            timeline=range(0, 29),
        ).plot_cumulative_density(ax=ax)
    ax.set_title(clinical_status, fontsize=10)
    ax.set_xlabel("Days", fontsize=9)
    ax.set_ylabel("Proportion Recovered", fontsize=9)
    ax.xaxis.set_ticks(np.arange(0, 29, 2))
    ax.set_ylim(0, 1)
    # Panel letter (A to D) above the top-left corner of each panel
    ax.text(
        -0.1,
        1.1,
        string.ascii_uppercase[i],
        transform=ax.transAxes,
        size=10,
        weight="bold",
    )
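[Figure: Kaplan–Meier estimates of cumulative recoveries in patients with various baseline scores, Remdesivir vs. Placebo; panels A to D correspond to baseline ordinal scores 4 to 7; x-axis: Days, y-axis: Proportion Recovered.]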

Summary of findings:¶

Patients in the remdesivir group had a shorter time to recovery than patients in the placebo group.

Stratified Cox Proportional Hazards Model to Estimate Time to Recovery According to Subgroup¶

The Cox proportional hazards model is semi-parametric: the baseline hazard function is left unspecified and may take a different value at each unique event time. This makes it more flexible than a fully parametric proportional hazards model, in which the baseline hazard must follow a chosen distribution for the survival times. The model does assume that the hazard ratio between any two covariate patterns remains constant throughout the study period. In a stratified Cox model, each stratum (here, disease severity) has its own baseline hazard, while the covariate effects are shared across strata.
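The idea that the baseline hazard never needs to be specified can be seen in a small hand-written fit. The sketch below is an illustration only, not the lifelines implementation: it maximizes the stratified partial likelihood for a single covariate by Newton-Raphson, with risk sets formed separately within each stratum. The function name fit_stratified_cox and the toy data are hypothetical, and the event argument is assumed to be 1 for an observed event and 0 for a censored observation.

```python
import numpy as np


def fit_stratified_cox(time, event, x, stratum, n_iter=25, tol=1e-10):
    """One-covariate stratified Cox fit via Newton-Raphson (Breslow ties)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    x = np.asarray(x, dtype=float)
    stratum = np.asarray(stratum)

    beta = 0.0
    for _ in range(n_iter):
        grad, hess = 0.0, 0.0
        for s in np.unique(stratum):
            in_s = stratum == s
            t, e, xs = time[in_s], event[in_s], x[in_s]
            w = np.exp(beta * xs)
            for i in np.flatnonzero(e):
                risk = t >= t[i]                  # risk set within this stratum only
                s0 = w[risk].sum()
                s1 = (w[risk] * xs[risk]).sum()
                s2 = (w[risk] * xs[risk] ** 2).sum()
                grad += xs[i] - s1 / s0           # score contribution
                hess -= s2 / s0 - (s1 / s0) ** 2  # minus the weighted variance of x
        if hess == 0:
            break
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return float(beta), float(np.exp(beta))


# Hypothetical toy data: 1 = treated, all events observed, two severity strata
time = [1, 2, 4, 6, 3, 5, 7, 8, 2, 3, 5, 9, 4, 6, 8, 10]
event = [1] * 16
treated = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
severity = ["mild"] * 8 + ["severe"] * 8
beta, rate_ratio = fit_stratified_cox(time, event, treated, severity)
print(f"coef = {beta:.3f}, exp(coef) = {rate_ratio:.3f}")
```

When the event is recovery, exp(coef) is a recovery rate ratio, so a value above 1 indicates faster recovery in the treated group. Only the ordering of event times within each stratum enters the calculation, which is why no baseline hazard has to be chosen.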

Here we calculate the hazard ratios using the stratified Cox model. Run the following cell to fit the Cox Proportional Hazards model using the lifelines package.

Data Wrangling:

In [8]:
di_race = {
    "WHITE": "White",
    "BLACK OR AFRICAN AMERICAN": "Black",
    "ASIAN": "Asian",
    "UNKNOWN": "Other",
    "AMERICAN INDIAN OR ALASKA NATIVE": "Other",
    "MULTIPLE": "Other",
    "NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER": "Other",
}
di_ethnic = {
    "HISPANIC OR LATINO": "Hispanic or Latino",
    "NOT HISPANIC OR LATINO": "Not Hispanic or Latino",
    "UNKNOWN": "Unknown",
    "NOT REPORTED": "Unknown",
}
di_sex = {"M": "Male", "F": "Female"}

# Recode the raw category labels for display
data1 = data.replace({"RACE": di_race, "ETHNIC": di_ethnic, "SEX": di_sex})

# Bin age into three groups. pd.cut closes bins on the right by default,
# so the intervals are (18, 40], (40, 65], and (65, 95].
data1["AGE"] = pd.cut(
    data1["AGE"], bins=[18, 40, 65, 95], labels=["18 to <40 yr", "40 to <65 yr", "≥65 yr"]
)

# Bin symptom duration and give missing values an explicit "Unknown" level
symp_category = pd.cut(
    data1["BDURSYMP"], bins=[0, 10, 46], labels=["≤10 days", ">10 days"]
)
data1["BDURSYMP"] = symp_category.cat.add_categories(["Unknown"]).fillna("Unknown")

# patsy builds the dummy-coded design matrix. TTRECOV and RECCNSR are listed on the
# right-hand side so that they are carried through into X for lifelines.
model_expr = (
    "TTRECOV ~ STRATUM + TTRECOV + C(REGION) + C(RACE) + C(ETHNIC) + C(AGE)"
    " + C(SEX) + C(BDURSYMP) + C(ORDSCRG) + RECCNSR"
)

y, X = dmatrices(model_expr, data1, return_type="dataframe")
# Drop the dummy columns for "Unknown" levels
X = X.drop(columns=X.filter(regex="Unknown").columns)
X.head()
Out[8]:
| | Intercept | STRATUM[T.Severe Disease] | C(REGION)[T.Europe] | C(REGION)[T.North America] | C(RACE)[T.Black] | C(RACE)[T.Other] | C(RACE)[T.White] | C(ETHNIC)[T.Not Hispanic or Latino] | C(AGE)[T.40 to <65 yr] | C(AGE)[T.≥65 yr] | C(SEX)[T.Male] | C(BDURSYMP)[T.>10 days] | C(ORDSCRG)[T.Baseline Clinical Status Score 5] | C(ORDSCRG)[T.Baseline Clinical Status Score 6] | C(ORDSCRG)[T.Baseline Clinical Status Score 7] | TTRECOV | RECCNSR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 5.0 | 0.0 |
| 1 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 6.0 | 0.0 |
| 2 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 25.0 | 1.0 |
| 3 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 11.0 | 0.0 |
| 4 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 26.0 | 0.0 |
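One detail of the age binning above is worth illustrating: pd.cut closes each bin on the right by default, so with bins=[18, 40, 65, 95] a patient aged exactly 40 lands in the "18 to <40 yr" group and a patient aged exactly 18 is left unassigned. If the labels are meant literally, a left-closed variant is one option. The sketch below uses hypothetical ages, and the open-ended upper bin is an assumption rather than the notebook's choice.

```python
import numpy as np
import pandas as pd

ages = pd.Series([18, 39, 40, 64, 65, 90])  # hypothetical boundary cases
labels = ["18 to <40 yr", "40 to <65 yr", "≥65 yr"]

# Default (right=True): intervals are (18, 40], (40, 65], (65, 95]
right_closed = pd.cut(ages, bins=[18, 40, 65, 95], labels=labels)

# Left-closed alternative: [18, 40), [40, 65), [65, inf)
left_closed = pd.cut(ages, bins=[18, 40, 65, np.inf], labels=labels, right=False)

print(pd.DataFrame({"age": ages, "right_closed": right_closed, "left_closed": left_closed}))
```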

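Create the model and fit it to the data. Note that lifelines treats event_col values of 1 as observed events, whereas RECCNSR appears to be a censoring flag (the Kaplan–Meier cells above use 1 - RECCNSR as the event indicator), and that the design matrix does not include the treatment arm (TRTP). The estimates below should therefore be read with caution and not as remdesivir-versus-placebo recovery rate ratios: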

In [9]:
warnings.simplefilter("ignore")
cph = CoxPHFitter()
cph.fit(
    df=X,
    duration_col="TTRECOV",
    event_col="RECCNSR",
    strata=["STRATUM[T.Severe Disease]"],
)
# Plot the HR
cph.plot(hazard_ratios=True)
Out[9]:
<AxesSubplot:xlabel='HR (95% CI)'>
[Figure: Hazard ratios with 95% confidence intervals for each covariate in the stratified Cox model; x-axis: HR (95% CI).]
In [10]:
# Have a look at the significance of the features
cph.print_summary()
| Field | Value |
|---|---|
| model | lifelines.CoxPHFitter |
| duration col | 'TTRECOV' |
| event col | 'RECCNSR' |
| strata | [STRATUM[T.Severe Disease]] |
| baseline estimation | breslow |
| number of observations | 1051 |
| number of events observed | 300 |
| partial log-likelihood | -1456.67 |
| time fit was run | 2022-06-27 18:27:32 UTC |

| covariate | coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% | cmp to | z | p | -log2(p) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C(REGION)[T.Europe] | -1.08 | 0.34 | 0.45 | -1.95 | -0.20 | 0.14 | 0.82 | 0.00 | -2.40 | 0.02 | 5.94 |
| C(REGION)[T.North America] | -1.01 | 0.36 | 0.41 | -1.83 | -0.20 | 0.16 | 0.82 | 0.00 | -2.44 | 0.01 | 6.10 |
| C(RACE)[T.Black] | 0.24 | 1.27 | 0.26 | -0.27 | 0.76 | 0.76 | 2.13 | 0.00 | 0.92 | 0.36 | 1.48 |
| C(RACE)[T.Other] | 0.13 | 1.14 | 0.31 | -0.48 | 0.74 | 0.62 | 2.10 | 0.00 | 0.42 | 0.67 | 0.57 |
| C(RACE)[T.White] | 0.10 | 1.10 | 0.25 | -0.39 | 0.58 | 0.68 | 1.79 | 0.00 | 0.39 | 0.70 | 0.52 |
| C(ETHNIC)[T.Not Hispanic or Latino] | -0.10 | 0.90 | 0.17 | -0.42 | 0.22 | 0.65 | 1.25 | 0.00 | -0.61 | 0.54 | 0.88 |
| C(AGE)[T.40 to <65 yr] | -0.69 | 0.50 | 0.21 | -1.09 | -0.29 | 0.34 | 0.75 | 0.00 | -3.36 | <0.005 | 10.31 |
| C(AGE)[T.≥65 yr] | -0.66 | 0.52 | 0.20 | -1.06 | -0.26 | 0.35 | 0.77 | 0.00 | -3.22 | <0.005 | 9.61 |
| C(SEX)[T.Male] | 0.10 | 1.10 | 0.13 | -0.16 | 0.35 | 0.86 | 1.42 | 0.00 | 0.75 | 0.45 | 1.15 |
| C(BDURSYMP)[T.>10 days] | 0.06 | 1.06 | 0.13 | -0.19 | 0.31 | 0.83 | 1.37 | 0.00 | 0.49 | 0.63 | 0.68 |
| C(ORDSCRG)[T.Baseline Clinical Status Score 5] | -0.10 | 0.91 | 0.60 | -1.27 | 1.07 | 0.28 | 2.92 | 0.00 | -0.17 | 0.87 | 0.20 |
| C(ORDSCRG)[T.Baseline Clinical Status Score 6] | -0.35 | 0.71 | 0.60 | -1.52 | 0.82 | 0.22 | 2.28 | 0.00 | -0.58 | 0.56 | 0.84 |
| C(ORDSCRG)[T.Baseline Clinical Status Score 7] | -0.13 | 0.88 | 0.59 | -1.29 | 1.04 | 0.27 | 2.82 | 0.00 | -0.21 | 0.83 | 0.27 |

| Statistic | Value |
|---|---|
| Concordance | 0.56 |
| Partial AIC | 2939.33 |
| log-likelihood ratio test | 19.88 on 13 df |
| -log2(p) of ll-ratio test | 3.35 |
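The exp(coef) and confidence interval columns of the summary follow directly from coef and se(coef). As a worked example, the sketch below recomputes them for the C(AGE)[T.40 to <65 yr] row from the printed (rounded) values, assuming a normal approximation with a critical value of 1.96; small differences from the printed table are expected because the inputs are rounded to two decimals.

```python
import math

# Printed (rounded) values for the C(AGE)[T.40 to <65 yr] row
coef, se = -0.69, 0.21
z_crit = 1.96  # approximate 97.5th percentile of the standard normal

hazard_ratio = math.exp(coef)            # exp(coef)
ci_low = math.exp(coef - z_crit * se)    # exp(coef) lower 95%
ci_high = math.exp(coef + z_crit * se)   # exp(coef) upper 95%
z = coef / se                            # Wald z statistic

print(f"exp(coef) = {hazard_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}, z = {z:.2f}")
```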

Summary¶

This notebook reproduces parts of the analysis from the published ACTT-1 randomized trial of remdesivir for COVID-19. In the workspace of the NIAID Clinical Trials Data Commons, researchers can apply commonly used survival analysis techniques, such as Kaplan–Meier estimation and Cox proportional hazards models with hazard ratios. Clinical investigators are encouraged to consider these methods for quantifying treatment effects in future COVID-19 studies using the NIAID Clinical Trials Data Commons.