Data Exploration of Framingham Heart Study Teaching Dataset In R

Please note: This notebook uses open access data

The following notebook was created by converting the Python code in Data Exploration of Framingham Heart Study Teaching Dataset into R. If you wish to perform any of the data visualization or manipulation done in this notebook in Python, please cite the original notebook.

Original Author: Qiong Liu

Notebook Author: Owen Dominguez

The dataset used in this notebook was developed from the longitudinal Framingham Heart Study teaching dataset. The teaching dataset includes three clinical examinations and 20-year follow-up data for a subset of the original Framingham cohort participants. It was created for teaching and training purposes, and certain measures were taken to anonymize it, so the data is unsuitable for publication. Detailed documentation on the variables can be found here.

In this notebook, we demonstrate how to pull the object file of the Framingham teaching dataset from the BioData Catalyst data commons into a BRH workspace, and how to perform data exploration and visualization equivalent to the original Python code using R packages.

Install and load the required R libraries

Pull the Framingham data file

Basic data manipulation

At the moment, graphing any value against a patient's demographic information (sex, age, education, BMI, etc.) produces a graph with too many bins to sort participants into, so we cannot draw meaningful conclusions from the data in this form. Data manipulation is therefore required.
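As an illustration of how a continuous demographic variable can be reduced to a handful of bins, here is a minimal sketch using base R's `cut()`. The toy values, the 10-year age bands, and the BMI cut points (the commonly used WHO bands) are assumptions made for this example and are not necessarily the groupings used in this notebook. The column names `AGE` and `BMI` are assumed to match the dataset.

```r
# Toy data; values are made up for illustration, not taken from the dataset.
toy <- data.frame(
  AGE = c(34, 45, 52, 61, 70),
  BMI = c(17.9, 22.4, 27.1, 31.5, 24.9)
)

# Assumed 10-year age bands; right = FALSE makes each interval [lower, upper).
toy$AGE_GROUP <- cut(
  toy$AGE,
  breaks = c(30, 40, 50, 60, 70, Inf),
  labels = c("30-39", "40-49", "50-59", "60-69", "70+"),
  right = FALSE
)

# Assumed BMI bands (common WHO cut points), also as [lower, upper) intervals.
toy$BMI_GROUP <- cut(
  toy$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right = FALSE
)

# Count participants per bin; this is the kind of summary a bar chart would show.
print(table(toy$AGE_GROUP))
print(table(toy$BMI_GROUP))
```

With the variables binned this way, each plot only has to show a few groups rather than one bar per distinct value.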

The manipulation we will do is the following:

Visualizing the manipulated data

Demographic information of FHS participants at first visit

Risk Factor Exploration

We will now explore the risk factors of the participants at different visits.

Age and BMI risk factors at different visits

For reference:

DIABP, GLUCOSE, SYSBP, and TOTCHOL risk factors at different visits
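Before plotting, it can help to summarize each risk factor per visit. The sketch below uses base R's `aggregate()` on a toy data frame. The values are made up, and the visit column name `PERIOD` is an assumption about how the examination cycle is labeled in the dataset.

```r
# Toy data; values are made up. PERIOD is assumed to be the visit number.
toy <- data.frame(
  PERIOD  = c(1, 1, 2, 2, 3, 3),
  SYSBP   = c(120, 140, 125, 150, 130, NA),
  TOTCHOL = c(190, 230, 200, 240, NA, 250)
)

# Mean of each measure per visit; na.rm = TRUE skips missing values
# separately for each column instead of dropping whole rows.
by_visit <- aggregate(
  toy[c("SYSBP", "TOTCHOL")],
  by = list(PERIOD = toy$PERIOD),
  FUN = mean,
  na.rm = TRUE
)

print(by_visit)
```

The same call works for `DIABP` and `GLUCOSE` by adding them to the column selection.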

Categorizing Risk Factors

For the next analysis we need to categorize whether a patient has each risk factor by converting its numeric value into a binary value using a threshold. For instance, we consider serum cholesterol > 200 to be a risk factor, so such a measurement is assigned a binary value of 1.

The next few blocks of code combine the binary risk factor values with the binary event data and convert the result into a format that we can graph as a heat map.
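The thresholding step can be sketched in base R as follows. Only the cholesterol cut-off (> 200) comes from the text above. The systolic blood pressure cut-off of 140, the `_RF` suffix for the new columns, and the toy values are assumptions made for illustration.

```r
# Toy data; values are made up for illustration.
toy <- data.frame(
  TOTCHOL = c(180, 200, 245, NA),
  SYSBP   = c(118, 142, 135, 160)
)

# Cholesterol threshold from the text: strictly greater than 200 counts as a risk factor.
toy$TOTCHOL_RF <- as.integer(toy$TOTCHOL > 200)

# Assumed, illustrative threshold for systolic blood pressure.
toy$SYSBP_RF <- as.integer(toy$SYSBP >= 140)

# Missing measurements stay NA rather than being silently coded as 0.
print(toy)
```

Keeping missing values as `NA` is a deliberate choice here: coding them as 0 would count an unmeasured participant as risk-free.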
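A minimal sketch of this kind of reshaping, using only base R, is shown below. The flag values are made up, and the column names (`TOTCHOL_RF`, `SYSBP_RF`, `STROKE`, `ANYCHD`) are assumptions chosen for the example. The notebook's own code may use different columns and packages.

```r
# Toy binary flags; values are made up and column names are assumptions.
flags <- data.frame(
  TOTCHOL_RF = c(1, 0, 1, 1, 0, 0),
  SYSBP_RF   = c(1, 0, 1, 0, 0, 1),
  STROKE     = c(1, 0, 0, 0, 0, 1),
  ANYCHD     = c(1, 0, 1, 1, 0, 0)
)

risk_cols  <- c("TOTCHOL_RF", "SYSBP_RF")
event_cols <- c("STROKE", "ANYCHD")

# Pearson correlation of each risk flag (rows) with each event flag (columns).
corr <- cor(flags[risk_cols], flags[event_cols], use = "pairwise.complete.obs")

# Reshape the matrix to long format: one row per (risk factor, event) pair.
corr_long <- as.data.frame(as.table(corr), responseName = "correlation")
names(corr_long)[1:2] <- c("risk_factor", "event")

print(corr_long)
```

A long data frame like `corr_long` is the shape that tile-based plotting functions such as ggplot2's `geom_tile()` expect, with one variable per axis and the correlation mapped to fill color.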

Correlation between risk and event factors

The next few blocks of code will manipulate the data into a form in which we can visualize the composition of disease events and the risk factor sum.

In the table above, the event sum is represented in columns and the risk factor sum is represented in rows.
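A table with this layout can be sketched in base R by summing the binary flags per participant and cross-tabulating the two sums. The flag values and column names below are assumptions made for illustration.

```r
# Toy binary flags; values are made up and column names are assumptions.
flags <- data.frame(
  TOTCHOL_RF = c(1, 0, 1, 1, 0),
  SYSBP_RF   = c(1, 0, 1, 0, 0),
  STROKE     = c(1, 0, 0, 0, 0),
  ANYCHD     = c(1, 0, 1, 0, 0)
)

# Number of risk factors and number of events per participant.
rf_sum    <- rowSums(flags[c("TOTCHOL_RF", "SYSBP_RF")])
event_sum <- rowSums(flags[c("STROKE", "ANYCHD")])

# Cross-tabulate: rows are risk factor sums, columns are event sums.
counts <- table(rf_sum = rf_sum, event_sum = event_sum)

print(counts)
```

Each cell counts the participants with a given number of risk factors and a given number of events, which is the composition the later plots break down.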

Risk Factors and Events

From this histogram we can see that: