Data Exploration of Framingham Heart Study Teaching Dataset

Please note: This notebook uses open-access data.

This teaching dataset was developed using the longitudinal Framingham Heart Study as the data source. It includes data from three clinical examinations and 20 years of follow-up for a subset of the original Framingham cohort participants. The dataset was created for teaching and training purposes, and certain measures were taken to anonymize it. Detailed documentation on variables can be found HERE

In this tutorial, we demonstrate how to pull the object file of the Framingham teaching dataset from the BioData Catalyst data commons into a BRH workspace, and how to explore and visualize the data using Python packages.

Import Python libraries

Pull Framingham data file
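Once the object file is in the workspace, it can be read into a dataframe. The following is a minimal sketch, assuming the file is a CSV; the helper name load_fhs, the identifier columns RANDID and PERIOD, and the two demo rows are illustrative assumptions, not part of the original notebook. A small in-memory CSV stands in for the downloaded file so the example runs on its own.

```python
import io

import pandas as pd

# Variables named in this tutorial; used here as a basic sanity check
EXPECTED = ["AGE", "BMI", "SYSBP", "DIABP", "TOTCHOL", "GLUCOSE"]


def load_fhs(source):
    """Read the teaching dataset from a CSV path or file-like object (hypothetical helper)."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df


# Stand-in for the downloaded file: two made-up rows, not real FHS records
demo_csv = io.StringIO(
    "RANDID,PERIOD,AGE,BMI,SYSBP,DIABP,TOTCHOL,GLUCOSE\n"
    "1,1,39,26.9,106,70,195,77\n"
    "1,2,45,27.1,110,72,,80\n"
)

fhs = load_fhs(demo_csv)
print(fhs.shape)         # rows and columns
print(fhs.isna().sum())  # missing values per column
```

In the workspace, the path of the pulled file would replace demo_csv.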

Data exploration

Demographic characteristics of FHS participants:
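A demographic summary of this kind could be computed as sketched below. The rows are made up for illustration, and the column names RANDID, PERIOD, and SEX, as well as the SEX coding (1 = male, 2 = female), are assumptions that should be checked against the variable documentation.

```python
import pandas as pd

# Made-up rows for illustration only; one row per participant per visit
fhs = pd.DataFrame({
    "RANDID": [1, 1, 2, 3, 3, 4],
    "PERIOD": [1, 2, 1, 1, 2, 1],
    "SEX":    [1, 1, 2, 2, 2, 1],
    "AGE":    [39, 45, 46, 61, 67, 48],
})

# Keep the first visit so each participant is counted once
baseline = fhs[fhs["PERIOD"] == 1]

# Count participants and summarize age by sex (assumed coding: 1 = male, 2 = female)
demographics = (
    baseline.assign(SEX=baseline["SEX"].map({1: "Male", 2: "Female"}))
    .groupby("SEX")["AGE"]
    .agg(n="count", mean_age="mean", min_age="min", max_age="max")
)
print(demographics)
```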

Risk factor exploration

The next block shows the distribution of several risk factor variables, including BMI and AGE, at three visits.
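Before plotting, a per-visit numeric summary gives a quick view of these distributions. This is a sketch on made-up values; it assumes the visit number is stored in a column named PERIOD.

```python
import pandas as pd

# Made-up values for illustration; None marks a missing BMI measurement
fhs = pd.DataFrame({
    "PERIOD": [1, 1, 1, 2, 2, 3, 3],
    "AGE":    [39, 46, 61, 45, 52, 51, 58],
    "BMI":    [26.9, 28.7, 30.2, 27.1, None, 27.5, 29.0],
})

# Summarize AGE and BMI separately for each visit; count ignores missing values
summary = fhs.groupby("PERIOD")[["AGE", "BMI"]].agg(["count", "median", "min", "max"])
print(summary)
```

Comparing the count column across visits also shows how many measurements are missing at each examination.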

The next block shows the distribution of several variables, including Systolic Blood Pressure (SYSBP), Diastolic Blood Pressure (DIABP), Serum Total Cholesterol (TOTCHOL), and Casual Serum Glucose (GLUCOSE), at three visits.
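One possible layout for these plots is a grid with one row per variable and one column per visit, sketched here with matplotlib on made-up values. The PERIOD column name is an assumption, and the Agg backend line is only there so the example runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only
fhs = pd.DataFrame({
    "PERIOD":  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "SYSBP":   [106, 121, 150, 110, 128, 155, 118, 132, 160],
    "DIABP":   [70, 81, 95, 72, 84, 96, 75, 85, 98],
    "TOTCHOL": [195, 250, 228, 201, 260, 232, 210, 255, 240],
    "GLUCOSE": [77, 76, 103, 80, 79, 110, 85, 82, 115],
})

variables = ["SYSBP", "DIABP", "TOTCHOL", "GLUCOSE"]
periods = sorted(fhs["PERIOD"].unique())

# One row per variable, one column per visit; rows share a y-axis for comparison
fig, axes = plt.subplots(len(variables), len(periods), figsize=(9, 10), sharey="row")
for i, var in enumerate(variables):
    for j, period in enumerate(periods):
        values = fhs.loc[fhs["PERIOD"] == period, var].dropna()
        axes[i, j].hist(values, bins=10)
        axes[i, j].set_title(f"{var}, visit {period}")
fig.tight_layout()
```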

The next block converts some of the risk factor values into binary groups using a threshold. For instance, an age over 60 is considered a risk factor.
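The thresholding step might look like the sketch below. Only the age cutoff of 60 comes from the text; the BMI and SYSBP cutoffs, the _RISK suffix, and the RISK_SUM column name are illustrative assumptions.

```python
import pandas as pd

# Made-up values for illustration only
fhs = pd.DataFrame({
    "AGE":   [39, 61, 67, 48],
    "BMI":   [26.9, 31.4, 24.0, 33.2],
    "SYSBP": [106, 150, 138, 145],
})

# Only the AGE cutoff is from the text; the other two are assumed for this sketch
thresholds = {"AGE": 60, "BMI": 30, "SYSBP": 140}

# 1 if the value exceeds its cutoff, else 0.
# Missing values compare as False, so they are counted as 0 here.
risk = pd.DataFrame(
    {f"{col}_RISK": (fhs[col] > cutoff).astype(int) for col, cutoff in thresholds.items()}
)

# Number of risk factors present for each row
risk["RISK_SUM"] = risk.sum(axis=1)
print(risk)
```

Keeping the cutoffs in one dictionary makes it easy to adjust a threshold or add another variable without touching the rest of the code.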

The next block combines the risk factor dataframe with the disease event variables and generates a correlation heatmap of these variables.
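A minimal sketch of this step, assuming the binary risk factors and the event indicators are held in two dataframes with matching row order. The values are made up, the event column names CVD and STROKE are assumptions, and plain matplotlib is used here for the heatmap as one of several possible plotting choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up binary indicators for illustration only
risk = pd.DataFrame({"AGE_RISK": [0, 1, 1, 0, 1, 0], "BMI_RISK": [0, 1, 0, 1, 1, 0]})
events = pd.DataFrame({"CVD": [0, 1, 1, 0, 1, 0], "STROKE": [0, 0, 1, 0, 0, 0]})

# Place risk factors and events side by side, then compute pairwise Pearson correlations
combined = pd.concat([risk, events], axis=1)
corr = combined.corr()

# Draw the correlation matrix as a heatmap with a fixed -1..1 color scale
fig, ax = plt.subplots(figsize=(5, 4))
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(image, ax=ax, label="Pearson r")
fig.tight_layout()
```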

The next two blocks generate a table of counts that cross-tabulates the risk factor sum variable against the event sum variable.
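A counts table like this can be built with pandas.crosstab, as in the sketch below. The column names RISK_SUM and EVENT_SUM and the values are made up for illustration.

```python
import pandas as pd

# Made-up sums for illustration: risk factors and disease events per participant
summed = pd.DataFrame({
    "RISK_SUM":  [0, 0, 1, 1, 2, 2, 2, 3],
    "EVENT_SUM": [0, 0, 0, 1, 1, 2, 0, 2],
})

# Rows: number of risk factors; columns: number of events; margins add row and column totals
counts = pd.crosstab(
    summed["RISK_SUM"], summed["EVENT_SUM"], margins=True, margins_name="Total"
)
print(counts)
```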

The next block creates a histogram showing the composition of disease events in each risk factor group.
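One way to show this composition is a stacked bar chart of row-normalized counts, sketched here on made-up values. This is an illustrative choice of plot, not necessarily the one used in the original notebook, and the column names RISK_SUM and EVENT_SUM are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sums for illustration only
summed = pd.DataFrame({
    "RISK_SUM":  [0, 0, 1, 1, 2, 2, 2, 3],
    "EVENT_SUM": [0, 0, 0, 1, 1, 2, 0, 2],
})

# Share of each event count within every risk factor group (each row sums to 1)
composition = pd.crosstab(summed["RISK_SUM"], summed["EVENT_SUM"], normalize="index")

# One bar per risk factor group, segmented by number of events
ax = composition.plot(kind="bar", stacked=True, figsize=(6, 4))
ax.set_xlabel("Number of risk factors")
ax.set_ylabel("Proportion of participants")
ax.legend(title="Number of events")
plt.tight_layout()
```

Normalizing by row makes groups of different sizes comparable; dropping normalize="index" would plot raw counts instead.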