Identification of Potential Drug Targets of Inflammatory Bowel Disease

Please note: This notebook uses open access data

Fan Wang

July 31 2022

Systematic Mendelian randomization (MR) of molecular phenotypes, such as protein levels and transcript expression levels, offers enormous potential to prioritise drug targets for further investigation. However, many genes and gene products are not easily druggable, so some potentially important causal genes may not offer an obvious route to intervention.

A parallel problem is that current GWAS of molecular phenotypes have limited sample sizes and limited protein coverage. A potential way to address both problems is to use protein-protein interaction (PPI) information to identify druggable targets that are linked to a non-druggable but robustly causal target. Their relationship to the causal target increases our confidence in their potential causal role, even if the initial evidence of effect does not pass our multiple-testing threshold.

This notebook demonstrates an approach to query data in EpiGraphDB to prioritize potential alternative drug targets in the same PPI network for Inflammatory Bowel Disease (IBD), as follows:

  • For an existing drug target of interest, we use PPI networks to search for directly interacting genes that are evidenced to be druggable.
  • We then examine the literature evidence linking these candidate genes to the disease.
  • We also examine the causal (MR) evidence for an effect of these candidate genes on the disease.
  • Finally, we query the metadata, including meta nodes, meta edges, and the overall schema.
In [1]:
import json
from pprint import pprint
from pprint import pformat
import matplotlib
import matplotlib.pyplot as plt

get_ipython().run_line_magic("config", "InlineBackend.figure_format = 'svg'")
import networkx as nx
import requests
import pandas as pd

First, we ping the API to check our connection. Here we use requests.get() to send a GET request to the /ping endpoint of the API.

In [2]:
API_URL = "https://api.epigraphdb.org"
endpoint = "/ping"
response_object = requests.get(API_URL + endpoint)
GENE_NAME = "IL23R"
OUTCOME_TRAIT = "Inflammatory bowel disease"

# Check that the ping was successful
try:
    response_object.raise_for_status()
    print("If this line gets printed, ping was successful.")
except requests.exceptions.HTTPError as err:
    print(err)
If this line gets printed, ping was successful.
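
The same request pattern (build the URL, send a GET request, check the status, parse the JSON) recurs in every later cell, so it can help to wrap it in one helper. The sketch below is an assumption rather than part of the original notebook: `query_epigraphdb` is a hypothetical name, the 30-second timeout is an arbitrary choice, and the `get` argument exists only so that the helper can be exercised without network access.

```python
API_URL = "https://api.epigraphdb.org"


def query_epigraphdb(endpoint, params=None, get=None, timeout=30):
    """Send a GET request to an EpiGraphDB endpoint and return the parsed JSON.

    `get` defaults to `requests.get`; passing another callable with the same
    interface makes the helper usable offline (hypothetical design choice).
    """
    if get is None:
        import requests  # imported lazily so a stub can replace it

        get = requests.get
    response = get(API_URL + endpoint, params=params, timeout=timeout)
    # Raises requests.exceptions.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.json()
```

With this helper, a call such as `query_epigraphdb("/gene/druggability/ppi", {"gene_name": "IL23R"})["results"]` would return the records that the next section loads into a dataframe.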

1. Using PPI networks to search for alternative drug targets

The assumption here is that the most likely alternative targets are either directly interacting with IL23R or somewhere in the PPI network. In this example, we consider only genes that were found to interact with IL23R via direct protein-protein interactions, and require that those interacting proteins should also be druggable.

Finan et al. (2017) classified thousands of genes by druggability. Tier 1 comprises targets of approved drugs and of drugs in clinical testing; confidence in druggability then decreases through Tier 2 to Tier 3.

In [3]:
def get_drug_targets_ppi(gene_name):
    endpoint = "/gene/druggability/ppi"
    url = f"{API_URL}{endpoint}"
    params = {"gene_name": gene_name}
    r = requests.get(url, params=params)
    r.raise_for_status()
    df = pd.json_normalize(r.json()["results"])
    return df


GENE_NAME = "IL23R"
OUTCOME_TRAIT = "Inflammatory bowel disease"
ppi_df = get_drug_targets_ppi(gene_name=GENE_NAME)
ppi_df
Out[3]:
g1.name p1.uniprot_id p2.uniprot_id g2.name g2.druggability_tier
0 IL23R Q5VWK5 P04141 CSF2 Tier 1
1 IL23R Q5VWK5 P01562 IFNA1 Tier 1
2 IL23R Q5VWK5 P01579 IFNG Tier 1
3 IL23R Q5VWK5 P22301 IL10 Tier 1
4 IL23R Q5VWK5 P29460 IL12B Tier 1
5 IL23R Q5VWK5 P42701 IL12RB1 Tier 1
6 IL23R Q5VWK5 P35225 IL13 Tier 1
7 IL23R Q5VWK5 P40933 IL15 Tier 1
8 IL23R Q5VWK5 Q16552 IL17A Tier 1
9 IL23R Q5VWK5 Q96PD4 IL17F Tier 1
10 IL23R Q5VWK5 P60568 IL2 Tier 1
11 IL23R Q5VWK5 Q9GZX6 IL22 Tier 1
12 IL23R Q5VWK5 Q9NPF7 IL23A Tier 1
13 IL23R Q5VWK5 P05112 IL4 Tier 1
14 IL23R Q5VWK5 P05113 IL5 Tier 1
15 IL23R Q5VWK5 P05231 IL6 Tier 1
16 IL23R Q5VWK5 P15248 IL9 Tier 1
17 IL23R Q5VWK5 P23458 JAK1 Tier 1
18 IL23R Q5VWK5 O60674 JAK2 Tier 1
19 IL23R Q5VWK5 P19838 NFKB1 Tier 1
20 IL23R Q5VWK5 P42336 PIK3CA Tier 1
21 IL23R Q5VWK5 P51449 RORC Tier 1
22 IL23R Q5VWK5 P40763 STAT3 Tier 1
23 IL23R Q5VWK5 Q969D9 TSLP Tier 1
24 IL23R Q5VWK5 P29597 TYK2 Tier 1
25 IL23R Q5VWK5 P51684 CCR6 Tier 2
26 IL23R Q5VWK5 P25963 NFKBIA Tier 2
27 IL23R Q5VWK5 Q9HC29 NOD2 Tier 2
28 IL23R Q5VWK5 P27986 PIK3R1 Tier 2
29 IL23R Q5VWK5 Q04206 RELA Tier 2
30 IL23R Q5VWK5 P42224 STAT1 Tier 2
31 IL23R Q5VWK5 P42229 STAT5A Tier 2
32 IL23R Q5VWK5 P42226 STAT6 Tier 2
33 IL23R Q5VWK5 P09919 CSF3 Tier 3A
34 IL23R Q5VWK5 Q9NZ08 ERAP1 Tier 3A
35 IL23R Q5VWK5 P29459 IL12A Tier 3A
36 IL23R Q5VWK5 Q8TAD2 IL17D Tier 3A
37 IL23R Q5VWK5 Q9UHD0 IL19 Tier 3A
38 IL23R Q5VWK5 Q9HBE4 IL21 Tier 3A
39 IL23R Q5VWK5 Q13007 IL24 Tier 3A
40 IL23R Q5VWK5 P13232 IL7 Tier 3A
41 IL23R Q5VWK5 O00421 CCRL2 Tier 3B

For further analysis we select the gene of interest (IL23R) as well as its interacting genes with Tier 1 druggability.

In [4]:
def get_gene_list(ppi_df, include_primary_gene: bool = True):
    if include_primary_gene:
        gene_list = list(ppi_df["g1.name"].drop_duplicates()) + list(
            ppi_df.query("`g2.druggability_tier` == 'Tier 1'")["g2.name"]
        )
    else:
        gene_list = list(ppi_df.query("`g2.druggability_tier` == 'Tier 1'")["g2.name"])
    return gene_list


gene_list = get_gene_list(ppi_df)
gene_list
Out[4]:
['IL23R',
 'CSF2',
 'IFNA1',
 'IFNG',
 'IL10',
 'IL12B',
 'IL12RB1',
 'IL13',
 'IL15',
 'IL17A',
 'IL17F',
 'IL2',
 'IL22',
 'IL23A',
 'IL4',
 'IL5',
 'IL6',
 'IL9',
 'JAK1',
 'JAK2',
 'NFKB1',
 'PIK3CA',
 'RORC',
 'STAT3',
 'TSLP',
 'TYK2']
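
The Tier 1 filter above is one point on a spectrum. As a sketch of how the cut-off could be relaxed, the helper below ranks the tier labels that appear in the PPI table and keeps partners up to a chosen tier. `TIER_ORDER` and `select_partners` are hypothetical names introduced here, ranking Tier 3A ahead of Tier 3B is an assumption, and the records are a small subset of the rows returned above, in the shape produced by `ppi_df.to_dict("records")`.

```python
# Tier labels as they appear in the g2.druggability_tier column, ordered from
# highest to lowest druggability confidence (3A before 3B is an assumption)
TIER_ORDER = ["Tier 1", "Tier 2", "Tier 3A", "Tier 3B"]


def select_partners(records, max_tier="Tier 1"):
    """Return partner gene names whose tier ranks at or above `max_tier`."""
    cutoff = TIER_ORDER.index(max_tier)
    return [
        row["g2.name"]
        for row in records
        if row["g2.druggability_tier"] in TIER_ORDER
        and TIER_ORDER.index(row["g2.druggability_tier"]) <= cutoff
    ]


# A few rows copied from the PPI result for IL23R
records = [
    {"g1.name": "IL23R", "g2.name": "IL12B", "g2.druggability_tier": "Tier 1"},
    {"g1.name": "IL23R", "g2.name": "JAK2", "g2.druggability_tier": "Tier 1"},
    {"g1.name": "IL23R", "g2.name": "NOD2", "g2.druggability_tier": "Tier 2"},
    {"g1.name": "IL23R", "g2.name": "IL21", "g2.druggability_tier": "Tier 3A"},
    {"g1.name": "IL23R", "g2.name": "CCRL2", "g2.druggability_tier": "Tier 3B"},
]

print(select_partners(records))                     # Tier 1 partners only
print(select_partners(records, max_tier="Tier 2"))  # Tier 1 and Tier 2 partners
```

Widening the cut-off increases the number of genes sent to the literature and MR queries, so the multiple-testing burden grows with it.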

2. Looking for literature evidence

EpiGraphDB gives fast access to a large set of literature-mined relationships that have been structured into semantic triples. These take the general form (subject, predicate, object) and were generated by SemMedDB, which applies natural language processing techniques to a large corpus of published biomedical research papers. In the following section we query the API for the literature relationships between each gene in our list, starting with IL23R (several studies have confirmed IL23R associations in independent cohorts of patients with Crohn's disease or ulcerative colitis), and the outcome trait, inflammatory bowel disease.

In [5]:
def extract_literature(outcome_trait, gene_list):
    def per_gene(gene_name):
        endpoint = "/gene/literature"
        url = f"{API_URL}{endpoint}"
        params = {"gene_name": gene_name, "object_name": outcome_trait.lower()}
        r = requests.get(url, params=params)
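        try:
            r.raise_for_status()
            res_df = pd.json_normalize(r.json()["results"])
            if len(res_df) > 0:
                # Count the supporting PubMed IDs for each triple
                res_df = res_df.assign(
                    literature_count=lambda df: df["pubmed_id"].apply(len)
                )
            return res_df
        except (requests.exceptions.RequestException, KeyError, ValueError):
            # Skip genes whose request fails or whose response cannot be parsed
            return None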

    res_df = pd.concat(
        [per_gene(gene_name=gene_name) for gene_name in gene_list]
    ).reset_index(drop=True)
    return res_df


literature_df = extract_literature(outcome_trait=OUTCOME_TRAIT, gene_list=gene_list)
literature_df
Out[5]:
pubmed_id gene.name lt.id lt.name lt.type st.predicate literature_count
0 [23131344] IL23R C0021390 Inflammatory Bowel Diseases [dsyn] PREDISPOSES 1
1 [21155887, 17484863] IL23R C0021390 Inflammatory Bowel Diseases [dsyn] NEG_ASSOCIATED_WITH 2
2 [31728561] IL23R C0021390 Inflammatory Bowel Diseases [dsyn] CAUSES 1
3 [21155887, 18383521, 18383363, 25159710, 18341... IL23R C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 21
4 [27852544] IL23R C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 1
5 [21557945, 19030026] CSF2 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
6 [17206685] CSF2 C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 1
7 [23891915] IFNA1 C0021390 Inflammatory Bowel Diseases [dsyn] TREATS 1
8 [24975266] IFNA1 C0021390 Inflammatory Bowel Diseases [dsyn] PREVENTS 1
9 [20951137, 28174758, 9836081] IFNA1 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 3
10 [19519446] IFNG C0021390 Inflammatory Bowel Diseases [dsyn] TREATS 1
11 [3139380] IFNG C0021390 Inflammatory Bowel Diseases [dsyn] CAUSES 1
12 [19740775, 18452147] IFNG C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
13 [10403730] IFNG C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 1
14 [16573780, 27917223, 19184348, 28551707, 25999... IL10 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 13
15 [27468578, 25296012] IL10 C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 2
16 [27468578] IL10 C0021390 Inflammatory Bowel Diseases [dsyn] PREDISPOSES 1
17 [11271474] IL10 C0021390 Inflammatory Bowel Diseases [dsyn] NEG_PREDISPOSES 1
18 [24519095, 29023267, 17628614] IL10 C0021390 Inflammatory Bowel Diseases [dsyn] CAUSES 3
19 [18383521, 23573954, 30541240, 22479607, 19817... IL12B C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 6
20 [11023669, 22741617] IL13 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
21 [9609761, 11023669] IL15 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
22 [30193869, 21576383] IL17A C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
23 [30193869, 21994045, 18088064, 21576383] IL17F C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 4
24 [6607860, 6237813, 1587419] IL2 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 3
25 [19201773] IL22 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 1
26 [30193869, 27029486, 18753178, 18499066] IL23A C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 4
27 [10477546] IL4 C0021390 Inflammatory Bowel Diseases [dsyn] DISRUPTS 1
28 [7806044, 8964392, 9389741] IL4 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 3
29 [15766556] IL6 C0021390 Inflammatory Bowel Diseases [dsyn] TREATS 1
30 [11204808] IL6 C0021390 Inflammatory Bowel Diseases [dsyn] NEG_ASSOCIATED_WITH 1
31 [25145003] IL6 C0021390 Inflammatory Bowel Diseases [dsyn] CAUSES 1
32 [11204808, 7683293] IL6 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
33 [24120915] IL6 C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 1
34 [29788053] IL9 C0021390 Inflammatory Bowel Diseases [dsyn] CAUSES 1
35 [28652656, 11515847] IL9 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
36 [31158699] IL9 C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 1
37 [31069840, 19817673, 20627814, 22269120] JAK2 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 4
38 [27852544] JAK2 C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 1
39 [17600378, 9882195] NFKB1 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
40 [21637825, 20004201] PIK3CA C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
41 [30006408] RORC C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 1
42 [28770550] STAT3 C0021390 Inflammatory Bowel Diseases [dsyn] CAUSES 1
43 [21733838] STAT3 C0021390 Inflammatory Bowel Diseases [dsyn] AUGMENTS 1
44 [21631466, 25132422, 28785144, 19817673, 20627... STAT3 C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 9
45 [27852544, 21994179] STAT3 C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 2
46 [27697608] TSLP C0021390 Inflammatory Bowel Diseases [dsyn] CAUSES 1
47 [21318591, 27697608] TSLP C0021390 Inflammatory Bowel Diseases [dsyn] ASSOCIATED_WITH 2
48 [26432894, 26432894] TYK2 C0021390 Inflammatory Bowel Diseases [dsyn] AFFECTS 2
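
Each row above is one semantic triple together with the list of PubMed IDs that support it, so a gene can appear several times with different predicates. A minimal sketch of collapsing these rows into one summary per gene follows. `summarise_literature` is a hypothetical helper, and the rows are copied from the CSF2, IL6 and TYK2 entries of the table. It counts distinct PubMed IDs, which matters for rows such as TYK2 where the same ID is listed twice.

```python
from collections import defaultdict


def summarise_literature(rows):
    """Collapse triple-level rows into per-gene paper counts and predicates."""
    papers = defaultdict(set)
    predicates = defaultdict(set)
    for row in rows:
        gene = row["gene.name"]
        papers[gene].update(row["pubmed_id"])
        predicates[gene].add(row["st.predicate"])
    return {
        gene: {
            "n_papers": len(papers[gene]),
            "predicates": sorted(predicates[gene]),
        }
        for gene in papers
    }


# Rows copied from the literature table above
rows = [
    {"gene.name": "CSF2", "st.predicate": "ASSOCIATED_WITH", "pubmed_id": [21557945, 19030026]},
    {"gene.name": "CSF2", "st.predicate": "AFFECTS", "pubmed_id": [17206685]},
    {"gene.name": "IL6", "st.predicate": "TREATS", "pubmed_id": [15766556]},
    {"gene.name": "IL6", "st.predicate": "NEG_ASSOCIATED_WITH", "pubmed_id": [11204808]},
    {"gene.name": "IL6", "st.predicate": "CAUSES", "pubmed_id": [25145003]},
    {"gene.name": "IL6", "st.predicate": "ASSOCIATED_WITH", "pubmed_id": [11204808, 7683293]},
    {"gene.name": "IL6", "st.predicate": "AFFECTS", "pubmed_id": [24120915]},
    {"gene.name": "TYK2", "st.predicate": "AFFECTS", "pubmed_id": [26432894, 26432894]},
]

summary = summarise_literature(rows)
for gene, info in summary.items():
    print(gene, info["n_papers"], info["predicates"])
```

A gene with both TREATS and CAUSES predicates, as IL6 has here, is a reminder that literature-mined triples record what papers assert and can point in opposite directions.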

3. Using Mendelian randomization results for causal effect estimation

The next step is to find out whether any of these genes has a comparable and statistically plausible effect on IBD, using single-SNP MR estimates from both pQTL and eQTL instruments.

In [6]:
def extract_mr(outcome_trait, gene_list, qtl_type):
    endpoint = "/xqtl/single-snp-mr"
    url = f"{API_URL}{endpoint}"

    def per_gene(gene_name):
        params = {
            "exposure_gene": gene_name,
            "outcome_trait": outcome_trait,
            "qtl_type": qtl_type,
            "pval_threshold": 1e-5,
        }
        r = requests.get(url, params=params)
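        try:
            r.raise_for_status()
            df = pd.json_normalize(r.json()["results"])
            return df
        except (requests.exceptions.RequestException, KeyError, ValueError):
            # Skip genes whose request fails or whose response cannot be parsed
            return None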

    res_df = pd.concat(
        [per_gene(gene_name=gene_name) for gene_name in gene_list]
    ).reset_index(drop=True)
    return res_df


# Search for both pqtl and eqtl
xqtl_df = pd.concat(
    [
        extract_mr(
            outcome_trait=OUTCOME_TRAIT, gene_list=gene_list, qtl_type=qtl_type
        ).assign(qtl_type=qtl_type)
        for qtl_type in ["pQTL", "eQTL"]
    ]
).reset_index(drop=True)
xqtl_df
Out[6]:
gene.ensembl_id gene.name gwas.id gwas.trait r.beta r.se r.p r.rsid qtl_type
0 ENSG00000162594 IL23R ieu-a-294 Inflammatory bowel disease 1.500821 0.054592 2.212578e-166 rs11581607 pQTL
1 ENSG00000113302 IL12B ieu-a-294 Inflammatory bowel disease 0.417605 0.034490 9.590000e-34 rs4921484 pQTL
2 ENSG00000162594 IL23R ieu-a-294 Inflammatory bowel disease 0.886712 0.064420 4.165652e-43 rs2064689 eQTL
3 ENSG00000164136 IL15 ieu-a-294 Inflammatory bowel disease -1.421625 0.197131 5.530616e-13 rs75301646 eQTL
4 ENSG00000113520 IL4 ieu-a-294 Inflammatory bowel disease 0.459848 0.084050 4.471537e-08 rs2070874 eQTL
5 ENSG00000096968 JAK2 ieu-a-294 Inflammatory bowel disease -1.896710 0.203808 1.322967e-20 rs4788084 eQTL
6 ENSG00000109320 NFKB1 ieu-a-294 Inflammatory bowel disease 0.973556 0.173893 2.160849e-08 rs4766578 eQTL
7 ENSG00000143365 RORC ieu-a-294 Inflammatory bowel disease -0.994991 0.116343 1.207271e-17 rs4845604 eQTL
8 ENSG00000168610 STAT3 ieu-a-294 Inflammatory bowel disease 0.597473 0.075700 2.958269e-15 rs1053004 eQTL

4. Query metadata

Here we query the metadata using the GET /meta/schema endpoint; the response is used for downstream processing.

In [7]:
endpoint = "/meta/schema"
params = {"graphviz": False, "plot": False}
r = requests.get(f"{API_URL}{endpoint}", params=params)
r.raise_for_status()
metadata = r.json()

# Preview of metadata information
keys = metadata.keys()
print(pformat(keys), "\n")
for key in list(keys):
    print(f"# {key}:")
    print(pformat(metadata[key])[:1000], "\n")
dict_keys(['nodes', 'edges', 'connections']) 

# nodes:
{'Disease': {'count': 38960,
             'properties': {'_id': {'indexed': True,
                                    'type': 'STRING',
                                    'unique': False},
                            '_name': {'indexed': True,
                                      'type': 'STRING',
                                      'unique': False},
                            '_source': {'indexed': False,
                                        'type': 'LIST',
                                        'unique': False},
                            'definition': {'indexed': False,
                                           'type': 'STRING',
                                           'unique': False},
                            'doid': {'indexed': True,
                                     'type': 'LIST',
                                     'unique': False},
                            'efo': {'indexed': False,
                                    'type': 'LIST',
                     

# edges:
{'BIORXIV_OBJ': {'count': 32651,
                 'properties': {'_source': {'array': True, 'type': 'LIST'}}},
 'BIORXIV_PREDICATE': {'count': 32648,
                       'properties': {'_source': {'array': True,
                                                  'type': 'LIST'},
                                      'count': {'array': False,
                                                'type': 'INTEGER'},
                                      'predicate': {'array': False,
                                                    'type': 'STRING'}}},
 'BIORXIV_SUB': {'count': 32657,
                 'properties': {'_source': {'array': True, 'type': 'LIST'}}},
 'BIORXIV_TO_LIT': {'count': 35211,
                    'properties': {'_source': {'array': True, 'type': 'LIST'}}},
 'CPIC': {'count': 375,
          'properties': {'_source': {'array': True, 'type': 'LIST'},
                         'cpic_level': {'array': False, 'type': 'STRING'},
                         'guideline': {'array': F 

# connections:
[{'count': 2461,
  'from_node': 'Drug',
  'rel': 'OPENTARGETS_DRUG_TO_DISEASE',
  'to_node': 'Disease'},
 {'count': 5763,
  'from_node': 'Gene',
  'rel': 'GENE_TO_DISEASE',
  'to_node': 'Disease'},
 {'count': 8247,
  'from_node': 'Disease',
  'rel': 'MONDO_MAP_UMLS',
  'to_node': 'LiteratureTerm'},
 {'count': 2819,
  'from_node': 'Disease',
  'rel': 'MONDO_MAP_EFO',
  'to_node': 'Efo'},
 {'count': 2463,
  'from_node': 'Pathway',
  'rel': 'PATHWAY_CHILD_OF',
  'to_node': 'Pathway'},
 {'count': 121873,
  'from_node': 'Protein',
  'rel': 'PROTEIN_IN_PATHWAY',
  'to_node': 'Pathway'},
 {'count': 1969,
  'from_node': 'LiteratureTriple',
  'rel': 'MEDRXIV_SUB',
  'to_node': 'LiteratureTerm'},
 {'count': 5584547,
  'from_node': 'LiteratureTerm',
  'rel': 'SEMMEDDB_PREDICATE',
  'to_node': 'LiteratureTerm'},
 {'count': 5584547,
  'from_node': 'LiteratureTriple',
  'rel': 'SEMMEDDB_SUB',
  'to_node': 'LiteratureTerm'},
 {'count': 5556,
  'from_node': 'Gwas',
  'rel': 'METAMAP_LITE',
  'to_node' 

We can extract the specific meta node information as a pandas dataframe from the metadata.

In [8]:
meta_node_df = pd.DataFrame.from_dict(metadata["nodes"], orient="index")

(
    meta_node_df.sort_index().assign(
        count=lambda df: df["count"].apply(lambda x: f"{x:,}")
    )
)
Out[8]:
count properties
Disease 38,960 {'_name': {'type': 'STRING', 'indexed': True, ...
Drug 2,697 {'molecule_type': {'type': 'STRING', 'indexed'...
Efo 25,390 {'_name': {'type': 'STRING', 'indexed': True, ...
Gene 57,737 {'druggability_tier': {'type': 'STRING', 'inde...
Gwas 34,494 {'note': {'type': 'STRING', 'indexed': False, ...
Literature 3,995,672 {'issn': {'type': 'STRING', 'indexed': False, ...
LiteratureTerm 108,905 {'_name': {'type': 'STRING', 'indexed': True, ...
LiteratureTriple 5,609,945 {'subject_id': {'type': 'STRING', 'indexed': T...
Pathway 2,441 {'_name': {'type': 'STRING', 'indexed': True, ...
Protein 20,280 {'name': {'type': 'STRING', 'indexed': True, '...
Tissue 54 {'name': {'type': 'STRING', 'indexed': True, '...
Variant 99,005 {'ref': {'type': 'STRING', 'indexed': False, '...

We can also extract the meta relationship (edge) information, and the connections.

In [9]:
meta_rel_df = pd.DataFrame.from_dict(metadata["edges"], orient="index").merge(
    pd.DataFrame.from_dict(
        {_["rel"]: _ for _ in metadata["connections"]}, orient="index"
    )[["from_node", "to_node"]],
    left_index=True,
    right_index=True,
)

(
    meta_rel_df.sort_values(by=["from_node", "to_node"]).assign(
        count=lambda df: df["count"].apply(lambda x: f"{x:,}")
    )
)
Out[9]:
count properties from_node to_node
MONDO_MAP_EFO 2,819 {'_source': {'array': False, 'type': 'STRING'}} Disease Efo
MONDO_MAP_UMLS 8,247 {'_source': {'array': False, 'type': 'STRING'}} Disease LiteratureTerm
OPENTARGETS_DRUG_TO_DISEASE 2,461 {'_source': {'array': True, 'type': 'LIST'}} Drug Disease
CPIC 375 {'pharmgkb_level_of_evidence': {'array': False... Drug Gene
OPENTARGETS_DRUG_TO_TARGET 6,534 {'phase': {'array': False, 'type': 'STRING'}, ... Drug Gene
EFO_CHILD_OF 43,132 {'_source': {'array': True, 'type': 'LIST'}} Efo Efo
GENE_TO_DISEASE 5,763 {'last_updated': {'array': False, 'type': 'STR... Gene Disease
XQTL_MULTI_SNP_MR 3,015,233 {'p': {'array': False, 'type': 'FLOAT'}, 'se':... Gene Gwas
XQTL_SINGLE_SNP_MR_GENE_GWAS 8,449,779 {'p': {'array': False, 'type': 'FLOAT'}, 'se':... Gene Gwas
GENE_TO_PROTEIN 19,142 {'_source': {'array': True, 'type': 'LIST'}} Gene Protein
EXPRESSED_IN 2,918,240 {'tpm': {'array': False, 'type': 'FLOAT'}, '_s... Gene Tissue
GWAS_NLP_EFO 12,302 {'score': {'array': False, 'type': 'FLOAT'}, '... Gwas Efo
GWAS_EFO_EBI 281 {'_source': {'array': True, 'type': 'LIST'}} Gwas Efo
PRS 118,124 {'p': {'array': False, 'type': 'FLOAT'}, 'r2':... Gwas Gwas
MR_EVE_MR 25,804,945 {'b': {'array': False, 'type': 'FLOAT'}, 'se':... Gwas Gwas
GEN_COR 840,960 {'h2_intercept_SE': {'array': False, 'type': '... Gwas Gwas
OBS_COR 17,932 {'_source': {'array': True, 'type': 'LIST'}, '... Gwas Gwas
GWAS_NLP 89,239,773 {'score': {'array': False, 'type': 'FLOAT'}, '... Gwas Gwas
GWAS_TO_LITERATURE 28,111,669 {'_source': {'array': True, 'type': 'LIST'}} Gwas Literature
METAMAP_LITE 5,556 {'_source': {'array': True, 'type': 'LIST'}, '... Gwas LiteratureTerm
GWAS_TO_LITERATURE_TRIPLE 17,531,153 {'pval': {'array': False, 'type': 'FLOAT'}, 'g... Gwas LiteratureTriple
OPENGWAS_TOPHITS 160,283 {'_source': {'array': True, 'type': 'LIST'}, '... Gwas Variant
GWAS_TO_VARIANT 26,436 {'se': {'array': False, 'type': 'FLOAT'}, 'nca... Gwas Variant
TERM_TO_GENE 16,435 {'_source': {'array': False, 'type': 'STRING'}} LiteratureTerm Gene
SEMMEDDB_PREDICATE 5,584,547 {'count': {'array': False, 'type': 'INTEGER'},... LiteratureTerm LiteratureTerm
BIORXIV_PREDICATE 32,648 {'count': {'array': False, 'type': 'INTEGER'},... LiteratureTerm LiteratureTerm
MEDRXIV_PREDICATE 1,969 {'count': {'array': False, 'type': 'INTEGER'},... LiteratureTerm LiteratureTerm
BIORXIV_TO_LIT 35,211 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple Literature
SEMMEDDB_TO_LIT 10,589,785 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple Literature
MEDRXIV_TO_LIT 2,091 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple Literature
MEDRXIV_SUB 1,969 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple LiteratureTerm
SEMMEDDB_SUB 5,584,547 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple LiteratureTerm
MEDRXIV_OBJ 1,969 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple LiteratureTerm
BIORXIV_OBJ 32,651 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple LiteratureTerm
BIORXIV_SUB 32,657 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple LiteratureTerm
SEMMEDDB_OBJ 5,584,547 {'_source': {'array': True, 'type': 'LIST'}} LiteratureTriple LiteratureTerm
PATHWAY_CHILD_OF 2,463 {'_source': {'array': True, 'type': 'LIST'}} Pathway Pathway
PROTEIN_IN_PATHWAY 121,873 {'_source': {'array': True, 'type': 'LIST'}} Protein Pathway
STRING_INTERACT_WITH 827,184 {'score': {'array': False, 'type': 'FLOAT'}, '... Protein Protein
VARIANT_TO_GENE 108,561 {'amino_acids': {'array': False, 'type': 'STRI... Variant Gene
XQTL_SINGLE_SNP_MR_SNP_GENE 41,564 {'_source': {'array': True, 'type': 'LIST'}} Variant Gene

We can generate a network diagram of the graph db schema using networkx.

In [10]:
# Build an undirected graph whose vertices are meta nodes and whose edges are
# meta relationships
graph = nx.from_pandas_edgelist(meta_rel_df, source="from_node", target="to_node")
f = plt.figure(figsize=(10, 10))
f.tight_layout()
plt.subplot(1, 1, 1)
nx.draw(
    G=graph,
    with_labels=True,
    node_size=3000,
    edgecolors="gray",
    node_color="skyblue",
    font_size=10,
    font_weight="bold",
    width=0.75,
)
[Figure: network diagram of the EpiGraphDB schema, with meta nodes as vertices and meta relationships as edges.]
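
Beyond drawing the schema, the same edge list can answer how two meta nodes are connected. The sketch below uses only the standard library (networkx offers `nx.shortest_path` for the same purpose), treats the schema as undirected as the plot does, and works on a subset of the (from_node, to_node) pairs listed in the relationship table. `schema_path` is a hypothetical helper.

```python
from collections import deque


def schema_path(edges, start, goal):
    """Breadth-first search for a shortest undirected path between meta nodes."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted for a deterministic visiting order
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route between the two nodes in this edge list


# A subset of (from_node, to_node) pairs from the relationship table
edges = [
    ("Drug", "Disease"),
    ("Drug", "Gene"),
    ("Gene", "Disease"),
    ("Gene", "Gwas"),
    ("Gene", "Protein"),
    ("Gene", "Tissue"),
    ("Protein", "Pathway"),
    ("Gwas", "Variant"),
    ("Variant", "Gene"),
]

print(schema_path(edges, "Drug", "Pathway"))
```

A path such as Drug, Gene, Protein, Pathway indicates which chain of relationships a query would need to traverse to connect drugs to pathways.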

References

  • Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A, et al. 2006. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 314:1461–1463.

  • Finan C, Gaulton A, Kruger FA, Lumbers RT, Shah T, Engmann J, Galver L, Kelley R, Karlsson A, Santos R, et al. 2017. The druggable genome and support for target identification and validation in drug development. Science Translational Medicine 9:eaag1166.

  • Momozawa Y, Mni M, Nakamura K, Coppieters W, Almer S, Amininejad L, Cleynen I, Colombel J-F, De Rijk P, Dewit O, et al. 2011. Resequencing of positional candidates identifies low frequency IL23R coding variants protecting against inflammatory bowel disease. Nature Genetics 43:43–47.

  • Zheng J, Brumpton BM, Bronson PG, Liu Y, Haycock P, Elsworth B, Haberland V, Baird D, Walker V, Robinson JW, John S, Prins B, Runz H, Nelson MR, Hurle M, Hemani G, Asvold BO, Butterworth A, Smith GD, Scott RA, Gaunt TR. 2019. Systematic Mendelian randomization and colocalization analyses of the plasma proteome and blood transcriptome to prioritize drug targets for complex disease.
