Identification of Potential Drug Targets of Inflammatory Bowel Disease
Please note: This notebook uses open access data
Fan Wang
July 31 2022
Systematic Mendelian randomization (MR) of molecular phenotypes, such as protein and transcript expression levels, offers enormous potential to prioritise drug targets for further investigation. However, many genes and gene products are not easily druggable, so some potentially important causal genes may not offer an obvious route to intervention.
A parallel problem is that current GWAS of molecular phenotypes have limited sample sizes and limited protein coverage. A potential way to address both problems is to use protein-protein interaction (PPI) information to identify druggable targets that are linked to a non-druggable but robustly causal target. Their relationship to the causal target increases our confidence in their potential causal role, even if the initial evidence of effect falls short of our multiple-testing threshold.
This notebook demonstrates an approach to query data in EpiGraphDB to prioritize potential alternative drug targets in the same PPI network for Inflammatory Bowel Disease (IBD), as follows:
- For an existing drug target of interest, we use PPI networks to search for its directly interacting genes that are evidenced to be druggable.
- We then examine the causal evidence of these candidate genes on the disease.
- We also examine the literature evidence of these candidate genes on the disease.
- Then we query the metadata including meta nodes and meta edges and the overall schema.
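The first three steps above amount to intersecting three evidence sets. A minimal sketch of that idea follows, using a handful of gene names copied by hand from the outputs shown later in this notebook rather than live API calls. The function name `prioritise_candidates` and its arguments are illustrative assumptions and are not part of EpiGraphDB.

```python
def prioritise_candidates(primary_gene, druggable, mr_supported, literature_supported):
    """Return druggable interactors backed by both MR and literature evidence."""
    candidates = set(druggable) & set(mr_supported) & set(literature_supported)
    # The existing target itself is not an "alternative" target.
    candidates.discard(primary_gene)
    return sorted(candidates)


# Illustrative subsets, copied by hand from the tables in this notebook.
druggable = ["IL23R", "IL12B", "IL15", "JAK2", "TYK2", "IL22"]
mr_supported = ["IL23R", "IL12B", "IL15", "JAK2"]
literature_supported = ["IL23R", "IL12B", "IL15", "JAK2", "TYK2", "IL22"]

shortlist = prioritise_candidates(
    "IL23R", druggable, mr_supported, literature_supported
)
print(shortlist)
```

In this toy input, TYK2 and IL22 drop out because they have no MR result passing the threshold, even though both are Tier 1 and appear in the literature.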
import json
from pprint import pprint
from pprint import pformat
import matplotlib
import matplotlib.pyplot as plt
get_ipython().run_line_magic("config", "InlineBackend.figure_format = 'svg'")
import networkx as nx
import requests
import pandas as pd
First, we will ping the API to check our connection. Here we use the .get() method to send a GET request to the /ping endpoint of the API.
API_URL = "https://api.epigraphdb.org"
endpoint = "/ping"
response_object = requests.get(API_URL + endpoint)
GENE_NAME = "IL23R"
OUTCOME_TRAIT = "Inflammatory bowel disease"
# Check that the ping was successful
try:
    response_object.raise_for_status()
    print("If this line gets printed, ping was successful.")
except requests.exceptions.HTTPError as err:
    print(err)
If this line gets printed, ping was successful.
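Every query in this notebook follows the same pattern as the ping: build the URL, send a GET request, raise on HTTP errors, and parse the JSON. The sketch below wraps that pattern in one helper and adds a request timeout, which the calls in this notebook do not set. The helper name `epigraphdb_get` and the injectable `get` argument are assumptions made for illustration; the fake transport exists only so the example runs without network access.

```python
def epigraphdb_get(endpoint, params=None, api_url="https://api.epigraphdb.org",
                   timeout=30, get=None):
    """Send a GET request and return the parsed JSON payload.

    `get` is an optional stand-in for `requests.get` (hypothetical hook for
    offline testing); when omitted, the real `requests.get` is used.
    """
    if get is None:
        import requests  # imported lazily so the offline demo needs no dependency
        get = requests.get
    response = get(f"{api_url}{endpoint}", params=params or {}, timeout=timeout)
    response.raise_for_status()
    return response.json()


class _FakeResponse:
    """Minimal object mimicking the parts of a response used above."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # the fake never fails

    def json(self):
        return self._payload


def _fake_get(url, params=None, timeout=None):
    # Echo the request back so we can inspect what would have been sent.
    return _FakeResponse({"url": url, "params": params, "results": []})


payload = epigraphdb_get(
    "/gene/druggability/ppi", {"gene_name": "IL23R"}, get=_fake_get
)
print(payload["url"])
```

Against the live API the call would simply omit `get=_fake_get`; data endpoints used in this notebook return their rows under the `"results"` key.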
1. Using PPI networks for alternative drug targets search
The assumption here is that the most likely alternative targets are either directly interacting with IL23R or somewhere in the PPI network. In this example, we consider only genes that were found to interact with IL23R via direct protein-protein interactions, and require that those interacting proteins should also be druggable.
Finan et al. (2017) classified thousands of genes by druggability. Tier 1 covers targets of approved drugs and of drugs in clinical testing; druggability confidence then decreases through Tier 2 and Tier 3.
def get_drug_targets_ppi(gene_name):
    endpoint = "/gene/druggability/ppi"
    url = f"{API_URL}{endpoint}"
    params = {"gene_name": gene_name}
    r = requests.get(url, params=params)
    r.raise_for_status()
    df = pd.json_normalize(r.json()["results"])
    return df
GENE_NAME = "IL23R"
OUTCOME_TRAIT = "Inflammatory bowel disease"
ppi_df = get_drug_targets_ppi(gene_name=GENE_NAME)
ppi_df
| | g1.name | p1.uniprot_id | p2.uniprot_id | g2.name | g2.druggability_tier |
---|---|---|---|---|---|
0 | IL23R | Q5VWK5 | P04141 | CSF2 | Tier 1 |
1 | IL23R | Q5VWK5 | P01562 | IFNA1 | Tier 1 |
2 | IL23R | Q5VWK5 | P01579 | IFNG | Tier 1 |
3 | IL23R | Q5VWK5 | P22301 | IL10 | Tier 1 |
4 | IL23R | Q5VWK5 | P29460 | IL12B | Tier 1 |
5 | IL23R | Q5VWK5 | P42701 | IL12RB1 | Tier 1 |
6 | IL23R | Q5VWK5 | P35225 | IL13 | Tier 1 |
7 | IL23R | Q5VWK5 | P40933 | IL15 | Tier 1 |
8 | IL23R | Q5VWK5 | Q16552 | IL17A | Tier 1 |
9 | IL23R | Q5VWK5 | Q96PD4 | IL17F | Tier 1 |
10 | IL23R | Q5VWK5 | P60568 | IL2 | Tier 1 |
11 | IL23R | Q5VWK5 | Q9GZX6 | IL22 | Tier 1 |
12 | IL23R | Q5VWK5 | Q9NPF7 | IL23A | Tier 1 |
13 | IL23R | Q5VWK5 | P05112 | IL4 | Tier 1 |
14 | IL23R | Q5VWK5 | P05113 | IL5 | Tier 1 |
15 | IL23R | Q5VWK5 | P05231 | IL6 | Tier 1 |
16 | IL23R | Q5VWK5 | P15248 | IL9 | Tier 1 |
17 | IL23R | Q5VWK5 | P23458 | JAK1 | Tier 1 |
18 | IL23R | Q5VWK5 | O60674 | JAK2 | Tier 1 |
19 | IL23R | Q5VWK5 | P19838 | NFKB1 | Tier 1 |
20 | IL23R | Q5VWK5 | P42336 | PIK3CA | Tier 1 |
21 | IL23R | Q5VWK5 | P51449 | RORC | Tier 1 |
22 | IL23R | Q5VWK5 | P40763 | STAT3 | Tier 1 |
23 | IL23R | Q5VWK5 | Q969D9 | TSLP | Tier 1 |
24 | IL23R | Q5VWK5 | P29597 | TYK2 | Tier 1 |
25 | IL23R | Q5VWK5 | P51684 | CCR6 | Tier 2 |
26 | IL23R | Q5VWK5 | P25963 | NFKBIA | Tier 2 |
27 | IL23R | Q5VWK5 | Q9HC29 | NOD2 | Tier 2 |
28 | IL23R | Q5VWK5 | P27986 | PIK3R1 | Tier 2 |
29 | IL23R | Q5VWK5 | Q04206 | RELA | Tier 2 |
30 | IL23R | Q5VWK5 | P42224 | STAT1 | Tier 2 |
31 | IL23R | Q5VWK5 | P42229 | STAT5A | Tier 2 |
32 | IL23R | Q5VWK5 | P42226 | STAT6 | Tier 2 |
33 | IL23R | Q5VWK5 | P09919 | CSF3 | Tier 3A |
34 | IL23R | Q5VWK5 | Q9NZ08 | ERAP1 | Tier 3A |
35 | IL23R | Q5VWK5 | P29459 | IL12A | Tier 3A |
36 | IL23R | Q5VWK5 | Q8TAD2 | IL17D | Tier 3A |
37 | IL23R | Q5VWK5 | Q9UHD0 | IL19 | Tier 3A |
38 | IL23R | Q5VWK5 | Q9HBE4 | IL21 | Tier 3A |
39 | IL23R | Q5VWK5 | Q13007 | IL24 | Tier 3A |
40 | IL23R | Q5VWK5 | P13232 | IL7 | Tier 3A |
41 | IL23R | Q5VWK5 | O00421 | CCRL2 | Tier 3B |
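To see how the interacting genes split across tiers, or to relax the cut-off beyond Tier 1, the rows can be summarised directly. The sketch below works on plain records (the shape produced by `ppi_df.to_dict("records")`) and uses only five rows copied from the table above. The tier ordering in `TIER_ORDER` is an assumption inferred from the labels that appear in this output, and both function names are illustrative.

```python
from collections import Counter

# Assumed ordering, most to least druggable, based on the labels seen above.
TIER_ORDER = ["Tier 1", "Tier 2", "Tier 3A", "Tier 3B"]


def summarise_tiers(records):
    """Count interacting genes per druggability tier, in tier order."""
    counts = Counter(rec["g2.druggability_tier"] for rec in records)
    return [(tier, counts[tier]) for tier in TIER_ORDER if counts[tier]]


def genes_at_or_above(records, max_tier):
    """Return gene names whose tier is at least as druggable as `max_tier`."""
    cutoff = TIER_ORDER.index(max_tier)
    return [
        rec["g2.name"]
        for rec in records
        if TIER_ORDER.index(rec["g2.druggability_tier"]) <= cutoff
    ]


# Five rows copied from the table above (not the full result set).
records = [
    {"g2.name": "CSF2", "g2.druggability_tier": "Tier 1"},
    {"g2.name": "TYK2", "g2.druggability_tier": "Tier 1"},
    {"g2.name": "NOD2", "g2.druggability_tier": "Tier 2"},
    {"g2.name": "IL21", "g2.druggability_tier": "Tier 3A"},
    {"g2.name": "CCRL2", "g2.druggability_tier": "Tier 3B"},
]

tier_counts = summarise_tiers(records)
tier_1_or_2 = genes_at_or_above(records, "Tier 2")
print(tier_counts)
print(tier_1_or_2)
```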
For further analysis we select the gene of interest (IL23R) as well as its interacting genes with Tier 1 druggability.
def get_gene_list(ppi_df, include_primary_gene: bool = True):
    if include_primary_gene:
        gene_list = list(ppi_df["g1.name"].drop_duplicates()) + list(
            ppi_df.query("`g2.druggability_tier` == 'Tier 1'")["g2.name"]
        )
    else:
        gene_list = list(ppi_df.query("`g2.druggability_tier` == 'Tier 1'")["g2.name"])
    return gene_list
gene_list = get_gene_list(ppi_df)
gene_list
['IL23R', 'CSF2', 'IFNA1', 'IFNG', 'IL10', 'IL12B', 'IL12RB1', 'IL13', 'IL15', 'IL17A', 'IL17F', 'IL2', 'IL22', 'IL23A', 'IL4', 'IL5', 'IL6', 'IL9', 'JAK1', 'JAK2', 'NFKB1', 'PIK3CA', 'RORC', 'STAT3', 'TSLP', 'TYK2']
2. Looking for literature evidence
EpiGraphDB facilitates fast processing of literature evidence by providing access to literature-mined relationships structured as semantic triples. These take the general form (subject, predicate, object) and were generated by SemMedDB using natural language processing applied to a large body of published biomedical research papers. In the following section we query the API for literature relationships between each gene in our list, starting with IL23R (several studies have confirmed IL23R associations in independent cohorts of patients with Crohn's disease or ulcerative colitis), and the outcome trait, Inflammatory bowel disease.
def extract_literature(outcome_trait, gene_list):
    def per_gene(gene_name):
        endpoint = "/gene/literature"
        url = f"{API_URL}{endpoint}"
        params = {"gene_name": gene_name, "object_name": outcome_trait.lower()}
        r = requests.get(url, params=params)
        try:
            r.raise_for_status()
            res_df = pd.json_normalize(r.json()["results"])
            if len(res_df) > 0:
                # Number of PubMed IDs supporting each (gene, predicate, trait) triple
                res_df = res_df.assign(
                    literature_count=lambda df: df["pubmed_id"].apply(lambda x: len(x))
                )
            return res_df
        except Exception:
            # Skip genes whose query fails; pd.concat ignores None entries
            return None

    res_df = pd.concat(
        [per_gene(gene_name=gene_name) for gene_name in gene_list]
    ).reset_index(drop=True)
    return res_df
literature_df = extract_literature(outcome_trait=OUTCOME_TRAIT, gene_list=gene_list)
literature_df
| | pubmed_id | gene.name | lt.id | lt.name | lt.type | st.predicate | literature_count |
---|---|---|---|---|---|---|---|
0 | [23131344] | IL23R | C0021390 | Inflammatory Bowel Diseases | [dsyn] | PREDISPOSES | 1 |
1 | [21155887, 17484863] | IL23R | C0021390 | Inflammatory Bowel Diseases | [dsyn] | NEG_ASSOCIATED_WITH | 2 |
2 | [31728561] | IL23R | C0021390 | Inflammatory Bowel Diseases | [dsyn] | CAUSES | 1 |
3 | [21155887, 18383521, 18383363, 25159710, 18341... | IL23R | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 21 |
4 | [27852544] | IL23R | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 1 |
5 | [21557945, 19030026] | CSF2 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
6 | [17206685] | CSF2 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 1 |
7 | [23891915] | IFNA1 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | TREATS | 1 |
8 | [24975266] | IFNA1 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | PREVENTS | 1 |
9 | [20951137, 28174758, 9836081] | IFNA1 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 3 |
10 | [19519446] | IFNG | C0021390 | Inflammatory Bowel Diseases | [dsyn] | TREATS | 1 |
11 | [3139380] | IFNG | C0021390 | Inflammatory Bowel Diseases | [dsyn] | CAUSES | 1 |
12 | [19740775, 18452147] | IFNG | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
13 | [10403730] | IFNG | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 1 |
14 | [16573780, 27917223, 19184348, 28551707, 25999... | IL10 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 13 |
15 | [27468578, 25296012] | IL10 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 2 |
16 | [27468578] | IL10 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | PREDISPOSES | 1 |
17 | [11271474] | IL10 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | NEG_PREDISPOSES | 1 |
18 | [24519095, 29023267, 17628614] | IL10 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | CAUSES | 3 |
19 | [18383521, 23573954, 30541240, 22479607, 19817... | IL12B | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 6 |
20 | [11023669, 22741617] | IL13 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
21 | [9609761, 11023669] | IL15 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
22 | [30193869, 21576383] | IL17A | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
23 | [30193869, 21994045, 18088064, 21576383] | IL17F | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 4 |
24 | [6607860, 6237813, 1587419] | IL2 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 3 |
25 | [19201773] | IL22 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 1 |
26 | [30193869, 27029486, 18753178, 18499066] | IL23A | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 4 |
27 | [10477546] | IL4 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | DISRUPTS | 1 |
28 | [7806044, 8964392, 9389741] | IL4 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 3 |
29 | [15766556] | IL6 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | TREATS | 1 |
30 | [11204808] | IL6 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | NEG_ASSOCIATED_WITH | 1 |
31 | [25145003] | IL6 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | CAUSES | 1 |
32 | [11204808, 7683293] | IL6 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
33 | [24120915] | IL6 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 1 |
34 | [29788053] | IL9 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | CAUSES | 1 |
35 | [28652656, 11515847] | IL9 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
36 | [31158699] | IL9 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 1 |
37 | [31069840, 19817673, 20627814, 22269120] | JAK2 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 4 |
38 | [27852544] | JAK2 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 1 |
39 | [17600378, 9882195] | NFKB1 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
40 | [21637825, 20004201] | PIK3CA | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
41 | [30006408] | RORC | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 1 |
42 | [28770550] | STAT3 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | CAUSES | 1 |
43 | [21733838] | STAT3 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AUGMENTS | 1 |
44 | [21631466, 25132422, 28785144, 19817673, 20627... | STAT3 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 9 |
45 | [27852544, 21994179] | STAT3 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 2 |
46 | [27697608] | TSLP | C0021390 | Inflammatory Bowel Diseases | [dsyn] | CAUSES | 1 |
47 | [21318591, 27697608] | TSLP | C0021390 | Inflammatory Bowel Diseases | [dsyn] | ASSOCIATED_WITH | 2 |
48 | [26432894, 26432894] | TYK2 | C0021390 | Inflammatory Bowel Diseases | [dsyn] | AFFECTS | 2 |
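The table has one row per (gene, predicate) pair, and `literature_count` is the length of the `pubmed_id` list, so a paper can be counted more than once (the TYK2 row lists the same ID twice). A minimal sketch of a per-gene summary that counts unique PubMed IDs and collects the predicates follows. It operates on plain records copied from three rows of the table above; the function name `summarise_literature` is illustrative, and treating the IDs as integers is an assumption based on how they are displayed.

```python
from collections import defaultdict


def summarise_literature(records):
    """Map each gene to (number of unique PubMed IDs, sorted predicates)."""
    pmids = defaultdict(set)
    predicates = defaultdict(set)
    for rec in records:
        gene = rec["gene.name"]
        pmids[gene].update(rec["pubmed_id"])  # a set removes repeated IDs
        predicates[gene].add(rec["st.predicate"])
    return {gene: (len(pmids[gene]), sorted(predicates[gene])) for gene in pmids}


# Three rows copied from the table above (rows 5, 6 and 48).
records = [
    {"gene.name": "CSF2", "pubmed_id": [21557945, 19030026],
     "st.predicate": "ASSOCIATED_WITH"},
    {"gene.name": "CSF2", "pubmed_id": [17206685],
     "st.predicate": "AFFECTS"},
    {"gene.name": "TYK2", "pubmed_id": [26432894, 26432894],
     "st.predicate": "AFFECTS"},
]

summary = summarise_literature(records)
for gene, (n_papers, preds) in summary.items():
    print(gene, n_papers, preds)
```

With the full DataFrame, the same records could be obtained via `literature_df.to_dict("records")`.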
3. Using Mendelian randomization results for causal effect estimation
The next step is to find out whether any of these genes have a comparable and statistically plausible effect on IBD.
def extract_mr(outcome_trait, gene_list, qtl_type):
    endpoint = "/xqtl/single-snp-mr"
    url = f"{API_URL}{endpoint}"

    def per_gene(gene_name):
        params = {
            "exposure_gene": gene_name,
            "outcome_trait": outcome_trait,
            "qtl_type": qtl_type,
            "pval_threshold": 1e-5,
        }
        r = requests.get(url, params=params)
        try:
            r.raise_for_status()
            df = pd.json_normalize(r.json()["results"])
            return df
        except Exception:
            # Skip genes whose query fails; pd.concat ignores None entries
            return None

    res_df = pd.concat(
        [per_gene(gene_name=gene_name) for gene_name in gene_list]
    ).reset_index(drop=True)
    return res_df
# Search for both pqtl and eqtl
xqtl_df = pd.concat(
    [
        extract_mr(
            outcome_trait=OUTCOME_TRAIT, gene_list=gene_list, qtl_type=qtl_type
        ).assign(qtl_type=qtl_type)
        for qtl_type in ["pQTL", "eQTL"]
    ]
).reset_index(drop=True)
xqtl_df
| | gene.ensembl_id | gene.name | gwas.id | gwas.trait | r.beta | r.se | r.p | r.rsid | qtl_type |
---|---|---|---|---|---|---|---|---|---|
0 | ENSG00000162594 | IL23R | ieu-a-294 | Inflammatory bowel disease | 1.500821 | 0.054592 | 2.212578e-166 | rs11581607 | pQTL |
1 | ENSG00000113302 | IL12B | ieu-a-294 | Inflammatory bowel disease | 0.417605 | 0.034490 | 9.590000e-34 | rs4921484 | pQTL |
2 | ENSG00000162594 | IL23R | ieu-a-294 | Inflammatory bowel disease | 0.886712 | 0.064420 | 4.165652e-43 | rs2064689 | eQTL |
3 | ENSG00000164136 | IL15 | ieu-a-294 | Inflammatory bowel disease | -1.421625 | 0.197131 | 5.530616e-13 | rs75301646 | eQTL |
4 | ENSG00000113520 | IL4 | ieu-a-294 | Inflammatory bowel disease | 0.459848 | 0.084050 | 4.471537e-08 | rs2070874 | eQTL |
5 | ENSG00000096968 | JAK2 | ieu-a-294 | Inflammatory bowel disease | -1.896710 | 0.203808 | 1.322967e-20 | rs4788084 | eQTL |
6 | ENSG00000109320 | NFKB1 | ieu-a-294 | Inflammatory bowel disease | 0.973556 | 0.173893 | 2.160849e-08 | rs4766578 | eQTL |
7 | ENSG00000143365 | RORC | ieu-a-294 | Inflammatory bowel disease | -0.994991 | 0.116343 | 1.207271e-17 | rs4845604 | eQTL |
8 | ENSG00000168610 | STAT3 | ieu-a-294 | Inflammatory bowel disease | 0.597473 | 0.075700 | 2.958269e-15 | rs1053004 | eQTL |
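One way to sanity-check a row of this table is to recompute the test statistic from `r.beta` and `r.se`. The sketch below assumes a two-sided Wald test under a normal approximation; that the API computes `r.p` this way is an assumption, although the value recomputed for the IL12B pQTL row agrees closely with the reported 9.59e-34. The function name `wald_summary` is illustrative.

```python
import math


def wald_summary(beta, se, z_crit=1.959964):
    """Return z-score, two-sided p-value and 95% CI for an effect estimate."""
    z = beta / se
    # Two-sided tail probability of a standard normal: P(|Z| > |z|)
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "z": z,
        "p": p,
        "ci_low": beta - z_crit * se,
        "ci_high": beta + z_crit * se,
    }


# IL12B pQTL row from the table above: r.beta = 0.417605, r.se = 0.034490
il12b = wald_summary(0.417605, 0.034490)
print(f"z = {il12b['z']:.3f}, p = {il12b['p']:.3e}")
print(f"95% CI for beta: ({il12b['ci_low']:.3f}, {il12b['ci_high']:.3f})")
```

The sign of `r.beta` gives the direction of the estimated effect, so rows such as JAK2 and RORC (negative) point the opposite way to IL23R (positive).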
4. Query metadata
Here we query for the metadata information using the endpoint GET /meta/schema, which will be used for downstream processing.
endpoint = "/meta/schema"
params = {"graphviz": False, "plot": False}
r = requests.get(f"{API_URL}{endpoint}", params=params)
r.raise_for_status()
metadata = r.json()
# Preview of metadata information
keys = metadata.keys()
print(pformat(keys), "\n")
for key in list(keys):
    print(f"# {key}:")
    print(pformat(metadata[key])[:1000], "\n")
dict_keys(['nodes', 'edges', 'connections'])

# nodes:
{'Disease': {'count': 38960, 'properties': {'_id': {'indexed': True, 'type': 'STRING', 'unique': False}, '_name': {'indexed': True, 'type': 'STRING', 'unique': False}, '_source': {'indexed': False, 'type': 'LIST', 'unique': False}, 'definition': {'indexed': False, 'type': 'STRING', 'unique': False}, 'doid': {'indexed': True, 'type': 'LIST', 'unique': False}, 'efo': {'indexed': False, 'type': 'LIST',

# edges:
{'BIORXIV_OBJ': {'count': 32651, 'properties': {'_source': {'array': True, 'type': 'LIST'}}}, 'BIORXIV_PREDICATE': {'count': 32648, 'properties': {'_source': {'array': True, 'type': 'LIST'}, 'count': {'array': False, 'type': 'INTEGER'}, 'predicate': {'array': False, 'type': 'STRING'}}}, 'BIORXIV_SUB': {'count': 32657, 'properties': {'_source': {'array': True, 'type': 'LIST'}}}, 'BIORXIV_TO_LIT': {'count': 35211, 'properties': {'_source': {'array': True, 'type': 'LIST'}}}, 'CPIC': {'count': 375, 'properties': {'_source': {'array': True, 'type': 'LIST'}, 'cpic_level': {'array': False, 'type': 'STRING'}, 'guideline': {'array': F

# connections:
[{'count': 2461, 'from_node': 'Drug', 'rel': 'OPENTARGETS_DRUG_TO_DISEASE', 'to_node': 'Disease'}, {'count': 5763, 'from_node': 'Gene', 'rel': 'GENE_TO_DISEASE', 'to_node': 'Disease'}, {'count': 8247, 'from_node': 'Disease', 'rel': 'MONDO_MAP_UMLS', 'to_node': 'LiteratureTerm'}, {'count': 2819, 'from_node': 'Disease', 'rel': 'MONDO_MAP_EFO', 'to_node': 'Efo'}, {'count': 2463, 'from_node': 'Pathway', 'rel': 'PATHWAY_CHILD_OF', 'to_node': 'Pathway'}, {'count': 121873, 'from_node': 'Protein', 'rel': 'PROTEIN_IN_PATHWAY', 'to_node': 'Pathway'}, {'count': 1969, 'from_node': 'LiteratureTriple', 'rel': 'MEDRXIV_SUB', 'to_node': 'LiteratureTerm'}, {'count': 5584547, 'from_node': 'LiteratureTerm', 'rel': 'SEMMEDDB_PREDICATE', 'to_node': 'LiteratureTerm'}, {'count': 5584547, 'from_node': 'LiteratureTriple', 'rel': 'SEMMEDDB_SUB', 'to_node': 'LiteratureTerm'}, {'count': 5556, 'from_node': 'Gwas', 'rel': 'METAMAP_LITE', 'to_node'
We can extract the specific meta node information as a pandas dataframe from the metadata.
meta_node_df = pd.DataFrame.from_dict(metadata["nodes"], orient="index")
(
    meta_node_df.sort_index().assign(
        count=lambda df: df["count"].apply(lambda x: f"{x:,}")
    )
)
| | count | properties |
---|---|---|
Disease | 38,960 | {'_name': {'type': 'STRING', 'indexed': True, ... |
Drug | 2,697 | {'molecule_type': {'type': 'STRING', 'indexed'... |
Efo | 25,390 | {'_name': {'type': 'STRING', 'indexed': True, ... |
Gene | 57,737 | {'druggability_tier': {'type': 'STRING', 'inde... |
Gwas | 34,494 | {'note': {'type': 'STRING', 'indexed': False, ... |
Literature | 3,995,672 | {'issn': {'type': 'STRING', 'indexed': False, ... |
LiteratureTerm | 108,905 | {'_name': {'type': 'STRING', 'indexed': True, ... |
LiteratureTriple | 5,609,945 | {'subject_id': {'type': 'STRING', 'indexed': T... |
Pathway | 2,441 | {'_name': {'type': 'STRING', 'indexed': True, ... |
Protein | 20,280 | {'name': {'type': 'STRING', 'indexed': True, '... |
Tissue | 54 | {'name': {'type': 'STRING', 'indexed': True, '... |
Variant | 99,005 | {'ref': {'type': 'STRING', 'indexed': False, '... |
We can also extract the meta relationship (edge) information, and the connections.
meta_rel_df = pd.DataFrame.from_dict(metadata["edges"], orient="index").merge(
    pd.DataFrame.from_dict(
        {_["rel"]: _ for _ in metadata["connections"]}, orient="index"
    )[["from_node", "to_node"]],
    left_index=True,
    right_index=True,
)
(
    meta_rel_df.sort_values(by=["from_node", "to_node"]).assign(
        count=lambda df: df["count"].apply(lambda x: f"{x:,}")
    )
)
| | count | properties | from_node | to_node |
---|---|---|---|---|
MONDO_MAP_EFO | 2,819 | {'_source': {'array': False, 'type': 'STRING'}} | Disease | Efo |
MONDO_MAP_UMLS | 8,247 | {'_source': {'array': False, 'type': 'STRING'}} | Disease | LiteratureTerm |
OPENTARGETS_DRUG_TO_DISEASE | 2,461 | {'_source': {'array': True, 'type': 'LIST'}} | Drug | Disease |
CPIC | 375 | {'pharmgkb_level_of_evidence': {'array': False... | Drug | Gene |
OPENTARGETS_DRUG_TO_TARGET | 6,534 | {'phase': {'array': False, 'type': 'STRING'}, ... | Drug | Gene |
EFO_CHILD_OF | 43,132 | {'_source': {'array': True, 'type': 'LIST'}} | Efo | Efo |
GENE_TO_DISEASE | 5,763 | {'last_updated': {'array': False, 'type': 'STR... | Gene | Disease |
XQTL_MULTI_SNP_MR | 3,015,233 | {'p': {'array': False, 'type': 'FLOAT'}, 'se':... | Gene | Gwas |
XQTL_SINGLE_SNP_MR_GENE_GWAS | 8,449,779 | {'p': {'array': False, 'type': 'FLOAT'}, 'se':... | Gene | Gwas |
GENE_TO_PROTEIN | 19,142 | {'_source': {'array': True, 'type': 'LIST'}} | Gene | Protein |
EXPRESSED_IN | 2,918,240 | {'tpm': {'array': False, 'type': 'FLOAT'}, '_s... | Gene | Tissue |
GWAS_NLP_EFO | 12,302 | {'score': {'array': False, 'type': 'FLOAT'}, '... | Gwas | Efo |
GWAS_EFO_EBI | 281 | {'_source': {'array': True, 'type': 'LIST'}} | Gwas | Efo |
PRS | 118,124 | {'p': {'array': False, 'type': 'FLOAT'}, 'r2':... | Gwas | Gwas |
MR_EVE_MR | 25,804,945 | {'b': {'array': False, 'type': 'FLOAT'}, 'se':... | Gwas | Gwas |
GEN_COR | 840,960 | {'h2_intercept_SE': {'array': False, 'type': '... | Gwas | Gwas |
OBS_COR | 17,932 | {'_source': {'array': True, 'type': 'LIST'}, '... | Gwas | Gwas |
GWAS_NLP | 89,239,773 | {'score': {'array': False, 'type': 'FLOAT'}, '... | Gwas | Gwas |
GWAS_TO_LITERATURE | 28,111,669 | {'_source': {'array': True, 'type': 'LIST'}} | Gwas | Literature |
METAMAP_LITE | 5,556 | {'_source': {'array': True, 'type': 'LIST'}, '... | Gwas | LiteratureTerm |
GWAS_TO_LITERATURE_TRIPLE | 17,531,153 | {'pval': {'array': False, 'type': 'FLOAT'}, 'g... | Gwas | LiteratureTriple |
OPENGWAS_TOPHITS | 160,283 | {'_source': {'array': True, 'type': 'LIST'}, '... | Gwas | Variant |
GWAS_TO_VARIANT | 26,436 | {'se': {'array': False, 'type': 'FLOAT'}, 'nca... | Gwas | Variant |
TERM_TO_GENE | 16,435 | {'_source': {'array': False, 'type': 'STRING'}} | LiteratureTerm | Gene |
SEMMEDDB_PREDICATE | 5,584,547 | {'count': {'array': False, 'type': 'INTEGER'},... | LiteratureTerm | LiteratureTerm |
BIORXIV_PREDICATE | 32,648 | {'count': {'array': False, 'type': 'INTEGER'},... | LiteratureTerm | LiteratureTerm |
MEDRXIV_PREDICATE | 1,969 | {'count': {'array': False, 'type': 'INTEGER'},... | LiteratureTerm | LiteratureTerm |
BIORXIV_TO_LIT | 35,211 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | Literature |
SEMMEDDB_TO_LIT | 10,589,785 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | Literature |
MEDRXIV_TO_LIT | 2,091 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | Literature |
MEDRXIV_SUB | 1,969 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | LiteratureTerm |
SEMMEDDB_SUB | 5,584,547 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | LiteratureTerm |
MEDRXIV_OBJ | 1,969 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | LiteratureTerm |
BIORXIV_OBJ | 32,651 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | LiteratureTerm |
BIORXIV_SUB | 32,657 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | LiteratureTerm |
SEMMEDDB_OBJ | 5,584,547 | {'_source': {'array': True, 'type': 'LIST'}} | LiteratureTriple | LiteratureTerm |
PATHWAY_CHILD_OF | 2,463 | {'_source': {'array': True, 'type': 'LIST'}} | Pathway | Pathway |
PROTEIN_IN_PATHWAY | 121,873 | {'_source': {'array': True, 'type': 'LIST'}} | Protein | Pathway |
STRING_INTERACT_WITH | 827,184 | {'score': {'array': False, 'type': 'FLOAT'}, '... | Protein | Protein |
VARIANT_TO_GENE | 108,561 | {'amino_acids': {'array': False, 'type': 'STRI... | Variant | Gene |
XQTL_SINGLE_SNP_MR_SNP_GENE | 41,564 | {'_source': {'array': True, 'type': 'LIST'}} | Variant | Gene |
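Several node pairs in this table are linked by more than one relationship type (for example LiteratureTriple to LiteratureTerm). A minimal sketch of grouping the `connections` entries by node pair follows. It uses four entries copied from the metadata preview earlier in this section instead of the live response, and the function name `relations_by_node_pair` is illustrative.

```python
from collections import defaultdict


def relations_by_node_pair(connections):
    """Group relationship names by their (from_node, to_node) pair."""
    grouped = defaultdict(list)
    for conn in connections:
        grouped[(conn["from_node"], conn["to_node"])].append(conn["rel"])
    return {pair: sorted(rels) for pair, rels in grouped.items()}


# Four entries copied from the "connections" preview printed earlier.
connections = [
    {"count": 2461, "from_node": "Drug",
     "rel": "OPENTARGETS_DRUG_TO_DISEASE", "to_node": "Disease"},
    {"count": 5763, "from_node": "Gene",
     "rel": "GENE_TO_DISEASE", "to_node": "Disease"},
    {"count": 5584547, "from_node": "LiteratureTriple",
     "rel": "SEMMEDDB_SUB", "to_node": "LiteratureTerm"},
    {"count": 1969, "from_node": "LiteratureTriple",
     "rel": "MEDRXIV_SUB", "to_node": "LiteratureTerm"},
]

pairs = relations_by_node_pair(connections)
for (source, target), rels in pairs.items():
    print(f"{source} -> {target}: {', '.join(rels)}")
```

With the real response, `metadata["connections"]` can be passed in directly.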
We can generate a network diagram of the graph db schema using networkx.
graph = nx.from_pandas_edgelist(meta_rel_df, source="from_node", target="to_node")
cmap = matplotlib.colors.ListedColormap(["dodgerblue", "lightgray", "darkorange"])
meta_rel_df["from_node"] = pd.Categorical(meta_rel_df["from_node"])
f = plt.figure(figsize=(10, 10))
f.tight_layout()
plt.subplot(1, 1, 1)
nx.draw(
    G=graph,
    with_labels=True,
    node_size=3000,
    edgecolors="gray",
    node_color="skyblue",
    font_size=10,
    font_weight="bold",
    width=0.75,
)
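Called without `create_using`, `nx.from_pandas_edgelist` builds an undirected graph, so the drawing above does not show the direction of each relationship. As a complement, the sketch below follows the `from_node` to `to_node` direction to list which meta nodes can be reached from a starting node. It is written in plain Python over six node pairs copied from the relationship table above; the function name `reachable_from` is illustrative.

```python
from collections import defaultdict, deque


def reachable_from(start, edges):
    """Breadth-first search over directed (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(seen - {start})


# (from_node, to_node) pairs copied from the relationship table above.
edges = [
    ("Drug", "Gene"),
    ("Gene", "Protein"),
    ("Gene", "Disease"),
    ("Protein", "Pathway"),
    ("Protein", "Protein"),
    ("Variant", "Gene"),
]

print(reachable_from("Gene", edges))
print(reachable_from("Pathway", edges))
```

With the full table, the pairs could be taken from `meta_rel_df[["from_node", "to_node"]]`.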
References
Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A, others. 2006. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 314:1461–1463.
Finan C, Gaulton A, Kruger FA, Lumbers RT, Shah T, Engmann J, Galver L, Kelley R, Karlsson A, Santos R, others. 2017. The druggable genome and support for target identification and validation in drug development. Science translational medicine 9:eaag1166.
Momozawa Y, Mni M, Nakamura K, Coppieters W, Almer S, Amininejad L, Cleynen I, Colombel J-F, De Rijk P, Dewit O, others. 2011. Resequencing of positional candidates identifies low frequency IL23R coding variants protecting against inflammatory bowel disease. Nature genetics 43:43–47.
Zheng J, Brumpton BM, Bronson PG, Liu Y, Haycock P, Elsworth B, Haberland V, Baird D, Walker V, Robinson JW, John S, Prins B, Runz H, Nelson MR, Hurle M, Hemani G, Asvold BO, Butterworth A, Smith GD, Scott RA, Gaunt TR. 2019. Systematic Mendelian randomization and colocalization analyses of the plasma proteome and blood transcriptome to prioritize drug targets for complex disease.