knoweng_nih

### Author: Edward Huang

KnowEng project for NetPath.

Preprocessing

Converts the Stuart data that Sheng sent into the old data format, so that it is compatible with the scripts written for the Mayo Clinic data.

```
$ python ensg_to_hgnc_conversion.py
```
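The README does not describe the input format. Judging only from the script name, the conversion maps Ensembl gene IDs (ENSG IDs) to HGNC gene symbols. A minimal sketch, assuming a hypothetical two-column tab-separated mapping file (Ensembl ID, HGNC symbol) and an expression file whose first column is the Ensembl ID; `load_ensg_to_hgnc` and `convert_rows` are illustrative names, not functions from this repository:

```python
import csv
import io


def load_ensg_to_hgnc(mapping_lines):
    """Build an Ensembl gene ID -> HGNC symbol dict from tab-separated lines."""
    mapping = {}
    for row in csv.reader(mapping_lines, delimiter="\t"):
        if len(row) >= 2 and row[0].startswith("ENSG") and row[1]:
            # Drop any version suffix, e.g. ENSG00000141510.16 -> ENSG00000141510.
            mapping[row[0].split(".")[0]] = row[1]
    return mapping


def convert_rows(expression_lines, mapping):
    """Yield rows with the leading Ensembl ID replaced by its HGNC symbol.

    Rows whose ID has no HGNC symbol are skipped.
    """
    for row in csv.reader(expression_lines, delimiter="\t"):
        if not row:
            continue  # blank line
        symbol = mapping.get(row[0].split(".")[0])
        if symbol is not None:
            yield [symbol] + row[1:]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the (hypothetical) mapping and data files.
    mapping = load_ensg_to_hgnc(io.StringIO("ENSG00000141510\tTP53\n"))
    data = io.StringIO("ENSG00000141510.16\t1.2\t0.4\nENSG00000000001\t3.0\t2.0\n")
    for row in convert_rows(data, mapping):
        print("\t".join(row))
```

Unmapped rows are simply dropped in this sketch; the actual script may handle them differently.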

LINCS top pathways (ground truth)

Finds the top pathways for each drug/cell-line pair in the LINCS dataset.
```
$ python lincs_top_pathways.py z_score_min max_genes_per_drug
```
Original tuning parameters: `Z_SCORE_MIN = 2`, `MAX_GENES_PER_DRUG = 250`.

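A minimal sketch of the gene-selection step that `z_score_min` and `max_genes_per_drug` appear to control, assuming a gene's absolute z-score is compared with the cutoff (so both up- and down-regulated genes count) and that the cap keeps the genes with the strongest response. `select_drug_genes` is a hypothetical helper, and the pathway ranking that follows selection is not shown:

```python
def select_drug_genes(z_scores, z_score_min=2.0, max_genes_per_drug=250):
    """Pick the genes used for one drug/cell-line pair.

    ASSUMPTION: keeps genes with |z| >= z_score_min, then caps the list at the
    max_genes_per_drug genes with the largest |z|.
    """
    passing = [(gene, z) for gene, z in z_scores.items() if abs(z) >= z_score_min]
    # Sort by |z| descending; break ties by gene name so the output is deterministic.
    passing.sort(key=lambda item: (-abs(item[1]), item[0]))
    return [gene for gene, _ in passing[:max_genes_per_drug]]


if __name__ == "__main__":
    example = {"TP53": -3.1, "MYC": 2.4, "EGFR": 0.5, "KRAS": 2.4}
    print(select_drug_genes(example, z_score_min=2, max_genes_per_drug=2))  # ['TP53', 'KRAS']
```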
Positive control for LINCS.
```
$ python top_pathways_lincs_positive_control.py AFT_NUM
```
Generates drug/cell-line/pathway combinations to check the correctness of the LINCS data.

NetPath

Finds top pathways for the gene expression data.
```
$ python2.7 drug_pathway_fisher_correlation.py -s sortP -p 0.05 -i ge
$ python correlation_top_pathways_kw.py
```
These commands produce two files. One contains p-values between drugs and pathways (Fisher's exact test between the genes most highly correlated with each drug and the genes in each pathway). The other contains p-values between drugs and genes (Pearson correlation between gene expression and drug response).

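A minimal sketch of the drug-pathway test described above, assuming genes are ranked by signed Pearson correlation with drug response, that Fisher's test is one-sided (enrichment), and that the gene universe is every gene with expression data. All function names are hypothetical; `math.comb` needs Python 3.8 or later:

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with the table's margins.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    tail = sum(
        math.comb(col1, x) * math.comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / math.comb(n, row1)


def drug_pathway_p_value(expression, response, pathway, top_n):
    """Correlate every gene with drug response, keep the top_n genes,
    and test their overlap with one pathway gene set.

    expression maps gene -> values across cell lines; response holds the
    drug response for the same cell lines, in the same order.
    """
    corr = {gene: pearson(values, response) for gene, values in expression.items()}
    top = set(sorted(corr, key=lambda g: corr[g], reverse=True)[:top_n])
    universe = set(expression)
    in_path = pathway & universe
    a = len(top & in_path)         # top genes inside the pathway
    b = len(top - in_path)         # top genes outside the pathway
    c = len(in_path - top)         # pathway genes not among the top genes
    d = len(universe) - a - b - c  # all remaining genes
    return fisher_right_tail(a, b, c, d)
```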
Makes embedding files that contain only gene embedding vectors. The script name is something of a misnomer: it does not add drug embeddings, but rather the missing pathway embeddings.
```
$ python impute_drugless_embeddings.py
```
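The README does not say how the missing pathway embeddings are filled in. Purely for illustration, the sketch below assumes a hypothetical rule: a pathway without a vector gets the element-wise mean of its member genes' vectors. `impute_pathway_embeddings` is an illustrative name, not the script's actual method:

```python
def impute_pathway_embeddings(gene_vectors, pathways, pathway_vectors):
    """Fill in missing pathway embeddings.

    HYPOTHETICAL rule: a pathway with no vector gets the mean of the vectors of
    its member genes that have embeddings. Pathways with no embedded member
    genes are left out. Existing pathway vectors are kept unchanged.
    """
    completed = dict(pathway_vectors)
    for name, genes in pathways.items():
        if name in completed:
            continue
        members = [gene_vectors[g] for g in genes if g in gene_vectors]
        if not members:
            continue
        dim = len(members[0])
        completed[name] = [sum(vec[i] for vec in members) / len(members) for i in range(dim)]
    return completed
```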
Here, we can choose whether to use `abs(cos) * corr` or plain `cos * corr` when computing a drug-pathway score (line 110 of `embedding_top_pathways.py`).
```
$ python embedding_top_pathways.py top_k
```
The output has the same format as the other top-pathway files, but the `top_k` argument lets us tune how many top pathways to keep.
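A sketch of the drug-pathway score discussed above, under stated assumptions: for one drug, each selected gene contributes the cosine similarity between its embedding and the pathway embedding, multiplied by the gene's correlation with drug response (or `abs(cos)` times the correlation), and the contributions are summed. The summation and all names are assumptions, not taken from `embedding_top_pathways.py`:

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def drug_pathway_scores(gene_corr, gene_vectors, pathway_vectors, use_abs=False):
    """Score every pathway for one drug.

    gene_corr maps each of the drug's selected genes to its correlation with
    drug response. Each gene contributes cos * corr (or abs(cos) * corr when
    use_abs is True); ASSUMPTION: contributions are summed.
    """
    scores = {}
    for pathway, p_vec in pathway_vectors.items():
        total = 0.0
        for gene, corr in gene_corr.items():
            if gene not in gene_vectors:
                continue  # genes without an embedding are skipped
            cos = cosine(gene_vectors[gene], p_vec)
            total += (abs(cos) if use_abs else cos) * corr
        scores[pathway] = total
    return scores


def top_pathways(scores, top_k):
    """Names of the top_k highest-scoring pathways."""
    return sorted(scores, key=lambda p: scores[p], reverse=True)[:top_k]
```

With `use_abs=False`, a positively correlated gene whose vector points away from a pathway (negative cosine) lowers that pathway's score; with `use_abs=True`, it raises it.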

Testing NetPath

Creates an embedding file, ppi_top_pathways_50_0.8.U_top_250_just_cosine.txt, whose drug-pathway scores use only cosine similarity rather than cosine similarity multiplied by the correlation of each drug's top k most correlated genes.
```
$ python embedding_top_pathways_just_cosine.py
```
Instead of taking the top 250 most correlated genes for each drug, this script takes 250 random genes. It only runs for ppi_top_pathways_50_0.8.U_top_250 and produces random_ppi_top_pathways_50_0.8.U_top_250.txt.
```
$ python random_embedding_top_pathways.py
```
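The random control amounts to replacing the top-correlated gene list with a uniform sample of the same size. A minimal sketch, assuming sampling without replacement from all genes; `random_gene_control` and its `seed` parameter are hypothetical:

```python
import random


def random_gene_control(all_genes, num_genes=250, seed=None):
    """Sample num_genes distinct genes uniformly at random to stand in for a
    drug's top correlated genes. Sorting first makes a fixed seed reproducible
    regardless of the input's iteration order."""
    rng = random.Random(seed)
    return rng.sample(sorted(all_genes), num_genes)
```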
Compares the two files created by the previous two scripts.
```
$ python compute_p_values_netpath.py
```

NetPath Evaluation

Compares the drug-pathway rankings from the methods above against LINCS, which serves as the ground truth.
```
$ python compare_methods_with_lincs.py corr_fisher/corr_kw/ppi/genetic/literome/sequence lincs_z lincs_max_num top_k
```
Uses Fisher's exact test to compare the drug-pathway pairs found by each method with those in LINCS, the ground truth. `lincs_z` and `lincs_max_num` select which LINCS top-pathways file to compare against.

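A sketch of this comparison as a 2x2 contingency table, assuming the universe is every candidate drug-pathway pair and that the test is one-sided (more overlap with LINCS than expected by chance). `compare_with_lincs` and `fisher_right_tail` are hypothetical names:

```python
import math


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test p-value, P(X >= a), for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    tail = sum(
        math.comb(col1, x) * math.comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / math.comb(n, row1)


def compare_with_lincs(method_pairs, lincs_pairs, all_pairs):
    """Test the overlap between a method's (drug, pathway) pairs and the
    LINCS ground-truth pairs within the universe all_pairs."""
    universe = set(all_pairs)
    method = set(method_pairs) & universe
    lincs = set(lincs_pairs) & universe
    a = len(method & lincs)          # found by the method and by LINCS
    b = len(method - lincs)          # found by the method only
    c = len(lincs - method)          # found by LINCS only
    d = len(universe) - a - b - c    # found by neither
    return (a, b, c, d), fisher_right_tail(a, b, c, d)
```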
Prints summary tables showing, for several p-value thresholds, how each method compares with the baseline (Pearson correlation).
```
$ python summary_comparison_with_lincs.py lincs_z lincs_max_num top_k
```

Superdrug experiments

Superdrug: finding the "drug" that is most representative of all drugs, in order to find genes that have good drug response across all drugs.

Outputs a file containing the first principal component vector, whose length equals the number of genes in our data.
```
$ python superdrug_principal_component.py
```
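A dependency-free sketch of computing a first principal component by power iteration, assuming a hypothetical input matrix with one row per drug and one column per gene (for example, drug-gene correlations), so that the resulting vector has one entry per gene. The sign of a principal component is arbitrary; here it is fixed so that the largest-magnitude entry is positive:

```python
import math
import random


def first_principal_component(matrix, iterations=500, seed=0):
    """First principal component of matrix (rows = drugs, columns = genes).

    Returns a unit-length vector with one entry per column.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Centre each column (gene) across the rows (drugs), as PCA requires.
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centred = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]

    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # w = X^T (X v): one power-iteration step on the covariance structure,
        # without forming the n_cols x n_cols matrix explicitly.
        xv = [sum(r[j] * v[j] for j in range(n_cols)) for r in centred]
        w = [sum(centred[i][j] * xv[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left after centring
        v = [x / norm for x in w]
    # Fix the arbitrary sign: make the largest-magnitude entry positive.
    if v[max(range(n_cols), key=lambda j: abs(v[j]))] < 0:
        v = [-x for x in v]
    return v
```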
Outputs a file containing the pathway p-values computed against the superdrug's top TOP_K genes.
```
$ python superdrug_top_pathways.py TOP_K
```
This removes pathways that fall below a threshold for the superdrug, since we want to discard pathways that are similar to too many drugs.
```
$ python compare_methods_with_lincs.py ppi/genetic/literome/sequence TOP_K
```
This script computes the 4x4 summary table for all of our methods.
```
$ python summary_comparison_with_lincs.py TOP_K
```
Removes from the expression pathways the superdrug pathways whose p-values are below the p_val argument. Outputs top_pathways_exp_hgnc_subtract_superdrug.txt.
```
$ python subtract_superdrug_from_pathways.py exp p_val
```
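A minimal sketch of the subtraction, assuming the expression results are (drug, pathway, p-value) rows and that a pathway is removed for every drug when its superdrug p-value is below `p_val`. `subtract_superdrug` is a hypothetical name:

```python
def subtract_superdrug(expression_rows, superdrug_p, p_val):
    """Drop drug-pathway rows whose pathway is significant for the superdrug.

    expression_rows: iterable of (drug, pathway, p_value) tuples.
    superdrug_p: dict mapping pathway -> superdrug p-value.
    """
    flagged = {pathway for pathway, p in superdrug_p.items() if p < p_val}
    return [row for row in expression_rows if row[1] not in flagged]
```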
Random pathway control. For each drug, randomly samples pathways, where the number of samples equals the number of expression drug-pathway pairs below a p-value threshold. The thresholds used are 0.001, 0.005, 0.01, 0.05, and 0.1.
```
$ python random_control_genes.py
```
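A sketch of the random control, assuming the sample size is counted per drug (each drug gets as many random pathways as it has expression pairs below the threshold) and that "below" means strictly less than. `random_pathway_control` is a hypothetical name:

```python
import random


def random_pathway_control(expression_rows, all_pathways, thresholds, seed=0):
    """Sample random pathways per drug for each p-value threshold.

    expression_rows: iterable of (drug, pathway, p_value) tuples from the
    expression method. For each threshold, a drug receives as many distinct,
    uniformly sampled pathways as it has expression pairs with p < threshold.
    """
    rows = list(expression_rows)
    pathways = sorted(all_pathways)
    rng = random.Random(seed)
    control = {}
    for threshold in thresholds:
        counts = {}
        for drug, _pathway, p_value in rows:
            if p_value < threshold:
                counts[drug] = counts.get(drug, 0) + 1
        control[threshold] = {
            drug: rng.sample(pathways, min(n, len(pathways)))
            for drug, n in sorted(counts.items())
        }
    return control


if __name__ == "__main__":
    rows = [("d1", "P1", 0.001), ("d1", "P2", 0.04), ("d2", "P3", 0.2)]
    thresholds = [0.001, 0.005, 0.01, 0.05, 0.1]
    for threshold, per_drug in random_pathway_control(rows, ["P1", "P2", "P3", "P4"], thresholds).items():
        print(threshold, per_drug)
```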
Gets the top 250 superdrug genes.
```
$ python reformat_superdrug_genes.py
```

Kruskal-Wallis baseline tests

Recomputes the top pathway rankings using the Kruskal-Wallis H-test instead of Fisher's exact test.
```
$ python kw_top_pathways.py
```
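The README does not say which groups `kw_top_pathways.py` compares. As an illustration, the sketch below assumes two groups, such as per-gene scores inside versus outside a pathway, and computes the Kruskal-Wallis H statistic with average ranks for ties and the standard tie correction. With two groups, H is compared with a chi-square distribution with one degree of freedom, whose upper tail is `erfc(sqrt(H / 2))`. `kruskal_wallis_two_groups` is a hypothetical name:

```python
import math


def kruskal_wallis_two_groups(group_a, group_b):
    """Kruskal-Wallis H statistic and p-value for two non-empty samples."""
    groups = [list(group_a), list(group_b)]
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n:
        # Find the block of tied values pooled[i..j].
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction > 0:
        h /= correction  # standard tie correction
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    # Chi-square survival function with 1 degree of freedom.
    return h, math.erfc(math.sqrt(h / 2))
```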
Preprocesses the LINCS z-score file.
```
$ python preprocess_lincs_z_scores.py
```
Computes the LINCS top pathways with the Kruskal-Wallis test, since the embedding method doesn't use Fisher's test.
```
$ python lincs_top_pathways_kw.py
```

Assorted scripts

Finds how many pathways are in common between KEGG pathway genes and LINCS genes.
```
$ python kegg_lincs_intersection.py
```
For the paper. This script finds the drug-pathway pairs that NetPath scores below the threshold but that Fisher's test on gene expression values does not retrieve. With PPI at thresholds 0.01 and 0.005, it finds 13 drug-pathway pairs; expression alone finds only 10.
```
$ python various_paper_scripts.py
```
