ChemPert: mapping between chemical perturbation and transcriptional response for noncancer cells
This repository provides the ChemPert tool, which either predicts the transcriptional response to a given perturbagen or predicts perturbagens that target a desired set of transcription factors (TFs), based on the ChemPert database.
To install it:
git clone https://gitlab.lcsb.uni.lu/CBG/chempert.git path/to/workdir
cd path/to/workdir
IMPORTANT: Download the short path information from WebDAV (https://webdav-r3lab.uni.lu/public/data/9p51-ch19/):
cd path/to/workdir/chempert/PKN/
wget https://webdav-r3lab.uni.lu/public/data/9p51-ch19/padjust_enriched_allSimplePath.tar.gz
tar -zxvf padjust_enriched_allSimplePath.tar.gz
Requirements
ChemPert is implemented in R and has been tested in a Unix environment with R version 3.6.2.
R packages
ChemPert requires the following packages: "tibble", "plyr", "dplyr", "igraph", "writexl", "Matrix", "foreach", "doParallel", "iterators", "bigmemory", "gtools", "viper", "fgsea", "limma". To check for the required packages and automatically install any that are missing, run R_package_install.R.
Usage
The ChemPert tool has two options. Option 1 predicts the response TFs given a perturbagen and the expression profile of the initial cellular state. Option 2 predicts perturbagens that target desired TFs.
Input file formats
Option 1: The prediction of response TFs
- Option: Specify the option parameter as 1.
- Species: One of human, mouse or rat.
The perturbagen targets can either be supplied by the user or looked up in the ChemPert database.
- Perturbagen target file: File of perturbagen targets in rds/txt format (see the example in testdata/responseTFs_prediction_example/Example_of_perturbagen_target_file.txt). If the perturbagen name (parameter 4) is given, this parameter should be NULL.
- Perturbagen: Name of the perturbagen in the ChemPert database. If the perturbagen target file (parameter 3) is given, this parameter should be NULL.
- Expression profile file: Gene expression file of the initial cellular state in rds/txt format, containing the mean expression value of each gene (see the example in testdata/responseTFs_prediction_example/Example_of_expression_profile_file.txt).
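The rule that only one of parameters 3 and 4 may be set can be checked before launching the pipeline. The following is a minimal sketch: the function name check_perturbagen_args is hypothetical and not part of ChemPert, and treating the case where both parameters are NULL as an error is an assumption.

```shell
# Hypothetical pre-flight check (not part of ChemPert): Option 1 expects
# exactly one of the perturbagen target file (parameter 3) and the
# perturbagen name (parameter 4) to be the literal string NULL.
check_perturbagen_args() {
    target_file=$1
    perturbagen=$2
    # Assumption: with neither input there is nothing to predict from.
    if [ "$target_file" = "NULL" ] && [ "$perturbagen" = "NULL" ]; then
        echo "error: give either a target file or a perturbagen name" >&2
        return 1
    fi
    # The two inputs are mutually exclusive.
    if [ "$target_file" != "NULL" ] && [ "$perturbagen" != "NULL" ]; then
        echo "error: target file and perturbagen name are mutually exclusive" >&2
        return 1
    fi
    return 0
}
```

Calling check_perturbagen_args NULL "sb-203580" succeeds, so a wrapper script could run it with the same third and fourth arguments it is about to pass to pipeline_chempert.R and stop early on failure.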
Run test dataset
Rscript pipeline_chempert.R 1 human ../testdata/responseTFs_prediction_example/Example_of_perturbagen_target_file.txt NULL ../testdata/responseTFs_prediction_example/Example_of_expression_profile_file.txt
or
Rscript pipeline_chempert.R 1 human NULL "sb-203580" ../testdata/responseTFs_prediction_example/Example_of_expression_profile_file.txt
Option 2: The prediction of perturbagens
- Option: Specify the option parameter as 2.
- Species: One of human, mouse or rat.
- Query TF file: File in rds/txt format with TF names in the first column and TF values in the second column, where 1 means activation and -1 means inhibition (see the example in testdata/perturbagen_prediction_example/Input_queryTFs_GSE169077.txt).
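As an illustration of the two-column layout of the query TF file, the sketch below writes a small file and checks that every value is 1 or -1. The TF names, the file name my_queryTFs.txt, the tab delimiter and the absence of a header row are all assumptions made for the example; the bundled Input_queryTFs_GSE169077.txt remains the reference for the exact layout.

```shell
# Hypothetical example file; TF names and values are illustrative only.
query_file=my_queryTFs.txt

# One TF per line: name, a tab (assumed delimiter), then 1 (activation)
# or -1 (inhibition). printf reuses the format for each pair of arguments.
printf '%s\t%s\n' \
    STAT3 1 \
    NFKB1 -1 \
    FOXO1 1 > "$query_file"

# Fail if any line does not have exactly two columns or has a value
# other than 1 or -1.
awk -F '\t' 'NF != 2 || ($2 != "1" && $2 != "-1") { bad = 1 } END { exit bad }' "$query_file" \
    && echo "query TF file looks valid"
```

If the assumed layout matches the bundled example, the resulting file would be passed as the third argument in place of the test file, for instance: Rscript pipeline_chempert.R 2 human my_queryTFs.txt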
Run test dataset
Rscript pipeline_chempert.R 2 human ../testdata/perturbagen_prediction_example/Input_queryTFs_GSE169077.txt
Note:
Transcriptional responses and perturbagens can also be predicted, and the ChemPert database downloaded, directly on the ChemPert webpage: https://chempert.uni.lu.
Output files
Option 1: The prediction of response TFs
Running the prediction of response TFs writes 2 files to output/:
- padjust_enriched_allSimplePath_MajorityLen.Robj: This file contains the adjusted p-values of the enriched short paths for the initial gene expression data. It is used for the prediction of response TFs.
- predicted_reTFs.txt: This file contains the list of predicted response TFs that are sorted by the frequency with which each TF appeared in retrieved transcriptomics datasets. The higher this frequency, the more likely that the TF is a responder of the query perturbation.
Option 2: The prediction of perturbagens
Running the prediction of perturbagens writes 2 files to output/:
- predicted_signalling_proteins.xlsx: This file lists the predicted signalling proteins (column "target") with their sign (column "sign"), Jaccard score (column "score") and frequency (column "Freq"). In the column "sign", the values 1, -1 and 2 mean that the protein should be activated, that it should be inhibited, or that the direction is unknown, respectively.
- predicted_perturbagens.xlsx: This file lists the predicted perturbagens and their associated information. The perturbagens are ranked by the column "NES" (Normalised Enrichment Score) in descending order. The columns "Target_size", "p.value" and "FDR" give the number of targets of the perturbagen, the p-value and the false discovery rate, respectively. The column "Targets" lists the targets of the perturbagen predicted by ChemPert. The columns "Predicted_Effect" and "Database_Effect" give the interaction effect between perturbagens and signalling proteins as reported in our prediction and in public databases, respectively; the values 1, -1 and 2 mean activation, inhibition and unknown, respectively. The column "Annotation" reports the functional annotation.