ChemPert: mapping between chemical perturbation and transcriptional response for noncancer cells

This repository provides the ChemPert tool, which uses the ChemPert database either to predict the transcriptional response to a given perturbagen or to predict perturbagens that target a desired set of transcription factors (TFs).

To install it:

git clone https://gitlab.lcsb.uni.lu/CBG/chempert.git path/to/workdir
cd path/to/workdir

IMPORTANT: Download the short-path information from the WebDAV server at https://webdav-r3lab.uni.lu/public/data/9p51-ch19/

cd path/to/workdir/chempert/PKN/
wget https://webdav-r3lab.uni.lu/public/data/9p51-ch19/padjust_enriched_allSimplePath.tar.gz
tar -zxvf padjust_enriched_allSimplePath.tar.gz

Requirements

ChemPert is implemented in R and has been tested in a Unix environment with R version 3.6.2.

R packages

ChemPert requires the following R packages: "tibble", "plyr", "dplyr", "igraph", "writexl", "Matrix", "foreach", "doParallel", "iterators", "bigmemory", "gtools", "viper", "fgsea", "limma". To check whether the required packages are present and automatically install any that are missing, run R_package_install.R.

Usage

The ChemPert tool has two options. Option 1 predicts the response TFs given a perturbagen and the expression profile of the initial cellular state. Option 2 predicts perturbagens that target a desired set of TFs.

Input file formats

Option 1: The prediction of response TFs

  1. Option: Specify the option parameter as 1.

  2. Species: one of human, mouse or rat (no other species are supported).

The targets of the perturbagen can either be supplied by the user (parameter 3) or looked up in the ChemPert database (parameter 4).

  3. Perturbagen target file: The file of perturbagen targets in rds/txt format (see the example in testdata/responseTFs_prediction_example/Example_of_perturbagen_target_file.txt). If parameter 4 is given, this parameter should be NULL.

  4. Perturbagen: The perturbagen name from the ChemPert database. If parameter 3 is given, this parameter should be NULL.

  5. Expression profile file: Gene expression file of the initial cellular state in rds/txt format, containing the mean expression value of each gene (see the example in testdata/responseTFs_prediction_example/Example_of_expression_profile_file.txt).

Run test dataset

Rscript pipeline_chempert.R 1 human ../testdata/responseTFs_prediction_example/Example_of_perturbagen_target_file.txt NULL ../testdata/responseTFs_prediction_example/Example_of_expression_profile_file.txt

or

Rscript pipeline_chempert.R 1 human NULL "sb-203580" ../testdata/responseTFs_prediction_example/Example_of_expression_profile_file.txt
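
Because the perturbagen target file and the perturbagen name are mutually exclusive, it can help to check the arguments before launching the pipeline. The following is a minimal sketch of such a check. The function check_option1_args is a hypothetical helper and is not part of ChemPert; the pipeline itself may perform its own validation.

```shell
# Hypothetical pre-flight check (not part of ChemPert): succeeds only when
# exactly one of the target file (parameter 3) and the perturbagen name
# (parameter 4) is set to something other than NULL.
check_option1_args() {
    target_file="$1"
    perturbagen="$2"
    if [ "$target_file" = "NULL" ] && [ "$perturbagen" = "NULL" ]; then
        echo "error: provide a target file or a perturbagen name" >&2
        return 1
    fi
    if [ "$target_file" != "NULL" ] && [ "$perturbagen" != "NULL" ]; then
        echo "error: set either the target file or the perturbagen to NULL" >&2
        return 1
    fi
    return 0
}

# Example with the perturbagen taken from the database (target file is NULL).
expr_file="../testdata/responseTFs_prediction_example/Example_of_expression_profile_file.txt"
if check_option1_args NULL "sb-203580"; then
    # Print the command instead of running it, so the sketch has no side effects.
    echo "Rscript pipeline_chempert.R 1 human NULL \"sb-203580\" $expr_file"
fi
```

The sketch only prints the command; once the check passes, the printed line can be run as is.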

Option 2: The prediction of perturbagens

  1. Option: Specify the option parameter as 2.

  2. Species: one of human, mouse or rat (no other species are supported).

  3. Query TF file: A file in rds/txt format with the TF names in the first column and the desired state of each TF in the second column, where 1 means activation and -1 means inhibition (see the example in testdata/perturbagen_prediction_example/Input_queryTFs_GSE169077.txt).

Run test dataset

Rscript pipeline_chempert.R 2 human ../testdata/perturbagen_prediction_example/Input_queryTFs_GSE169077.txt 
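
To prepare a custom query for Option 2, a two-column text file can be written and sanity-checked from the shell. The sketch below is illustrative only. The TF names, the file name my_queryTFs.txt, the tab delimiter and the absence of a header row are assumptions, and check_query_tfs is a hypothetical helper. The example file in testdata/perturbagen_prediction_example/ remains the authoritative reference for the layout.

```shell
# Write a small query TF file: TF name in column 1, then 1 (activation) or
# -1 (inhibition) in column 2. TF names, tab delimiter and the missing header
# row are assumptions made for illustration.
query_file="my_queryTFs.txt"
printf 'STAT3\t1\nNFKB1\t-1\nJUN\t1\n' > "$query_file"

# Hypothetical helper: report every row that does not have exactly two
# tab-separated columns or whose second column is neither 1 nor -1.
check_query_tfs() {
    awk -F '\t' '
        NF != 2 || ($2 != "1" && $2 != "-1") {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

check_query_tfs "$query_file" && echo "query TF file looks valid"
```

If the check passes, the file can be given as the third argument in place of the test dataset path shown above.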

Note:

Transcriptional responses and perturbagens can also be predicted, and the ChemPert database downloaded, directly from the ChemPert web page: https://chempert.uni.lu

Output files

Option 1: The prediction of response TFs

Running the prediction of response TFs writes two files to output/:

  • padjust_enriched_allSimplePath_MajorityLen.Robj: This file contains the adjusted p-values of the enriched short paths for the initial gene expression data. It is used for the prediction of response TFs.
  • predicted_reTFs.txt: This file contains the list of predicted response TFs, sorted by the frequency with which each TF appeared in the retrieved transcriptomics datasets. The higher this frequency, the more likely the TF is a responder to the query perturbation.

Option 2: The prediction of perturbagens

Running the prediction of perturbagens writes two files to output/:

  • predicted_signalling_proteins.xlsx: This file contains the list of predicted signalling proteins (column "target") with their corresponding sign (column "sign"), Jaccard score (column "score") and frequency (column "Freq"). The column "sign" takes the values "1", "-1" and "2", which mean that the corresponding protein should be activated, should be inhibited, or has an unknown direction, respectively.
  • predicted_perturbagens.xlsx: This file contains the list of predicted perturbagens and their corresponding information. The perturbagens are ranked by the column "NES" (Normalised Enrichment Score) in descending order. The columns "Target_size", "p.value" and "FDR" give the number of targets of the corresponding perturbagen, the p-value and the false discovery rate, respectively. Column "Targets" lists the targets of the perturbagen predicted by ChemPert. Columns "Predicted_Effect" and "Database_Effect" give the interaction effect between perturbagens and signalling proteins as reported in our prediction and in public databases, respectively. The values 1, -1 and 2 mean activation, inhibition and unknown, respectively. Column "Annotation" reports the functional annotation.