# DiBAC: Distribution-Based Analysis Of Cell Differentiation Identifies Mechanisms Of Cell Fate
### Susan Ghaderi, Stefano Magni, Thais Arns, Tomasz Ignac and Alexander Skupin

Publication: Submitted

## Abstract
The recent developments in single cell genomics allow for in-depth characterization of cellular heterogeneity in tissue development and the identification of new regulatory mechanisms. Despite these achievements, our understanding of underlying principles in cell fate dynamics is still rather limited.  Here, we present a new approach that exploits the high dimensional transcription distributions of single cell RNA sequencing (sc-RNAseq) data by information theory-based measures, which allow for robust identification of cell differentiation properties and efficient differentially expressed gene (DEG) analysis. We show that appropriate binarization of single cell transcription data allows for the rigorous definition of mutual information and robust entropy measures that reflect the general properties of cell fate decisions. We exemplify our distribution-based analysis of cell differentiation (DiBAC) with single cell qPCR data of blood cell development and sc-RNAseq data of Parkinson's disease-related iPSC differentiation into dopaminergic neurons.

## Environment
All analyses were done in Python 3 and R, and the main findings are presented in the corresponding figures of the manuscript. Figure panels and results can be reproduced by running the corresponding script.

## Environment requirements
- Numpy
- Scanpy
- Pandas
- matplotlib
- scikit-learn (sklearn)
- SOMPY
- R (for GO analysis)
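
A quick way to confirm that the Python packages listed above are importable in the active environment is sketched below. The import names are assumptions based on the usual module names of these packages, and `missing_packages` is a hypothetical helper that is not part of this repository.

```python
import importlib.util

# Assumed import names for the Python packages listed above.
REQUIRED = ["numpy", "scanpy", "pandas", "matplotlib", "sklearn", "sompy"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed Python packages are importable.")
```

R, which is needed for the GO analysis, has to be checked separately.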

## Data
In our analysis, we used two kinds of data, both provided in the data folder of the repository: single-cell qPCR data of blood cell development and sc-RNAseq data of PD-related iPSC differentiation.
The single-cell qPCR data of blood cell development contain three treatments of the EML stem cells (ERY, MYL and COM) with 4 time points each. They are located in the folder data/BloodCell_data. The sc-RNAseq data of PD-related iPSC differentiation contain two conditions, mutant (A) and isogenic control (B). The DEMs are located in the folder data/IPS_data.
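
To check that both data folders are present after cloning, a small sketch like the following may help. The folder names are taken from the text above. The helper `list_data_files` is hypothetical, as is the assumption that the script is run from the repository root. No file format is assumed.

```python
from pathlib import Path

# Folder names as given in the Data section; paths are relative to the repository root.
DATA_FOLDERS = ["data/BloodCell_data", "data/IPS_data"]


def list_data_files(repo_root="."):
    """Return {folder: sorted file names} for each data folder that exists."""
    listing = {}
    for folder in DATA_FOLDERS:
        path = Path(repo_root) / folder
        if path.is_dir():
            listing[folder] = sorted(p.name for p in path.iterdir() if p.is_file())
    return listing


if __name__ == "__main__":
    for folder, files in list_data_files().items():
        print(folder, "->", len(files), "files")
```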

## Code for Results

Our analysis is composed of several steps:
- t-SNE plots (Fig. 2):
  - The code is provided in **tSNE_plot** and described in the SI.

- Computing differentially expressed genes (DEGs) (Fig. 2) using:
    - Mutual information (including binarization)
      - The code for this computation is in **Fig_2_DEG_computation.py**.
    - Scanpy
      - The code for these results is also provided in **Fig2_DEG_computation.py**. To run this code, please install [Scanpy – Single-Cell Analysis in Python](https://scanpy.readthedocs.io/en/stable/).

- Computing correlation- and mutual-information-based transition indices (Fig. 3):
   - Computing correlations and mutual information between cells and genes and subsequent transition indices for cell differentiation.
   - The code for these results is given in **Fig_3_MI_Correlations_panels_CDEF.py**. This code depends on the class objects **Criticl_index_computation.py** and **Main_Script_MI.py** which have to be located at the corresponding paths.

- Computing Kullback–Leibler divergence (KL) (Fig. 4):
   - The code is provided in **KL_computation.py**. This code also depends on the class objects **Criticl_index_computation.py** and **Main_Script_MI.py** which have to be located at the corresponding paths.
- Computing the self-organizing map (SOM):
   - The code is provided in **Fig_4_C_SOM_analysis.py**. To run this code, please install [SOMPY](https://github.com/sevamoo/SOMPY) first.
   - This code depends on the KL data calculated by  **KL_computation.py**, which has to be executed first.
- Pathway analysis:
  - The corresponding R code is provided in **GO_R_Code.R** and **Plot_Selected_GO_R_Code**.
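
To illustrate the mutual-information-based DEG step listed above, the following minimal sketch binarizes an expression matrix and scores each gene by its mutual information with a binary condition label. The threshold of zero, the function names, and the toy data are assumptions for illustration only. The actual procedure is the one implemented in the repository scripts and described in the manuscript.

```python
import numpy as np


def binarize(expr, threshold=0.0):
    """Map expression values to 1 (above threshold) or 0 (otherwise)."""
    return (np.asarray(expr) > threshold).astype(int)


def mutual_information(x, y):
    """Mutual information (in bits) between two binary vectors."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    # Joint distribution over the four (x, y) states.
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty states, using the convention 0 * log 0 = 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))


def rank_genes(expr, labels):
    """Score each gene (column) by MI between its binarized expression and the labels."""
    binary = binarize(expr)
    scores = np.array([mutual_information(binary[:, g], labels)
                       for g in range(binary.shape[1])])
    return np.argsort(scores)[::-1], scores


# Toy data: 6 cells x 3 genes; cells 0-2 belong to condition 0, cells 3-5 to condition 1.
expr = np.array([[0, 2, 0],
                 [0, 0, 3],
                 [0, 1, 0],
                 [5, 0, 1],
                 [3, 4, 0],
                 [7, 0, 2]])
labels = np.array([0, 0, 0, 1, 1, 1])
order, scores = rank_genes(expr, labels)
print("Gene ranking (most informative first):", order)
```

In this toy example, gene 0 is expressed only in condition 1, so its binarized profile matches the labels exactly and it receives the highest score.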

## Notice
After cloning this project, please ensure that the environment and paths are set up accordingly. The results shown in the figures can then be generated by running, e.g., **Fig_3_MI_Correlation_panels_CDEF.py** or **KL_computation.py** from the terminal or your Python environment. These scripts calculate the correlation- and mutual-information-based indices and the KL values, and produce the corresponding plots.
For the DEG and SOM calculations, the required packages (Scanpy and SOMPY) must be installed first, and the corresponding code must be placed at the corresponding paths.
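
As a rough illustration of what the KL values mentioned above measure, the following minimal sketch computes the Kullback–Leibler divergence between two discrete distributions. The function name `kl_divergence`, the pseudocount, and the toy distributions are assumptions. The sketch does not reproduce **KL_computation.py**.

```python
import numpy as np


def kl_divergence(p, q, pseudocount=1e-12):
    """Kullback-Leibler divergence D(P || Q) in bits for discrete distributions."""
    # A small pseudocount (an assumption of this sketch) avoids log(0) and division by zero.
    p = np.asarray(p, dtype=float) + pseudocount
    q = np.asarray(q, dtype=float) + pseudocount
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log2(p / q)))


# Toy example: fraction of cells in the (off, on) state of one binarized gene
# at a hypothetical early and late time point.
p_early = [0.9, 0.1]
p_late = [0.5, 0.5]
d = kl_divergence(p_early, p_late)
print(f"D(early || late) = {d:.3f} bits")
```

The divergence is zero for identical distributions and is not symmetric, so `kl_divergence(p_early, p_late)` and `kl_divergence(p_late, p_early)` generally differ.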
