# DiBAC: Distribution-Based Analysis of Cell Differentiation Identifies Mechanisms of Cell Fate

### Susan Ghader, Stefano Magni, Thais Arns, Tomasz Ignac and Alexander Skupin

Publication: Submitted

## Abstract

The recent developments in single cell genomics allow for in-depth characterization of cellular heterogeneity in tissue development and the identification of new regulatory mechanisms. Despite these achievements, our understanding of underlying principles in cell fate dynamics is still rather limited. Here, we present a new approach that exploits the high dimensional transcription distributions of single cell RNA sequencing (sc-RNAseq) data by information theory-based measures, which allow for robust identification of cell differentiation properties and efficient differentially expressed gene (DEG) analysis. We show that appropriate binarization of single cell transcription data allows for the rigorous definition of mutual information and robust entropy measures that reflect the general properties of cell fate decisions. We exemplify our distribution-based analysis of cell differentiation (DiBAC) with single cell qPCR data of blood cell development and sc-RNAseq data of Parkinson's disease-related iPSC differentiation into dopaminergic neurons.

## Environment

All analyses were done in Python 3 and R, and the main findings are presented in the corresponding figures of the manuscript. Figure panels and results can be obtained by running the corresponding script.

## Environment requirements

- NumPy
- Scanpy
- Pandas
- matplotlib
- sklearn
- SOMPY
- R (for GO analysis)

## Data

In our analysis, we used two kinds of data: single-cell qPCR data of blood cell development and sc-RNAseq data of PD-related iPSC differentiation, both provided in the data folder of the repository.

The single-cell qPCR data of blood cell development contains data for three treatments of the EML stem cells (ERY, MYL and COM) and 4 time points for each treatment.
They are located in the folder data/BloodCell_data.

The sc-RNAseq data of PD-related iPSC differentiation contains two conditions, mutant (A) and isogenic control (B). The DEMs are located in the folder data/IPS_data.

## Code for Results

Our analysis is composed of several steps:

- t-SNE plots (Fig. 2):
  - The code is provided in **tSNE_plot** and described in the SI.
- Computing differentially expressed genes (DEGs) (Fig. 2) using:
  - Mutual information (including binarization): the code for this computation is in **Fig_2_DEG_computation.py**.
  - Scanpy: the code for these results is also provided in **Fig2_DEG_computation.py**. To run this code, please install [Scanpy – Single-Cell Analysis in Python](https://scanpy.readthedocs.io/en/stable/).
- Computing correlation and mutual information based transition indices (Fig. 3):
  - Computing correlations and mutual information between cells and genes, and the subsequent transition indices for cell differentiation.
  - The code for these results is given in **Fig_3_MI_Correlations_panels_CDEF.py**. This code depends on the class objects **Criticl_index_computation.py** and **Main_Script_MI.py**, which have to be located at the corresponding paths.
- Computing Kullback–Leibler divergence (KL) (Fig. 4):
  - The code is provided in **KL_computation.py**. This code also depends on the class objects **Criticl_index_computation.py** and **Main_Script_MI.py**, which have to be located at the corresponding paths.
- Computing the self-organizing map (SOM):
  - The code is provided in **Fig_4_C_SOM_analysis.py**. To run this code, please install [SOMPY](https://github.com/sevamoo/SOMPY) first.
  - This code depends on the KL data calculated by **KL_computation.py**, which has to be executed first.
- Pathway analysis:
  - The corresponding R code is provided in **GO_R_Code.R** and **Plot_Selected_GO_R_Code**.

## Notice

After cloning this project, please ensure that the environment and paths are set up accordingly.
Then the results of the figures can be generated by running, e.g., **Fig_3_MI_Correlation_panels_CDEF.py** or **KL_computation.py** from the terminal or your Python environment. These scripts calculate the correlation and mutual information based indices and the KL values, and produce the corresponding plots. For the DEG and SOM calculations, the required packages should be installed first and the corresponding code files should be placed at the corresponding paths.
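
For readers who want a feel for the mutual information based DEG step before running the scripts, the idea of binarizing expression values and scoring each gene by its mutual information with the condition label can be sketched in plain Python. This is only an illustrative sketch and not the code from the repository: the function names (`binarize`, `mutual_information`, `rank_genes_by_mi`), the "expressed if above threshold" rule and the default threshold of 0 are all assumptions made for this example.

```python
import math
from collections import Counter


def binarize(values, threshold=0.0):
    """Map expression values to 0/1 (1 if the value is above the threshold).

    The thresholding rule is an assumption made for this sketch.
    """
    return [1 if v > threshold else 0 for v in values]


def mutual_information(x, y):
    """Mutual information in bits between two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n == 0:
        raise ValueError("x and y must not be empty")
    joint = Counter(zip(x, y))  # counts of each (x, y) pair
    count_x = Counter(x)        # marginal counts of x
    count_y = Counter(y)        # marginal counts of y
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) / (p(a) * p(b)) simplifies to c * n / (count_x[a] * count_y[b])
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


def rank_genes_by_mi(expression, labels, threshold=0.0):
    """Rank genes by mutual information between binarized expression and labels.

    expression: dict mapping a gene name to a list of per-cell values
    labels:     list with one condition label per cell (e.g. "A" or "B")
    """
    scores = {
        gene: mutual_information(binarize(values, threshold), labels)
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data invented for illustration: 4 cells per condition.
    toy_labels = ["A"] * 4 + ["B"] * 4
    toy_expression = {
        "gene_switch": [3, 5, 2, 1, 0, 0, 0, 0],  # on in A, off in B
        "gene_flat": [1, 0, 2, 0, 3, 0, 1, 0],    # same on/off pattern in both
    }
    for gene, score in rank_genes_by_mi(toy_expression, toy_labels):
        print(gene, round(score, 3))
```

A gene whose on/off state tracks the condition receives a high score, while a gene with the same on/off frequency in both conditions scores close to zero. On real data a vectorized NumPy implementation would likely be preferable to these pure Python loops.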