Anna Buschart authored
This repository contains scripts and related explanations for the analysis of metagenomics data of the microbiome in infants during the first week of life.
- Part 1: Read preprocessing, assembly, gene prediction and read mapping using IMP
- Part 2: Curation of assembled contigs to remove artefactual sequences
- Part 3: Annotation of genes with KEGG orthologous groups (KOs) and counting of reads per KO
- Part 4: Binning of contigs into population-level genomes
- Part 5: Linking of population-level genomes over different samples using phylogenetic marker genes
- Part 6: Taxonomic annotation of the bins
- Part 7: Linking strains and population-level genomes over different samples using SNV-patterns
- Part 8: Analysis of intra- and inter-population variability
- Part 9: mOTU analysis
The different parts of the workflow are described below. The links in the sub-headings below lead to descriptions of these different parts and how different scripts are connected. You can find the scripts behind the links next to the bullet points.
Part 1: Read preprocessing, assembly, gene prediction and read mapping using IMP
The first steps of this analysis are done by the Integrated Meta-omic Pipeline (IMP).
Part 2: Curation of assembled contigs to remove artefactual sequences
In this part of the analysis, sample contigs are pre-binned together with contigs from a contamination control sample, and contigs that cluster with the contaminant sequences are removed.
- fastaExtractCutRibosomal1000.pl
- calculateContigLength.pl
- 160920_autocluster_contamination_1000.R
- fastaExtractCutRibosomalNoCutoff.pl
- 160920_autocluster_contamination_1.R
- fastaExtractrRNA4dom.pl
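As an illustration of the kind of bookkeeping this curation step relies on, the following minimal Python sketch computes contig lengths from a FASTA stream and keeps only contigs above a length cutoff. It is not a port of the Perl scripts listed above. The function names are hypothetical, and reading the "1000" in the script names as a 1000 bp length cutoff is an assumption.

```python
from io import StringIO


def contig_lengths(fasta_handle):
    """Return {contig_id: sequence_length} for a FASTA stream."""
    lengths = {}
    current = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def filter_by_length(lengths, min_length=1000):
    """Keep contigs whose length is at least min_length (assumed cutoff)."""
    return {contig: n for contig, n in lengths.items() if n >= min_length}


if __name__ == "__main__":
    example = StringIO(">contig_1 sample=A\nACGT\nACGT\n>contig_2\nAC\n")
    lengths = contig_lengths(example)
    print(lengths)
    print(filter_by_length(lengths, min_length=5))
```

With a real assembly, the handle would come from `open("contigs.fa")` (a placeholder file name) instead of the in-memory example.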
Part 3: Annotation of genes with KEGG orthologous groups (KOs) and counting of reads per KO
Here, HMMs are used to annotate the predicted genes with KOs. The number of reads mapping to each KO in each sample is then counted for later differential analysis.
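The counting step can be pictured with a small Python sketch that sums per-gene read counts into per-KO totals for one sample. The function name and input layout are hypothetical and do not reflect the repository's scripts. The sketch also assumes that a gene annotated with several KOs contributes its full read count to each of them, which may differ from the rule used in the actual analysis.

```python
from collections import defaultdict


def reads_per_ko(gene_to_kos, gene_read_counts):
    """Sum read counts of genes into totals per KO.

    gene_to_kos: {gene_id: iterable of KO identifiers}
    gene_read_counts: {gene_id: number of mapped reads}
    Genes without a KO annotation are ignored.
    """
    totals = defaultdict(int)
    for gene, count in gene_read_counts.items():
        for ko in gene_to_kos.get(gene, ()):
            # Assumption: multi-KO genes count fully towards every KO.
            totals[ko] += count
    return dict(totals)


if __name__ == "__main__":
    annotation = {"gene_1": ["K00001"], "gene_2": ["K00001", "K00002"]}
    counts = {"gene_1": 10, "gene_2": 5, "gene_3": 7}
    print(reads_per_ko(annotation, counts))
```

Running the function once per sample yields a KO-by-sample count table that can be passed to a differential analysis tool.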
Part 4: Binning of contigs into population-level genomes
Contigs are binned using the algorithm developed for the MuSt study of type 1 diabetes. The scripts have been adapted to work with IMP output and with the previously curated contigs.
- fastaExtractCutRibosomal1000.pl
- fastaExtractCutRibosomalNoCutoff.pl
- 160921_autoClust_noConta_evil_1000.R
- 160921_autoClust_noConta_evil_1.R
Part 5: Linking of population-level genomes over different samples using phylogenetic marker genes
In this step, the bins from the individual samples are connected based on the relatedness of their phylogenetic marker genes.
Part 6: Taxonomic annotation of the bins
We use PhyloPhlAn to assign a taxonomy to every bin.
Part 7: Linking strains and population-level genomes over different samples using SNV-patterns
We use StrainPhlAn to identify, based on reads, strains that are common to several samples and that may be linked to reconstructed genomes.
Part 8: Analysis of intra- and inter-population variability
In this step, Pogenom is used to examine the intra- and inter-population variability of the population-level genomes shared between different samples, based on single nucleotide variants (SNVs).
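To make the idea of SNV-based intra-population variability concrete, here is a minimal Python sketch of a generic nucleotide diversity estimate. It is not Pogenom's implementation and may differ from the formulas Pogenom applies. Per locus, it computes the probability that two reads drawn without replacement carry different alleles, and it averages the summed values over an assumed genome length. All function and parameter names are hypothetical.

```python
def locus_diversity(allele_counts):
    """Probability that two reads sampled without replacement differ.

    allele_counts: {allele: read count} at one variant position.
    """
    n = sum(allele_counts.values())
    if n < 2:
        return 0.0  # diversity is undefined with fewer than two reads
    same = sum(c * (c - 1) for c in allele_counts.values())
    return 1.0 - same / (n * (n - 1))


def genome_diversity(loci, genome_length):
    """Sum per-locus diversity and normalise by the genome length.

    Positions without variants contribute zero, so only variant loci
    need to be listed in `loci`.
    """
    return sum(locus_diversity(counts) for counts in loci) / genome_length


if __name__ == "__main__":
    loci = [{"A": 8, "G": 2}, {"C": 5, "T": 5}]
    print(locus_diversity(loci[0]))
    print(genome_diversity(loci, genome_length=1000))
```

Comparing this quantity within one sample and between samples is one simple way to contrast intra- and inter-population variability.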
Part 9: mOTU analysis
To obtain a taxonomic overview, the relative abundances of metagenomic operational taxonomic units (mOTUs) are calculated from curated reads.
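Relative abundance here means each mOTU's share of the total within a sample. The short Python sketch below shows that normalisation under the assumption that per-mOTU counts are already available. The function name and the example labels are hypothetical, and the actual mOTU profiling is done by dedicated software rather than by this code.

```python
def relative_abundances(counts):
    """Convert {motu: count} into {motu: fraction of the sample total}."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for samples without any assigned reads.
        return {motu: 0.0 for motu in counts}
    return {motu: value / total for motu, value in counts.items()}


if __name__ == "__main__":
    sample = {"motu_a": 3, "motu_b": 1}  # placeholder labels and counts
    print(relative_abundances(sample))
```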