Commit 53aad4aa authored by Leon-Charles Tranchevent

Added a function to run a quality control test.

.Rproj.user/*
.Rbuildignore
ArrayUtils.Rproj
man/*
Package: ArrayUtils
Type: Package
Title: Utils For Array Processing
Version: 0.0.1
Author: Leon-Charles Tranchevent
Maintainer: Leon-Charles Tranchevent <leon-charles.tranchevent@uni.lu>
Description: This package contains functions to analyse microarray data.
    It is more a set of useful functions than a real package.
License: The Unlicense
Encoding: UTF-8
LazyData: true
Imports:
    utils,
    affy,
    Biobase,
    arrayQualityMetrics
RoxygenNote: 6.1.1
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org>
exportPattern("^[[:alpha:]]+")
#' @title Runs a quality control analysis of a given microarray dataset.
#' @description This function runs a quality control analysis of the dataset defined by the input
#' parameters. It first loads the clinical data associated with the dataset, then loads the raw
#' data, and finally runs the quality control on the annotated data.
#'
#' The function assumes that the dataset is associated with a unique name (e.g., a GEO identifier)
#' and that a folder with this name exists in 'raw_data_dir' and contains both the clinical data
#' ('ClinicalData.tsv') and the raw data (as CEL files in a 'RAW/' subfolder).
#'
#' It creates a report that contains various quality indicators and is stored as an HTML
#' document in a folder named after the dataset inside 'output_data_dir'. It does not return any
#' value.
#'
#' Note: the function does not check for the existence of folders and files (not a real package
#' function). Paths are built by plain string concatenation, so both directory arguments must end
#' with a trailing '/'.
#'
#' @param dataset_name A string representing the name of the dataset to analyse.
#' @param raw_data_dir A string representing the folder that contains the input data.
#' @param output_data_dir A string representing the folder in which the output data are written.
#' @param compressed A boolean indicating whether the raw data are compressed. This is
#' TRUE by default.
#' @param phenotype_groups A vector of phenotype factor names that can be used to highlight the
#' samples in the QC report. This is empty by default.
#' @param verbose A boolean indicating whether the function should display log information. This
#' is TRUE by default.
#' @return NULL
run_quality_control <- function(dataset_name,
                                raw_data_dir,
                                output_data_dir,
                                compressed = TRUE,
                                phenotype_groups = c(),
                                verbose = TRUE) {

  # We define the I/Os.
  clinical_data_file <- paste0(raw_data_dir, dataset_name, "/", "ClinicalData.tsv")
  raw_data_input_dir <- paste0(raw_data_dir, dataset_name, "/", "RAW/")
  data_output_dir <- paste0(output_data_dir, dataset_name, "/")

  # We load the clinical data to annotate the AffyBatch object and make the QC more useful.
  pheno_data <- Biobase::AnnotatedDataFrame(utils::read.delim(file = clinical_data_file,
                                                              row.names = 1,
                                                              colClasses = "factor"))

  # We clean up and log information.
  rm(clinical_data_file)
  if (verbose) {
    message(paste0("[", Sys.time(), "][", dataset_name, "] Phenotypic data read."))
  }

  # We load the CEL files to create the AffyBatch object and then attach the clinical data.
  raw_file_list <- affy::list.celfiles(raw_data_input_dir, full.names = TRUE)
  batch <- affy::ReadAffy(filenames = raw_file_list, compress = compressed, verbose = verbose)
  Biobase::phenoData(batch) <- pheno_data

  # We clean up and log information.
  rm(raw_file_list, pheno_data, raw_data_input_dir)
  if (verbose) {
    message(paste0("[", Sys.time(), "][", dataset_name, "] Expression data read."))
  }

  # We run the quality control itself.
  arrayQualityMetrics::arrayQualityMetrics(expressionset = batch,
                                           outdir = data_output_dir,
                                           force = TRUE,
                                           do.logtransform = TRUE,
                                           intgroup = phenotype_groups)

  # We clean up and log information.
  rm(data_output_dir, batch)
  if (verbose) {
    message(paste0("[", Sys.time(), "][", dataset_name, "] QC analysis performed."))
  }
}
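
Since run_quality_control() does not check that the expected folders and files exist, a caller may want to verify the layout first. The following is a minimal sketch in base R, not part of this commit: the helper name check_qc_inputs, the dataset names "GSE0000" and "GSE9999", and the temporary demo folder are all hypothetical and only illustrate the layout described in the function documentation.

```r
# Hypothetical helper (not part of the commit): checks the folder layout that
# run_quality_control() assumes and returns a character vector of problems
# (empty when everything expected is present).
check_qc_inputs <- function(dataset_name, raw_data_dir) {
  dataset_dir <- file.path(raw_data_dir, dataset_name)
  clinical_data_file <- file.path(dataset_dir, "ClinicalData.tsv")
  raw_data_input_dir <- file.path(dataset_dir, "RAW")

  problems <- character(0)
  if (!file.exists(clinical_data_file)) {
    problems <- c(problems, paste0("missing clinical data file: ", clinical_data_file))
  }
  if (!dir.exists(raw_data_input_dir)) {
    problems <- c(problems, paste0("missing raw data folder: ", raw_data_input_dir))
  } else {
    # Accept both plain and gzip-compressed CEL files, in any letter case.
    cel_files <- list.files(raw_data_input_dir,
                            pattern = "\\.cel(\\.gz)?$",
                            ignore.case = TRUE)
    if (length(cel_files) == 0) {
      problems <- c(problems, paste0("no CEL files found in: ", raw_data_input_dir))
    }
  }
  problems
}

# Demo on a throwaway layout; the dataset names are made up for illustration.
root <- file.path(tempdir(), "qc_demo")
dir.create(file.path(root, "GSE0000", "RAW"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "GSE0000", "ClinicalData.tsv")))
invisible(file.create(file.path(root, "GSE0000", "RAW", "sample1.CEL.gz")))

print(check_qc_inputs("GSE0000", root))  # complete layout
print(check_qc_inputs("GSE9999", root))  # dataset folder does not exist
```

A caller could stop early when the returned vector is non-empty and only then call run_quality_control(), remembering that the latter expects its directory arguments to end with a trailing '/'.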