Commit 830fd682 authored by Vilem Ded, committed by Laurent Heirendt

updating data-mail sending slide - removing sensitive data and use of cloud from reasons why is mail bad
parent 3c05119b
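# Plot the iris sepal lengths three ways: mean bars, boxplots, boxplots with raw points
library(ggplot2)
library(data.table)
library(ggpubr)
data(iris)
iris <- data.table(iris)
# Keep the first 103 rows: all setosa and versicolor, but only 3 virginica
iris <- iris[1:103]
# Bar chart of the mean sepal length per species (`fun` replaces the deprecated `fun.y`)
g1 <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_bar(aes(fill = Species), stat = "summary", fun = "mean") +
  guides(fill = "none") +
  ylim(c(0, 8))
# Boxplot per species
g2 <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(aes(fill = Species)) +
  ylim(c(0, 8)) +
  guides(fill = "none")
# Boxplot per species with the jittered raw observations on top
g3 <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(aes(fill = Species)) +
  ylim(c(0, 8)) +
  geom_point(position = "jitter") +
  guides(fill = "none")
# Arrange the three panels side by side, then save the combined figure explicitly
p <- ggarrange(g1, g2, g3, nrow = 1)
ggsave(filename = "../plot-data.png", plot = p, device = "png", width = 12, height = 6)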
# IT101 - Working with computers
<br>IT101 - Working with computers<br>
## November 7th, 2019
<div style="top: 6em; left: 0%; position: absolute;">
<img src="theme/img/lcsb_bg.png">
</div>
<div style="top: 5em; left: 60%; position: absolute;">
<img src="slides/img/r3-training-logo.png" height="200px">
<br><br><br><br>
<h3></h3>
<br><br><br>
<h4>
Vilem Ded<br>
Data Steward<br>
vilem.ded@uni.lu<br>
<i>Luxembourg Centre for Systems Biomedicine</i>
</h4>
</div>
# Data housekeeping
## Available data storage
<div class='fragment' style="position:absolute">
<img src="slides/img/LCSB_storages_full.png" height="750px">
</div>
<div class='fragment' style="position:absolute">
<img src="slides/img/LCSB_storages_personal-crossed.png" height="750px">
</div>
# Data ingestion/transfer
## Receiving and sending data
<img height="450px" style="position:relative;left:10%" src="slides/img/banned_exchange_channels.png"><br>
<div style="position:absolute; left:10%;width:30%">
## E-mail is not for data transfer
* Avoid transfer of any data by e-mail
* E-mail is a poor repository
* Data can be read on passage
</div>
<div class="fragment" style="left:50%; width:30%; position:absolute">
## Exchanging data
* Share on Atlas server
* OwnCloud share (LCSB - BioCore)
* DropIt service (SIU)
* DUMA (Aspera) share for sensitive data
</div>
</div>
# Data ingestion/transfer
Data can be corrupted:
* (non-)malicious modification
* faulty file transfer
* disk corruption
<div class="fragment">
### Solution
* disable write access to the source data
* Generate checksums!
<div style="position:absolute;left:40%">
<img src="slides/img/checksum.png" width="500px">
</div>
</div>
<div class="fragment" style="position:relative; left:0%">
## When to generate checksums?
* before data transfer
  - new dataset from collaborator
  - upload to remote repository
* long term storage
  - master version of dataset
  - snapshot of data for publication
</div>
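Generating and verifying checksums can be scripted. The following is a minimal sketch, assuming SHA-256 as the hash and a plain-text manifest with one `<digest>  <relative path>` line per file; the function names and the manifest layout are illustrative assumptions, not an LCSB tool.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_path):
    """Write one '<digest>  <relative path>' line per file found under data_dir."""
    data_dir = Path(data_dir)
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    lines = [
        f"{sha256_of_file(p)}  {p.relative_to(data_dir).as_posix()}\n"
        for p in files
    ]
    Path(manifest_path).write_text("".join(lines), encoding="utf-8")


def verify_manifest(data_dir, manifest_path):
    """Return the relative paths that are missing or whose digest has changed."""
    data_dir = Path(data_dir)
    mismatches = []
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        expected, rel_path = line.split("  ", 1)
        target = data_dir / rel_path
        if not target.is_file() or sha256_of_file(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```

Keep the manifest outside the data directory (hypothetical example: `write_manifest("dataset/", "dataset.sha256")`) so that it is not hashed itself, and run `verify_manifest` after every transfer; an empty list means no file changed.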
# Introduction
<div class="fragment" style="position:absolute">
<img height="450px" src="slides/img/wordcloud.png"><br>
## Learning objectives
* How to manage your data
* How to look at and analyze your data
* Solving issues with computers
* Reproducibility in the research data life cycle
</div>
<div class="fragment" style="position:relative;left:50%; width:40%">
<div >
<center>
<img height="405px" src="slides/img/rudi_balling.jpg"><br>
Prof. Dr. Rudi Balling, director
</center>
</div>
## Pertains to practically all people at LCSB
* Scientists
* PhD candidates
* Technicians
* Administrators
</div>
[
{"filename": "index.md"},
{"filename": "introduction.md"},
{"filename": "data-introduction.md"},
{"filename": "data_flow.md"},
{"filename": "ingestion.md"},
{"filename": "storage_setup.md"},
{"filename": "data-housekeeping.md"},
{"filename": "reproducibility.md"},
{"filename": "code_versioning.md"},
{"filename": "visualization.md"},
{"filename": "problem_solving.md"},
{"filename": "fair-principles.md"},
{"filename": "r3_group.md"},
{"filename": "howtos.md"},
{"filename": "thanks.md"}
]
## Overview
0. Introduction - learning objectives + target audience
1. Data workflow
1. Ingestion:
* receiving/sending/sharing data
* file naming
* checksums
* backup
1. making data tidy
* what is a table
*
1. Learning to code workflows and analyses - excel files, coding
1. Code versioning and reproducibility
1. Visualization
* see the data
1. problem solving
* guide
* rubberducking
* Google for help
* oracle
1. R3 team
1. Acknowledgment
1. data minimization
# Problem solving
A guide for solving computing issues
1. Express the problem
* Write down what you want to achieve
2. Search for help
* Read **FAQs**, **help pages** and the **official documentation** well before turning to Google
* Use Stack Exchange, forums and related resources carefully
3. Ask an expert
# Problem solving
## Write to the Oracle
* The Oracle gives the precise answer to your problems
* You have to submit the problem in writing
* The Oracle answers a question only once, or if it finds the problem interesting
* If you supply a trivial problem, it will stop answering
* Available Oracles
  * Service Now @ [service.uni.lu] (Uni and LCSB helpdesk)
  * [Stack Overflow](https://stackoverflow.com/) and other online sites
  * Local experts
# Responsible and Reproducible Research (R<sup>3</sup>)
## What is R<sup>3</sup>?
A multifaceted change management
process built on 3 pillars:
- R3 pathfinder
- R3 school
- R3 accelerator
Common link module: R3 clinic
<div style="top: -1em; left: 50%; position: absolute;">
<img src="slides/img/3pillars-full.png">
</div>
<br>
<br>
<br>
<br>
<aside class="notes">
Pathfinder - policies, data management changes,
School - courses, howtos,
Accelerator - advanced teams and their boost/support
Clinic - hands-on, meetings in groups, code review + suggestions
</aside>
## R<sup>3</sup> Training
* LCSB's Monthly Data Management and Data Protection training
* ELIXIR Luxembourg's Research data management and stewardship training - 27th Jan 2020
* R<sup>3</sup> school Git basics - every 4 months
<aside class="notes">
Direct newcomers to this monthly training
</aside>
# Responsible and Reproducible Research (R<sup>3</sup>)
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>
Your R<sup>3</sup> contacts:
<div style="display:block;text-align:center;position:relative">
<div class="profile-container">
* Christophe Trefois
* <img src="slides/img/R3_profile_pictures/christophe_trefois.png">
* R<sup>3</sup> coordination
</div>
<div class="profile-container">
* Venkata Satagopam
* <img src="slides/img/R3_profile_pictures/venkata_satagopam.png">
* R<sup>3</sup> Core
</div>
<div class="profile-container">
* Reinhard Schneider
* <img src="slides/img/R3_profile_pictures/reinhard_schneider.png">
* Head of Bioinformatics Core
</div>
<div class="profile-container">
* Pinar Alper
* <img src="slides/img/R3_profile_pictures/pinar_alper.png">
* Data steward
</div>
<div class="profile-container">
* Yohan Yarosz
* <img src="slides/img/R3_profile_pictures/yohan_yarosz.png">
* R<sup>3</sup> Core
</div>
<div class="profile-container">
* Laurent Heirendt
* <img src="slides/img/R3_profile_pictures/laurent_heirendt.png">
* Git, CI
</div>
<div class="profile-container">
* Wei Gu
* <img src="slides/img/R3_profile_pictures/wei_gu.png">
* R<sup>3</sup> Core
</div>
<div class="profile-container">
* Sarah Peter
* <img src="slides/img/R3_profile_pictures/sarah_peter.png">
* HPC
</div>
<div class="profile-container">
* Vilem Ded
* <img src="slides/img/R3_profile_pictures/vilem_ded.png">
* Data steward
</div>
<div class="profile-container">
* Noua Toukourou
* <img src="slides/img/R3_profile_pictures/noua_toukourou.png">
* R<sup>3</sup> Core
</div>
<div class="profile-container">
* Alexey Kolodkin
* <img src="slides/img/R3_profile_pictures/alexey_kolodkin.png">
* Data steward
</div>
<div class="profile-container">
* Maharshi Vyas
* <img src="slides/img/R3_profile_pictures/maharshi_vyas.png">
* R<sup>3</sup> Core
</div>
</div>
# Reproducibility
* ensures credibility
* key requirement for follow-up and collaborative studies
<div style="position:absolute">
<img src="slides/img/reproducibility_nature.png" height="650px">
</div>
<div class="fragment" style="position:relative;left:50%">
## Why is our workflow not reproducible?
Lack of provenance:
* Input data downloaded from “some website”
* Copy & paste operations
* Manual text entry
* Analysis not coded
* Intermediate and final data not generated by deterministic processes
</div>
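When an analysis involves random sampling, fixing the random seed is one way to make intermediate and final results deterministic. A minimal sketch, assuming a bootstrap of the mean as a stand-in analysis; the function name, the seed value and the parameters are illustrative assumptions.

```python
import random


def bootstrap_mean(values, n_resamples=1000, seed=2019):
    """Average of resampled means; the same seed always gives the same result."""
    rng = random.Random(seed)  # private generator, independent of global state
    means = []
    for _ in range(n_resamples):
        # Draw len(values) observations with replacement
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


if __name__ == "__main__":
    measurements = [5.1, 4.9, 6.3, 5.8, 7.0]
    # Two runs with the same seed print identical numbers
    print(bootstrap_mean(measurements, seed=1))
    print(bootstrap_mean(measurements, seed=1))
```

Recording the seed alongside the code lets a collaborator regenerate exactly the same numbers.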
# Reproducibility
## Learning to code workflows and analyses
<div style="display:inline-grid;grid-gap: 40px;grid-template-columns: auto auto;position:relative;left:12%">
<div class="fragment">
<div class="content-box">
<div class="box-title red">Excel alone</div>
<div class="content">
* Is great for looking at data.
* Data entry is fast.
* Analysis flow is hidden and not in focus.
</div>
</div>
<img src="slides/img/excel_data-sheet.png" height="350px">
</div>
<div class="fragment">
<div class="content-box">
<div class="box-title">Coding</div>
<div class="content">
* Is great for controlling analysis.
* Data is hidden.
* Flow is visible.
</div>
</div>
<img src="slides/img/code-example.png" height="350px">
</div>
</div>
<div class="content-box fragment" style="width:70%;margin:0 auto;">
<div class="box-title green">Develop data science skills</div>
<div class="content">
* Develop good data management and analysis habits.
* Start coding your analysis within Excel.
* Make yourself familiar with a statistics environment such as R, Python or Matlab.
* No need to learn a high level programming language such as C++ or Java.
</div>
</div>
</div>
# Table
<div style="position:absolute">
"Tabular format of data"
### Header
* one line!
* **good** names of columns
### Rows
* represent observations/entities
### Columns
* represent properties of the observations
* one data type
</div>
<div style="left:50%; position:relative; top:-2em">
<img src="slides/img/excel_data-sheet.png" width="600px">
<div class="fragment" data-fragment-index="3" style="position:absolute">
<img src="slides/img/excel_analyses-sheet.jpeg" width="600px"><br>
</div>
<div class="fragment" data-fragment-index="4" style="position:relative">
<img src="slides/img/red-cross.png" width="600px"><br>
</div>
</div>
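The "one data type per column" rule can be checked automatically. A minimal sketch, assuming the table is available as CSV text with a single header line; the function names and the two-way number/text classification are illustrative assumptions.

```python
import csv
import io


def cell_type(value):
    """Classify a non-empty cell as 'number' or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def mixed_type_columns(csv_text):
    """Return {column name: sorted types} for columns holding more than one type."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first line is the header
    names = reader.fieldnames or []
    seen = {name: set() for name in names}
    for row in reader:
        for name in names:
            value = (row[name] or "").strip()
            if value:  # empty cells are treated as missing, not as a type
                seen[name].add(cell_type(value))
    return {name: sorted(kinds) for name, kinds in seen.items() if len(kinds) > 1}


if __name__ == "__main__":
    sample = "species,sepal_length\nsetosa,5.1\nsetosa,n/a\nvirginica,6.3\n"
    print(mixed_type_columns(sample))
```

In the sample, the `n/a` entry makes `sepal_length` a mixed column; leaving the cell empty instead keeps the column numeric.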
# Storage set-up
* Download Anti-virus software
* Encrypt movable media
### Backup
* take care of your own backups!
* don't work on your backup copy!
* the minimum is the <b>3-2-1 backup rule</b>
<div class="fragment" style="display:inline-grid;grid-template-columns: repeat(3, 1fr);">
<div class="content-box">
<div class="box-title green">Personal research data</div>
<div class="content">
* Working documents on your laptop
* Online share (DropIt, OwnCloud)
* Copy on an external hard drive
</div>
</div>
<div class="content-box">
<div class="box-title blue">Group research data</div>
<div class="content">
* Research data generated in group
* Back-up by central IT
* Geo-resilient copy
</div>
</div>
</div>
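The 3-2-1 rule asks for at least three copies of the data, on at least two different storage media, with at least one copy off-site. A minimal sketch of that check, assuming a hypothetical `Copy` record; the field names and example values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (all field values are illustrative)."""
    location: str   # where the copy lives, e.g. "laptop"
    medium: str     # kind of storage holding the copy
    offsite: bool   # True if kept away from the primary working location


def satisfies_3_2_1(copies):
    """Check for >= 3 copies, >= 2 distinct media and >= 1 off-site copy."""
    media = {copy.medium for copy in copies}
    has_offsite = any(copy.offsite for copy in copies)
    return len(copies) >= 3 and len(media) >= 2 and has_offsite


if __name__ == "__main__":
    copies = [
        Copy("laptop", "internal disk", False),
        Copy("external hard drive", "external disk", False),
        Copy("online share", "server", True),
    ]
    print(satisfies_3_2_1(copies))
```

The example mirrors the personal research data box above: working documents on the laptop, an online share, and a copy on an external hard drive.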
# Storage set-up
## Backup - Central IT/LCSB
<div style="position:relative">
<img src="slides/img/LCSB_storages_backed-up.png" height="750px">
</div>
<div style="position:absolute;left:65%;top:60%">
Server administrators take care of:
* server backups
* LCSB OwnCloud backups
* group/application server backups (not always)
</div>
# Storage set-up
## Backup - personal research data
<div style="position:relative">
<img src="slides/img/LCSB_storages_backup.png" height="750px">
</div>
<div style="position:absolute;left:55%;top:70%">
<font color="red">One version should reside on Atlas!</font>
</div>
# The Server is your friend!
* Allows a consistent backup policy for your datasets
* Keeps the number of copies to a minimum
* Specification of clear access rights
* High accessibility
* Data are discoverable
* Server can't be stolen
* ...
# Thank you.<sup> </sup>
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>
<br>
<br>
<br>
<br>
<center>
Contact us if you need help:
<a href="mailto:r3lab.core@uni.lu">r3lab.core@uni.lu</a>
</center>
# Visualization
<center>
**Plot your data!**
<figure>
<img src="slides/img/DinoSequentialSmaller.gif" height="500px">
<blockquote>"never trust summary statistics alone; always visualize your data"</blockquote>
<figcaption>--Alberto Cairo</figcaption>
</figure>
</center>
# Visualization
<center>
**Plot your data!**
<figure>
<img src="slides/img/plot-data.png" height="800px">
</figure>
</center>