Commit 370143de authored by Vilem Ded

2022-04-05_IT101-DM

parent 7dd440ee
2 merge requests: !119 [release] Regular merge of develop, !115 2022-04-05_IT101-DM
Showing
with 588 additions and 0 deletions
../../../../2021/2021-07-27_IT101-DM/slides/img/rdm-cycle.png
\ No newline at end of file
../../../../2021/2021-07-27_IT101-DM/slides/img/red-cross.png
\ No newline at end of file
../../../../2021/2021-07-27_IT101-DM/slides/img/reproducibility_nature.png
\ No newline at end of file
../../../../2021/2021-07-27_IT101-DM/slides/img/rudi_balling.jpg
\ No newline at end of file
../../../../2021/2021-07-27_IT101-DM/slides/img/scripts/
\ No newline at end of file
../../../../2021/2021-07-27_IT101-DM/slides/img/undraw_secure_server_s9u8.png
\ No newline at end of file
../../../../2021/2021-07-27_IT101-DM/slides/img/wordcloud.png
\ No newline at end of file
# DM101 - Basics of data management
## Apr 05th, 2022
<div style="top: 6em; left: 0%; position: absolute;">
<img src="theme/img/lcsb_bg.png">
</div>
<div style="top: 5em; left: 60%; position: absolute;">
<img src="slides/img/r3-training-logo.png" height="200px">
<br><br><br><br>
<h3></h3>
<br><br><br>
<h4>
Marina Popleteeva/Vilem Ded<br>
Data Stewards<br>
lcsb-datastewards@uni.lu<br>
<i>Luxembourg Centre for Systems Biomedicine</i>
</h4>
</div>
# Data transfer and integrity
* When sending data: <font color="red">Do not use emails, use secure platforms (Cloud, Aspera, Atlas share...)!</font>
<div class="fragment">
Data can be corrupted:
* (non-)malicious modification
* faulty file transfer
* disk corruption
</div>
<div class="fragment">
### Solution
* disable write access to the source data
* generate checksums!
<div style="position:absolute;left:40%;top:30%">
<img src="slides/img/checksum.png" width="500px">
</div>
</div>
<div class="fragment" style="position:relative; left:0%">
## When to generate checksums?
* before data transfer
- new dataset from collaborator
- upload to remote repository
* long term storage
- master version of dataset
- snapshot of data for publication
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
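The checksum workflow above can be sketched in Python with the standard `hashlib` module. This is only an illustration: the choice of SHA-256, the helper names `sha256_of_file` and `verify_checksum`, and the throwaway demo file are assumptions, not a procedure prescribed by the policy linked above.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True if the file still matches the recorded checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a throwaway file (hypothetical content, not a real dataset).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"sample_id,value\nS1,0.5\n")
    demo_path = tmp.name

recorded = sha256_of_file(demo_path)            # generate before the transfer
intact = verify_checksum(demo_path, recorded)   # re-check after the transfer

with open(demo_path, "ab") as handle:           # simulate a corrupted copy
    handle.write(b"X")
corrupted_ok = verify_checksum(demo_path, recorded)

os.remove(demo_path)
print(intact, corrupted_ok)
```

Storing the recorded digest next to the dataset (for example in a small text file) lets the recipient repeat the check after every transfer.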
# Data transfer and integrity
## Encryption
<div style="position:relative;left:25%;top:60%">
<img align="middle" height="300px" src="slides/img/encryption.png">
</div>
<div class='fragment'>
- Guaranteed confidentiality
</div>
<div class='fragment'>
- The encryption key needs to be kept safe
</div>
<div class='fragment'>
- <font color='red'> Losing your encryption key means losing your data! </font>
</div>
<div class='fragment'>
- When a Master copy of the LCSB Research Data is encrypted, the encryption key <font color= red>must be shared with the Data Custodian</font> or authorized system administrator.
</div>
# Password exchange channels
<div style="position:relative">
<img src="slides/img/privateBin.png" height="350px">
</div>
<div style="position:absolute;left:65%;top:85%">
* Free service provided by LCSB at <a href="https://privatebin.lcsb.uni.lu" style="color:blue; font-size:0.8em;">privatebin.lcsb.uni.lu</a>
* **LUMS** account is required
* Set expiry period
* Can expire upon first access
* Password only accessible by sender and recipient
</div>
# Introduction
<div class="fragment" style="position:absolute">
## Learning objectives
* How to manage your data
* How to look at and analyze your data
* Reproducibility in the research data life cycle
</div>
<div class="fragment" style="position:relative;top:80%;left:60%">
## Pertains to practically all people at LCSB
* Scientists
* PhD candidates
* Technicians
* Administrators
</div>
<center>
<img height="450px" src="slides/img/wordcloud.png"><br>
</center>
[
{ "filename": "index.md" },
{ "filename": "introduction.md" },
{ "filename": "access_management.md" },
{ "filename": "data-introduction.md" },
{ "filename": "data_flow.md" },
{ "filename": "available-data-storage.md"},
{ "filename": "physical_security.md" },
{ "filename": "storage_setup.md" },
{ "filename": "ingestion.md" },
{ "filename": "data-housekeeping.md" },
{ "filename": "howtos.md" },
{ "filename": "reproducibility.md" },
{ "filename": "code_versioning.md" },
{ "filename": "visualization.md" },
{ "filename": "data_life_cycle.md" },
{ "filename": "fair-principles.md" },
{ "filename": "r3_group.md" },
{ "filename": "thanks.md" }
]
## Overview
0. Introduction - learning objectives + targeted audience
1. Data workflow
1. Ingestion:
* receiving/sending/sharing data
* file naming
* checksums
* backup
1. making data tidy
* what is a table
1. Learning to code workflows and analyses - excel files, coding
1. Code versioning and reproducibility
1. Visualization
* see the data
1. problem solving
* guide
* rubberducking
* google for help
* oracle
1. R3 team
1. Acknowledgment
1. data minimization
# Physical Security
<div >
"<center>*Physical security describes security measures that are designed to deny unauthorized access to facilities, equipment and resources and to protect personnel and property from damage or harm (such as espionage, theft, or terrorist attacks)* </center>"
<center> <img height="230px" src="slides/img/physical_security.jpg"> </center>
<div style="position:absolute;top:30%;left:2%">
## LCSB offices
* Rouden Eck offices are locked by default
* Technical measures exist to individually control access to the building
* Physical access is limited to minimal authorized personnel
* Return of access badge is required when personnel contract is terminated
* Access to the data center requires approval (CIO)
* Visitor and external personnel access is monitored
</div>
<div style="position:relative;top:80%;left:60%">
## Home Office, new security challenges
* Separate your work life from your home life
* Secure your home office
* Secure your home router
* Use VPN to access university applications.
* Encrypt your devices
* Keep your operating systems up to date
* Enable Automatic locking
</div>
</div>
# Problem solving
A guide for solving computing issues
1. Express the problem
* Write down what you want to achieve
2. Search for help
* Read **FAQs**, **help pages** and the **official documentation** well before turning to Google
* Use stack exchange, forums and related resources (carefully)
3. Ask an expert
* Submit the problem in writing
* Make the question interesting
# Responsible and Reproducible Research (R<sup>3</sup>)
## What is R<sup>3</sup>?
A multifaceted change management
process built on 3 pillars:
- R3 pathfinder
- R3 school
- R3 accelerator
Common link module: R3 clinic
<br>
<br>
Contact via ServiceNow
<div style="top: -1em; left: 50%; position: absolute;">
<img src="slides/img/3pillars-full.png">
</div>
<aside class="notes">
Pathfinder - policies, finding optimal data management changes<br>
School - courses, howtos, trainings<br>
Accelerator - advanced teams and their boost/support, CI/CD setup<br>
Clinic - hands-on, meetings in groups, code review + suggestions<br>
</aside>
## R<sup>3</sup> Training
* LCSB's Monthly Data Management and Data Protection training
* ELIXIR Luxembourg's trainings <br>
https://elixir-luxembourg.org/training
* R<sup>3</sup> school Git basics - every 4 months
* Best Practices in Research Data Management and Stewardship (twice a year)
<aside class="notes">
Direct newcomers to this monthly training
</aside>
# Responsible and Reproducible Research (R<sup>3</sup>)
<section data-transition="none" data-background-image="slides/img/r3-training-logo.png" data-background-size="1000px" data-background-opacity="0.1">
</section>
<div style="display:block;text-align:center;position:relative;" >
<div class="profile-container">
* Reinhard Schneider
* <img src="slides/img/R3_profile_pictures/reinhard_schneider.png">
* Head of Bioinformatics Core
</div>
<div class="profile-container">
* Pinar Alper
* <img src="slides/img/R3_profile_pictures/pinar_alper.png">
* Datasteward
</div>
<div class="profile-container">
* Yohan Yarosz
* <img src="slides/img/R3_profile_pictures/yohan_yarosz.png">
* Development
</div>
<div class="profile-container">
* Laurent Heirendt
* <img src="slides/img/R3_profile_pictures/laurent_heirendt.png">
* Git, CI
</div>
<div class="profile-container">
* Sarah Peter
* <img src="slides/img/R3_profile_pictures/sarah_peter.png">
* Infrastructure
</div>
<div class="profile-container">
* Valentin Grouès
* <img src="slides/img/R3_profile_pictures/valentine_groues.png">
* Development
</div>
<div class="profile-container">
* Vilem Ded
* <img src="slides/img/R3_profile_pictures/vilem_ded.png">
* Datasteward
</div>
<div class="profile-container">
* Noua Toukourou
* <img src="slides/img/R3_profile_pictures/noua_toukourou.png">
* Infrastructure
</div>
<div class="profile-container">
* Alexey Kolodkin
* <img src="slides/img/R3_profile_pictures/alexey_kolodkin.png">
* Datasteward
</div>
<div class="profile-container">
* Maharshi Vyas
* <img src="slides/img/R3_profile_pictures/maharshi_vyas.png">
* Infrastructure
</div>
<div class="profile-container">
* Nene Barry
* <img src="slides/img/R3_profile_pictures/nene_barry.png">
* Datasteward
</div>
<div class="profile-container">
* Karim Chaouch
* <img src="slides/img/R3_profile_pictures/karim_chaouch.png">
* Development
</div>
<div class="profile-container">
* Christophe Trefois
* <img src="slides/img/R3_profile_pictures/christophe_trefois.png">
* R<sup>3</sup> team lead
</div>
</div>
# Reproducibility
* ensures credibility
* key requirement for follow-up and collaborative studies
<div style="position:absolute">
<img src="slides/img/reproducibility_nature.png" height="650px">
</div>
<div class="fragment" style="position:relative;left:50%">
## Why is our workflow not reproducible?
Lack of provenance:
* Input data downloaded from “some website”
* Copy & paste operations
* Manual text entry
* Analysis not coded
</div>
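One way to reduce the lack of provenance is to write down, at download time, where the input came from and what it contained. The sketch below is only an illustration: the function name `provenance_record`, the field names, the URL and the user name are all hypothetical placeholders.

```python
import hashlib
import json
from datetime import datetime, timezone


def provenance_record(data, source_url, retrieved_by):
    """Describe where a piece of input data came from and what it contained."""
    return {
        "source_url": source_url,
        "retrieved_by": retrieved_by,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Hypothetical download: in practice `data` would be the bytes fetched from the URL.
data = b"gene,expression\nTP53,12.4\n"
record = provenance_record(
    data,
    source_url="https://example.org/dataset.csv",  # placeholder URL
    retrieved_by="jane.doe",                        # placeholder user
)
print(json.dumps(record, indent=2))
```

Saving such a record next to the input file replaces "some website" with a source, a date and a checksum that can be checked later.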
# Reproducibility
## Learning to code workflows and analyses
<div style="display:inline-grid;grid-gap: 40px;grid-template-columns: auto auto;position:relative;left:12%">
<div class="fragment">
<div class="content-box">
<div class="box-title red">Spreadsheets alone</div>
<div class="content">
* Is great for looking at data.
* Data entry is fast.
* Analysis flow is hidden and not in focus.
</div>
</div>
<div style="text-align:center">
<img src="slides/img/excel_data-sheet.png" height="280px">
</div>
</div>
<div class="fragment">
<div class="content-box">
<div class="box-title">Coding</div>
<div class="content">
* Is great for controlling analysis
* Data is hidden.
* Flow is visible.
</div>
</div>
<img src="slides/img/code-example.png" height="280px">
</div>
</div>
<div class="content-box fragment" style="left:15%;width:60%;position:relative">
<div class="box-title green">Develop data science skills</div>
<div class="content">
* Develop good data management and analysis habits.
* Start coding your analysis within spreadsheets.
* Make yourself familiar with a statistics environment such as R, Python or Matlab
* No need to learn a high level programming language such as C++ or Java.
</div>
</div>
</div>
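To illustrate what "flow is visible" means, here is a minimal sketch of a coded analysis using only the Python standard library. The data, column names and group labels are invented for the example; in practice the table would be read from a file exported from the spreadsheet.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical measurements standing in for an exported spreadsheet.
raw = io.StringIO(
    "sample,group,value\n"
    "s1,control,1.0\n"
    "s2,control,3.0\n"
    "s3,treated,4.0\n"
    "s4,treated,6.0\n"
)

# Step 1: read the table (one header line, one observation per row).
rows = list(csv.DictReader(raw))

# Step 2: collect the values of each group.
values_by_group = defaultdict(list)
for row in rows:
    values_by_group[row["group"]].append(float(row["value"]))

# Step 3: summarise each group.
group_means = {group: mean(values) for group, values in values_by_group.items()}
print(group_means)
```

Every step from raw table to result is written down, so the same analysis can be rerun unchanged when the data is updated.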
# Table
<div style="position:absolute">
"Tabular format of data"
### Header
* one line!
* **good** names of columns
### Rows
* represent observations/entities
### Columns
* represent a property of the observations
* one data type
</div>
<div style="left:50%; position:relative; top:-2em">
<img src="slides/img/excel_data-sheet.png" width="700px">
<div class="fragment" data-fragment-index="3" style="position:absolute">
<img src="slides/img/excel_analyses-sheet.jpeg" width="700px"><br>
</div>
<div class="fragment" data-fragment-index="4" style="position:relative">
<img src="slides/img/red-cross.png" width="700px"><br>
</div>
</div>
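The table rules above can be checked automatically. The following is a rough sketch, assuming the table is available as CSV text; the function names `check_table` and `is_number`, the example columns, and the simple "number vs. text" notion of data type are assumptions made for illustration.

```python
import csv
import io


def is_number(value):
    """Return True if the cell can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_table(text):
    """Return a list of problems found in a CSV table (empty list: none found)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    problems = []

    # Header: a single line with non-empty, unique column names.
    if any(not name.strip() for name in header):
        problems.append("empty column name")
    if len(set(header)) != len(header):
        problems.append("duplicate column name")

    # Rows: every observation has exactly one value per column.
    for number, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"line {number}: expected {len(header)} values")

    # Columns: one data type each (here simply numbers vs. text).
    for index, name in enumerate(header):
        kinds = {is_number(row[index]) for row in body if len(row) == len(header)}
        if len(kinds) > 1:
            problems.append(f"column '{name}' mixes numbers and text")
    return problems


good = "patient_id,age,diagnosis\nP1,54,PD\nP2,61,control\n"
bad = "patient_id,age,diagnosis\nP1,54,PD\nP2,unknown,control\n"
print(check_table(good))
print(check_table(bad))
```

The second table is flagged because the `age` column mixes a number with the text `unknown`; leaving the cell empty or using a documented missing-value code would be a separate design decision.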
# Storage set-up
* Install antivirus software
* Regularly update your SW/OS
* Encrypt removable media
<div class="fragment" >
### Backup
* take care of your own backups!
* don't work on your backup copy!
* minimum is <b>3-2-1 backup rule</b>
<div style="position:absolute;right:10%;top:10%">
<img src="slides/img/undraw_secure_server_s9u8.png" height="750px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href=" https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
</div>
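The 3-2-1 backup rule is commonly stated as: at least three copies of the data, on at least two different storage media, with at least one copy off-site. A small sketch of that check follows; the class `DataCopy`, the function `satisfies_3_2_1` and the storage labels are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    location: str   # where the copy lives (hypothetical labels)
    medium: str     # kind of storage, e.g. "laptop-ssd", "server", "external-hdd"
    offsite: bool   # True if stored away from the primary site


def satisfies_3_2_1(copies):
    """Check: at least 3 copies, on at least 2 media, at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


plan = [
    DataCopy("working copy on laptop", "laptop-ssd", offsite=False),
    DataCopy("institutional server", "server", offsite=False),
    DataCopy("external drive at another site", "external-hdd", offsite=True),
]
weak_plan = plan[:2]  # only two copies, none off-site

print(satisfies_3_2_1(plan), satisfies_3_2_1(weak_plan))
```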
<div class="fragment">
### Passwords
* Strong passwords
* Password manager
* Safe password exchange channels
* Expiration time on password share
</div>
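A password manager will normally generate strong passwords for you. For completeness, here is a minimal sketch of the same idea with Python's standard `secrets` module; the function name `generate_password`, the default length of 20 and the character classes required are assumptions, not an official password policy.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Generate a random password with a lowercase, uppercase, digit and symbol."""
    if length < 4:
        raise ValueError("length must be at least 4")
    while True:
        # secrets.choice draws from a cryptographically strong random source.
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
            and any(c in string.punctuation for c in candidate)
        ):
            return candidate


password = generate_password()
print(len(password))
```

The `random` module is deliberately avoided here because it is not designed for security-sensitive use.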
# Storage set-up
## Backup - Central IT/LCSB
<div style="position:relative">
<img src="slides/img/LCSB_storages_backed-up.png" height="750px">
</div>
<div style="position:absolute;left:65%;top:60%">
Server administrators take care of:
* server backups
* LCSB OwnCloud backups
* group/application server backups (not always)
</div>
# Storage set-up
## Backup - personal research data
<div style="position:relative">
<img src="slides/img/LCSB_storages_backup.png" height="750px">
</div>
<div style="position:absolute;left:55%;top:70%">
<font color="red">One version of data should reside on Atlas!</font>
</div>
# Thank you.<sup> </sup>
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>
<br>
<br>
<br>
<br>
<center>
Contact us if you need help:
<a href="mailto:lcsb-r3@uni.lu">lcsb-r3@uni.lu</a>
</center>
<div style="position:absolute">
Links:
HowTo Cards / Policies: https://howto.lcsb.uni.lu/
Course Slides: https://courses.lcsb.uni.lu/
Internal Presentations: https://presentations.lcsb.uni.lu/
LCSB GitLab: https://gitlab.lcsb.uni.lu/
HPC: https://hpc.uni.lu/
Service Portal: https://service.uni.lu/sp
LCSB intranet: https://intranet.uni.lux
</div>
<div style="position:relative;top:1.5em;left:55%;width:45%">
Available SW and tools:
<div style="margin-left: 20px;">
SIU managed:
&ensp; - Service Portal > All Catalogs > IT > Softwares
</div>
<div style="margin-left: 20px;">
LCSB managed:
&ensp; - Service Portal > Knowledge > FAQ - Corporate Software\
&ensp; - LCSB intranet > Science tab > Tools
</div>
</div>
# Visualization
<center>
**Plot your data!**
<figure>
<img src="slides/img/DinoSequentialSmaller.gif" height="500px">
<blockquote>"never trust summary statistics alone; always visualize your data"</blockquote>
<figcaption>--Alberto Cairo</figcaption>
</figure>
</center>
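The quote can be checked numerically. The sketch below uses datasets I and II of Anscombe's quartet as a stand-in for the animated example on the slide: the two datasets share almost identical summary statistics, yet look very different once drawn. The crude text "plot" is only an illustration; a plotting library would normally be used instead.

```python
from statistics import mean, variance

# Datasets I and II of Anscombe's quartet (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Summary statistics alone: the two datasets look the same.
for name, y in (("I", y1), ("II", y2)):
    print(f"dataset {name}: mean={mean(y):.2f} variance={variance(y):.2f}")

# A crude text "plot": sort by x and draw each y value as a bar.
for name, y in (("I", y1), ("II", y2)):
    print(f"dataset {name}")
    for xi, yi in sorted(zip(x, y)):
        print(f"  x={xi:2d} " + "#" * round(yi * 2))
```

Both datasets report a mean of 7.50 and a variance of 4.13, while the bars show a roughly linear trend for dataset I and a clear curve for dataset II.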
# Visualization
<center>
**Plot your data!**
<figure>
<img src="slides/img/plot-data.png" height="800px">
</figure>
</center>