Commit 61b65b0d authored by Laurent Heirendt

Merge branch '2021-07-27_IT101-DM' into 'develop'

2021 07 27 it101 dm

See merge request !108
parents e9435d7e 10f66eac
2 merge requests: !109 [release] Regular merge of develop, !108 2021 07 27 it101 dm
Pipeline #44678 passed
Showing
with 594 additions and 13 deletions
# Introduction
<div class="fragment" style="position:absolute">
<img height="450px" src="slides/img/wordcloud.png"><br>
## Learning objectives
* How to manage your data
* How to look at and analyze your data
* Solving issues with computers
* Reproducibility in the research data life cycle
</div>
<div class="fragment" style="position:relative;left:50%; width:40%">
<div >
<center>
<img height="405px" src="slides/img/rudi_balling.jpg"><br>
Prof. Dr. Rudi Balling, director
</center>
</div>
## Pertains to practically all people at LCSB
* Scientists
* PhD candidates
* Technicians
* Administrators
</div>
../../2021-04-20_IT101-DM/slides/list.json
\ No newline at end of file
[
{ "filename": "index.md" },
{ "filename": "introduction.md" },
{ "filename": "access_management.md" },
{ "filename": "data-introduction.md" },
{ "filename": "data_flow.md" },
{ "filename": "ingestion.md" },
{ "filename": "storage_setup.md" },
{ "filename": "data-housekeeping.md" },
{ "filename": "howtos.md" },
{ "filename": "reproducibility.md" },
{ "filename": "code_versioning.md" },
{ "filename": "visualization.md" },
{ "filename": "data_life_cycle.md" },
{ "filename": "problem_solving.md" },
{ "filename": "fair-principles.md" },
{ "filename": "r3_group.md" },
{ "filename": "thanks.md" }
]
\ No newline at end of file
../../2021-04-20_IT101-DM/slides/overview.md
\ No newline at end of file
## Overview
0. Introduction - learning objectives + targeted audience
1. Data workflow
1. Ingestion:
* receiving/sending/sharing data
* file naming
* checksums
* backup
1. Making data tidy
* what is a table
*
1. Learning to code workflows and analyses - excel files, coding
1. Code versioning and reproducibility
1. Visualization
* see the data
1. problem solving
* guide
* rubberducking
* google for help
* oracle
1. R3 team
1. Acknowledgment
1. data minimization
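
A checksum lets the recipient confirm that a received file is identical to the one that was sent. The following is a minimal sketch using Python's standard `hashlib` module; the function names `sha256_of` and `matches` are hypothetical helpers for illustration, not part of the course material.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large data files do not fill the memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's checksum equals the one published by the sender."""
    return sha256_of(path) == expected_hex.strip().lower()
```

The sender would publish the digest alongside the file, and the recipient would call `matches` on the downloaded copy before using it.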
../../2021-04-20_IT101-DM/slides/reproducibility.md
\ No newline at end of file
# Reproducibility
* ensures credibility
* key requirement for follow-up and collaborative studies
<div style="position:absolute">
<img src="slides/img/reproducibility_nature.png" height="650px">
</div>
<div class="fragment" style="position:relative;left:50%">
## Why is our workflow not reproducible?
Lack of provenance:
* Input data downloaded from “some website”
* Copy & paste operations
* Manual text entry
* Analysis not coded
</div>
# Reproducibility
## Learning to code workflows and analyses
<div style="display:inline-grid;grid-gap: 40px;grid-template-columns: auto auto;position:relative;left:12%">
<div class="fragment">
<div class="content-box">
<div class="box-title red">Spreadsheets alone</div>
<div class="content">
* Are great for looking at data.
* Data entry is fast.
* Analysis flow is hidden and not in focus.
</div>
</div>
<div style="text-align:center">
<img src="slides/img/excel_data-sheet.png" height="280px">
</div>
</div>
<div class="fragment">
<div class="content-box">
<div class="box-title">Coding</div>
<div class="content">
* Is great for controlling the analysis.
* Data is hidden.
* Flow is visible.
</div>
</div>
<img src="slides/img/code-example.png" height="280px">
</div>
</div>
<div class="content-box fragment" style="left:15%;width:60%;position:relative">
<div class="box-title green">Develop data science skills</div>
<div class="content">
* Develop good data management and analysis habits.
* Start coding your analysis within spreadsheets.
* Familiarize yourself with a statistics environment such as R, Python or Matlab.
* No need to learn a high level programming language such as C++ or Java.
</div>
</div>
</div>
# Table
<div style="position:absolute">
"Tabular format of data"
### Header
* one line!
* **good** names of columns
### Rows
* represent observations/entities
### Columns
* represent one property of the observations
* one data type
</div>
<div style="left:50%; position:relative; top:-2em">
<img src="slides/img/excel_data-sheet.png" width="700px">
<div class="fragment" data-fragment-index="3" style="position:absolute">
<img src="slides/img/excel_analyses-sheet.jpeg" width="700px"><br>
</div>
<div class="fragment" data-fragment-index="4" style="position:relative">
<img src="slides/img/red-cross.png" width="700px"><br>
</div>
</div>
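
The "one data type per column" rule can be checked automatically. Below is a minimal sketch using Python's standard `csv` module; `column_types` and `mixed_columns` are hypothetical helper names, and the rule "a value is a number if `float()` accepts it" is a simplifying assumption.

```python
import csv
import io


def column_types(csv_text):
    """Map each column name to the kinds of values ("number"/"text") found in it."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first line is the header
    names = reader.fieldnames or []
    kinds = {name: set() for name in names}
    for row in reader:
        for name in names:
            value = row.get(name)
            if value is None or value.strip() == "":
                continue  # skip missing cells
            try:
                float(value)
                kinds[name].add("number")
            except ValueError:
                kinds[name].add("text")
    return kinds


def mixed_columns(csv_text):
    """Names of columns that mix numbers and text."""
    return [name for name, found in column_types(csv_text).items() if len(found) > 1]
```

For example, a `weight_mg` column containing both `12.5` and `n/a` would be reported by `mixed_columns`, while a column of sample identifiers would not.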
../../2021-04-20_IT101-DM/slides/storage_setup.md
\ No newline at end of file
# Storage set-up
* Install antivirus software
* Regularly update your software and operating system
* Encrypt removable media
<div class="fragment" >
### Backup
* take care of your own backups!
* don't work on your backup copy!
* the minimum is the <b>3-2-1 backup rule</b>
<div style="position:absolute;right:10%;top:10%">
<img src="slides/img/undraw_secure_server_s9u8.png" height="750px">
</div>
<div style="position:absolute; width:45%; left:50%; top:28em; text-align:right">
<a href="https://howto.lcsb.uni.lu/?policies:LCSB-POL-BIC-02" style="color:grey; font-size:0.8em;">Data Storage and Backup Policy</a>
</div>
</div>
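
The 3-2-1 rule is commonly stated as: three copies of the data, on two different storage media, with one copy kept off-site. The sketch below covers only the copying step and checks that each copy is identical to the original; `back_up` is a hypothetical helper, and the code cannot tell whether the destination folders really sit on different media or at a different site.

```python
import filecmp
import shutil
from pathlib import Path


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        # shallow=False compares file contents, not just size and timestamp.
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"Backup copy differs from original: {target}")
        copies.append(target)
    return copies
```

Calling `back_up` with two destination folders yields three copies in total, counting the working copy.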
<div class="fragment">
### Passwords
* Strong passwords
* Password manager
* Safe password exchange channels
* Expiration time on password share
</div>
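
A password manager normally generates strong passwords for you. As an illustration of what "strong" means in practice, here is a minimal sketch using Python's standard `secrets` module, which draws from a cryptographically secure random source. The function name `generate_password` and the default length of 20 characters are assumptions made for this example, not an LCSB policy.

```python
import secrets
import string

# Letters, digits and punctuation; adjust if a system rejects some symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The `random` module is deliberately avoided here because it is not designed for security purposes.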
# Storage set-up
## Password exchange channels
<div style="position:relative">
<img src="slides/img/privateBin.png" height="350px">
</div>
<div style="position:absolute;left:65%;top:85%">
* Free service provided by LCSB at <a href="https://privatebin.lcsb.uni.lu" style="color:blue; font-size:0.8em;">privatebin.lcsb.uni.lu</a>
* **LUMS** account is required
* Set expiry period
* Can expire upon first access
* Password only accessible by sender and recipient
</div>
# Storage set-up
## Backup - Central IT/LCSB
<div style="position:relative">
<img src="slides/img/LCSB_storages_backed-up.png" height="750px">
</div>
<div style="position:absolute;left:65%;top:60%">
Server administrators take care of:
* server backups
* LCSB OwnCloud backups
* group/application server backups (not always)
</div>
# Storage set-up
## Backup - personal research data
<div style="position:relative">
<img src="slides/img/LCSB_storages_backup.png" height="750px">
</div>
<div style="position:absolute;left:55%;top:70%">
<font color="red">One version should reside on Atlas!</font>
</div>
../../2021-04-20_IT101-DM/slides/thanks.md
\ No newline at end of file
# Thank you.<sup> </sup>
<center><img src="slides/img/r3-training-logo.png" height="200px"></center>
<br>
<br>
<br>
<br>
<center>
Contact us if you need help:
<a href="mailto:lcsb-r3@uni.lu">lcsb-r3@uni.lu</a>
</center>
<div style="position:absolute">
Links:
HowTo Cards / Policies: https://howto.lcsb.uni.lu/
Course Slides: https://courses.lcsb.uni.lu/
Internal Presentations: https://presentations.lcsb.uni.lu/
LCSB GitLab: https://gitlab.lcsb.uni.lu/
HPC: https://hpc.uni.lu/
Service Portal: https://service.uni.lu/sp
LCSB intranet: https://intranet.uni.lux
</div>
<div style="position:relative;top:1.5em;left:55%;width:45%">
Available SW and tools:
<div style="margin-left: 20px;">
SIU managed:
&ensp; - Service Portal > All Catalogs > IT > Softwares
</div>
<div style="margin-left: 20px;">
LCSB managed:
&ensp; - Service Portal > Knowledge > FAQ - Corporate Software\
&ensp; - LCSB intranet > Science tab > Tools
</div>
</div>
../../2021-04-20_IT101-DM/slides/visualization.md
\ No newline at end of file
# Visualization
<center>
**Plot your data!**
<figure>
<img src="slides/img/DinoSequentialSmaller.gif" height="500px">
<blockquote>"never trust summary statistics alone; always visualize your data"</blockquote>
<figcaption>--Alberto Cairo</figcaption>
</figure>
</center>
# Visualization
<center>
**Plot your data!**
<figure>
<img src="slides/img/plot-data.png" height="800px">
</figure>
</center>
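
Why plotting matters can be shown with a few numbers. The sketch below uses two of the four series from Anscombe's quartet, a classic published dataset that is substituted here for the animated example on the slide. Both series share the same x values and have the same mean and sample variance to two decimals, although one follows a noisy line and the other a smooth curve. The helper name `summary` is hypothetical.

```python
from statistics import mean, variance

# Anscombe's quartet, series I and II (y values only).
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def summary(values):
    """Mean and sample variance, rounded to two decimals."""
    return round(mean(values), 2), round(variance(values), 2)


if __name__ == "__main__":
    print("series I :", summary(y1))
    print("series II:", summary(y2))
```

The two summaries are indistinguishable, so only a plot reveals how different the series are.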