R3
school
courses
Commits
5d3d7400
Commit
5d3d7400
authored
Aug 03, 2020
by
Vilem Ded
2020-09-08_IT101-DM
parent
41f54564
Changes
18
2020/2020-09-08_IT101-DM/slides/code_versioning.md
0 → 100644
View file @
5d3d7400
# Code versioning
<div
style=
"position:absolute; width:40%"
>
**git**
*
Current standard for code versioning
*
Maintain versions of your code as it develops
*
Local system, which does not require an online repository
*
Repositories allow distributed development
<img
align=
"middle"
height=
"300px"
src=
"slides/img/Git-logo.png"
>
</div>
<div
class=
"fragment"
style=
"position:absolute; left:50%; width:40%"
>
**git@lcsb**
*
Recommended, supported repository
*
Allows tracking of issues
*
Ready for continuous integration - code checked on commits to the repository.
*
[
https://git-r3lab.uni.lu
](
https://git-r3lab.uni.lu
)
**Use at LCSB**
*
All analysis code should be in a repository
*
Minimally at submission of a manuscript
*
Better daily
*
Even better "analyses chunkwise"
</div>
<aside
class=
"notes"
>
Policy! - code in central repository
</aside>
2020/2020-09-08_IT101-DM/slides/data-housekeeping.md
0 → 100644
View file @
5d3d7400
# Data housekeeping
## File names
<div
style=
"display:flex; position:static; width:100%"
>
<div
class=
"fragment"
data-fragment-index=
"0"
style=
"position:static; width:30%"
>
### General principles
*
Machine readable
*
Human readable
*
Plays well with default ordering
</div>
<div
class=
"fragment"
data-fragment-index=
"1"
style=
"position:absolute; left:33%; width:30%"
>
### Separators
*
No spaces
*
Underscore to separate
*
Hyphen to combine
</div>
<div
class=
"fragment"
data-fragment-index=
"2"
style=
"position:absolute; left:66%; width:30%"
>
### Date format follows **ISO 8601**<br>
2018-12-03
<br>
2018-12-06_1700
</div>
</div>
<div
class=
"fragment"
data-fragment-index=
"3"
style=
"width:100%; position:static"
>
<div
style=
"position:absolute;width:55%"
>
<b>
Bad
</b>
names
```bash
PhD-project-Jan19 alldata_final.foo
Finacial detailes BIocore 19/11/12.xls
ATACseq1Londonmapped.bam
Hlad.jez.M-L-průtoky JíObj.z Ohře-od 10-2011.xlsx
```
</div>
<div
style=
"position:relative;width:55%; bottom:20%; left:50%"
>
<b>
Good
</b>
names
```bash
Iris-setosa_samples_1927-05-12.csv
PI102_Mouse12_EEG_2018-11-03_1245.tsv
Bioinfiniti_FullProposal_2018-11-15_1655.do
```
</div>
</div>
<br>
<br>
<div
class=
"fragment"
data-fragment-index=
"3"
style=
"width:100%;"
>
From Jenny Bryan by CC-BY
(https://speakerdeck.com/jennybc/how-to-name-files)
</div>
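The naming rules above (no spaces, underscores between fields, hyphens within a field, an ISO 8601 date) can be checked mechanically. The following is a minimal sketch, not an official LCSB tool: the function name `follows_convention` and both regular expressions are assumptions made for illustration, and the check is deliberately strict (ASCII letters and digits only).

```python
import re
from datetime import datetime

# Hypothetical patterns: a name is a sequence of fields joined by "_",
# each field is alphanumeric words joined by "-", followed by one extension.
FIELD = r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"
NAME_RE = re.compile(rf"^{FIELD}(?:_{FIELD})*\.[A-Za-z0-9]+$")
# A standalone YYYY-MM-DD group, not glued to other digits or hyphens.
DATE_RE = re.compile(r"(?<![0-9-])(\d{4}-\d{2}-\d{2})(?![0-9-])")


def follows_convention(filename: str) -> bool:
    """Return True if the name matches the separator rules and holds a valid ISO 8601 date."""
    if not NAME_RE.match(filename):
        return False
    match = DATE_RE.search(filename)
    if match is None:
        return False
    try:
        # strptime rejects impossible dates such as month 13.
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for name in (
        "Iris-setosa_samples_1927-05-12.csv",
        "PI102_Mouse12_EEG_2018-11-03_1245.tsv",
        "ATACseq1Londonmapped.bam",
        "PhD-project-Jan19 alldata_final.foo",
    ):
        print(name, follows_convention(name))
```

Requiring a date in every name is a choice of this sketch; a group may reasonably relax it for files where a date carries no meaning.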
# Data housekeeping
## File organization
*
Have folder organization conventions for your
**group**
*
Per Paper
*
Per Study/Project
*
Per Collaborator
*
Keep
<b>
readme files
</b>
for data
*
Title
*
Date of Creation/Receipt
*
Instrument or software specific information
*
People involved
*
Relations between multiple files/folders
*
Separate the files you are actively working on from the old ones
*
Orient newcomers to the group's conventions
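The readme fields listed above can be captured in a small template so that every dataset folder gets the same structure. This is only a sketch under assumptions: the class name `DatasetReadme`, its attribute names and the plain-text layout are hypothetical, not a prescribed LCSB format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetReadme:
    """Hypothetical container for the readme fields named on the slide."""

    title: str
    received: date                      # date of creation or receipt
    instrument: str                     # instrument or software specific information
    people: list[str]                   # people involved
    relations: list[str] = field(default_factory=list)  # links between files/folders

    def render(self) -> str:
        """Return the readme as plain text, with the date in ISO 8601 form."""
        lines = [
            f"Title: {self.title}",
            f"Date of creation/receipt: {self.received.isoformat()}",
            f"Instrument/software: {self.instrument}",
            "People involved: " + ", ".join(self.people),
        ]
        if self.relations:
            lines.append("Relations between files/folders:")
            lines.extend(f"  - {item}" for item in self.relations)
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    readme = DatasetReadme(
        title="Mouse EEG recordings",
        received=date(2018, 11, 3),
        instrument="hypothetical EEG amplifier, firmware 1.2",
        people=["A. Researcher"],
        relations=["raw/ holds source files, tidy/ holds derived tables"],
    )
    print(readme.render())
```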
# Data housekeeping
<div
style=
"position:absolute"
>
## When working
*
Clarify and separate source and intermediate data
*
Keep data copies to a
**minimum**
*
Cleanup post-analysis
*
Cleanup copies created for presentations or for sharing
*
Hand over data to a new responsible person when leaving
</div>
<div
style=
"position:relative;left:50%; width:40%"
>
<img
src=
"slides/img/cleaning-table.jpg"
height=
"450px"
>
</div>
# Data housekeeping
## End of project
*
data should be kept as a single copy on server-side storage
*
no copies on desktops and external devices
*
non-proprietary formats
*
minimal metadata:
*
source
*
context of generation
*
data structure
*
content
*
sensitive data (e.g. whole genome)
**must**
be encrypted
<br/>
<br/>
*
If not specified otherwise, data must be kept for
**10 years**
following project end for reproducibility purposes
<aside
class=
"notes"
>
Note: sometimes it is hard to find or understand a dataset that is only 10 days old
</aside>
## In doubt on data archival?
Contact R
<sup>
3
</sup>
for support on archival of datasets using tickets:
*
https://service.uni.lu/sp
*
Home > Catalog > LCSB > Biocore: Application services > Request for: Support
# Data housekeeping - Summary
## Server is your friend!
*
Allows a consistent backup policy for your datasets
*
Keeps number of copies to minimum
*
Specification of clear access rights
*
High accessibility
*
Data are discoverable
*
Server can't be stolen
## General guidelines
*
Use institutional media for storage of
**all**
data
*
Research data (particularly sensitive data) should be in a single source location
*
Enable encryption for data stored on movable media
*
Clarify and separate source and intermediate data
*
Disable write access to relevant source data (read-only)
*
Backup research data!
*
Download Anti-virus software
*
Generate checksums
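One of the guidelines above, disabling write access to source data, can be sketched for files on a local or mounted file system. This is an illustrative example only: the function name `make_read_only` is an assumption, and on institutional servers access rights are more likely managed by the storage administrators than by a script like this.

```python
import stat
from pathlib import Path


def make_read_only(root: Path) -> int:
    """Clear the write permission bits of every regular file under root.

    Returns the number of files that were processed. Directories are left
    untouched so that the tree can still be listed and traversed.
    """
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    changed = 0
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            mode = path.stat().st_mode
            path.chmod(mode & ~write_bits)
            changed += 1
    return changed
```

A call such as `make_read_only(Path("source_data"))` (a hypothetical folder name) would protect source files against accidental edits; it does not protect against deletion by someone who can write to the parent directory.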
2020/2020-09-08_IT101-DM/slides/data-introduction.md
0 → 100644
View file @
5d3d7400
# Data and metadata
<div
style=
"display:grid;grid-gap:100px;grid-template-columns: 40% 40%"
>
<div
>
## Data
*
"
*information in digital form that can be transmitted or processed*
"
<p
align=
"right"
>
-- Merriam-Webster dictionary
</p>
*
"
*information in an electronic form that can be stored and processed by a computer*
"
<p
align=
"right"
>
--Cambridge dictionary
</p>
</div>
<div>
## Metadata
*
data describing other data
*
information that is given to describe or help you use other information
*
metadata are data
*
can be processed and analyzed
</div>
</div>
<div
class=
"fragment"
>
## Metadata examples:
<div
style=
"position:absolute"
>
<ul>
<li>
LabBook
</li>
<li>
author/owner of the data
</li>
<li>
origin of the data
</li>
<li>
data type
</li>
</ul>
</div>
<div
style=
"position:absolute;left:25%"
>
<ul>
<li>
description of content
</li>
<li>
modification date
</li>
<li>
description of modification
</li>
<li>
location
</li>
</ul>
</div>
<div
style=
"position:relative;left:50%;top:0.7em"
>
<ul>
<li>
calibration readings
</li>
<li>
software/firmware version
</li>
<li>
data purpose
</li>
<li>
means of creation
</li>
</ul>
</div>
</div>
<div
class=
"fragment"
>
<br>
<center
style=
"color:red"
>
!Insufficient metadata make the data useless!
</center>
</div>
<aside
class=
"notes"
>
Sometimes metadata collection takes more time than data collection
</aside>
# LCSB research data
three categories:
*
**Primary data**
*
scientific data
*
measurements, images, observations, notes, surveys, ...
*
models, software codes, libraries, ...
*
metadata directly describing the data
*
data dictionaries
*
format, version, coverage descriptions, ...
*
**Research record**
*
description of the research process, including experiment
*
experiment set-up
*
followed protocols
*
...
*
**Project accompanying documentation**
*
ethical approvals, information on the consent
*
collaboration agreements
*
intellectual property ownership
*
other relevant documentation
2020/2020-09-08_IT101-DM/slides/data_flow.md
0 → 100644
View file @
5d3d7400
# Typical flow of data
<div style="display:grid;grid-gap:10px;grid-template-columns: 30% 20% 30%;
grid-auto-flow:column;grid-template-rows: repeat(4,auto);position:relative;left:8%">
<div
class=
"content-box fragment"
data-fragment-index=
"1"
>
<div
class=
"box-title red"
>
Source data
</div>
<div
class=
"content"
>
*
Experimental results
*
Large data sets
*
Manually collected data
*
External
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"2"
>
<div
class=
"box-title yellow"
>
Intermediate
</div>
<div
class=
"content"
>
*
Derived data
*
Tidy data
*
Curated sets
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"3"
>
<div
class=
"box-title blue"
>
Analyses
</div>
<div
class=
"content"
>
*
Exploratory
*
Model building
*
Hypothesis testing
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"4"
>
<div
class=
"box-title green"
>
Dissemination
</div>
<div
class=
"content"
>
*
Manuscript, report, presentation, ...
</div>
</div>
<center>
<img
src=
"slides/img/data-flow_sources.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_transformation.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_chart.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_paper.png"
height=
60%
>
</center>
<div
class=
"content-box fragment"
data-fragment-index=
"5"
>
<div
class=
"box-title red"
>
Preserve
</div>
<div
class=
"content"
>
*
Version data sets
*
Backup
*
Protect
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"6"
>
<div
class=
"box-title yellow"
>
Reproduce
</div>
<div
class=
"content"
>
*
Automate your builds
*
Use workflow tools (e.g. Snakemake)
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"7"
>
<div
class=
"box-title blue"
>
Trace
</div>
<div
class=
"content"
>
*
Multiple iterations.
*
Code versioning (Git)
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"8"
>
<div
class=
"box-title green"
>
Track
</div>
<div
class=
"content"
>
*
Through multiple versions
</div>
</div>
</div>
<aside
class=
"notes"
>
flow of the data is downstream (mostly), but you are going back and forth
applies to all data (financial report, lab safety assessment)
</aside>
2020/2020-09-08_IT101-DM/slides/fair-principles.md
0 → 100644
View file @
5d3d7400
# FAIR (meta)data principles
*
dates back to 2014
*
well accepted by scientific community
*
necessity in data driven science
*
officially embraced by EU and G20
*
required by funding agencies and journal publishers
<center>
<img
src=
"slides/img/fair-principles.png"
height=
"400px"
>
</center>
<br>
<br>
2020/2020-09-08_IT101-DM/slides/howtos.md
0 → 100644
View file @
5d3d7400
# LCSB How-Tos
<br>
https://howto.lcsb.uni.lu/
<center>
<iframe
data-src=
"https://howto.lcsb.uni.lu/"
height=
"600px"
width=
"1200px"
></iframe>
</center>
2020/2020-09-08_IT101-DM/slides/img
0 → 120000
View file @
5d3d7400
../../2020-06-09_IT101-DM/slides/img/
\ No newline at end of file
2020/2020-09-08_IT101-DM/slides/index.md
0 → 100644
View file @
5d3d7400
# IT101 - Working with computers
<br>
IT101 - Working with computers
<br>
## June 09th, 2020
<div
style=
"top: 6em; left: 0%; position: absolute;"
>
<img
src=
"theme/img/lcsb_bg.png"
>
</div>
<div
style=
"top: 5em; left: 60%; position: absolute;"
>
<img
src=
"slides/img/r3-training-logo.png"
height=
"200px"
>
<br><br><br><br>
<h3></h3>
<br><br><br>
<h4>
Vilem Ded
<br>
Data Steward
<br>
vilem.ded@uni.lu
<br>
<i>
Luxembourg Centre for Systems Biomedicine
</i>
</h4>
</div>
2020/2020-09-08_IT101-DM/slides/ingestion.md
0 → 100644
View file @
5d3d7400
# Data housekeeping
## Available data storage
<div
class=
'fragment'
style=
"position:absolute"
>
<img
src=
"slides/img/LCSB_storages_full.png"
height=
"750px"
>
</div>
<div
class=
'fragment'
style=
"position:absolute"
>
<img
src=
"slides/img/LCSB_storages_personal-crossed.png"
height=
"750px"
>
</div>
# Data ingestion/transfer
## Receiving and sending data
<img
height=
"450px"
style=
"position:relative;left:10%"
src=
"slides/img/banned_exchange_channels.png"
><br>
<div
style=
"position:absolute; left:10%;width:30%"
>
## E-mail is not for data transfer
*
Avoid transfer of any data by e-mail
*
E-mail is a poor repository
*
Data can be read in transit
</div>
<div
class=
"fragment"
style=
"left:50%; width:30%; position:absolute"
>
## Exchanging data
*
Share on Atlas server
*
OwnCloud share (LCSB - BioCore)
*
DropIt service (SIU)
*
AsperaWeb share for sensitive data
</div>
# Data ingestion/transfer
Data can be corrupted:
*
(non-)malicious modification
*
faulty file transfer
*
disk corruption
<div
class=
"fragment"
>
### Solution
*
disable write access to the source data
*
Generate checksums!
<div
style=
"position:absolute;left:40%"
>
<img
src=
"slides/img/checksum.png"
width=
"500px"
>
</div>
</div>
<div
class=
"fragment"
style=
"position:relative; left:0%"
>
## When to generate checksums?
*
before data transfer
-
new dataset from collaborator
-
upload to remote repository
*
long term storage
-
master version of dataset
-
snapshot of data for publication
</div>
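Generating and verifying a checksum, as recommended above, can be sketched with the Python standard library. SHA-256 is an assumption here, since the slides do not name an algorithm, and the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large datasets do not have to fit into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a freshly computed digest with one recorded earlier,
    ignoring surrounding whitespace and letter case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A typical use is to record the digest before a transfer, send it alongside the data, and call `verify` on the receiving side; a mismatch means the file changed or was corrupted along the way.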
2020/2020-09-08_IT101-DM/slides/introduction.md
0 → 100644
View file @
5d3d7400
# Introduction
<div
class=
"fragment"
style=
"position:absolute"
>
<img
height=
"450px"
src=
"slides/img/wordcloud.png"
><br>
## Learning objectives
*
How to manage your data
*
How to look at and analyze your data
*
Solving issues with computers
*
Reproducibility in the research data life cycle
</div>
<div
class=
"fragment"
style=
"position:relative;left:50%; width:40%"
>
<div
>
<center>
<img
height=
"405px"
src=
"slides/img/rudi_balling.jpg"
><br>
Prof. Dr. Rudi Balling, director
</center>
</div>
## Pertains to practically all people at LCSB
*
Scientists
*
PhD candidates
*
Technicians
*
Administrators
</div>
2020/2020-09-08_IT101-DM/slides/list.json
0 → 100644
View file @
5d3d7400
[
{
"filename"
:
"index.md"
},
{
"filename"
:
"introduction.md"
},
{
"filename"
:
"data-introduction.md"
},
{
"filename"
:
"data_flow.md"
},
{
"filename"
:
"ingestion.md"
},
{
"filename"
:
"storage_setup.md"
},
{
"filename"
:
"data-housekeeping.md"
},
{
"filename"
:
"howtos.md"
},
{
"filename"
:
"reproducibility.md"
},
{
"filename"
:
"code_versioning.md"
},
{
"filename"
:
"visualization.md"
},
{
"filename"
:
"problem_solving.md"
},
{
"filename"
:
"fair-principles.md"
},
{
"filename"
:
"r3_group.md"
},
{
"filename"
:
"thanks.md"
}
]
2020/2020-09-08_IT101-DM/slides/overview.md
0 → 100644
View file @
5d3d7400
## Overview
0.
Introduction - learning objectives + targeted audience
1.
Data workflow
1.
Ingestion:
*
receiving/sending/sharing data
*
file naming
*
checksums
*
backup
1.
making data tidy
*
what is a table
*
1.
Learning to code workflows and analyses - excel files, coding
1.
Code versioning and reproducibility
1.
Visualization
*
see the data
1.
problem solving
*
guide
*
rubberducking
*
google for help
*
oracle
1.
R3 team
1.
Acknowledgment
1.
data minimization
2020/2020-09-08_IT101-DM/slides/problem_solving.md
0 → 100644
View file @
5d3d7400
# Problem solving
A guide for solving computing issues
1.
Express the problem
*
Write down what you want to achieve
2.
Search for help
*
Read
**FAQs**
,
**help pages**
and the
**official documentation**
well before turning to Google
*
Use stack exchange, forums and related resources carefully
3.
Ask an expert
*
You have to submit the problem in writing
*
Make the question interesting
*
If you submit a trivial problem, the expert will stop answering
2020/2020-09-08_IT101-DM/slides/r3_group.md
0 → 100644
View file @
5d3d7400
# Responsible and Reproducible Research (R<sup>3</sup>)
## What is R<sup>3</sup>?
A multi-faceted change management
process built on 3 pillars:
-
R3 pathfinder
-
R3 school
-
R3 accelerator
Common link module: R3 clinic
<div
style=
"top: -1em; left: 50%; position: absolute;"
>
<img
src=
"slides/img/3pillars-full.png"
>
</div>
<br>
<br>
<br>
<br>
<aside
class=
"notes"
>
Pathfinder - policies, data management changes
<br>
School - courses, howtos, trainings
<br>
Accelerator - advanced teams and their boost/support, CI/CD setup
<br>
Clinic - hands-on, meetings in groups, code review + suggestions
<br>
</aside>
## R<sup>3</sup> Training
*
LCSB's Monthly Data Management and Data Protection training
*
ELIXIR Luxembourg's trainings
<br>
https://elixir-luxembourg.org/training
*
R
<sup>
3
</sup>
school Git basics - every 4 months
<aside
class=
"notes"
>
Direct newcomers to this monthly training
</aside>
# Responsible and Reproducible Research (R<sup>3</sup>)
<center><img
src=
"slides/img/r3-training-logo.png"
height=
"200px"
></center>
Your R
<sup>
3
</sup>
contacts:
<div
style=
"display:block;text-align:center;position:relative"
>
<div
class=
"profile-container"
>
*
Christophe Trefois
*
<img
src=
"slides/img/R3_profile_pictures/christophe_trefois.png"
>
*
R
<sup>
3
</sup>
coordination