Todor Kondic
courses
Commits
a696bf39
Commit
a696bf39
authored
Feb 19, 2020
by
Vilem Ded
Committed by
Laurent Heirendt
Feb 19, 2020
IT101 data management - February
parent
7b47143c
Changes
58
2020/2020-02-17_IT101-DM/slides/code_versioning.md
0 → 100644
View file @
a696bf39
# Code versioning
<div
style=
"position:absolute; width:40%"
>
**git**
*
Current standard for code versioning
*
Maintain versions of your code as it develops
*
Local system, which does not require an online repository
*
Repositories allow distributed development
<img
align=
"middle"
height=
"300px"
src=
"slides/img/Git-logo.png"
>
</div>
<div
class=
"fragment"
style=
"position:absolute; left:50%; width:40%"
>
**git@lcsb**
*
Recommended, supported repository
*
Allows tracking of issues
*
Ready for continuous integration: code is checked on every commit to the repository.
*
[
https://git-r3lab.uni.lu
](
https://git-r3lab.uni.lu
)
**Use at LCSB**
*
All analysis code should be in a repository
*
Minimally at submission of a manuscript
*
Better daily
*
Even better: after each chunk of analysis
</div>
<aside
class=
"notes"
>
Policy! - code in central repository
</aside>
2020/2020-02-17_IT101-DM/slides/data-housekeeping.md
0 → 100644
View file @
a696bf39
# Data housekeeping
## File names
<div
style=
"display:flex; position:static; width:100%"
>
<div
class=
"fragment"
data-fragment-index=
"0"
style=
"position:static; width:30%"
>
### General principles
*
Machine readable
*
Human readable
*
Plays well with default ordering
</div>
<div
class=
"fragment"
data-fragment-index=
"1"
style=
"position:absolute; left:33%; width:30%"
>
### Separators
*
No spaces
*
Underscore to separate
*
Hyphen to combine
</div>
<div
class=
"fragment"
data-fragment-index=
"2"
style=
"position:absolute; left:66%; width:30%"
>
### Date format follows **ISO 8601**<br>
2018-12-03
<br>
2018-12-06_1700
</div>
</div>
<div
class=
"fragment"
data-fragment-index=
"3"
style=
"width:100%; position:static"
>
<div
style=
"position:absolute;width:40%"
>
<b>
Bad
</b>
names
```
PhD-project-Jan19 alldata_final.foo
Finacial detailes BIocore 19/11/12.xls
ATACseq1Londonmapped.bam
```
</div>
<div
style=
"position:relative;width:40%; bottom:20%; left:50%"
>
<b>
Good
</b>
names
```
Iris-setosa_samples_1927-05-12.csv
PI102_Mouse12_EEG_2018-11-03_1245.tsv
Bioinfiniti_FullProposal_2018-11-15_1655.do
```
</div>
</div>
<div
class=
"fragment"
data-fragment-index=
"3"
style=
"width:100%; position:static"
>
From Jenny Bryan, licensed under CC-BY
(https://speakerdeck.com/jennybc/how-to-name-files)
</div>
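
The naming rules above (no spaces, underscores between fields, hyphens inside a field, ISO 8601 dates) can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names `is_well_named` and `build_name` and the exact pattern are assumptions made for illustration, and a group may well settle on a different convention.

```python
import re
from datetime import datetime

# Hypothetical convention: fields are separated by "_", words inside a field
# are combined with "-", and the name ends with an ISO 8601 date
# (optionally followed by _HHMM) and a file extension.
FIELD = r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"
NAME_PATTERN = re.compile(
    rf"{FIELD}(?:_{FIELD})*"
    r"_(?P<date>\d{4}-\d{2}-\d{2})(?:_(?P<time>\d{4}))?"
    r"\.[A-Za-z0-9]+"
)


def is_well_named(filename: str) -> bool:
    """Return True if the file name follows the sketched convention."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return False
    try:
        # Reject impossible dates and times such as 2018-13-40 or 2460.
        datetime.strptime(match.group("date"), "%Y-%m-%d")
        if match.group("time") is not None:
            datetime.strptime(match.group("time"), "%H%M")
    except ValueError:
        return False
    return True


def build_name(fields, when, extension, with_time=False):
    """Join fields with "_" and append an ISO 8601 date stamp."""
    stamp = when.strftime("%Y-%m-%d_%H%M" if with_time else "%Y-%m-%d")
    return "_".join([*fields, stamp]) + "." + extension
```

With this sketch, the "good" names listed above pass `is_well_named`, while the "bad" ones fail because they contain spaces or lack a date.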
# Data housekeeping
## File organization
*
Have folder organization conventions for your
**group**
*
Per Paper
*
Per Study/Project
*
Per Collaborator
*
Keep
<b>
readme files
</b>
for data
*
Title
*
Date of Creation/Receipt
*
Instrument or software specific information
*
People involved
*
Relations between multiple files/folders
*
Separate files you are actively working on from the old ones
*
Orient newcomers to the group's conventions
# Data housekeeping
<div
style=
"position:absolute"
>
## When working
*
Clarify and separate source and intermediate data
*
Keep data copies to a
**minimum**
*
Clean up post-analysis
*
Clean up copies created for presentations or for sharing
*
Hand over data to a new responsible person when leaving
</div>
<div
style=
"position:relative;left:50%; width:40%"
>
<img
src=
"slides/img/cleaning-table.jpg"
height=
"450px"
>
</div>
# Data housekeeping
## End of project
*
data should be kept as a single copy on server-side storage
*
no copies on desktops and external devices
*
non-proprietary formats
*
minimal metadata:
*
source
*
context of generation
*
data structure
*
content
*
sensitive data (e.g. whole genome)
**must**
be encrypted
<br/>
<br/>
*
If not specified otherwise, data must be kept for
**10 years**
following project end for reproducibility purposes
<aside
class=
"notes"
>
Note: sometimes it is hard to find or understand a dataset that is only 10 days old
</aside>
## In doubt on data archival?
Contact R
<sup>
3
</sup>
for support on archival of datasets using tickets:
*
https://service.uni.lu/sp
*
Home > Catalog > LCSB > Biocore: Application services > Request for: Support
# Data housekeeping - Summary
*
Use institutional media for storage of
**all**
data
*
Research data (particularly sensitive data) should be in a single source location
*
Enable encryption for data stored on movable media
*
Clarify and separate source and intermediate data
*
Disable write access to relevant source data (read-only)
*
Backup research data!
*
Install anti-virus software
*
Generate checksums
<div
class=
"fragment"
>
## Server is your friend!
</div>
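
Two items from the summary, generating checksums and disabling write access to source data, can be sketched with the Python standard library. This is an illustrative sketch only: the helper names `sha256_of`, `make_read_only`, and `verify` are assumptions, and SHA-256 is chosen here as one common option, not as a documented LCSB requirement.

```python
import hashlib
import os
import stat


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use constant for large data sets.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path):
    """Remove write permission for owner, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def verify(path, expected_checksum):
    """Return True if the file still matches a previously recorded checksum."""
    return sha256_of(path) == expected_checksum
```

A typical use would be to record the checksum when source data arrive, make the file read-only, and call `verify` before each analysis or after each transfer.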
2020/2020-02-17_IT101-DM/slides/data-introduction.md
0 → 100644
View file @
a696bf39
# Data and metadata
<div
style=
"display:grid;grid-gap:100px;grid-template-columns: 40% 40%"
>
<div
>
## Data
*
"
*information in digital form that can be transmitted or processed*
"
<p
align=
"right"
>
-- Merriam-Webster dictionary
</p>
*
"
*information in an electronic form that can be stored and processed by a computer*
"
<p
align=
"right"
>
--Cambridge dictionary
</p>
</div>
<div>
## Metadata
*
data describing other data
*
information that is given to describe or help you use other information
*
metadata are data
*
can be processed and analyzed
</div>
</div>
<div
class=
"fragment"
>
## Metadata examples:
<div
style=
"position:absolute"
>
<ul>
<li>
LabBook
</li>
<li>
author/owner of the data
</li>
<li>
origin of the data
</li>
<li>
data type
</li>
</ul>
</div>
<div
style=
"position:absolute;left:25%"
>
<ul>
<li>
description of content
</li>
<li>
modification date
</li>
<li>
description of modification
</li>
<li>
location
</li>
</ul>
</div>
<div
style=
"position:relative;left:50%;top:0.7em"
>
<ul>
<li>
calibration readings
</li>
<li>
software/firmware version
</li>
<li>
data purpose
</li>
<li>
means of creation
</li>
</ul>
</div>
</div>
<div
class=
"fragment"
>
<br>
<center
style=
"color:red"
>
Insufficient metadata make the data useless!
</center>
</div>
<aside
class=
"notes"
>
Sometimes metadata collection takes more time than data collection
</aside>
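
Because metadata are themselves data, they can be stored in a machine-readable file next to the data they describe. The sketch below is only one possible approach: the `.meta.json` suffix, the field names in `REQUIRED_FIELDS`, and the function names are assumptions chosen to mirror some of the examples listed above, not an established LCSB format.

```python
import json
from pathlib import Path

# Hypothetical minimal set of fields, loosely based on the metadata examples.
REQUIRED_FIELDS = (
    "owner",
    "origin",
    "data_type",
    "description",
    "modification_date",
    "means_of_creation",
)


def missing_fields(metadata):
    """List required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def write_sidecar(data_file, metadata):
    """Write metadata as JSON next to the data file and return the new path."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("insufficient metadata, missing: " + ", ".join(missing))
    sidecar = Path(str(data_file) + ".meta.json")
    sidecar.write_text(
        json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8"
    )
    return sidecar
```

Refusing to write an incomplete record is a deliberate choice in this sketch: it forces the metadata to be collected at the time the data are stored, when the information is still easy to obtain.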
# LCSB research data
Three categories:
*
**Primary data**
*
scientific data
*
measurements, images, observations, notes, surveys, ...
*
models, software codes, libraries, ...
*
metadata directly describing the data
*
data dictionaries
*
format, version, coverage descriptions, ...
*
**Research record**
*
description of the research process, including experiment
*
experiment set-up
*
followed protocols
*
...
*
**Project accompanying documentation**
*
ethical approvals, information on consent
*
collaboration agreements
*
intellectual property ownership
*
other relevant documentation
2020/2020-02-17_IT101-DM/slides/data_flow.md
0 → 100644
View file @
a696bf39
# Typical flow of data
<div style="display:grid;grid-gap:10px;grid-template-columns: 30% 20% 30%;
grid-auto-flow:column;grid-template-rows: repeat(4,auto);position:relative;left:8%">
<div
class=
"content-box fragment"
data-fragment-index=
"1"
>
<div
class=
"box-title red"
>
Source data
</div>
<div
class=
"content"
>
*
Experimental results
*
Large data sets
*
Manually collected data
*
External
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"2"
>
<div
class=
"box-title yellow"
>
Intermediate
</div>
<div
class=
"content"
>
*
Derived data
*
Tidy data
*
Curated sets
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"3"
>
<div
class=
"box-title blue"
>
Analyses
</div>
<div
class=
"content"
>
*
Exploratory
*
Model building
*
Hypothesis testing
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"4"
>
<div
class=
"box-title green"
>
Dissemination
</div>
<div
class=
"content"
>
*
Manuscript, report, presentation, ...
</div>
</div>
<center>
<img
src=
"slides/img/data-flow_sources.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_transformation.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_chart.png"
height=
60%
>
</center>
<center>
<img
src=
"slides/img/data-flow_paper.png"
height=
60%
>
</center>
<div
class=
"content-box fragment"
data-fragment-index=
"5"
>
<div
class=
"box-title red"
>
Preserve
</div>
<div
class=
"content"
>
*
Version data sets
*
Backup
*
Protect
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"6"
>
<div
class=
"box-title yellow"
>
Reproduce
</div>
<div
class=
"content"
>
*
Automate your builds
*
Use workflow tools (e.g. Snakemake)
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"7"
>
<div
class=
"box-title blue"
>
Trace
</div>
<div
class=
"content"
>
*
Multiple iterations.
*
Code versioning (Git)
</div>
</div>
<div
class=
"content-box fragment"
data-fragment-index=
"8"
>
<div
class=
"box-title green"
>
Track
</div>
<div
class=
"content"
>
*
Through multiple versions
</div>
</div>
</div>
<aside
class=
"notes"
>
The flow of data is mostly downstream, but in practice you go back and forth.
This applies to all data (e.g. a financial report or a lab safety assessment).
</aside>
2020/2020-02-17_IT101-DM/slides/fair-principles.md
0 → 100644
View file @
a696bf39
# FAIR (meta)data principles
*
dates back to 2014
*
well accepted by scientific community
*
a necessity in data-driven science
*
officially embraced by EU and G20
*
required by funding agencies and journal publishers
<center>
<img
src=
"slides/img/fair-principles.png"
height=
"400px"
>
</center>
<br>
<br>
2020/2020-02-17_IT101-DM/slides/howtos.md
0 → 100644
View file @
a696bf39
# LCSB How-Tos
<br>
https://howto.lcsb.uni.lu/
<center>
<iframe
data-src=
"https://howto.lcsb.uni.lu/"
height=
"600px"
width=
"1200px"
></iframe>
</center>
2020/2020-02-17_IT101-DM/slides/img/3pillars-full.png
0 → 100644
View file @
a696bf39
131 KB
2020/2020-02-17_IT101-DM/slides/img/DinoSequentialSmaller.gif
0 → 100644
View file @
a696bf39
4.34 MB
2020/2020-02-17_IT101-DM/slides/img/Git-logo.png
0 → 100644
View file @
a696bf39
5.55 KB
2020/2020-02-17_IT101-DM/slides/img/LCSB_storages_backed-up.png
0 → 100644
View file @
a696bf39
151 KB
2020/2020-02-17_IT101-DM/slides/img/LCSB_storages_backup.png
0 → 100644
View file @
a696bf39
139 KB
2020/2020-02-17_IT101-DM/slides/img/LCSB_storages_full.png
0 → 100644
View file @
a696bf39
150 KB
2020/2020-02-17_IT101-DM/slides/img/LCSB_storages_personal-crossed.png
0 → 100644
View file @
a696bf39
154 KB
2020/2020-02-17_IT101-DM/slides/img/R3_profile_pictures/alexey_kolodkin.png
0 → 100644
View file @
a696bf39
25 KB
2020/2020-02-17_IT101-DM/slides/img/R3_profile_pictures/christophe_trefois.png
0 → 100644
View file @
a696bf39
56.7 KB
2020/2020-02-17_IT101-DM/slides/img/R3_profile_pictures/laurent_heirendt.png
0 → 100644
View file @
a696bf39
45.4 KB
2020/2020-02-17_IT101-DM/slides/img/R3_profile_pictures/maharshi_vyas.png
0 → 100644
View file @
a696bf39
54 KB
2020/2020-02-17_IT101-DM/slides/img/R3_profile_pictures/noua_toukourou.png
0 → 100644
View file @
a696bf39
49.9 KB
2020/2020-02-17_IT101-DM/slides/img/R3_profile_pictures/pinar_alper.png
0 → 100644
View file @
a696bf39
65.6 KB
2020/2020-02-17_IT101-DM/slides/img/R3_profile_pictures/reinhard_schneider.png
0 → 100644
View file @
a696bf39
53 KB