www.databio.org/slides
## The three levels of collaboration
0. None
1. One-way communication
2. Conferencing
3. Coordination
## Why collaborate on software?
Because collective progress grows as collaboration increases.
But I don't develop software!
Yes, you do. Data analysis is software development.
# Levels of collaboration
## 0. None
I write and use code for my project.
## 1. One-way communication
I give you my script and you run it. Analogy: TV.
## 2. Conferencing
Interactive work toward a shared goal; collecting bug reports and user feedback. Analogy: brainstorming conference call.
## 3. Coordination
Interdependent work toward a shared goal. Analogy: a sports team. Everyone contributes, adjusts to others, and does something different.
How do we move toward coordination?
0. None
1. One-way communication
2. Conferencing
3. Coordination
## Git
- a distributed version-control system that tracks changes in software development
- created by Linus Torvalds in 2005 for development of the Linux kernel
- free and open-source (GPLv2)
## GitHub
- a web-based hosting service for version control using Git
- company founded in February 2008
- acquired by Microsoft for $7.5 billion in 2018
# git/github ecosystem
## version control
[centralized vs distributed](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
[git vs svn](https://trends.google.com/trends/explore?date=all&geo=US&q=git,svn)
## distribution
[the octoverse](https://octoverse.github.com/)
## collaboration
[dashboard](https://github.com/orgs/databio/dashboard)
# Git solves problems
## Version control
## Problem 1
### My computer crashed and I lost all my code.
Solution: Remote backup (S3?)
*or*
git + GitHub
## Problem 2
### I want to work on my code from my home and work computers
Solution: Remote working copy (Dropbox?)
*or*
git + GitHub
## Problem 3
### My changes broke this function and I can't remember how it used to work.
Solution: Manual version control: "code1.R" and "code2.R"?
*or*
git + GitHub
## Problem 4
### I can't remember what code I used on this sample last year. Or, I want to mark this particular version because I used it for the initial paper submission.
Solution: Version control + unstructured notes/logs?
*or*
git + GitHub tags
## Problem 5
### My remote backup crashed and I lost all my history.
Solution: More remote backups (*distributed* VCS)?
*or*
git + GitHub
# Git solves problems
## Distribution
## Problem 1
### I want to publish my code with my paper so others can find and use it. How should I do it?
Solution: Website?
*or*
git + GitHub
## Problem 2
### How can I get a permanent, fast URL for my software so I can build an automated container that will download and install it automatically?
Solution: A high-quality code hosting service?
*or*
git + GitHub
## Problem 3
### I'd like other people to be able to find and use my code. How can I advertise it?
Solution: Google AdWords?
*or*
git + GitHub
## Problem 4
### How can I find software that people actually use that's relevant for my project?
Solution: Google?
*or*
git + GitHub
# Git solves problems
## Collaboration
## Problem 1
### Someone else found a bug in my code and wants to show me how to fix it.
Solution: E-mail?
*or*
User submits a pull request on GitHub. You can also [point to specific lines](https://github.com/databio/pypiper/blob/653216887cb2b2ad8e9119b76f40b39da58ec115/pypiper/ngstk.py#L72-L75).
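Line links like the one above follow a simple pattern: `https://github.com/<owner>/<repo>/blob/<commit>/<path>#L<start>-L<end>`. A minimal sketch of building one; the helper name `line_permalink` is hypothetical, and the pattern is inferred from the pypiper link on this slide.

```python
from typing import Optional


def line_permalink(owner: str, repo: str, commit: str, path: str,
                   start: int, end: Optional[int] = None) -> str:
    """Build a GitHub URL pointing at specific lines of a file (hypothetical helper).

    Pinning to a commit SHA rather than a branch name keeps the link
    pointing at the same code even after the file changes on the branch.
    """
    if start < 1 or (end is not None and end < start):
        raise ValueError("line numbers must be positive and start <= end")
    anchor = f"#L{start}" if end is None else f"#L{start}-L{end}"
    return f"https://github.com/{owner}/{repo}/blob/{commit}/{path}{anchor}"


# Rebuild the pypiper link shown on this slide.
print(line_permalink("databio", "pypiper",
                     "653216887cb2b2ad8e9119b76f40b39da58ec115",
                     "pypiper/ngstk.py", 72, 75))
```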
## Problem 2
### My friend and I are working on a similar problem. How can we share our code with one another, but not with anyone else?
Solution: E-mail? Dropbox?
*or*
Private GitHub repositories shared with collaborators or organizations
## Problem 3
### My collaborator wants to keep using my code for this current project while I develop and test a new feature.
Solution: Duplicate the code?
*or*
git branches + GitHub
## Problem 4
### A user is having trouble getting something to work. How do they know who I am and how to contact me?
Solution: An E-mail address on a web page?
*or*
git + public GitHub issues
## Problem 5
### I figured out how to adapt this published tool to work for my data. How can I contribute these changes back to the original authors?
Solution: E-mail?
*or*
git + GitHub pull request
## Problem 6
### Our lab/center needs to do similar things over and over, with slight differences. How can we share effort but also keep things separate?
Solution: Lots of duplicated scripts with minor tweaks?
*or*
git + GitHub branches and tags
# Key git/github concepts
## repository *vs* remote
## branch *vs* clone
## clone *vs* fork
## pull request *vs* merge
## commit *vs* push
## issue, tag, [stage](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository)
# How git works
## And things to avoid
## Do: commit text files
Git compares files line by line, so changes to text files are easy to review.
See this [pull request on the `peppy` repository](https://github.com/pepkit/peppy/pull/238/files)
## Don't: commit binary files
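Line-oriented text is what makes version diffs readable. As a sketch, Python's standard `difflib` produces a line-based unified diff similar in spirit to `git diff` (it is not Git's own diff algorithm); the sample-sheet contents are made up for illustration.

```python
import difflib

# Two versions of a small, hypothetical sample sheet (plain text, one record per line).
old = "sample_name,protocol\nfrog_1,RNA-seq\nfrog_2,RNA-seq\n".splitlines(keepends=True)
new = "sample_name,protocol\nfrog_1,RNA-seq\nfrog_2,ATAC-seq\n".splitlines(keepends=True)

# unified_diff compares the two sequences line by line and yields a patch-style report.
diff = list(difflib.unified_diff(old, new, fromfile="a/samples.csv", tofile="b/samples.csv"))
print("".join(diff), end="")

# Only the single edited line appears as removed (-) and added (+).
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

By contrast, `git diff` on a binary file reports only that the binary files differ, so a reviewer cannot see what changed.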
## Do: commit small versioned files
Git retains a copy of everything you've committed, even if you later delete it, so a large file stays in the repository's history.
## Don't: commit large static files
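Why deleted files still take up space: Git stores each committed file version as an object named by a hash of its content. For a file (a "blob"), the default object ID is the SHA-1 of the header `blob <size>\0` followed by the file's bytes. A minimal sketch reproducing that ID; the dict "store" is a toy stand-in for `.git/objects` (real Git also compresses and packs objects, and repositories using the newer SHA-256 format get different IDs).

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file with this content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# Toy object store: every version that was ever committed gets its own entry.
store = {}
for version in (b"x = 1\n", b"x = 2\n"):
    store[git_blob_id(version)] = version

# Deleting the file in a later commit adds a new tree, but old blobs remain.
print(len(store))          # 2
print(git_blob_id(b""))    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

You can check the result against `git hash-object <file>` in any repository that uses the default SHA-1 format.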
## Do: make commits frequently
Almost anything you commit can be undone. Frequent commits help you track your work.
## Don't: be scared to break something
## Do: learn to use branches
[Branches](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) are a very useful way to organize your work.
## Don't: be scared of using branches
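In Git, a branch is a lightweight, movable name pointing at a commit, while a tag is a name that stays where you put it. The toy model below (hypothetical `Commit` and `Repo` classes, not Git's implementation) sketches why branches are cheap and why a tag is a good way to mark, say, the version used for a paper submission.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Commit:
    message: str
    parent: Optional["Commit"] = None


@dataclass
class Repo:
    branches: Dict[str, Commit] = field(default_factory=dict)
    tags: Dict[str, Commit] = field(default_factory=dict)
    head: str = "main"

    def commit(self, message: str) -> Commit:
        # The new commit's parent is whatever the current branch points at (None at first).
        c = Commit(message, self.branches.get(self.head))
        self.branches[self.head] = c  # the branch name moves forward
        return c

    def branch(self, name: str) -> None:
        # Creating a branch only copies a pointer; no files are duplicated.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name: str) -> None:
        self.head = name

    def tag(self, name: str) -> None:
        # A tag records the current commit and never moves afterwards.
        self.tags[name] = self.branches[self.head]


repo = Repo()
repo.commit("initial analysis")
repo.tag("paper-submission")
repo.branch("new-feature")
repo.checkout("new-feature")
repo.commit("try a new normalization")

print(repo.branches["main"].message)          # initial analysis
print(repo.branches["new-feature"].message)   # try a new normalization
print(repo.tags["paper-submission"].message)  # initial analysis
```

Here `main` stays on the commit a collaborator may be relying on, while new work accumulates on `new-feature`.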
## Do: use the command line
Write your own [aliases](https://github.com/nsheff/env/blob/master/alias_git.sh) for commands you use frequently.
## Don't: just rely on the web interface
## Do: use the issue tracker
Every GitHub repository can enable an issue tracker, which links directly to commits, pull requests, and lines of code.
## Don't: use e-mail to document problems and solutions
## Other niceties
- *GitHub pages*: free hosting for static web pages
- *Jekyll*: GitHub's blog-aware static site generator
- *Git hooks*: run scripts automatically before or after events such as commits
- *GitHub Wiki*: a no-frills wiki on every repository
- *GitHub project tracker*: integrates a simple kanban system
- *GitHub API*: provides programmatic access
- *Gists*: small code snippets
- Free private repositories for individuals
- Free private repositories for academic groups
## Git's utility transcends software
- analytical code, not just tools
- VCS/collaboration for writing grants, papers, CV/biosketch
- VCS and host for lab web page and all code documentation
- citation management database
- shared lab instructions
- Environments: modulefiles, Dockerfiles, config files
- a shared figure repository for lab members
- presentations
- communicating with groups of people, brainstorming
## Git is a single infrastructure that provides solutions to a huge number of problems
[peppy repository](https://github.com/pepkit/peppy)