Workflow
Many modern statistical methods require some programming. This is especially true of Bayesian modeling. But how do I best write code that others can use, understand, and collaborate on? How do I seek help effectively? We take a look at writing reproducible examples, the version control tool Git, and the code collaboration platform GitHub.
Problem
Generally, we want to
- Do Bayesian data analyses
- Write reproducible code
- Seek help effectively
- Collaborate with others
- Make code/work available?
In this intro, we look at tools that help achieve these goals. The more complex your analyses get, the more helpful these tools are likely to be.
Set-up
Let’s first make sure we’ve set up the tools
-
- To ensure Git works
-
- Help connecting to GitHub here
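One quick way to confirm that Git is installed is to ask it for its version from a terminal. The sketch below also prints the name and email Git would attach to your commits; checking these is an assumption about what a typical setup needs, not a step taken from the workshop materials.

```shell
# Print the installed Git version; an error here means Git is missing or not on your PATH
git --version

# Show the identity Git will attach to your commits (a message is printed if it is not set yet)
git config --global user.name || echo "user.name is not set"
git config --global user.email || echo "user.email is not set"
```

If either value is missing, it can be set with `git config --global user.name "Your Name"` and the matching `user.email` command.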
Git
- Git is a version control tool—a program on your computer
- Organize projects into repositories
  - Local repository: ~/Users/matti/Documents/workshop/ (actually ~/Users/matti/Documents/workshop/.git)
  - Remote repository: https://github.com/mvuorre/workshop.git
- Functions to
  - Commit states to history
  - Push history to, and pull history from, a remote repository
  - and more…
- Powers most software collaborations
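To make "commit states to history" concrete, here is a minimal sketch that creates a local repository and records one commit. The temporary directory, the file name analysis.R, and the user name and email are placeholders chosen for illustration only.

```shell
# Work in a throwaway directory so nothing existing is touched (hypothetical location)
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Turn the directory into a local repository; this creates the hidden .git folder
git init --quiet

# Identify yourself for this repository only (placeholder values)
git config user.name "Example User"
git config user.email "user@example.com"

# Create a file, stage it, and commit the current state to history
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# Inspect the history: one line per commit
git log --oneline
```

Every later change follows the same add-then-commit pattern, and pushing to or pulling from a remote moves this history between computers.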
Git
- Git can get extremely complicated
- I wrote a whole paper about it (Vuorre and Curley 2018), but I still Kagi everything
- We want to know just enough and not more
GitHub
- GitHub is a developer platform owned by Microsoft
- GitHub hosts remote Git repositories with interesting additions (live demo)
- Get the workshop’s source code from GitHub:
# In a directory where you're comfortable putting stuff
git clone https://github.com/mvuorre/workshop.git
cd workshop
There are many alternative services such as GitLab and Codeberg.
Collaborating with Git and GitHub
General workflow for contributing to others’ projects
- Find a problem and let the author know about it
  - → Submit an issue
- Fix the problem and submit your fix
  - → Submit a “pull request”
- In many cases you want to show an example of what is going wrong and how
  - Reproducible example
  - The same idea applies when seeking help for your own problems, e.g. on forums
Reproducible examples
Practice 1
- Create a reproducible example
- Submit your reprex as a new “example” issue at https://github.com/mvuorre/workshop/issues
- We’ll solve your problems together
Practice 2
Live example: Contributing to common repo (https://www.atlassian.com/git/tutorials/comparing-workflows)
- Get added as collaborator to https://github.com/brms-workshop/stuff
git clone https://github.com/brms-workshop/stuff.git
- Find code that needs fixing, and let others know with an issue
- Fix code in a new branch
- e.g. Create a new file—this is an example.
- Submit branch to GitHub and open a pull request
- Discuss changes with others in pull request
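The branch-and-push part of these steps might look roughly like the sketch below. So that it runs offline, a local bare repository stands in for the shared GitHub repository; that stand-in, the branch name fix-example, the file example.txt, and the user identity are all assumptions made for illustration.

```shell
# Stand-in for the shared GitHub repository: a local bare repository (assumption)
work_dir="$(mktemp -d)"
git init --quiet --bare "$work_dir/stuff.git"

# Clone it, as you would clone https://github.com/brms-workshop/stuff.git
git clone --quiet "$work_dir/stuff.git" "$work_dir/stuff"
cd "$work_dir/stuff"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Do the fix on a new branch (hypothetical branch and file names)
git switch --quiet -c fix-example
echo "This is an example." > example.txt
git add example.txt
git commit --quiet -m "Add example file"

# Publish the branch to the remote; on GitHub you would then open a pull request from it
git push --quiet -u origin fix-example
```

With a real GitHub remote, the push output usually includes a link for opening the pull request, and the discussion then happens on the pull request page.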
Practice 3
Live example: Contributing to someone else’s repo
- Fork the workshop repo to your GitHub account
- Clone your remote repo to your computer
git clone https://github.com/{your-name}/workshop.git
- Make changes
  - For example, fix the reprex.R file
- Push local changes to your remote
- Open a pull request
- Discuss changes with others in pull request
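The local part of the fork workflow (clone, change, commit, push) can be sketched as follows. To keep it runnable offline, a local repository plays the role of the workshop repo and a bare clone of it plays the role of your fork; these stand-ins, the contents of reprex.R, and the user identity are hypothetical and not taken from the actual repository.

```shell
# Stand-ins so the sketch runs offline (assumptions): "upstream" plays the workshop repo,
# and a bare clone of it plays your fork on GitHub
base_dir="$(mktemp -d)"
git init --quiet "$base_dir/upstream"
cd "$base_dir/upstream"
git config user.name "Example User"
git config user.email "user@example.com"
echo "mean(c(1, 2, NA)" > reprex.R        # hypothetical content with a missing parenthesis
git add reprex.R
git commit --quiet -m "Add reprex"
git clone --quiet --bare "$base_dir/upstream" "$base_dir/fork.git"

# Clone your fork, as in: git clone https://github.com/{your-name}/workshop.git
git clone --quiet "$base_dir/fork.git" "$base_dir/workshop"
cd "$base_dir/workshop"
git config user.name "Example User"
git config user.email "user@example.com"

# Make the change, check what Git sees as modified, and commit it
echo "mean(c(1, 2, NA), na.rm = TRUE)" > reprex.R
git status --short
git add reprex.R
git commit --quiet -m "Fix reprex.R"

# Push the current branch to your fork (origin); the pull request is then opened on GitHub
git push --quiet origin HEAD
```

Forking itself and opening the pull request happen on the GitHub website, so they are not part of this sketch.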
Wrap-up
- Bayesian statistics?!?
- Reproducible examples are essential for seeking help
- There will come a time when you need help!
- Proper tools help us collaborate better
- Visibility
- Can choose public/private repos
- Be careful if this is something you’re concerned about
References
Vuorre, Matti, and James P. Curley. 2018. “Curating Research Assets: A Tutorial on the Git Version Control System.” Advances in Methods and Practices in Psychological Science 1 (2): 219–36. https://doi.org/10.1177/2515245918754826.