11. Git and GitHub#
This chapter was coauthored by Jason DeBacker and Richard W. Evans.
A seasoned Git and GitHub user should always give a newcomer to this type of version control and code collaboration the following two warnings.
The learning curve is steep.
The workflow initially is not intuitive.
These two obstacles seem to work together to make this form of collaboration initially harder than the sum of its parts. However, once you begin collaborating on open source projects or on large-group academic or research projects, you start to see the value of all the different steps, methods, and safeguards involved in using Git and GitHub. Figure 11.1 below is a diagram of the main pieces and actions in the primary workflow that we advocate in this book. You will notice that a version of this figure is the main image for the book and is also the favicon for the tabs of the web pages of the online book. This Git and GitHub workflow diagram looks complicated, but these actions will become second nature, and following this workflow will save collaborators time in the long run.
11.1. Brief definitions#
(Repository)
A repository or “repo” is a directory containing files that are tracked by a version control system. A local repository resides on a local machine. A remote repository resides in the cloud.
(Git)
Git is open source distributed version control system (DVCS) software that resides on your local computer and tracks changes, and the history of changes, to all the files in a directory or repository. See the Git website https://git-scm.com/ and the Git Wikipedia entry [Wikipedia Contributors, 2020] for more information.
(GitHub)
GitHub or GitHub.com is a cloud-based source code management platform designed to enable scalable, efficient, and secure version-controlled collaboration by linking the local Git repositories of its users. GitHub’s main business footprint is hosting a collection of millions of version-controlled code repositories. In addition to being a platform for distributed version control, GitHub’s primary features include code review, project management, continuous integration unit testing, GitHub Actions, and web page (GitHub Pages) and documentation hosting and deployment.
To be clear at the outset, Git is the version control software that resides on your local computer. Its main function is to track changes in the files in specified directories. But Git also has some functionality to interact with remote repositories. The interaction between Git and GitHub creates an ideal environment and platform for scalable collaboration on code among large teams.
11.2. Wide usage#
Every year in November, GitHub publishes a report entitled “The State of the Octoverse”, in which it details the growth and developments in the GitHub community in the most recent year. The most recent State of the Octoverse was published on November 17, 2022 and covered developments from October 1, 2021 to September 30, 2022. Some interesting statistics from that report are the following.
more than 94 million developers on GitHub
85.7 million new repositories in the last year for a total of about 517 million code repositories
more than 413 million contributions were made to open source projects on GitHub in 2022
The two most widely used programming languages on GitHub are JavaScript (the language of web development) in first place and Python in second
more than 90% of Fortune 100 companies use GitHub
Open source software is now the foundation of more than 90% of the world’s software
Alternatives to GitHub include GitLab and Bitbucket. Other alternatives are documented in this June 2020 post by Software Testing Help. But GitHub has the largest user base and the largest number of repositories.
11.3. Git and GitHub basics#
Create, clone, fork, remote, branch, push, pull, pull request.
Include a discussion of `git pull` vs. `git pull --ff-only` vs. `git pull --rebase`. A good blog post is “Why You Should Use git pull --ff-only” by Shane at ssfc’s Tech Blog.
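The difference between these pull variants is easiest to see when the local and remote histories have diverged. The following is a minimal sketch, not a definitive recipe: it assumes a local bare repository standing in for a GitHub remote, and the directory names, file names, and user identity are hypothetical. Because the behavior of a plain `git pull` depends on your Git version and configuration, the sketch passes `--no-rebase` to request a merge explicitly.

```shell
set -eu
tmp="$(mktemp -d)"   # throwaway sandbox (hypothetical location)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# A bare repository stands in for the remote; it starts with one commit on master.
git -c init.defaultBranch=master init -q "$tmp/seed"
echo "base" > "$tmp/seed/base.txt"
git -C "$tmp/seed" add base.txt
git -C "$tmp/seed" commit -q -m "Base commit"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"

# Two identical clones, each with one local commit that the remote lacks.
for name in merge_demo rebase_demo; do
  git clone -q "$tmp/remote.git" "$tmp/$name"
  echo "local" > "$tmp/$name/local.txt"
  git -C "$tmp/$name" add local.txt
  git -C "$tmp/$name" commit -q -m "Local commit"
done

# The remote then gains a commit that the clones lack, so the histories diverge.
echo "remote" > "$tmp/seed/remote.txt"
git -C "$tmp/seed" add remote.txt
git -C "$tmp/seed" commit -q -m "Remote commit"
git -C "$tmp/seed" push -q "$tmp/remote.git" master

# --ff-only refuses to act on diverged history and leaves the branch untouched.
cd "$tmp/merge_demo"
if git pull -q --ff-only origin master >/dev/null 2>&1; then
  ff_only_result="updated"
else
  ff_only_result="refused"
fi
echo "git pull --ff-only: $ff_only_result"

# An explicit merge pull joins the two histories with a merge commit.
git pull -q --no-rebase --no-edit origin master
git --no-pager log --oneline --graph

# --rebase replays the local commit on top of the remote commit (linear history).
cd "$tmp/rebase_demo"
git pull -q --rebase origin master
git --no-pager log --oneline --graph
```

The design choice behind `--ff-only` is that it never rewrites or merges anything silently: when the pull cannot be a simple fast-forward, it stops and lets you decide between merging and rebasing.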
11.3.1. Fork a repository and clone it to your local machine#
For this example, let the primary repository be OG-Core, which is in the PSLmodels GitHub organization. This primary repository has a `master` branch that is the lead branch to which we want to contribute and with which we want to stay up to date.[1] If you wanted to contribute to or modify this repository, and you were following the workflow described in Figure 11.1, you would execute the following three steps.
Fork the repository. In your internet browser, go to the main page of the GitHub repository you want to fork (PSLmodels/OG-Core). Click on the “Fork” button in the upper-right corner of the page. This will open a dialogue that asks you to confirm the account or organization that will own the forked copy. Confirming creates a copy of the OG-Core repository on your GitHub account or GitHub organization.
Clone the repository. In the terminal on your machine, navigate to the directory in which you want your Git repository to reside. Use the `git clone` command plus the URL of the repository on your GitHub account. Note that you are cloning your fork, not the primary repository. In the case of the rickecon fork of the OG-Core repository, the command would be the following.
DirectoryAboveRepo >> git clone https://github.com/rickecon/OG-Core.git
Add an `upstream` remote to your local clone of the fork. Once you have cloned the repository to your local machine, change directories to the new repository on your machine by typing `cd OG-Core` in your terminal. If you type `git remote -v`, you’ll see that there is automatically a remote named `origin`. That `origin` name refers to all the branches associated with the repository on your GitHub account in the cloud. In Figure 11.1, `origin` represents boxes B and E. You want to add another remote called `upstream` that represents all the branches associated with the primary repository.
OG-Core >> git remote add upstream https://github.com/PSLmodels/OG-Core.git
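The three steps above can be rehearsed offline before trying them on GitHub. The following is a minimal sketch under stated assumptions: local bare repositories stand in for the primary repository and for your fork, the "fork" step is imitated with a bare clone (on GitHub it is a button click), and the file contents and user identity are hypothetical.

```shell
set -eu
tmp="$(mktemp -d)"   # throwaway sandbox (hypothetical location)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Stand-in for the primary repository (box A) with one commit on master.
git -c init.defaultBranch=master init -q "$tmp/seed"
echo "OG-Core" > "$tmp/seed/README.md"
git -C "$tmp/seed" add README.md
git -C "$tmp/seed" commit -q -m "Initial commit"
git clone -q --bare "$tmp/seed" "$tmp/upstream.git"

# Step 1 stand-in: a bare copy plays the role of your fork on GitHub (box B).
git clone -q --bare "$tmp/upstream.git" "$tmp/fork.git"

# Step 2: clone your fork, not the primary repository.
cd "$tmp"
git clone -q "$tmp/fork.git" OG-Core

# Step 3: add the primary repository as a second remote named upstream.
cd OG-Core
git remote add upstream "$tmp/upstream.git"

# Lists origin (your fork) and upstream (the primary repository).
git remote -v
```

With real GitHub repositories, only the URLs change: `origin` points at the fork on your account and `upstream` points at the primary repository.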
11.3.2. Updating your main or master branch#
Let the primary repository be OG-Core, which is in the PSLmodels GitHub organization. This primary repository has a `master` branch that is the lead branch to which we want to contribute and with which we want to stay up to date. This repository is represented by box A in Figure 11.1. You have forked that repository, and your remote fork’s `master` branch is represented by box B in Figure 11.1 and your local `master` branch is represented by box C.
Suppose that OG-Core has been updated with some pull requests (PRs) that have been merged in. You want to update your remote and local `master` branches (boxes B and C) with the new code from the primary branch (box A).
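One common way to carry out this update is to pull from `upstream` into your local `master` and then push to `origin`. The following is a sketch of that sequence, not necessarily the exact commands the authors have in mind. It assumes local bare repositories standing in for boxes A and B, a simulated merged pull request, and hypothetical file names and user identity.

```shell
set -eu
tmp="$(mktemp -d)"   # throwaway sandbox (hypothetical location)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Setup: primary repository (box A), fork (box B), and local clone (box C).
git -c init.defaultBranch=master init -q "$tmp/seed"
echo "v1" > "$tmp/seed/model.txt"
git -C "$tmp/seed" add model.txt
git -C "$tmp/seed" commit -q -m "Initial commit"
git clone -q --bare "$tmp/seed" "$tmp/upstream.git"
git clone -q --bare "$tmp/upstream.git" "$tmp/fork.git"
git clone -q "$tmp/fork.git" "$tmp/OG-Core"
cd "$tmp/OG-Core"
git remote add upstream "$tmp/upstream.git"

# Simulate a pull request being merged into the primary repository.
echo "v2" > "$tmp/seed/model.txt"
git -C "$tmp/seed" commit -q -am "Simulated merged pull request"
git -C "$tmp/seed" push -q "$tmp/upstream.git" master

# Update local master (box C) from the primary repository (box A).
git checkout -q master
git pull -q --ff-only upstream master

# Update the fork's master (box B) from local master (box C).
git push -q origin master
```

Using `--ff-only` here reflects the assumption that you never commit directly to your own `master`, so updating it should always be a simple fast-forward.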
11.3.3. Create a development branch to make changes#
OG-Core >> git checkout -b DevBranchName
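As a small illustration of what this command does, the following sketch creates a throwaway repository, creates the development branch, and lists the branches. The repository contents and user identity are hypothetical, and `DevBranchName` is the placeholder name used above.

```shell
set -eu
tmp="$(mktemp -d)"   # throwaway sandbox (hypothetical location)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# A repository with a single commit on master.
git -c init.defaultBranch=master init -q "$tmp/OG-Core"
cd "$tmp/OG-Core"
echo "OG-Core" > README.md
git add README.md
git commit -q -m "Initial commit"

# Create the branch at the current commit and switch to it in one step.
# (In Git 2.23 and later, `git switch -c DevBranchName` does the same.)
git checkout -b DevBranchName

# List local branches; the asterisk marks the active branch.
git branch
```

The new branch starts at the same commit as `master`, so it is best created right after `master` has been brought up to date.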
11.3.4. Adding, committing, pushing changes to remote repository#
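A typical sequence for this step is to stage the changed files, commit them, and push the development branch to your fork. The following is a minimal sketch under stated assumptions: a local bare repository stands in for your fork on GitHub, and the file name, commit messages, and user identity are hypothetical.

```shell
set -eu
tmp="$(mktemp -d)"   # throwaway sandbox (hypothetical location)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Setup: a stand-in fork and a local clone on a development branch.
git -c init.defaultBranch=master init -q "$tmp/seed"
echo "rate = 0.03" > "$tmp/seed/params.txt"
git -C "$tmp/seed" add params.txt
git -C "$tmp/seed" commit -q -m "Initial commit"
git clone -q --bare "$tmp/seed" "$tmp/fork.git"
git clone -q "$tmp/fork.git" "$tmp/OG-Core"
cd "$tmp/OG-Core"
git checkout -q -b DevBranchName

# Make a change, then inspect what Git sees.
echo "growth = 0.01" >> params.txt
git status --short

# Stage the change, commit it to the development branch, and push to the fork.
git add params.txt
git commit -q -m "Add growth parameter"
git push -q origin DevBranchName
```

After the push, the development branch exists on the remote and can serve as the source of a pull request.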
11.3.5. Submit a pull request from your development branch#
11.3.6. Resolve merge conflicts#
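A merge conflict arises when two branches change the same lines of a file. The following sketch reproduces one in a throwaway repository and resolves it by hand. The file name, parameter values, and user identity are hypothetical, and choosing the `master` version of the line is only one possible resolution.

```shell
set -eu
tmp="$(mktemp -d)"   # throwaway sandbox (hypothetical location)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git -c init.defaultBranch=master init -q "$tmp/repo"
cd "$tmp/repo"
echo "rate = 0.03" > params.txt
git add params.txt
git commit -q -m "Initial commit"

# The development branch and master each change the same line differently.
git checkout -q -b DevBranchName
echo "rate = 0.04" > params.txt
git commit -q -am "Dev: change rate"
git checkout -q master
echo "rate = 0.05" > params.txt
git commit -q -am "Master: change rate"

# Merging master into the development branch stops with a conflict.
git checkout -q DevBranchName
git merge master >/dev/null 2>&1 || echo "Merge stopped; conflicted files:"
git diff --name-only --diff-filter=U

# Resolve by writing the intended final content (conflict markers removed),
# then stage the file and commit to conclude the merge.
echo "rate = 0.05" > params.txt
git add params.txt
git commit -q -m "Merge master into DevBranchName"
```

In practice you would open the conflicted file in an editor, choose between the versions that Git marks with `<<<<<<<`, `=======`, and `>>>>>>>`, and save the result before staging it.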
11.4. Git and GitHub Cheat Sheet#
About 99% of the commands you’ll type in `git` are summarized in the table below:
Functionality | Git Command |
---|---|
See active branch and uncommitted changes for tracked files | |
Change branch | |
Create new branch and change to it | |
Track file or latest changes to file | |
Commit changes to branch | |
Push committed changes to remote branch | |
Merge changes from master into development branch | |
Merge changes from development branch into master | (change to development branch, then…) |
List current tags | |
Create a new tag | |
Pull changes from remote repo onto local machine | |
Merge changes from remote into active local branch | |
Clone a remote repository | |
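As a hedged illustration of the functionalities listed in the table, the following sketch runs one standard Git command for each of them in a throwaway sandbox. The specific commands and flags are assumptions chosen for illustration and are not necessarily the ones the authors list. The branch name, tag name, file name, and user identity are hypothetical, and a local bare repository stands in for the remote.

```shell
set -eu
tmp="$(mktemp -d)"   # throwaway sandbox (hypothetical location)
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Sandbox setup: a bare repository stands in for the remote.
git -c init.defaultBranch=master init -q "$tmp/seed"
echo "start" > "$tmp/seed/notes.txt"
git -C "$tmp/seed" add notes.txt
git -C "$tmp/seed" commit -q -m "Initial commit"
git clone -q --bare "$tmp/seed" "$tmp/remote.git"

git clone -q "$tmp/remote.git" "$tmp/work"   # clone a remote repository
cd "$tmp/work"

git status -uno                       # active branch and uncommitted changes (tracked files only)
git checkout -q -b DevBranchName      # create new branch and change to it
echo "edit" >> notes.txt
git add notes.txt                     # track latest changes to file
git commit -q -m "Edit notes"         # commit changes to branch
git push -q origin DevBranchName      # push committed changes to remote branch

git checkout -q master                # change branch
git merge -q DevBranchName            # merge development branch into master

git checkout -q DevBranchName
git merge -q master                   # merge master into development branch

git tag -a v0.1.0 -m "First tag"      # create a new (annotated) tag
git tag                               # list current tags

git fetch -q origin                   # bring changes from the remote onto the local machine
git merge -q origin/DevBranchName     # merge changes from remote into active local branch
```

Note that a merge always brings changes into the branch that is currently checked out, which is why each merge above is preceded by a checkout of the receiving branch.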
11.5. Footnotes#
The footnotes from this chapter.