14 February 2021 ; tagged in: git , workshop

Git for beginners part 4a: remote repositories

Pushing, pulling, cloning, and forking.

Back in part 1 of this series, I said that git is a distributed version control system. But so far, I've only been talking about a single user contributing to a single repository. When you have multiple people working on the same project (or even one person working on a project on multiple machines), you need to understand distribution in git.

The distributed nature of git comes from how different repositories communicate with each other. The key thing to note is that every machine in a distributed setup has its own, complete copy of the repository. There's a complete repository on your laptop, there's a complete repository on your collaborator's machine, there's a complete repository on Github.

Changes on one repository stay local to that repository until you send the changes to another repository, or you ask for a repository to be updated. But in any case, your own work won't be overwritten by what happens elsewhere, even if you ask for remote changes to be applied to your own repository.
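
A minimal sketch of that isolation, using a bare repository in a temporary directory as a stand-in for the shared server. The names hub.git and laptop are made up for this illustration and are not part of the workshop setup.

```shell
set -e
work=$(mktemp -d)                        # scratch area, separate from your real repos
git init -q --bare "$work/hub.git"       # stand-in for the server-hosted repository
git clone -q "$work/hub.git" "$work/laptop" 2>/dev/null   # hides the "empty repository" warning
cd "$work/laptop"
git config user.name "Alice"
git config user.email "alice@example.com"
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git commit -q --allow-empty -m "First commit"

# The commit exists only in the laptop repository until it is pushed.
before=$(git --git-dir="$work/hub.git" for-each-ref --format='%(refname:short)' refs/heads)
git push -q origin master
after=$(git --git-dir="$work/hub.git" for-each-ref --format='%(refname:short)' refs/heads)
echo "branches in hub before push: '$before', after push: '$after'"
```

Nothing reaches the other repository until the explicit push.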

Generally, a team organises around a shared repository hosted on some server. Individual developers take a copy of a repository, work on it, then push the changes back to the central repository (there's a lot of detail about exactly how that happens). Once there, other team members can pull those changes back into their individual, local repositories. (It's possible to have direct peer-to-peer communication between repositories on developers' machines, but that's not common.)

Sites like Github and Gitlab offer a convenient home for those central server-hosted repositories. They're well-known, easy to use, and can offer a showcase for a project or a team. These servers also act as a home for off-site backups of your own projects.  

In this post, I'll use Github in the examples. Gitlab is very similar.

Dealing with remote repositories has some technicalities, and there are different ways of dealing with the challenges. I've split the topic into three posts.

  1. This post deals with "plumbing", setting up remote repositories and using them.
  2. The next post shows the git-only case of two developers collaborating by creating a remote branch, with both of them pushing changes to that branch and pulling the other's changes.
  3. The final part deals with an extension used by Github, pull requests. (GitLab calls them merge requests.)

If you haven't already, create a Github account.

The plumbing: transfer protocols and keys

Before we can dive into using remote repositories, we have to understand how data moves between them and how that movement is authorised. We don't want to allow our repositories to be vandalised by anyone on the internet, so we need to know how to authorise certain people (including us!) to write to our repositories.

Git commonly uses two protocols for moving data between remotes: HTTPS and SSH. HTTPS is good for allowing anyone to read and copy a repository, but you have to provide credentials (a username and a password or access token) when you want to write to the repo, though tools like Github Desktop will remember them for you for a while. SSH authenticates you for reads and writes with a key pair: the private key stays on your computer, and you register the matching public key with the remote site. It takes a bit more effort to set up, but you don't need to enter passwords all the time. SSH addresses are the ones that start with git@github.com:.

In this post, I'll assume you're using SSH for moving things around.
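
The choice of transport shows up in the remote's URL. Here is a small sketch, assuming the HTTPS address below is the counterpart of the git@github.com address used later in this post. It only edits local configuration in a throwaway repository and never contacts Github.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"

# HTTPS form of the address (assumed equivalent of the SSH address used in this post)
git remote add origin https://github.com/NeilNjae/git-workshop.git
https_url=$(git remote get-url origin)

# Point the same remote at the SSH form instead
git remote set-url origin git@github.com:NeilNjae/git-workshop.git
ssh_url=$(git remote get-url origin)

echo "was: $https_url"
echo "now: $ssh_url"
```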

If you haven't already, install Github Desktop. Ideally, tell Github about your SSH keys as well.

Remotes and (remote) tracking branches

Now you have the general idea, it's time to start looking at the detail of how these different repositories interact.

Repositories that belong to the same project are connected through remotes: each repository keeps a list of the other repositories it knows about. You can pull updates from a remote to your local repository, and push updates from your local repository to a remote.

Cloning a repository will make a copy of a remote repository and place it on your local machine. It will also create a remote reference to the source repository, which it calls origin. (origin is just the default nickname for the remote used to clone this repository; you can name remotes anything.)

The git remote command tells you about the remote repositories that git knows about.
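
As a rough illustration, this sketch registers two remotes in a throwaway repository and lists them. The backup nickname and its path are invented for the example; adding a remote only records the address, so nothing needs to exist there yet.

```shell
set -e
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git remote add origin git@github.com:NeilNjae/git-workshop.git
git remote add backup /tmp/example-backup.git   # hypothetical second remote

git remote       # nicknames only
git remote -v    # nicknames with their fetch and push URLs
remotes=$(git remote)
```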

Each individual repository tracks its own branches, as we've seen in earlier posts. With remotes, git also tracks some of the branches in the remote repositories (the remote tracking branches). Push, fetch, and pull commands send updates between these remotes. The Git book has a good section explaining the detail of remote branches. In this post, I'll take you through some examples of how to use them.
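
To make the idea of a remote tracking branch concrete, here is a sketch with a local bare repository standing in for the server and two clones named alice and bob. All paths and the Demo identity are assumptions for the example. After a fetch, Bob's origin/master moves but his own master stays put until he merges.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A local "server" with one commit on master, built from a throwaway repository
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "First commit"
git clone -q --bare "$work/seed" "$work/hub.git"

# Two clones of that server
git clone -q "$work/hub.git" "$work/alice"
git clone -q "$work/hub.git" "$work/bob"

# Alice publishes a second commit
cd "$work/alice"
git commit -q --allow-empty -m "Second commit"
git push -q origin master

# Bob fetches: origin/master moves, his own master does not
cd "$work/bob"
git fetch -q origin
local_tip=$(git rev-parse master)
tracking_tip=$(git rev-parse origin/master)
git branch -r                               # the remote tracking branches
git log --oneline master..origin/master     # what Bob has not merged yet

# Bring master up to date once Bob is ready
git merge -q --ff-only origin/master
merged_tip=$(git rev-parse master)
```

A pull is a fetch followed by this kind of merge.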

Create a remote repository

The first step is to create a remote repository on Github and connect it to your local repository. Find the + button at the top-right of the Github page and create a new repository.

When you create the repository, make sure you do not create a README or licence file in the repository.

Call your repository what you want. I'm using git-workshop, so the URL of this repository is git@github.com:NeilNjae/git-workshop.git. You can find this by clicking on the green "Code" button.

Next, push the local copy of your repository up to Github. You do this in three stages:

~/Programming/alice/git-workshop$ git switch master
~/Programming/alice/git-workshop$ git remote add origin git@github.com:NeilNjae/git-workshop.git
~/Programming/alice/git-workshop$ git push --set-upstream origin master

The first line switches to the master branch. The second line tells your repository about the remote one you just created on Github; you're giving the repository the local nickname origin. The third line pushes your master branch to that remote origin repository. The --set-upstream flag in the command connects the local master branch to the remote master branch.
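
If you want to check what --set-upstream recorded, the sketch below repeats the same steps against a local bare repository (an assumed stand-in for the Github one, so it runs without a network) and then asks git which branch master tracks.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/hub.git"          # local stand-in for the Github repository
git init -q "$work/git-workshop"
cd "$work/git-workshop"
git config user.name "Alice"
git config user.email "alice@example.com"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "First commit"

git remote add origin "$work/hub.git"
git push -q --set-upstream origin master

# Which remote branch does master now track?
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "master tracks $upstream"
git branch -vv                              # shows [origin/master] next to master
```

With the upstream recorded, a plain git push or git pull on master knows which remote branch to use.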

A look at the log on your local repository should show the new remote tracking branch, shown as origin/master.

~/Programming/alice/git-workshop$ git lg
* b8d04bd - (3 days ago) Fancy headings - Neil Smith (HEAD -> master, origin/master)
* 016dfb6 - (4 days ago) Added fourth paragraphs - Neil Smith
| * 7033bb8 - (4 days ago) Headers now uppercase - Neil Smith (capitals)
| * 364ed5f - (4 days ago) Started adding capitals - Neil Smith
|/  
* 81bb00e - (4 days ago) Third commit - Neil Smith
* 98690f1 - (4 days ago) Second commit - Neil Smith
* 7c0dffe - (4 days ago) First commit - Neil Smith

If you now refresh the repo's page on Github, you should see your files there, but only the master branch: your capitals branch (and any others you've created) remains known only in your local repository.
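
You can also compare the two sides from the command line. This sketch again uses a local bare repository as an assumed stand-in for Github; git ls-remote asks the remote which branches it has.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/hub.git"
git init -q "$work/git-workshop"
cd "$work/git-workshop"
git config user.name "Alice"
git config user.email "alice@example.com"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "First commit"
git branch capitals                          # a second branch that is never pushed
git remote add origin "$work/hub.git"
git push -q --set-upstream origin master

local_branches=$(git branch --format='%(refname:short)')
remote_branches=$(git ls-remote --heads origin | awk '{print $2}')
echo "local:  $local_branches"
echo "remote: $remote_branches"
```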

Collaborative working

Let's now have multiple people working on one repository. If you happen to have multiple people lying around, you can use them to discover how to collaborate via a repository. If you're working through this on your own, you can create a second local copy of the repository in a different directory.

If you want to add a new person to a repository, you'll need their Github account name (or email address). In your repository's home page, click first on Settings at the top right, then Manage Access on the left, then the green Invite Collaborator button at the bottom. Type in their account name, and they should get an invitation to participate in your repository.

Adding a collaborator on Github

If you're on your own, you already have access to your own repository.

In either case, you will have two identities set up to access the same repository. I'll call them Alice and Bob. Alice is the one who initially created the repository. Bob is the new person who will be collaborating.

Before Bob can do anything, he'll need a copy of the repository on his own computer. Copying a remote repository to your local machine is called cloning.

To clone a repository, click on the same green "Code" button and get the repo's URL. Bob then needs to open a terminal or shell on his computer and change to the directory that will be the parent of the local copy of the repository.

If you're doing both halves of this on your own, it'll be easier if you open a new terminal window for Bob. Make sure that the Bob terminal is not in the same directory as Alice's repo, or in a sub-directory of Alice's repo. I suggest using this structure:

.
├── alice
│   └── git-workshop
└── bob

Once Bob has his own directory, he can clone the repository.

~/Programming/bob$ git clone git@github.com:NeilNjae/git-workshop.git
Cloning into 'git-workshop'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 23 (delta 9), reused 19 (delta 8), pack-reused 0
Receiving objects: 100% (23/23), 4.25 KiB | 1.42 MiB/s, done.
Resolving deltas: 100% (9/9), done.

Cloning creates a new directory and places the cloned repository in it. The directory structure will now look like this:

.
├── alice
│   └── git-workshop
└── bob
    └── git-workshop

To make things easier to see in the logs, I suggest changing the user names used in each repo.

$ cd ~/Programming/alice/git-workshop
$ git config user.name "Alice"
$ cd ~/Programming/bob/git-workshop
$ git config user.name "Bob"
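
Without the --global flag, git config writes to the repository's own configuration, so each copy can carry a different name. Here is a quick sketch in a throwaway repository that reads the setting back.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Bob"                   # no --global, so stored in .git/config
name=$(git config --local --get user.name)
echo "user.name in this repository: $name"
git config --local --list | grep '^user\.'   # only this repository's settings
```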

Bob can now change into this directory and see what branches are there.

$ cd ~/Programming/bob/git-workshop
~/Programming/bob/git-workshop$ git lg
* b8d04bd - (3 days ago) Fancy headings - Neil Smith (HEAD -> master, origin/master, origin/HEAD)
* 016dfb6 - (4 days ago) Added fourth paragraphs - Neil Smith
* 81bb00e - (4 days ago) Third commit - Neil Smith
* 98690f1 - (4 days ago) Second commit - Neil Smith
* 7c0dffe - (4 days ago) First commit - Neil Smith

Note that cloning creates a link back to the remote repository that was cloned, again called origin. Cloning also creates the master branch and sets it up to track the remote master branch as well.
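
Both pieces of setup can be inspected after a clone. In this sketch, a local bare repository plays the part of Github (an assumption so the example runs offline), and the clone is then asked where origin points and which branch master tracks.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "First commit"
git clone -q --bare "$work/seed" "$work/hub.git"   # local stand-in for Github

git clone -q "$work/hub.git" "$work/bob"
cd "$work/bob"
origin_url=$(git remote get-url origin)
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "origin -> $origin_url"
echo "master tracks $upstream"
```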

But if you look above, Alice has the capitals branch in her repository, but Bob doesn't know about it. The capitals branch exists only in Alice's local repository, so only she can see or manipulate that branch.

That means there are two sides to remote working: multiple developers working on an existing branch; and one developer publishing a branch so that others know about it and can work on it.

Now we've seen how to set up remote repositories, the next post will show how to use them!

Credits

Cover photo by Mael BALLAND