Git for beginners part 1: commits and restore

    This is the first of a set of posts supporting a workshop on Git for Beginners. The main source for the workshop is the set of slides I've prepared.

    What is Git?

    Git is a distributed version control system. Which probably means nothing.

    A version control system is a system that keeps track of changes in a bunch of files, such as a programming project.

    The distributed part means that it works for teams, with everyone having a copy of the project. With git, everyone is on an equal footing; there are no bosses.

    GitHub is just a website that makes sharing projects easy.

    There are alternatives! Mercurial and Subversion are alternative version control systems, but much less popular. GitLab and Bitbucket are alternatives to GitHub, and are relatively popular.

    Installing Git

    You can install the latest version of Git from the Git website. If you're using Linux or a Mac, you're probably best installing Git from the standard software sources. If you're using Windows, Git for Windows is a good choice to get started (and these workshops will be easier to follow if you install "Git Bash").

    GitHub Desktop is a good front-end for using GitHub, so long as you're only doing straightforward things; as soon as you want to get complicated, you'll need to use a different tool.

    You can do a lot of things with the Git-gui tool, but the command line is the most powerful and flexible way of using git. Once you know the concepts of git, picking up a GUI tool should be fairly simple.

    Repositories and commits

    Concept: the repository

    A repository is everything in a project that Git knows about. It's also the complete history of everything in the project. Nothing is ever removed from that history, so you're always able to go back to a previous version. This means that mistakes aren't as bad as they could be.

    Commits are cheap: git stores them compactly (a file that hasn't changed isn't stored again), so each commit doesn't take up much space. If in doubt, include things in the repository. Include:

    • code
    • configuration files
    • database schemas
    • tests

    Don't include things like:

    • secret information (API keys, passwords)
    • logs
    • automatically-generated files (compiled or minified files, pulled-in dependencies, etc.)

    The .gitignore file will look after some of that for you, but that's outside the scope of this workshop.

    To show git in action, we'll take on the "project" of copying some old novels. Pick your favourite two novels from Project Gutenberg (I'm using Frankenstein by Mary Shelley and Carmilla by J. Sheridan LeFanu).

    Making a repository

    Start bash, then create a directory for the workshop and turn it into a repository:

    $ mkdir yourname-git-workshop
    $ cd yourname-git-workshop
    $ git init
    $ ls -a
    .  ..  .git

    That hidden .git directory is where git stores all the history. You are unlikely to need to look inside it.
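
    If you want to confirm that git init worked, one option is to ask git directly. This is a minimal sketch run in a throwaway temporary directory (an assumption for the example, not the workshop folder):

```shell
# Make a throwaway directory and turn it into a repository
demo=$(mktemp -d)
cd "$demo"
git init -q

# Ask git whether we are inside a repository's working directory
inside=$(git rev-parse --is-inside-work-tree)
echo "$inside"    # prints: true
```

    Outside a repository the same command fails with an error instead of printing true.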

    Including some files

    1. For each novel, create one text file (name it something with all lower case, no spaces, and ending .txt).
    2. In each file, put the first couple of sentences of each novel.
    3. Add the marker # Commit 1 before the chunk of text.
    4. Save the files.

    The files should look like these:

    # Frankenstein, by Mary Shelley
    > Text from
    # Commit 1
    ## Letter 1
    _To Mrs. Saville, England._
    St. Petersburgh, Dec. 11th, 17—.
    You will rejoice to hear that no disaster has accompanied the
    commencement of an enterprise which you have regarded with such evil
    forebodings.  I arrived here yesterday, and my first task is to assure
    my dear sister of my welfare and increasing confidence in the success
    of my undertaking.

    # Carmilla, by J. Sheridan LeFanu
    > Text from
    # Commit 1
    In Styria, we, though by no means magnificent people, inhabit a castle,
    or schloss. A small income, in that part of the world, goes a great way.
    Eight or nine hundred a year does wonders. Scantily enough ours would
    have answered among wealthy people at home. My father is English, and I
    bear an English name, although I never saw England. But here, in this
    lonely and primitive place, where everything is so marvelously cheap, I
    really don't see how ever so much more money would at all materially add
    to our comforts, or even luxuries.

    We're now ready to include these files in the repository. This means we need to understand commits.

    Concept: a commit

    A commit is a snapshot of a project. It contains all the files in all the directories in the project. It's taken at a particular moment in time, and exists forever in the project's history. Each commit knows its parent commit, and that allows you to go back through the entire history of a project.

    Each commit has a unique key (such as 1dbb1a9). You can always refer to another commit, get files from that commit, or even rewind history to a commit.

    Commits are cheap to make, as git stores each snapshot compactly and doesn't store a file again if it hasn't changed since the parent commit. So commit early and often, so that you don't lose too much work when you need to recover from a mistake.
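
    To see the link between a commit and its parent, here is a small sketch in a throwaway repository. The file name novel.txt, the commit messages, and the user name and email are placeholders invented for the example; your keys will differ.

```shell
# Throwaway repository with a placeholder identity for committing
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"
git config user.email "workshop@example.com"

# First commit
echo "first paragraph" > novel.txt
git add novel.txt
git commit -q -m "First commit"
first=$(git rev-parse HEAD)      # full key of the first commit

# Second commit, made on top of the first
echo "second paragraph" >> novel.txt
git add novel.txt
git commit -q -m "Second commit"

# %h = short key, %p = short key of the parent, %s = message
git log --format='%h parent=%p %s'

# %P = full key of the parent of the most recent commit
parent=$(git log -1 --format=%P)
```

    The second commit lists the first as its parent, and the first commit has no parent at all.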

    Making a commit

    With that understanding, let's make our first commit. The command git status is your friend and will tell you a lot about what git thinks is going on.

    $ git status
    On branch master
    No commits yet
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            carmilla.txt
            frankenstein.txt
    nothing added to commit but untracked files present (use "git add" to track)

    This tells us git has found two files it could include in the repository, but we've not yet asked it to keep an eye on them.

    We ask git to include these files.

    $ git add --all
    $ git status
    On branch master
    No commits yet
    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
            new file:   carmilla.txt
            new file:   frankenstein.txt

    Now we commit these files to the repository, and git will forever remember them.

    $ git commit -m "First commit"
    [master (root-commit) 0a411f8] First commit
     2 files changed, 33 insertions(+)
     create mode 100644 carmilla.txt
     create mode 100644 frankenstein.txt
    $ git status
    On branch master
    nothing to commit, working tree clean
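
    Once the long status output feels familiar, git status --short gives a compact view of the same information. A small sketch in a throwaway repository (the identity values and the file content are placeholders):

```shell
# Throwaway repository with a placeholder identity for committing
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"
git config user.email "workshop@example.com"

echo "In Styria, we inhabit a castle, or schloss." > carmilla.txt

before=$(git status --short)     # ?? carmilla.txt   (untracked)
git add carmilla.txt
staged=$(git status --short)     # A  carmilla.txt   (new file, staged)
git commit -q -m "First commit"
after=$(git status --short)      # empty: working tree clean

echo "before add:   $before"
echo "after add:    $staged"
echo "after commit: $after"
```

    The two-character code at the start of each line says where the file stands: ?? for untracked, A for newly added.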

    We'll now add two more commits.

    1. In each file, add another paragraph
    2. Head the paragraph with the # Commit 2 heading (you'll need them later)
    3. Add and commit the changed files
    4. Add a third paragraph to each file, headed # Commit 3
    5. Add and commit the changes again

    Each file should end up looking something like this:

    # Frankenstein, by Mary Shelley
    # Commit 1
    You will rejoice to hear that no disaster has accompanied ...
    # Commit 2
    I am already far north of London, and as I walk in the streets ...
    # Commit 3
    I try in vain to be persuaded that the pole is the seat of ...

    git status should show "working tree clean". git log should show the three commits you have made, with an identifying code for each one (your codes will differ).

    $ git log --oneline
    7b2f7e5 Third commit
    707372d Second commit
    0a411f8 First commit

    Making, and fixing, mistakes

    In one of the files, change every 'e' to '!'. Save the file.

    This is a mistake.

    # Frank!nst!in, by Mary Sh!ll!y
    > T!xt from https://www.gut!nb!!s/84/84-0.txt
    # Commit 1
    ## L!tt!r 1
    _To Mrs. Savill!, !ngland._
    St. P!t!rsburgh, D!c. 11th, 17—.
    You will r!joic! to h!ar that no disast!r has accompani!d th!
    comm!nc!m!nt of an !nt!rpris! which you hav! r!gard!d with such !vil

    We can now use git to recover from this mistake. git status gives us a clue:

    $ git status
    On branch master
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
            modified:   frankenstein.txt
    no changes added to commit (use "git add" and/or "git commit -a")

    The restore command might do what we want. Try it!

    $ git restore frankenstein.txt
    $ git status
    On branch master
    nothing to commit, working tree clean

    Now look at the file. The changes have been reversed.
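
    One caution: git restore throws away the uncommitted changes for good, so it can be worth looking at what you are about to discard first. A sketch of that habit, in a throwaway repository with a one-line stand-in for the novel (the identity values are placeholders):

```shell
# Throwaway repository with a placeholder identity for committing
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"
git config user.email "workshop@example.com"

echo "You will rejoice to hear" > frankenstein.txt
git add frankenstein.txt
git commit -q -m "First commit"

# The mistake: change every 'e' to '!'
tr 'e' '!' < frankenstein.txt > mangled.tmp
mv mangled.tmp frankenstein.txt

# Look before you leap: which files changed, and how?
changed=$(git diff --name-only)
git diff

# Discard the changes in the working directory
git restore frankenstein.txt
restored=$(cat frankenstein.txt)
echo "$restored"    # prints: You will rejoice to hear
```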

    But what happened?

    Concept: three trees

    Git knows of three places where data sits: the working directory, the Index, and the commits (the current one is called HEAD).

    • your working directory is what's on your local machine, independent of git
    • the Index is what will go into the next commit
    • HEAD is the most recent commit Git knows about

    (The Index is separate so that, if you make many changes at once, you can bundle some changes into one commit and others into another commit. But it's also very confusing.)
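
    As an illustration of why a separate Index is useful, this sketch changes two files at once but commits them one at a time. It runs in a throwaway repository; the file contents, commit messages, and identity values are made up for the example.

```shell
# Throwaway repository with a placeholder identity for committing
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"
git config user.email "workshop@example.com"

echo "one" > carmilla.txt
echo "one" > frankenstein.txt
git add --all
git commit -q -m "First commit"

# Change both files in one sitting
echo "two" >> carmilla.txt
echo "two" >> frankenstein.txt

# Stage and commit only one of them
git add carmilla.txt
git commit -q -m "Add to Carmilla"
left=$(git status --short)       # " M frankenstein.txt": changed, not yet staged

# Then the other, as its own commit
git add frankenstein.txt
git commit -q -m "Add to Frankenstein"
count=$(git rev-list --count HEAD)
echo "$count"    # prints: 3
```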

    The add command puts files in the Index (also known as staging the files). The commit command creates a new commit from the Index (and updates HEAD). The restore command, by default, takes things from the index and puts them in the working directory.

    That's what happened above. The good version of the file was in the Index, and the bad version was in the working directory. git restore took the good version from the Index and replaced the bad version.

    • git restore file.txt copies the file in the Index to the working directory
    • git restore --staged file.txt copies the file in HEAD to the Index
    • git restore --staged --worktree --source=HEAD file.txt copies the file in HEAD into both the Index and working directory
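
    The three forms can be tried side by side. This is a sketch in a throwaway repository, where file.txt holds the single word "good" or "bad" so the effect of each command is easy to see (the identity values are placeholders):

```shell
# Throwaway repository with a placeholder identity for committing
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"
git config user.email "workshop@example.com"

echo "good" > file.txt
git add file.txt
git commit -q -m "Good version"

# 1. Bad change in the working directory only: Index -> working directory
echo "bad" > file.txt
git restore file.txt
one=$(cat file.txt)                   # good

# 2. Bad change already staged: HEAD -> Index
echo "bad" > file.txt
git add file.txt
git restore --staged file.txt
two=$(cat file.txt)                   # still bad: working directory untouched
two_status=$(git status --short)      # " M file.txt": changed, but no longer staged

# 3. Bad change staged again: HEAD -> Index and working directory
git add file.txt
git restore --staged --worktree --source=HEAD file.txt
three=$(cat file.txt)                 # good
three_status=$(git status --short)    # empty: everything matches HEAD
```

    Note that the second form only unstages the change; the bad text stays in the working directory until one of the other forms replaces it.
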

    A note on versions and commands. Before git version 2.23, the checkout command was used to restore files. Confusingly, checkout was also used to switch between branches, and the syntax for both uses was itself confusing. Version 2.23 split those two jobs into the new commands restore and switch (checkout still works). But many tutorials online were written using the old commands, so you may see lots of references to checkout being used to do what restore does now.
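
    If you meet the older spelling in a tutorial, this sketch shows two common checkout forms next to the restore commands they correspond to. It uses a throwaway repository and placeholder identity values; file.txt again holds just "good" or "bad".

```shell
# Throwaway repository with a placeholder identity for committing
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"
git config user.email "workshop@example.com"

echo "good" > file.txt
git add file.txt
git commit -q -m "Good version"

# Older spelling of: git restore file.txt
echo "bad" > file.txt
git checkout -- file.txt
old_one=$(cat file.txt)               # good

# Older spelling of: git restore --staged --worktree --source=HEAD file.txt
echo "bad" > file.txt
git add file.txt
git checkout HEAD -- file.txt
old_two=$(cat file.txt)               # good
old_status=$(git status --short)      # empty: Index and working directory match HEAD
```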

    Fixing bigger mistakes

    Let's make a mistake and commit it.

    1. $ git status
    2. Replace all the e with !
    3. $ git status
    4. $ git add frankenstein.txt
    5. $ git status
    6. $ git commit -m "No more vowels"
    7. $ git status

    After these steps, all three trees (the working directory, the Index, and HEAD) hold the mangled version of the file.

    What have we done? How do we get the good file back?

    Hint: HEAD~1 means "the parent of HEAD" (HEAD~ is shorthand for the same thing).

    That means we can get the good file from an earlier commit.

    $ git restore --source=HEAD~ frankenstein.txt

    will get the file back. We can now continue to work on it, and commit it, as we would any other file.
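
    Put together, the whole recovery might look like the sketch below. It runs in a throwaway repository with a one-line stand-in for the novel; the commit message "Restore the vowels" and the identity values are invented for the example.

```shell
# Throwaway repository with a placeholder identity for committing
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"
git config user.email "workshop@example.com"

echo "You will rejoice to hear" > frankenstein.txt
git add frankenstein.txt
git commit -q -m "First commit"

# Make the mistake and commit it
tr 'e' '!' < frankenstein.txt > mangled.tmp
mv mangled.tmp frankenstein.txt
git add frankenstein.txt
git commit -q -m "No more vowels"

# Copy the good version from the parent of HEAD into the working directory
git restore --source=HEAD~1 frankenstein.txt
status=$(git status --short)     # " M frankenstein.txt": differs from HEAD again

# Commit the repaired file like any other change
git add frankenstein.txt
git commit -q -m "Restore the vowels"
fixed=$(cat frankenstein.txt)
count=$(git rev-list --count HEAD)
git log --oneline
```

    The mistaken commit is still in the history, which now has three commits; the newest one simply puts the good text back.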

    End of the first part

    That's enough for the first post. It's covered several concepts, including:

    • repositories
    • commits
    • three trees
    • the commands add, commit, and restore

    Neil Smith
