What Is a Git Branch

To really understand how Git does branching, we need to take a step back and examine how Git stores the data. As you may recall from Chapter 1, Git stores its data not as a series of changes or differences, but as a series of snapshots.

When you commit in Git, Git stores a commit object. It contains a pointer to the snapshot of the content you staged, the author and other commit metadata such as the commit message, and pointers to the direct parent or parents of the commit. An initial commit has no parents, a normal commit has exactly one parent, and a merge commit, which results from merging two or more branches, has one parent for each branch that was merged.

To make this clear, let's assume you have a directory with three files, all of which you add to the staging area and then commit. When you stage the files, Git computes a checksum for each file (the SHA-1 hash that we mentioned in Chapter 1), stores that version of the file in the Git repository (Git refers to these stored versions as blobs), and adds the checksum to the staging area.
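The staging step can be tried out in a throwaway repository. This is only a sketch: the file names (README, test.rb, LICENSE) and their contents are placeholders chosen for illustration.

```shell
# Throwaway repository in a temporary directory.
repo="$(mktemp -d)" && cd "$repo" && git init -q

# Three placeholder files with distinct contents.
printf 'readme\n'    > README
printf 'puts "hi"\n' > test.rb
printf 'license\n'   > LICENSE

# Staging writes one blob per file into the object database
# and records each blob's checksum in the index (staging area).
git add README test.rb LICENSE

# List the index: mode, blob checksum, stage number, path.
git ls-files --stage

# The checksum recorded for README matches what Git computes for its content.
staged="$(git ls-files --stage README | cut -d' ' -f2)"
hashed="$(git hash-object README)"
echo "$staged"
echo "$hashed"
```

The two checksums printed at the end should be identical, and git cat-file -t on either reports the object type blob.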

When you create the commit with the git commit command, Git computes a checksum for each directory in the project and saves it as a so-called tree object in the Git repository. Git then creates a commit object that contains the metadata and a pointer to the tree object of the root directory, so that the snapshot can be re-created when needed.

Your Git repository now contains five objects: one blob for the content of each of the three files, one tree that lists the contents of the directory and specifies which file name belongs to which blob, and one commit that holds the pointer to that root tree and the commit metadata. In principle, your data in the Git repository looks like Figure 3-1.

Figure 3-1. Repository data of a single commit.
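The five objects can be inspected directly. The sketch below repeats the setup with placeholder file names and an example identity (both are assumptions for illustration), then asks Git to print the commit and its root tree.

```shell
# Throwaway repository with an example identity for committing.
repo="$(mktemp -d)" && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'readme\n'    > README
printf 'puts "hi"\n' > test.rb
printf 'license\n'   > LICENSE
git add README test.rb LICENSE
git commit -q -m "Initial commit"

# The commit object: a tree pointer, author and committer, and the message.
git cat-file -p HEAD

# The root tree: one entry per file, mapping each name to its blob.
git cat-file -p 'HEAD^{tree}'

# Everything reachable from the commit: 1 commit + 1 tree + 3 blobs.
object_count="$(git rev-list --objects HEAD | wc -l | tr -d ' ')"
echo "$object_count"
```

The count is five only because the three files have different contents. Files with identical content would share a single blob.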

If you make some changes and commit again, the new commit stores a pointer to the commit that came immediately before it. After two more commits, the history could look like Figure 3-2.

Figure 3-2. Git object data for multiple commits.
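The parent pointers can be made visible with a few commits in a scratch repository. In this sketch the file name notes.txt and the commit messages are made up for the example.

```shell
repo="$(mktemp -d)" && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits in a row, each appending a line to the same file.
for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done

# Each line shows a commit's abbreviated checksum followed by its parent's.
# The root commit has no parent, so nothing follows the label on its line.
git log --format='%h <- parent: %p'

# The parent recorded inside the newest commit object is the previous commit.
first_parent="$(git rev-parse 'HEAD^')"
recorded="$(git cat-file -p HEAD | sed -n 's/^parent //p')"
root_parents="$(git cat-file -p 'HEAD~2' | grep -c '^parent ' || true)"
```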

A branch in Git is nothing more than a lightweight, movable pointer to one of these commits. The default branch name in Git is master (an installation can be configured to use a different name, such as main). As you start making commits, you get a master branch that points to the last commit you made. Every time you commit, it moves forward automatically.

Figure 3-3. Branch that points to a commit in the history.

What happens when you create a new branch? Well, first of all, a new pointer is created. Let's say you create a new branch with the name testing. You do that with the command git branch testing.

This creates a new pointer that points to the same commit you're working on (see Figure 3-4).

Figure 3-4. Several branches pointing into the commit history.
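That a new branch is just a second pointer to the same commit can be checked as follows. The sketch assumes the branch names master and testing and pins the initial branch name explicitly, because the default may differ between installations.

```shell
repo="$(mktemp -d)" && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name for this demo
git config user.name "Example User"
git config user.email "user@example.com"
echo "first version" > test.rb
git add test.rb && git commit -q -m "Initial commit"

# Create the new pointer; no files are copied and no commit is made.
git branch testing

# Both names resolve to the same commit checksum.
git rev-parse master testing

# List branches with the commit each points to; '*' marks the current one.
git branch -v
```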

How does Git know which branch you're currently on? It keeps a special pointer called HEAD for this. Note that this is fundamentally different from the concept of HEAD in other VCSs, such as Subversion or CVS. In Git, HEAD is a pointer to the local branch you're currently on. In this case, you are still on the master branch. The git branch command only created a new branch; it did not switch to it (see Figure 3-5).

Figure 3-5. The HEAD pointer points to your current branch.

To switch to another branch, use the git checkout command. Now let's switch to our new branch with git checkout testing.

This moves HEAD so that it now points to the testing branch (see Figure 3-6).

Figure 3-6. If you change branches, HEAD points to a new branch.
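One way to watch HEAD move is to read it as a symbolic reference before and after the switch. As before, this is a sketch in a scratch repository with assumed branch names master and testing.

```shell
repo="$(mktemp -d)" && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name for this demo
git config user.name "Example User"
git config user.email "user@example.com"
echo "first version" > test.rb
git add test.rb && git commit -q -m "Initial commit"
git branch testing

# Creating the branch did not move HEAD: it still names master.
before="$(git symbolic-ref HEAD)"

# Checking out the branch repoints HEAD at it.
git checkout -q testing
after="$(git symbolic-ref HEAD)"

echo "$before -> $after"   # prints: refs/heads/master -> refs/heads/testing
```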

What is the significance of that? Well, let's make one more commit while on the testing branch.

Figure 3-7 illustrates the result.

Figure 3-7. The HEAD pointer advances with each subsequent commit.
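The effect of committing on the new branch can be reproduced like this. The file name, its contents, and the commit messages are illustrative assumptions.

```shell
repo="$(mktemp -d)" && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name for this demo
git config user.name "Example User"
git config user.email "user@example.com"
echo "first version" > test.rb
git add test.rb && git commit -q -m "Initial commit"
git branch testing
git checkout -q testing

# Commit a change while HEAD points to testing.
echo "a change" >> test.rb
git commit -q -a -m "Made a change"

# testing now points at the new commit; master still points at its parent.
git log --oneline --decorate --all
```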

This is interesting because your testing branch has now moved forward, while your master branch still points to the commit you were on when you switched branches with git checkout. Let's switch back to the master branch with git checkout master.

Figure 3-8 shows the result.

Figure 3-8. HEAD points to another branch after a checkout.

That command did two things. It moved the HEAD pointer back to the master branch, and it reverted the files in your working directory to the snapshot that master points to. This also means that the changes you make from this point forward will diverge from an older version of the project. In short, the work you did in the testing branch is temporarily rewound, so you can take development in a different direction.
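The second effect, the working directory being rewritten, is easy to observe by printing a file before and after switching. The file contents here are placeholders chosen so the two versions are easy to tell apart.

```shell
repo="$(mktemp -d)" && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name for this demo
git config user.name "Example User"
git config user.email "user@example.com"
echo "version on master" > test.rb
git add test.rb && git commit -q -m "Initial commit"

git branch testing
git checkout -q testing
echo "version on testing" > test.rb
git commit -q -a -m "Made a change"

cat test.rb   # the content committed on testing

# Switching back moves HEAD and rewrites the working tree to master's snapshot.
git checkout -q master
cat test.rb   # the content committed on master
```

Nothing is lost by switching: the commit made on testing is still in the repository and reappears in the working directory after git checkout testing.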

Let's make a few changes on master and commit them as well.

Now your project history has diverged (see Figure 3-9). You created a branch and switched to it, did some work on it, and then switched back to your main branch and did something completely different there. Both sets of changes are isolated from each other in separate branches. You can switch back and forth between the branches at will and merge them when you think the time has come. And you did all of this with simple git branch and git checkout commands.

Figure 3-9. The history diverges.
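A diverged history like this can be built and drawn in the terminal. In this sketch the branch names, file contents, and commit messages are assumptions for illustration.

```shell
repo="$(mktemp -d)" && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name for this demo
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > test.rb
git add test.rb && git commit -q -m "Initial commit"

# One commit on testing ...
git branch testing
git checkout -q testing
echo "work on testing" >> test.rb
git commit -q -a -m "Change on testing"

# ... and a different commit on master.
git checkout -q master
echo "work on master" >> test.rb
git commit -q -a -m "Change on master"

# Draw both lines of development as a text graph.
git log --oneline --decorate --graph --all

# The commit where the two branches split is their common ancestor.
base="$(git merge-base master testing)"
git log -1 --format='%h %s' "$base"
```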

Branches are cheap to create and remove in Git because each one is just a small file that contains the 40-character SHA-1 checksum of the commit it points to. Creating a new branch takes just as much effort as writing a 41-byte file (40 characters and a line break).
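The branch file itself can be examined in a fresh repository. This sketch assumes Git's default setup, in which new branches are stored as individual files under .git/refs/heads and objects use SHA-1. Repositories whose refs have been packed or that use a different storage or hash format will look different.

```shell
repo="$(mktemp -d)" && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name for this demo
git config user.name "Example User"
git config user.email "user@example.com"
echo "first version" > test.rb
git add test.rb && git commit -q -m "Initial commit"
git branch testing

# Assumption: the branch is stored as a loose file at this path.
ref_file=".git/refs/heads/testing"

# The file holds nothing but the checksum of the commit the branch points to.
cat "$ref_file"

# Its size: 40 hexadecimal characters plus a line break.
bytes="$(wc -c < "$ref_file" | tr -d ' ')"
echo "$bytes"
```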

This is in stark contrast to the way most other VCS tools handle branching. These often copy all of the project's files into a second directory for each new branch, which can take several minutes depending on the size of the project, whereas in Git the operation is effectively instantaneous. Because each commit also records its parents, Git can find a suitable common base for a merge automatically, which generally makes merging easy. These properties are meant to encourage developers to create and use branches often.

Let's see why you should do this.