Anirban

Northern Arizona University

Basic Git concepts explained

This post contains my explanations of some of the basic concepts revolving around Git.


Git

Git is a piece of software, or more generally speaking, a ‘system’ that records the changes we make to our files over time. Its main purpose is to keep track of the changes one makes to a set of files (the ones they want recorded), and to help in the many scenarios where this tracking is useful. For instance, if newly updated files contain undesired changes, one can revert to a point in time when everything worked fine.

The second purpose of git is to make collaboration easy, and to establish common ground where all its users (such as software developers) can refer to a codebase and the changes made to it at different points, and make changes to it one by one in a systematic and convenient way.
With git, many people can work on a project together by sharing and updating their own versions of it.
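To make the idea of recording changes and reverting concrete, here is a minimal Python sketch. The class ToyHistory and its methods are hypothetical names made up for illustration; it simply keeps a full copy of the files after each save, which is only a mental model and not how git actually stores data.

```python
from __future__ import annotations


class ToyHistory:
    """Hypothetical toy: keeps a full copy of the files at every save."""

    def __init__(self) -> None:
        self.snapshots: list[dict[str, str]] = []

    def record(self, files: dict[str, str]) -> int:
        """Save a copy of the current files and return the snapshot's index."""
        self.snapshots.append(dict(files))
        return len(self.snapshots) - 1

    def revert(self, index: int) -> dict[str, str]:
        """Return a copy of the files as they were at snapshot `index`."""
        return dict(self.snapshots[index])


history = ToyHistory()
files = {"app.py": "print('works')\n"}
good = history.record(files)           # a point in time where everything worked

files["app.py"] = "print('broken'\n"   # an undesired change (missing parenthesis)
history.record(files)

files = history.revert(good)           # go back to the version that worked
print(files["app.py"], end="")         # print('works')
```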

Repository

A ‘repository’ in general means a place to store stuff (which can be anything: resources, objects, etc.). As a term in computer science, it generally means a folder/directory in which files are kept (i.e. a file storage location). A ‘git repository’ is likewise a regular folder for storing files (you can think of it as a container), except that git has been initialized in it (in a separate subdirectory, .git/), which lets git keep track of the changes made to those files over time.
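As a small illustration of the .git/ subdirectory being what sets a git repository apart from a regular folder, here is a simplified Python check. The function is_git_repository is a hypothetical helper; the mkdir call below merely stands in for git init (which fills .git/ with much more), and real git also handles cases this check ignores.

```python
import tempfile
from pathlib import Path


def is_git_repository(folder: str) -> bool:
    """Simplified check: treat `folder` as a git repository if it contains a
    .git/ subdirectory. Real git also handles cases ignored here, e.g. a .git
    *file* (used by linked worktrees and submodules) or running from a subfolder."""
    return (Path(folder) / ".git").is_dir()


with tempfile.TemporaryDirectory() as folder:
    print(is_git_repository(folder))   # False: just a regular folder
    (Path(folder) / ".git").mkdir()    # stand-in for `git init`, which creates much more
    print(is_git_repository(folder))   # True
```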

Remote

Remote in general refers to something distant or far away, and its meaning in the context of git shares the same sense: a remote is a version of the repository hosted somewhere else, for example on GitHub’s servers, distant and separate from the local repository that resides on our computer. It is an online version of our repository and can receive changes that our local copy does not have yet. These can then be brought into the local repository with the git pull command; conversely, local changes can be uploaded to the remote with the git push command, to keep the online version updated.
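The push/pull relationship between a local repository and its remote can be sketched in Python under a strong simplifying assumption: each history is a plain list of commit IDs, and syncing only succeeds when one side is simply ahead of the other (a ‘fast-forward’). The names can_fast_forward and sync are hypothetical. Real git push likewise rejects non-fast-forward updates by default, whereas git pull would fetch and then merge (or rebase) when histories have diverged; this sketch just refuses.

```python
from __future__ import annotations


def can_fast_forward(behind: list[str], ahead: list[str]) -> bool:
    """True if `behind` is a prefix of `ahead`, so catching up loses nothing."""
    return ahead[: len(behind)] == behind


def sync(source: list[str], target: list[str]) -> list[str]:
    """Return the target's new history after receiving the source's commits."""
    if not can_fast_forward(target, source):
        raise ValueError("histories diverged: a merge would be needed first")
    return list(source)


local = ["a1", "b2"]
remote = ["a1"]

remote = sync(local, remote)   # like `git push`: the remote catches up to local
remote.append("c3")            # a collaborator pushes a new commit
local = sync(remote, local)    # like `git pull`: local catches up to the remote
print(local)                   # ['a1', 'b2', 'c3']
```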

Branching

‘Branching’ in general refers to the act of separating out into different pathways from a single one. The same concept applies in git: one initially has a single stream of development, or ‘branch’ (for the first branching, this is the default branch, nowadays usually named main and previously commonly named master), and at any point the user chooses, a new branch can be started from it as a separate line of development. Conceptually the new branch is a copy of the original, although internally git only creates a lightweight pointer to a commit. (Likewise, a created branch can itself be branched further, allowing parallel workflows.) This is crucial if one would like to keep a clean working copy of the main branch (bug-free, or at least free of bugs that new updates might introduce), with major changes (such as new features) isolated in separate branches.
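The following Python sketch models the point that a git branch is, internally, a lightweight movable pointer to a commit rather than a full copy of the files. The Commit class and the branches, commit and log names are hypothetical, and each commit has a single parent to keep things short.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    commit_id: str
    message: str
    parent: Optional[Commit]   # a single parent is enough for this sketch


# A branch is just a name pointing at a commit; creating one copies no files.
root = Commit("a1", "initial commit", None)
branches = {"main": root}


def commit(branch: str, commit_id: str, message: str) -> None:
    """Add a commit on top of `branch` and move the branch pointer forward."""
    branches[branch] = Commit(commit_id, message, branches[branch])


def log(branch: str) -> list[str]:
    """Walk parent links from the branch tip back to the first commit."""
    ids: list[str] = []
    current = branches[branch]
    while current is not None:
        ids.append(current.commit_id)
        current = current.parent
    return ids


branches["feature"] = branches["main"]   # like `git branch feature`: a new pointer
commit("feature", "b2", "add new feature")
commit("main", "c3", "fix typo")

print(log("main"))      # ['c3', 'a1']
print(log("feature"))   # ['b2', 'a1'] (shares a1, then diverges)
```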

I find branching to be an interesting phenomenon that can be applied literally everywhere, from decision making to fictional time travel theories (here’s something that I wrote along those lines!), with ample examples that everyone can understand and relate to within their own domain.

Merging

‘Merging’ in general refers to the act of combining two different pathways into one. In git, this can be thought of as taking independent lines of development created by branching (whose histories diverged at some point) and integrating them into a single branch, i.e. the current branch, while the other branch is left unaffected.
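The point at which two branches’ histories diverged (their common ancestor, called the merge base in git) can be found with a small graph walk. This Python sketch uses a hypothetical dict of parent links and hypothetical helper names; it always records a two-parent merge commit and moves only the current branch, whereas real git fast-forwards instead when the current branch is an ancestor of the other one.

```python
from __future__ import annotations

# Hypothetical commit graph: each commit ID maps to the IDs of its parents.
parents: dict[str, list[str]] = {
    "a1": [],
    "b2": ["a1"],   # main and feature diverge after b2
    "c3": ["b2"],   # a commit on main
    "d4": ["b2"],   # a commit on feature
    "e5": ["d4"],   # another commit on feature
}
branches = {"main": "c3", "feature": "e5"}   # branch name -> tip commit


def ancestors(commit: str) -> set[str]:
    """All commits reachable from `commit` via parent links (itself included)."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_base(a: str, b: str) -> str:
    """Newest commit shared by both histories, i.e. where they diverged.
    Assumes a single best candidate, which holds for simple graphs like this."""
    common = ancestors(a) & ancestors(b)
    # keep the common commit that is not an ancestor of any other common commit
    best = [c for c in common if not any(c in ancestors(o) - {o} for o in common)]
    return best[0]


def merge(current: str, other: str, new_id: str) -> None:
    """Record a two-parent merge commit; only the `current` branch moves."""
    parents[new_id] = [branches[current], branches[other]]
    branches[current] = new_id


print(merge_base(branches["main"], branches["feature"]))   # b2
merge("main", "feature", "f6")
print(branches)   # {'main': 'f6', 'feature': 'e5'}
```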

Merge conflicts

It’s not always expected for an integration process to go smoothly; in the real world, for example, opinions may clash. Likewise in git, when the same lines of the same file are changed in different ways in two separate branches, git cannot merge them without your help, since it has no way of knowing which version is correct. It therefore stops with a merge conflict, which one needs to resolve by choosing one of the versions (or combining them) and committing the result. To avoid this, it is important for contributors to keep track of overlapping updates, pull and push often, or work separately, with different code segments/files assigned to each contributor.
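A merge conflict can be illustrated with a toy line-by-line three-way merge in Python: each line of the merge base is compared with ‘our’ and ‘their’ versions, and only a line changed differently on both sides becomes a conflict. This is a simplified sketch with a hypothetical merge_lines function that assumes no lines are inserted or deleted; real git merges hunks of a proper diff (so, for example, edits to adjacent lines can also conflict) and labels its markers with names such as HEAD and the other branch.

```python
from __future__ import annotations


def merge_lines(base: list[str], ours: list[str], theirs: list[str]) -> tuple[list[str], bool]:
    """Toy three-way merge. Simplification: all three versions must have the
    same number of lines (only edits to existing lines are supported)."""
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles edits to existing lines")
    result: list[str] = []
    conflicted = False
    for b, o, t in zip(base, ours, theirs):
        if o == t:       # both sides agree (or neither changed the line)
            result.append(o)
        elif o == b:     # only "theirs" changed this line
            result.append(t)
        elif t == b:     # only "ours" changed this line
            result.append(o)
        else:            # both changed the same line differently: conflict
            conflicted = True
            result += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return result, conflicted


base = ["title = 'Git'", "color = 'blue'"]
ours = ["title = 'Git basics'", "color = 'blue'"]
theirs = ["title = 'Git'", "color = 'green'"]
merged, conflict = merge_lines(base, ours, theirs)
print(conflict)   # False: different lines were changed, so both edits survive
print(merged)     # ["title = 'Git basics'", "color = 'green'"]

theirs = ["title = 'Intro to Git'", "color = 'blue'"]
merged, conflict = merge_lines(base, ours, theirs)
print(conflict)   # True: the same line was changed in two different ways
```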

The three git states

A file (or a set of files) can be in one of three states in the world of git:

  • Modified:
    As the name suggests, files in this state are recognized by git as modified, i.e. they contain changes with respect to the last git snapshot (a commit: a saved record of the files at one point in time that git stores locally; think of it like a save point in a game!). However, the changes have not been staged yet, so they are not yet marked to go into the next commit (this pertains to the next two states, in order).
  • Staged:
    Being ‘staged’ means that git not only notices the differences in the files, but the changes have also been recorded in a staging area, alternatively referred to as the ‘index’ (the terms are interchangeable). One needs to explicitly mark modified files as ready (usually in anticipation of including those changes in the next commit snapshot) with the git add command, which literally adds the current contents of the files to the index. To reiterate in plain words, the index is an area where changes intended for the next commit are kept for the moment; they can later either be unstaged (leading back to the modified state) or committed (leading to the next state, with the changes committed, i.e. saved).
  • Committed:
    This is the final state, reached right after staging (which in most cases is basically a checkpoint before saving the changes locally). All the staged changes are saved and safely stored in a local database maintained by git (from where they can be pushed to a remote, e.g. on GitHub), under a distinct commit ID given by a unique SHA-1 hash. This ID labels the commit in the commit history, which holds the snapshots of the committed files from previous commits.
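The three states can be modelled with a toy Python class in which the working tree, the index and the last commit are plain dictionaries. The class ToyRepo and its methods are hypothetical; blob_id, however, follows git’s real (SHA-1) scheme for hashing file contents, so it should match what git hash-object prints for the same bytes. The commit ID computed here is made up for illustration: real commit IDs hash a commit object that records the tree, parent commit(s), author, committer and message.

```python
from __future__ import annotations

import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 of `content` behind git's blob header ("blob <size>" plus a NUL byte)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    """Hypothetical model of the three states: working tree -> index -> commits."""

    def __init__(self) -> None:
        self.working: dict[str, bytes] = {}   # files as they are on disk
        self.index: dict[str, str] = {}       # path -> blob ID (staged)
        self.head: dict[str, str] = {}        # path -> blob ID (last commit)
        self.commits: list[str] = []          # commit IDs, oldest first

    def add(self, path: str) -> None:         # like `git add <path>`
        self.index[path] = blob_id(self.working[path])

    def commit(self, message: str) -> str:    # like `git commit -m <message>`
        self.head = dict(self.index)
        # Hypothetical commit ID: NOT git's real commit-object format.
        data = repr(sorted(self.head.items())) + message + "".join(self.commits[-1:])
        commit_id = hashlib.sha1(data.encode()).hexdigest()
        self.commits.append(commit_id)
        return commit_id

    def state(self, path: str) -> str:
        """Report one of the three states. Simplification: a brand-new file also
        reports "modified" here, whereas git calls it untracked."""
        current = blob_id(self.working[path])
        if self.index.get(path) != current:
            return "modified"
        if self.head.get(path) != current:
            return "staged"
        return "committed"


repo = ToyRepo()
repo.working["notes.txt"] = b"hello world\n"
print(repo.state("notes.txt"))    # modified
repo.add("notes.txt")
print(repo.state("notes.txt"))    # staged
repo.commit("add notes")
print(repo.state("notes.txt"))    # committed
print(blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```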