
How to Search Files in a Git Repository

If you have questions about a Git repository, such as what a particular file contained at an earlier time, what has changed since then, and who is responsible for the change, this blog post is for you.

 

Numerous commands, including git show, git diff, and git blame, can help answer these and other questions.

 

Viewing Old Versions of a File (git show)

The git show <revision>:<file> command outputs the <file> file in the state it was in when the <revision> commit was current. So, if you’ve tagged version 2.0 of your program with the v2.0 tag and you want to know what the index.php file looked like back then, you would run the following command:

 

git show v2.0:index.php

 

Of course, you can also redirect the output to another file so that you have both versions (the current one and the old one) available in parallel with the following command:

 

git show v2.0:index.php > old_index.php
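
The following is a minimal sketch of that workflow in a throwaway repository, so it can be tried without touching a real project. The file contents, commit messages, and user identity are made up for illustration; only the v2.0 tag and the index.php filename come from the text above.

```shell
set -e
# Throwaway repository (assumption: any empty temporary directory will do).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Commit a first version of index.php and tag it as v2.0.
echo '<?php echo "old";' > index.php
git add index.php
git commit -q -m "Release 2.0"
git tag v2.0

# Change the file after the release.
echo '<?php echo "new";' > index.php
git commit -q -am "Rework index.php"

# Write the tagged version next to the current one.
git show v2.0:index.php > old_index.php
old_content=$(cat old_index.php)
new_content=$(cat index.php)
echo "old: $old_content"
echo "new: $new_content"
```

Note that old_index.php is an ordinary, untracked file afterward; you would normally delete it again once the comparison is done.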

 

Viewing Differences between Files (git diff)

To determine what has changed between the current version and an old version of a file, you should use git diff. Let’s consider how the index.php file has changed since version 2.0. The output consists of several blocks, which are introduced with @@ and indicate the position. For orientation, a few lines of code help set the context. This information is followed by the changed lines, preceded by - or + depending on whether the line was deleted or added.

 

git diff v2.0 index.php

 

   diff --git a/index.php b/index.php

   index a41783c..d1e3af2 100644

   --- a/index.php

   +++ b/index.php

   @@ -10,9 +10,9 @@ try {

      exit();

   }

   -try {

   - $ctl->checkAccess();

   -} catch (Exception $e) {

   +if ($ctl->checkAccess() === TRUE) {

   + $ctl->showRequestedPage();

   +} else {

      if ($ctl->isJSONRequest()) {

         $data = new stdClass();

         $data->error = true;

   @@ -29,4 +29,3 @@ try {

         exit();

      }

   }

   -$ctl->showRequestedPage();

 

If you’re only interested in the scope of the changes, you can additionally pass the --compact-summary option:

 

git diff --compact-summary v2.0 index.php

   index.php | 7 +++----

   1 file changed, 3 insertions(+), 4 deletions(-)

 

The git diff <revision1>..<revision2> <file> command shows the changes between two old versions:

 

git diff --compact-summary v1.0..v2.0 index.php
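
To see both forms side by side, the sketch below builds a small throwaway repository with the tags v1.0 and v2.0 plus one later commit. The file contents and the user identity are assumptions chosen for illustration, and --compact-summary requires a reasonably recent Git version.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Version 1.0: three lines.
printf 'a\nb\nc\n' > index.php
git add index.php
git commit -q -m "Version 1.0"
git tag v1.0

# Version 2.0: one line changed, one line added.
printf 'a\nB\nc\nd\n' > index.php
git commit -q -am "Version 2.0"
git tag v2.0

# Current state: one more line added after the release.
printf 'a\nB\nc\nd\ne\n' > index.php
git commit -q -am "Work after 2.0"

# Changes between the two old versions.
between_tags=$(git diff --compact-summary v1.0..v2.0 -- index.php)
# Changes between version 2.0 and the current working tree.
since_v2=$(git diff --compact-summary v2.0 -- index.php)

echo "$between_tags"
echo "$since_v2"
```

The -- before the filename is optional here, but it keeps Git from confusing a path with a revision name.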

 

Of course, instead of tags, you can pass commit hashes, branch names, or other references to git diff. Note that the rather convenient HEAD@{2.weeks.ago} notation relies on the reflog, so it only reflects actions performed in your local repository (where HEAD pointed two weeks ago), not commit dates. Apart from this notation, git diff has no option for selecting a comparison commit by date. You may need to first use git log to find a commit from the desired time and then pass its hash to git diff.
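
One possible way to look up such a commit in a script is git rev-list with --before, which is a plumbing relative of git log. The sketch below is only an illustration under stated assumptions: the commit dates are set artificially through GIT_AUTHOR_DATE and GIT_COMMITTER_DATE, and the cutoff date is made up.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# First commit, dated January 1, 2024 (artificial date for the example).
echo 'old' > index.php
git add index.php
GIT_AUTHOR_DATE="2024-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2024-01-01 12:00:00 +0000" \
  git commit -q -m "Old state"
old_commit=$(git rev-parse HEAD)

# Second commit, dated February 1, 2024.
echo 'new' > index.php
GIT_AUTHOR_DATE="2024-02-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2024-02-01 12:00:00 +0000" \
  git commit -q -am "New state"

# Find the newest commit that is older than the cutoff date ...
base=$(git rev-list -1 --before="2024-01-15 00:00:00 +0000" HEAD)
# ... and compare the current file against it.
git diff --compact-summary "$base" -- index.php
```

Unlike HEAD@{...}, this lookup uses the committer dates stored in the commits, so it also works for commits that were fetched from other repositories.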

 

Range Syntax with Three Periods: The git diff <rev1>...<rev2> variant is especially useful when the revisions are branches. In this case, git diff first determines the last common base of both branches and then shows what has changed in <rev2> compared to the last common commit. Unlike <rev1>..<rev2>, however, all changes that have happened in <rev1> since then are ignored.
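
The difference between the two range notations can be tried out with the sketch below. The branch names main and feature and the filenames are assumptions for this example. The last command spells out the three-period form by hand with git merge-base.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Common base of both branches.
echo 'base' > shared.txt
git add shared.txt
git commit -q -m "Common base"

# One commit that exists only on the feature branch.
git checkout -q -b feature
echo 'feature' > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# One commit that exists only on main.
git checkout -q main
echo 'main' > main.txt
git add main.txt
git commit -q -m "Main work"

# Three periods: only what changed on feature since the common base.
three_dot=$(git diff --name-only main...feature)
# Two periods: every difference between the two branch tips.
two_dot=$(git diff --name-only main..feature)
# The three-period form written out with an explicit merge base.
via_base=$(git diff --name-only "$(git merge-base main feature)" feature)

echo "three periods: $three_dot"
echo "two periods:   $two_dot"
```

In this setup, the three-period variant lists only feature.txt, while the two-period variant also lists main.txt, because that file differs between the two branch tips.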

 

Viewing Differences between Commits

If you choose not to specify a file in git diff, Git will show you all files changed since the specified version or between two versions/commits. Again, the --compact-summary option is useful if you just want an overview for the time being.

 

In case of extensive changes, not enough space is available to output a + or a - for each changed line. Instead, the total number of changed lines is specified after |. The number of plus signs and minus signs is relative to the file with the largest number of changes. The longer the bar of characters, the more extensive the changes.

 

git diff --compact-summary v1.0..v2.0

 

   css/autocompleteList.css                | 225 +-

   css/editproject.css (new)               | 13 +

   css/edituser.css                        | 99 +-

   css/iprot.css                           | 648 ++++-

   css/iprot/jquery-ui-1.8.13.custom.css   | 2 +-

   css/mobile.css (new)                    | 17 +

   ...

   269 files changed, 22819 insertions(+), 12792 deletions(-)

 

Only rarely are you interested in all the changes. Two options help you narrow down the result:

  • You can use -G <pattern> to specify a search pattern (a regular expression). git diff will then return only the text files whose changes contain the search expression, with exact case matching.
  • --diff-filter=A|C|D|M|R restricts the result to files that have been added (A), copied (C), deleted (D), modified (M), or renamed (R), respectively.

For example, the following command returns the files that have been modified between version 1.0 and 2.0 and whose code contains the search text PDF:

 

git diff -G PDF --diff-filter=M --compact-summary v1.0..v2.0
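
The effect of combining the two options can be checked with a small experiment. In the sketch below, the filenames a.php, b.php, and c.php and their contents are invented for illustration; --name-only is used instead of --compact-summary so that the result is easy to compare.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Version 1.0 with two files.
echo 'echo "report";' > a.php
echo 'echo "list";' > b.php
git add a.php b.php
git commit -q -m "Version 1.0"
git tag v1.0

# Version 2.0: a.php gains a line mentioning PDF, b.php changes
# without mentioning PDF, and c.php is a new file mentioning PDF.
echo 'createPDF();' >> a.php
echo 'echo "more";' >> b.php
echo 'exportPDF();' > c.php
git add a.php b.php c.php
git commit -q -m "Version 2.0"
git tag v2.0

# Only modified files whose changed lines match the pattern.
modified_with_pdf=$(git diff -G PDF --diff-filter=M --name-only v1.0..v2.0)
# Without the filter, the newly added file matches as well.
all_with_pdf=$(git diff -G PDF --name-only v1.0..v2.0)

echo "modified: $modified_with_pdf"
echo "all:      $all_with_pdf"
```

Here, b.php is dropped by -G because none of its changed lines contain PDF, and c.php is dropped by --diff-filter=M because it was added rather than modified.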

 

Changes since the Last Commit: Before running git commit, it’s a good idea to review the changes in all files staged for the commit, which is exactly what git diff --staged does. If you haven’t run git add yet or plan to use git commit -a, git diff without any additional parameters displays all changes that aren’t staged yet. (This output doesn’t include new files that aren’t yet under version control.)
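
The three cases (staged, changed but not staged, and untracked) can be told apart with the sketch below. The filenames a.txt, b.txt, and c.txt are assumptions for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Two files under version control.
echo 'one' > a.txt
echo 'two' > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

# Change both files, but stage only a.txt.
echo 'one changed' > a.txt
echo 'two changed' > b.txt
git add a.txt

# A new file that is not under version control yet.
echo 'three' > c.txt

# Changes that the next "git commit" would record.
staged=$(git diff --staged --name-only)
# Changes in tracked files that are not staged yet.
unstaged=$(git diff --name-only)

echo "staged:   $staged"
echo "unstaged: $unstaged"
```

The untracked c.txt appears in neither list; git status would be the command to reveal it.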

 

Searching Files (git grep)

In the numerous files of your huge project, at what points is function X called or an object of class Y created? The answer to such questions is provided by git grep <pattern>. By default, this command considers all tracked files in the project directory and lists the lines where the search expression occurs in exact case. (If you don’t want to differentiate between uppercase and lowercase, add the -i option.)

 

git grep SKAction

   ios-pacman/Maze.swift: let setGlitter = SKAction.setTextur...

   ios-pacman/Maze.swift: let setStandard = SKAction.setText...

   ios-pacman/Maze.swift: let waitShort = SKAction.wait(forDu...

   ...

 

You can get a more compact search result by using --count. In this case, git grep only shows how many lines in each file contain the search expression:


git grep --count CGSize

   ios-pacman/CGOperators.swift:6

   ios-pacman/Global.swift:1

   ios-pacman/Maze.swift:4

   ...

 

You can limit the search by specifying files or directories. The following command searches the files in the css directory for the keyword margin. Because of the -n option, the line number is also given for each location:

 

git grep -n margin css/

   css/config.json:100: "@form-group-margin-bottom": "15px",

   css/config.json:144: "@navbar-margin-bottom": "@line-heig...

   css/editglobal.css:25: margin-top: 1px;

   css/editglobal.css:29: margin-top: 0px;

   ...

 

Of course, you can also search old versions of your code by specifying the desired revision before the filenames or directories. If the search expression contains special characters or spaces, you must enclose it in single quotes. For instance, the following example looks for UPDATE commands in version 2.0 of the program that modify the person table:

 

git grep 'UPDATE person' v2.0

   v2.0:lib/delete.php:             $sql = "UPDATE person SET sta...

   v2.0:lib/person.php:             $sql = sprintf("UPDATE person...

   v2.0:lib/personengruppe.php:     $sql = sprintf("UPDATE person...

   ...
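
The sketch below shows the same kind of search in a throwaway repository in which the statement exists only in the tagged version and has since been removed. The file lib/person.php and its contents are invented for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Version 2.0 contains an UPDATE statement for the person table.
mkdir lib
printf '%s\n' '<?php' '$sql = "UPDATE person SET active = 0";' > lib/person.php
git add lib/person.php
git commit -q -m "Version 2.0"
git tag v2.0

# The statement is removed again in a later commit.
printf '%s\n' '<?php' '// statement removed' > lib/person.php
git commit -q -am "Remove UPDATE statement"

# Search the tagged version; -n adds the line number to each match.
in_v2=$(git grep -n 'UPDATE person' v2.0)
# Search the current files; git grep exits with status 1 when nothing
# matches, so "|| true" keeps the script running under "set -e".
in_current=$(git grep -n 'UPDATE person' || true)

echo "v2.0:    $in_v2"
echo "current: $in_current"
```

When a revision is given, each match is prefixed with that revision, which makes it easy to tell results from several versions apart.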

 

git grep becomes difficult to use when you don’t know which commit to look in or when you’re dealing with changes that were made only temporarily and later removed from the codebase. In these cases, use git rev-list v1.0..v2.0 to create a list of the hashes of all commits in the range in question. You can then pass this list to git grep.

 

For example, you can use git grep to count how many lines contain the SQL keyword UPDATE in various versions of the user.php file. As with git log, the latest commit is considered first. The -- characters separate the list of hashes generated by git rev-list from the filename:

 

git grep -c 'UPDATE' $(git rev-list v1.0..v2.0) -- user.php

   262d67fed686cda939092e7b0cb337bbc1e2dbe9:user.php:5

   96d0a06d389784ec93f252a097185ee3678a2c1c:user.php:5

   c07c2f0ce5682bea898ba3a65a15bf5230dd23dc:user.php:4

   ...
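
A reduced version of this search can be reproduced with the sketch below. The contents of user.php are invented, and cut is used only to strip the hash and filename from each result line so that the counts are easier to read.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Version 1.0: one line containing UPDATE.
echo 'UPDATE person SET a = 1' > user.php
git add user.php
git commit -q -m "Version 1.0"
git tag v1.0

# Intermediate commit: two lines containing UPDATE.
echo 'UPDATE person SET b = 2' >> user.php
git commit -q -am "Second statement"

# Version 2.0: three lines containing UPDATE.
echo 'UPDATE person SET c = 3' >> user.php
git commit -q -am "Version 2.0"
git tag v2.0

# rev-list prints the commits after v1.0 up to v2.0, newest first.
# The unquoted $(...) is intentional: each hash becomes one argument.
result=$(git grep -c 'UPDATE' $(git rev-list v1.0..v2.0) -- user.php)
# Keep only the third colon-separated field, i.e., the count.
counts=$(echo "$result" | cut -d: -f3)

echo "$result"
```

The commit tagged v1.0 itself is not part of the range v1.0..v2.0, which is why only two result lines appear.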

 

Determining the Authorship of Code (git blame)

When you’ve found the file you’re actually interested in with the commands we’ve described so far, the next question is of course: Who is responsible for the code contained there? A great tool for this purpose is git blame <file>. Without any further options, this command displays the file in question line by line and indicates, for each line, some key information, including which commit changed the line, by which author, and on what date.

 

[Figure: Authorship of the Linux Kernel File “signal.c”]

 

The -L 100,200 option restricts the output to lines 100 to 200. The following two options make the output easier to read:

  • --color-lines displays continuation lines from the same commit in a different color (cyan by default).
  • --color-by-age indicates freshly changed code in red (changes in the preceding month) and moderately new code in white (changes in the preceding year).
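
For scripts, the colored default output is less practical than the machine-readable format that git blame also offers. The sketch below combines -L with --line-porcelain to extract the author of individual lines; the file notes.txt and the authors Alice and Bob are hypothetical.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical committer identity
git config user.email "user@example.com"

# Alice writes the original three lines.
printf 'first\nsecond\nthird\n' > notes.txt
git add notes.txt
git commit -q --author="Alice <alice@example.com>" -m "Add notes"

# Bob later changes only the second line.
printf 'first\nSECOND\nthird\n' > notes.txt
git commit -q -a --author="Bob <bob@example.com>" -m "Edit line 2"

# -L 2,2 restricts blame to line 2; --line-porcelain prints one
# "author <name>" header per line, which sed extracts.
author_line2=$(git blame -L 2,2 --line-porcelain notes.txt | sed -n 's/^author //p')
author_line1=$(git blame -L 1,1 --line-porcelain notes.txt | sed -n 's/^author //p')

echo "line 1: $author_line1"
echo "line 2: $author_line2"
```

The same -L range works with the default output format, so git blame -L 2,2 notes.txt is enough for interactive use.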

The websites of GitLab, GitHub, etc. present the output of git blame even more clearly. In addition, you can view the relevant commit directly on those platforms with just a few clicks.

 

Boundary Commits: If not all commits are contained in the local repository, individual hash codes are prefixed with the ^ character (called a caret), as in ^1da177e4c3f4. This character points to a boundary commit, that is, the last commit available in the repository.

 

Editor’s note: This post has been adapted from a section of the book Git: Project Management for Developers and DevOps Teams by Bernd Öggl and Michael Kofler.

 
