2.1.3. Some related Unix tools
2.1.3.1. Hashing
You may have wondered what the hashes that git uses are for.
In general, a hash function takes data of any size as input and produces a fixed-size hash value of that data. You can try this yourself by e.g. running the following command on Unix:
$ sha1sum with.py
4f2fb68a29c3a1f9978be115a1798371a57e9ae9 with.py
Here, we run the command sha1sum, which calculates the SHA-1 hash of the file with.py. If you don’t have sha1sum, you may try e.g. md5sum (or possibly md5 on Mac), which calculates an MD5 hash instead. Hash functions can have the following properties:
- Changing a file only slightly (e.g. by adding one byte) may result in a completely different hash
- The hash function may be cryptographically secure, i.e. it is computationally infeasible to modify the input data such that the resulting hash stays the same
In general, if you know the hash of a file, you can calculate the hash to check whether the file has been modified or corrupted. git uses hashes to uniquely identify commits and to protect against data corruption.
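The two ideas above (a small change gives a very different hash, and a known hash can be used to detect modification) can be sketched in Python with the standard hashlib module. This is only an illustration: the helper names sha1_hex and file_sha1, the sample data and the temporary file are assumptions made for this example, not part of git or sha1sum.

```python
import hashlib
import os
import tempfile


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hash of data as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def file_sha1(path: str, chunk_size: int = 65536) -> str:
    """Hash a file chunk by chunk so that large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# A one-byte change produces a completely different hash.
original = b"hello world\n"
modified = b"hello world!\n"
hash_original = sha1_hex(original)
hash_modified = sha1_hex(modified)
print(hash_original)
print(hash_modified)

# Detect modification of a file by comparing against a previously known hash.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(original)
    known_hash = file_sha1(path)

    with open(path, "ab") as f:  # simulate a modification: append one byte
        f.write(b"!")
    file_changed = file_sha1(path) != known_hash

print("file modified:", file_changed)
```

For a file on disk, file_sha1 should give the same hex string as the first column printed by sha1sum for that file.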
Exercise: Look up the definition of SHA-1 hash function online. You can e.g. find an implementation in pseudocode.
2.1.3.2. diff and patch
“git diff” shows the difference between a file before and after a change in a practical format:
$ git diff
diff --git a/with.py b/with.py
index f61db97..d63b0bf 100644
--- a/with.py
+++ b/with.py
@@ -1,3 +1,3 @@
 with open('test.txt', 'w') as f:
     for i in xrange(5):
-        f.write("%f %f\n" % (0.2, 0.5))
+        f.write("%f %f\n" % (0.0, 1.0))
In general, you can diff any two files by running the utility “diff”. Conventionally the switch “-u” is used to display the output in unified format, which is also the format git uses by default:
$ diff -u with2.py with.py
--- with2.py 2018-03-25 22:34:47.530840487 +0200
+++ with.py 2018-03-25 22:05:25.477035716 +0200
@@ -1,3 +1,3 @@
 with open('test.txt', 'w') as f:
     for i in xrange(5):
-        f.write("%f %f\n" % (0.0, 1.0))
+        f.write("%f %f %f\n" % (0.0, 0.5, 1.0))
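To see how such unified output is put together, a similar diff can be produced in Python with the standard difflib module. This is a minimal sketch, not how the diff utility itself is implemented: the two line lists stand in for the contents of with2.py and with.py, and no timestamps are included in the header.

```python
import difflib

# Stand-ins for the contents of with2.py (old) and with.py (new), one string per line.
old = [
    "with open('test.txt', 'w') as f:",
    "    for i in xrange(5):",
    r'        f.write("%f %f\n" % (0.0, 1.0))',
]
new = [
    "with open('test.txt', 'w') as f:",
    "    for i in xrange(5):",
    r'        f.write("%f %f %f\n" % (0.0, 0.5, 1.0))',
]

# lineterm="" because the input lines carry no trailing newline characters.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="with2.py", tofile="with.py", lineterm="")
)
print("\n".join(diff_lines))
```

Unchanged context lines are prefixed with a space, removed lines with “-” and added lines with “+”, as in the diff output above.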
It is often useful to redirect diff output to a file, e.g. by appending “> with.diff” to the diff command. There’s another utility called patch, which takes the output of diff and applies the changes to a file, i.e. patches it. Let’s say someone sent us the above diff output as the file with.diff, and we wanted to patch our file with.py:
$ patch -p0 < with.diff
patching file with.py
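Here, “patch” will modify our with.py according to the diff. The switch “-p0” tells patch to use the file names in the diff as they are, without stripping any leading directory components.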
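What patch does with a unified diff can be sketched in a few lines of Python. This is a heavily simplified illustration, not the actual patch implementation: the function name apply_unified_diff is made up for this example, it handles a single file only, and it requires the hunks to match the original exactly, whereas the real utility can also cope with shifted or slightly differing context.

```python
import re

# Matches hunk headers such as "@@ -1,3 +1,3 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_unified_diff(original, diff):
    """Apply a single-file unified diff to a list of lines (no trailing newlines)."""
    result = []
    pos = 0  # index of the next unread line in original
    in_hunk = False
    for line in diff:
        header = HUNK_HEADER.match(line)
        if header:
            start = int(header.group(1))
            count = int(header.group(2)) if header.group(2) is not None else 1
            # A count of 0 means "insert after line start"; otherwise start is 1-based.
            hunk_start = start if count == 0 else start - 1
            result.extend(original[pos:hunk_start])  # copy lines before the hunk
            pos = hunk_start
            in_hunk = True
        elif not in_hunk or line.startswith("\\"):
            continue  # skip the ---/+++ header and "\ No newline at end of file"
        elif line.startswith("+"):
            result.append(line[1:])  # added line
        elif line.startswith("-") or line.startswith(" "):
            if pos >= len(original) or original[pos] != line[1:]:
                raise ValueError("diff does not match the original at line %d" % (pos + 1))
            if line.startswith(" "):
                result.append(original[pos])  # context line is kept
            pos += 1  # removed lines are skipped
        else:
            raise ValueError("unexpected line in diff: %r" % line)
    result.extend(original[pos:])  # copy whatever follows the last hunk
    return result


with_py = [
    "with open('test.txt', 'w') as f:",
    "    for i in xrange(5):",
    r'        f.write("%f %f\n" % (0.0, 1.0))',
]
with_diff = [
    "--- with2.py",
    "+++ with.py",
    "@@ -1,3 +1,3 @@",
    " with open('test.txt', 'w') as f:",
    "     for i in xrange(5):",
    r'-        f.write("%f %f\n" % (0.0, 1.0))',
    r'+        f.write("%f %f %f\n" % (0.0, 0.5, 1.0))',
]
patched = apply_unified_diff(with_py, with_diff)
print("\n".join(patched))
```

If the file to be patched does not contain the lines the diff expects, this sketch raises a ValueError rather than guessing where the change should go.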