Some related Unix tools
-----------------------

Hashing
=======

You may have wondered what the hashes that git uses are all about.

A hash function takes arbitrary data as input and computes a short, fixed-size *hash value* from it. You can try this yourself on Unix, e.g. by running the following command:

.. code-block:: bash

    $ sha1sum with.py
    4f2fb68a29c3a1f9978be115a1798371a57e9ae9  with.py

Here, we run the command sha1sum, which calculates the SHA-1 hash of the file with.py. If you don't have sha1sum, you may try e.g. md5sum (or, on a Mac, shasum or md5). Hash functions can have the following properties:

* Even a tiny change to the input (e.g. adding a single byte) typically results in a completely different hash.
* The hash function may be *cryptographically secure*, i.e. it is computationally infeasible to find two different inputs with the same hash, or to modify given data without changing its hash. SHA-1 no longer meets this standard: in 2017, researchers published two different files with the same SHA-1 hash.
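
The first property is easy to observe with Python's standard hashlib module. The sketch below hashes a line of with.py with and without one extra byte and counts how many of the 160 bits of the SHA-1 digest differ; the helper differing_bits is our own, not a library function.

.. code-block:: python

    import hashlib


    def differing_bits(a: bytes, b: bytes) -> int:
        """Count the bit positions in which two equal-length byte strings differ."""
        return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


    original = b"with open('test.txt', 'w') as f:\n"
    modified = original + b"#"  # the same data with one extra byte appended

    d1 = hashlib.sha1(original).digest()
    d2 = hashlib.sha1(modified).digest()

    print(d1.hex())
    print(d2.hex())
    # For a good hash function, roughly half of the bits flip on average.
    print(differing_bits(d1, d2), "of", len(d1) * 8, "bits differ")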

If you know the expected hash of a file, you can compute the hash yourself and compare the two to check whether the file has been modified or corrupted. git uses hashes to uniquely identify commits (as well as the other objects it stores, such as file contents) and to detect data corruption.
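
Both ideas can be sketched with hashlib. sha1_of_file computes the same value that sha1sum prints, reading the file in chunks so that large files need not fit in memory. git_blob_id shows how git names file contents: rather than hashing the bytes directly, git hashes a short header (the word blob, a space, the length in bytes and a NUL byte) followed by the content, which is why "git hash-object with.py" prints a different value than "sha1sum with.py". Both function names are our own, and git_blob_id assumes a repository using git's default SHA-1 object format.

.. code-block:: python

    import hashlib


    def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
        """Return the hex SHA-1 digest of a file, as printed by sha1sum."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            # Feed the file to the hash object piece by piece.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()


    def git_blob_id(content: bytes) -> str:
        """Return the object name git gives a blob (SHA-1 object format assumed)."""
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        return hashlib.sha1(header + content).hexdigest()


    # The empty file has the same well-known blob id in every SHA-1 repository.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

To check a file you received, compare sha1_of_file(path) with the hash you were given. Inside a repository, git_blob_id(open("with.py", "rb").read()) should match the output of "git hash-object with.py", as long as no filters or line-ending conversion apply to the file.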

*Exercise*: Look up the definition of the SHA-1 hash function online; you can, for example, find it described in pseudocode.
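
After reading the pseudocode, you may find it helpful to compare it with a direct translation. The following is a straightforward, unoptimized sketch of SHA-1 in pure Python (the names sha1_hex, rotl and MASK are ours); it is meant for study only, and in practice you would use hashlib.sha1 instead.

.. code-block:: python

    import hashlib
    import struct

    MASK = 0xFFFFFFFF  # keep arithmetic within 32-bit words


    def rotl(x: int, n: int) -> int:
        """Rotate a 32-bit word left by n bits."""
        return ((x << n) | (x >> (32 - n))) & MASK


    def sha1_hex(data: bytes) -> str:
        h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

        # Padding: a single 1 bit (0x80), zeros up to 56 bytes mod 64,
        # then the original length in bits as a 64-bit big-endian integer.
        msg = data + b"\x80"
        msg += b"\x00" * ((56 - len(msg) % 64) % 64)
        msg += struct.pack(">Q", len(data) * 8)

        for offset in range(0, len(msg), 64):
            # Expand the 16 words of this 512-bit block to 80 words.
            w = list(struct.unpack(">16I", msg[offset:offset + 64]))
            for i in range(16, 80):
                w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))

            a, b, c, d, e = h
            for i in range(80):
                if i < 20:
                    f, k = (b & c) | (~b & d), 0x5A827999
                elif i < 40:
                    f, k = b ^ c ^ d, 0x6ED9EBA1
                elif i < 60:
                    f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
                else:
                    f, k = b ^ c ^ d, 0xCA62C1D6
                temp = (rotl(a, 5) + f + e + k + w[i]) & MASK
                a, b, c, d, e = temp, a, rotl(b, 30), c, d

            # Add this block's result to the running hash value.
            h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]

        return "".join(f"{x:08x}" for x in h)


    # Compare with the standard library implementation.
    print(sha1_hex(b"abc") == hashlib.sha1(b"abc").hexdigest())  # True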

diff and patch
==============

"git diff" gives a practical output of a difference between a file before and after a change:

.. code-block:: bash

    $ git diff
    diff --git a/with.py b/with.py
    index f61db97..d63b0bf 100644
    --- a/with.py
    +++ b/with.py
    @@ -1,3 +1,3 @@
     with open('test.txt', 'w') as f:
         for i in xrange(5):
    -        f.write("%f %f\n" % (0.2, 0.5))
    +        f.write("%f %f\n" % (0.0, 1.0))
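
The line starting with @@ is a *hunk header*. In "@@ -1,3 +1,3 @@", the part "-1,3" says that the hunk covers 3 lines of the old file starting at line 1, and "+1,3" says the same about the new file; when a count is omitted, it is 1. git may also print some context, such as the enclosing function name, after the second @@. The small parser below (parse_hunk_header is a hypothetical helper, not part of git) makes these fields explicit.

.. code-block:: python

    import re

    # "@@ -<old start>[,<old count>] +<new start>[,<new count>] @@ [optional context]"
    HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


    def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
        """Return (old_start, old_count, new_start, new_count) for a hunk header."""
        m = HUNK_HEADER.match(line)
        if m is None:
            raise ValueError(f"not a hunk header: {line!r}")
        old_start, old_count, new_start, new_count = m.groups()
        # A missing count means the hunk spans exactly one line.
        return (int(old_start), int(old_count or 1),
                int(new_start), int(new_count or 1))


    print(parse_hunk_header("@@ -1,3 +1,3 @@"))  # (1, 3, 1, 3)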

More generally, you can *diff* any two files with the utility "diff". The switch "-u" is conventionally used to produce output in *unified format*, which is also the format git uses by default:

.. code-block:: bash

    $ diff -u with2.py with.py
    --- with2.py	2018-03-25 22:34:47.530840487 +0200
    +++ with.py	2018-03-25 22:05:25.477035716 +0200
    @@ -1,3 +1,3 @@
     with open('test.txt', 'w') as f:
         for i in xrange(5):
    -        f.write("%f %f\n" % (0.0, 1.0))
    +        f.write("%f %f %f\n" % (0.0, 0.5, 1.0))

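
If you want to produce unified diffs from a program, Python's standard difflib module can generate this format. The sketch below diffs the two versions of the loop from the example above, held in memory as lists of lines; fromfile and tofile only set the labels on the --- and +++ lines, and no timestamps appear because none are passed.

.. code-block:: python

    import difflib

    # Raw strings keep the "\n" inside the Python source text literal.
    old = r"""with open('test.txt', 'w') as f:
        for i in xrange(5):
            f.write("%f %f\n" % (0.0, 1.0))
    """.splitlines(keepends=True)

    new = r"""with open('test.txt', 'w') as f:
        for i in xrange(5):
            f.write("%f %f %f\n" % (0.0, 0.5, 1.0))
    """.splitlines(keepends=True)

    diff_lines = list(difflib.unified_diff(old, new, fromfile="with2.py", tofile="with.py"))
    print("".join(diff_lines), end="")
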
It is often useful to redirect the diff output to a file, e.g. with "diff -u with2.py with.py > with.diff". Another utility, *patch*, takes such diff output and applies the changes it describes to a file, i.e. it *patches* the file. Let's say someone sent us the above diff output as the file with.diff, and we have our own copy of with.py that we want to patch:

.. code-block:: bash

    $ patch -p0 < with.diff
    patching file with.py

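Here, "patch" modifies our with.py according to the diff. The switch "-p0" tells patch to use the file names in the diff header exactly as they are, without stripping any leading directory components; for diffs generated by git, whose file names carry the prefixes a/ and b/, you would use "-p1" instead.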
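
The sketch below imitates, in a very simplified way, what patch does with each hunk: copy the unchanged lines before it, check that context (space) and removed (-) lines match the original, drop the removed lines and insert the added (+) lines. apply_unified_diff is a hypothetical name; unlike the real patch, this sketch handles only a single file, requires every hunk to match exactly at the stated line numbers (no offsets or fuzz), and ignores the "\\ No newline at end of file" marker.

.. code-block:: python

    import re

    HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


    def apply_unified_diff(original: list[str], diff: list[str]) -> list[str]:
        """Apply the hunks of a single-file unified diff to a list of lines."""
        result: list[str] = []
        pos = 0  # index of the next line of `original` not yet consumed
        i = 0
        # Skip the "---" and "+++" header lines before the first hunk.
        while i < len(diff) and not diff[i].startswith("@@"):
            i += 1
        while i < len(diff):
            m = HUNK_HEADER.match(diff[i])
            if m is None:
                raise ValueError(f"expected a hunk header, got {diff[i]!r}")
            old_start = int(m.group(1))
            old_count = int(m.group(2)) if m.group(2) is not None else 1
            # A count of 0 means "insert after line old_start".
            start = old_start if old_count == 0 else old_start - 1
            if start < pos:
                raise ValueError("hunks overlap or are out of order")
            result.extend(original[pos:start])  # unchanged lines before the hunk
            pos = start
            i += 1
            while i < len(diff) and not diff[i].startswith("@@"):
                tag, text = diff[i][:1], diff[i][1:]
                if tag in (" ", "-"):
                    # Context and removed lines must match the original exactly.
                    if pos >= len(original) or original[pos] != text:
                        raise ValueError(f"hunk does not apply at line {pos + 1}")
                    pos += 1
                    if tag == " ":
                        result.append(text)
                elif tag == "+":
                    result.append(text)
                # Any other line (e.g. "\ No newline at end of file") is ignored.
                i += 1
        result.extend(original[pos:])  # unchanged lines after the last hunk
        return result


    old = r"""with open('test.txt', 'w') as f:
        for i in xrange(5):
            f.write("%f %f\n" % (0.0, 1.0))
    """.splitlines(keepends=True)

    with_diff = r"""--- with2.py
    +++ with.py
    @@ -1,3 +1,3 @@
     with open('test.txt', 'w') as f:
         for i in xrange(5):
    -        f.write("%f %f\n" % (0.0, 1.0))
    +        f.write("%f %f %f\n" % (0.0, 0.5, 1.0))
    """.splitlines(keepends=True)

    print("".join(apply_unified_diff(old, with_diff)), end="")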