Hashing means using a mathematical algorithm (aka a function) to map data of virtually any size into one simple, fixed-length string of letters and numbers. This string, known as a hash value, represents the data. There are different algorithms to produce hash values, such as MD5, SHA-1, and SHA-256.
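To make the "any size in, one simple string out" idea concrete, here is a minimal sketch using Python's standard hashlib module. The two inputs are made-up placeholders, not data from this article. It hashes a very short input and a roughly 1 MB input with each of the three algorithms and prints the length of the resulting hash values.

```python
import hashlib

# Two illustrative inputs of very different sizes (placeholders for this sketch).
short_input = b"Hi"
long_input = b"x" * 1_000_000  # about 1 MB of data

hash_lengths = {}
for name in ("md5", "sha1", "sha256"):
    short_hash = hashlib.new(name, short_input).hexdigest()
    long_hash = hashlib.new(name, long_input).hexdigest()
    # Record how many hexadecimal characters each hash value has.
    hash_lengths[name] = (len(short_hash), len(long_hash))
    print(name, len(short_hash), len(long_hash))
```

For a given algorithm, the hash value has the same length no matter how large the input is: 32 hexadecimal characters for MD5, 40 for SHA-1, and 64 for SHA-256.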
Conceptually, a hash value is similar to an invoice number. An invoice number maps a customer's order to a simple series of letters and/or numbers, and you can pull up all the information on an order by using the invoice number.
Hashing is special in that it is a one-way process: if you hash a document, a tweet, or a video, you will not be able to tell what the content is based on the hash value alone. This is done for security. Hashing is also special in that it is consistent: the same input always produces the same hash value.
Check out this example:
Here we have a phrase, which is considered data or an input:
“Hashing is easy with computers”
Running this message through the MD5 hash algorithm produces a hash value. Every time you run the MD5 algorithm on this exact statement, you get the same hash value.
If you change the letter casing of the sentence to all UPPERCASE, you get a different hash value:
“HASHING IS EASY WITH COMPUTERS”
While the statements read the same, the data is different. To a computer, an uppercase letter and its lowercase counterpart are different characters, so the all-uppercase sentence is not the same input as the original. Therefore, the hash value is different.
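The two behaviours above (same input, same hash value; different casing, different hash value) can be checked with a short sketch using Python's standard hashlib module. The helper name md5_of_text is made up for this example, and encoding the text as UTF-8 is an assumption, since the hash is calculated over bytes rather than over letters.

```python
import hashlib

def md5_of_text(text: str) -> str:
    """Return the MD5 hash value of a string (encoded as UTF-8) as hexadecimal text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

original = "Hashing is easy with computers"

first_run = md5_of_text(original)
second_run = md5_of_text(original)          # same input, hashed a second time
uppercase = md5_of_text(original.upper())   # "HASHING IS EASY WITH COMPUTERS"

print(first_run == second_run)  # True: hashing is consistent
print(first_run == uppercase)   # False: different data, different hash value
```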
With the Shinydocs Cognitive Toolkit, we use the same industry-standard algorithms to generate hash values of your files. Our tool uses these algorithms to make a hash value of a file’s data, down to the binary 1s and 0s. The name, extension, location, creation time, etc. are not part of that data, so they do not change or impact the hash value of a file. The file’s true contents are processed through the Cognitive Toolkit and a hash value is calculated.
Here’s another example:
File A: WelcomeToAcmeCorp.pptx
File B: saved_presentation.pptx
Notice how the hash value is the same, but the names are different? That’s because the actual content is EXACTLY the same. If even one letter were changed, one extra space added, or an image in the PowerPoint moved a couple of pixels, the hash value would be different.
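The same idea can be tried out with a small sketch in Python. This is only an illustration under stated assumptions, not how the Cognitive Toolkit is implemented: the hash_file helper, the choice of SHA-256, the third file name, and the placeholder bytes standing in for a real presentation are all made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_file(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file's bytes in chunks; the name and other metadata are never read."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as folder:
    file_a = Path(folder) / "WelcomeToAcmeCorp.pptx"
    file_b = Path(folder) / "saved_presentation.pptx"
    file_c = Path(folder) / "edited_presentation.pptx"  # hypothetical edited copy

    content = b"placeholder bytes standing in for a presentation"
    file_a.write_bytes(content)
    file_b.write_bytes(content)           # same content, different name
    file_c.write_bytes(content + b" ")    # one extra space

    hash_a = hash_file(file_a)
    hash_b = hash_file(file_b)
    hash_c = hash_file(file_c)

print(hash_a == hash_b)  # True: identical content, different names
print(hash_a == hash_c)  # False: a single extra byte changes the hash value
```

Reading the file in chunks keeps memory use low for large files and gives the same hash value as hashing the whole file at once.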
0-byte files, aka files that contain no data, will always result in the same hash value for a given algorithm. This is due to how consistent hash functions are: the input is empty every time, so the output is identical every time. A 0-byte file will be the same no matter where in the world you are.
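These values can be reproduced with Python's standard hashlib module by hashing an empty input, which is what a 0-byte file contains. The sketch below covers only the three algorithms named earlier in this article.

```python
import hashlib

# Hash an empty input (the contents of a 0-byte file) with each algorithm.
empty_hashes = {
    name: hashlib.new(name, b"").hexdigest()
    for name in ("md5", "sha1", "sha256")
}

for name, value in empty_hashes.items():
    print(f"{name}: {value}")
# md5: d41d8cd98f00b204e9800998ecf8427e
# sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
# sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

If a file in your environment has one of these hash values, it contains no data.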
0 Byte File Hash Values