Cryptographic hash functions are complex mathematical calculations. Therefore understanding them requires a considerable amount of time and patience. However, they all have things in common: an input, a cryptographic algorithm, and an output.

Recently, I had a chance to study some popular cryptographic hash functions, such as MD5 and SHA-1, and tried to understand how they really work. Wikipedia pages I linked include a considerable amount of information already, and more can be found online, but what I want to do was understand similarities between them and write my own primitive hashing function in Go.

Splitting an input into chunks, dividing chunks into fixed-sized bit blocks, adding bits, padding messages with 0 bit, shifting bits, running xor on bits, and so on are some common methods that can you come across in many hashing algorithms. I will not use all these practices in my “primitive” implementation, but I will cover enough to make a reader get started.

First, let’s start by remembering the requirements for any hashing algorithm:

  • Infeasible to produce a given digest (you can’t guess possible plaintext messages)
  • Impossible to extract original message (hashing must be one-way)
  • Slight changes in the original message should produce drastic changes in the digest
  • Resulting digest is fixed-width (length)
  • Should be fast, but also not too fast!

Besides these strict requirements, we also want as few hash collision as possible. The thing is, collisions can’t be avoided in hashing, because it’s a byproduct of fixed width digests. They can only be made rare. To make them rarer, we can simply use big block sizes, and as a result, we end up with big checksums. I won’t be concerned too much about collisions or security here, as what I’m trying to write is just a primitive hashing function for educational purposes, and not something production-ready.

Here is the full code: https://gist.github.com/msdundar/f88aee1a89d603b3d81f9087e3dc1012

Step 1: Convert string to binary

We will be using a binary representation to run an XOR machine later. Therefore, any input needs to be converted to binary first:

    h        e        l        l        o        w        o        r       l        d
01101000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100
// stringToBinary converts strings to binary
func stringToBinary(s string) (binaryString string) {
  for _, c := range s {
    binaryString = fmt.Sprintf("%s%.8b", binaryString, c)
  }
  return 
}

Step 2: Split binary into bit-blocks and pad blocks with 0-bit

Here I use blocks of 32bits for simplicity, but in actual implementation it’s 128bits:

              hell                             owor                             ld
01101000011001010110110001101100 01101111001000000111011101101111 01110010011011000000000000000000

For this example, I decided to use blocks of 128bits, but you can configure it depending on your needs. Keep in mind, a bigger block size will perform badly for short messages but will reduce the chances of collision for long messages. Think about these tradeoffs when modifying this value.

Here, I’m splitting the binary string into blocks of 128bits, and I pad blocks with 0 bit if they are shorter than 128bits:

// splitToBlocks takes a binary string and splits it into blocks
func splitToBlocks(msg string) []string{
  desiredBlockSize := 128
  numberOfBits := len(msg)

  requiredSliceCap := int(math.Ceil(
    float64(numberOfBits)/float64(desiredBlockSize)),
  )

  fourByteSlice := make([]string, requiredSliceCap)

  byteIndex := 0

  for i := 0; i < len(msg); i++ {
    if i != 0 && i % desiredBlockSize == 0 {
      byteIndex += 1
    }

    fourByteSlice[byteIndex] = fourByteSlice[byteIndex] + string(msg[i])
  }

  // Pad with 0s to ensure all blocks are the same size
  for ; len(fourByteSlice[byteIndex]) < desiredBlockSize; {
    fourByteSlice[byteIndex] = fourByteSlice[byteIndex] + "0"
  }


  return fourByteSlice
}

Step 3: Create a simple XOR machine

Running XOR on bits is a very common technique for many hashing algorithms. When using big block sizes, each XOR makes it much harder to reverse the hashing process. However, with short messages, there will be fewer blocks to use with XOR, and the process can be reversed if no additional hardening is implemented. However, we aren’t much concerned about collisions in this primitive example.

// xorMachine runs longitudinal parity checks on given blocks
func xorMachine(item1 string, item2 string) string{
  var xorOutput string

  for i, _ := range item1 {
    if item1[i] == item2[i] {
      xorOutput += "0"
    } else {
      xorOutput += "1"
    }
  }

  return xorOutput
}

Step 4: Pass bit-blocks to XOR machine one-by-one

In this step, all we need to do is pass bit-blocks to the XOR machine one by one:

binaryStringSlc := splitToBlocks(
  stringToBinary("Hello world! This is Serhat! This post is all about implementing a primitive hashing function in Go."),
)

for i := 0; i < len(binaryStringSlc); i++{
  if !(i+1 == len(binaryStringSlc)) {
    comparisonStr := xorMachine(binaryStringSlc[i], binaryStringSlc[i+1])

    binaryStringSlc[i+1] = comparisonStr
  }
}

fmt.Println(binaryStringSlc[len(binaryStringSlc)-1])

And voila! Here is our checksum in binary:

01001111010111110010000001101001000001110110010101011010000111110001101101011101000010110000100001010110000000010000011100011000

Needless to say, you should never use this hashing function on production and you should never try to write your own encryption algorithm either. Existing hashing functions are 100x more complicated than this, and probably 1000x more secure. Therefore, you should only use battle-tested hashing algorithms such as SHA-3, bcrypt, and so on.