Hash

Base64

Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 (6 bits) representation (1 char representation 1 byte).

Example, how many chars of base64 to represent 6 Billion urls?

$$ \lceil\log_{64}{6e9}\rceil = \lceil5.4\rceil = 6

$$

import base64
encoded = base64.b64encode('data to be encoded')
print encoded  # >>> ZGF0YSB0byBiZSBlbmNvZGVk
data = base64.b64decode(encoded)
print data  # >>> data to be encoded

Hashing

Shortening an url:

hashed_url = base64(md5(url+salf))[:6]

Constructors for hash algorithms that are always present in this module are md5(), sha1(), sha224(), sha256(), sha384(), and sha512().

import hashlib
m = hashlib.md5()
m.update("Nobody inspects")
m.update(" the spammish repetition")  # m.update(a); m.update(b) is equivalent to m.update(a+b)
print m.hexdigest()

print hashlib.md5("whatever your string is").hexdigest()

Bytes and scale

  • 1M (million): 1MB
  • 1B (million): 1GB
  • 1T (trillion): 1TB