PHP Unique Hash

I want a short alphanumeric hash that’s unique and who’s sequence is difficult to deduce. I could run it out to md5 and trim the first n chars but that’s not going to be very unique. Storing a truncated checksum in a unique field means that the frequency of collisions will increase geometrically as the number of unique keys for a base 62 encoded integer approaches 62^n. I’d rather do it right than code myself a timebomb. So I came up with this.

I’ll walk us through the finer points below.

<?php
 
class PseudoCrypt {
 
    /* Key: Next prime greater than 62 ^ n / 1.618033988749894848 */
    /* Value: modular multiplicative inverse */
    private static $golden_primes = array(
        '1'                  => '1',
        '41'                 => '59',
        '2377'               => '1677',
        '147299'             => '187507',
        '9132313'            => '5952585',
        '566201239'          => '643566407',
        '35104476161'        => '22071637057',
        '2176477521929'      => '294289236153',
        '134941606358731'    => '88879354792675',
        '8366379594239857'   => '7275288500431249',
        '518715534842869223' => '280042546585394647'
    );
 
    /* Ascii :                    0  9,         A  Z,         a  z     */
    /* $chars = array_merge(range(48,57), range(65,90), range(97,122)) */
    private static $chars62 = array(
        0=>48,1=>49,2=>50,3=>51,4=>52,5=>53,6=>54,7=>55,8=>56,9=>57,10=>65,
        11=>66,12=>67,13=>68,14=>69,15=>70,16=>71,17=>72,18=>73,19=>74,20=>75,
        21=>76,22=>77,23=>78,24=>79,25=>80,26=>81,27=>82,28=>83,29=>84,30=>85,
        31=>86,32=>87,33=>88,34=>89,35=>90,36=>97,37=>98,38=>99,39=>100,40=>101,
        41=>102,42=>103,43=>104,44=>105,45=>106,46=>107,47=>108,48=>109,49=>110,
        50=>111,51=>112,52=>113,53=>114,54=>115,55=>116,56=>117,57=>118,58=>119,
        59=>120,60=>121,61=>122
    );
 
    public static function base62($int) {
        $key = "";
        while(bccomp($int, 0) > 0) {
            $mod = bcmod($int, 62);
            $key .= chr(self::$chars62[$mod]);
            $int = bcdiv($int, 62);
        }
        return strrev($key);
    }
 
    public static function hash($num, $len = 5) {
        $ceil = bcpow(62, $len);
        $primes = array_keys(self::$golden_primes);
        $prime = $primes[$len];
        $dec = bcmod(bcmul($num, $prime), $ceil);
        $hash = self::base62($dec);
        return str_pad($hash, $len, "0", STR_PAD_LEFT);
    }
 
    public static function unbase62($key) {
        $int = 0;
        foreach(str_split(strrev($key)) as $i => $char) {
            $dec = array_search(ord($char), self::$chars62);
            $int = bcadd(bcmul($dec, bcpow(62, $i)), $int);
        }
        return $int;
    }
 
    public static function unhash($hash) {
        $len = strlen($hash);
        $ceil = bcpow(62, $len);
        $mmiprimes = array_values(self::$golden_primes);
        $mmi = $mmiprimes[$len];
        $num = self::unbase62($hash);
        $dec = bcmod(bcmul($num, $mmi), $ceil);
        return $dec;
    }
 
}
 
echo "<pre>";
 
foreach(range(1, 10) as $n) {
    echo $n." - ";
    $hash = PseudoCrypt::hash($n, 6);
    echo $hash." - ";
    echo PseudoCrypt::unhash($hash)."<br/>";
}

Output:

1 - cJinsP - 1
2 - EdRbko - 2
3 - qxAPdD - 3
4 - TGtDVc - 4
5 - 5ac1O1 - 5
6 - huKpGQ - 6
7 - KE3d8p - 7
8 - wXmR1E - 8
9 - YrVEtd - 9
10 - BBE2m2 - 10

Pretty random-looking, huh?

Base 62

Hashes are base 62 encoded base 10 integers. 1=1, 10=a, 36=Z, 61=z, 62=10, 72=1a, etc.

62, 3844, 238328, 14776336, 916132832, 56800235584, 3521614606208

1 character = 62 permutations, 2 characters = 3844 permutations, etc.

41, 2377, 147299, 9132313, 566201239, 35104476161, 2176477521929

41 = next highest prime from golden mean of 62.
2377 = next highest prime from golden mean of 3844.

Uniqueness

I chose to use primes to ensure hash uniqueness. Any prime greater than one half 62^n will do, but if you use a prime near 62^n or 62^n/2 or 2*62^n/3 etc, you will detect a linearity in the sequence at certain points in the ring.

Appearance of randomness

I chose primes near the golden ratio to maximize the appearance of randomness. Given a small set of hashes (even with the associated id) it would be difficult for anyone to guess the next hash.

This is a minimum security technique.

Keep your primes a secret and limit the number of hashes a user can get his hands on to make it harder for script kiddies to reverse engineer your algo. This is a thin rotation and base re-encoding obfuscation algorithm, not an encryption algorithm. Don’t use this to crypt sensitive info. Use it to obfuscate integer IDs.

Tools Used

Golden Ratio Calculator
Prime Numbers Generator

Special thanks to:

34 Comments so far

  1. German on February 9th, 2010

    Nice! this is what I was looking for. But it’d be nicer if you could code a decoder to get the original number back. Is there any chance I could get that? I don’t think ‘m capable to code it myself. Thanks for your time anyway!

  2. Pablo on February 11th, 2010

    Hey Kev, that’s very useful!
    Obviously not if you need something secure, but to obfuscate ids it’s quite simple and powerful (compared with some things going around in the web).

    Have you ever needed to unhash its results?
    it may be of some use if you are hashing ids…

  3. Tim on April 1st, 2010

    Wonderful function — but the output diverges above 58 on two systems I’m testing on. I suspect it’s because one’s a 32bit machine, and the other 64.

    Any thoughts on a fix for that?

    If I find one, I’ll come back and post.

  4. Morteza on July 6th, 2010

    Hey, that’s very useful!! I wanted to use unique IDs to generate a code as password for my users so they can come back later on the site, and enter this code, so they can edit what they entered. your algorithm is useful in my case.

    thank you.

    @Tim: I think, not needing the inverse function, it is not important to have different values for one number.

  5. Warren Wilkinson on March 17th, 2011

    Hey German, if you want decodable ones — would printing random numbers in base 64 work? Most languages let you print in base 16 (hex) and usually (with some sleuthing) have ways to read/write base 64.

  6. Druczki Pocztowe on March 28th, 2011

    Try rining the script with: PseudoCrypt::udihash($id, 10)

    All keys wil have values:
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000
    000000000000

    How to solve it?

  7. KevBurnsJr on March 28th, 2011

    10 is probably too large. I suggest you run it w/ 5 or 6 and append some random data.

    $hash = PseudoCrypt::udihash($id, 5);
    $rand = PseudoCrypt::base62(rand(0,pow(62,5)));
    $rand = str_pad($rand, 5, “0″, STR_PAD_LEFT);
    $key = $hash.$rand;

  8. John on March 28th, 2011

    Hi, I implemented this in Factor, if you’re curious:

    http://re-factor.blogspot.com/2011/03/unique-hash.html

  9. Jenny on May 6th, 2011

    How many total “unique” possibilities are we talking about here?

  10. Jenny on May 6th, 2011

    Me again :)
    I wanted to know if you think it would be safe to run this command instead of your look.

    echo PseudoCrypt::udihash(time());

    I’m trying to create unique HASH to be inserted as KEY on database, but generating your way the hash always stays the same.

    Please let me know if there is any draw back to this solution.

    Thank you!

    Jenny

  11. KevBurnsJr on May 6th, 2011

    Using time() would probably not be a good idea, just because there’s a good chance that two records may be created within the same second.

    What I usually do instead for keys is

    echo time().’-’.base_convert(rand(1,getrandmax()), 10, 36);

    which yields a key like 1304134276-udxxi7

  12. Alec Gorge on September 19th, 2011

    Kevin,

    Great algorithm! I wanted to use it in my upcoming Node.js project, so I made a CommonJS/Node.js module available here: https://github.com/alecgorge/node-psuedohash

    Thanks again for putting this together!

  13. Carl on November 23rd, 2011

    @Druczki

    I realize your post is very old but I figure this might help someone else.

    You will need more primes. If you look inside udihash function, the prime number is the one at the index corresponding to the length of your hash.

    The golden_primes array only contains 8 values so referencing index 10 will fail. If you have error_reporting disabled, this will likely produce a false, null, or zero result which will result in the zero hashes you are getting.

  14. Ernesto on December 19th, 2011

    Using php standard BC Math library You can run PseudoCrypt::udihash($id, 10)

    class PseudoCrypt {

    /* Next prime greater than 62 ^ n / 1.618033988749894848 */
    private static $golden_primes = array(
    ‘1′,
    ‘41′,
    ‘2377′,
    ‘147299′,
    ‘9132313′,
    ‘566201239′,
    ‘35104476161′,
    ‘2176477521929′,
    ‘134941606358731′,
    ‘8366379594239857′,
    ‘518715534842869223′
    );

    /* Ascii : 0 9, A Z, a z */
    /* $chars = array_merge(range(48,57), range(65,90), range(97,122)) */
    private static $chars = array(
    0=>48,1=>49,2=>50,3=>51,4=>52,5=>53,6=>54,7=>55,8=>56,9=>57,10=>65,
    11=>66,12=>67,13=>68,14=>69,15=>70,16=>71,17=>72,18=>73,19=>74,20=>75,
    21=>76,22=>77,23=>78,24=>79,25=>80,26=>81,27=>82,28=>83,29=>84,30=>85,
    31=>86,32=>87,33=>88,34=>89,35=>90,36=>97,37=>98,38=>99,39=>100,40=>101,
    41=>102,42=>103,43=>104,44=>105,45=>106,46=>107,47=>108,48=>109,49=>110,
    50=>111,51=>112,52=>113,53=>114,54=>115,55=>116,56=>117,57=>118,58=>119,
    59=>120,60=>121,61=>122
    );

    public static function base62($int) {
    $key = “”;
    while(bccomp($int,0) > 0) {
    $mod = bcmod($int,62);
    $key .= chr(self::$chars[$mod]);
    $int = bcdiv($int,62);
    }
    return strrev($key);
    }

    public static function udihash($num, $len = 5) {
    $ceil = bcpow(62, $len);
    $prime = self::$golden_primes[$len];
    $dec = bcmod(bcmul($num,$prime),$ceil);
    $hash = self::base62($dec);
    return str_pad($hash, $len, “0″, STR_PAD_LEFT);
    }

    }

    I use this to create random-looking unique ids:

    function bchexdec($hex) {
    if(strlen($hex) == 1) {
    return hexdec($hex);
    } else {
    $remain = substr($hex, 0, -1);
    $last = substr($hex, -1);
    return bcadd(bcmul(16, bchexdec($remain)), hexdec($last));
    }
    }

    $uid=bchexdec(uniqid());
    $hash=PseudoCrypt::udihash($uid,10);

    I got over 20 million ID without collision before deciding to stop the script.
    Nice algorithm, many thanks!!!

  15. fatih on January 14th, 2012

    veryclever indeed. how about 3 char length hashes? is it safe to use it that low?

  16. [...] of an md5 or something like that, but why reinvent the wheel – I found a small libaray here: http://blog.kevburnsjr.com/php-unique-hash which does the trick quite [...]

  17. kalya on June 11th, 2012

    How to decode it back to its ids? In this example to 1,2 … 10? or i missed the point and this is one-way hash?

  18. KevBurnsJr on June 11th, 2012

    Ya, definitely a 1-way hash.

  19. Padraig on June 19th, 2012

    It can be reversed. Find the modular multiplicative inverse of the prime and the ceiling then run the process with that number in place of your $prime.

    http://en.wikipedia.org/wiki/Modular_multiplicative_inverse

    Example, say 27 is the index we want to obfuscate:

    27 * 2176477521929 mod 62^7 = 2419059392755

    http://www.wolframalpha.com/input/?i=27+*+2176477521929+mod+62%5E7

    The modular multiplicative inverse is:
    294289236153

    http://www.wolframalpha.com/input/?i=PowerMod%5B2176477521929%2C-1%2C62%5E7%5D

    And so 2419059392755 * 294289236153 mod 62^7 = 27

    PS: If you are dealing with very long numbers like this, you should be using the bcmath functions in PHP.

  20. Manish on December 2nd, 2012

    Added these for using only uppercase and numbers. Simple test cases looked ok. Any comments/issues to anticipate?

    public static function base36($int) {
    $key = “”;
    while($int > 0) {
    $mod = $int-(floor($int/36)*36);
    $key .= chr(self::$chars[$mod]);
    $int = floor($int/36);
    }
    return strrev($key);
    }

    public static function udihash36($num, $len = 5) {
    $ceil = pow(36, $len);
    $prime = self::$golden_primes[$len];
    $dec = ($num * $prime)-floor($num * $prime/$ceil)*$ceil;
    $hash = self::base36($dec);
    return str_pad($hash, $len, “0″, STR_PAD_LEFT);
    }

  21. Lucius on December 15th, 2012

    Talk about zombie posting, but I thought I’d share some modifications.

    It now has a constructor in place of the public static udihash function, so you can use it as $foo = new udihash($bar);

    Rather than computing the modulus longhand, I replaced the calculations with the modulus operator. (%)

    I also removed the lookup array for the letters, and calculated them directly as offset from $mod.

    and finally, I added an internal container for the udihash object, and a __toString() function, so that the hash can be used by any function requiring a string.

    class udihash {

    /* Next prime greater than 62 ^ n / 1.618033988749894848 */
    public $golden_primes = array(
    1,41,2377,147299,9132313,566201239,35104476161,2176477521929
    );
    public $udi;

    public static function base62($int) {
    $key = “”;
    while($int > 0) {
    $mod = $int % 62;
    if($mod 17 && $mod golden_primes[$len];
    $dec = ($num * $prime) % $ceil;
    $hash = $this->base62($dec);
    $this->udi = str_pad($hash, $len, “0″, STR_PAD_LEFT);
    }
    public function __toString() {
    return $this->udi;
    }

    }

  22. KevBurnsJr on February 8th, 2013

    Updated to use bcmath for up to 10 chars and now includes unhash method!

  23. Brian on February 11th, 2013

    Thanks very much for updating with bcmath! :) I’d been working on that too!

    When I tried your updated code I got some issues with the hash coming out as 12 characters even though I’d set it to 6.

    I discovered that bcdiv was producing answers with fractional parts, and therefore using extra digits to the right of the decimal point. This made the foreach loop in “base62″ keep going for longer than it needed to (since we only need to deal with integers here).

    I fixed it by adding the optional “scale” parameter for bcdiv and setting it to zero. Now the result of the division is truncated to be an integer, which is the correct behaviour.
    $int = bcdiv($int, 62, 0);

    When I tested this I got the same results as you’ve posted for the first 10 values. And in theory it should be slightly quicker since bcdiv doesn’t have to bother calculating any fractions!

    The thing I can’t work out is why my original problem only seemed to occur intermittently. Sometimes the hash was 6 characters as expected, sometimes it was 12. That’s a bit scary so if anyone has any theories, please let me know!

  24. Omar on February 13th, 2013

    Thank you very much,

    I found a problem in this class
    when i try to generate hash for number (1)
    PseudoCrypt::hash(1, 6)
    some times i got: 000000cJinsP
    I have fixed this issue by replacing (in base62 method)
    while(bccomp($int, 0) > 0) {
    to:
    while(bccomp($int-1, 0) > 0) {

    is this true?

  25. Soo on February 13th, 2013

    Call me stupid but I believe one should generate their own $golden_primes array correct? It’s not entirely clear – to those of us with math deficiencies – how we are to do this …

  26. KevBurnsJr on February 13th, 2013

    Any prime number greater than 1/2 62^n will work.

    Here’s a tool for finding prime numbers
    http://www.numberempire.com/primenumbers.php

  27. Nebula on March 3rd, 2013

    I’ve learnt a lot from this, thanks for keeping it updated.

    The only thing I’m stuck on is how you calculated the modular multiplicative inverse values in your `golden_primes` array?

  28. KevBurnsJr on March 18th, 2013

    See Padraig’s comment above.

  29. [...] In my search for efficient algorithms that create unique indexes based on arbitrary values, I found this wonderful snippet of code on a Developer’s blog. [...]

  30. Wayne Smallman on May 31st, 2013

    Hi Kevin, and thanks for the code!

    I’m attempting to use it purely for URL shortening purposes, rather than encryption. I’m using: hash(rand(1,6), 6) to generate the code.

    However, I’ve hit a snag; every code being generated is preceded by 6 zeros, like “000005ac1O1″.

    While I can see that you’re using str_pad() to do this, why isn’t this happening with your own codes?

    And when I run your foreach() I get an error on: $mmi = $mmiprimes[$len]; in the unhash() function.

    I don’t know enough about mathematics to even begin disentangling things, so any advice would be great.

  31. Wayne Smallman on May 31st, 2013

    Also, the codes that are being generated are exactly the same as those in the foreach() here, cycling over and over again.

  32. KevBurnsJr on May 31st, 2013

    So your use of random there is actually not the way this library is intended to be used. The first parameter to the hash function should be one of a unique series of integers (for instance, an autoincrement primary key from a relational database). Inputing the same number twice will produce the same result.

    What’s the error you’re getting?
    Did you copy the code correctly?
    Are you on a 32 or 64 bit machine?
    What operating system?
    Does your php environment have bcmath php ext installed?

    I suspect that you are on a 32 bit system.

  33. Cliff Stanford on July 9th, 2013

    This is wonderful stuff.

    I’ve recoded it into Perl and made it work for 62 bit and 36 bit (upper [or lower] case only). When I finish the current project, I’ll push it to cpan but it’s a bit frantic at the moment.

    FWIW, 36 bit hashes with a string length of 4 will happily encode 0 .. 99_999 uniquely which is brilliant for the application I have: a shortener that can be read out over the phone.

    Thanks again, Mr Burns.

Leave a reply