PHP Unique Hash
I want a short alphanumeric hash that’s unique and who’s sequence is difficult to deduce. I could run it out to md5 and trim the first n chars but that’s not going to be very unique. Storing a truncated checksum in a unique field means that the frequency of collisions will increase geometrically as the number of unique keys for a base 62 encoded integer approaches 62^n. I’d rather do it right than code myself a timebomb. So I came up with this.
I’ll walk us through the finer points below.
<?php class PseudoCrypt { /* Key: Next prime greater than 62 ^ n / 1.618033988749894848 */ /* Value: modular multiplicative inverse */ private static $golden_primes = array( '1' => '1', '41' => '59', '2377' => '1677', '147299' => '187507', '9132313' => '5952585', '566201239' => '643566407', '35104476161' => '22071637057', '2176477521929' => '294289236153', '134941606358731' => '88879354792675', '8366379594239857' => '7275288500431249', '518715534842869223' => '280042546585394647' ); /* Ascii : 0 9, A Z, a z */ /* $chars = array_merge(range(48,57), range(65,90), range(97,122)) */ private static $chars62 = array( 0=>48,1=>49,2=>50,3=>51,4=>52,5=>53,6=>54,7=>55,8=>56,9=>57,10=>65, 11=>66,12=>67,13=>68,14=>69,15=>70,16=>71,17=>72,18=>73,19=>74,20=>75, 21=>76,22=>77,23=>78,24=>79,25=>80,26=>81,27=>82,28=>83,29=>84,30=>85, 31=>86,32=>87,33=>88,34=>89,35=>90,36=>97,37=>98,38=>99,39=>100,40=>101, 41=>102,42=>103,43=>104,44=>105,45=>106,46=>107,47=>108,48=>109,49=>110, 50=>111,51=>112,52=>113,53=>114,54=>115,55=>116,56=>117,57=>118,58=>119, 59=>120,60=>121,61=>122 ); public static function base62($int) { $key = ""; while(bccomp($int, 0) > 0) { $mod = bcmod($int, 62); $key .= chr(self::$chars62[$mod]); $int = bcdiv($int, 62); } return strrev($key); } public static function hash($num, $len = 5) { $ceil = bcpow(62, $len); $primes = array_keys(self::$golden_primes); $prime = $primes[$len]; $dec = bcmod(bcmul($num, $prime), $ceil); $hash = self::base62($dec); return str_pad($hash, $len, "0", STR_PAD_LEFT); } public static function unbase62($key) { $int = 0; foreach(str_split(strrev($key)) as $i => $char) { $dec = array_search(ord($char), self::$chars62); $int = bcadd(bcmul($dec, bcpow(62, $i)), $int); } return $int; } public static function unhash($hash) { $len = strlen($hash); $ceil = bcpow(62, $len); $mmiprimes = array_values(self::$golden_primes); $mmi = $mmiprimes[$len]; $num = self::unbase62($hash); $dec = bcmod(bcmul($num, $mmi), $ceil); return $dec; } } echo "<pre>"; foreach(range(1, 10) as $n) { echo $n." - "; $hash = PseudoCrypt::hash($n, 6); echo $hash." - "; echo PseudoCrypt::unhash($hash)."<br/>"; }
Output:
1 - cJinsP - 1 2 - EdRbko - 2 3 - qxAPdD - 3 4 - TGtDVc - 4 5 - 5ac1O1 - 5 6 - huKpGQ - 6 7 - KE3d8p - 7 8 - wXmR1E - 8 9 - YrVEtd - 9 10 - BBE2m2 - 10
Pretty random-looking, huh?
Base 62
Hashes are base 62 encoded base 10 integers. 1=1, 10=a, 36=Z, 61=z, 62=10, 72=1a, etc.
62, 3844, 238328, 14776336, 916132832, 56800235584, 3521614606208
1 character = 62 permutations, 2 characters = 3844 permutations, etc.
41, 2377, 147299, 9132313, 566201239, 35104476161, 2176477521929
41 = next highest prime from golden mean of 62.
2377 = next highest prime from golden mean of 3844.
Uniqueness
I chose to use primes to ensure hash uniqueness. Any prime greater than one half 62^n will do, but if you use a prime near 62^n or 62^n/2 or 2*62^n/3 etc, you will detect a linearity in the sequence at certain points in the ring.
Appearance of randomness
I chose primes near the golden ratio to maximize the appearance of randomness. Given a small set of hashes (even with the associated id) it would be difficult for anyone to guess the next hash.
This is a minimum security technique.
Keep your primes a secret and limit the number of hashes a user can get his hands on to make it harder for script kiddies to reverse engineer your algo. This is a thin rotation and base re-encoding obfuscation algorithm, not an encryption algorithm. Don’t use this to crypt sensitive info. Use it to obfuscate integer IDs.
Tools Used
Golden Ratio Calculator
Prime Numbers Generator
Special thanks to:
- Ernesto Preste for adding bc math support
- Padraig Kennedy for adding hash reversibility
- John Benediktsson for porting this to Python and Factor
- Alec Gorge for porting this to Node.js
Nice! this is what I was looking for. But it’d be nicer if you could code a decoder to get the original number back. Is there any chance I could get that? I don’t think ‘m capable to code it myself. Thanks for your time anyway!
Hey Kev, that’s very useful!
Obviously not if you need something secure, but to obfuscate ids it’s quite simple and powerful (compared with some things going around in the web).
Have you ever needed to unhash its results?
it may be of some use if you are hashing ids…
Wonderful function — but the output diverges above 58 on two systems I’m testing on. I suspect it’s because one’s a 32bit machine, and the other 64.
Any thoughts on a fix for that?
If I find one, I’ll come back and post.
Hey, that’s very useful!! I wanted to use unique IDs to generate a code as password for my users so they can come back later on the site, and enter this code, so they can edit what they entered. your algorithm is useful in my case.
thank you.
@Tim: I think, not needing the inverse function, it is not important to have different values for one number.
Hey German, if you want decodable ones — would printing random numbers in base 64 work? Most languages let you print in base 16 (hex) and usually (with some sleuthing) have ways to read/write base 64.
Try rining the script with: PseudoCrypt::udihash($id, 10)
All keys wil have values:
000000000000
000000000000
000000000000
000000000000
000000000000
000000000000
000000000000
000000000000
000000000000
000000000000
How to solve it?
10 is probably too large. I suggest you run it w/ 5 or 6 and append some random data.
$hash = PseudoCrypt::udihash($id, 5);
$rand = PseudoCrypt::base62(rand(0,pow(62,5)));
$rand = str_pad($rand, 5, “0″, STR_PAD_LEFT);
$key = $hash.$rand;
Hi, I implemented this in Factor, if you’re curious:
http://re-factor.blogspot.com/2011/03/unique-hash.html
How many total “unique” possibilities are we talking about here?
Me again :)
I wanted to know if you think it would be safe to run this command instead of your look.
echo PseudoCrypt::udihash(time());
I’m trying to create unique HASH to be inserted as KEY on database, but generating your way the hash always stays the same.
Please let me know if there is any draw back to this solution.
Thank you!
Jenny
Using time() would probably not be a good idea, just because there’s a good chance that two records may be created within the same second.
What I usually do instead for keys is
echo time().’-’.base_convert(rand(1,getrandmax()), 10, 36);
which yields a key like 1304134276-udxxi7
Kevin,
Great algorithm! I wanted to use it in my upcoming Node.js project, so I made a CommonJS/Node.js module available here: https://github.com/alecgorge/node-psuedohash
Thanks again for putting this together!
@Druczki
I realize your post is very old but I figure this might help someone else.
You will need more primes. If you look inside udihash function, the prime number is the one at the index corresponding to the length of your hash.
The golden_primes array only contains 8 values so referencing index 10 will fail. If you have error_reporting disabled, this will likely produce a false, null, or zero result which will result in the zero hashes you are getting.
Using php standard BC Math library You can run PseudoCrypt::udihash($id, 10)
class PseudoCrypt {
/* Next prime greater than 62 ^ n / 1.618033988749894848 */
private static $golden_primes = array(
‘1′,
‘41′,
‘2377′,
‘147299′,
‘9132313′,
‘566201239′,
‘35104476161′,
‘2176477521929′,
‘134941606358731′,
‘8366379594239857′,
‘518715534842869223′
);
/* Ascii : 0 9, A Z, a z */
/* $chars = array_merge(range(48,57), range(65,90), range(97,122)) */
private static $chars = array(
0=>48,1=>49,2=>50,3=>51,4=>52,5=>53,6=>54,7=>55,8=>56,9=>57,10=>65,
11=>66,12=>67,13=>68,14=>69,15=>70,16=>71,17=>72,18=>73,19=>74,20=>75,
21=>76,22=>77,23=>78,24=>79,25=>80,26=>81,27=>82,28=>83,29=>84,30=>85,
31=>86,32=>87,33=>88,34=>89,35=>90,36=>97,37=>98,38=>99,39=>100,40=>101,
41=>102,42=>103,43=>104,44=>105,45=>106,46=>107,47=>108,48=>109,49=>110,
50=>111,51=>112,52=>113,53=>114,54=>115,55=>116,56=>117,57=>118,58=>119,
59=>120,60=>121,61=>122
);
public static function base62($int) {
$key = “”;
while(bccomp($int,0) > 0) {
$mod = bcmod($int,62);
$key .= chr(self::$chars[$mod]);
$int = bcdiv($int,62);
}
return strrev($key);
}
public static function udihash($num, $len = 5) {
$ceil = bcpow(62, $len);
$prime = self::$golden_primes[$len];
$dec = bcmod(bcmul($num,$prime),$ceil);
$hash = self::base62($dec);
return str_pad($hash, $len, “0″, STR_PAD_LEFT);
}
}
I use this to create random-looking unique ids:
function bchexdec($hex) {
if(strlen($hex) == 1) {
return hexdec($hex);
} else {
$remain = substr($hex, 0, -1);
$last = substr($hex, -1);
return bcadd(bcmul(16, bchexdec($remain)), hexdec($last));
}
}
$uid=bchexdec(uniqid());
$hash=PseudoCrypt::udihash($uid,10);
I got over 20 million ID without collision before deciding to stop the script.
Nice algorithm, many thanks!!!
veryclever indeed. how about 3 char length hashes? is it safe to use it that low?
[...] of an md5 or something like that, but why reinvent the wheel – I found a small libaray here: http://blog.kevburnsjr.com/php-unique-hash which does the trick quite [...]
How to decode it back to its ids? In this example to 1,2 … 10? or i missed the point and this is one-way hash?
Ya, definitely a 1-way hash.
It can be reversed. Find the modular multiplicative inverse of the prime and the ceiling then run the process with that number in place of your $prime.
http://en.wikipedia.org/wiki/Modular_multiplicative_inverse
Example, say 27 is the index we want to obfuscate:
27 * 2176477521929 mod 62^7 = 2419059392755
http://www.wolframalpha.com/input/?i=27+*+2176477521929+mod+62%5E7
The modular multiplicative inverse is:
294289236153
http://www.wolframalpha.com/input/?i=PowerMod%5B2176477521929%2C-1%2C62%5E7%5D
And so 2419059392755 * 294289236153 mod 62^7 = 27
PS: If you are dealing with very long numbers like this, you should be using the bcmath functions in PHP.
Added these for using only uppercase and numbers. Simple test cases looked ok. Any comments/issues to anticipate?
public static function base36($int) {
$key = “”;
while($int > 0) {
$mod = $int-(floor($int/36)*36);
$key .= chr(self::$chars[$mod]);
$int = floor($int/36);
}
return strrev($key);
}
public static function udihash36($num, $len = 5) {
$ceil = pow(36, $len);
$prime = self::$golden_primes[$len];
$dec = ($num * $prime)-floor($num * $prime/$ceil)*$ceil;
$hash = self::base36($dec);
return str_pad($hash, $len, “0″, STR_PAD_LEFT);
}
Talk about zombie posting, but I thought I’d share some modifications.
It now has a constructor in place of the public static udihash function, so you can use it as $foo = new udihash($bar);
Rather than computing the modulus longhand, I replaced the calculations with the modulus operator. (%)
I also removed the lookup array for the letters, and calculated them directly as offset from $mod.
and finally, I added an internal container for the udihash object, and a __toString() function, so that the hash can be used by any function requiring a string.
class udihash {
/* Next prime greater than 62 ^ n / 1.618033988749894848 */
public $golden_primes = array(
1,41,2377,147299,9132313,566201239,35104476161,2176477521929
);
public $udi;
public static function base62($int) {
$key = “”;
while($int > 0) {
$mod = $int % 62;
if($mod 17 && $mod golden_primes[$len];
$dec = ($num * $prime) % $ceil;
$hash = $this->base62($dec);
$this->udi = str_pad($hash, $len, “0″, STR_PAD_LEFT);
}
public function __toString() {
return $this->udi;
}
}
Updated to use bcmath for up to 10 chars and now includes unhash method!
Thanks very much for updating with bcmath! :) I’d been working on that too!
When I tried your updated code I got some issues with the hash coming out as 12 characters even though I’d set it to 6.
I discovered that bcdiv was producing answers with fractional parts, and therefore using extra digits to the right of the decimal point. This made the foreach loop in “base62″ keep going for longer than it needed to (since we only need to deal with integers here).
I fixed it by adding the optional “scale” parameter for bcdiv and setting it to zero. Now the result of the division is truncated to be an integer, which is the correct behaviour.
$int = bcdiv($int, 62, 0);
When I tested this I got the same results as you’ve posted for the first 10 values. And in theory it should be slightly quicker since bcdiv doesn’t have to bother calculating any fractions!
The thing I can’t work out is why my original problem only seemed to occur intermittently. Sometimes the hash was 6 characters as expected, sometimes it was 12. That’s a bit scary so if anyone has any theories, please let me know!
Thank you very much,
I found a problem in this class
when i try to generate hash for number (1)
PseudoCrypt::hash(1, 6)
some times i got: 000000cJinsP
I have fixed this issue by replacing (in base62 method)
while(bccomp($int, 0) > 0) {
to:
while(bccomp($int-1, 0) > 0) {
is this true?
Call me stupid but I believe one should generate their own $golden_primes array correct? It’s not entirely clear – to those of us with math deficiencies – how we are to do this …
Any prime number greater than 1/2 62^n will work.
Here’s a tool for finding prime numbers
http://www.numberempire.com/primenumbers.php
I’ve learnt a lot from this, thanks for keeping it updated.
The only thing I’m stuck on is how you calculated the modular multiplicative inverse values in your `golden_primes` array?
See Padraig’s comment above.
[...] In my search for efficient algorithms that create unique indexes based on arbitrary values, I found this wonderful snippet of code on a Developer’s blog. [...]
Hi Kevin, and thanks for the code!
I’m attempting to use it purely for URL shortening purposes, rather than encryption. I’m using: hash(rand(1,6), 6) to generate the code.
However, I’ve hit a snag; every code being generated is preceded by 6 zeros, like “000005ac1O1″.
While I can see that you’re using str_pad() to do this, why isn’t this happening with your own codes?
And when I run your foreach() I get an error on: $mmi = $mmiprimes[$len]; in the unhash() function.
I don’t know enough about mathematics to even begin disentangling things, so any advice would be great.
Also, the codes that are being generated are exactly the same as those in the foreach() here, cycling over and over again.
So your use of random there is actually not the way this library is intended to be used. The first parameter to the hash function should be one of a unique series of integers (for instance, an autoincrement primary key from a relational database). Inputing the same number twice will produce the same result.
What’s the error you’re getting?
Did you copy the code correctly?
Are you on a 32 or 64 bit machine?
What operating system?
Does your php environment have bcmath php ext installed?
I suspect that you are on a 32 bit system.
This is wonderful stuff.
I’ve recoded it into Perl and made it work for 62 bit and 36 bit (upper [or lower] case only). When I finish the current project, I’ll push it to cpan but it’s a bit frantic at the moment.
FWIW, 36 bit hashes with a string length of 4 will happily encode 0 .. 99_999 uniquely which is brilliant for the application I have: a shortener that can be read out over the phone.
Thanks again, Mr Burns.
[...] http://blog.kevburnsjr.com/php-unique-hash [...]