A few months back, attackers got the email addresses and passwords of 130 million Adobe users. Adobe encrypted the passwords and was ridiculed for not hashing the values (considered best practice), but were they wrong?
Lately, I have been digging into password breaches, hashing, and related topics. I have been trying to grab as many of the breached password datasets as possible (please let me know if you have any) and came across the leaked Adobe passwords. Normally, passwords are hashed: a one-way transformation of the password is computed, and only the result is stored on the system.
HashFunction(Password) -> PasswordHash
When the user authenticates, the provided password is hashed in the same manner and the hashed value is checked against the stored value for a match. Passwords are stored like this because if the hashed values are stolen, there is no way to “decrypt” the data and get back to the original password. The first problem is that two users with the same password end up with the same hashed value, so each user is given a unique salt value that is concatenated with the password before the hash function is applied.
HashFunction(concatenateString(Salt, Password)) -> PerUserUniquePasswordHash
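As a rough illustration of the two formulas above, here is a minimal Python sketch. The choice of SHA-256, the 16-byte salt, and the helper names `hash_password` and `verify_password` are assumptions made for the example, not taken from any particular system.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = b"") -> str:
    # HashFunction(concatenateString(Salt, Password)); an empty salt gives the unsalted form.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def verify_password(candidate: str, salt: bytes, stored_hash: str) -> bool:
    # Re-hash the supplied password the same way and compare with the stored value.
    return hmac.compare_digest(hash_password(candidate, salt), stored_hash)

# Unsalted: two users with the same password get the same stored value.
unsalted_alice = hash_password("123456")
unsalted_bob = hash_password("123456")

# Salted: each user gets a random salt, stored next to the hash.
salt_alice, salt_bob = os.urandom(16), os.urandom(16)
salted_alice = hash_password("123456", salt_alice)
salted_bob = hash_password("123456", salt_bob)

print(unsalted_alice == unsalted_bob)  # True
print(salted_alice == salted_bob)      # False
print(verify_password("123456", salt_alice, salted_alice))  # True
```

The salt is not secret; it only has to differ per user so that identical passwords no longer produce identical stored values.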
So, when lists of these hashes are stolen, how do attackers get the passwords? Classically, the hashing mechanisms were a single pass of algorithms like MD5 or SHA. The attackers will attempt to hash a large number (trillions) of possible password combinations to see if they can find a match. The classic hash functions are computationally simple. Computing power in Graphics Processing Units (GPUs), aka graphics cards, has reached the point that it is not very expensive to build a machine that can check 60+ billion hashes a second.
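A minimal sketch of that guessing attack, assuming unsalted single-pass MD5 and a tiny hypothetical wordlist. The "leaked" hashes are generated inside the example rather than taken from any real dump. The last lines do the back-of-the-envelope arithmetic for the 60 billion hashes per second figure, using an assumed keyspace of eight-character passwords drawn from lowercase letters and digits.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Stand-in for a leaked table of unsalted hashes (made up for this example).
leaked_hashes = {md5_hex("123456"), md5_hex("passw0rd"), md5_hex("Tr0ub4dor&3")}

# Hypothetical wordlist; real attacks use far larger lists plus mangling rules.
wordlist = ["password", "123456", "qwerty", "passw0rd", "letmein"]

cracked = {}
for candidate in wordlist:
    digest = md5_hex(candidate)
    if digest in leaked_hashes:
        cracked[digest] = candidate

print(sorted(cracked.values()))  # ['123456', 'passw0rd']

# Rough time to exhaust a keyspace at a given guess rate.
keyspace = 36 ** 8            # 8 characters drawn from a-z and 0-9
rate = 60_000_000_000         # 60 billion hashes per second, as quoted above
seconds_to_exhaust = keyspace / rate
print(f"{seconds_to_exhaust:.0f} seconds")  # 47 seconds
```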
So, to help thwart this problem, systems will use hash functions and iterate several thousand times, increasing the cost to break.
IterationCount*HashFunction(concatenateString(Salt, Password)) -> PerUserUniquePasswordHash
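The iteration idea can be sketched two ways: a naive loop that mirrors the notation above, and PBKDF2, a standard construction available in Python's `hashlib`. The iteration count of 100,000, the use of SHA-256, and the function name `iterated_hash` are assumptions for illustration only.

```python
import hashlib
import os

def iterated_hash(password: str, salt: bytes, iterations: int) -> bytes:
    # Naive version: feed the output of each pass back into the hash function.
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha256(digest).digest()
    return digest

salt = os.urandom(16)
iterations = 100_000

naive = iterated_hash("123456", salt, iterations)

# PBKDF2-HMAC-SHA256 applies the same "repeat many times" idea in a vetted form.
pbkdf2 = hashlib.pbkdf2_hmac("sha256", b"123456", salt, iterations)

print(naive.hex())
print(pbkdf2.hex())
```

Each guess now costs the attacker `iterations` hash computations instead of one, while a legitimate login pays that cost only once.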
Another technique is to use algorithms (like scrypt) that are computationally hard for GPUs, usually because they require lots of memory. The scrypt parameters must be chosen carefully for it to actually be hard on a GPU, but that is for another post. Even with these computationally hard algorithms, targeted attacks can still recover passwords, but again, that’s for another post.
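For reference, Python exposes scrypt through `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later). The cost parameters below (`n=2**14`, `r=8`, `p=1`) are illustrative values and not a recommendation; the wrapper name `scrypt_hash` is hypothetical.

```python
import hashlib
import os

def scrypt_hash(password: str, salt: bytes, n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    # Memory use is roughly 128 * n * r bytes (about 16 MiB with these values).
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=n,
        r=r,
        p=p,
        maxmem=64 * 1024 * 1024,  # allow up to 64 MiB for the computation
        dklen=32,
    )

salt = os.urandom(16)
stored = scrypt_hash("123456", salt)
print(stored.hex())
```

Raising `n` or `r` increases the memory each guess needs, which is the property that makes massively parallel GPU guessing more expensive.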
So, back to the original premise! Adobe had a breach where the attackers gained access and published the email addresses, passwords and password hints for 130 million of their users. Adobe did something “bad”; they encrypted the passwords (as opposed to hashing) so the values could all be reversed/decrypted. The security community ridiculed Adobe for storing the passwords in a reversible / decryptable format. But was Adobe wrong?
Now, I need to point something out at this time. Usually, when hashed password databases are leaked, researchers end up breaking 70%+ through various brute force methods. Thus far, almost no Adobe passwords have been broken: some have been “broken” through data inference, but none by decrypting. The only way to decrypt Adobe’s leaked passwords is to figure out the key used in encrypting them, which, thus far, is not publicly known.
Let's look at some of their mistakes:
1. The password hints were stored alongside the passwords and, in many cases, contained the passwords themselves. These should have also been encrypted.
147381145-|--|-XXXXX@XXXXX.net-|-EQ7fIpT7i/Q=-|-123456|--
So it can be assumed that “EQ7fIpT7i/Q” is the encrypted value of 123456; as a result, the person with the above .net address probably uses 123456 at other sites.
2. The individual passwords were not salted (in this case no IV) so users with the same password had the same encrypted value.
Raw data from the Adobe breach file:
147381145-|--|-XXXXX@XXXXX.net-|-EQ7fIpT7i/Q=-|-123456|--
155033406-|--|-XXXXX@XXXXXX.us-|-EQ7fIpT7i/Q=-|-123456|--
Both of these users have the same hint of 123456 and the same encrypted value. A grep for the encrypted string ‘EQ7fIpT7i/Q’ piped into wc -l shows 1,911,938 matches.
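3. Adobe was also using a block cipher with an eight-byte block size, probably DES, which encrypts the data in eight-byte increments. Because they did not use a salt, each eight-byte block encrypts the same way no matter what surrounds it, so the first eight bytes of encrypted text for password and password123 would be the same. The same holds for later blocks: in the example below, two passwords that differ in their first eight characters still share an identical second block.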
Value | DES ECB Encrypted Value in Hex |
password1234 | 0x8a65e0e80532b5fa bf67cab8afccfa27 |
P@55w0Rd1234 | 0xfdf058ce01785b90 bf67cab8afccfa27 |
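The block-by-block behaviour in the table can be reproduced with a small sketch. Python's standard library has no DES, so the code below swaps in a toy eight-byte Feistel cipher built on SHA-256. It is not DES and not secure; it only serves to show what ECB mode does. The key, the padding scheme, and all function names are assumptions for illustration, so the ciphertext values will not match the table.

```python
import hashlib

BLOCK = 8  # bytes, the same block size as DES

def _round(half: bytes, key: bytes, i: int) -> bytes:
    # Round function: keyed hash of one 4-byte half, truncated to 4 bytes.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:4]

def toy_encrypt_block(block: bytes, key: bytes, rounds: int = 4) -> bytes:
    # Balanced Feistel network over one 8-byte block (toy stand-in, NOT DES).
    left, right = block[:4], block[4:]
    for i in range(rounds):
        f = _round(right, key, i)
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right

def pad(data: bytes) -> bytes:
    # PKCS#7-style padding, assumed here purely for the example.
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n

def ecb_encrypt(plaintext: bytes, key: bytes) -> list:
    # ECB: every block is encrypted independently, with no IV and no chaining.
    padded = pad(plaintext)
    return [toy_encrypt_block(padded[i:i + BLOCK], key)
            for i in range(0, len(padded), BLOCK)]

key = b"demo-key"  # hypothetical key
a = ecb_encrypt(b"password1234", key)
b = ecb_encrypt(b"P@55w0Rd1234", key)
c = ecb_encrypt(b"password123", key)

print(a[0] == b[0])  # False: "password" vs "P@55w0Rd"
print(a[1] == b[1])  # True: both second blocks encrypt "1234" plus padding
print(a[0] == c[0])  # True: both first blocks encrypt "password"
```

A mode that mixes a per-record IV into the first block and chains blocks together would break these equalities, which is why the missing salt/IV matters here.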
Looking at all the Adobe password hints beginning with the string ‘password’ reveals some interesting patterns in the encrypted values. I added what I believe the cleartext value is:
Count | Encrypted Value (Base64) | Suspected Cleartext |
521 | EQ7fIpT7i/Q | 123456 |
662 | pIjmc+3/mLzioxG6CatHBw | pa55word |
726 | STWrgIvDDp3ioxG6CatHBw | P@ssw0rd |
729 | ygtKdMXm1tHioxG6CatHBw | drowssap |
1466 | IbF1vGcYjCrioxG6CatHBw | passw0rd |
2726 | L8qbAD3jl3jSPm/keox4fA | password1 |
5890 | L8qbAD3jl3jioxG6CatHBw | password |
102 | g+/hUkh3HrbioxG6CatHBw | Password |
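The kind of counting behind the table above can be sketched in a few lines. This assumes each record has the layout `id-|-username-|-email-|-ciphertext-|-hint|--`, as in the raw lines quoted earlier. The first two sample records come from this post; the last two are made up for the example, and `parse_record` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def parse_record(line: str):
    # Normalise en-dashes that some renderings substitute for "--".
    line = line.strip().replace("\u2013", "--")
    # Assumed layout: id-|-username-|-email-|-ciphertext-|-hint|--
    fields = line.split("-|-", 4)
    if len(fields) != 5:
        return None
    user_id, _username, email, ciphertext, hint = fields
    if hint.endswith("|--"):
        hint = hint[:-3]
    return user_id, email, ciphertext, hint

records = [
    "147381145-|--|-XXXXX@XXXXX.net-|-EQ7fIpT7i/Q=-|-123456|--",
    "155033406-|--|-XXXXX@XXXXXX.us-|-EQ7fIpT7i/Q=-|-123456|--",
    "100000001-|--|-someone@example.com-|-EQ7fIpT7i/Q=-|-one to six|--",  # made up
    "100000002-|--|-nohint@example.com-|-EQ7fIpT7i/Q=-|-|--",             # made up
]

ciphertext_counts = Counter()
hints_by_ciphertext = defaultdict(Counter)

for line in records:
    parsed = parse_record(line)
    if parsed is None:
        continue
    _user_id, _email, ciphertext, hint = parsed
    ciphertext_counts[ciphertext] += 1
    if hint:
        hints_by_ciphertext[ciphertext][hint] += 1

for ciphertext, count in ciphertext_counts.most_common():
    best_hint = hints_by_ciphertext[ciphertext].most_common(1)
    print(count, ciphertext, best_hint)
```

Note that the user with no hint at all is exposed too: anyone sharing a ciphertext shares the password, so one careless hint gives away everyone in the group.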
So, the next question becomes: if the algorithm is using an eight-byte boundary, why do all these passwords of length 8 end with the encrypted text ‘ioxG6CatHBw’?
A grep for the password hint ‘1-8’ resulted in:
1484 | j9p+HwtWWT/ioxG6CatHBw | 12345678 |
I can only assume the password system was including a null or other extra character. Searching for the string ‘ioxG6CatHBw’ turns up 36,045,481 occurrences, so it can be assumed these passwords all have a string length of eight. To test the extra-character theory further, looking for password hints with the string ‘1-7’ results in 2,343 occurrences, with the majority having the encrypted string ‘dQi0asWPYvQ’. Digging further, there are 124,253 passwords with the exact string ‘dQi0asWPYvQ’; no passwords had this as a partial match. If the extra character is a null, then it would not be typeable.
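The block structure discussed above can be inspected directly by Base64-decoding the values and splitting them into eight-byte blocks. The sketch below uses ciphertexts quoted in this post; the cleartext labels are the suspected values, not confirmed ones, and the helper name `blocks` is hypothetical. An eight-byte boundary falls in the middle of the eleventh Base64 character, so comparing decoded bytes is more reliable than comparing substrings.

```python
import base64

def blocks(b64_value: str, size: int = 8) -> list:
    # Restore the stripped "=" padding, decode, and split into cipher blocks (as hex).
    raw = base64.b64decode(b64_value + "=" * (-len(b64_value) % 4))
    return [raw[i:i + size].hex() for i in range(0, len(raw), size)]

samples = {
    "password": "L8qbAD3jl3jioxG6CatHBw",
    "password1": "L8qbAD3jl3jSPm/keox4fA",
    "P@ssw0rd": "STWrgIvDDp3ioxG6CatHBw",
    "12345678": "j9p+HwtWWT/ioxG6CatHBw",
    "123456": "EQ7fIpT7i/Q",
}

decoded = {label: blocks(value) for label, value in samples.items()}

for label, blks in decoded.items():
    print(f"{label:10s} {' '.join(blks)}")

# Same first eight characters, same first block.
print(decoded["password"][0] == decoded["password1"][0])  # True
# Different eight-character passwords, identical trailing block.
print(decoded["password"][1] == decoded["P@ssw0rd"][1] == decoded["12345678"][1])  # True
# The suspected six-character password fits in a single block.
print(len(decoded["123456"]))  # 1
```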
So they did not properly normalize their passwords; I am sure that made for great portability.
Most of Adobe’s problems were crypto related: doing cryptography right is hard, and they failed. Even with this failure, they still ended up with better protection than regular hashing because the attackers did not get the symmetric key with the data. Could they have done better? Without a doubt. But had they followed industry best practices, many more of the passwords would have been broken. If the attackers had obtained the key along with the dataset, then it would have been trivial to get the cleartext values for all of the passwords.