Leak 2 = 177.5 million password hashes for 164.6 million LinkedIn users
But those 6.4 million unique hashes posted on a Russian password-cracking forum in June 2012 only accounted for a fraction of the total LinkedIn database. This second dump, on the other hand, contains 177.5 million password hashes for 164.6 million users, which aligns perfectly with LinkedIn’s user count in the second quarter of 2012. After validating the data that I received with several individuals, I concluded that this does appear to be a nearly complete dump of the user table from the 2012 LinkedIn hack.
As Ars explained a few months after the first batch of LinkedIn passwords spilled, password cracking is an endless feedback loop. We crack passwords so that we can learn how they are made; that knowledge helps us crack more passwords, which we then analyze and use to crack still more. We start off with a small amount of data that enables us to crack a small number of passwords. Those passwords then give us some insight into how passwords are created, which enables us to crack more in the future.
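The loop described above can be sketched in a few lines. This is a toy illustration, not any real cracker's algorithm: it assumes unsalted SHA1 targets (the scheme LinkedIn used), and the `learn` step is a hypothetical stand-in for the much richer analysis (Markov statistics, mangling rules) that real crackers derive from cracked plaintexts.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Unsalted SHA1, the scheme the LinkedIn hashes used."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def crack_round(targets: set, candidates: set) -> dict:
    """Hash every candidate once and keep the ones that match a target hash."""
    found = {}
    for word in candidates:
        digest = sha1_hex(word)
        if digest in targets:
            found[digest] = word
    return found


def learn(cracked_words: set) -> set:
    """Hypothetical 'learning' step: turn freshly cracked plaintexts into new guesses."""
    guesses = set()
    for word in cracked_words:
        guesses.add(word + "1")         # append a digit
        guesses.add(word.capitalize())  # capitalize the first letter
        guesses.add(word[::-1])         # reverse the word
    return guesses


def feedback_loop(targets: set, seed_words: set, max_rounds: int = 5) -> dict:
    """Crack, learn from what was cracked, and repeat until nothing new falls."""
    cracked = {}
    candidates = set(seed_words)
    for _ in range(max_rounds):
        remaining = targets - set(cracked)
        found = crack_round(remaining, candidates)
        if not found:
            break
        cracked.update(found)
        candidates = learn(set(found.values()))
    return cracked


if __name__ == "__main__":
    targets = {sha1_hex(w) for w in ["monkey", "monkey1", "Monkey1", "zebra"]}
    print(sorted(feedback_loop(targets, {"monkey"}).values()))
    # prints ['Monkey1', 'monkey', 'monkey1']
```

Starting from a single seed word, each round's cracks feed the next round's guesses; "zebra" survives because nothing learned so far points toward it, which is exactly why more input data means more cracks.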
And it’s not just passwords we’re interested in, either. Any short, low-entropy, human-generated string (usernames and screen names, e-mail addresses, and so on) is potentially useful. We’ve learned that, in the absence of external factors such as password complexity policies, the username selection process is not all that different from the password selection process. The more data we can accumulate and analyze, the more successful we are at cracking passwords.
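As a rough illustration of mining identifiers for candidates, the hypothetical helper below turns a username and an e-mail address into wordlist entries. The splitting and digit-stripping heuristics are assumptions chosen for the example, not a documented technique from any particular tool.

```python
import re


def identity_candidates(username: str, email: str) -> set:
    """Hypothetical helper: harvest low-entropy strings from account identifiers."""
    candidates = set()
    local_part = email.split("@", 1)[0]  # drop the domain
    for source in (username, local_part):
        source = source.strip()
        if not source:
            continue
        candidates.add(source)          # as typed
        candidates.add(source.lower())  # case-folded
        # Split on common separators such as ".", "_" and "-"
        for piece in re.split(r"[._-]+", source):
            if piece:
                candidates.add(piece.lower())
        # Strip digit runs (e.g. birth years) to recover the base word
        letters_only = re.sub(r"\d+", "", source)
        if letters_only:
            candidates.add(letters_only.lower())
    return candidates
```

For example, `identity_candidates("JSmith84", "john.smith@example.com")` yields `JSmith84`, `jsmith84`, `jsmith`, `john.smith`, `john`, and `smith`, each of which can be fed into the same wordlists and rules used for passwords.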
Back in the early days of password cracking, we didn’t have much insight into the way people created passwords on a macro scale. Sure, we knew about passwords like 123456, password, secret, letmein, monkey, etc., but for the most part we were attacking password hashes with rather barbaric techniques—using literal dictionaries and stupid wordlists like klingon_words.txt. Our knowledge of the top 1,000 passwords was at least two decades old. We were damn lucky to find a password database with only a few thousand users, and when you consider the billions of accounts in existence even back then, our window into the way users created passwords was little more than a pinhole.
Those were the dark ages of password cracking. The age of enlightenment came after 32 million non-unique plaintext passwords from RockYou were leaked to the Internet. Suddenly that pinhole turned into a porthole, and for the first time in history we got a solid look at how users were creating passwords on a mass scale.
The RockYou breach revolutionized password cracking. No longer were we using crap like list_of_kitchen_appliance_manufacturers.txt for wordlists. Everyone was just using rockyou.txt, and they were cracking a significant percentage of passwords. Markov statistics, mangling rules, everything was being based on what we learned from the RockYou passwords.
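Mangling rules are small string transformations applied to every word in a list such as rockyou.txt. The sketch below interprets a handful of functions modeled on Hashcat's rule syntax (`:` no-op, `l`, `u`, `c`, `r`, `$X` append, `^X` prepend, `sXY` substitute). It is a simplified illustration of the idea, not Hashcat's rule engine, and it supports only this subset.

```python
from typing import Iterable, Iterator


def apply_rule(word: str, rule: str) -> str:
    """Apply a rule built from a small subset of Hashcat-style rule functions."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op in ": ":      # no-op, or a space separating functions
            i += 1
        elif op == "l":     # lowercase the whole word
            word, i = word.lower(), i + 1
        elif op == "u":     # uppercase the whole word
            word, i = word.upper(), i + 1
        elif op == "c":     # capitalize the first letter, lowercase the rest
            word, i = word[:1].upper() + word[1:].lower(), i + 1
        elif op == "r":     # reverse the word
            word, i = word[::-1], i + 1
        elif op == "$":     # append the next character
            word, i = word + rule[i + 1], i + 2
        elif op == "^":     # prepend the next character
            word, i = rule[i + 1] + word, i + 2
        elif op == "s":     # replace every X with Y
            word, i = word.replace(rule[i + 1], rule[i + 2]), i + 3
        else:
            raise ValueError(f"unsupported rule function: {op!r}")
    return word


def mangle(wordlist: Iterable[str], rules: Iterable[str]) -> Iterator[str]:
    """Yield every word transformed by every rule, word-major order."""
    rules = list(rules)
    for word in wordlist:
        for rule in rules:
            yield apply_rule(word, rule)
```

With this subset, `apply_rule("password", "c $1")` gives `Password1` and `apply_rule("password", "sa@ so0")` gives `p@ssw0rd`, the kind of predictable tweaks that leaked plaintexts taught crackers to try first.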
The RockYou breach coincided with another turning point in password cracking history: the advent of general-purpose GPU computing. By harnessing the parallel processing capabilities of graphics cards we could now crack password hashes tens of times faster than with a regular CPU. Meanwhile, software like Hashcat helped bring GPU password cracking into the mainstream, displacing now-obsolete techniques like rainbow tables. Instead of pushing pixels, we were pushing RockYou-powered passwords, and we were cracking password hashes with unprecedented speed and success. This fueled a wave of new password research, and when other large password breaches came our way—eHarmony, Stratfor, Gawker, and LinkedIn, for instance—we were ready and waiting.
A global failure made worse
Let’s quickly remember why we hash passwords in the first place: password hashing is an insurance policy. It ensures that should the password database be compromised in any way or through any vector, including physical theft, the passwords will not be recovered until engineers have an opportunity to identify and contain the breach, notify the public, and give users a chance to change their passwords anywhere else they may have used them. The stronger and slower the password hashing is, the more time a site buys for itself and its users in the event of a breach.
Therein lies the problem. We’ve known about the necessity of slow hashing since the 1970s, yet due to a global failure in threat modeling, adoption has been extremely low. It is only in light of a string of high-profile breaches in the last five years that slow hashing has begun to make its way into the mainstream. Thanks to services like LinkedIn, which negligently failed to employ slow hashing (the combined 184 million passwords dumped in 2012 and this year all used unsalted SHA1), hackers have had more than a few fantastic opportunities to collect and analyze massive amounts of password data.
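To make the contrast concrete, here is a minimal sketch comparing unsalted SHA1 with a salted, deliberately slow scheme. PBKDF2-HMAC-SHA256 from Python's standard library is used only as one example of slow hashing (bcrypt, scrypt, and Argon2 are common alternatives); the article does not prescribe a specific algorithm, and the iteration count below is an assumed value for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for illustration; tune it to your own latency budget.
PBKDF2_ITERATIONS = 600_000


def unsalted_sha1(password: str) -> str:
    """LinkedIn-style storage: one fast hash, identical for identical passwords."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def slow_salted_hash(password: str, salt: Optional[bytes] = None,
                     iterations: int = PBKDF2_ITERATIONS) -> Tuple[bytes, bytes]:
    """Salted, deliberately slow storage using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt, stored next to the hash
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify(password: str, salt: bytes, expected: bytes,
           iterations: int = PBKDF2_ITERATIONS) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    _, digest = slow_salted_hash(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

`unsalted_sha1("password")` produces the same digest for every user and every site that uses it, so one guess tests the whole database at once. Two calls to `slow_salted_hash("password")` produce different digests because of their random salts, and each guess costs the attacker the full iteration count per user.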