Friday, December 9, 2016

Secure Hashing of Passwords

Numerous breaches happen every day due to security vulnerabilities. On this blog, I have previously analyzed some of the biggest breaches that have been made public. All of them serve as a testament to the inadequate security many companies employ.

The Time it Takes to Crack a Hash

I often get asked: "How long does it take to crack an MD5 hash?" - implying that the cryptographic hash algorithm is the most important factor, which is rarely the case. It actually depends on several factors, which I've ordered by descending importance below:
  1. The length of the password
  2. The charsets used in the password
  3. The amount of hardware you have
  4. The methodology you use for cracking

Secure Passwords

Password length alone is not enough; you also have to use a good charset to control the search space hackers have to go through. Let's look at an example of how much the search space changes:

Charset = a-z, A-Z, 0-9
Length 6: (26+26+10)^6 = 56.800.235.584
Length 7: (26+26+10)^7 = 3.521.614.606.208
Length 8: (26+26+10)^8 = 218.340.105.584.896

As you can see, the number gets a lot bigger for each character we add. Now let's look at charsets:

Length = 8
Charset: a-z = (26)^8 = 208.827.064.576
Charset: a-z, A-Z = (26+26)^8 = 53.459.728.531.456
Charset: a-z, A-Z, 0-9 = (26+26+10)^8 = 218.340.105.584.896

Charsets are at least as important as length. In reality, we need to increase both in order to get good passwords. For reference, MD5 hashes can be cracked at around 15.000.000.000 hashes per second on an Nvidia Titan X graphics card. A password of length 8 using a-z, A-Z, 0-9 would take (26+26+10)^8 / 15.000.000.000 = 14.556 seconds (about 4 hours) to exhaust. That is not long enough to deter hackers - we need to do something about the hashing algorithm as well.
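
To make the arithmetic above easy to repeat for other lengths and charsets, here is a tiny C# sketch (the 15 billion H/s figure is just the Titan X estimate quoted above; plug in your own numbers):

    using System;

    class KeyspaceEstimate
    {
        static void Main()
        {
            double charset = 26 + 26 + 10;          // a-z, A-Z, 0-9
            double keyspace = Math.Pow(charset, 8); // every possible 8-character password
            double hashesPerSecond = 15e9;          // rough MD5 speed on a Titan X

            double hours = keyspace / hashesPerSecond / 3600;
            Console.WriteLine("{0:N0} candidates, roughly {1:N1} hours to exhaust", keyspace, hours);
        }
    }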

The Cryptographic Hash Algorithms

Over time, developers have used many different cryptographic hash algorithms to store passwords. Most systems still use rather weak hash algorithms, which is why this blog post even exists. Here are the hashing algorithms ranked from strong to weak:

1. scrypt, bcrypt, PBKDF2
2. SHA512, SHA256
3. SHA1, MD5
4. NTLM, MD4
5. LM, DES (unix crypt)

I could write 100 blog posts about each algorithm, but suffice it to say that LM and DES used to be VERY widespread in the days of Windows XP and Linux 2.x. Both have since moved on to stronger algorithms, but Windows is stuck on NTLM, and Linux still uses MD5, SHA256 and SHA512 for the most part, though it is transitioning nicely to scrypt and PBKDF2.

You might be wondering what the speed of each algorithm is, so I'll include them here. A higher number of hashes per second is worse (faster to crack). The speeds are from an Nvidia GeForce GTX 1080 graphics card running Hashcat:
  1. scrypt - 434.000 H/s
  2. bcrypt - 13.226 H/s
  3. PBKDF2 (SHA1) - 3.218.000 H/s
  4. SHA512 - 1.021.000.000 H/s
  5. SHA256 - 2.829.000.000 H/s
  6. SHA1 - 8.491.000.000 H/s
  7. MD5 - 24.809.000.000 H/s
  8. NTLM - 41.354.000.000 H/s
  9. MD4 - 43.145.000.000 H/s
  10. LM - 18.429.000.000 H/s
  11. DES - 909.000.000 H/s

You might say: "Hold on a minute! DES is actually slower than SHA512 - does that not mean it is more secure?"

Nope! That's because the way the Linux guys implemented DES limits passwords to a maximum of 8 characters, and the salt is stored in plain text as the first 2 characters of the hash, so the search space stays tiny. LM hashes are also limited to 14 characters, but the way Microsoft made the algorithm is flawed: the password is split into two 7-character halves that are hashed independently, so we only ever need to crack 2x7 character passwords.

The Salted Hash

Password salts are needed in order to protect against rainbow table attacks. Hackers have access to large precomputed tables of hashes covering many charsets and many different cryptographic hash algorithms, and when they target just a single unsalted hash, those tables can be quite effective.

The salt is just a binary blob that gets appended to the password during the hashing process. A password of '123456' becomes '123456[SaltHere]', making it impractical to precompute a table that contains '123456[SaltHere]'. Salts should be unique for each user in the database, which makes the protection even more effective.

The right way to salt the password is to use a cryptographically secure pseudo-random number generator (CSPRNG) to create a salt that is unique for each user. It needs to be at least 64 bits (8 bytes) to be effective. The salt is stored in plain text together with the hash inside the user database.
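
As a rough sketch of what that could look like in C# (the class and method names here are just for illustration), using the framework's CSPRNG:

    using System.Security.Cryptography;

    static class SaltGenerator
    {
        // Generates a unique, cryptographically random salt for one user.
        // 16 bytes comfortably exceeds the 64-bit minimum mentioned above.
        public static byte[] NewSalt(int sizeInBytes = 16)
        {
            var salt = new byte[sizeInBytes];
            using (var rng = RandomNumberGenerator.Create())
            {
                rng.GetBytes(salt);
            }
            return salt;
        }
    }

Store the returned bytes alongside the password hash for that user; there is no need to keep the salt secret.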

The Slow Hash

Cryptographic hash functions like MD5 and SHA1 are designed with high performance in mind. This means they can quickly produce a hash of an arbitrary input, but it also means they lend themselves to brute force and wordlist attacks. To slow down the hashing procedure, we use something called a key derivation function (KDF); scrypt, bcrypt and PBKDF2 are such functions. They take a password, a salt and a number of iterations. Most implementations actually just require a password and a "work factor", which simplifies the use of the algorithms.

They are built on cryptographic primitives like SHA1, SHA256 or Blowfish, which are then run for the configured number of iterations together with the salt. They also apply the salt in an HMAC fashion, which protects against many different attacks on the underlying hash functions.

As you can see from the hash speed list above, scrypt, bcrypt and PBKDF2 are much, much slower than their relatives. This slowdown is configurable, which makes them perfect for password hashing. Developers implementing these algorithms should aim for a number of iterations that makes a single password hashing operation take 50 ms or more on their server.
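
One way to pick that iteration count is to measure it on the server itself. The sketch below uses the built-in Rfc2898DeriveBytes (PBKDF2 with HMAC-SHA1) and keeps doubling the iterations until a single derivation takes at least 50 ms; the starting count and the test password are arbitrary:

    using System;
    using System.Diagnostics;
    using System.Security.Cryptography;

    class Pbkdf2Calibration
    {
        static void Main()
        {
            var salt = new byte[16];
            using (var rng = RandomNumberGenerator.Create())
                rng.GetBytes(salt);

            int iterations = 10000;

            while (true)
            {
                var watch = Stopwatch.StartNew();
                using (var kdf = new Rfc2898DeriveBytes("test password", salt, iterations))
                {
                    kdf.GetBytes(32); // derive a 256-bit key
                }
                watch.Stop();

                if (watch.ElapsedMilliseconds >= 50)
                    break;

                iterations *= 2; // too fast - double the work factor and try again
            }

            Console.WriteLine("Use at least {0} iterations on this server", iterations);
        }
    }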

If we take the 8 character password with the charset a-z, A-Z, 0-9 from above, which took about 4 hours to crack as MD5, and instead hash it with PBKDF2, it would take around 2 years to crack!

Conclusion

Use scrypt, bcrypt or PBKDF2 with a work factor high enough that it takes at least 50 ms to compute a single hash on your server. These algorithms take care of hashing the correct way, applying a salt, and their implementations usually use a CSPRNG to generate the salt for you. Stay away from everything else.

Saturday, December 3, 2016

The problem with Network Address Translation

As a technology ideologist, Network Address Translation (NAT) is one of my biggest concerns when it comes to the future of the Internet. The Internet was built as a communications tool to facilitate the sharing of information in digital form. It has vastly improved the communication between humans around the planet and it has been one of the most important inventions we have ever made.

My concern is that NAT is a direct inhibitor of the nature of the Internet, which goes against everything we want the Internet to be.

Network Address Translation

The Internet uses routers to route data from network to network, but to do so, we need an address for each network. Today we use Internet Protocol version 4 (IPv4), which you usually see in dotted format such as 199.181.132.250. In its raw form, IPv4 is a 32 bit (4 byte) addressing scheme, which gives us 2^32 (4.294.967.296) possible addresses. That was plenty in the early days of the Internet, but it is nowhere near enough today. To solve this problem, Internet Protocol version 6 (IPv6) was created. Its 128 bits give us 2^128 (340.282.366.920.938.463.463.374.607.431.768.211.456) addresses, which should be enough to last us for a long time.

Routers needed a firmware upgrade in order to route the larger 128-bit addresses, and back when IPv6 was devised (1998), consumer routers simply did not have the capacity to handle the larger address space. A quick workaround was NAT, a simple mechanism that translates one address into another. With this capability, Internet Service Providers (ISPs) began to distribute routers pre-configured with NAT enabled. They had DHCP enabled on the LAN interface handing out private, non-routable IP addresses (for example 192.168.0.0/24), which then got translated to a routable IP address on the WAN interface. This allowed ISPs to extend the lifetime of IPv4 and keep using low-capacity consumer routers.

That does not sound too bad, right? It is a clever way to circumvent the limitations of IPv4 and keep using existing hardware which was produced for cheap in large quantities.

The Problem

A side-effect of NAT is that it splits the Internet into many smaller private networks, which can't communicate directly with each other unless you do port forwarding. This side effect has been documented as a "security feature" ever since it was conceived, as it essentially functions like a simple firewall between networks.

Remember what the purpose of the Internet was? That's right, to transmit digital information between machines! NAT completely contradicts the whole purpose of the Internet by segmenting the Internet, just because we saw NAT as a quick fix to our IPv4 addressing problem. We have become accustomed to the fact that NAT exists. It has served its purpose, and it is time we move on to IPv6 and make the Internet work as intended again.

The Solution

By now you should already know that IPv6 is the solution to the NAT problem, but there is still an unanswered question: What about the security NAT provides?

Let me counter-question you with this: What is it we are trying to achieve?

I'd agree that the Internet should not just be one big monolithic security boundary where any hosted service can be reached by everyone else. That is why we have firewalls, which NAT is not. You still have a router with two networks, and in order to publish a service from one network to the other, a firewall rule has to allow it. Firewalls are a much more efficient solution to the security problem than NAT is.

Let's say we enabled IPv6, what exactly would that mean?

That is the funny part: you are most likely already running IPv6! Modern operating systems ship with a dual-stacked network layer capable of both IPv4 and IPv6, so you don't have to do anything. Most routers are also already dual-stacked; we just have to disable IPv4 and tadaaaa, you are IPv6 only.
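
For application developers, being dual-stacked mostly means binding to the IPv6 wildcard address and letting the operating system handle both protocols. A minimal C# sketch (the port number is arbitrary):

    using System;
    using System.Net;
    using System.Net.Sockets;

    class DualStackListener
    {
        static void Main()
        {
            // One listener that accepts both IPv6 and IPv4 clients
            // (IPv4 peers show up as IPv4-mapped IPv6 addresses).
            var listener = new TcpListener(IPAddress.IPv6Any, 8080);
            listener.Server.DualMode = true; // must be set before Start()
            listener.Start();

            using (var client = listener.AcceptTcpClient())
            {
                Console.WriteLine("Connection from " + client.Client.RemoteEndPoint);
            }

            listener.Stop();
        }
    }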

For router manufacturers, developers and network engineers, it is a huge advantage to completely disable IPv4, which is why many ISPs today run IPv6-only networks internally. As a manufacturer or developer, you can remove NAT, TURN, STUN, IGDP and NAT-PMP from routers and communication software. Of course, we still need a network control protocol that publishes a service in the router's firewall, but it is much simpler now that you don't have to take NAT into account.

A Real Example

The BitTorrent protocol is one of the most commonly occurring protocols on the Internet. It is a simple protocol for transmitting binary data between clients in an efficient manner using Peer to Peer (P2P). Clients find each other by using a Tracker or what's known as a Distributed Hash Table (DHT). DHT is a simple protocol, but it is severely limited by NAT, simply because it works in a very socialistic manner. Each client in the DHT network has to store a chunk of data for it to work, but since most of them now sit behind NAT-enabled routers, they can't be reached by the rest of the network, making the data they hold unreachable by others. This fundamentally undermines the whole concept of a Distributed Hash Table! All BitTorrent applications have implemented one of the many protocols to circumvent NAT, but none of them are perfect. If we switched to IPv6-only networks, DHTs could finally work again, and we would see a massive gain in performance as clients find each other more efficiently.

Other P2P protocols like Skype would also work better and would not have to route through a third party. Back in 2011 when Microsoft bought Skype, they began transitioning the Skype P2P protocol from a highly distributed node network into a more centralized platform based on the Microsoft Notification Protocol (MSNP). The centralized protocol works "better" because Microsoft's servers act as a broker between you and other clients, thereby circumventing NAT. If both clients have IPv6 and publish the Skype service in their router firewalls, they could reach each other directly and instantly get a massive performance and stability gain.

Maybe this is just wishful thinking, but it does seem that IPv6 adoption is gaining speed, which is good news for the Internet as a whole.

Monday, December 1, 2014

.NET Compression Libraries Benchmark

I often find myself needing a compression library in a software development project, and after a few tests with disappointing results, I usually fall back to spawning 7Zip as an external process. This time, however, the application specification prohibited writing any data to disk, which 7Zip needs to do in order to work correctly. I needed to do everything in memory, and so I had to test some of the C# compression libraries available, as well as the different compression algorithms.

Here is a full list of the tested libraries:
  • .NET 4.5.1 (built-in)
  • ICSharpCode (SharpZipLib)
  • 7z SDK (LZMA)
  • Zlib.NET
  • DotNETZip
  • LZMA-managed
  • QuickLZ

Compression 101

Before I get to the results of the benchmark, I think it is important to understand the different archive formats and compression algorithms in order to compare the results with other benchmarks and/or tests. Let's get to it!

A compression algorithm can be compared to a small machine that takes in some data, does some math on it and transforms it into new data - just smaller in size. It sounds simple, but it involves a lot of math to do it efficiently. The best compression algorithms are the ones that achieve a high compression ratio while only using small amounts of RAM and CPU. One such algorithm is the well known DEFLATE, which internally uses the famous LZ77 algorithm created by Abraham Lempel and Jacob Ziv. DEFLATE has become the de facto standard when it comes to compression, and it is used in nearly all situations where a compression algorithm is needed. There are many other compression algorithms out there, some better than DEFLATE and also very widespread. Examples include LZW, BWT, LZSS and LZMA.
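
To make the "data in, smaller data out" idea concrete, here is a minimal in-memory DEFLATE round trip using the DeflateStream class that ships with .NET; it is only a sketch, not a tuned implementation:

    using System.IO;
    using System.IO.Compression;

    static class Deflate
    {
        public static byte[] Compress(byte[] data)
        {
            using (var output = new MemoryStream())
            {
                using (var deflate = new DeflateStream(output, CompressionMode.Compress))
                    deflate.Write(data, 0, data.Length);
                return output.ToArray(); // safe to call after the DeflateStream is closed
            }
        }

        public static byte[] Decompress(byte[] compressed)
        {
            using (var deflate = new DeflateStream(new MemoryStream(compressed), CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                deflate.CopyTo(output);
                return output.ToArray();
            }
        }
    }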

Each compression algorithm only takes in a single file, either as a blob of bytes or a stream of bytes. The question is: how do compression applications such as 7Zip, WinZip and WinRAR compress multiple files?

Enter the world of archive formats. We all know Zip, BZip2, GZip and RAR. They simply describe their own contents, either in terms of files or streams of data, along with some metadata such as original size, file name and path. Archive formats are more complex than you would think, simply because they support a lot of features. Some of them support multiple data streams (for multi-threading), encryption, password protection, file name obfuscation, self-extracting archives and more.

Figure 1: The relationship between files, compression algorithms and archive formats.

Many of the archiving formats are now synonymous with the compression algorithm they use. We say "Zip the file" or "Make a 7z of the folder". In reality, you can combine the different archive formats with any of the compression algorithms, so you can have combinations such as Zip with LZMA or 7z with DEFLATE. That being said, some archive formats have default compression algorithms. I've listed some examples here in archive/algorithm format:
  • Zip / DEFLATE
  • 7zip / LZMA
  • Bzip2 / BWT
  • Gzip / DEFLATE
  • GIF / LZW
  • PNG / DEFLATE
  • Tar / none
  • Rar / LZSS
The avid reader will have noticed that I put a couple of image formats in the list (GIF, PNG). Both of them use compression algorithms to minimize the size of the image data, and are therefore technically archive formats. The same applies to Microsoft Excel 2013 files (xlsx / DEFLATE) and Java Archives (Jar / DEFLATE).

In the Linux world, compressing multiple files was solved with an archiving format called Tar. It is a pretty simple format originally designed for tape archiving, which preserves filesystem information such as timestamps, owner information and file size for the files held inside the archive. To compress a directory, Tar simply stores all the files within the directory inside a single file, which is then compressed using DEFLATE (gzip) or BWT (bzip2).
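
To illustrate the split between archive format and compression algorithm in code, the sketch below puts two files into a Zip archive entirely in memory using the ZipArchive class from System.IO.Compression; each entry is compressed with DEFLATE, while the Zip container itself only records names and metadata (the file names and contents are made up):

    using System.IO;
    using System.IO.Compression; // add a reference to the System.IO.Compression assembly

    static class ZipExample
    {
        public static byte[] CreateZip()
        {
            using (var ms = new MemoryStream())
            {
                using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
                {
                    foreach (var name in new[] { "readme.txt", "notes.txt" })
                    {
                        ZipArchiveEntry entry = archive.CreateEntry(name, CompressionLevel.Optimal);
                        using (var writer = new StreamWriter(entry.Open()))
                            writer.Write("Contents of " + name);
                    }
                }
                return ms.ToArray(); // a complete .zip file, entirely in memory
            }
        }
    }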

Benchmark

Now that we know the difference between compression algorithms and archive formats, we can continue to the benchmark. The libraries tested here overlap in the sense that several of them are simply different implementations of the same compression algorithms, and even though the name of a library might contain "Zip", it often has nothing to do with the actual Zip archive format, but everything to do with the compression algorithm used by default in Zip files.

The benchmark uses a 128 MB test file that contains images, music, text and some random data. The file represents arbitrary data usually found on a filesystem.
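
The measurement itself is straightforward. The sketch below shows roughly how each library can be timed; the two delegates stand in for whichever library and algorithm is being tested, and the real benchmark may differ in details:

    using System;
    using System.Diagnostics;

    static class CompressionBenchmark
    {
        // Times a compress/decompress pair and prints the same metrics as the table below.
        public static void Measure(string name, byte[] testData,
                                   Func<byte[], byte[]> pack, Func<byte[], byte[]> unpack)
        {
            var packWatch = Stopwatch.StartNew();
            byte[] packed = pack(testData);
            packWatch.Stop();

            var unpackWatch = Stopwatch.StartNew();
            byte[] restored = unpack(packed);
            unpackWatch.Stop();

            if (restored.Length != testData.Length)
                throw new InvalidOperationException("Round trip failed for " + name);

            double packedMb = packed.Length / (1024.0 * 1024.0);
            double ratio = (double)testData.Length / packed.Length;
            double mbPerSec = (testData.Length / (1024.0 * 1024.0)) / packWatch.Elapsed.TotalSeconds;

            Console.WriteLine("{0}: {1:F2} MB packed, {2} ms pack, {3} ms unpack, ratio {4:F2}, {5:F2} MB/s",
                name, packedMb, packWatch.ElapsedMilliseconds, unpackWatch.ElapsedMilliseconds, ratio, mbPerSec);
        }
    }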

Library        Type      Packed (MB)   Pack (ms)   Unpack (ms)   Ratio   Speed (MB/s)
.NET 4.5.1     Zip       34.10         2,921       895           3.58    43.82
ICSharpCode    Zip       34.07         6,373       1,524         3.58    20.08
ICSharpCode    Gzip      34.07         6,347       1,511         3.58    20.17
ICSharpCode    BZip2     35.69         45,002      6,811         3.42    2.84
7z SDK         LZMA      28.51         40,235      3,254         4.28    3.18
Zlib.NET       Zlib      33.97         4,837       985           3.59    26.46
DotNETZip      Zip       33.97         5,120       1,482         3.59    25.00
DotNETZip      Gzip      35.67         31,739      4,556         3.42    4.03
DotNETZip      Zlib      33.97         4,509       887           3.59    28.39
LZMA-managed   LZMA      28.07         35,555      7,402         4.35    3.60
QuickLZ        QuickLZ   42.90         1,015       1,010         2.85    126.11


Conclusion

The fastest algorithm was QuickLZ, but it also has the worst compression ratio. The best compression ratio was achieved by LZMA, but it is also among the slowest to compress. The built-in DeflateStream in .NET seems to be the overall winner when it comes to speed versus compression ratio.

Monday, August 27, 2012

Analysis of the Gamigo hashes


This leak analysis is dedicated to Steve Gibson of Gibson Research Corporation.
Thanks for a great show Steve!

Back in February, a large gaming site called Gamigo was hacked, and in July, the full list of emails and hashes found its way onto the Internet. Pwnedlist.com kindly provided the hashes for this analysis, and as an added bonus, they also sent me the emails. A huge thanks goes out to Pwnedlist.com for their kindness.

The leak contains 9.475.226 valid MD5 hashes and 8.261.454 emails. It has 7.028.067 unique hashes and 8.244.423 unique emails.

In the time-frame of 7 days, 19 hours and 3 seconds, I was able to crack 7.731.708 hashes (81,6%).

Cracking System

The cracking was done on an ordinary GeForce 560 Ti (1024 MB RAM) graphics card using Hashcat-Plus v0.081 64-bit. The settings of Hashcat were set to low, which resulted in only 90% cracking efficiency. Had it been fully optimized, the complete crack would have taken 6 days and 20 hours.

Different cracking techniques such as keyboard-walking, character repetition and leet speak were used to obtain the high number of cracked passwords.

Character Distribution

A healthy character distribution has equal amounts of each character type. Gamigo must have had a less than optimal password policy, as it seems that you could create passwords with whatever characters you want.



Unique Character Distribution

Again, we are reminded of the missing password policy at Gamigo. 28% of the passwords consist of digits only; with a good password policy, this would not have been possible.



Password Composition

When we look at the password composition, we count how many passwords contain symbols. It turns out that only 13.566 of the recovered passwords contained a symbol, while 7.718.142 did not. This results in a rather boring pie chart:



Password Length

Gamigo has 2 passwords with length 0 and 20 passwords with length 1. These might be passwords created during the development of the site - and they might even have been how the hackers originally got in (the first password check in most brute forcers is the empty password).

We observe something quite unusual here: a remarkably high number of 10 character passwords. The password lengths also point to the missing password policy, as you could use whatever length you liked.



The positive side of the case is that the high number of 10 character passwords pushed the average password length up to 8,52 characters, which is higher than usual.


10 Longest Passwords

The 10 longest passwords give an idea of how weak even long passwords can be. Repetition and common patterns are used to create long passwords; however, such passwords are easily cracked using modern password cracking techniques.


Password                     Length
frontschweinefrontschweine   26
qwertyuiopasdfghjklzxcvbnm   26
stevieboy74stevieboy74       22
needforspeedmostwanted       22
lllllllllllllllllllll        21
qaywsxedcrfvtgbzhnujm        21
sivispacemparabellum         20
abcdefghijklmnopqrst         20
12345678900987600000         20
1qazxcvbnm1qazxcvbnm         20

10 Shortest Passwords

The 10 shortest passwords are hopefully not real user passwords. a, b and c all appear among the shortest passwords, which leads me to believe that they are internal testing passwords or passwords created during development.


Password   Length
(empty)    0
x          1
g          1
a          1
2          1
½          1
b          1
c          1
S          1
5          1

Password Strength

A strong password has 9 or more characters, at least 1 digit, 1 lowercase letter, 1 uppercase letter and 1 symbol. However, in Gamigo's case, only 1.250 passwords are considered strong.



This low number of strong passwords is partly a result of only looking at passwords that could be cracked, but it is also a consequence of the missing password policy. A password of 10+ lowercase characters is not considered strong here, yet adding a single symbol to a 10+ character password would dramatically increase its strength.

Wordlists

17% of the passwords were cracked right away using common wordlists. That is pretty normal for sites with a limited or missing password policy.



The IQ Wordlist cracked 20% of the passwords.


10 Most Common Passwords

The passwords in this list are the most common passwords used by Gamigo users. Most of them are numbers only, but a few stand out. 'azerty' is used a lot by French users, since they have AZERTY keyboard layouts (see the email analysis below for more info on the French users). I'm unsure why 'perach' is on the list; it might be the German municipality, or a password used multiple times by a bot.


Password     Frequency
123456       35.621
123456789    24.377
czz000       6.166
12345        5.300
111111       4.765
azerty       4.614
12345678     4.547
perach       3.913
123123       3.749
1234567890   3.679

Email Analysis

The leak also contained a large amount of emails. So as an added bonus, I've made an analysis of the emails.

8.261.454 emails were in the leak, and 8.244.423 (99,7%) of them are unique. 79.065 of the emails were invalid according to RFC 2822, which leaves 8.165.358 (99% of the unique emails) valid.
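
For reference, a quick way to approximate that validity check in C# is to let the framework's MailAddress parser reject malformed addresses. It is not a strict RFC 2822 validator, so the counts can differ slightly from the ones above:

    using System;
    using System.Net.Mail;

    static class EmailCheck
    {
        public static bool LooksValid(string email)
        {
            if (string.IsNullOrEmpty(email))
                return false;
            try
            {
                var parsed = new MailAddress(email);
                return parsed.Address == email; // rejects inputs with display names or stray text
            }
            catch (FormatException)
            {
                return false;
            }
        }
    }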

Domain Statistics

In the domain statistics, we can see that "hotmail.com" and "hotmail.fr" are the two most used email providers. This hints at a large percentage of French users.


Top Level Domain Statistics

In the TLD statistics, we can see that 30% of Gamigo's users were using German email providers. Gamigo is a German service, so that was to be expected. However, the second largest country in the statistics is France.



Email Username as Passwords

A total of 1.399.689 hashes (13%) could be cracked using the username part of an email as the password. To maximize the efficiency, I removed dot, underscore and hyphen characters from the usernames. This is a surprisingly large number of hashes cracked using the emails alone. Note that I do not have email and hash pairs, but rather a bunch of emails and a bunch of hashes. Each email username was tested against all the hashes, so a user with the email john@gmail.com would have resulted in "john" being tried against all the hashes, possibly cracking multiple hashes.
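
A rough sketch of that test in C# (the method and variable names are made up; the real run used the full lists of emails and unique hashes):

    using System;
    using System.Collections.Generic;
    using System.Security.Cryptography;
    using System.Text;

    static class UsernameAttack
    {
        // Returns every email username whose MD5 hash appears in the leaked hash set.
        public static List<string> CrackWithUsernames(IEnumerable<string> emails, HashSet<string> leakedHashes)
        {
            var hits = new List<string>();
            using (var md5 = MD5.Create())
            {
                foreach (var email in emails)
                {
                    string username = email.Split('@')[0]
                        .Replace(".", "").Replace("_", "").Replace("-", "");

                    byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(username));
                    string hex = BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();

                    if (leakedHashes.Contains(hex))
                        hits.Add(username);
                }
            }
            return hits;
        }
    }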



Conclusion

It seems that Gamigo, like so many other large online services, failed to implement a good password policy. Not only were they using raw MD5 as the hashing algorithm, they also had little to no password strength requirements for their users, which resulted in over 80% of their users' hashes being cracked in about a week.

The lesson is that you should implement a stronger hashing algorithm and require some minimum password strength from your users. Gamigo did force a password reset on all users, but many users probably just chose the same password or one that closely resembles the old one. One good thing came of this: Gamigo took responsibility and told their users about the hack, which greatly minimizes the potential damage to the users.