Saturday, December 3, 2016

The problem with Network Address Translation

As a technology ideologist, I see Network Address Translation (NAT) as one of my biggest concerns when it comes to the future of the Internet. The Internet was built as a communications tool to facilitate the sharing of information in digital form. It has vastly improved communication between humans around the planet, and it has been one of the most important inventions we have ever made.

My concern lies in the fact that NAT is a direct inhibitor of the nature of the Internet, which goes against everything we want the Internet to be.

Network Address Translation

The Internet uses routers to route data from network to network, but to do so, each network needs an address. Today we use Internet Protocol version 4 (IPv4), which you usually see in dotted format such as 199.181.132.250. In its raw form, IPv4 is a 32 bit (4 byte) addressing scheme able to address 2^32 (4.294.967.296) hosts, which was plenty back when IPv4 was designed in the early 1980s, but it is nowhere near enough for the Internet today. To solve this problem, Internet Protocol version 6 (IPv6) was created. It uses 128 bit addresses, which gives us 2^128 (340.282.366.920.938.463.463.374.607.431.768.211.456) addresses, which should be enough to last us for a long time.
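If you want a feel for the difference between the two address spaces, here is a minimal C# sketch (the addresses are just examples) that prints both sizes and parses one address of each kind:

```csharp
using System;
using System.Net;
using System.Numerics;

class AddressSpace
{
    static void Main()
    {
        // 32-bit IPv4 vs 128-bit IPv6 address space.
        Console.WriteLine(BigInteger.Pow(2, 32));   // 4294967296
        Console.WriteLine(BigInteger.Pow(2, 128));  // 340282366920938463463374607431768211456

        // The same parser handles both notations.
        Console.WriteLine(IPAddress.Parse("199.181.132.250").AddressFamily); // InterNetwork
        Console.WriteLine(IPAddress.Parse("2001:db8::1").AddressFamily);     // InterNetworkV6
    }
}
```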

Routers need a firmware upgrade in order to route 128 bit addresses, and back when IPv6 was devised (1998), consumer routers simply did not have the capacity to handle the larger address space. A quick workaround was NAT, a simple mechanism that translates one address to another. With this capability, Internet Service Providers (ISPs) began to distribute routers pre-configured with NAT enabled. They had DHCP enabled on the LAN interface handing out non-routable IP addresses (192.168.0.0/24), which then got translated to a routable IP address on the WAN interface. This allowed ISPs to extend the lifetime of IPv4 and keep using low capacity consumer routers.
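Conceptually, the translation table inside a NAT router is nothing more than a mapping from private address/port pairs to ports on the shared WAN address. Here is a rough sketch of the idea in C# (not how any particular router implements it; the class and member names are made up):

```csharp
using System.Collections.Generic;
using System.Net;

// A rough conceptual sketch of a NAT translation table. Real routers also
// track protocol, connection state and timeouts; this only shows the core idea.
class NatTable
{
    private readonly Dictionary<string, int> _mappings = new Dictionary<string, int>();
    private readonly IPAddress _wanAddress;
    private int _nextPort = 40000;

    public NatTable(IPAddress wanAddress)
    {
        _wanAddress = wanAddress;
    }

    // Outbound packet: rewrite the private source endpoint to the shared
    // WAN address and a freshly allocated external port.
    public IPEndPoint Translate(IPAddress privateSource, int privatePort)
    {
        string key = privateSource + ":" + privatePort;
        int externalPort;
        if (!_mappings.TryGetValue(key, out externalPort))
        {
            externalPort = _nextPort++;
            _mappings[key] = externalPort;
        }
        return new IPEndPoint(_wanAddress, externalPort);
    }
}
```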

That does not sound too bad, right? It is a clever way to circumvent the limitations of IPv4 while keeping existing hardware that was produced cheaply in large quantities.

The Problem

A side effect of NAT is that it splits the Internet into many smaller private networks, which can't communicate directly with each other unless you do port forwarding. This side effect has been documented as a "security feature" ever since it was conceived, as it essentially functions like a simple firewall between networks.

Remember what the purpose of the Internet was? That's right, to transmit digital information between machines! NAT completely contradicts the whole purpose of the Internet by segmenting it, just because we saw NAT as a quick fix to our IPv4 addressing problem. We have become accustomed to the fact that NAT exists. It has served its purpose, and it is time we move on to IPv6 and make the Internet work as intended again.

The Solution

By now you should already know that IPv6 is the solution to the NAT problem, but there is still an unanswered question: What about the security NAT provides?

Let me counter-question you with this: What is it we are trying to achieve?

I'd agree that the Internet should not just be one big, flat network where any hosted service can be reached by everyone else. That is why we have firewalls, and a firewall is something NAT is not. You still have a router with two networks, and in order to publish a service from one network to the other, a firewall rule has to allow it. Firewalls are a much more efficient solution to the security problem than NAT is.

Let's say we enabled IPv6, what exactly would that mean?

That is the funny part: you are already running IPv6! That's right, your operating system ships with a dual-stacked network layer capable of both IPv4 and IPv6, so you don't have to do anything. Most routers are also already dual-stacked; we just have to disable IPv4 and, tadaaaa, you are IPv6 only.
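You can verify this on your own machine. A small C# sketch that asks the operating system whether IPv6 is available and lists the IPv6 addresses already assigned to your network interfaces:

```csharp
using System;
using System.Net.NetworkInformation;
using System.Net.Sockets;

class CheckIPv6
{
    static void Main()
    {
        Console.WriteLine($"OS supports IPv6: {Socket.OSSupportsIPv6}");

        // List every IPv6 address already assigned to a local interface.
        foreach (var nic in NetworkInterface.GetAllNetworkInterfaces())
            foreach (var addr in nic.GetIPProperties().UnicastAddresses)
                if (addr.Address.AddressFamily == AddressFamily.InterNetworkV6)
                    Console.WriteLine($"{nic.Name}: {addr.Address}");
    }
}
```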

For router manufacturers, developers and network engineers it is a huge advantage to completely disable IPv4, which is why many ISPs today run IPv6-only networks internally. As a manufacturer or developer, you can remove NAT, TURN, STUN, IGDP and NAT-PMP from routers and communication software. Of course we still need a network control protocol that can publish a service in the router's firewall, but that becomes much simpler once you don't have to take NAT into account.

A Real Example

The BitTorrent protocol is one of the most commonly occurring protocols on the Internet. It is a simple protocol for transmitting binary data between clients in an efficient manner using Peer to Peer (P2P). Clients find each other by using a tracker or what's known as a Distributed Hash Table (DHT). DHT is a simple protocol, but it is severely limited by NAT, simply because it works in a very socialistic manner: each client in the DHT network has to store a chunk of data for it to work, but since most of them now sit behind NAT-enabled routers, they can't be reached by the rest of the network, making the data they hold unreachable by others. This fundamentally undermines the whole concept of a Distributed Hash Table! All BitTorrent applications have implemented one of the many protocols that circumvent NAT, but none of them are perfect. If we switched to IPv6-only networks, DHTs could finally work as intended again, and we would see a massive gain in performance as clients could find each other more efficiently.

Other P2P protocols like Skype would also work better and would not have to route through a third party. Back in 2011 when Microsoft bought Skype, they began transitioning the Skype P2P protocol from a highly distributed node network into a more centralized platform based on the Microsoft Notification Protocol (MSNP). The centralized protocol works "better" because Microsoft's servers act as a broker between you and other clients, thereby circumventing NAT. If both clients had IPv6 and published the Skype service in their router firewalls, they could reach each other directly and instantly get a massive gain in performance and stability.

Maybe this is just wishful thinking, but it does seem that IPv6 adoption is gaining speed, which is good news for the Internet as a whole.

Monday, December 1, 2014

.NET Compression Libraries Benchmark

I often find myself needing a compression library in a software development project, and after a few tests with disappointing results, I usually fall back to spawning 7Zip as an external process. This time, however, the application specification prohibited writing any data to disk, which 7Zip needs to do in order to work correctly. I needed to do everything in memory, so I had to test some of the C# libraries available for compression, as well as the different compression algorithms.
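For reference, compressing entirely in memory with the classes built into .NET looks roughly like this (DeflateStream is shown here; GZipStream works the same way):

```csharp
using System.IO;
using System.IO.Compression;

static class InMemoryCompression
{
    // Compress a byte array entirely in memory - nothing touches the disk.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress))
                deflate.Write(data, 0, data.Length);
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```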

Here is a full list of the tested libraries:
  • .NET 4.5.1 (built-in)
  • ICSharpCode (SharpZipLib)
  • 7z SDK (LZMA)
  • Zlib.NET
  • DotNETZip
  • LZMA-managed
  • QuickLZ

Compression 101

Before I get to the results of the benchmark, I think it is important to understand the different archive formats and compression algorithms in order to compare the results with other benchmarks and/or tests. Let's get to it!

A compression algorithm can be compared to a small machine that takes in some data, does some math on it and transforms it into new data that is smaller in size. It sounds simple, but it takes a lot of math to do it efficiently. The best compression algorithms are the ones that achieve a high compression ratio while only using small amounts of RAM and CPU. One such algorithm is the very well known DEFLATE, which internally uses the famous LZ77 algorithm created by Abraham Lempel and Jacob Ziv. DEFLATE has become the de facto standard when it comes to compression, and it is used nearly everywhere a compression algorithm is needed. There are many other compression algorithms out there; some are better than DEFLATE and also widespread. Examples include LZW, BWT, LZSS and LZMA.

Each compression algorithm only takes in a single input, either as a blob of bytes or a stream of bytes. The question is: how do compression applications such as 7Zip, WinZip and WinRAR compress multiple files?

Enter the world of archive formats. We all know about Zip, BZip2, GZip and RAR. They all simply describe their own content, either in terms of files or streams of data, along with some metadata such as original size, file name and path. Archive formats are more complex than you would think, simply because they offer a lot of features. Some of them support multiple data streams (for multi-threading), encryption, password protection, file name obfuscation, self-extracting archives and more.

Figure 1: The relationship between files, compression algorithms and archive formats.

Many of the archive formats are now synonymous with the compression algorithm they use. We say "Zip the file" or "Make a 7z of the folder". In reality, you can combine the different archive formats with different compression algorithms, and as such you can have combinations such as Zip with LZMA or 7z with DEFLATE. That being said, most archive formats have a default compression algorithm. I've listed some examples here in archive/algorithm format:
  • Zip / DEFLATE
  • 7zip / LZMA
  • Bzip2 / BWT
  • Gzip / DEFLATE
  • GIF / LZW
  • PNG / DEFLATE
  • Tar / none
  • Rar / LZSS
The avid reader noticed that I put a couple of image formats in the list (GIF, PNG). Both of them use compression algorithms to minimize the size of the image data, and are therefore technically archive formats. The same applies to Microsoft Excel 2013 files (xlsx / DEFLATE) and Java Archives (Jar / DEFLATE).
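To see the split between archive format and compression algorithm in code: .NET's ZipArchive writes the Zip container, while each entry is compressed with DEFLATE behind the scenes. A small sketch that puts two in-memory "files" into one archive (the entry names and contents are made up):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class ArchiveExample
{
    // The archive format (Zip) holds many named entries; the compression
    // algorithm (DEFLATE) is applied to each entry's data.
    public static byte[] CreateZip()
    {
        using (var output = new MemoryStream())
        {
            using (var zip = new ZipArchive(output, ZipArchiveMode.Create))
            {
                AddEntry(zip, "readme.txt", "Hello from entry one.");
                AddEntry(zip, "data/notes.txt", "Hello from entry two.");
            }
            return output.ToArray();
        }
    }

    static void AddEntry(ZipArchive zip, string name, string content)
    {
        var entry = zip.CreateEntry(name, CompressionLevel.Optimal);
        using (var stream = entry.Open())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(content);
            stream.Write(bytes, 0, bytes.Length);
        }
    }
}
```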

In the Linux world, the problem of compressing multiple files was solved with an archiving format called Tar. It is a pretty simple format, originally designed for tape archiving, which preserves filesystem information such as timestamps, owner information and file sizes for the files held inside the archive. To compress a directory, you first store all the files within the directory inside a single Tar file, and then compress that file using DEFLATE (gzip) or BWT (bzip2).

Benchmark

Now that we know the difference between compression algorithms and archive formats, we can continue to the benchmark. The libraries tested here are overlapping in the sense that they are simply different implementations of different compression algorithms, and even though the name of the library might contain Zip, it has nothing to do with the actual Zip archiving format, but everything to do with the compression algorithm used by default in Zip files.

The benchmark uses a 128 MB test file that contains images, music, text and some random data. The file represents arbitrary data typically found on a filesystem.
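The measurements were taken along these lines. This is only a sketch of the harness; Compress and Decompress are placeholders for whichever library is being tested, not any library's actual API:

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class CompressionBenchmark
{
    // testFile: path to the 128 MB test file.
    public static void Run(string testFile,
                           Func<byte[], byte[]> compress,
                           Func<byte[], byte[]> decompress)
    {
        byte[] original = File.ReadAllBytes(testFile);

        var sw = Stopwatch.StartNew();
        byte[] packed = compress(original);
        sw.Stop();
        long packMs = sw.ElapsedMilliseconds;

        sw.Restart();
        byte[] unpacked = decompress(packed);
        sw.Stop();
        Debug.Assert(unpacked.Length == original.Length);

        double ratio = (double)original.Length / packed.Length;
        double speed = original.Length / (1024.0 * 1024.0) / (packMs / 1000.0);

        Console.WriteLine("Packed: {0:F2} MB  Pack: {1} ms  Unpack: {2} ms  Ratio: {3:F2}  Speed: {4:F2} MB/s",
            packed.Length / (1024.0 * 1024.0), packMs, sw.ElapsedMilliseconds, ratio, speed);
    }
}
```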

Library        Type      Packed (MB)   Pack (ms)   Unpack (ms)   Ratio   Speed (MB/s)
.NET 4.5.1     Zip       34.10         2,921       895           3.58    43.82
ICSharpCode    Zip       34.07         6,373       1,524         3.58    20.08
ICSharpCode    Gzip      34.07         6,347       1,511         3.58    20.17
ICSharpCode    BZip2     35.69         45,002      6,811         3.42    2.84
7z SDK         LZMA      28.51         40,235      3,254         4.28    3.18
Zlib.NET       Zlib      33.97         4,837       985           3.59    26.46
DotNETZip      Zip       33.97         5,120       1,482         3.59    25.00
DotNETZip      Gzip      35.67         31,739      4,556         3.42    4.03
DotNETZip      Zlib      33.97         4,509       887           3.59    28.39
LZMA-managed   LZMA      28.07         35,555      7,402         4.35    3.60
QuickLZ        QuickLZ   42.90         1,015       1,010         2.85    126.11

 

Conclusion

The fastest algorithm was QuickLZ, but it also had the worst compression ratio. The best compression ratio was achieved by LZMA, but it also had the third slowest compression speed. The built-in DeflateStream in .NET seems to be the overall winner when it comes to speed versus compression ratio.

Monday, August 27, 2012

Analysis of the Gamigo hashes


This leak analysis is dedicated to Steve Gibson of Gibson Research Corporation.
Thanks for a great show Steve!

Back in February, a large gaming site called Gamigo was hacked, and in July, the full list of emails and hashes found its way onto the Internet. Pwnedlist.com kindly provided the hashes for this analysis, and as an added bonus, they also sent me the emails. A huge thanks goes out to Pwnedlist.com for their kindness.

The leak contains 9.475.226 valid MD5 hashes and 8.261.454 emails. It has 7.028.067 unique hashes and 8.244.423 unique emails.

In the time-frame of 7 days, 19 hours and 3 seconds, I was able to crack 7.731.708 hashes (81,6%).

Cracking System

The cracking was done on an ordinary GeForce 560 Ti graphics card with 1024 MB of RAM, using Hashcat-Plus v0.081 (64 bit). Hashcat's performance settings were set to low, which resulted in only 90% cracking efficiency. Had it been optimized, the full crack would have taken 6 days and 20 hours.

Different cracking techniques such as keyboard-walking, character repetition and leet speak were used to obtain the high number of cracked passwords, as sketched below.
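As an illustration, leet speak mutation is just a set of character substitutions applied to dictionary words before hashing, and repetition simply doubles a candidate word. A minimal C# sketch (the substitution table is only an example; real rule sets are far larger):

```csharp
using System.Collections.Generic;
using System.Linq;

static class LeetMutator
{
    // Example substitutions only; real rule sets are much larger and also
    // cover keyboard walks and more elaborate repetition patterns.
    static readonly Dictionary<char, char> Subs = new Dictionary<char, char>
    {
        { 'a', '@' }, { 'e', '3' }, { 'i', '1' }, { 'o', '0' }, { 's', '5' }
    };

    public static IEnumerable<string> Variants(string word)
    {
        yield return word;                                                      // plain dictionary word
        yield return new string(word.Select(c => Subs.ContainsKey(c) ? Subs[c] : c).ToArray()); // leet speak
        yield return word + word;                                               // repetition, e.g. "pass" -> "passpass"
    }
}
```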

Character Distribution

A healthy character distribution has equal amounts of each character type. Gamigo must have had a less than optimal password policy, as it seems that you could create passwords with whatever characters you wanted.



Unique Character Distribution

Again, we are reminded of the missing password policy at Gamigo. 28% of the passwords consist of digits only; with a good password policy, this would not have been possible.



Password Composition

When we take a look at the password composition, we count how many passwords contain symbols. It turns out that only 13.566 of the recovered passwords contained a symbol, while 7.718.142 did not. The result is this rather boring pie chart:



Password Length

Gamigo had 2 passwords with length 0 and 20 passwords with length 1. These might be passwords created during the development of the site, and they might even have been the way the hackers originally got in (the first password most brute forcers try is the empty one).

We observe something quite unusual here: the high number of 10 character passwords. The password lengths also reveal the missing password policy, as you could use whatever length you liked.



The positive side of the case is that the high number of 10 character passwords pushed the average password length up to 8,52 characters, which is higher than usual.


10 Longest Passwords

The 10 longest passwords give an idea of how weak even long passwords can be. Repetition and common patterns are used to create long passwords; however, they are easily cracked using modern password cracking techniques.


Password                       Length
frontschweinefrontschweine     26
qwertyuiopasdfghjklzxcvbnm     26
stevieboy74stevieboy74         22
needforspeedmostwanted         22
lllllllllllllllllllll          21
qaywsxedcrfvtgbzhnujm          21
sivispacemparabellum           20
abcdefghijklmnopqrst           20
12345678900987600000           20
1qazxcvbnm1qazxcvbnm           20

10 Shortest Passwords

The 10 shortest passwords are hopefully not real user passwords. Single characters such as a, b and c are among the shortest passwords, which leads me to believe that they are internal testing passwords or passwords created during development.


Password   Length
(empty)    0
x          1
g          1
a          1
2          1
½          1
b          1
c          1
S          1
5          1

Password Strength

A strong password has 9 or more characters and contains at least 1 digit, 1 lowercase letter, 1 uppercase letter and 1 symbol. However, in Gamigo's case, only 1.250 passwords are considered strong.
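Using the definition above, a password counts as strong only if it passes a check along these lines (a sketch in C#, not the exact code used for the analysis):

```csharp
using System.Linq;

static class PasswordStrength
{
    // Strong: 9+ characters with at least one digit, one lowercase letter,
    // one uppercase letter and one symbol.
    public static bool IsStrong(string password)
    {
        return password.Length >= 9
            && password.Any(char.IsDigit)
            && password.Any(char.IsLower)
            && password.Any(char.IsUpper)
            && password.Any(c => !char.IsLetterOrDigit(c));
    }
}
```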



This low number of strong passwords is partly explained by the fact that these are the passwords that could be cracked, but also by the missing password policy. A password of 10+ lowercase characters is not considered secure, and adding even a single symbol to a 10+ character password would dramatically increase the keyspace an attacker has to search.

Wordlists

17% of the passwords were cracked right away using common wordlists. That is pretty normal on sites with limited or no password policy.



The IQ Wordlist (more info here) cracked 20% of the passwords.


10 Most Common Passwords

The passwords in this list are the most common passwords used by Gamigo users. Most of them are numbers only, but a few stand out. 'azerty' is used a lot by French users, since they have AZERTY keyboard layouts (see the email analysis below for more info on the French users). I'm unsure why 'perach' is on the list; it might be the German municipality, or a password used many times by a bot.


Password       Frequency
123456         35.621
123456789      24.377
czz000         6.166
12345          5.300
111111         4.765
azerty         4.614
12345678       4.547
perach         3.913
123123         3.749
1234567890     3.679

Email Analysis

The leak also contained a large number of emails. So as an added bonus, I've made an analysis of the emails as well.

8.261.454 emails were in the leak and 8.244.423 (99,7%) of them are unique. 79.065 of the unique emails were invalid according to RFC 2822, which leaves 8.165.358 (99% of the unique emails) valid.
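The validity check can be approximated in C# with System.Net.Mail.MailAddress, although it is more permissive than a strict RFC 2822 parser; this is only a sketch of the idea:

```csharp
using System;
using System.Net.Mail;

static class EmailCheck
{
    // Rough validity check. MailAddress is more permissive than a strict
    // RFC 2822 parser, but it weeds out the obviously malformed addresses.
    public static bool LooksValid(string email)
    {
        if (string.IsNullOrWhiteSpace(email))
            return false;

        try
        {
            return new MailAddress(email).Address == email;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```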

Domain Statistics

In the domain statistics, we can see that "hotmail.com" and "hotmail.fr" are the two most used email providers. This hints at a large percentage of French users.

 

Top Level Domain Statistics

In the TLD statistics, we can see that 30% of Gamigo's users were using German email providers. Gamigo is a German service, so that is to be expected. However, the second largest country in the statistics is France.



Email Username as Passwords

A total of 1.399.689 hashes (13%) could be cracked using the username part of an email as the password. To maximize efficiency, I removed dot, underscore and hyphen characters from the usernames. This is a surprisingly large number of hashes cracked using the emails alone. Note that I do not have email and hash pairs, but rather a bunch of emails and a bunch of hashes. Each email username was tested against all the hashes, so a user with the email john@gmail.com would result in "john" being tried against every hash, potentially cracking multiple hashes at once.
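In code, the idea looks roughly like this (a sketch; the class and method names are made up, and it assumes the leaked hashes are lowercase hex MD5 digests):

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class UsernameAttack
{
    // Try the username part of every email against the whole set of hashes.
    public static int Run(IEnumerable<string> emails, HashSet<string> md5Hashes)
    {
        int cracked = 0;
        using (var md5 = MD5.Create())
        {
            foreach (var email in emails)
            {
                // Username part only, with dot, underscore and hyphen removed.
                string user = email.Split('@')[0]
                                   .Replace(".", "").Replace("_", "").Replace("-", "");

                byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(user));
                string hex = BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();

                if (md5Hashes.Contains(hex))
                    cracked++;
            }
        }
        return cracked;
    }
}
```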



Conclusion

It seems that Gamigo, like so many other large online services, failed to implement a good password policy. Not only were they using raw MD5 as the hashing algorithm, but they also had little to no password strength requirements for their users, which resulted in over 80% of their users' hashes being cracked in about a week.

The lesson is that you should use a stronger hashing algorithm and require some sort of minimum password strength from your users. Gamigo did force a password reset on all users, but many users probably just chose the same password or one that closely resembles the old password. One good thing came of this: Gamigo took responsibility and told their users about the hack, which greatly minimizes the damage that could potentially happen to the users.