Posts

Escape the Crab Mentality

There is a well-known concept within psychology called crab mentality, a way of thinking best described by "If I can't have it, neither can you". It is a mentality shared by millions around the world, everywhere from small teams to local communities and all the way up to large enterprises and countries. It is a way to survive when resources are scarce, as any single member of the group who consumes too much is forced to live under the same constraints as everyone else. However, the crab survivalist way of thinking often transcends constraints on resources, and factors such as envy can exacerbate the mentality into a selfish, egocentric paradigm that limits other people's potential to accelerate beyond their peers and become better. It is from this mentality we hear stories like, "I want to leave my local community to attend university, but our priest says I have to stay and help my parents". What we see here is that our protagonist is ...

Strings from a Security Perspective

There are times when you have to store secrets in your application. You wish to do it securely, but how exactly would you do that? It is more problematic than you think, and if you don't design your application around certain concepts, you will end up with a security nightmare that is nearly impossible to solve. Let's take a look at an example. Here we are using the popular S3 client from Amazon's official AWS SDK for .NET:

AWSCredentials cred = new BasicAWSCredentials("keyId", "secretAccessKey");

using (AmazonS3Client client = new AmazonS3Client(cred))
{
    // do something with client
}

The problem here is that you are forced to supply the secret access key as a string, and strings are not designed around security best practices. In .NET, strings are garbage collected, in contrast to integers, DateTime and other value types, which are not. It is not the only problem with strings, but let's take one issue at the t...
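The excerpt cuts off before the alternatives, but as a rough sketch of the direction the problem points toward (my illustration, not necessarily what the post recommends), secret material can be kept in a char array, pinned so the garbage collector cannot copy it around, and wiped deterministically once it has been used:

using System;
using System.Runtime.InteropServices;

class SecretSketch
{
    static void Main()
    {
        // Hypothetical secret material; in real code it should never exist as a
        // managed string in the first place (it would come from a vault or key store).
        char[] secret = { 's', 'e', 'c', 'r', 'e', 't' };

        // Pin the array so the GC cannot relocate it and leave stray copies in memory.
        GCHandle handle = GCHandle.Alloc(secret, GCHandleType.Pinned);
        try
        {
            // ... use the secret here ...
        }
        finally
        {
            // Overwrite the memory before releasing the pin.
            Array.Clear(secret, 0, secret.Length);
            handle.Free();
        }
    }
}

It is not a silver bullet, but unlike a string, the array can at least be wiped on demand.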

FindDupes 1.1 released

I've released a new version of FindDupes. It now groups files by size before hashing, thereby dramatically speeding up the deduplication process. If you have .NET Core installed, you can install the tool by running:

dotnet tool install --global FindDupes

If you want to look through the code or report any issues, head over to https://github.com/Genbox/FindDupes
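As a rough sketch of the group-by-size-then-hash idea (my illustration, not the tool's actual code), grouping by file length first means only files that share a size ever get read and hashed:

using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

class DupeSketch
{
    static void Main(string[] args)
    {
        string root = args.Length > 0 ? args[0] : ".";

        var groups = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                              .GroupBy(path => new FileInfo(path).Length)
                              .Where(g => g.Count() > 1); // only same-sized files can be duplicates

        using (var sha = SHA256.Create())
        {
            foreach (var group in groups)
            {
                // Hash only the candidates that survived the cheap size check.
                var byHash = group.GroupBy(path =>
                {
                    using (var stream = File.OpenRead(path))
                        return BitConverter.ToString(sha.ComputeHash(stream));
                });

                foreach (var dupes in byHash.Where(g => g.Count() > 1))
                    Console.WriteLine("Duplicates: " + string.Join(", ", dupes));
            }
        }
    }
}

The size lookup comes from file metadata and is essentially free, so the expensive hashing only runs on genuine duplicate candidates.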

Beware of BrotliStream in .NET Core 2.1

I love compression algorithms and have built a fair share of my own for fun and profit. It is a difficult task to build something that is efficient when it comes to the compromise between speed and compression ratio, so when I saw Google's Brotli algorithm mentioned in the .NET Core 2.1 RC1 announcement, I was ecstatic! It lasted about 15 minutes, until I put the BrotliStream to the test, only to find out that it took 140x longer to compress a 518 MB file! Curious if this was a bug, I delved into the code and realized that the default compression level is set to 11. For those not familiar with Brotli, that is the highest level available, which prioritizes compression ratio over speed. I reported it as an issue, but Microsoft seems to think that having the same configuration as Google's Brotli library is more important than a sane default, so it is only a matter of time before tons of StackOverflow posts pop up complaining about the speed of BrotliStream. If you wan...
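Until the default changes (if it ever does), the workaround is to pass an explicit compression level to the stream. A minimal sketch, with hypothetical file names:

using System.IO;
using System.IO.Compression;

class BrotliSketch
{
    static void Main()
    {
        using (FileStream input = File.OpenRead("input.bin"))
        using (FileStream output = File.Create("output.br"))
        using (var brotli = new BrotliStream(output, CompressionLevel.Fastest))
        {
            // Without an explicit level you get the slow default described above;
            // CompressionLevel.Fastest trades compression ratio for speed.
            input.CopyTo(brotli);
        }
    }
}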

Performance Measurement Mistakes in Academic Publications

One of the fastest ways to get me to stop reading a paper is to make incorrect assumptions in the hypothesis or rely on previous work that is not completely solid. There is no doubt that research in an academic context is hard, and writing a paper about it is even harder, but the value of all this hard work is diminished if you make a mistake. This time, I will focus on performance measurement mistakes in computer science papers. They come in different flavours and are sometimes very subtle. Performance is paramount in algorithmics, and some researchers don't do their due diligence when trying to prove theirs is faster. Here are the three most common mistakes I see, in no particular order.

Performance Measurements on Different Hardware

When measuring performance, it is paramount to do it under the right circumstances. Every so often I come across a paper that states algorithm X is 4x faster than algorithm Y, but they measured it using absolute numbers between t...

Reducing the size of self-contained .NET Core applications

Just for note-keeping, I've written down some methods of reducing the size of a .NET Core application. I thought others could use it as well, so here you go.

Original Size

First off, let's see how much disk space a self-contained 'hello world' application takes up.

> dotnet new console
> dotnet publish -r win-x86 -c release

Size: 53.9 MB - yuck!

Trimming

Microsoft has built a tool that finds unused assemblies and removes them from the distribution package. This is much needed, since the 'netcoreapp2.0' profile is basically .NET Framework all over again, and it contains a truckload of assemblies our little 'hello world' application doesn't use.

> dotnet new console
> dotnet add package Microsoft.Packaging.Tools.Trimming -v 1.1.0-preview1-25818-01
> dotnet publish -r win-x86 -c release /p:TrimUnusedDependencies=true

Size: 15.8 MB

Much better! Although, we have only removed unused assemblies - what about unused...

Testing 36200 DNS servers

Introduction

DNS is an important part of the Internet, and its speed and security are paramount for a good browsing experience. I thought it would be a good idea to scan the internet for DNS servers and test every single one of them. However, the latency of a particular DNS server depends highly on the distance and connection technology between you and the DNS server, and since I'm geographically located in Denmark, the results speed-wise only pertain to people located in/around Denmark.

Testing methodology

I scanned the IPv4 Internet using NMap on port 53/UDP and stopped the scan after a few hours. The result was 36200 DNS servers, some of which are owned by ISPs, companies and a whole lot of private people. Since the IP scan was randomized, it should represent a good sample. Almost all DNS servers have some sort of caching mechanism that makes sure requested DNS names are kept for as long as the Time-To-Live (TTL) defined by the domain owner. To ensure we don't ju...
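The rest of the methodology is cut off here, but as a minimal sketch of how one might time a single query against a specific server (an illustration, not the actual test harness from the post), a hand-rolled DNS request over UDP is enough to measure round-trip latency:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;
using System.Text;

class DnsLatencyProbe
{
    // Build a minimal DNS query for an A record (RFC 1035 wire format).
    static byte[] BuildQuery(string name, ushort id)
    {
        var packet = new List<byte>
        {
            (byte)(id >> 8), (byte)id, // transaction ID
            0x01, 0x00,                // flags: standard query, recursion desired
            0x00, 0x01,                // QDCOUNT = 1
            0x00, 0x00,                // ANCOUNT
            0x00, 0x00,                // NSCOUNT
            0x00, 0x00                 // ARCOUNT
        };

        foreach (string label in name.Split('.'))
        {
            packet.Add((byte)label.Length);
            packet.AddRange(Encoding.ASCII.GetBytes(label));
        }
        packet.Add(0x00);                                        // end of QNAME

        packet.AddRange(new byte[] { 0x00, 0x01, 0x00, 0x01 });  // QTYPE = A, QCLASS = IN
        return packet.ToArray();
    }

    static void Main()
    {
        // Hypothetical target; the real test ran against each discovered server.
        var server = new IPEndPoint(IPAddress.Parse("8.8.8.8"), 53);

        using (var udp = new UdpClient())
        {
            udp.Client.ReceiveTimeout = 2000;
            byte[] query = BuildQuery("example.com", 0x1234);

            var sw = Stopwatch.StartNew();
            udp.Send(query, query.Length, server);
            IPEndPoint remote = null;
            udp.Receive(ref remote); // throws SocketException if the server doesn't answer in time
            sw.Stop();

            Console.WriteLine($"Response from {server} in {sw.ElapsedMilliseconds} ms");
        }
    }
}

Run against each discovered server, this gives a comparable latency figure, subject of course to the geography caveat above.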