Performance Measurement Mistakes in Academic Publications

One of the fastest ways to get me to stop reading a paper is to make incorrect assumptions in the hypothesis or rely on previous work that is not completely solid. There is no doubt that research in the academic context is hard, and writing a paper about it is even harder, but the value of all this hard work is diminished if you make a mistake.

This time, I will focus on performance measurement mistakes in computer science papers. They come in different flavours and are sometimes very subtle. Performance is paramount in algorithmics, yet some researchers don't do their due diligence when trying to prove their algorithm is faster. Here are the three mistakes I see most often, in no particular order.

Performance Measurements on Different Hardware

When measuring performance, it is paramount to do it under the right circumstances. Every so often I come across a paper that states algorithm X is 4x faster than algorithm Y, but the authors compared absolute numbers taken from two very different hardware configurations. Of course, if algorithm X takes 10 ms to run and Y takes 20 ms to run on the same hardware, then X is faster; the comparison only means something when the hardware is identical. I see this problem occurring a lot in the fields of cryptocurrency (GPU acceleration) and machine learning (GPU/TPU acceleration).

Researchers should use asymptotic notation - also called Big O notation - to indicate the algorithmic complexity of their algorithm. If your algorithm is asymptotically slower than another, such as O(n^2) vs. O(n), then it is up to you to demonstrate under which conditions your algorithm is actually faster. Whether it is parallelization, specialization for specific architectures, or even being favoured by certain compiler optimizations, it is your responsibility to make sure the paper spells it out.
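To see why asymptotics alone do not settle the question, here is a small Python sketch (function names are my own) comparing an O(n^2) insertion sort against an O(n log n) merge sort. For tiny inputs, the asymptotically worse algorithm often wins because of its smaller constant factor:

```python
import random
import timeit

def insertion_sort(a):
    """O(n^2) worst case, but a very small constant factor."""
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def merge_sort(a):
    """O(n log n), but with more bookkeeping per element."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

# For small n, insertion sort typically wins; for large n, merge sort does.
for n in (8, 64, 4096):
    data = [random.random() for _ in range(n)]
    t_ins = timeit.timeit(lambda: insertion_sort(data), number=50)
    t_mrg = timeit.timeit(lambda: merge_sort(data), number=50)
    print(f"n={n:5d}  insertion={t_ins:.4f}s  merge={t_mrg:.4f}s")
```

The exact crossover point depends on the machine and the implementation, which is precisely the kind of condition a paper should state.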

If you want to demonstrate any of those specialized cases, you are measuring efficiency, which is expressed in cycles per operation. Compare your algorithm to the others on that metric and you will be much better off.
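As a minimal sketch of this kind of measurement: pure Python cannot read hardware cycle counters directly, so the helper below (the name `ns_per_op` is my own) estimates nanoseconds per operation instead. The principle is the same: divide elapsed time by the number of operations performed.

```python
import time

def ns_per_op(fn, ops, repeats=5):
    """Run fn() several times and return the best nanoseconds-per-operation.

    Taking the minimum over repeats reduces noise from the OS scheduler.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        fn()
        elapsed = time.perf_counter_ns() - start
        best = min(best, elapsed / ops)
    return best

N = 100_000
data = list(range(N))
cost = ns_per_op(lambda: sum(data), ops=N)
print(f"sum: ~{cost:.2f} ns per element")
```

On hardware where you control the environment, tools such as `perf` can report actual cycles per operation; the per-operation normalization is what makes results comparable across machines.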

Parallelization as an Excuse for Speed

Today, CPUs are no longer getting much faster; they are just getting more cores. Since most previous work in computer science was centred around single-core execution, little care was taken with parallelization. Researchers today take advantage of the many cores in computers, which often leads them to make the mistake of assuming that parallelization means more speed.

Speed is not the same as throughput. When we measure an algorithm, we measure how efficiently it processes some data. The fact that you can do it on two cores at the same time might double your throughput, but it shows nothing about efficiency or speed. I often see the same mistake when measuring network bandwidth. For example:

You have a 100 megabit/second network connection. That means 100,000,000 bits can flow through your network card per second, and therefore your throughput is 100 megabits/second.

However, for that to make sense, you need to send the data somewhere. How fast your network card can send the data out is the speed. Let's say you send 3 packets of 5 bits each, with 1 millisecond between each packet:


Your throughput/bandwidth is now said to be 5 bits/ms, and the speed is also 5 bits/ms. You decide to add another network card so you can send data in parallel:


You just doubled your bandwidth, but the speed is still 5 bits/ms!

In this analogy, network cards are CPU cores. Measuring the speed of your algorithm by doubling the number of CPU cores you use makes no sense. It is much better to illustrate that your algorithm is parallelizable where others are not, but don't feed it some data on twice the cores and claim your algorithm is faster because it did the job in 50% of the time.
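The network-card analogy above can be sketched in a few lines of Python (function and parameter names are my own illustration): adding a second card doubles the throughput, while the per-card speed stays fixed.

```python
def transfer_stats(num_cards, packets_per_card, bits_per_packet, ms_between_packets):
    """Model parallel network cards: cards transmit simultaneously,
    so elapsed time is set by one card, but total bits scale with cards."""
    total_bits = num_cards * packets_per_card * bits_per_packet
    elapsed_ms = packets_per_card * ms_between_packets  # cards run in parallel
    throughput = total_bits / elapsed_ms      # bits/ms across all cards
    speed = bits_per_packet / ms_between_packets  # bits/ms per card, unchanged
    return throughput, speed

# 3 packets of 5 bits, 1 ms apart, as in the example above:
print(transfer_stats(1, 3, 5, 1))  # (5.0, 5.0)  -> one card
print(transfer_stats(2, 3, 5, 1))  # (10.0, 5.0) -> throughput doubles, speed does not
```

Swap "cards" for "cores" and the same arithmetic explains why a 2x wall-clock improvement from 2x cores says nothing about the algorithm's speed.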

Reduction of Asymptotic Notation

I've read papers that state their algorithms run in constant (O(1)) time. Their argument is that they split the problem into n smaller problems, each of which takes 1/n of the time to solve, and is therefore O(1).

No. That is not how it works.

Algorithmic complexity does not change with data. It only changes if you change the algorithm. Data has nothing to do with the algorithm's complexity.

What you are really measuring is the algorithm's ability to solve a particular problem, and even then you can't say your algorithm is faster because it ran on less data. You can't speed up a sorting algorithm by running it on smaller chunks of data and claim it is O(1). A sorting algorithm can never run in less than O(n) time, since it must at least examine every element.
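A quick back-of-the-envelope sketch (the helper function is hypothetical) shows why chunking does not buy O(1): the work per fixed-size chunk is constant, but the number of chunks, and therefore the total work, still grows linearly with n.

```python
def chunked_sort_ops(n, chunk=16):
    """Worst-case comparison count if we insertion-sort fixed-size chunks.

    Each chunk costs a constant chunk*(chunk-1)/2 comparisons, but the
    number of chunks is n/chunk, so total work still grows with n.
    (And the chunks are still not merged into one sorted sequence.)
    """
    chunks = n // chunk
    per_chunk = chunk * (chunk - 1) // 2  # constant per chunk
    return chunks * per_chunk

for n in (1_000, 10_000, 100_000):
    print(n, chunked_sort_ops(n))  # grows linearly: 10x the data, 10x the work
```

Calling each constant-cost chunk "O(1)" ignores that the chunk count is a function of n, which is exactly the fallacy described above.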

