Posts

Showing posts with the label Optimization

Optimizing Performance On The Xbox 360 - Part 2

In part 1 I described the reasons why Xbox 360 is slower than an equivalent PC and  some of the basic recommendations for optimizing C# code to make it run faster. In this post I will mention a few other tricks to increase performance on the Xbox 360. Structures… again. In the .NET world, objects are allocated on the heap and structures are allocated on the stack. Remember, only the heap is garbage collected, this means that structures are not subject to the expensive garbage collection operation. Allocation on the stack is basically free as we only need to increase a stack pointer and call a constructor. This means you can reduce garbage collection and increase speed by using structures instead of objects – however, you need to know the difference between the two. You should also follow the general recommendations for using structures: Small data objects (data containers) Often destroyed or created Read-only (or mostly-read) objects Reuse those objects As m...

Optimizing Performance On The Xbox 360 - Part 1

Image
After my Performance Tips and Tricks series I got a lot of requests to write a post regarding the performance on the Xbox 360 platform. I will try to focus on why Xbox 360 is slower than PC and what we can do to get around it. But first we need to underline the importance of good programming. Algorithms and datastructure complexity I’ve been on the topic of complexity before . I can’t underline how important it is that we keep the complexity of our algorithms low. If you are sorting a large dataset, make sure you use the correct algorithm and if you are sorting a small dataset (<10 items), there is another and faster algorithm. Note: If you use .Net List<T>.Sort() , it automatically chooses the best algorithm depending on your dataset. If you are using pixel perfect collision detection, perhaps you should implement a physics system that uses a polygon collision system instead. You could also implement a broad phase system to filter out collisions that you don’t nee...

Performance tips and tricks series

In the past I posted 4 blogs posts with ways to optimize your applications using keywords, compiler tricks and approximations. Here is a quick overview of the blog posts: Part 1 – Common sense in application development. Use constants and loop manipulation. Part 2 – Most commonly used keywords in C#: sealed, static, struct, ref, out and unsafe Part 3 – Information about method inlining. Notes on virtual calls, struct arguments and exception handling. Short introduction to loop unrolling. Part 4 – Boxing and unboxing, more on exception handling and string operations Hopefully I will get the time to write some Xbox360 specific optimizations in the near future. I’m busy with the development of Farseer Physics Engine 3.0 at the moment, so it might take some time.

Floating and fixed point arithmetic

Image
A user posted a question on the Farseer Physics Engine forums about fixed point math. The topic include numeral systems , implementation details and performance. A whole lot of posts have been written about it. I’m not going to provide a full introduction to fixed point math, I’m simply going to provide an abstract answer to the questions (one that apply to similar questions). Background Back in the good old days (before the Intel i486 CPU ) all applications used fixed point arithmetic. Floating point was something new where you could get a tradeoff between space, value range and precision, but at the cost of performance. Let us take a quick example: A fixed point number with the radix (point/dot) after the forth digit: 5186.3924, would be represented with a floating point as 51863924 with an exponent of 3: 5.1863924*10^3 = 5186.3924 Let us have a look at the up’s and down’s of floating point math. Positive side of floating point There are both positive and negative s...

Convex polygon based collision detection

Image
A lot of people have asked me: Why use a convex polygon only engine like Box2D? In this post I will provide some info on why convex polygons is the optimal polygon type to use. What does convex mean? Convex polygons needs to satisfy the following requirements: All internal angles must be below 180 degrees Every line segment between two vertices can be contained fully inside or at the edge the polygon A non-convex polygon is called a concave polygon and it must have an internal angle measuring greater than 180 degrees. As you might have thought about yourself; concave polygons can be quite handy. Convex-only engines can’t represent a simple player polygon like Pac-Man . Why use convex-only? There are really only one big juicy reason: performance Physics engines are able to take quite a few shortcuts and use cheap algorithms to determine the closest vertex, shortest distance and a lot more. An algorithm like Separating Axis Theorem and the Gilbert – Johnson –...

Performance tips and tricks – part 4

Image
In part 3 I explained some common manual compiler optimization tricks. This time we go back to the code itself to find places where optimizations does a difference. Boxing and unboxing Boxing is the process of encapsulating a value type into a reference type. Unboxing is the reverse process. They both take some time and might happen several times in your application without your knowledge. Here is some example code on boxing/unboxing:   Since int is a primitive and thus a value type, it gets boxed and stored on the heap (where reference types live, compared to stack where value types live). This makes our lifes easy, but it also comes at a performance penalty . You can easily identify boxing and unboxing operations by their IL code names: Boxing is called box and unboxing is called unbox in IL code. You can use Reflector to have a look at the IL code generated by your application. Boxing is very expensive performance wise. Try to avoid encapsulating value typ...

Raycasting explained

Image
Raycasting can be an extremely useful tool in computer games. There is a lot of confusion on what raycasting really is and how it can be used. I’ll try to explain some it in simple terms and give some simple examples. Raycasting - as the name suggests involves a ray – but what exactly is a ray? A ray is part of a line which is finite in one direction, but infinite in the other. It can be defined by two points, the initial point, A, and one other, B. Source: Wikipedia In Farseer Physics Engine we define a ray as two vectors; start and end. We can for obvious reasons not have an infinite length ray, we have to have some kind of boundary where the ray stops. This means that rays essentially become lines and then we simply use those lines to test for collisions with other stuff like other lines, AABB or geometries. Collision testing There are 4 test cases that can be performed: Point Line AABB Geometry The great thing about all those (except number ...

Simulators And Determinism

Image
When working with simulators, it is important to understand determinism and why we have trouble with determinism across different platform and configurations. Simulators that are iterative converge their solution over multiple iterations and can accumulate system-inherited errors and thus can become unstable over time. Why is determinism important Does determinism in simulation systems even matter? The short answer is yes, very important. The long answer is that simulations systems usually base their current world on the previous world. It takes one step at the time (time step) and calculate everything based on the previous step. If you shoot a projectile towards a target it, the trajectory of the projectile will travel in a certain curve. If you run the same simulation on a different platform, the projectile might end up somewhere totally different than the first simulation. Take the following example with a projectile trajectory. They both have the same initial conditions. S...

Performance tips and tricks – part 3

Image
In part 2 I gave a list of the different C# keywords that could potentially increase the performance of your code. This time we take a look at the different manual inlining techniques – it will not be like the two previous posts as this one can’t be presented in simple code snippets. The compiler is normally in charge of doing those kinds of optimizations, but sometimes it might not because of some requirements. First off, let’s get an overview of the different optimizations done by the C# compiler: Constant folding Constant and copy propagation Common subexpression elimination Code motion of loop invariants Dead store and dead code elimination Register allocation Method inlining Loop unrolling (small loops with small bodies) More… As you see, it is quite effective and contains the mostly used optimization techniques required by modern compilers. Why manually optimize? Well, the problem is that while the compiler almost always take...

Performance tips and tricks – part 2

Image
In part 1 I presented you with some tricks on loop optimizations and constants. This time I’ll show you some simple keywords in C# that can speed up the execution of your code for the various reasons. Sealed keyword Marking a class as sealed makes it it unable to be used as a base class. This is primarily used to prevent other classes from deriving from it. This can unlock some run-time optimizations. Before: After: Static keyword You can use the static keyword to instantiate a copy of a class at run-time and only keep that one copy in memory. You can read more about the static keyword in the link – be careful with it, you should only use it where appropriate. Before: After:   Struct keyword There are two types of allocations in C#: Stack and heap The stack is where all the value types go: int, char, enum, structs and the like. The heap is where all the reference types go: class, interface, string and the like. The stack does not get garbag...

Performance tips and tricks – part 1

Image
In this post series I will take a look at the different tips and tricks that can increase the performance of your application. You probably already know some of them and some of them are even just a bit of logical thinking. In any case, it should help you remember the different techniques. I took all these tips and tricks out of my article High Performance Game Development (The revised version – coming out soon) since they apply to almost all languages. Move work out of loops This one is kind of obvious, but it can be hard to identify in some cases. You simply move work from inner loops to outer loops or even out of a loop altogether. I’ve made some simple examples of this: Before: After: Don’t use loops This is much like the previous one. One important difference is that you don’t use loops at all. You will save a lot of instructions and thus you will have better performance. Before: After: Use constants whenever possible Using constants can greatly im...

The famous inverse square root

Image
I’ve already mentioned before that you can approximate divisions using a division-free Newton-Raphson approximation. You can also use the Newton-Raphson method for approximating the square root and the inverse square root function. But first, what do we actually use inverse square root and square root for? A lot really – everything that uses a normalized vector; collision responses, lightning and reflection to mention a few. It is an important part of any game and the operation has to be as fast as possible. Back in the day when computers had weaker CPUs and no specialized hardware to handle floating point operations (Today we even have specialized physics cards and specialized lightning circuits), some steps had to be taken to compute the inverse square root a million times each second as quickly as possible. And here it is: I could go into details on how this works, but I think it has been described sufficiently on Wikipedia's article Fast inverse square root . Suffice t...

Mathematical operations and micro-benchmarking

Image
You might have heard that the division operation on a modern FPU is slower than a multiplication operation. If you have not heard this before; it’s true – divisions are indeed slower. But why? Without going into the technical details, the divide operation is an iterative algorithm and the multiplication operation is computed directly. One of the more common ways of doing divide operations on a FPU is to use a divide-free Newton-Raphson approximation to get the reciprocal of the denominator and then multiply with the numerator. More on this will come in a later post. Let’s put it to the test. Here we run 10.000.000 iterations where we divide 500 by 11.5 Here we multiply 500 by the multiplicative inverse (reciprocal) of 11.5 From what I described earlier our multiplication example (code example 2) should perform better than the division example. Here are the results: Division : 00:00:00.0010787 Multiplication : 00:00:00.0010871 Something is clearly wrong . First...