
Making collision detection fast

tufty
Posts: 12
Joined: Sun Jul 28, 2013 5:50 pm

Making collision detection fast

Post by tufty » Mon Aug 03, 2015 11:38 am

Let's assume that we have a list of objects, and we want to do collision detection on them.

The trivial approach is something like this:

Code: Select all

for (uint32_t a = 0; a < max_objects; a++) {
  for (uint32_t b = 0; b < max_objects; b++) {
    collide(a, b);
  }
}
Obviously, this is a long way short of being efficient. The obvious issue here is that the double loop makes the complexity O(n^2), but there's also the "slight" problem of the collision detection being expensive in itself; sqrt() never comes cheap. The number of tests can at least be halved (and self-collisions avoided) by making collide() update both objects at once, and rewriting the inner loop conditions as

Code: Select all

for (uint32_t b = a + 1; b < max_objects; b++)
That's still a bunch too much work to be doing, though, as we're still doing collision detection for every possible pair of objects; we ought to be doing less. But hey. This is collision detection 101. You all know this, right?
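For concreteness, here's the halved version as a compilable sketch. object_t, collide() and the pair counter are illustrative stand-ins, not code from the engine:

```c
#include <stdint.h>

typedef struct { float x, y, r; } object_t; /* hypothetical object type */

static uint32_t pair_tests; /* counts pairwise tests, for illustration */

/* Symmetric test: a single call is assumed to update both objects. */
static void collide(object_t *a, object_t *b) {
  (void)a; (void)b;
  pair_tests++;
}

/* Each unordered pair is visited exactly once: n*(n-1)/2 tests
 * instead of n*n, though the loop is of course still O(n^2). */
static void collide_all(object_t *objs, uint32_t max_objects) {
  for (uint32_t a = 0; a < max_objects; a++)
    for (uint32_t b = a + 1; b < max_objects; b++)
      collide(&objs[a], &objs[b]);
}
```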

Fast forward through 35+ years of collision detection research papers. Feel free to read some of them. Graphics Gems volumes 1 through 5 would be a good place to start, followed by the proceedings of the various computer graphics conferences.

So, as I've outlined in previous posts, we're keeping our objects in a (conceptual) fixed-size array, preallocated for the maximum number of objects in the game. And we're indexing them based on morton code, effectively a linear encoding of the depth-first traversal of a quadtree[1] data structure. So, we can use the morton code to find candidates for collision detection, which massively improves the average case (to something like O(nlogn) overall, since each object only has to examine a handful of nearby cells), although the worst case stays the same.

That's what we're going to do.

The gist of this is that, for every object, we need to find its morton code and the morton codes of a number of other cells that may contain candidates for collision detection, and *only* perform the relatively expensive collide() function on the (hopefully) tiny subset of all game objects with one of those morton codes.

Simple, right?

Actually, no. Not so simple.

So, let's assume, for simplicity's sake, that all our objects are bounded not, as usual, by axis-aligned rectangles, but by circles. We must choose our spatial subdivision for morton encoding (i.e. the size of our quadtree leaves) to be larger than the maximum object bounding circle radius.

So (and ignoring velocity, this is instantaneous detection only for the moment), any object that might realistically collide with our source object must have its centroid either in the same quadtree leaf as our source object, or in one of its immediate neighbours.

So, we must perform collide() on all objects in the same quadtree leaf as our source object, and on any object in a neighbouring leaf whose bounding circle overlaps our source object's leaf. The latter is not entirely obvious, but follows from the fact that collide() acts not only on the source object, but on the candidate object as well (which is what lets each cross-cell pair be handled exactly once, from one side). The following diagram illustrates.
[Diagram: objects A–G spread across neighbouring quadtree cells, with cell 3 in the centre]
In this diagram, dealing with objects in the central cell (morton code 3), we need to collide objects A, B & C with each other, and each of them against D and E. No other objects impinge on cell 3, so they can be excluded early. Note how objects E and F both impinge on cell 2, so will need to be collision tested against G, but never need to be tested against each other.

Building the list of neighbour cells is not entirely trivial (although if we're tricky it's constant time), and iterating the objects in those neighbour cells is relatively expensive, so it makes sense to perform the tests on a cell-by-cell basis, rather than object-by-object. Remember, we have a fixed-depth quadtree with 0 <= x <= n objects per leaf node, rather than a variable-depth quadtree with a single object per leaf, so each node comparison is potentially n-to-n.

So our algorithm looks like this.

Code: Select all

stop = first node.
do while stop exists
  start = stop
  stop = next node until stop.morton != start.morton

  for each source in start <= source < stop
    for each candidate in source < candidate < stop
      collide source, candidate
    end
  end

  for each morton in start.morton.neighbours
    candidate-start = find node by morton
    if candidate-start exists
      candidate-end = candidate-start
      candidate-end = next node until candidate-end.morton != morton

      for each candidate in candidate-start <= candidate < candidate-end
        if candidate.bounds overlaps start.morton
          for each source in start <= source < stop
            collide source, candidate
          end
        end
      end
    end
  end
end
This conservatively culls most things that we don't need to do collision testing with, and reduces a large number of collision tests that a trivial implementation might carry out to (very fast) "circle overlaps rectangle" tests.
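The "circle overlaps rectangle" test itself is the standard clamp-and-compare trick, and needs no sqrt(). A sketch, assuming integer cell coordinates (cx, cy) and a cell_size parameter (both illustrative):

```c
#include <stdbool.h>

/* Does a bounding circle (centre px,py, radius r) overlap the
 * axis-aligned cell (cx, cy) of side cell_size?  Clamp the centre
 * to the cell and compare squared distances -- no sqrt() needed. */
static bool circle_overlaps_cell(float px, float py, float r,
                                 int cx, int cy, float cell_size) {
  float minx = cx * cell_size, miny = cy * cell_size;
  float maxx = minx + cell_size, maxy = miny + cell_size;
  /* nearest point of the cell to the circle centre */
  float nx = px < minx ? minx : (px > maxx ? maxx : px);
  float ny = py < miny ? miny : (py > maxy ? maxy : py);
  float dx = px - nx, dy = py - ny;
  return dx * dx + dy * dy <= r * r;
}
```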

So, what's the hitch?

Mainly, it's start.morton.neighbours. I've (deliberately) glossed over this above, as it's a little bit complex. How *do* we find the neighbours for a given morton code?

The most obvious way, given that a morton code is an encoding of a quadtree, is to do a recursive search for each direction, going up and down the "tree" as necessary. That's ugly and painful, and you really don't want to do it.

The next most obvious way is to take the X and Y coordinates of each cell, and interleave them to generate a morton code. Fortunately, we already know at least one coordinate pair from the source object(s) in the current cell, so we don't need to de-interleave the current cell's X and Y, but it's still quite a lot of work (it's done either via lookup tables, which cost a cache hit, or via bit twiddling hacks, which are a fair number of operations).
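For reference, here's roughly what the bit twiddling route looks like in C. This is the classic "spread the bits" approach; the function names are mine, not from the engine:

```c
#include <stdint.h>

/* Spread the 8 bits of v out to the even bit positions of a 16-bit
 * word: abcdefgh -> 0a0b0c0d0e0f0g0h. */
static uint16_t part1by1(uint8_t v) {
  uint16_t x = v;
  x = (x | (x << 4)) & 0x0f0f;
  x = (x | (x << 2)) & 0x3333;
  x = (x | (x << 1)) & 0x5555;
  return x;
}

/* x goes to the even bits, y to the odd bits. */
static uint16_t morton_encode(uint8_t x, uint8_t y) {
  return part1by1(x) | (uint16_t)(part1by1(y) << 1);
}
```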

So why can't we just add or subtract numbers to / from the morton codes themselves? After all, they contain the x and y coordinates. And the answer to that is "we can, kinda". The problem, though, is that we need to add and subtract the x and y portions separately. A good starting point in terms of relevant literature on this is "Finding Neighbours of Equal Size in Linear Quadtrees and Octrees in Constant Time" by Gunther Schrack.

In short, we can extract the X and Y components of a (16 bit) morton code by

Code: Select all

x = m & 0x5555
y = m & 0xaaaa
Addition and subtraction are a bit tricky, as the numbers are interleaved: addition requires the "interleaved" bits to be set to 1 (so the carry ripples through them), while subtraction requires them to be set to 0 (so the borrow does), which makes our extracts instead

Code: Select all

// For addition
x = m | 0xaaaa
y = m | 0x5555
// For subtraction
x = m & 0x5555
y = m & 0xaaaa
Once we've finished adding or subtracting, we recombine by blowing away the interleaved bits and then logically "or"ing as follows:

Code: Select all

m = (x & 0x5555) | (y & 0xaaaa)
For our purposes, we can do everything with addition, as we only want to increment and decrement: immediates of 1 and -1 cover both directions, as long as each component has been prepared with the right mask. On MIPS, and for 16 bit morton codes, this is convenient because MIPS has addiu, an "add unsigned 16 bit immediate value" opcode, which has no subtraction counterpart. Finding a single neighbour is thus a constant time operation, cheaper than de-interleaving and re-interleaving x and y via bit twiddling. It gets better still when we consider that we only need 4 adds (x and y, each +1 and -1) and can combine the results as necessary to get all 8 neighbours.
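In portable C, with the preparation masks from above spelled out explicitly, the four operations look something like this (helper names are mine; x lives in the even bits, y in the odd bits). Because the masks and constants are written out literally here, there's nothing for the compiler to "helpfully" fold, unlike the subtract-0xffff formulation discussed below:

```c
#include <stdint.h>

static uint16_t morton_inc_x(uint16_t m) {
  /* set the y bits so the carry ripples through them */
  return (uint16_t)((((m | 0xaaaa) + 1) & 0x5555) | (m & 0xaaaa));
}

static uint16_t morton_dec_x(uint16_t m) {
  /* clear the y bits so the borrow ripples through them */
  return (uint16_t)((((m & 0x5555) - 1) & 0x5555) | (m & 0xaaaa));
}

static uint16_t morton_inc_y(uint16_t m) {
  /* set the x bits; the +1 carries out of them into the y bits */
  return (uint16_t)((((m | 0x5555) + 1) & 0xaaaa) | (m & 0x5555));
}

static uint16_t morton_dec_y(uint16_t m) {
  return (uint16_t)((((m & 0xaaaa) - 1) & 0xaaaa) | (m & 0x5555));
}
```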

There is a "gotcha" in this, though. You almost certainly have to do it in assembler, because although it all looks like "normal" maths, it's not, and you have no way of telling the compiler that. The compiler knows that adding -1 is the same as subtracting 1 and vice versa, and will happily blow away your carefully constructed sequence of operations without bothering to tell you. After all, the user doesn't care about the actual sequence of operations if the result is the same, right? Wrong. For example, the following (relatively inefficient) code to find the top right neighbour

Code: Select all

#define minus_one 0xffff
#define mask_x 0x5555
#define mask_y 0xaaaa

int16_t top_right(int16_t m) {
  return (((m & mask_x) - minus_one) & mask_x) | (((m & mask_y) - 1) & mask_y);
}
becomes, when compiled with -O3 and disassembled with objdump

Code: Select all

00000000 <top_right>:
   0:	3084ffff 	andi	a0,a0,0xffff
   4:	2403aaaa 	li	v1,-21846
   8:	00831024 	and	v0,a0,v1
   c:	30845555 	andi	a0,a0,0x5555
  10:	2442ffff 	addiu	v0,v0,-1
  14:	24840001 	addiu	a0,a0,1
  18:	00431824 	and	v1,v0,v1
  1c:	30845555 	andi	a0,a0,0x5555
  20:	00641025 	or	v0,v1,a0
  24:	03e00008 	jr	ra
  28:	7c021620 	seh	v0,v0
This is (fairly) tight code, but incorrect: the components were prepared for subtraction by clearing the interleaved bits, and the compiler has folded the subtraction of 0xffff into an addition of 1, so the borrow no longer ripples the way we constructed it to. It's even less correct when compiled with clang -O3 for x64. Long and short of it: don't trust your compiler when you're subverting basic operations.

Here's an implementation with inline assembly for toroidal space:

Code: Select all

uint8_t neighbours_toroidal (uint16_t m, uint16_t * r) {
  uint16_t xc, yc, xp, xm, yp, ym;

  __asm__ volatile (
    "ori   %0, %6, 0xaaaa\n"   /* x prepared for addition (y bits = 1) */
    "ori   %2, %6, 0x5555\n"   /* y prepared for addition (x bits = 1) */
    "andi  %4, %6, 0x5555\n"   /* x prepared for subtraction (y bits = 0) */
    "andi  %5, %6, 0xaaaa\n"   /* y prepared for subtraction (x bits = 0) */
    "addiu %0, %0, 1\n"
    "addiu %2, %2, 1\n"
    "addiu %1, %4, -1\n"
    "addiu %3, %5, -1\n"
    "andi  %0, %0, 0x5555\n"
    "andi  %1, %1, 0x5555\n"
    "andi  %2, %2, 0xaaaa\n"
    "andi  %3, %3, 0xaaaa\n"
    : "=&r" (xp), "=&r" (xm), "=&r" (yp), "=&r" (ym), "=&r" (xc), "=&r" (yc) : "r" (m));
  r[0] = xm | ym;
  r[1] = xc | ym;
  r[2] = xp | ym;
  r[3] = xm | yc;
  r[4] = xp | yc;
  r[5] = xm | yp;
  r[6] = xc | yp;
  r[7] = xp | yp;
  return 0;
}
For clipped space, we need to treat everything as uint32_t, and cull cells with morton codes that overflow in 16 bits. The majority of the calculations, however, look the same.
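As a sketch of that, here's the x direction of a clipped-space neighbour step, done in 32-bit arithmetic and culling any step that would fall off the 16-bit map edge rather than letting it wrap toroidally (the y direction is analogous; helper names are mine):

```c
#include <stdbool.h>
#include <stdint.h>

/* Step x left by one cell, or report that we fell off the edge. */
static bool clipped_dec_x(uint16_t m, uint16_t *out) {
  uint32_t x = (uint32_t)(m & 0x5555);
  if (x == 0)
    return false;                       /* would step off the left edge */
  *out = (uint16_t)(((x - 1) & 0x5555) | (m & 0xaaaa));
  return true;
}

/* Step x right by one cell, or report that we fell off the edge. */
static bool clipped_inc_x(uint16_t m, uint16_t *out) {
  uint32_t x = (uint32_t)(m | 0xaaaa) + 1; /* carry ripples through y bits */
  if (x > 0xffffu)
    return false;                       /* overflowed the 16-bit range */
  *out = (uint16_t)((x & 0x5555) | (m & 0xaaaa));
  return true;
}
```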

I'm not suggesting this is the be-all and end-all of collision detection, of course. For starters, it really needs adjusting to handle relative movement, but it's a decent start. On another platform, I'd be looking at SIMD implementations. For the PSP, we might be able to make collide() use the VFPU in some cases, but the loop itself has to be CPU based (unless there's some currently undocumented bitwise ops on the VFPU, of course).

[1] Quadtree because I only really care about 2 dimensions. But it all holds for octrees, you filthy "third dimensionist", you.

noname120
Developer
Posts: 776
Joined: Thu Oct 07, 2010 4:29 pm

Re: Making collision detection fast

Post by noname120 » Wed Aug 05, 2015 7:40 am

Nice article, thank you again!
