April 11, 2012

Java Games and Performance

Some basic performance considerations for real-time Java:

Choose collection classes wisely.
What are the most frequently called operations? Collection classes can have very different performance characteristics when it comes to
- iterating all items
- index based direct access
- key based access
- insertion, appending or removing items
- keeping items ordered
- concurrent access, whether algorithmic or by synchronization
- scaling / extensions of memory

Some hints:
- FastMap supports ordered maps
- Colt's maps can store primitives
- LinkedList is not that good for index access
- ArrayList isn't the best for finding concrete items
- Ever looked at what ArrayList does when you remove an item (except for the last one)? It shifts every following element down by one slot.
- Choose the right initial capacity and growth behaviour
- For fast get/set access on hash maps, check out Trove's maps or just use Java's built-in HashMap
- Take care with collection classes that create an additional object for each added item, like LinkedList and ConcurrentLinkedQueue
- Array based Bags are good and fast for adding as well as removing items when no item order is required
- The various cuckoo hash maps of LibGdx are said to be very fast and worth checking out
- After all, only profiling can show real-time behaviour, and it can lead to different conclusions for different platforms, machines, CPUs, CPU core counts, etc.
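
To make the Bag hint a bit more concrete, here is a minimal sketch of an array-based bag. The class name `Bag` and its methods are made up for illustration and are not taken from any particular library. The point is the remove: instead of shifting elements like ArrayList does, the last item is moved into the freed slot, which is why the order is not preserved.

```java
import java.util.Arrays;

/** Minimal unordered bag (hypothetical sketch, not a library class). */
final class Bag<T> {
    private Object[] items;
    private int size;

    Bag(int initialCapacity) {
        items = new Object[Math.max(1, initialCapacity)];
    }

    void add(T item) {
        if (size == items.length) {
            // Growing copies the array; a sensible initial capacity avoids this in the game loop.
            items = Arrays.copyOf(items, size * 2);
        }
        items[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (T) items[index];
    }

    /** Removes the item at index by moving the last item into its slot. Order is NOT preserved. */
    @SuppressWarnings("unchecked")
    T remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        T removed = (T) items[index];
        items[index] = items[--size]; // overwrite with the last item, no shifting
        items[size] = null;           // drop the stale reference so it can be collected
        return removed;
    }

    int size() {
        return size;
    }
}
```

Whether this beats ArrayList in your game is something only a profiler can tell you.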

Do you know what the following code does?
List<String> names = ...;
for (String name : names) {
    ...
}

Certainly you do, but do you also know what happens under the hood?
These enhanced for loops work with implicitly created iterator objects. So if you use them in your main game loop, you end up creating tons of avoidable garbage objects.
Better choices are ArrayList with index access, or maps like FastMap or THashMap which offer callback loops.
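
A minimal sketch of the index-based alternative, assuming the list is an ArrayList or another random-access list (on a LinkedList, `get(i)` would make things much worse). The class and method names are just for illustration. Note that a modern JIT may be able to optimize the iterator allocation away in some cases, so measure before rewriting every loop.

```java
import java.util.List;

final class IndexedLoop {
    /** Sums the lengths of all names without creating an Iterator object. */
    static int totalLength(List<String> names) {
        int total = 0;
        // size() is read once; get(i) is a plain array access for ArrayList.
        for (int i = 0, n = names.size(); i < n; i++) {
            total += names.get(i).length();
        }
        return total;
    }
}
```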

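What about this:
String guitarGod = "Jimi" + " " + " Hendrix";
Well, the code makes no sense, though the meaning very much does. Pure literals like these are folded into a single constant by the compiler, but as soon as a non-constant operand is involved (a variable, a method result), Java typically creates StringBuilder objects behind the scenes to concatenate such strings. Again, for frequently called code fragments...
Better to set up your own reusable StringBuilder.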
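
A minimal sketch of such a reusable StringBuilder. The class `HudText`, the method `scoreLabel` and the initial capacity of 64 are assumptions made for this example. The builder is reset with `setLength(0)`, which keeps its backing array, and is handed out as a CharSequence so that no String has to be created as long as the consumer accepts a CharSequence.

```java
final class HudText {
    // One builder reused for every frame; the capacity is a guess for short HUD strings.
    private final StringBuilder sb = new StringBuilder(64);

    /**
     * Rebuilds the label inside the reused builder.
     * The returned CharSequence is only valid until the next call.
     */
    CharSequence scoreLabel(String player, int score) {
        sb.setLength(0); // reset the content but keep the backing array
        sb.append(player).append(": ").append(score);
        return sb;
    }
}
```

Calling `toString()` on the result allocates a String again, so this only pays off if the text can be consumed as characters.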

When dealing with images, be sure to create compatible images, for example by using
GraphicsConfiguration#createCompatibleImage(...) and
GraphicsConfiguration#createCompatibleVolatileImage(...)

That ensures compatible data and color models and prevents slow implicit conversions while rendering images.
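
A minimal sketch of how a loaded image could be converted once, up front, into a compatible one. The helper class `CompatibleImages` and its method `toCompatible` are made up for this example; only `createCompatibleImage` is the actual AWT call.

```java
import java.awt.Graphics2D;
import java.awt.GraphicsConfiguration;
import java.awt.image.BufferedImage;

final class CompatibleImages {
    /** Copies src into a new image whose data layout matches the given configuration. */
    static BufferedImage toCompatible(GraphicsConfiguration gc, BufferedImage src) {
        BufferedImage dst = gc.createCompatibleImage(
                src.getWidth(), src.getHeight(), src.getTransparency());
        Graphics2D g = dst.createGraphics();
        try {
            g.drawImage(src, 0, 0, null); // the one-time conversion happens here
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

In a desktop game the configuration would usually come from the screen, for example via `GraphicsEnvironment.getLocalGraphicsEnvironment().getDefaultScreenDevice().getDefaultConfiguration()`, or from the component you render to.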

For hardware-accelerated images use VolatileImage. Take care if you intend to fiddle with image pixels yourself. Previously managed images might very well become unmanaged and thus lose any hardware acceleration (see VolatileBufferedToolkitImage Strategies).

Keep synchronization blocks short and be sure you know what you are doing when it comes to multithreading ;)
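
One way to keep a synchronized block short is to only swap references under the lock and do the actual work outside of it. The following is a minimal sketch under the assumption that any thread may call `post` but only the game loop thread calls `drain`. The class `EventInbox` and its methods are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class EventInbox {
    private final Object lock = new Object();
    private List<String> pending = new ArrayList<String>();    // filled by producer threads
    private List<String> processing = new ArrayList<String>(); // owned by the game loop thread

    /** May be called from any thread, e.g. network or input threads. */
    void post(String event) {
        synchronized (lock) {
            pending.add(event);
        }
    }

    /** Must only be called from the game loop thread. Returns the number of handled events. */
    int drain(StringBuilder out) {
        List<String> batch;
        synchronized (lock) { // held only for three reference assignments
            batch = pending;
            pending = processing;
            processing = batch;
        }
        int n = batch.size();
        for (int i = 0; i < n; i++) { // the potentially slow part runs without the lock
            out.append(batch.get(i)).append('\n');
        }
        batch.clear(); // the emptied list is reused as the next pending buffer
        return n;
    }
}
```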

Add delays or conditions to certain operations to lower their invocation frequency and save CPU cycles.
For example:
- calculating a field-of-view only needs to be updated if an actor was moved or the environment has changed
- sending position updates from server to client might only be required for actors which have actually moved
- don't call your genius pixel-exact collision detection when the cool rough collision detector said: relax boy, too far away anyway
- write algorithms that are interruptible and can continue later on, spreading their execution over multiple game frames
But beware that processing-heavy code kicking in only from time to time can cause CPU spikes and an unsteady game experience. Thus, the right balance must be found.
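
The field-of-view case can be sketched with a simple dirty flag. Everything here is an assumption made for illustration: the class `FieldOfView`, the stand-in `compute` method (a real implementation would do ray casting or similar) and the counter, which only exists to make the saved work visible.

```java
final class FieldOfView {
    private final int radius;
    private boolean dirty = true; // forces a calculation on first use
    private int visibleTiles;
    private int recomputeCount;   // only here to make the saving observable

    FieldOfView(int radius) {
        this.radius = radius;
    }

    /** Call this when the actor has moved or the environment has changed. */
    void invalidate() {
        dirty = true;
    }

    /** Returns the cached result and recalculates only after invalidate(). */
    int visibleTiles() {
        if (dirty) {
            visibleTiles = compute(radius);
            dirty = false;
            recomputeCount++;
        }
        return visibleTiles;
    }

    int recomputeCount() {
        return recomputeCount;
    }

    // Stand-in for the real, expensive calculation: counts the tiles of a square around the actor.
    private static int compute(int radius) {
        int side = 2 * radius + 1;
        return side * side;
    }
}
```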

Using prerendered text images instead of drawing antialiased TrueType fonts letter by letter each time might take some load off your machine.

Avoid object creation from autoboxing.
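
A small sketch of how easily autoboxing sneaks in. Both methods of the made-up class `BoxingDemo` return the same result, but the first one declares its accumulator as `Long` instead of `long`, so every `+=` unboxes the value, adds, and boxes the result again. Apart from a small cache of values around zero, that means a new object per iteration.

```java
final class BoxingDemo {
    /** Boxed accumulator: looks harmless, but boxes a new Long on (almost) every iteration. */
    static long sumBoxed(int n) {
        Long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i; // unbox, add, box again
        }
        return sum;
    }

    /** Primitive accumulator: no objects are created at all. */
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }
}
```

The same trap lurks in collections of wrapper types such as `List<Integer>` or `Map<Integer, Float>`, which is where primitive collections like the ones from Trove or Colt come in.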

Think about object caching for selected classes (not in general).
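
A minimal sketch of such a cache for one selected class, here a small vector type that a game might otherwise create thousands of times per second. `Vec2` and `Vec2Pool` are hypothetical names, and the pool is assumed to be used from a single thread only.

```java
import java.util.ArrayDeque;

final class Vec2 {
    float x, y;

    Vec2 set(float x, float y) {
        this.x = x;
        this.y = y;
        return this;
    }
}

final class Vec2Pool {
    private final ArrayDeque<Vec2> free = new ArrayDeque<Vec2>();
    private final int maxSize;

    Vec2Pool(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Hands out a cached instance if one is available, otherwise creates a new one. */
    Vec2 obtain() {
        Vec2 v = free.pollLast();
        return v != null ? v : new Vec2();
    }

    /** Returns an instance to the pool. The caller must not use it afterwards. */
    void release(Vec2 v) {
        if (free.size() < maxSize) { // cap the pool so it cannot grow without bound
            v.set(0f, 0f);           // reset state so no stale values leak out
            free.addLast(v);
        }
    }
}
```

Pooling makes the code more error-prone (forgotten or double releases), which is one reason to limit it to classes a profiler has actually flagged.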

A profiler can easily reveal performance hot spots you have never thought of and keep you from tinkering with the wrong code fragments.

Be careful with micro benchmarks:
  • let the JVM optimize during an appropriate warm-up period before measuring
  • prevent dead-code elimination from removing the code you want to measure
  • compare results of client and server VMs
  • cache hits or misses influence benchmark results
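
A rough sketch of a hand-rolled micro benchmark that respects the first two points. The class `MicroBench`, the `work` method standing in for the code under test, and the run counts are all assumptions for this example. Results are written to a volatile field so the JIT cannot throw the measured work away as dead code.

```java
final class MicroBench {
    // Every result ends up here, so the measured work has a visible effect.
    static volatile long sink;

    /** Stand-in for the code under test. */
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (long) i * i;
        }
        return acc;
    }

    /** Returns the average time per call in nanoseconds, measured after a warm-up phase. */
    static long measure(int warmupRuns, int measuredRuns, int n) {
        for (int i = 0; i < warmupRuns; i++) {
            sink = work(n); // give the JIT a chance to compile and optimize first
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            sink = work(n);
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / measuredRuns;
    }
}
```

Numbers from a loop like this are still only a hint. Run it several times, on the target VM and machine, and double-check surprising results with a profiler.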
