Well, it took me a few days to realise that this had actually gone up on the developers pages.
I was quite pleased with this particular article because it covers quite a few optimisation concepts.
- Using optimisation. Might sound obvious, but compiling without optimisation flags results in poor performance. If you want better performance you've got to ask for it.
- Manually adding prefetch statements. Most people don't want to have to do this, which is fair enough. Most of the time the compiler is smart enough to do it for you; this particular situation happens to be a corner case that it doesn't catch.
- Inserting VIS instructions into the code without writing inline templates. I normally reach straight for inline templates when dealing with VIS, and it's rather nice not to have to.
- Using inline templates to insert VIS instructions. It's good to have an example of using inline templates to complement the paper on them.
The final code is still not fully optimised: one of my colleagues commented that I'd not unrolled and pipelined the code. There's probably quite a bit of performance still to be extracted, but by that point the article was already long enough.
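To give a flavour of the manual prefetch point in the list above, here is a minimal sketch of the sort of loop where a compiler may not insert prefetches by itself: the address of each load depends on an index array, so it can't be predicted from the loop counter alone. This is not the code from the article. The function name `gather_sum`, the `PREFETCH_AHEAD` distance, and the use of the GCC/Clang `__builtin_prefetch` builtin are all my own assumptions for illustration; other compilers provide their own prefetch interfaces, and the right distance has to be found by measurement.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. Tune by measurement. */
#define PREFETCH_AHEAD 16

/* Sum table[index[i]] for i in [0, n). The loads from table are indirect,
 * so we hint the element needed PREFETCH_AHEAD iterations from now. */
double gather_sum(const double *table, const size_t *index, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* GCC/Clang builtin: second argument 0 = read access,
             * third argument 0 = low temporal locality. It is only a
             * hint and never changes the result. */
            __builtin_prefetch(&table[index[i + PREFETCH_AHEAD]], 0, 0);
        }
        total += table[index[i]];
    }
    return total;
}
```

The bounds check keeps the read of `index[i + PREFETCH_AHEAD]` inside the array. Whether the prefetch actually helps depends on the size of `table` and on how scattered the indices are, so it's worth timing both versions.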