{"id":2269,"date":"2011-10-09T18:02:21","date_gmt":"2011-10-09T08:02:21","guid":{"rendered":"http:\/\/www.vectorstorm.org\/?p=2269"},"modified":"2011-12-02T23:18:22","modified_gmt":"2011-12-02T13:18:22","slug":"profiling-is-fun","status":"publish","type":"post","link":"https:\/\/www.vectorstorm.com.au\/2011\/10\/09\/profiling-is-fun\/","title":{"rendered":"Profiling is fun"},"content":{"rendered":"

Actually, that’s a dirty lie.<\/p>\n

Profiling is fun when it shows you something big that’s easy to fix. \u00a0Profiling is a lot less fun when everything is equally slow. \u00a0But luckily in my case, there was something big to fix. \u00a0:)<\/p>\n

I noticed the other day that editing roads in MMORPG Tycoon 2 had become extremely slow. \u00a0(Well.. actually I noticed that roads didn’t draw at all. \u00a0It was only after I fixed them not drawing that I noticed that they were slow to edit). \u00a0 This was odd, because I have videos of early road editing, where it hardly touched the frame rate at all on a system much slower than the one I’m using now. \u00a0So after some profiling, I found that creating their procedural geometry was taking up about 75% of the whole processing time for the frame when they were created. \u00a0This was extremely bad when editing them, especially (because they’re regenerated every frame, when they’re being edited, or when the ground under them is being edited).<\/p>\n

Those of you who were at Freeplay will remember the part of my talk where I discussed optimising games, and when to do it (or rather, when not to do it).<\/p>\n

For those who weren’t there, the tl;dr version is this: \u00a0don’t optimise anything unless you have timing numbers that proves it’s slow relative to the rest of your program, and against which you’ll be able to judge your progress. \u00a0Ideally, you get these numbers by using a profiler of some sort. \u00a0(Visual Studio has one built in, modern Xcode uses Instruments while old Xcode used Shark, under Linux you have gperf, etc)<\/p>\n

What I didn’t say in that talk (and which I’ve regretted ever since), is that there’s a second criteria which can make it acceptable to optimise some code, even if you don’t have performance numbers. \u00a0Here is that other criteria:<\/p>\n

<\/p>\n

Monte Carlo Profiling<\/h1>\n
    \n
  1. Make a build of your game with optimisations turned on (a “release” build, or “retail” build, or “dist” build, or whatever you call it), but with debugging symbols still present.<\/li>\n
  2. Run this build through your debugger of choice.<\/li>\n
  3. At any time of your own choosing, pause the execution of the game, and examine the call stack shown in your debugger.<\/li>\n
  4. Repeat step 3 nine more times (for a total of ten pauses and ten callstacks).<\/li>\n
  5. If in seven of the ten total times you paused the game you were in the same area of code, then you may spend time optimising that area of code, even without first profiling it.<\/li>\n<\/ol>\n

    The Monte Carlo test is basically very quick and ad hoc profiling. \u00a0Profiling does exactly this same process, but instead of doing it ten times, it does it millions of times, far faster than you’re able to do it manually. \u00a0It avoids having to go to the bother of setting up proper profiling, and will still point out anything that’s extremely slow in your code.<\/p>\n

    But back to my story. \u00a0In my case, I did do real profiling. \u00a0Here’s what it looked like in Instruments:<\/p>\n

    \"\"<\/a>
    \n<\/a>

    \n<\/a>That graph is of CPU usage. \u00a0(Also, it totally breaks the site frame layout, but I don’t care. \u00a0I want you to see the details. \u00a0:) ) \u00a0The spike at the start is the game starting up (and generating all the world geometry all at once). \u00a0The low part in the middle is just me looking around a bit, and then the heavy bit that I have highlighted is me editing a road — just dragging one of its endpoints around the map at random.<\/p>\n

    As you can see from the screenshot, it’s spending about 30% of the frame time in vsMeshMaker::BakeTriangleVertex, 20% of its time in vsFloor (called from vsMeshMaker::BakeTriangleVertex), 16% of the frame time in vsVector2D::SqLength() (also called from vsMeshMaker::BakeTriangleVertex)… \u00a0you get the idea. \u00a0Add it all up, and we were spending about 75% of our CPU time inside BakeTriangleVertex. \u00a0That BakeTriangleVertex function was really, really heavy for some reason, and I needed to figure out why.<\/p>\n

    Basically what vsMeshMaker::BakeTriangleVertex does is to look at all the triangle vertices in a procedurally generated model, and to decide (one at a time) whether they should be merged with other nearby vertices to make a smoothly curving surface. \u00a0The hardest part of this is actually finding other nearby vertices. \u00a0Previously, I had sped this up by dividing up space into grid squares, and storing each vertex into the appropriate grid space. \u00a0Then when I wanted to know about nearby vertices, I only had to check the other vertices in the same grid space, not every other vertices in the model.<\/p>\n

    But as it turns out, those grid spaces were too large, especially for large objects like roads, and so I was still comparing far too many vertices for merging. \u00a0So I increased the number of grid spaces (and made each grid space smaller, so there would be fewer vertices to test in each), so that instead of having an 8x8x8 grid and needing to test every vertex in those large squares, we have a 32x32x32 grid and only need to test every vertex in much smaller squares. \u00a0I had also noticed that very simple functions like vsFloor and vsVector3D::SqLength() were showing up in the profiling, and so made those functions inline, to hopefully reduce the cost of calling them. \u00a0This led to this graph:<\/p>\n

    \"\"<\/a><\/p>\n

    This one is fascinating. \u00a0Notice that the amount of CPU being used has dropped substantially, compared to the previous one. \u00a0Also note that functions like “vsFloor” and “vsVector3D::SqLength” have vanished, since I made them inline (they’re now included in the caller’s time). \u00a0Increasing the resolution of my vertex lookup grid had sped things up a lot — we’re down from 75% of our frame time, down to an average of 40% of our frame time (and made a huge improvement to the frame rate). \u00a0But here’s the interesting thing that I hadn’t noticed previously:<\/p>\n

    Look at that graph, how it increases over time. \u00a0I wasn’t doing anything unusual in-game, just rapidly dragging one end of the road back and forth over a single area of terrain. \u00a0But the more time I spent editing the road, the slower editing became. \u00a0This eventually pointed me to a bug in my code, where some of the merging work from previous frames was being left-over to do again on all subsequent frames, even though it didn’t accomplish anything on the new frames. \u00a0So I fixed that bug, and profiled again:<\/p>\n

    \"\"<\/a><\/p>\n

    Great! \u00a0Getting rid of that leftover repeated work brought us down from 40% frame time (on average, during a long editing session) to about 2.6% frame time, low enough that you don’t even notice a frame rate hit in-game.<\/p>\n

    The problem now was that my grid was so fine, that it actually uses a lot of memory. \u00a0A 32x32x32 grid has about 32,000 spaces in it, and each of those needed storage to hold an arbitrary number of vertices. \u00a0And I had one of these for every road (and indeed, every building, tree, etc). \u00a0That builds up fast! \u00a0And while 2.6% is low enough to be acceptable, I would have liked it to be even lower. \u00a0Let’s face it, my placeholder “road” model is ridiculously simple. \u00a0I’d really like this code to be much faster than it is now, to handle the much more detailed meshes that I’ll be throwing at it later.<\/p>\n

    So I set out to replace the grid with a new structure: \u00a0an octree. \u00a0For those who are interested, this implementation has been ported back into the main VectorStorm trunk, as VS_PointOctree.h, inside the VS\/Utils\/ directory. \u00a0It’s basically a tree structure which expands as more data is added to it, so it should behave far more predictably than the grid structure. \u00a0What’s more, it expands as data is added, instead of being a fixed (huge) size, which means that smaller models use much less space, exactly as you’d hope. \u00a0Profiling isn’t much different (down to about 2%), but memory usage is drastically lower. \u00a0So I call that a win. \u00a0And in future, I may return to this code to see if I can’t whittle it down even a little further, once it starts moving up into a prime spot in the profiling again.<\/p>\n

    I’ve also been doing other stuff recently, but I reckon that this blog post is enough of a novel already, so I’ll save the other stuff for tomorrow.<\/p>\n","protected":false},"excerpt":{"rendered":"

    Actually, that’s a dirty lie. Profiling is fun when it shows you something big that’s easy to fix. \u00a0Profiling is a lot less fun when everything is equally slow. \u00a0But luckily in my case, there was something big to fix. \u00a0:) I noticed the other day that editing roads in MMORPG Tycoon 2 had become…<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":""},"categories":[4,24,25,3],"tags":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/po9WK-AB","_links":{"self":[{"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/posts\/2269"}],"collection":[{"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/comments?post=2269"}],"version-history":[{"count":0,"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/posts\/2269\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/media?parent=2269"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/categories?post=2269"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vectorstorm.com.au\/wp-json\/wp\/v2\/tags?post=2269"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}