Pestis

GPU Architecture

Optimisations

Indirect Rendering

https://docs.unity3d.com/6000.0/Documentation/ScriptReference/Graphics.RenderPrimitivesIndirect.html

I think this means I can batch a bunch of different hordes’ draw calls together?
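A rough sketch of what that batching could look like, in Python just to show the data layout: one argument buffer holding one draw command per horde, each command being four 32-bit unsigned ints (vertex count per instance, instance count, start vertex, start instance). That layout is my reading of Unity's GraphicsBuffer.IndirectDrawArgs and is an assumption here, as are the helper name pack_indirect_draw_args and the 6 vertices per sprite quad.

```python
import struct

def pack_indirect_draw_args(horde_sizes, vertices_per_sprite=6):
    """Pack one draw command per horde into a single byte buffer.

    Assumed layout per command (4 x uint32, little-endian):
    vertexCountPerInstance, instanceCount, startVertex, startInstance.
    """
    buf = bytearray()
    start_instance = 0
    for count in horde_sizes:
        buf += struct.pack("<4I", vertices_per_sprite, count, 0, start_instance)
        # Next horde's instances start where this horde's end.
        start_instance += count
    return bytes(buf)

# Two hordes of 10 and 25 boids -> two commands in one buffer.
commands = pack_indirect_draw_args([10, 25])
```

The command count passed to the draw call would then be the number of hordes. All commands would still have to share one material, which is what the sprite atlas is for.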

Sprite Atlas

Fit all sprites onto a 1753 x 1753 atlas if I cut each sprite down to 32x32, which should be fine. Then 5 directions for each sprite, as I can just mirror the rest. So each sprite is $32 \times 32 \times 5 = 5{,}120$ pixels, and the total of $3{,}072{,}000 \approx 1753^2$ pixels works out to about 600 sprites ($3{,}072{,}000 / 5{,}120$). Assuming 4 bytes per pixel, that is ~12MB of VRAM!!!
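To double-check the atlas numbers, a small Python sketch. The sprite count of 600 is an assumption (it is what a 3,072,000 pixel total implies at 32 x 32 x 5 pixels per sprite), and so is the uncompressed 4 bytes per pixel. The function name atlas_stats is just for illustration.

```python
import math

def atlas_stats(sprite_size=32, directions=5, sprite_count=600, bytes_per_pixel=4):
    """Return (total pixels, minimum square side, grid-aligned side, bytes)."""
    tiles = directions * sprite_count
    pixels = sprite_size * sprite_size * tiles
    # Smallest square with at least `pixels` area: ceil(sqrt(pixels)).
    min_side = math.isqrt(pixels - 1) + 1
    # If every tile must sit on a sprite_size grid, round up to whole tiles.
    tiles_per_side = math.isqrt(tiles - 1) + 1
    grid_side = tiles_per_side * sprite_size
    return pixels, min_side, grid_side, pixels * bytes_per_pixel

pixels, min_side, grid_side, size_bytes = atlas_stats()
print(pixels, min_side, grid_side, size_bytes / 1_000_000)
```

Under these assumptions the minimum side is 1753, but keeping whole 32x32 cells needs 55 tiles per side, so the atlas would really be 1760 x 1760.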

Then hopefully I can batch some draw calls together since they’re technically all using one material. And then reduce cache invalidations!

Thread Group Counts & LDS

https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-usage-large-thread-groups/


https://www.artstation.com/blogs/degged/Ow6W/compute-shaders-in-unity-boids-simulation-on-gpu-shared-memory

https://lisyarus.github.io/blog/posts/compute-blur.html#section-compute-naive-lds

This basically involves each thread group using its own local memory (LDS) to store the specific boids it knows it will care about. That means I'll have to make sure each thread group only accesses boids in the same neighbourhood.
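One way to get thread groups that only touch nearby boids is to sort boids by grid cell first, so each contiguous chunk of the sorted list is spatially close. A minimal CPU-side sketch in Python, assuming a uniform grid and a 1D dispatch where group g handles boids g * group_size onwards. The names cell_index, sort_boids_by_cell and split_into_groups are hypothetical, and the real version would run on the GPU.

```python
def cell_index(pos, cell_size, grid_width):
    """Flatten a 2D position into a row-major grid cell index."""
    cx = int(pos[0] // cell_size)
    cy = int(pos[1] // cell_size)
    return cy * grid_width + cx

def sort_boids_by_cell(positions, cell_size, grid_width):
    """Return boid indices ordered by the cell they fall in (stable sort)."""
    return sorted(
        range(len(positions)),
        key=lambda i: cell_index(positions[i], cell_size, grid_width),
    )

def split_into_groups(order, group_size):
    """Chunk the sorted order the way a 1D dispatch would assign thread groups."""
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]

positions = [(9.5, 9.5), (0.5, 0.5), (9.2, 9.1), (0.2, 0.9)]
order = sort_boids_by_cell(positions, cell_size=1.0, grid_width=10)
groups = split_into_groups(order, group_size=2)
```

Here boids 1 and 3 share a cell and land in the first group, while boids 0 and 2 land in the second, so each group could load just its own neighbourhood into local memory.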

Loop Unrolling

Reduce data sent to/from CPU

Not all info about a boid needs to be sent to/from the CPU

Maybe I can also resize buffers on the GPU instead of having to go via the CPU.

Avoid cast from int to float

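Swap use of sign() for a ternary to avoid having to cast the int result to a float. In HLSL, sign() returns an int, so something like x > 0 ? 1.0 : (x < 0 ? -1.0 : 0.0) keeps everything in float. A plain x < 0 ? -1.0 : 1.0 is simpler but gives 1.0 at zero, where sign() gives 0.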

Tiling

https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/

https://github.com/LouisBavoil/ThreadGroupIDSwizzling/blob/master/ThreadGroupTilingX.hlsl
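The rough idea, as I understand it, is to remap the order thread groups run in so that groups launched close together in time also work on data that is close together in space. A simplified Python sketch of that remapping follows. It is not the linked ThreadGroupTilingX implementation, and it assumes the grid width is an exact multiple of the tile width. The name swizzle_group_id is hypothetical.

```python
def swizzle_group_id(linear_id, groups_x, groups_y, tile_width):
    """Map a launch-order index to an (x, y) group ID.

    Instead of walking whole rows, walk a band tile_width columns wide
    from top to bottom, then move on to the next band.
    """
    assert groups_x % tile_width == 0, "sketch assumes exact tiling"
    groups_per_band = tile_width * groups_y
    band = linear_id // groups_per_band
    within = linear_id % groups_per_band
    x = band * tile_width + within % tile_width
    y = within // tile_width
    return x, y

# 4 x 2 grid of groups, bands 2 columns wide.
launch_order = [swizzle_group_id(i, 4, 2, 2) for i in range(8)]
```

With a 4 x 2 grid and a tile width of 2, the first four groups cover the left 2 x 2 block before any group touches the right half, which is the locality the swizzle is after.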

Testing

Other

Bots

Created 4/21/2025
Tended
  • 4/21/2025
  • 4/26/2025
  • 6/14/2025
  • 6/15/2025
  • 6/22/2025