https://docs.unity3d.com/6000.0/Documentation/ScriptReference/Graphics.RenderPrimitivesIndirect.html
I think this means I can batch a bunch of different hordes' draw calls together, since one command buffer can hold several draw commands (commandCount)?
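A rough sketch of what that command buffer could hold. As I understand the docs, each command is a GraphicsBuffer.IndirectDrawArgs entry of four uints (vertexCountPerInstance, instanceCount, startVertex, startInstance), and commandCount says how many to run. The Python below just packs one command per horde with that layout. The horde sizes, the 6 vertices per sprite quad and using startInstance as an offset into one shared sprite buffer are all assumptions for illustration. Whether the platform issues this as one multi-draw or loops over the commands is up to Unity and the driver.

```python
import struct

# Assumed layout of one GraphicsBuffer.IndirectDrawArgs command:
# 4 x uint32 = vertexCountPerInstance, instanceCount, startVertex, startInstance
ARGS_FORMAT = "<4I"
ARGS_STRIDE = struct.calcsize(ARGS_FORMAT)  # 16 bytes per command

VERTS_PER_QUAD = 6  # assumption: two triangles per sprite, no index buffer


def pack_horde_commands(horde_sizes):
    """Pack one draw command per horde; hordes' instances sit back to back."""
    data = bytearray()
    start_instance = 0
    for count in horde_sizes:
        data += struct.pack(ARGS_FORMAT, VERTS_PER_QUAD, count, 0, start_instance)
        start_instance += count  # the next horde starts after this one
    return bytes(data)


args = pack_horde_commands([1000, 250, 4000])  # hypothetical horde sizes
command_count = len(args) // ARGS_STRIDE      # what would go in commandCount
```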
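Fit all sprites onto a 1753 x 1753 atlas, if I cut each sprite down to 32x32, which should be fine. Then 5 directions for each sprite, as I can just mirror them for the opposite-facing directions. So 32 x 32 x 5 = 5,120 pixels per sprite, and 3,072,000 pixels total works out to 600 sprites (1753 x 1753 is just over 3,072,000). At 4 bytes per pixel (uncompressed RGBA32), that's ~12MB of VRAM!!!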
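Quick check of the atlas numbers. The 32px tiles, 5 directions and 3,072,000-pixel total come from the note; 4 bytes per pixel (uncompressed RGBA32) is my assumption. The 1753 side is the square root rounded up, but whole 32x32 tiles can't be split across rows, so laying them on a grid pushes the side to 1760.

```python
import math

TILE = 32            # sprite size in pixels (from the note)
DIRECTIONS = 5       # unique directions stored; the rest are mirrored
BYTES_PER_PIXEL = 4  # assumption: uncompressed RGBA32

pixels_per_sprite = TILE * TILE * DIRECTIONS      # 5,120
total_pixels = 3_072_000                          # figure from the note
sprite_count = total_pixels // pixels_per_sprite  # 600

# Side length if pixels could be packed perfectly (the 1753 in the note)
ideal_side = math.ceil(math.sqrt(total_pixels))   # 1753

# Side length when whole 32x32 tiles must sit on a square grid
tiles = sprite_count * DIRECTIONS                 # 3,000 tiles
tiles_per_row = math.ceil(math.sqrt(tiles))       # 55
grid_side = tiles_per_row * TILE                  # 1,760 px

vram_bytes = grid_side * grid_side * BYTES_PER_PIXEL  # ~12.4 MB
```

A block-compressed format would shrink this further, at some quality cost.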
Then hopefully I can batch some draw calls together, since they're technically all using one material (and one atlas texture), which should also cut down on cache invalidations!
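With one material, each sprite instance picks its frame by UV offset rather than by texture. A sketch of that lookup, assuming a hypothetical row-major grid of 32x32 tiles on a 1760px atlas, the 5 stored directions of each sprite placed side by side, and 8-way facing numbered 0..7 clockwise where 5, 6 and 7 reuse 3, 2 and 1 flipped horizontally. The numbering and layout are made up for illustration; rows are counted from v = 0.

```python
ATLAS_PX = 1760                  # assumption: grid side from the tile maths
TILE = 32
TILES_PER_ROW = ATLAS_PX // TILE  # 55
UNIQUE_DIRS = 5


def resolve_direction(direction):
    """Map 8-way facing (0..7, clockwise) to (stored column 0..4, flip_x)."""
    if direction <= 4:
        return direction, False
    return 8 - direction, True  # 5->3, 6->2, 7->1, drawn mirrored


def sprite_uv_rect(sprite_index, direction):
    """Return (u_min, v_min, u_size, v_size) for one sprite frame.

    A negative u_size means sample right-to-left, i.e. mirrored.
    """
    column, flip = resolve_direction(direction)
    tile_index = sprite_index * UNIQUE_DIRS + column
    tx, ty = tile_index % TILES_PER_ROW, tile_index // TILES_PER_ROW
    u_size = TILE / ATLAS_PX
    v_size = TILE / ATLAS_PX
    u_min = tx * u_size
    v_min = ty * v_size
    if flip:
        # Start at the tile's right edge and walk left
        return (u_min + u_size, v_min, -u_size, v_size)
    return (u_min, v_min, u_size, v_size)
```

In the shader, that rect would scale and offset the quad's 0..1 UVs, so every instance can share one material and texture.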
https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-usage-large-thread-groups/
https://lisyarus.github.io/blog/posts/compute-blur.html#section-compute-naive-lds
Basically, each thread group uses its own local (groupshared) memory to hold the specific boids it knows it will care about. That means I'll have to make sure each thread group only accesses boids in the same neighbourhood, e.g. by sorting boids spatially first.
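A sketch of the spatial sort that would make that work: bin boids into a uniform grid with a counting sort so each cell's boids are contiguous, then a thread group that owns one cell only has to copy that cell plus its 8 neighbours into groupshared memory. This is plain Python standing in for what would be separate compute passes (count, prefix sum, scatter). One group per cell and a 3x3 neighbourhood (cell size at least the view radius) are my assumptions.

```python
def bin_boids(positions, cell_size, grid_w, grid_h):
    """Counting sort of boids by grid cell.

    Returns (order, cell_start): order[i] is the original index of the i-th
    boid in cell order; cell c's boids are order[cell_start[c]:cell_start[c+1]].
    """
    cells = []
    for x, y in positions:
        cx = min(max(int(x // cell_size), 0), grid_w - 1)
        cy = min(max(int(y // cell_size), 0), grid_h - 1)
        cells.append(cy * grid_w + cx)

    counts = [0] * (grid_w * grid_h)
    for c in cells:
        counts[c] += 1

    # Exclusive prefix sum gives each cell's start offset
    cell_start = [0] * (grid_w * grid_h + 1)
    for c in range(grid_w * grid_h):
        cell_start[c + 1] = cell_start[c] + counts[c]

    # Scatter each boid into its cell's range
    order = [0] * len(positions)
    cursor = cell_start[:-1]  # slicing makes a copy
    for i, c in enumerate(cells):
        order[cursor[c]] = i
        cursor[c] += 1
    return order, cell_start


def gather_neighbourhood(cell, order, cell_start, grid_w, grid_h):
    """Boid indices a group owning `cell` would load into groupshared memory."""
    cx, cy = cell % grid_w, cell // grid_w
    shared = []
    for ny in range(max(cy - 1, 0), min(cy + 2, grid_h)):
        for nx in range(max(cx - 1, 0), min(cx + 2, grid_w)):
            c = ny * grid_w + nx
            shared.extend(order[cell_start[c]:cell_start[c + 1]])
    return shared
```

A crowded cell could exceed the groupshared budget, so the real version would need a cap or a fallback to global-memory reads.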
Not all of a boid's data needs to be sent to/from the CPU; simulation-only state can stay on the GPU.
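To put a number on it: a hypothetical split where only a small render struct would ever be read back, while velocity, acceleration and grid bookkeeping live in a GPU-only buffer. The fields are made up for illustration. ctypes is just a convenient way to get C-style struct sizes; with only 4-byte fields, these match the tightly packed stride a StructuredBuffer would use.

```python
import ctypes


# Hypothetical: the only part that would ever cross to the CPU
class BoidRender(ctypes.Structure):
    _fields_ = [("pos_x", ctypes.c_float), ("pos_y", ctypes.c_float),
                ("sprite_index", ctypes.c_uint32), ("direction", ctypes.c_uint32)]


# Hypothetical: simulation-only state that stays in a GPU-only buffer
class BoidSim(ctypes.Structure):
    _fields_ = [("vel_x", ctypes.c_float), ("vel_y", ctypes.c_float),
                ("acc_x", ctypes.c_float), ("acc_y", ctypes.c_float),
                ("cell_index", ctypes.c_uint32), ("flags", ctypes.c_uint32)]


def readback_bytes(boid_count, split=True):
    """Bytes copied GPU->CPU per readback, with or without the split."""
    per_boid = ctypes.sizeof(BoidRender)
    if not split:
        per_boid += ctypes.sizeof(BoidSim)
    return boid_count * per_boid
```

For 10,000 boids that's 160,000 bytes per readback instead of 400,000.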
Maybe I can also avoid resizing buffers via the CPU. The GPU can't reallocate a buffer itself, but it can track how much of a preallocated buffer is in use.
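A sketch of that idea, with Python standing in for GPU code: the CPU allocates the buffer once at some maximum capacity, and the GPU only moves a counter, following the HLSL pattern of taking a slot with InterlockedAdd and then bounds-checking it. The class and function names are hypothetical. In Unity, GraphicsBuffer.CopyCount can copy an append buffer's counter into another buffer such as the indirect args, which is roughly what write_instance_count imitates here.

```python
class FixedCapacityBuffer:
    """Allocated once by the CPU; the GPU side only moves a counter."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # fixed size, never reallocated
        self.counter = 0                # would live in a small GPU buffer

    def append(self, item):
        slot = self.counter             # InterlockedAdd hands back the old value
        self.counter += 1               # so the counter can run past capacity
        if slot >= len(self.slots):
            return False                # full: drop the item instead of growing
        self.slots[slot] = item
        return True

    @property
    def live_count(self):
        return min(self.counter, len(self.slots))

    def clear(self):
        self.counter = 0                # "shrinking" is just resetting the counter


def write_instance_count(draw_args, buffer):
    """Put the live count into instanceCount of 4-uint indirect draw args."""
    draw_args[1] = buffer.live_count
    return draw_args
```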
Swap sign() for a ternary, to avoid having to cast its int result to a float (HLSL's sign() returns an int type).
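One catch: the two aren't identical at zero. HLSL's sign() returns 0 for 0, while a two-way ternary has to pick a side, so the swap is only a drop-in if that case doesn't matter or gets its own branch. Python stand-ins below; the function names are mine and each docstring gives the HLSL expression it models.

```python
def hlsl_sign(x):
    """Models HLSL sign(x): an int -1, 0 or 1 (hence the cast back to float)."""
    return (x > 0) - (x < 0)


def ternary_sign(x):
    """Models x >= 0 ? 1.0 : -1.0. Float only, but 0 maps to 1.0."""
    return 1.0 if x >= 0 else -1.0


def ternary_sign_with_zero(x):
    """Models x > 0 ? 1.0 : (x < 0 ? -1.0 : 0.0), keeping sign(0) == 0."""
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
```

Whether the ternary is actually cheaper after the compiler is done is worth confirming by profiling or reading the compiled shader.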
https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/
https://github.com/LouisBavoil/ThreadGroupIDSwizzling/blob/master/ThreadGroupTilingX.hlsl
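A Python port of the group-ID remapping from the linked ThreadGroupTilingX.hlsl, so it can be sanity-checked off the GPU. I wrote it from my reading of that file with my own names, so it's worth diffing against the original before relying on it. The idea: instead of thread groups being launched row by row across the whole dispatch, consecutive groups cover a vertical strip max_tile_width groups wide, which keeps neighbouring reads hot in L2. Any leftover columns form a narrower last strip.

```python
def thread_group_tiling_x(dispatch_dim, max_tile_width, group_id):
    """Swizzle a 2D thread-group ID (SV_GroupID) into vertical tiles.

    dispatch_dim: (groups_x, groups_y) passed to Dispatch.
    Returns the swizzled (x, y) group ID.
    """
    dim_x, dim_y = dispatch_dim
    groups_per_full_tile = max_tile_width * dim_y
    full_tiles = dim_x // max_tile_width
    groups_in_full_tiles = full_tiles * groups_per_full_tile

    flat = dim_x * group_id[1] + group_id[0]  # launch order is row-major
    tile_id = flat // groups_per_full_tile
    local_id = flat % groups_per_full_tile

    if flat >= groups_in_full_tiles:
        # Leftover columns: the last tile is narrower than max_tile_width
        last_tile_width = dim_x % max_tile_width
        local_y, local_x = divmod(local_id, last_tile_width)
    else:
        local_y, local_x = divmod(local_id, max_tile_width)

    swizzled_flat = tile_id * max_tile_width + local_y * dim_x + local_x
    return swizzled_flat % dim_x, swizzled_flat // dim_x


def swizzled_thread_id(dispatch_dim, group_dim, max_tile_width, group_id, group_thread_id):
    """Like SV_DispatchThreadID, but built from the swizzled group ID."""
    gx, gy = thread_group_tiling_x(dispatch_dim, max_tile_width, group_id)
    return (group_dim[0] * gx + group_thread_id[0],
            group_dim[1] * gy + group_thread_id[1])
```

The remap is a permutation, so every group still runs exactly once; only the order in which regions of the grid are visited changes.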