Performance at a Glance

Metric | Typical Value | Notes
Draw Calls | 1 per renderer | All shapes in a category render in a single draw call. Global FX uses a second renderer, which adds 1 draw call.
Batches | 1–2 | Main reticle (1 batch) + Global FX (1 batch if active). In the editor only, the grid and layer connectors each add 1 batch when visible.
Main Thread | < 0.1 ms | Animation evaluation, buffer upload, state management. No per-frame allocations in steady state.
Render Thread | < 0.05 ms | Single fullscreen quad per renderer. GPU-bound, not render-thread-bound.
GPU (Fragment) | 0.05–0.3 ms | Depends on shape count, glow complexity, and screen fill. SDF evaluation is ALU-heavy but needs no texture bandwidth for standard shapes.
Memory (Runtime) | ~10–50 KB | StructuredBuffer for ShapeData (144 bytes × shape count), plus loadout data in managed memory. Excludes the custom SDF Texture2DArray, which is only allocated when custom shapes exist.
Allocations | 0 per frame | Steady-state rendering is allocation-free. Allocations occur only during infrequent events such as loadout changes, transition starts, and editor operations.

Draw Calls & Batches

RCE's single-draw-call architecture is one of its core design principles. Here's how it works and what contributes to your batch count:

Main Reticle Renderer

All shapes in the active combat category (SA, HF, or ADS) plus HUD shapes are packed into a single StructuredBuffer<ShapeData> and rendered with one full-screen draw call using the ReticleSDF shader. Boolean groups, gaps, glow, outline — everything evaluates in the same fragment shader pass. No multi-pass rendering, no render textures, no stencil operations.
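To make the single-buffer idea concrete, here is a minimal Python sketch that packs the active category's shapes plus the HUD shapes into one contiguous byte buffer with the 144-byte ShapeData stride quoted in the Memory section. The field names, their order, and the zero padding are hypothetical; the real ShapeData layout is not documented here. Whatever the layout, the point is that N shapes become one buffer and one draw.

```python
import struct
from dataclasses import dataclass

SHAPE_STRIDE = 144                     # bytes per ShapeData element (from the Memory table)
FLOATS_PER_SHAPE = SHAPE_STRIDE // 4   # 36 32-bit floats; hypothetical all-float layout


@dataclass
class Shape:
    # Hypothetical subset of per-shape fields, for illustration only.
    kind: int
    x: float
    y: float
    scale: float
    alpha: float


def pack_shapes(category_shapes, hud_shapes):
    """Pack active-category shapes plus HUD shapes into one contiguous buffer."""
    shapes = list(category_shapes) + list(hud_shapes)
    buf = bytearray(SHAPE_STRIDE * len(shapes))
    for i, s in enumerate(shapes):
        fields = (float(s.kind), s.x, s.y, s.scale, s.alpha)
        # Pad the unused floats with zeros so every element is exactly 144 bytes.
        padded = fields + (0.0,) * (FLOATS_PER_SHAPE - len(fields))
        struct.pack_into(f"<{FLOATS_PER_SHAPE}f", buf, i * SHAPE_STRIDE, *padded)
    # One buffer, one shape count: the renderer issues a single draw for all of it.
    return bytes(buf), len(shapes)
```

Adding shapes grows the buffer by 144 bytes each. It never adds a second buffer or a second draw.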

Why 1 Draw Call Matters

Many HUD systems use one sprite or mesh per element, which leads to 10–50+ batches for a complex reticle. RCE's SDF approach keeps the batch count constant regardless of shape complexity: adding 10 more shapes to your reticle costs zero additional draw calls. Those shapes do add fragment shader work (see GPU Cost).

Global FX Renderer

If you use the Global category (hit markers, damage flashes, etc.), a second ReticleRenderer instance handles those shapes in 1 additional draw call. This renderer is completely independent — it doesn't interfere with the main reticle pipeline.

Editor-Only Rendering

In the editor (not shipped in builds), additional batches are added:

- Grid: 1 batch when visible
- Layer connectors: 1 batch when visible

These are stripped from runtime builds where the editor UI is not active.
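The batch accounting from this section can be written as a small function, which is handy as a sanity check when reading Frame Debugger output. It is a model of the rules stated above, not an RCE API. The function name and flags are hypothetical.

```python
def expected_batches(global_fx_active: bool, in_editor: bool = False,
                     grid_visible: bool = False, connectors_visible: bool = False) -> int:
    """Expected RCE batch count under the rules described in this section."""
    batches = 1  # main reticle: every shape in one draw call
    if global_fx_active:
        batches += 1  # independent Global FX renderer
    if in_editor:
        # Editor-only overlays; these are absent from runtime builds.
        batches += int(grid_visible) + int(connectors_visible)
    return batches
```

In a shipped build the result is always 1 or 2, whatever the shape count.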


CPU Cost

Main Thread Breakdown

RCE's per-frame CPU work is minimal. In steady state (no transition, no loadout change), the main thread performs:

Operation | Typical Cost | Frequency
ShapePropertyAnimator | ~0.02 ms | Every frame (when animations are active)
Buffer upload | ~0.01 ms | Every frame (SetData on the StructuredBuffer)
State checks | < 0.01 ms | Every frame
CategoryTransitionAnimator | ~0.05 ms | Only during transitions (typically < 1 second)
Transition shape matching | ~0.02 ms | Once per transition start

Animation Evaluation

All 7 animation modes (Spin, Oscillate, Pulse, PingPong, Sawtooth, Noise, Tween) are evaluated CPU-side as additive deltas. This is intentional — CPU-side animation allows the shader to remain simple and avoids per-shape branching in the fragment shader. The cost scales linearly with animated layer count, but even 20+ animated layers remain well under 0.1 ms.
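A minimal sketch of CPU-side additive animation follows. Each mode produces a delta that is added to the layer's base value, and the per-frame loop touches only animated layers, so the cost is linear in their count. The curve used for each mode is an assumption, because this page names the seven modes but does not define their formulas. The function names and parameters are also hypothetical.

```python
import math


def animation_delta(mode: str, t: float, amplitude: float, frequency: float = 1.0,
                    duration: float = 1.0) -> float:
    """Additive delta for one animated property at time t (assumed curves)."""
    phase = t * frequency
    if mode == "Spin":
        return amplitude * t  # unbounded: amplitude is units per second
    if mode == "Oscillate":
        return amplitude * math.sin(2 * math.pi * phase)
    if mode == "Pulse":
        return amplitude * 0.5 * (1 - math.cos(2 * math.pi * phase))  # 0..amplitude
    if mode == "PingPong":
        p = phase % 2.0
        return amplitude * (p if p <= 1.0 else 2.0 - p)  # triangle wave 0..amplitude
    if mode == "Sawtooth":
        return amplitude * (phase % 1.0)
    if mode == "Noise":
        # Smoothly interpolated value noise built on a cheap integer hash.
        i = math.floor(phase)
        f = phase - i

        def h(n):
            return ((n * 1103515245 + 12345) & 0x7FFFFFFF) / 0x7FFFFFFF * 2 - 1

        s = f * f * (3 - 2 * f)
        return amplitude * (h(i) + (h(i + 1) - h(i)) * s)
    if mode == "Tween":
        u = min(max(t / duration, 0.0), 1.0)
        return amplitude * u * u * (3 - 2 * u)  # smoothstep ease from 0 to amplitude
    raise ValueError(f"unknown mode: {mode}")


def apply_animations(base_values, animations, t, out):
    """Write base + summed deltas into a preallocated `out` list (no new list per frame)."""
    for i, v in enumerate(base_values):
        out[i] = v
    for layer, mode, amplitude, frequency in animations:
        out[layer] += animation_delta(mode, t, amplitude, frequency)
```

Because the shader only ever sees final values, it needs no per-mode branching.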

Category Transitions

During a transition (typically 0.2–0.8 seconds), the CategoryTransitionAnimator interpolates between shape arrays. All 10 transition types are CPU-side. The one-time setup cost (shape matching for Morph transitions) is negligible. The per-frame interpolation cost during the transition is ~0.05 ms.
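The split between one-time matching and cheap per-frame interpolation can be sketched as follows. This assumes shapes are matched by a stable key such as a layer id, and that unmatched shapes fade in or out through alpha. RCE's actual Morph matching criteria are not described here, so `ShapeState`, `match_shapes`, and `interpolate` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShapeState:
    key: str  # hypothetical identity used for matching (e.g. a layer id)
    x: float
    y: float
    scale: float
    alpha: float


def match_shapes(src, dst):
    """One-time setup: pair shapes by key; unmatched shapes fade in or out."""
    by_key = {s.key: s for s in dst}
    pairs = []
    for s in src:
        d = by_key.pop(s.key, None)
        pairs.append((s, d if d is not None else replace(s, alpha=0.0)))
    for d in by_key.values():  # shapes that exist only in the destination
        pairs.append((replace(d, alpha=0.0), d))
    return pairs


def lerp(a, b, t):
    return a + (b - a) * t


def interpolate(pairs, t):
    """Per frame: blend each matched pair at progress t in [0, 1]."""
    # A production version would write into a reused array instead of a new list.
    return [ShapeState(a.key, lerp(a.x, b.x, t), lerp(a.y, b.y, t),
                       lerp(a.scale, b.scale, t), lerp(a.alpha, b.alpha, t))
            for a, b in pairs]
```

Matching runs once per transition start. After that, each frame does only a fixed number of lerps per matched pair.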

Render Thread

The render thread cost is effectively zero beyond what any single draw call costs. RCE issues one DrawProcedural call per renderer. There's no mesh submission, no material property block churn, and no render texture blits in the standard path.


GPU Cost

Fragment Shader Complexity

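The ReticleSDF fragment shader evaluates every shape's SDF for every fragment. The cost is proportional to:

- Shape count (each visible shape is evaluated per fragment)
- Glow complexity
- Screen fill (the number of fragments shaded, which grows with output resolution)
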
In practice, a reticle with 15–20 shapes including glow and boolean groups costs 0.1–0.2 ms on mid-range GPUs. The shader is ALU-bound (math-heavy) rather than bandwidth-bound, which is favorable on modern hardware where ALU is abundant.
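A back-of-envelope ALU estimate shows why resolution and shape count both matter. This is a rough upper bound that ignores early-out and glow. It uses 30 ops per shape, a value picked from the ~10–50 ALU ops per SDF quoted under Scaling Guidelines. The function name is hypothetical.

```python
def sdf_alu_ops_per_frame(width, height, shape_count, ops_per_shape):
    """Rough upper bound: every fragment evaluates every shape's SDF once."""
    return width * height * shape_count * ops_per_shape


full_hd = sdf_alu_ops_per_frame(1920, 1080, 20, 30)  # 1,244,160,000 ops
uhd = sdf_alu_ops_per_frame(3840, 2160, 20, 30)      # 4x the 1080p figure
```

The cost is linear in both shape count and pixel count, so moving from 1080p to 4K quadruples it, just as doubling the shape count doubles it.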

Early-Out Optimization

The shader performs early-out checks for invisible shapes (alpha = 0, scale = 0). Shapes hidden by the category system never reach the SDF evaluation loop. During transitions, only the shapes being interpolated are evaluated.
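The early-out can be modeled on the CPU as a per-fragment loop that skips shapes with alpha = 0 or scale = 0 before any distance math, then combines the remaining shapes with a min() union. This Python model supports circles only and is an illustration, not the ReticleSDF HLSL. `Circle`, `circle_sdf`, and `shade_fragment` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    cx: float
    cy: float
    radius: float
    scale: float = 1.0
    alpha: float = 1.0


def circle_sdf(px, py, c):
    # Signed distance to a circle: negative inside, positive outside.
    return math.hypot(px - c.cx, py - c.cy) - c.radius * c.scale


def shade_fragment(px, py, shapes):
    """Union (min) of visible shapes' SDFs at one fragment; returns (distance, evaluated)."""
    d = math.inf
    evaluated = 0
    for s in shapes:
        if s.alpha == 0.0 or s.scale == 0.0:
            continue  # early-out: invisible shapes never reach SDF evaluation
        d = min(d, circle_sdf(px, py, s))
        evaluated += 1
    return d, evaluated
```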

Fill Rate

RCE renders a full-screen quad per renderer. On most hardware this is a non-issue because the fragment shader is lightweight. However, at extreme resolutions (4K+) with many shapes and heavy glow, fill rate can become the bottleneck. If this occurs:

- Reduce glowSize on large glowing shapes
- Consolidate shapes where possible
- Reduce the render resolution


Memory

GPU Memory

Resource | Size | Notes
ShapeData StructuredBuffer | 144 bytes × N shapes | N = total shapes in active categories. Typically 10–40 shapes = 1.4–5.8 KB.
Custom SDF Texture2DArray | Up to 128 MB | 1024² × ARGBHalf (8 bytes/texel) × 16 slots. Only allocated when custom shapes exist. Each slot = 8 MB, but only used slots consume memory.
Shader constants | < 1 KB | Uniform buffer with screen dimensions, time, etc.

CPU / Managed Memory

Resource | Size | Notes
LoadoutData | 5–15 KB | Full loadout in managed memory. All categories, layers, animations, transitions.
Undo stack | ~250–500 KB | 50 snapshots × 5–10 KB each. Only in editor mode.
Animation state | < 1 KB | Per-entry timers and sequence state. Scales with animated layer count.
Transition state | < 2 KB | Matched pairs, interpolated shape arrays. Only during active transitions.

Allocations

Zero-Allocation Steady State

RCE is designed for zero per-frame GC allocations in runtime mode. The ShapeData[] arrays, animation timer dictionaries, and buffer upload paths are all pre-allocated and reused. Nothing on the hot path allocates: no new objects or arrays, no LINQ, and no string operations.
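The reuse pattern looks like this: allocate storage once, then reset a count each frame instead of building a fresh array. The sketch is written in Python for illustration only. Python itself still allocates small objects internally, so it shows the pattern rather than a GC guarantee. `ShapeFrameBuffer` is a hypothetical name.

```python
class ShapeFrameBuffer:
    """Fixed-capacity scratch array reused every frame instead of reallocated."""

    def __init__(self, capacity):
        self.items = [None] * capacity  # allocated once, up front
        self.count = 0

    def begin_frame(self):
        self.count = 0  # reuse existing storage; nothing is freed or reallocated

    def emit(self, shape):
        if self.count >= len(self.items):
            # Growing belongs in an infrequent event (e.g. a loadout change), not per frame.
            raise OverflowError("capacity exceeded")
        self.items[self.count] = shape
        self.count += 1
```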

When Allocations Occur

Allocations happen during infrequent events:

Event | Allocations | Frequency
Loadout load/switch | ~10–20 KB | Once per loadout change (typically at game start or in a menu)
Transition start | ~2–5 KB | Once per state change (shape matching + array cloning)
Animation rebuild | ~1–2 KB | Once per state change (timer dictionary rebuild)
Custom SDF generation | ~1 KB | Once when a custom shape's control points change
Editor operations | 5–10 KB per undo snapshot | Per user action (not shipped in builds)
JSON serialization | ~5–15 KB | On save (not during gameplay)

Profiling Tip

In Unity Profiler, RCE's runtime work appears under ReticleRenderer.Update, ShapePropertyAnimator.ApplyAnimations, and CategoryTransitionAnimator.EmitInterpolatedFrame. If you see GC.Alloc markers during steady-state rendering, something external is triggering a loadout rebuild — check for accidental SetActiveState() calls every frame with a new value.
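The fix for per-frame GC.Alloc spikes is to forward a state only when it changes. Here is a minimal sketch of such a guard. It wraps whatever callable invokes SetActiveState() (the method named on this page). `StateGate` itself is a hypothetical helper, and the state values reuse the SA/ADS category names from this page.

```python
class StateGate:
    """Forwards a state request only when the value actually differs."""

    def __init__(self, set_active_state):
        self._set_active_state = set_active_state  # e.g. a wrapper around SetActiveState()
        self._current = None

    def request(self, state):
        if state == self._current:
            return False  # unchanged: no transition start, no rebuild, no allocation
        self._current = state
        self._set_active_state(state)
        return True
```

Gameplay code can then call `gate.request(...)` every frame, and the underlying call fires only on real changes.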


Profiling Guide

Unity Profiler Markers

Key markers to look for in the Unity Profiler:

Marker | What It Measures
ReticleRenderer.LateUpdate | Buffer upload and render command issue
ShapePropertyAnimator.ApplyAnimations | Per-frame animation delta computation
CategoryTransitionAnimator.Update | Transition interpolation (only during transitions)
LoadoutRendererBridge.EmitShapes | Shape array sanitization and handoff
GlobalFXBridge.EmitGlobalShapes | Global layer filtering and emission

GPU Profiling

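Use the Frame Debugger (Window → Analysis → Frame Debugger) to inspect RCE's draw calls. Each active renderer should appear as a single DrawProcedural call using the ReticleSDF shader: one for the main reticle, plus one for Global FX when it is active.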

Common Bottlenecks

Symptom | Likely Cause | Solution
High GPU time on ReticleSDF | Too many shapes with large glow at high resolution | Reduce glowSize, consolidate shapes, or reduce render resolution
GC.Alloc spikes every frame | Calling SetActiveState() with a changing value each frame | Only call SetActiveState() when the state actually changes
Stutter when switching states | Large custom shape SDF regeneration on transition | Pre-warm custom SDFs by loading the loadout before gameplay starts
High memory from Texture2DArray | Many custom shapes consuming texture slots | Limit custom shapes to what's needed; prefer analytical shapes where possible

Scaling Guidelines

Shape Count

RCE comfortably handles 20–40 shapes per category on mid-range hardware. The fragment shader cost scales linearly with shape count, but the constant factor is low (each SDF evaluation is ~10–50 ALU ops). Extreme reticle designs with 60+ shapes are achievable but should be profiled on target hardware.

Custom Shapes

The Texture2DArray supports up to 16 custom shapes. Each occupies a 1024² ARGBHalf texture slot (~8 MB). The compute shader SDF generation runs once when control points change — not per frame. At runtime, custom shapes sample from the texture array, which is actually cheaper per-fragment than analytical SDF evaluation.
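"Regenerate only when control points change" is essentially a cache keyed on the control points. The sketch below assigns each custom shape one of the 16 slots and re-bakes only when its points differ from the last bake. The same idea underlies pre-warming: calling `ensure` for every custom shape at load time moves the bake cost out of gameplay. The `generate` callback stands in for the compute-shader SDF bake. The class, method, and callback names are hypothetical, and slots are never freed in this simplified version.

```python
class CustomSdfCache:
    """Assigns custom shapes to Texture2DArray slots; re-bakes only on change."""

    MAX_SLOTS = 16

    def __init__(self, generate):
        self._generate = generate  # stand-in for the compute-shader SDF bake
        self._slots = {}           # shape id -> (slot index, control-point key)

    def ensure(self, shape_id, control_points):
        key = tuple(map(tuple, control_points))
        entry = self._slots.get(shape_id)
        if entry is not None and entry[1] == key:
            return entry[0]  # unchanged: keep sampling the existing slot
        if entry is None:
            if len(self._slots) >= self.MAX_SLOTS:
                raise RuntimeError("all 16 custom SDF slots are in use")
            slot = len(self._slots)
        else:
            slot = entry[0]  # edited shape: re-bake into its existing slot
        self._generate(slot, control_points)
        self._slots[shape_id] = (slot, key)
        return slot
```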

Mobile Considerations

RCE is designed for desktop/console but works on mobile with considerations:


What's Next