Draw calls, memory footprint, CPU threading, GPU cost, and allocations. Everything you need to profile and optimize RCE in your game.
| Metric | Typical Value | Notes |
|---|---|---|
| Draw Calls | 1 per renderer | All shapes in a category render in a single draw call. GlobalFX uses a second renderer = 1 additional call. |
| Batches | 1–2 | Main reticle (1 batch) + Global FX (1 batch if active). Grid and layer connectors add 1 batch each when visible (editor only). |
| Main Thread | < 0.1 ms | Animation evaluation, buffer upload, state management. No per-frame allocations in steady state. |
| Render Thread | < 0.05 ms | Single fullscreen quad per renderer. GPU-bound, not render-thread-bound. |
| GPU (Fragment) | 0.05–0.3 ms | Depends on shape count, glow complexity, and screen fill. SDF evaluation is ALU-heavy but has zero texture bandwidth for standard shapes. |
| Memory (Runtime) | ~10–50 KB | StructuredBuffer for ShapeData (~144 bytes × shape count), plus loadout data in managed memory. |
| Allocations | 0 per frame | Steady-state rendering is allocation-free. Allocations occur only during loadout changes, transitions starting, and editor operations. |
RCE's single-draw-call architecture is one of its core design principles. Here's how it works and what contributes to your batch count:
All shapes in the active combat category (SA, HF, or ADS) plus HUD shapes are packed into a single StructuredBuffer<ShapeData> and rendered with one full-screen draw call using the ReticleSDF shader. Boolean groups, gaps, glow, outline — everything evaluates in the same fragment shader pass. No multi-pass rendering, no render textures, no stencil operations.
Many HUD systems use one sprite/mesh per element, leading to 10–50+ batches for a complex reticle. RCE's SDF approach keeps the batch count constant regardless of shape complexity. Adding 10 more shapes to your reticle costs zero additional draw calls.
If you use the Global category (hit markers, damage flashes, etc.), a second ReticleRenderer instance handles those shapes in 1 additional draw call. This renderer is completely independent — it doesn't interfere with the main reticle pipeline.
In the editor (not shipped in builds), additional batches are added:
- Grid (GridSDF shader, only when the grid is visible)
- Layer connectors (LayerConnectorLines shader)

These are stripped from runtime builds where the editor UI is not active.
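The batch accounting described above can be summarized as a small tally. The following is a minimal Python sketch for checking expectations against the Frame Debugger; it is not part of RCE, and the function and parameter names are made up for illustration.

```python
def expected_batches(global_fx_active: bool = False,
                     in_editor: bool = False,
                     grid_visible: bool = False,
                     connectors_visible: bool = False) -> int:
    """Tally expected RCE batches from the rules described in this section."""
    batches = 1  # main reticle: active combat category plus HUD shapes
    if global_fx_active:
        batches += 1  # second ReticleRenderer for the Global category
    if in_editor:
        # Editor-only overlays; these never exist in runtime builds.
        batches += int(grid_visible) + int(connectors_visible)
    return batches


print(expected_batches())                        # 1
print(expected_batches(global_fx_active=True))   # 2
print(expected_batches(global_fx_active=True, in_editor=True,
                       grid_visible=True, connectors_visible=True))  # 4
```

Note that shape count does not appear as a parameter, which is the point of the single-draw-call design.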
RCE's per-frame CPU work is minimal. In steady state (no transition, no loadout change), the main thread performs:
| Operation | Typical Cost | Frequency |
|---|---|---|
| ShapePropertyAnimator | ~0.02 ms | Every frame (when animations active) |
| Buffer upload | ~0.01 ms | Every frame (SetData on StructuredBuffer) |
| State checks | < 0.01 ms | Every frame |
| CategoryTransitionAnimator | ~0.05 ms | Only during transitions (typically < 1 second) |
| Transition shape matching | ~0.02 ms | Once per transition start |
All 7 animation modes (Spin, Oscillate, Pulse, PingPong, Sawtooth, Noise, Tween) are evaluated CPU-side as additive deltas. This is intentional — CPU-side animation allows the shader to remain simple and avoids per-shape branching in the fragment shader. The cost scales linearly with animated layer count, but even 20+ animated layers remain well under 0.1 ms.
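To make "additive deltas" concrete, here is a minimal Python sketch of the idea. It is an illustration only: the two delta formulas (a linear spin and a sine oscillation) and all names are assumptions, not RCE's actual animation code. What it shows is that each animation contributes an offset that is summed onto an unmodified base value.

```python
import math


def spin_delta(t, degrees_per_second):
    """Hypothetical Spin: a rotation offset that grows linearly with time."""
    return degrees_per_second * t


def oscillate_delta(t, amplitude, frequency_hz):
    """Hypothetical Oscillate: a sinusoidal offset centered on zero."""
    return amplitude * math.sin(2.0 * math.pi * frequency_hz * t)


def animated_value(base, t, deltas):
    """Return base plus the sum of all active deltas; base itself is never modified."""
    return base + sum(delta(t) for delta in deltas)


rotation = animated_value(
    base=45.0,
    t=1.0,
    deltas=[
        lambda t: spin_delta(t, 90.0),
        lambda t: oscillate_delta(t, 5.0, 0.25),
    ],
)
print(rotation)  # 140.0
```

Because the deltas are summed on the CPU, the shader only ever sees the final value, and the work grows with the number of animated entries, matching the linear scaling described above.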
During a transition (typically 0.2–0.8 seconds), the CategoryTransitionAnimator interpolates between shape arrays. All 10 transition types are CPU-side. The one-time setup cost (shape matching for Morph transitions) is negligible. The per-frame interpolation cost during the transition is ~0.05 ms.
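For a rough sense of scale, the typical costs in the table can be added up per frame. This is a back-of-envelope Python sketch, not a measurement: it assumes the listed costs simply add, uses the upper bound for the "< 0.01 ms" entry, and all names are illustrative.

```python
# Typical main-thread costs from the table above, in microseconds.
STEADY_STATE_US = {
    "ShapePropertyAnimator": 20,  # ~0.02 ms, when animations are active
    "Buffer upload": 10,          # ~0.01 ms
    "State checks": 10,           # < 0.01 ms, upper bound used here
}
TRANSITION_PER_FRAME_US = 50      # CategoryTransitionAnimator, ~0.05 ms
TRANSITION_START_US = 20          # shape matching, once per transition start


def frame_cost_us(in_transition=False, transition_starts=False):
    """Estimated main-thread cost of one frame, in microseconds."""
    cost = sum(STEADY_STATE_US.values())
    if in_transition:
        cost += TRANSITION_PER_FRAME_US
    if transition_starts:
        cost += TRANSITION_START_US
    return cost


def share_of_frame(cost_us, fps=60):
    """Fraction of the frame budget consumed at the given frame rate."""
    return cost_us / (1_000_000 / fps)


print(frame_cost_us())                                            # 40
print(frame_cost_us(in_transition=True))                          # 90
print(frame_cost_us(in_transition=True, transition_starts=True))  # 110
print(f"{share_of_frame(frame_cost_us()):.2%}")                   # 0.24%
```

With these rounded figures, the single frame that starts a transition comes out near 0.11 ms; every other frame stays under 0.1 ms. Treat the numbers as a starting point and confirm them in the profiler on target hardware.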
The render thread cost is effectively zero beyond what any single draw call costs. RCE issues one DrawProcedural call per renderer. There's no mesh submission, no material property block churn, and no render texture blits in the standard path.
The ReticleSDF fragment shader evaluates every shape's SDF for every fragment, so its cost depends on shape count, glow complexity, and screen fill.
In practice, a reticle with 15–20 shapes including glow and boolean groups costs 0.1–0.2 ms on mid-range GPUs. The shader is ALU-bound (math-heavy) rather than bandwidth-bound, which is favorable on modern hardware where ALU is abundant.
The shader performs early-out checks for invisible shapes (alpha = 0, scale = 0). Shapes hidden by the category system never reach the SDF evaluation loop. During transitions, only the shapes being interpolated are evaluated.
RCE renders a full-screen quad per renderer. On most hardware this is a non-issue since the fragment shader is lightweight. However, at extreme resolutions (4K+) with many shapes and heavy glow, fill rate can become the bottleneck. If this occurs:
- Reduce glowSize: smaller glow radii reduce the effective per-fragment cost.

| Resource | Size | Notes |
|---|---|---|
| ShapeData StructuredBuffer | 144 bytes × N shapes | N = total shapes in active categories. Typically 10–40 shapes = 1.4–5.8 KB. |
| Custom SDF Texture2DArray | Up to 128 MB | 1024² × ARGBHalf (8 bytes/texel) × 16 slots. Only allocated when custom shapes exist. Each slot = 8 MB, and only used slots consume memory. |
| Shader constants | < 1 KB | Uniform buffer with screen dimensions, time, etc. |
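The sizes in this table follow directly from the stated strides and dimensions. The Python sketch below reproduces the arithmetic so you can plug in your own shape and slot counts; the constant and function names are illustrative, and it assumes no mipmaps or driver-side padding.

```python
SHAPE_DATA_BYTES = 144          # per-shape stride quoted in the table
SDF_SLOT_EDGE = 1024            # texels per side of one custom SDF slot
ARGB_HALF_BYTES_PER_TEXEL = 8   # 4 channels x 16-bit half floats
MIB = 1024 * 1024


def shape_buffer_bytes(shape_count):
    """Size of the ShapeData StructuredBuffer for a given shape count."""
    return SHAPE_DATA_BYTES * shape_count


def custom_sdf_bytes(used_slots):
    """Texture memory for the custom SDF slots that are actually in use."""
    return SDF_SLOT_EDGE * SDF_SLOT_EDGE * ARGB_HALF_BYTES_PER_TEXEL * used_slots


print(shape_buffer_bytes(10))        # 1440
print(shape_buffer_bytes(40))        # 5760
print(custom_sdf_bytes(1) // MIB)    # 8
print(custom_sdf_bytes(3) // MIB)    # 24
```

The comparison makes the budget priorities clear: the shape buffer is a few kilobytes, while each custom shape slot costs megabytes.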
| Resource | Size | Notes |
|---|---|---|
| LoadoutData | 5–15 KB | Full loadout in managed memory. All categories, layers, animations, transitions. |
| Undo stack | ~250–500 KB | 50 snapshots × 5–10 KB each. Only in editor mode. |
| Animation state | < 1 KB | Per-entry timers and sequence state. Scales with animated layer count. |
| Transition state | < 2 KB | Matched pairs, interpolated shape arrays. Only during active transitions. |
RCE is designed for zero per-frame GC allocations in runtime mode. The ShapeData[] arrays, animation timer dictionaries, and buffer upload paths are all pre-allocated and reused. No new, no LINQ, no string operations on the hot path.
Allocations happen during infrequent events:
| Event | Allocations | Frequency |
|---|---|---|
| Loadout load/switch | ~10–20 KB | Once per loadout change (typically at game start or menu) |
| Transition start | ~2–5 KB | Once per state change (shape matching + array cloning) |
| Animation rebuild | ~1–2 KB | Once per state change (timer dictionary rebuild) |
| Custom SDF generation | ~1 KB | Once when a custom shape's control points change |
| Editor operations | 5–10 KB per undo snapshot | Per user action (not shipped in builds) |
| JSON serialization | ~5–15 KB | On save (not during gameplay) |
In Unity Profiler, RCE's runtime work appears under ReticleRenderer.Update, ShapePropertyAnimator.ApplyAnimations, and CategoryTransitionAnimator.EmitInterpolatedFrame. If you see GC.Alloc markers during steady-state rendering, something external is triggering a loadout rebuild — check for accidental SetActiveState() calls every frame with a new value.
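One way to rule out redundant state changes is to gate them behind a "has it actually changed?" check. The sketch below shows that logic in Python rather than Unity C#, purely as an illustration: FakeReticle and StateGate are hypothetical stand-ins, and the single-argument form of SetActiveState is an assumption about its signature.

```python
class FakeReticle:
    """Stand-in for the reticle controller; counts rebuilds instead of rendering."""

    def __init__(self):
        self.rebuilds = 0

    def SetActiveState(self, state):
        # In RCE this is where transition and animation rebuilds would allocate.
        self.rebuilds += 1


class StateGate:
    """Forwards a state to the reticle only when it differs from the last one sent."""

    def __init__(self, reticle):
        self._reticle = reticle
        self._last = None

    def request(self, state):
        if state == self._last:
            return False  # unchanged: skip the call entirely
        self._last = state
        self._reticle.SetActiveState(state)
        return True


reticle = FakeReticle()
gate = StateGate(reticle)
for frame_state in ["SA", "SA", "SA", "ADS", "ADS", "HF"]:
    gate.request(frame_state)  # called every frame, forwarded only on change
print(reticle.rebuilds)  # 3
```

A gate like this only removes repeated calls with the same value. If gameplay code genuinely produces a different state every frame, the fix belongs in that code.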
Key markers to look for in the Unity Profiler:
| Marker | What It Measures |
|---|---|
| ReticleRenderer.LateUpdate | Buffer upload and render command issue |
| ShapePropertyAnimator.ApplyAnimations | Per-frame animation delta computation |
| CategoryTransitionAnimator.Update | Transition interpolation (only during transitions) |
| LoadoutRendererBridge.EmitShapes | Shape array sanitization and handoff |
| GlobalFXBridge.EmitGlobalShapes | Global layer filtering and emission |
Use the Frame Debugger (Window → Analysis → Frame Debugger) to inspect RCE's draw calls, which use the ReticleSDF shader.

| Symptom | Likely Cause | Solution |
|---|---|---|
| High GPU time on ReticleSDF | Too many shapes with large glow at high resolution | Reduce glowSize, consolidate shapes, or reduce render resolution |
| GC.Alloc spikes every frame | Calling SetActiveState() with a changing value each frame | Only call SetActiveState() when the state actually changes |
| Stutter when switching states | Large custom shape SDF regeneration on transition | Pre-warm custom SDFs by loading the loadout before gameplay starts |
| High memory from Texture2DArray | Many custom shapes consuming texture slots | Limit custom shapes to what's needed; prefer analytical shapes where possible |
RCE comfortably handles 20–40 shapes per category on mid-range hardware. The fragment shader cost scales linearly with shape count, but the constant factor is low (each SDF evaluation is ~10–50 ALU ops). Extreme reticle designs with 60+ shapes are achievable but should be profiled on target hardware.
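The linear scaling can be turned into a rough upper-bound estimate. The Python sketch below multiplies out fragments, shapes, and the 10–50 ALU ops per evaluation quoted above. It deliberately ignores early-outs and glow cost, so it is a worst-case model under those assumptions rather than a prediction, and the function name is illustrative.

```python
def alu_ops_per_frame(width, height, shape_count, ops_per_shape):
    """Worst case: every fragment of the full-screen quad evaluates every shape."""
    return width * height * shape_count * ops_per_shape


low = alu_ops_per_frame(1920, 1080, 20, 10)
high = alu_ops_per_frame(1920, 1080, 20, 50)
print(low, high)  # 414720000 2073600000

# Going from 1080p to 4K quadruples the fragment count, and the estimate with it.
ratio = alu_ops_per_frame(3840, 2160, 20, 10) / low
print(ratio)  # 4.0
```

Doubling the shape count doubles the estimate, but moving from 1080p to 4K quadruples it, which is why resolution tends to matter more than a handful of extra shapes.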
The Texture2DArray supports up to 16 custom shapes. Each occupies a 1024² ARGBHalf texture slot (~8 MB). The compute shader SDF generation runs once when control points change — not per frame. At runtime, custom shapes sample from the texture array, which is actually cheaper per-fragment than analytical SDF evaluation.
RCE is designed for desktop/console but works on mobile with considerations: