Performance — RCE Documentation

Performance at a Glance

Metric	Typical Value	Notes
Draw Calls	1 per renderer	All shapes in a category render in a single draw call. Global FX uses a second renderer, which adds 1 call.
Batches	1–2	Main reticle (1 batch) + Global FX (1 batch if active). In the editor only, the grid, layer connectors, and custom shape overlay add 1 batch each when drawn.
Main Thread	< 0.1 ms	Animation evaluation, buffer upload, state management. No per-frame allocations in steady state.
Render Thread	< 0.05 ms	Single fullscreen quad per renderer. GPU-bound, not render-thread-bound.
GPU (Fragment)	0.05–0.3 ms	Depends on shape count, glow complexity, and screen fill. SDF evaluation is ALU-heavy but has zero texture bandwidth for standard shapes.
Memory (Runtime)	~10–50 KB	StructuredBuffer for ShapeData (144 bytes × shape count), plus loadout data in managed memory. Custom SDF textures, when used, are additional (see GPU Memory).
Allocations	0 per frame	Steady-state rendering is allocation-free. Allocations occur only during infrequent events such as loadout changes, transition starts, and editor operations.

Draw Calls & Batches

RCE's single-draw-call architecture is one of its core design principles. Here's how it works and what contributes to your batch count:

Main Reticle Renderer

All shapes in the active combat category (SA, HF, or ADS) plus HUD shapes are packed into a single StructuredBuffer<ShapeData> and rendered with one full-screen draw call using the ReticleSDF shader. Boolean groups, gaps, glow, outline — everything evaluates in the same fragment shader pass. No multi-pass rendering, no render textures, no stencil operations.
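
As a rough illustration of the single-pass idea, the Python sketch below evaluates a list of shapes for one pixel in a single loop, much as a fragment shader would. The `Circle` type, `shade_pixel`, and the one-pixel soft edge are illustrative assumptions; RCE's actual ShapeData layout and ReticleSDF code are not shown in this document.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Circle:
    # Hypothetical stand-in for one ShapeData entry (not RCE's real layout).
    cx: float
    cy: float
    radius: float


def circle_sdf(shape: Circle, x: float, y: float) -> float:
    """Signed distance from (x, y) to the circle edge; negative inside."""
    return math.hypot(x - shape.cx, y - shape.cy) - shape.radius


def shade_pixel(shapes: Sequence[Circle], x: float, y: float) -> float:
    """Return coverage in [0, 1] for one pixel, visiting every shape once."""
    distance = math.inf
    for shape in shapes:  # one loop, one pass: more shapes never add draw calls
        distance = min(distance, circle_sdf(shape, x, y))
    # Map distance to coverage with a 1-pixel soft edge (assumed AA width).
    return min(1.0, max(0.0, 0.5 - distance))


shapes = [Circle(0.0, 0.0, 4.0), Circle(10.0, 0.0, 2.0)]
print(shade_pixel(shapes, 0.0, 0.0))  # inside the first circle
print(shade_pixel(shapes, 5.0, 0.0))  # in the gap between the two circles
```

Adding a third `Circle` to the list lengthens the loop but leaves the number of passes unchanged, which is the property the single-draw-call design relies on.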

Why 1 Draw Call Matters

Many HUD systems use one sprite/mesh per element, leading to 10–50+ batches for a complex reticle. RCE's SDF approach keeps the batch count constant regardless of shape complexity. Adding 10 more shapes to your reticle costs zero additional draw calls.

Global FX Renderer

If you use the Global category (hit markers, damage flashes, etc.), a second ReticleRenderer instance handles those shapes in 1 additional draw call. This renderer is completely independent — it doesn't interfere with the main reticle pipeline.

Editor-Only Rendering

The editor UI (not shipped in builds) adds the following batches:

Grid — 1 batch (GridSDF shader, only when grid is visible)
Layer Connectors — 1 batch (LayerConnectorLines shader)
Custom Shape Overlay — 1 batch (GL-based rendering of control points and handles)

These batches are stripped from runtime builds, where the editor UI is not active.

CPU Cost

Main Thread Breakdown

RCE's per-frame CPU work is minimal. In steady state (no transition, no loadout change), the main thread performs:

Operation	Typical Cost	Frequency
ShapePropertyAnimator	~0.02 ms	Every frame (when animations active)
Buffer upload	~0.01 ms	Every frame (SetData on StructuredBuffer)
State checks	< 0.01 ms	Every frame
CategoryTransitionAnimator	~0.05 ms	Only during transitions (typically < 1 second)
Transition shape matching	~0.02 ms	Once per transition start

Animation Evaluation

All 7 animation modes (Spin, Oscillate, Pulse, PingPong, Sawtooth, Noise, Tween) are evaluated CPU-side as additive deltas. This is intentional — CPU-side animation allows the shader to remain simple and avoids per-shape branching in the fragment shader. The cost scales linearly with animated layer count, but even 20+ animated layers remain well under 0.1 ms.
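
A minimal sketch of the additive-delta model, assuming simple formulas for two of the modes. The function names, parameters, and the exact Spin and Oscillate formulas here are hypothetical; the document only states that each mode produces a delta that is added to the authored value.

```python
import math
from typing import Sequence


def spin_delta(t: float, degrees_per_second: float) -> float:
    """Rotation offset that grows linearly with time, wrapped to [0, 360)."""
    return (t * degrees_per_second) % 360.0


def oscillate_delta(t: float, amplitude: float, frequency_hz: float) -> float:
    """Sinusoidal offset in [-amplitude, +amplitude]."""
    return amplitude * math.sin(2.0 * math.pi * frequency_hz * t)


def animated_value(base: float, deltas: Sequence[float]) -> float:
    """Additive model: the authored base value is never overwritten."""
    return base + sum(deltas)


t = 0.25  # seconds since the animation started
rotation = animated_value(45.0, [spin_delta(t, 90.0)])
scale = animated_value(1.0, [oscillate_delta(t, 0.2, 1.0)])
print(rotation, scale)
```

Because each delta is computed independently and summed, several animations can target the same property without one clobbering another, and the cost grows by one small function call per animated layer.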

Category Transitions

During a transition (typically 0.2–0.8 seconds), the CategoryTransitionAnimator interpolates between shape arrays. All 10 transition types are CPU-side. The one-time setup cost (shape matching for Morph transitions) is negligible. The per-frame interpolation cost during the transition is ~0.05 ms.
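
The per-frame interpolation can be pictured as a blend between two matched arrays, written into a reused output array. This sketch assumes already-matched pairs of scalar properties and a smoothstep easing curve; both are illustrative choices, not a description of what CategoryTransitionAnimator actually does.

```python
from typing import List, Sequence


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve on [0, 1]; input is clamped first."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def interpolate_shapes(src: Sequence[float], dst: Sequence[float],
                       out: List[float], elapsed: float, duration: float) -> None:
    """Blend matched src/dst values into the preallocated `out` list."""
    t = smoothstep(elapsed / duration) if duration > 0.0 else 1.0
    for i in range(len(out)):
        out[i] = lerp(src[i], dst[i], t)  # written in place, no new list


src = [2.0, 10.0]   # e.g. radii of two shapes in the outgoing category
dst = [6.0, 0.0]    # their matched counterparts in the incoming category
out = [0.0, 0.0]    # allocated once, reused every frame of the transition
interpolate_shapes(src, dst, out, elapsed=0.2, duration=0.4)
print(out)
```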

Render Thread

The render thread cost is effectively zero beyond what any single draw call costs. RCE issues one DrawProcedural call per renderer. There's no mesh submission, no material property block churn, and no render texture blits in the standard path.

GPU Cost

Fragment Shader Complexity

The ReticleSDF fragment shader evaluates every active shape's SDF for every fragment. The cost is proportional to:

Shape count — Each shape adds one SDF evaluation (typically 10–50 ALU ops per shape)
Boolean groups — Min/max operations on SDF values; negligible additional cost
Glow — Exponential falloff calculation per shape with glow enabled
Custom shapes — Texture2DArray sample instead of analytical SDF; slightly cheaper per-fragment but uses texture bandwidth

In practice, a reticle with 15–20 shapes including glow and boolean groups costs 0.1–0.2 ms on mid-range GPUs. The shader is ALU-bound (math-heavy) rather than bandwidth-bound, which is favorable on modern hardware where ALU is abundant.
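
To show why boolean groups add almost nothing to the cost, here are the standard SDF combination operators alongside one possible exponential glow falloff. The operators are the well-known min/max formulations; the `glow` formula and its `glow_size` parameter are assumptions made for illustration, since the shader's exact falloff is not given here.

```python
import math


def union(d1: float, d2: float) -> float:
    """Inside either shape: the smaller signed distance wins."""
    return min(d1, d2)


def intersection(d1: float, d2: float) -> float:
    """Inside both shapes: the larger signed distance wins."""
    return max(d1, d2)


def subtraction(d1: float, d2: float) -> float:
    """Shape 1 with shape 2 cut out of it."""
    return max(d1, -d2)


def glow(distance: float, glow_size: float) -> float:
    """Assumed falloff: 1.0 on or inside the edge, decaying with distance."""
    if glow_size <= 0.0:
        return 0.0  # glow disabled
    return math.exp(-max(distance, 0.0) / glow_size)


# A point 1 unit inside shape A and 0.5 units inside shape B.
d_a, d_b = -1.0, -0.5
print(union(d_a, d_b), intersection(d_a, d_b), subtraction(d_a, d_b))
print(glow(4.0, 4.0))  # intensity one glow_size away from the edge
```

Each boolean operator is a single comparison, while the glow term needs an exponential per shape, which is consistent with glow being the more expensive of the two features.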

Early-Out Optimization

The shader performs early-out checks for invisible shapes (alpha = 0, scale = 0). Shapes hidden by the category system never reach the SDF evaluation loop. During transitions, only the shapes being interpolated are evaluated.
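
The early-out amounts to filtering the shape list before any distance math runs. The sketch below is a hypothetical CPU-side model of that check; the `Shape` fields are stand-ins for whatever ShapeData actually stores.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Shape:
    # Hypothetical fields; only alpha and scale matter for the early-out.
    name: str
    alpha: float
    scale: float


def shapes_to_evaluate(shapes: Iterable[Shape]) -> Iterator[Shape]:
    """Yield only shapes that can contribute to the final pixel color."""
    for shape in shapes:
        if shape.alpha <= 0.0 or shape.scale <= 0.0:
            continue  # invisible: skip before any SDF evaluation
        yield shape


shapes = [
    Shape("dot", alpha=1.0, scale=1.0),
    Shape("hidden_ring", alpha=0.0, scale=1.0),
    Shape("collapsed_bar", alpha=1.0, scale=0.0),
]
print([s.name for s in shapes_to_evaluate(shapes)])
```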

Fill Rate

RCE renders a full-screen quad per renderer. On most hardware this is a non-issue since the fragment shader is lightweight. However, at extreme resolutions (4K+) with many shapes and heavy glow, fill rate can become the bottleneck. If this occurs:

Reduce glowSize — Smaller glow radii reduce the effective per-fragment cost
Reduce shape count — Combine shapes where possible using boolean groups
Consider rendering at a lower resolution and upscaling (engine-level setting, not RCE-specific)
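
A back-of-the-envelope model helps when deciding which of these levers to pull. It assumes every fragment evaluates every shape, so it is an upper bound that ignores early-outs; the function name and the `render_scale` parameter are illustrative, not an RCE setting.

```python
def sdf_evaluations_per_frame(width: int, height: int, shape_count: int,
                              render_scale: float = 1.0) -> int:
    """Upper bound on SDF evaluations: fragments times shapes."""
    scaled_width = max(1, int(width * render_scale))
    scaled_height = max(1, int(height * render_scale))
    return scaled_width * scaled_height * shape_count


full_hd = sdf_evaluations_per_frame(1920, 1080, shape_count=20)
uhd = sdf_evaluations_per_frame(3840, 2160, shape_count=20)
uhd_half_scale = sdf_evaluations_per_frame(3840, 2160, shape_count=20,
                                           render_scale=0.5)

print(uhd / full_hd)              # 4K has four times the fragments of 1080p
print(uhd_half_scale == full_hd)  # half scale at 4K matches native 1080p
```

Under this model, halving the shape count halves the work, while halving the render scale cuts it to a quarter, because the scale applies to both screen dimensions.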

Memory

GPU Memory

Resource	Size	Notes
ShapeData StructuredBuffer	144 bytes × N shapes	N = total shapes in active categories. Typically 10–40 shapes = 1.4–5.8 KB.
Custom SDF Texture2DArray	Up to 128 MB	1024² × ARGBHalf (8 bytes/texel) = 8 MB per slot, × 16 slots. Only allocated when custom shapes exist, and only used slots consume memory.
Shader constants	< 1 KB	Uniform buffer with screen dimensions, time, etc.
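
The figures in this table follow from simple arithmetic, reproduced here so you can plug in your own shape and slot counts. The constants come from the table; the function names are just for this example.

```python
SHAPE_DATA_BYTES = 144            # size of one ShapeData entry
SDF_SLOT_TEXELS = 1024 * 1024     # 1024 x 1024 texels per custom shape slot
ARGB_HALF_BYTES_PER_TEXEL = 8     # 4 channels x 16-bit half float


def shape_buffer_bytes(shape_count: int) -> int:
    """Size of the ShapeData StructuredBuffer for a given shape count."""
    return SHAPE_DATA_BYTES * shape_count


def custom_sdf_bytes(used_slots: int) -> int:
    """Texture memory for the custom SDF slots that are actually in use."""
    return used_slots * SDF_SLOT_TEXELS * ARGB_HALF_BYTES_PER_TEXEL


print(shape_buffer_bytes(10))                 # 1440 bytes, about 1.4 KB
print(shape_buffer_bytes(40))                 # 5760 bytes, about 5.8 KB
print(custom_sdf_bytes(1) // (1024 * 1024))   # 8 (MB per slot)
print(custom_sdf_bytes(3) // (1024 * 1024))   # 24 (MB for three custom shapes)
```

The comparison makes the budget clear: a single custom shape slot costs over a thousand times more GPU memory than the entire shape buffer.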

CPU / Managed Memory

Resource	Size	Notes
LoadoutData	5–15 KB	Full loadout in managed memory. All categories, layers, animations, transitions.
Undo stack	~250–500 KB	50 snapshots × 5–10 KB each. Only in editor mode.
Animation state	< 1 KB	Per-entry timers and sequence state. Scales with animated layer count.
Transition state	< 2 KB	Matched pairs, interpolated shape arrays. Only during active transitions.

Allocations

Zero-Allocation Steady State

RCE is designed for zero per-frame GC allocations in runtime mode. The ShapeData[] arrays, animation timer dictionaries, and buffer upload paths are all pre-allocated and reused. The hot path contains no `new` allocations, no LINQ, and no string operations.

When Allocations Occur

Allocations happen during infrequent events:

Event	Allocations	Frequency
Loadout load/switch	~10–20 KB	Once per loadout change (typically at game start or menu)
Transition start	~2–5 KB	Once per state change (shape matching + array cloning)
Animation rebuild	~1–2 KB	Once per state change (timer dictionary rebuild)
Custom SDF generation	~1 KB	Once when a custom shape's control points change
Editor operations	5–10 KB per undo snapshot	Per user action (not shipped in builds)
JSON serialization	~5–15 KB	On save (not during gameplay)

Profiling Tip

In the Unity Profiler, RCE's runtime work appears under the ReticleRenderer, ShapePropertyAnimator.ApplyAnimations, and CategoryTransitionAnimator markers (see Unity Profiler Markers below for the full list). If you see GC.Alloc markers during steady-state rendering, something external is probably triggering a transition or rebuild. Check for code that calls SetActiveState() every frame with a changing value.

Profiling Guide

Unity Profiler Markers

Key markers to look for in the Unity Profiler:

Marker	What It Measures
`ReticleRenderer.LateUpdate`	Buffer upload and render command issue
`ShapePropertyAnimator.ApplyAnimations`	Per-frame animation delta computation
`CategoryTransitionAnimator.Update`	Transition interpolation (only during transitions)
`LoadoutRendererBridge.EmitShapes`	Shape array sanitization and handoff
`GlobalFXBridge.EmitGlobalShapes`	Global layer filtering and emission

GPU Profiling

Use the Frame Debugger (Window → Analysis → Frame Debugger) to inspect RCE's draw calls:

Look for "Draw Procedural" calls from ReticleSDF
The ShapeData buffer will be visible in the shader properties — you can inspect individual shapes
For GPU timing, use RenderDoc or your platform's GPU profiler

Common Bottlenecks

Symptom	Likely Cause	Solution
High GPU time on `ReticleSDF`	Too many shapes with large glow at high resolution	Reduce `glowSize`, consolidate shapes, or reduce render resolution
GC.Alloc spikes every frame	Calling `SetActiveState()` with a changing value each frame	Only call `SetActiveState()` when the state actually changes
Stutter when switching states	Large custom shape SDF regeneration on transition	Pre-warm custom SDFs by loading the loadout before gameplay starts
High memory from Texture2DArray	Many custom shapes consuming texture slots	Limit custom shapes to what's needed; prefer analytical shapes where possible
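
The second row of the table describes a change-detection guard. Here is a language-neutral sketch of that pattern in Python: the `ActiveStateGuard` class is hypothetical, and the callback stands in for RCE's `SetActiveState()`, whose real signature is not shown in this document. In a Unity project the same idea would be written in C#.

```python
from typing import Callable, List, Optional


class ActiveStateGuard:
    """Forward a state to the renderer only when it differs from the last one."""

    def __init__(self, set_active_state: Callable[[str], None]) -> None:
        self._set_active_state = set_active_state  # stand-in for SetActiveState()
        self._last_state: Optional[str] = None

    def request(self, state: str) -> bool:
        """Return True if the call was forwarded, False if it was a no-op."""
        if state == self._last_state:
            return False  # unchanged: no transition start, no allocations
        self._last_state = state
        self._set_active_state(state)
        return True


forwarded: List[str] = []
guard = ActiveStateGuard(forwarded.append)

# Simulate five frames of game code that requests a state every frame.
for frame_state in ["HF", "HF", "ADS", "ADS", "ADS"]:
    guard.request(frame_state)

print(forwarded)  # only the two real state changes reach the renderer
```

Game code can then call `request` every frame without paying the transition-start cost each time.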

Scaling Guidelines

Shape Count

RCE comfortably handles 20–40 shapes per category on mid-range hardware. The fragment shader cost scales linearly with shape count, but the constant factor is low (each SDF evaluation is ~10–50 ALU ops). Extreme reticle designs with 60+ shapes are achievable but should be profiled on target hardware.

Custom Shapes

The Texture2DArray supports up to 16 custom shapes. Each occupies a 1024² ARGBHalf texture slot (~8 MB). The compute shader SDF generation runs once when control points change — not per frame. At runtime, custom shapes sample from the texture array, which is actually cheaper per-fragment than analytical SDF evaluation.

Mobile Considerations

RCE is designed for desktop and console, but it also works on mobile with the following adjustments:

Reduce shape count to 10–15 per category
Minimize glow usage (heavy ALU per fragment)
Custom shapes require compute shader support, which most modern mobile GPUs provide
Consider rendering at a lower resolution scale

What's Next

Architecture — Understand the assembly structure and data flow in detail
Integration Guide — Best practices for wiring RCE into your game