MoonLight Task Architecture & Synchronization

MoonLight uses a multi-core, multi-task architecture on ESP32 to achieve smooth LED effects while maintaining responsive UI and network connectivity. This document explains the task structure, synchronization mechanisms, and why this configuration is optimal.

Main Tasks

| Task | Core | Priority | Stack Size | Frequency | Purpose |
|------|------|----------|------------|-----------|---------|
| WiFi/BT | 0 (PRO_CPU) | 23 | System | Event-driven | System networking stack |
| lwIP TCP/IP | 0 (PRO_CPU) | 18 | System | Event-driven | TCP/IP protocol processing |
| ESP32SvelteKit | 1 (APP_CPU) | 2 | System | 20ms | HTTP/WebSocket UI framework |
| Driver Task | 1 (APP_CPU) | 3 | 3-4 KB | ~60 fps | Output data to LEDs via DMA/I2S/LCD/PARLIO |
| Effect Task | 0 (PRO_CPU) | 3 | 3-4 KB | ~60 fps | Calculate LED colors and effects |
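
The core and priority assignments above are what the task creation calls encode. Below is a minimal sketch using FreeRTOS's xTaskCreatePinnedToCore on Arduino-ESP32; the function names, stack sizes and empty loop bodies are illustrative, not MoonLight's actual code.

#include <Arduino.h>

void effectTaskLoop(void *pvParameters) { for (;;) { /* compute pixel colors */ vTaskDelay(1); } }
void driverTaskLoop(void *pvParameters) { for (;;) { /* push pixels via DMA  */ vTaskDelay(1); } }

void startLedTasks() {
  // Effect Task: priority 3, pinned to Core 0 (PRO_CPU), sharing the core with WiFi/lwIP
  xTaskCreatePinnedToCore(effectTaskLoop, "effectTask", 4096, nullptr, 3, nullptr, 0);
  // Driver Task: priority 3, pinned to Core 1 (APP_CPU), isolated from WiFi preemption
  xTaskCreatePinnedToCore(driverTaskLoop, "driverTask", 4096, nullptr, 3, nullptr, 1);
}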

Effect Task (Core 0, Priority 3)

  • Function: Pure computation - calculates pixel colors based on effect algorithms
  • Operations: Reads/writes to channels array, performs mathematical calculations
  • Tolerant of preemption: WiFi interruptions are acceptable because this work is not timing-critical and the double buffer absorbs the delay
  • Why Core 0: Can coexist with WiFi; uses idle CPU cycles when WiFi is not transmitting

Driver Task (Core 1, Priority 3)

  • Function: Timing-critical hardware operations
  • Operations: Sends pixel data to LEDs via DMA, I2S (ESP32), LCD (S3), or PARLIO (P4)
  • Requires uninterrupted execution: DMA timing must be precise to avoid LED glitches
  • Why Core 1: Isolated from WiFi interference; WiFi on Core 0 cannot preempt this task

ESP32SvelteKit Task (Core 1, Priority 2)

  • Function: HTTP server and WebSocket handler for UI
  • Operations: Processes REST API calls, WebSocket messages, JSON serialization
  • Runs every: 20ms
  • Why Core 1, Priority 2: Lower priority than the system tasks and the Driver Task, so UI handling never delays LED output

Task Interaction Flow

sequenceDiagram
    participant User
    participant WebUI
    participant SvelteKit
    participant EffectTask
    participant DriverTask
    participant LEDs

    Note over EffectTask,DriverTask: Both tasks synchronized via mutex

    User->>WebUI: Adjust effect parameter
    WebUI->>SvelteKit: WebSocket message
    SvelteKit->>SvelteKit: Update in-memory state

    Note over EffectTask: Core 0 (PRO_CPU)
    EffectTask->>EffectTask: Take mutex (10µs)
    EffectTask->>EffectTask: memcpy channelsD → channelsE
    EffectTask->>EffectTask: Release mutex
    EffectTask->>EffectTask: Compute effects (5-15ms)
    EffectTask->>EffectTask: Take mutex (10µs)
    EffectTask->>EffectTask: Swap buffer pointers
    EffectTask->>EffectTask: Release mutex

    Note over DriverTask: Core 1 (APP_CPU)
    DriverTask->>DriverTask: Take mutex (10µs)
    DriverTask->>DriverTask: Capture buffer pointer
    DriverTask->>DriverTask: Release mutex
    DriverTask->>DriverTask: Send via DMA (1-5ms)
    DriverTask->>LEDs: Pixel data
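
The mutex-protected hand-off in the diagram boils down to two very short critical sections. Below is a minimal sketch, assuming a FreeRTOS mutex and the channelsE / channelsD buffer pointers introduced further down; MoonLight's actual member names and error handling may differ.

#include <freertos/FreeRTOS.h>
#include <freertos/semphr.h>
#include <utility>

SemaphoreHandle_t bufferMutex = xSemaphoreCreateMutex();
uint8_t *channelsE;   // effects buffer, written by the Effect Task (allocated in setup)
uint8_t *channelsD;   // drivers buffer, read by the Driver Task (allocated in setup)

// Effect Task side: publish a finished frame by swapping the buffer pointers.
void publishFrame() {
  xSemaphoreTake(bufferMutex, portMAX_DELAY);   // held only for the swap (~10 µs)
  std::swap(channelsE, channelsD);
  xSemaphoreGive(bufferMutex);
}

// Driver Task side: capture the current driver buffer, then send it unlocked via DMA.
uint8_t *captureFrame() {
  xSemaphoreTake(bufferMutex, portMAX_DELAY);
  uint8_t *frame = channelsD;
  xSemaphoreGive(bufferMutex);
  return frame;
}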

HTTP Task

  • No assigned core (the OS decides), priority 5
  • Processes WebUI / WebSocket traffic
  • Calls the ModuleState read() and update() functions
  • MoonLight Modules: runs Modules::compareRecursive and Modules::checkReOrderSwap, which call processUpdatedItem()
  • Page refresh: runs onLayout pass 1 for the monitor
  • processUpdatedItem() calls Module::onUpdate(), a virtual function overridden by modules to implement custom functionality
  • NodeManager::onUpdate() propagates onUpdate() to Node controls (together with Node::updateControl()), guarded by layerMutex

Driver Task

  • PhysicalLayer::loopDrivers(): if requestMap is set, calls mapLayout(), which calls onLayout(), guarded by layerMutex
  • PhysicalLayer::loopDrivers(): calls Node::onSizeChanged() and Node::loop(), guarded by layerMutex

Effect Task

  • PhysicalLayer::loop() calls VirtualLayer::Loop(): Node::onSizeChanged() and Node::loop(), guarded by layerMutex (see the sketch below)
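
All three tasks walk the same node graph, which is why the calls above are serialized by layerMutex. Below is a minimal sketch of that guard pattern on the Driver Task side, assuming layerMutex is a FreeRTOS mutex; the surrounding code is illustrative, not MoonLight's actual implementation.

#include <freertos/FreeRTOS.h>
#include <freertos/semphr.h>

SemaphoreHandle_t layerMutex = xSemaphoreCreateMutex();

// Illustrative: remap the layout only while holding layerMutex, so the HTTP task
// cannot run onUpdate()/updateControl() on the same nodes halfway through onLayout().
void loopDriversSketch(bool requestMap) {
  if (requestMap) {
    xSemaphoreTake(layerMutex, portMAX_DELAY);
    // mapLayout() -> onLayout() would run here
    xSemaphoreGive(layerMutex);
  }
}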

Core Assignments

Why This Configuration is Optimal

graph TB
    subgraph Core0["Core 0 (PRO_CPU)"]
        WiFi[WiFi/BT<br/>Priority 23]
        lwIP[lwIP TCP/IP<br/>Priority 18]
        Effect[Effect Task<br/>Priority 3<br/>Computation Only]
    end

    subgraph Core1["Core 1 (APP_CPU)"]
        Driver[Driver Task<br/>Priority 3<br/>Timing-Critical]
        SvelteKit[ESP32SvelteKit<br/>Priority 2<br/>HTTP/WebSocket]
    end

    WiFi -.->|Preempts during bursts| Effect
    Effect -.->|Uses idle cycles| WiFi
    Driver -->|Preempts when needed| SvelteKit

    Effect <-->|Mutex-protected<br/>buffer swap| Driver

    style WiFi fill:#8f8989
    style lwIP fill:#8f8c89
    style Effect fill:#898c8f
    style Driver fill:#898f89
    style SvelteKit fill:#8f8f89

Design Principles

  1. Timing-Critical Hardware on Core 1

    • WiFi/BT run at priority 23 on Core 0
    • If Driver Task were on Core 0, WiFi would constantly preempt it
    • DMA/I2S/LCD/PARLIO require uninterrupted timing
    • Result: Core 1 isolation prevents LED glitches
  2. Computation-Heavy Effects on Core 0

    • Effect computation is pure math (no hardware timing requirements)
    • Can tolerate WiFi preemption (frame computes slightly slower)
    • Uses CPU cycles when WiFi is idle
    • Result: Efficient CPU utilization, true dual-core parallelism
  3. SvelteKit on Core 1 with Lower Priority

    • Driver Task (priority 3) preempts SvelteKit (priority 2)
    • LED output never stalls for HTTP requests
    • SvelteKit processes UI during Driver idle time
    • Result: UI remains responsive without affecting LEDs
  4. Minimal Lock Duration

    • Mutex held for only ~10µs (pointer swap only)
    • 99% of execution is unlocked and parallel
    • Tasks interleave efficiently via FreeRTOS scheduling
    • Result: "Full speed ahead" - minimal blocking

Double Buffering & Synchronization

Buffer Architecture (PSRAM Only)

graph LR
    subgraph MemoryBuffers["Memory Buffers"]
        Effects[Effects Buffer<br/>channelsE*]
        Drivers[Drivers Buffer<br/>channelsD*]
    end

    EffectTask[Effect Task<br/>Core 0] -.->|1. memcpy| Effects
    EffectTask -.->|2. Compute effects| Effects
    EffectTask -.->|3. Swap pointers<br/>MUTEX 10µs| Drivers

    DriverTask[Driver Task<br/>Core 1] -->|4. Read pixels| Drivers
    DriverTask -->|5. Send via DMA| LEDs[LEDs]

    style Effects fill:#898f89
    style Drivers fill:#898c8f

Synchronization Flow

Key Point: Effects need read-modify-write access (e.g., blur, ripple effects read neighboring pixels), so memcpy ensures they see a consistent previous frame.
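
Below is a minimal sketch of that frame cycle on the Effect Task side; the step boundaries follow the sequence diagram, the helper name is illustrative, and the externs reuse the buffer-swap sketch above and the maxChannels value from the Configuration section.

#include <freertos/FreeRTOS.h>
#include <freertos/semphr.h>
#include <cstring>

extern SemaphoreHandle_t bufferMutex;            // see the buffer-swap sketch above
extern uint8_t *channelsE, *channelsD;
extern size_t maxChannels;                       // see Configuration below

void effectFrame() {
  // 1. Copy the last published frame so blur/ripple effects read consistent neighbors.
  xSemaphoreTake(bufferMutex, portMAX_DELAY);
  memcpy(channelsE, channelsD, maxChannels);     // ~100 µs for a 30 KB buffer
  xSemaphoreGive(bufferMutex);

  // 2. Compute effects on channelsE, unlocked (5-15 ms): read-modify-write is safe
  //    because only the Effect Task touches channelsE during this phase.

  // 3. Publish: swap channelsE and channelsD under the mutex (~10 µs), as in publishFrame().
}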

Performance Impact

| LEDs | Buffer Size | memcpy Time | % of 60 fps Frame |
|------|-------------|-------------|-------------------|
| 1,000 | 3 KB | 10 µs | 0.06% |
| 5,000 | 15 KB | 50 µs | 0.3% |
| 10,000 | 30 KB | 100 µs | 0.6% |
| 20,000 | 60 KB | 200 µs | 1.2% |

Conclusion: Double buffering overhead is negligible (<1% for typical setups).
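
Worked example behind the table (it implies 3 bytes, i.e. one RGB pixel, per LED): 10,000 LEDs × 3 B = 30 KB; a 60 fps frame lasts 1/60 s ≈ 16.66 ms, so the ~100 µs memcpy costs 0.1 ms / 16.66 ms ≈ 0.6% of the frame budget.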

Performance Budget at 60fps

Per-Frame Time Budget (16.66ms)

gantt
    title Core 0 Timeline (Effect Task)
    dateFormat X
    axisFormat %L

    section WiFi Bursts
    WiFi burst 1    :0, 200
    WiFi burst 2    :5000, 100
    WiFi burst 3    :12000, 150

    section Effect Computation
    memcpy          :500, 100
    Compute effects :600, 14000
    Swap pointers   :14600, 10

    section Idle
    Available       :200, 300
    Available       :14610, 1390

gantt
    title Core 1 Timeline (Driver + SvelteKit)
    dateFormat X
    axisFormat %L

    section Driver Task
    Capture pointer :0, 10
    Send via DMA    :10, 3000

    section SvelteKit
    Process WebSocket :3000, 2000
    JSON serialize    :5000, 1000

    section Driver Task
    Capture pointer :6000, 10
    Send via DMA    :6010, 3000

    section Idle
    Available       :9010, 7656

Overhead Analysis

| Source | Light Load | Heavy Load | Peak (Flash Write) |
|--------|------------|------------|--------------------|
| WiFi preemption | 0.5-1 ms (3-6%) | 2-5 ms (12-30%) | 300 ms (WiFi scan) |
| SvelteKit | 0.5-2 ms (on Core 1) | 2-3 ms (on Core 1) | 5 ms |
| Double buffer memcpy | 0.1 ms (0.6%) | 0.1 ms (0.6%) | 0.1 ms |
| Mutex locks | 0.02 ms (0.1%) | 0.02 ms (0.1%) | 0.02 ms |
| Total | 1-3 ms (6-18%) | 4-8 ms (24-48%) | Flash: user-triggered |

Result:

  • ✅ 60fps sustained during normal operation
  • ✅ 52-60fps during heavy WiFi/UI activity
  • ✅ No stutter during UI interaction

Configuration

Enabling Double Buffering

Double buffering is automatically enabled when PSRAM is detected:

// In PhysicalLayer::setup()
if (psramFound()) {
  lights.useDoubleBuffer = true;
  lights.channelsE = allocMB<uint8_t>(maxChannels);
  lights.channelsD = allocMB<uint8_t>(maxChannels);
} else {
  lights.useDoubleBuffer = false;
  lights.channelsE = allocMB<uint8_t>(maxChannels);
  lights.channelsD = lights.channelsE;
}

Moving ESP32SvelteKit to Core 1

Add to platformio.ini:

build_flags =
  -DESP32SVELTEKIT_RUNNING_CORE=1

Or in code before including framework:

#define ESP32SVELTEKIT_RUNNING_CORE 1
#include <ESP32SvelteKit.h>
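
To verify the framework really ended up on Core 1, one option is to log the core from inside a task. Below is a minimal sketch, assuming Arduino-ESP32; the helper and where it is called from are illustrative.

#include <Arduino.h>

// Prints which core the calling task is running on (0 = PRO_CPU, 1 = APP_CPU).
void logRunningCore(const char *taskName) {
  Serial.printf("%s running on core %d\n", taskName, xPortGetCoreID());
}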

Summary

This architecture achieves optimal performance through:

  1. Core Separation: Computation (Core 0) vs Timing-Critical I/O (Core 1)
  2. Priority Hierarchy: Driver > SvelteKit ensures LED timing is never compromised
  3. Minimal Locking: 10µs mutex locks enable 99% parallel execution
  4. Double Buffering: Eliminates tearing with <1% overhead

Result: Smooth 60fps LED effects with responsive UI and stable networking. 🚀

Idle Watchdog

For big setups (typically 16K LEDs), "Task watchdog got triggered" crashes occur more frequently, mostly in the Effect and Driver tasks but occasionally also in other tasks such as WiFi. The workaround is to add task yields in the code. This is currently done as follows:

#include "esp_task_wdt.h"

void effectOrDriverTask(void* pvParameters) {
  // 🌙 register this task with the task watchdog
  esp_task_wdt_add(NULL);
  setup();

  while (true) {
    esp_task_wdt_reset();  // feed the watchdog once per iteration
    loop();
    vTaskDelay(1);         // yield so the idle task can run
  }

  // Cleanup (never reached in this case, but good practice)
  esp_task_wdt_delete(NULL);
}

void Node::loop() {
    addYield(10);
}

void ArtNetOutDriver::loop() {
    for (each package) {   // pseudocode: loop over all Art-Net packets of this frame
        writePackage();
        addYield(10);      // yield after every 10th packet
    }
}

// yieldCallCount and yieldCounter are counters maintained elsewhere in MoonLight.
// Yield to the idle task every `frequency` calls so it can feed its watchdog.
inline void addYield(uint8_t frequency) {
  if (++yieldCallCount % frequency == 0) {
    yieldCounter++;
    vTaskDelay(1);  // vTaskDelay, not taskYIELD(): the idle task must get CPU time
  }
}

  • esp_task_wdt (#include "esp_task_wdt.h"): registers the tasks with the watchdog system; inside each task loop the watchdog is reset, and vTaskDelay(1) guarantees a yield on every iteration
  • taskYIELD() is not good enough: it only yields to tasks of equal or higher priority and never hands control back to the idle task, so vTaskDelay(1) is required
  • Increasing the watchdog timeout from 5s to 10s might trigger fewer watchdog crashes but would not eliminate them, so this has not been added yet. However, for extreme setups (up to 100K LEDs), processing time might legitimately exceed 5s even with yields, so it might be added later
  • Node::loop(): each active node calls addYield(10)
  • ArtNetOutDriver::loop(): a massive number of packets is blasted out (97 universes/packets for 16K LEDs), so addYield(10) is called after each packet
  • addYield(10) means: issue a vTaskDelay(1) every 10 calls
  • An occasional flood of ESP_LOG error messages can also trigger the watchdog, so a vTaskDelay(1) is added where this happened, e.g. in EventSocket::emitEvent() on "failed to send event" (see the sketch below)
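
Below is a minimal sketch of that last point, assuming an emit function that logs on a failed send; the function and its placement are illustrative of the pattern, not the exact ESP32SvelteKit code.

#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include "esp_log.h"

static const char *TAG = "socket";

// Illustrative pattern: when many sends fail in a tight loop, the repeated ESP_LOGE
// calls can starve the idle task; yielding after the log lets the watchdog be fed.
bool emitWithYieldOnError(bool sendOk) {
  if (!sendOk) {
    ESP_LOGE(TAG, "failed to send event");
    vTaskDelay(1);  // give the idle task a chance so the task watchdog does not fire
  }
  return sendOk;
}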