MoonLight Task Architecture & Synchronization
MoonLight uses a multi-core, multi-task architecture on ESP32 to achieve smooth LED effects while maintaining responsive UI and network connectivity. This document explains the task structure, synchronization mechanisms, and why this configuration is optimal.
Main Tasks
| Task | Core | Priority | Stack Size | Frequency | Purpose |
|---|---|---|---|---|---|
| WiFi/BT | 0 (PRO_CPU) | 23 | System | Event-driven | System networking stack |
| lwIP TCP/IP | 0 (PRO_CPU) | 18 | System | Event-driven | TCP/IP protocol processing |
| ESP32SvelteKit | 1 (APP_CPU) | 2 | System | 20ms | HTTP/WebSocket UI framework |
| Driver Task | 1 (APP_CPU) | 3 | 3-4KB | ~60 fps | Output data to LEDs via DMA/I2S/LCD/PARLIO |
| Effect Task | 0 (PRO_CPU) | 3 | 3-4KB | ~60 fps | Calculate LED colors and effects |
Effect Task (Core 0, Priority 3)
- Function: Pure computation - calculates pixel colors based on effect algorithms
- Operations: Reads/writes to the channels array, performs mathematical calculations
- Tolerant to preemption: WiFi interruptions are acceptable as this is non-timing-critical and we have a double buffer
- Why Core 0: Can coexist with WiFi; uses idle CPU cycles when WiFi is not transmitting
Driver Task (Core 1, Priority 3)
- Function: Timing-critical hardware operations
- Operations: Sends pixel data to LEDs via DMA, I2S (ESP32), LCD (S3), or PARLIO (P4)
- Requires uninterrupted execution: DMA timing must be precise to avoid LED glitches
- Why Core 1: Isolated from WiFi interference; WiFi on Core 0 cannot preempt this task
ESP32SvelteKit Task (Core 1, Priority 2)
- Function: HTTP server and WebSocket handler for UI
- Operations: Processes REST API calls, WebSocket messages, JSON serialization
- Runs every: 20ms
- Why Core 1, Priority 2: Lower priority than the system tasks and the Driver Task (priority 3), so LED output preempts UI work on Core 1
Task Interaction Flow
sequenceDiagram
participant User
participant WebUI
participant SvelteKit
participant EffectTask
participant DriverTask
participant LEDs
Note over EffectTask,DriverTask: Both tasks synchronized via mutex
User->>WebUI: Adjust effect parameter
WebUI->>SvelteKit: WebSocket message
SvelteKit->>SvelteKit: Update in-memory state
Note over EffectTask: Core 0 (PRO_CPU)
EffectTask->>EffectTask: Take mutex (10µs)
EffectTask->>EffectTask: memcpy channelsD → channelsE
EffectTask->>EffectTask: Release mutex
EffectTask->>EffectTask: Compute effects (5-15ms)
EffectTask->>EffectTask: Take mutex (10µs)
EffectTask->>EffectTask: Swap buffer pointers
EffectTask->>EffectTask: Release mutex
Note over DriverTask: Core 1 (APP_CPU)
DriverTask->>DriverTask: Take mutex (10µs)
DriverTask->>DriverTask: Capture buffer pointer
DriverTask->>DriverTask: Release mutex
DriverTask->>DriverTask: Send via DMA (1-5ms)
DriverTask->>LEDs: Pixel data
HTTPD task
- no assigned core (OS decides), prio 5
- processes WebUI / Websockets
- calls ModuleState read() and update() functions
- MoonLight Modules: runs Modules::compareRecursive and Modules::checkReOrderSwap which calls processUpdatedItem()
- Page refresh: runs onLayout pass 1 for the monitor
- processUpdatedItem() calls Module::onUpdate(), a virtual function that Modules override to implement custom functionality
- NodeManager::onUpdate() propagates onUpdate() to Node Controls (together with Node::updateControl()), guarded by layerMutex
Driver Task
- PhysicalLayer::loopDrivers(): if requestMap call mapLayout(). mapLayout() calls onLayout(), guarded by layerMutex
- PhysicalLayer::loopDrivers(): Node::onSizeChanged() and Node::loop() guarded by layerMutex
Effect Task
- PhysicalLayer::loop() calls VirtualLayer::Loop(): Node::onSizeChanged() and Node::loop(), guarded by layerMutex
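The guard pattern listed above can be illustrated in portable C++. This is a minimal sketch, not MoonLight's actual code: std::mutex stands in for layerMutex (assumed to be a FreeRTOS mutex on the device), and DemoNode, updateControl() and loopOnce() are hypothetical names chosen for the example.

```cpp
#include <cstdint>
#include <mutex>

// Stand-in for layerMutex (assumption: a FreeRTOS mutex on the device).
static std::mutex layerMutex;

// Hypothetical node with one control that the UI can change.
struct DemoNode {
    uint8_t speed = 1;      // control value, written from the HTTPD task
    uint32_t position = 0;  // effect state, advanced from the effect task

    void loop() { position += speed; }
};

// HTTPD task side: change a control while holding the mutex.
void updateControl(DemoNode& node, uint8_t newSpeed) {
    std::lock_guard<std::mutex> lock(layerMutex);
    node.speed = newSpeed;
}

// Effect task side: run one node iteration while holding the mutex.
void loopOnce(DemoNode& node) {
    std::lock_guard<std::mutex> lock(layerMutex);
    node.loop();
}
```

Because both paths take the same mutex, a control update can never land in the middle of a node's loop() call; it is applied either before or after a complete iteration.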
Core Assignments
Why This Configuration is Optimal
graph TB
subgraph Core0["Core 0 (PRO_CPU)"]
WiFi[WiFi/BT<br/>Priority 23]
lwIP[lwIP TCP/IP<br/>Priority 18]
Effect[Effect Task<br/>Priority 3<br/>Computation Only]
end
subgraph Core1["Core 1 (APP_CPU)"]
Driver[Driver Task<br/>Priority 3<br/>Timing-Critical]
SvelteKit[ESP32SvelteKit<br/>Priority 2<br/>HTTP/WebSocket]
end
WiFi -.->|Preempts during bursts| Effect
Effect -.->|Uses idle cycles| WiFi
Driver -->|Preempts when needed| SvelteKit
Effect <-->|Mutex-protected<br/>buffer swap| Driver
style WiFi fill:#8f8989
style lwIP fill:#8f8c89
style Effect fill:#898c8f
style Driver fill:#898f89
style SvelteKit fill:#8f8f89
Design Principles
- Timing-Critical Hardware on Core 1
  - WiFi/BT run at priority 23 on Core 0
  - If Driver Task were on Core 0, WiFi would constantly preempt it
  - DMA/I2S/LCD/PARLIO require uninterrupted timing
  - Result: Core 1 isolation prevents LED glitches
- Computation-Heavy Effects on Core 0
  - Effect computation is pure math (no hardware timing requirements)
  - Can tolerate WiFi preemption (frame computes slightly slower)
  - Uses CPU cycles when WiFi is idle
  - Result: Efficient CPU utilization, true dual-core parallelism
- SvelteKit on Core 1 with Lower Priority
  - Driver Task (priority 3) preempts SvelteKit (priority 2)
  - LED output never stalls for HTTP requests
  - SvelteKit processes UI during Driver idle time
  - Result: UI remains responsive without affecting LEDs
- Minimal Lock Duration
  - Mutex held for only ~10µs (pointer swap only)
  - 99% of execution is unlocked and parallel
  - Tasks interleave efficiently via FreeRTOS scheduling
  - Result: "Full speed ahead" - minimal blocking
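The preemption relationships behind these principles can be written down as data and checked at compile time. The sketch below is only a simplified model: TaskSpec and canPreempt() are hypothetical names, and the rule "same core and strictly higher priority" ignores blocking, interrupts and time slicing. On the device, the core and priority values would be the corresponding arguments to xTaskCreatePinnedToCore().

```cpp
// One row of the task table: the core a task is pinned to and its priority.
struct TaskSpec {
    const char* name;
    int core;      // 0 = PRO_CPU, 1 = APP_CPU
    int priority;  // higher number = higher FreeRTOS priority
};

constexpr TaskSpec kWiFi      {"WiFi/BT",        0, 23};
constexpr TaskSpec kLwip      {"lwIP TCP/IP",    0, 18};
constexpr TaskSpec kEffect    {"Effect Task",    0, 3};
constexpr TaskSpec kDriver    {"Driver Task",    1, 3};
constexpr TaskSpec kSvelteKit {"ESP32SvelteKit", 1, 2};

// Simplified model: a ready task preempts another only if both run on the
// same core and it has strictly higher priority.
constexpr bool canPreempt(const TaskSpec& a, const TaskSpec& b) {
    return a.core == b.core && a.priority > b.priority;
}

static_assert(canPreempt(kWiFi, kEffect),      "WiFi may interrupt effect computation");
static_assert(!canPreempt(kWiFi, kDriver),     "WiFi runs on the other core");
static_assert(canPreempt(kDriver, kSvelteKit), "LED output wins over UI work");
```

Moving the Driver Task to Core 0 in this model would make the second assertion fail, which is the situation the first principle is designed to avoid.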
Double Buffering & Synchronization
Buffer Architecture (PSRAM Only)
graph LR
subgraph MemoryBuffers["Memory Buffers"]
Effects[Effects Buffer<br/>channelsE*]
Drivers[Drivers Buffer<br/>channelsD*]
end
EffectTask[Effect Task<br/>Core 0] -.->|1. memcpy| Effects
EffectTask -.->|2. Compute effects| Effects
EffectTask -.->|3. Swap pointers<br/>MUTEX 10µs| Drivers
DriverTask[Driver Task<br/>Core 1] -->|4. Read pixels| Drivers
DriverTask -->|5. Send via DMA| LEDs[LEDs]
style Effects fill:#898f89
style Drivers fill:#898c8f
Synchronization Flow
Key Point: Effects need read-modify-write access (e.g., blur, ripple effects read neighboring pixels), so memcpy ensures they see a consistent previous frame.
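The copy, compute, swap sequence can be sketched in portable C++. This is an illustration under stated assumptions rather than MoonLight's implementation: std::mutex stands in for the FreeRTOS mutex, std::vector stands in for the PSRAM allocations, and the Lights struct, effectFrame() and captureForOutput() are hypothetical names. Only channelsE and channelsD come from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <utility>
#include <vector>

struct Lights {
    std::vector<uint8_t> bufferA, bufferB;  // stand-ins for the two PSRAM buffers
    uint8_t* channelsE;                     // buffer the effect task writes
    uint8_t* channelsD;                     // buffer the driver task reads
    std::mutex swapMutex;                   // stand-in for the FreeRTOS mutex

    explicit Lights(std::size_t n)
        : bufferA(n, 0), bufferB(n, 0),
          channelsE(bufferA.data()), channelsD(bufferB.data()) {}

    std::size_t size() const { return bufferA.size(); }
};

// Effect task, one frame: copy the previous frame, compute on it, publish it.
template <typename Effect>
void effectFrame(Lights& lights, Effect effect) {
    {
        std::lock_guard<std::mutex> lock(lights.swapMutex);
        std::memcpy(lights.channelsE, lights.channelsD, lights.size());
    }
    effect(lights.channelsE, lights.size());  // unlocked: the long part of the frame
    {
        std::lock_guard<std::mutex> lock(lights.swapMutex);
        std::swap(lights.channelsE, lights.channelsD);  // pointer swap only
    }
}

// Driver task: capture the current output pointer under the mutex,
// then send from it without holding the lock.
const uint8_t* captureForOutput(Lights& lights) {
    std::lock_guard<std::mutex> lock(lights.swapMutex);
    return lights.channelsD;
}
```

Running an effect that adds 10 to every channel twice in a row yields 10 and then 20 in the driver buffer: the second frame builds on the first because of the memcpy, which is the read-modify-write behaviour that blur and ripple effects depend on.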
Performance Impact
| LEDs | Buffer Size | memcpy Time | % of 60fps Frame |
|---|---|---|---|
| 1,000 | 3 KB | 10 µs | 0.06% |
| 5,000 | 15 KB | 50 µs | 0.3% |
| 10,000 | 30 KB | 100 µs | 0.6% |
| 20,000 | 60 KB | 200 µs | 1.2% |
Conclusion: Double buffering overhead is negligible (<1% for typical setups).
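The figures in the table follow from simple arithmetic, sketched below. The constants are read off the table rather than measured: 3 bytes per LED, and a copy throughput of 300 bytes/µs (3 KB in 10 µs), which should be treated as an assumption. The function names are hypothetical.

```cpp
#include <cstdint>

constexpr double kFrameMicros = 1e6 / 60.0;    // about 16,667 µs per frame at 60 fps
constexpr double kBytesPerLed = 3.0;           // RGB, as in the table
constexpr double kCopyBytesPerMicro = 300.0;   // implied by the table; an assumption

// Estimated time to copy one full frame buffer, in microseconds.
constexpr double memcpyMicros(uint32_t leds) {
    return leds * kBytesPerLed / kCopyBytesPerMicro;
}

// The same copy expressed as a percentage of one 60 fps frame.
constexpr double framePercent(uint32_t leds) {
    return 100.0 * memcpyMicros(leds) / kFrameMicros;
}
```

For 10,000 LEDs this gives 100 µs, or 0.6% of the frame, matching the third row of the table. Actual PSRAM throughput depends on the chip and bus configuration, so measured values may differ.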
Performance Budget at 60fps
Per-Frame Time Budget (16.67ms)
gantt
title Core 0 Timeline (Effect Task)
dateFormat X
axisFormat %L
section WiFi Bursts
WiFi burst 1 :0, 200
WiFi burst 2 :5000, 100
WiFi burst 3 :12000, 150
section Effect Computation
memcpy :500, 100
Compute effects :600, 14000
Swap pointers :14600, 10
section Idle
Available :200, 300
Available :14610, 1390
gantt
title Core 1 Timeline (Driver + SvelteKit)
dateFormat X
axisFormat %L
section Driver Task
Capture pointer :0, 10
Send via DMA :10, 3000
section SvelteKit
Process WebSocket :3000, 2000
JSON serialize :5000, 1000
section Driver Task
Capture pointer :6000, 10
Send via DMA :6010, 3000
section Idle
Available :9010, 7656
Overhead Analysis
| Source | Light Load | Heavy Load | Peak (Flash Write) |
|---|---|---|---|
| WiFi preemption | 0.5-1ms (3-6%) | 2-5ms (12-30%) | 300ms (WiFi scan) |
| SvelteKit | 0.5-2ms (on Core 1) | 2-3ms (on Core 1) | 5ms |
| Double buffer memcpy | 0.1ms (0.6%) | 0.1ms (0.6%) | 0.1ms |
| Mutex locks | 0.02ms (0.1%) | 0.02ms (0.1%) | 0.02ms |
| Total | 1-3ms (6-18%) | 4-8ms (24-48%) | Flash: user-triggered |
Result:
- ✅ 60fps sustained during normal operation
- ✅ 52-60fps during heavy WiFi/UI activity
- ✅ No stutter during UI interaction
Configuration
Enabling Double Buffering
Double buffering is automatically enabled when PSRAM is detected:
// In PhysicalLayer::setup()
if (psramFound()) {
    lights.useDoubleBuffer = true;
    lights.channelsE = allocMB<uint8_t>(maxChannels);
    lights.channelsD = allocMB<uint8_t>(maxChannels);
} else {
    lights.useDoubleBuffer = false;
    lights.channelsE = allocMB<uint8_t>(maxChannels);
    lights.channelsD = lights.channelsE;
}
Moving ESP32SvelteKit to Core 1
Add to platformio.ini:
Or in code before including framework:
Summary
This architecture achieves optimal performance through:
- Core Separation: Computation (Core 0) vs Timing-Critical I/O (Core 1)
- Priority Hierarchy: Driver > SvelteKit ensures LED timing is never compromised
- Minimal Locking: 10µs mutex locks enable 99% parallel execution
- Double Buffering: Eliminates tearing with <1% overhead
Result: Smooth 60fps LED effects with responsive UI and stable networking. 🚀
Idle Watchdog
For big setups, typically 16K LEDs, task watchdog crashes occur more frequently, mostly in the effect and driver tasks but occasionally also in other tasks such as WiFi. The workaround is to add task yields in the code. This is currently done as follows:
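#include "esp_task_wdt.h"

// Task body shared by the effect and driver tasks
void effectOrDriverTask(void* pvParameters) {
    // 🌙
    esp_task_wdt_add(NULL);    // register this task with the task watchdog
    setup();
    while (true) {
        esp_task_wdt_reset();  // reset the watchdog once per iteration
        loop();
        vTaskDelay(1);         // block for one tick so the idle task can run
    }
    // Cleanup (never reached in this case, but good practice)
    esp_task_wdt_delete(NULL);
}

void Node::loop() {
    addYield(10);
}

void ArtNetOutDriver::loop() {
    for (each package) {       // pseudocode: iterate over all outgoing packets
        writePackage();
        addYield(10);
    }
}

// Calls vTaskDelay(1) on every frequency-th call
inline void addYield(uint8_t frequency) {
    if (++yieldCallCount % frequency == 0) {
        yieldCounter++;
        vTaskDelay(1);
    }
}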
- esp_task_wdt (#include "esp_task_wdt.h") registers the tasks with the task watchdog; the task loop resets the watchdog on every iteration, and vTaskDelay(1) ensures the task yields each time
- taskYIELD() is not sufficient: it only yields to tasks of equal or higher priority, so it never gives control back to the idle task. vTaskDelay(1) is needed instead
- Increasing the watchdog timeout from 5s to 10s might reduce the number of watchdog crashes but would not eliminate them, so this has not been added yet. However, for extreme setups (up to 100K LEDs), processing time might legitimately exceed 5s even with yields, so it might be added later
- Node::loop(): each active node calls addYield(10)
- ArtNetOutDriver::loop(): a large number of packets is sent in one burst (97 universes / packets for 16K LEDs), so addYield(10) is called after each packet
- addYield(10) means: call vTaskDelay(1) on every 10th call
- An occasional flood of ESP_LOG error messages can also trigger the watchdog, so a vTaskDelay(1) has been added where this happened, e.g. in EventSocket::emitEvent() ("failed to send event")
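To get a feel for how often addYield(10) actually yields, the counter logic can be run on a host machine. The sketch below makes several assumptions that are not in the original: delayOneTick() is an empty stand-in for vTaskDelay(1), the counters are 32-bit, "16K" is taken as 16,384 LEDs, and one Art-Net universe is assumed to carry 170 RGB LEDs (510 channels). universesFor() and yieldsForFrame() are hypothetical helpers.

```cpp
#include <cstdint>

static uint32_t yieldCallCount = 0;  // calls to addYield() so far
static uint32_t yieldCounter = 0;    // how many of those calls actually yielded

// Stand-in for vTaskDelay(1); on the device this blocks for one tick.
inline void delayOneTick() {}

// Same logic as the snippet above; frequency must be non-zero.
inline void addYield(uint8_t frequency) {
    if (++yieldCallCount % frequency == 0) {
        yieldCounter++;
        delayOneTick();
    }
}

// Universes needed for an RGB setup, assuming 170 LEDs per universe.
constexpr uint32_t universesFor(uint32_t leds) {
    return (leds + 169) / 170;  // round up
}

// Simulate sending one frame and return the number of yields it caused.
inline uint32_t yieldsForFrame(uint32_t leds) {
    const uint32_t before = yieldCounter;
    for (uint32_t packet = 0; packet < universesFor(leds); ++packet) {
        addYield(10);
    }
    return yieldCounter - before;
}
```

Under these assumptions, 16,384 LEDs map to 97 universes, consistent with the figure quoted above, and the first frame sent triggers 9 yields. Each yield costs at least one tick on the device, so the frequency argument trades frame rate against watchdog safety.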