MoonLight Task Architecture & Synchronization
MoonLight uses a multi-core, multi-task architecture on ESP32 to achieve smooth LED effects while maintaining responsive UI and network connectivity. This document explains the task structure, synchronization mechanisms, and why this configuration is optimal.
Main Tasks
| Task | Core | Priority | Stack Size | Frequency | Purpose |
|---|---|---|---|---|---|
| WiFi/BT | 0 (PRO_CPU) | 23 | System | Event-driven | System networking stack |
| lwIP TCP/IP | 0 (PRO_CPU) | 18 | System | Event-driven | TCP/IP protocol processing |
| Effect Task | 0 (PRO_CPU) | 10 | 3-4KB | ~60 fps | Calculate LED colors and effects |
| ESP32SvelteKit | 1 (APP_CPU) | 2 | System | 10ms | HTTP/WebSocket UI framework |
| Driver Task | 1 (APP_CPU) | 3 | 3-4KB | ~60 fps | Output data to LEDs via DMA/I2S/LCD/PARLIO |
Effect Task (Core 0, Priority 10)
- Function: Pure computation - calculates pixel colors based on effect algorithms
- Operations: Reads/writes to channels array, performs mathematical calculations
- Tolerant to preemption: WiFi interruptions are acceptable as this is non-timing-critical
- Why Core 0: Can coexist with WiFi; uses idle CPU cycles when WiFi is not transmitting
Driver Task (Core 1, Priority 3)
- Function: Timing-critical hardware operations
- Operations: Sends pixel data to LEDs via DMA, I2S (ESP32), LCD (S3), or PARLIO (P4)
- Requires uninterrupted execution: DMA timing must be precise to avoid LED glitches
- Why Core 1: Isolated from WiFi interference; WiFi on Core 0 cannot preempt this task
ESP32SvelteKit Task (Core 1, Priority 2)
- Function: HTTP server and WebSocket handler for UI
- Operations: Processes REST API calls, WebSocket messages, JSON serialization
- Runs every: 10ms
- Why Core 1, Priority 2: Lower priority than Driver Task, so LED output always takes precedence
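The core and priority assignments above can be written down as data and checked against the stated preemption rules. The following is only a minimal host-side sketch: `TaskSpec`, `kTasks`, `canPreempt` and `findTask` are hypothetical names that do not appear in MoonLight, and the rule is simplified to "same core, strictly higher priority", which is how pinned tasks are assumed to behave here.

```cpp
#include <cstring>

// Hypothetical descriptor mirroring the task table (not a MoonLight type).
struct TaskSpec {
  const char* name;
  int core;      // 0 = PRO_CPU, 1 = APP_CPU
  int priority;  // higher number = higher FreeRTOS priority
};

static const TaskSpec kTasks[] = {
    {"WiFi/BT", 0, 23},
    {"lwIP TCP/IP", 0, 18},
    {"Effect Task", 0, 10},
    {"ESP32SvelteKit", 1, 2},
    {"Driver Task", 1, 3},
};

// Simplified rule: a pinned task can preempt another task only when both
// run on the same core and it has strictly higher priority.
bool canPreempt(const TaskSpec& a, const TaskSpec& b) {
  return a.core == b.core && a.priority > b.priority;
}

// Looks up a task by name; returns nullptr when the name is unknown.
const TaskSpec* findTask(const char* name) {
  for (const TaskSpec& t : kTasks) {
    if (std::strcmp(t.name, name) == 0) return &t;
  }
  return nullptr;
}
```

With this table, `canPreempt` is true for WiFi/BT over the Effect Task and for the Driver Task over ESP32SvelteKit, but false for WiFi/BT over the Driver Task, because the two are pinned to different cores.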
Task Interaction Flow
sequenceDiagram
participant User
participant WebUI
participant SvelteKit
participant EffectTask
participant DriverTask
participant LEDs
Note over EffectTask,DriverTask: Both tasks synchronized via mutex
User->>WebUI: Adjust effect parameter
WebUI->>SvelteKit: WebSocket message
SvelteKit->>SvelteKit: Update in-memory state
Note over EffectTask: Core 0 (PRO_CPU)
EffectTask->>EffectTask: Take mutex (10µs)
EffectTask->>EffectTask: Check newFrameReady
EffectTask->>EffectTask: Release mutex
EffectTask->>EffectTask: memcpy channelsD → channelsE
EffectTask->>EffectTask: Compute effects (5-15ms)
EffectTask->>EffectTask: Take mutex (10µs)
EffectTask->>EffectTask: Swap buffer pointers
EffectTask->>EffectTask: Release mutex
Note over DriverTask: Core 1 (APP_CPU)
DriverTask->>DriverTask: Take mutex (10µs)
DriverTask->>DriverTask: Capture buffer pointer
DriverTask->>DriverTask: Release mutex
DriverTask->>DriverTask: Send via DMA (1-5ms)
DriverTask->>LEDs: Pixel data
Core Assignments
Why This Configuration is Optimal
graph TB
subgraph Core0["Core 0 (PRO_CPU)"]
WiFi[WiFi/BT<br/>Priority 23]
lwIP[lwIP TCP/IP<br/>Priority 18]
Effect[Effect Task<br/>Priority 10<br/>Computation Only]
end
subgraph Core1["Core 1 (APP_CPU)"]
Driver[Driver Task<br/>Priority 3<br/>Timing-Critical]
SvelteKit[ESP32SvelteKit<br/>Priority 2<br/>HTTP/WebSocket]
end
WiFi -.->|Preempts during bursts| Effect
Effect -.->|Uses idle cycles| WiFi
Driver -->|Preempts when needed| SvelteKit
Effect <-->|Mutex-protected<br/>buffer swap| Driver
style WiFi fill:#8f8989
style lwIP fill:#8f8c89
style Effect fill:#898c8f
style Driver fill:#898f89
style SvelteKit fill:#8f8f89
Design Principles
- Timing-Critical Hardware on Core 1
  - WiFi/BT run at priority 23 on Core 0
  - If Driver Task were on Core 0, WiFi would constantly preempt it
  - DMA/I2S/LCD/PARLIO require uninterrupted timing
  - Result: Core 1 isolation prevents LED glitches
- Computation-Heavy Effects on Core 0
  - Effect computation is pure math (no hardware timing requirements)
  - Can tolerate WiFi preemption (frame computes slightly slower)
  - Uses CPU cycles when WiFi is idle
  - Result: Efficient CPU utilization, true dual-core parallelism
- SvelteKit on Core 1 with Lower Priority
  - Driver Task (priority 3) preempts SvelteKit (priority 2)
  - LED output never stalls for HTTP requests
  - SvelteKit processes UI during Driver idle time
  - Result: UI remains responsive without affecting LEDs
- Minimal Lock Duration
  - Mutex held for only ~10µs (pointer swap only)
  - 99% of execution is unlocked and parallel
  - Tasks interleave efficiently via FreeRTOS scheduling
  - Result: "Full speed ahead" - minimal blocking
Double Buffering & Synchronization
Buffer Architecture (PSRAM Only)
graph LR
subgraph MemoryBuffers["Memory Buffers"]
Effects[Effects Buffer<br/>channelsE*]
Drivers[Drivers Buffer<br/>channelsD*]
end
EffectTask[Effect Task<br/>Core 0] -.->|1. memcpy| Effects
EffectTask -.->|2. Compute effects| Effects
EffectTask -.->|3. Swap pointers<br/>MUTEX 10µs| Drivers
DriverTask[Driver Task<br/>Core 1] -->|4. Read pixels| Drivers
DriverTask -->|5. Send via DMA| LEDs[LEDs]
style Effects fill:#898f89
style Drivers fill:#898c8f
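The five numbered steps in the diagram can be modelled on a desktop machine. This is a sketch under stated assumptions, not MoonLight's implementation: `DoubleBuffer`, `beginFrame`, `publish` and `acquire` are hypothetical names, and `std::mutex` stands in for the FreeRTOS mutex used on the ESP32.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <utility>
#include <vector>

// Host-side model of the two channel buffers (hypothetical class).
class DoubleBuffer {
 public:
  explicit DoubleBuffer(std::size_t nrOfChannels)
      : a_(nrOfChannels, 0),
        b_(nrOfChannels, 0),
        effects_(a_.data()),
        drivers_(b_.data()),
        size_(nrOfChannels) {}

  // Step 1: seed the working buffer with the last published frame.
  // Called only from the effect thread, which is also the only thread
  // that swaps the pointers, so no lock is taken here.
  uint8_t* beginFrame() {
    std::memcpy(effects_, drivers_, size_);
    return effects_;
  }

  // Step 3: publish the finished frame; only the pointer swap is locked.
  void publish() {
    std::lock_guard<std::mutex> lock(swapMutex_);
    std::swap(effects_, drivers_);
    newFrameReady_ = true;
  }

  // Step 4: the driver captures the published pointer under the lock.
  // Returns nullptr when no new frame has been published.
  const uint8_t* acquire() {
    std::lock_guard<std::mutex> lock(swapMutex_);
    if (!newFrameReady_) return nullptr;
    newFrameReady_ = false;
    return drivers_;
  }

 private:
  std::vector<uint8_t> a_;
  std::vector<uint8_t> b_;
  uint8_t* effects_;  // plays the role of channelsE
  uint8_t* drivers_;  // plays the role of channelsD
  std::size_t size_;
  std::mutex swapMutex_;
  bool newFrameReady_ = false;
};
```

The effect side would call `beginFrame()`, write pixels, then `publish()`; the driver side would call `acquire()` and send the returned buffer. The copy and the effect computation both happen outside the lock, which is what keeps the locked section down to a pointer swap.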
Synchronization Flow
// Simplified synchronization logic
void effectTask(void* param) {
  while (true) {
    xSemaphoreTake(swapMutex, portMAX_DELAY);
    // Checked within the mutex, as the driver task can change these flags
    if (layerP.lights.header.isPositions == 0 && !newFrameReady) {
      if (layerP.lights.useDoubleBuffer) {
        xSemaphoreGive(swapMutex);
        // Copy previous frame (channelsD) to working buffer (channelsE)
        memcpy(layerP.lights.channelsE, layerP.lights.channelsD, layerP.lights.header.nrOfChannels);
      }
      layerP.loop();
      if (layerP.lights.useDoubleBuffer) {
        // Atomic swap of the channel buffers
        xSemaphoreTake(swapMutex, portMAX_DELAY);
        uint8_t* temp = layerP.lights.channelsD;
        layerP.lights.channelsD = layerP.lights.channelsE;
        layerP.lights.channelsE = temp;
      }
      newFrameReady = true;
    }
    xSemaphoreGive(swapMutex);
    vTaskDelay(1);
  }
}

void driverTask(void* param) {
  while (true) {
    bool mutexGiven = false;
    xSemaphoreTake(swapMutex, portMAX_DELAY);
    if (layerP.lights.header.isPositions == 0) {
      if (newFrameReady) {
        newFrameReady = false;
        if (layerP.lights.useDoubleBuffer) {
          // Double buffer: release lock, then send
          xSemaphoreGive(swapMutex);
          mutexGiven = true;
        }
        esp32sveltekit.lps++;
        layerP.loopDrivers();
      }
    }
    // Not double buffer, or the conditions above were not met
    if (!mutexGiven) xSemaphoreGive(swapMutex);
    vTaskDelay(1);
  }
}
Key Point: Effects need read-modify-write access (e.g., blur, ripple effects read neighboring pixels), so memcpy ensures they see a consistent previous frame.
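To make the key point concrete, the sketch below runs a read-modify-write effect over two swapped buffers, once with the copy and once without. Everything here is illustrative: `fadeByHalf` and `runFade` are hypothetical helpers, the buffers hold a single channel, and there is no locking because the model is single-threaded.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical read-modify-write effect: halves every channel in place.
void fadeByHalf(uint8_t* channels, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) channels[i] /= 2;
}

// Renders `frames` frames and returns channel 0 of the last published frame.
uint8_t runFade(bool copyPreviousFrame, int frames) {
  uint8_t bufA[1] = {0};    // initial working buffer (role of channelsE)
  uint8_t bufB[1] = {200};  // initial published buffer (role of channelsD)
  uint8_t* channelsE = bufA;
  uint8_t* channelsD = bufB;
  for (int f = 0; f < frames; ++f) {
    // With the copy, the effect starts from the frame that was just shown.
    if (copyPreviousFrame) std::memcpy(channelsE, channelsD, 1);
    fadeByHalf(channelsE, 1);
    std::swap(channelsE, channelsD);  // publish
  }
  return channelsD[0];
}
```

With the copy, the published value fades 200 → 100 → 50 → 25 as expected. Without it, the working buffer always holds the frame from two swaps ago, so the published values alternate between two independent sequences (0, 100, 0, 50, ...), which would show up as flicker.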
Performance Impact
| LEDs | Buffer Size | memcpy Time | % of 60fps Frame |
|---|---|---|---|
| 1,000 | 3 KB | 10 µs | 0.06% |
| 5,000 | 15 KB | 50 µs | 0.3% |
| 10,000 | 30 KB | 100 µs | 0.6% |
| 20,000 | 60 KB | 200 µs | 1.2% |
Conclusion: Double buffering overhead is negligible: under 1% of the frame time up to 10,000 LEDs, and about 1.2% at 20,000 LEDs.
Performance Budget at 60fps
Per-Frame Time Budget (16.67ms)
gantt
title Core 0 Timeline (Effect Task)
dateFormat X
axisFormat %L
section WiFi Bursts
WiFi burst 1 :0, 200
WiFi burst 2 :5000, 100
WiFi burst 3 :12000, 150
section Effect Computation
memcpy :500, 100
Compute effects :600, 14000
Swap pointers :14600, 10
section Idle
Available :200, 300
Available :14610, 1390
gantt
title Core 1 Timeline (Driver + SvelteKit)
dateFormat X
axisFormat %L
section Driver Task
Capture pointer :0, 10
Send via DMA :10, 3000
section SvelteKit
Process WebSocket :3000, 2000
JSON serialize :5000, 1000
section Driver Task
Capture pointer :6000, 10
Send via DMA :6010, 3000
section Idle
Available :9010, 7656
Overhead Analysis
| Source | Light Load | Heavy Load | Peak (Flash Write) |
|---|---|---|---|
| WiFi preemption | 0.5-1ms (3-6%) | 2-5ms (12-30%) | 300ms (WiFi scan) |
| SvelteKit | 0.5-2ms (on Core 1) | 2-3ms (on Core 1) | 5ms |
| Double buffer memcpy | 0.1ms (0.6%) | 0.1ms (0.6%) | 0.1ms |
| Mutex locks | 0.02ms (0.1%) | 0.02ms (0.1%) | 0.02ms |
| Total | 1-3ms (6-18%) | 4-8ms (24-48%) | Flash: user-triggered |
Result:
- ✅ 60fps sustained during normal operation
- ✅ 52-60fps during heavy WiFi/UI activity
- ✅ No stutter during UI interaction
Configuration
Enabling Double Buffering
Double buffering is automatically enabled when PSRAM is detected:
// In PhysicalLayer::setup()
if (psramFound()) {
  lights.useDoubleBuffer = true;
  lights.channelsE = allocMB<uint8_t>(maxChannels);
  lights.channelsD = allocMB<uint8_t>(maxChannels);
} else {
  lights.useDoubleBuffer = false;
  lights.channelsE = allocMB<uint8_t>(maxChannels);
}
Moving ESP32SvelteKit to Core 1
Add to platformio.ini:
Or in code before including framework:
Task Creation
// Effect Task on Core 0
xTaskCreateUniversal(effectTask,
                     "AppEffectTask",
                     psramFound() ? 4 * 1024 : 3 * 1024,  // Stack size in bytes
                     NULL,
                     10,  // Priority
                     &effectTaskHandle,
                     0    // Core 0 (PRO_CPU)
);

// Driver Task on Core 1
xTaskCreateUniversal(driverTask,
                     "AppDriverTask",
                     psramFound() ? 4 * 1024 : 3 * 1024,  // Stack size in bytes
                     NULL,
                     3,  // Priority
                     &driverTaskHandle,
                     1   // Core 1 (APP_CPU)
);
Summary
This architecture achieves optimal performance through:
- Core Separation: Computation (Core 0) vs Timing-Critical I/O (Core 1)
- Priority Hierarchy: Driver > SvelteKit ensures LED timing is never compromised
- Minimal Locking: 10µs mutex locks enable 99% parallel execution
- Double Buffering: Eliminates tearing with <1% overhead
Result: Smooth 60fps LED effects with responsive UI and stable networking. 🚀