The Work Unit specification defines the model in general terms.¹ This post describes my implementation of that model as it currently exists in openclaw-morning-run.
In the generalized specification, a Work Unit is a structural accounting unit. In the implementation, that remains true, but the unit is extended into a deterministic operating model. Work Units do not merely describe the cost of a task. They participate in baseline calculation, debt accumulation, tier resolution, and recovery protection. The result is not just a ledger of effort. It is a system for governing deployment.
Overview
The implementation currently lives across several parts of the openclaw-morning-run repository:
- openclaw_morning_pipeline.py
- soa_engine/soa_engine.py
- rva_engine/rva.py
- sde_engine/sde_engine.py
- src/blackbird_brief/
These layers perform distinct functions:
- The pipeline assigns task-level Work Units to deadlines and personal events.
- RVA turns the day into a projected structural load score.
- SDE compares that score against baseline and fatigue debt.
- HSI turns the result into a policy tier and operating mode.
- Blackbird converts the whole thing into daily guidance about pacing, recovery, and whether the day is expandable at all.
The phrase "Work Units" suggests a single construct, but the implementation is not one. It is a layered model. Task-level Work Units, day-level projected load, fragmentation penalties, debt-adjusted baseline, and policy tiering all belong to the same accounting system. Each layer serves a different purpose, but all of them describe the same underlying problem: structural demand over time.
Layer 1: Task-Level Work Units
The first layer is the one most directly tied to the original specification.
The pipeline emits a soa_snapshot payload with:
- work_unit_definition.source_doc
- basis_minutes = 30
- model = structural_impact_not_clock_time
Each task-level item includes:
- time
- task
- agent_assignment
- priority
- work_units
- wu_basis_minutes
- wu_reason
- source
This is the most literal implementation of the spec. A task receives a Work Unit value, a reason for that value, and a source. The system can therefore explain not only what a task costs, but why.
Deadline scoring
Canvas deadlines are assigned Work Units using:
base_units = effort_minutes / 30
The base is then modified by:
- a type multiplier
- an urgency multiplier
Current type multipliers:
- exam = 1.50
- quiz = 1.25
- project = 1.35
- lab = 1.20
- discussion = 1.00
- reading = 0.80
- assignment = 1.10
Current urgency multipliers:
- critical = 1.40
- high = 1.25
- medium = 1.00
- low = 0.85
The final value is rounded and clamped to a minimum of 1.
This preserves the core rule of the specification: time is not cost. Time only establishes the starting point. Structural characteristics determine the final value.
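Putting those pieces together, the deadline scoring can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the function and table names are mine, and the fallback multiplier of 1.00 for unrecognized types or urgencies is my assumption.

```python
# Sketch of the deadline scoring rules described above.
# Names are illustrative; the real pipeline may be organized differently.
TYPE_MULT = {"exam": 1.50, "quiz": 1.25, "project": 1.35, "lab": 1.20,
             "discussion": 1.00, "reading": 0.80, "assignment": 1.10}
URGENCY_MULT = {"critical": 1.40, "high": 1.25, "medium": 1.00, "low": 0.85}

def score_deadline(effort_minutes: float, task_type: str, urgency: str) -> int:
    base_units = effort_minutes / 30  # time only sets the starting point
    units = (base_units
             * TYPE_MULT.get(task_type, 1.00)       # assumed fallback
             * URGENCY_MULT.get(urgency, 1.00))     # assumed fallback
    return max(1, round(units))  # rounded and clamped to a minimum of 1
```

A 90-minute exam at high urgency, for example, scores 3.0 * 1.50 * 1.25 = 5.625, which rounds to 6 Work Units.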
Personal event scoring
Personal calendar events are scored differently. Their duration is converted into:
base_units = duration_minutes / 30
The event text is then scanned for structural cues.
Recovery-like events such as:
- decompress
- recovery
- rest
- hobby
- special interest
return negative Work Units. This is how the implementation encodes the fact that some activities restore capacity rather than consume it. Recovery is not treated as the absence of load. It is treated as a measurable counter-force to load.
Other event classes apply multipliers:
- exam, midterm, final, presentation, interview -> 1.50
- lecture, class, meeting, office hour -> 1.30
- drive, commute, travel -> 1.20
- social, networking, reception, banquet -> 1.25
- default -> 1.00
Again, time is only the base. Structural characteristics determine the final value.
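As a sketch, the personal event scoring might look like this. The cue lists come from the post; the naive substring matching, the magnitude of negative recovery units, and the clamping of non-recovery events to a minimum of 1 are my assumptions.

```python
# Illustrative sketch of personal event scoring; not the actual repo code.
RECOVERY_CUES = ("decompress", "recovery", "rest", "hobby", "special interest")
EVENT_MULTS = [
    (("exam", "midterm", "final", "presentation", "interview"), 1.50),
    (("lecture", "class", "meeting", "office hour"), 1.30),
    (("drive", "commute", "travel"), 1.20),
    (("social", "networking", "reception", "banquet"), 1.25),
]

def score_personal_event(title: str, duration_minutes: float) -> int:
    text = title.lower()
    base_units = duration_minutes / 30
    if any(cue in text for cue in RECOVERY_CUES):
        return -max(1, round(base_units))  # recovery is negative load
    for cues, mult in EVENT_MULTS:
        if any(cue in text for cue in cues):
            return max(1, round(base_units * mult))
    return max(1, round(base_units))  # default multiplier 1.00
```

Note that plain substring matching would misfire on words like "restaurant"; the real implementation presumably matches cues more carefully.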
Layer 2: Day-Level Projected Work Units
The implementation does not stop at per-task accounting. The system also computes a day-level projected load in rva_engine/rva.py.
This is the point at which the model becomes more operational than the generalized post. The day-level score is derived from risk profiles, not merely from the task-level work_units field. The goal is no longer to score isolated tasks. The goal is to quantify the structure of the entire day.
Risk registry caps
Each calendar event is classified into an operational profile, and each profile has an rva_cap in the risk registry.
That cap acts as the event's base structural load. Duration then modifies it:
- <= 60 minutes -> 1.00
- <= 120 minutes -> 1.15
- > 120 minutes -> 1.25
So the base event score is effectively:
event_score = rva_cap * duration_multiplier
Two events of equal duration can therefore diverge sharply if they belong to different profiles. Duration alone does not govern the cost. Structural class does.
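A minimal sketch of the base event score, assuming the cap and duration bands behave exactly as stated above (function names are illustrative):

```python
# Sketch: base structural load of one event in the RVA layer.
def duration_multiplier(minutes: float) -> float:
    if minutes <= 60:
        return 1.00
    if minutes <= 120:
        return 1.15
    return 1.25

def base_event_score(rva_cap: float, minutes: float) -> float:
    # the profile's cap is the base; duration only modifies it
    return rva_cap * duration_multiplier(minutes)
```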
Day-level bonuses
After summing event scores, RVA applies deterministic bonuses:
- +4 if there are at least 3 high-cognitive events
- +4 if there are at least 3 low-recovery events
- +10 if there are at least 4 lectures
- a fragmentation bonus based on the count of meaningful blocks
Current fragmentation bonus by meaningful block count:
- <= 4 blocks -> 0
- 5 blocks -> 2
- 6 blocks -> 4
- >= 7 blocks -> 6
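The day-level bonuses, including the fragmentation step function, can be sketched as follows (function names are mine, not the repo's):

```python
# Sketch of RVA's deterministic day-level bonuses.
def fragmentation_bonus(meaningful_blocks: int) -> int:
    if meaningful_blocks <= 4:
        return 0
    if meaningful_blocks == 5:
        return 2
    if meaningful_blocks == 6:
        return 4
    return 6  # 7 or more blocks

def day_level_bonus(high_cognitive: int, low_recovery: int,
                    lectures: int, meaningful_blocks: int) -> int:
    bonus = 0
    if high_cognitive >= 3:
        bonus += 4
    if low_recovery >= 3:
        bonus += 4
    if lectures >= 4:
        bonus += 10
    return bonus + fragmentation_bonus(meaningful_blocks)
```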
Sustained exposure bonus
The system also applies a sustained exposure rule for long, constrained events.
An event qualifies only if:
- trusted duration is at least 180 minutes
- rva_cap >= 20
- autonomy is low or recovery is low
The bonus is:
- quadratic drift based on hours past 3 hours
- plus +8 if both autonomy and recovery are low
Concretely:
drift = min(40, 6 * excess_hours^2)
This is one of the major places where the implementation departs from simple additive accounting. Long constrained exposure is treated as qualitatively different from ordinary duration. It is not merely "more time." It is structural drift under prolonged restriction.
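A hedged sketch of the sustained exposure rule. One detail is my assumption: whether the +8 stacks outside the capped drift or inside it is not stated, and I model it as stacking outside.

```python
# Illustrative sketch of the sustained exposure bonus.
def sustained_exposure_bonus(minutes: float, rva_cap: float,
                             autonomy_low: bool, recovery_low: bool) -> float:
    # qualification gate from the rules above
    if minutes < 180 or rva_cap < 20 or not (autonomy_low or recovery_low):
        return 0.0
    excess_hours = (minutes - 180) / 60
    drift = min(40, 6 * excess_hours ** 2)  # quadratic drift past 3 hours
    if autonomy_low and recovery_low:
        drift += 8  # assumption: applied after the drift cap
    return drift
```

A 5-hour low-autonomy, low-recovery event, for example, yields 6 * 2² = 24 drift plus 8, for 32 units.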
Overlay floor and recovery credit
If the day is part of a continuous operations overlay, the score is floored to at least 75, with an extra +5 on overlay day 3 or later.
If the schedule contains a recognized recovery block, RVA applies a recovery credit:
recovery_credit = min(10, 10% of post-overlay score)
That credit is subtracted from the day score.
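The overlay floor and recovery credit together might look like this; the names and the exact order of operations (floor first, then credit) are my assumptions.

```python
# Sketch: overlay floor and recovery credit applied to the day score.
def finalize_day_score(score: float, overlay_active: bool,
                       overlay_day: int, has_recovery_block: bool) -> float:
    if overlay_active:
        score = max(score, 75)  # continuous-operations floor
        if overlay_day >= 3:
            score += 5
    if has_recovery_block:
        score -= min(10, 0.10 * score)  # recovery credit
    return score
```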
RVA emits:
- capped display score
- unbounded day score
- per-bin Work Unit curve across the day
- dominant vectors
- recovery credit telemetry
The unbounded score is the important number for downstream planning. Later systems use it as the practical projected_wu. At this layer, Work Units are no longer just attached to tasks. They are describing the operational posture of the day as a whole.
Layer 3: Baseline, Fragmentation, and Debt
The generalized Work Unit post describes baseline as a rolling average. The current implementation goes further than that.
Effective baseline
In sde_engine.py, the effective baseline is:
baseline_eff = max(35, baseline_ref - floor(debt_prev / 80))
Baseline is therefore not just historical capacity. It is historical capacity degraded by unresolved fatigue debt.
The default baseline_ref is currently 50, and the floor is 35.
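In code form, with the stated defaults:

```python
import math

# Sketch of the effective baseline calculation from sde_engine.py,
# using the stated defaults of baseline_ref = 50 and floor = 35.
def effective_baseline(debt_prev: float, baseline_ref: int = 50,
                       floor: int = 35) -> int:
    # historical capacity, degraded by unresolved fatigue debt
    return max(floor, baseline_ref - math.floor(debt_prev / 80))
```

Carrying 240 units of debt, for example, lowers the effective baseline from 50 to 47; no amount of debt can push it below 35.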
Fragmentation points
SDE also computes fragmentation points separately from RVA's meaningful-block bonus.
Points are added for:
- domain changes between adjacent events
- profile changes involving high switching or masking demand
- gaps shorter than 15 minutes
- gaps from 15 to 30 minutes
- repeated re-entry into low-autonomy, low-recovery profiles
Current gap scoring:
- gap < 15m -> +2
- gap 15m to < 30m -> +1
Repeated re-entry into the same low-autonomy, low-recovery profile adds +2 each time after the first occurrence.
These fragmentation points are converted into an effective bonus:
- 0-2 points -> 0
- 3-5 points -> 2
- 6-9 points -> 4
- 10-14 points -> 6
- 15-20 points -> 8
- 21+ points -> 10
That value becomes frag_bonus_eff. Fragmentation is not simply observed. It is priced into the day.
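The conversion from raw fragmentation points to the effective bonus is a simple step function (sketch; the function name is illustrative):

```python
# Sketch: fragmentation points bucketed into an effective bonus.
def frag_bonus_eff(points: int) -> int:
    if points <= 2:
        return 0
    if points <= 5:
        return 2
    if points <= 9:
        return 4
    if points <= 14:
        return 6
    if points <= 20:
        return 8
    return 10
```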
Debt growth
The day's effective raw load is:
day_raw_eff = projected_day_units + frag_bonus_eff
The system then compares that to baseline_eff.
If load exceeds baseline, debt increases according to:
debt_increase = excess + (excess^2 / 100)
using floor rounding.
Debt therefore grows faster than linearly as the day increasingly exceeds sustainable range. Overspending capacity is not modeled as a flat penalty. It compounds.
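A sketch of the debt growth rule, with the floor rounding noted above:

```python
import math

# Sketch: super-linear debt growth when load exceeds effective baseline.
def debt_increase(day_raw_eff: float, baseline_eff: float) -> int:
    excess = max(0, day_raw_eff - baseline_eff)
    # the further past baseline, the faster debt compounds
    return math.floor(excess + excess ** 2 / 100)
```

Exceeding baseline by 20 units, for example, adds floor(20 + 400/100) = 24 debt, not 20.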
Debt paydown
Debt can also be reduced by sleep context.
Current base paydown values:
- home = 160
- hotel = 75
- hotel_first_night = 60
These are modified by an efficiency factor eta(debt_prev):
- debt < 120 -> 1.00
- debt 120-249 -> 0.85
- debt 250-449 -> 0.70
- debt 450+ -> 0.55
Recovery weakens as debt accumulates. This is intentional. A system already in deep deficit does not recover with the same efficiency as a stable one.
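The paydown rule as a sketch; the names are illustrative, and treating unknown sleep contexts as zero paydown is my assumption.

```python
# Sketch of sleep-context debt paydown with an efficiency factor.
BASE_PAYDOWN = {"home": 160, "hotel": 75, "hotel_first_night": 60}

def eta(debt_prev: float) -> float:
    # recovery efficiency weakens as accumulated debt grows
    if debt_prev < 120:
        return 1.00
    if debt_prev < 250:
        return 0.85
    if debt_prev < 450:
        return 0.70
    return 0.55

def debt_paydown(sleep_context: str, debt_prev: float) -> float:
    return BASE_PAYDOWN.get(sleep_context, 0) * eta(debt_prev)
```

A home night starting from 300 units of debt, for example, pays down 160 * 0.70 = 112 units rather than the full 160.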
Baseline curve model
The repo also maintains a rolling baseline curve by day type:
- WEEKDAY_CLASSDAY
- WEEKDAY_NOCLASS
- WEEKEND
Rather than using only a rolling average, the curve model stores:
- rolling cumulative Work Unit curves
- median daily totals
- median absolute deviation (MAD)
The current window is 42 days, with a minimum usable history target of 10 days.
This allows the system to compare not just how much load a day has, but when the load arrives relative to normal patterns. A day can therefore deviate from baseline by total intensity, by shape, or by both.
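The median-and-MAD statistics over the rolling window can be sketched with the stated 42-day window and 10-day minimum (names illustrative; the real curve model also tracks per-bin cumulative curves, which this omits):

```python
import statistics

# Sketch: summary statistics for one day type's rolling history.
def baseline_curve_stats(daily_totals):
    window = daily_totals[-42:]          # rolling 42-day window
    if len(window) < 10:                 # minimum usable history
        return None
    med = statistics.median(window)
    mad = statistics.median(abs(x - med) for x in window)
    return med, mad
```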
Layer 4: HSI Tiering and Guardrails
After RVA, the pipeline resolves a daily HSI tier.
Current score-to-tier mapping:
- >= 90 -> Tier 0
- >= 75 -> Tier 1
- >= 50 -> Tier 2
- >= 25 -> Tier 3
- < 25 -> Tier 4
Tier 0 is gated. A raw high score alone is not enough.
Tier 0 requires all of the following:
- autonomy loss
- sensory stacking or emotional labor
- low recovery signal and no recovery credit
If that full gate does not pass, Tier 0 is downgraded to Tier 1.
The tier then maps to operating mode and allowed capacity:
- Tier 4 -> normal, capacity 0.75
- Tier 3 -> normal, capacity 0.60
- Tier 2 -> conserve, capacity 0.45
- Tier 1 -> conserve, capacity 0.30
- Tier 0 -> recover, capacity 0.15
This is the point at which Work Units stop being purely descriptive and become prescriptive. The system is no longer only measuring the day. It is defining the safe operating envelope for the day.
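The tier resolution and its gate can be sketched as follows; the names are illustrative, and the repo's actual gate predicates may be richer than three booleans.

```python
# Sketch of HSI tier resolution with the gated Tier 0.
MODE_BY_TIER = {4: ("normal", 0.75), 3: ("normal", 0.60),
                2: ("conserve", 0.45), 1: ("conserve", 0.30),
                0: ("recover", 0.15)}

def resolve_tier(score: float, autonomy_loss: bool,
                 sensory_or_emotional: bool,
                 low_recovery_no_credit: bool) -> int:
    if score >= 90:
        tier = 0
    elif score >= 75:
        tier = 1
    elif score >= 50:
        tier = 2
    elif score >= 25:
        tier = 3
    else:
        tier = 4
    # Tier 0 gate: a raw high score alone is not enough
    if tier == 0 and not (autonomy_loss and sensory_or_emotional
                          and low_recovery_no_credit):
        tier = 1
    return tier
```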
Layer 5: Blackbird Day Policy
The later blackbird_brief subsystem imports the OpenClaw outputs and uses them to generate daily briefings.
This is the layer where the Work Unit system becomes legible as an actual morning planning instrument rather than a backend score.
Imported fields
Blackbird carries forward:
- HSI tier
- projected WU
- WU segments derived from the snapshot task list
- operational vectors
- schedule shape
- weather and movement burden
If RVA's projected score is unavailable, Blackbird can fall back to summing snapshot Work Units.
Policy thresholds
Blackbird uses its own practical thresholds:
- >= 80 WU -> overload territory
- >= 45 WU -> high strain
- >= 18 WU -> moderate operating pressure
It also maps HSI tier into policy posture:
- Tier 4 -> moderate
- Tier 3 -> constrained
- Tier 2 -> high-strain
- Tier 1 and Tier 0 -> overload
This posture is then modified upward if the schedule is dense, switching burden is high, due pressure is acute, or environmental and physical risk are elevated.
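The two mappings reduce to a threshold scan and a lookup table (sketch; the label for days below 18 WU is my assumption, since the post does not name one):

```python
# Sketch of Blackbird's practical WU thresholds and tier-to-posture mapping.
WU_THRESHOLDS = [(80, "overload territory"),
                 (45, "high strain"),
                 (18, "moderate operating pressure")]

TIER_POSTURE = {4: "moderate", 3: "constrained", 2: "high-strain",
                1: "overload", 0: "overload"}

def wu_pressure(projected_wu: float) -> str:
    for threshold, label in WU_THRESHOLDS:
        if projected_wu >= threshold:
            return label
    return "light"  # assumed label below the lowest threshold
```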
Recovery windows
Blackbird also defines what counts as usable recovery, not merely open time.
A recovery window must generally be:
- at least 30 minutes
- inside the operational day window
- not defeated by hostile movement or weather conditions
- not buried inside a saturated schedule
This distinction is important. The implementation does not treat empty calendar space as automatically restorative. Open time only counts if it is structurally recoverable.
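As a sketch, the usability check reduces to a conjunction of the listed conditions (the predicate names are mine; the real checks for hostile conditions and saturation are presumably computed from weather and schedule data):

```python
# Sketch: open time only counts as recovery if it is structurally usable.
def is_usable_recovery(minutes: float, inside_day_window: bool,
                       hostile_conditions: bool,
                       schedule_saturated: bool) -> bool:
    return (minutes >= 30
            and inside_day_window
            and not hostile_conditions
            and not schedule_saturated)
```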
How This Differs from the General Spec
The generalized post presents Work Units as a structural accounting model. That remains true. The implementation, however, adds several additional commitments:
1. Work Units are multi-layered
The code does not use a single number. It uses:
- per-task Work Units
- day-level projected load
- fragmentation bonuses
- debt-adjusted baseline
- policy tiers
In the implementation, "Work Units" is therefore a stack, not a scalar.
2. Recovery is both negative load and policy state
In the generalized spec, recovery mostly offsets load conceptually. In the implementation, recovery appears in several ways:
- negative task-level Work Units
- recovery credit in RVA
- debt paydown in SDE
- usable recovery windows in Blackbird
- HSI and Blackbird policy downgrades when recovery is not structurally available
This is more rigorous, but also more opinionated. Recovery is not merely something that feels good. It is a condition that must be structurally available if it is to count.
3. Baseline is no longer just a rolling average
The conceptual post uses a weekly rolling average as the clearest explanation. The implementation now uses:
- baseline reference
- debt-adjusted effective baseline
- rolling median curve by day type
- MAD for deviation tracking
The system now models both capacity and distribution of load across the day.
4. The system is designed to constrain expansion
This is the largest philosophical departure from a simple tracking tool.
My implementation is not merely trying to describe how much work a day contains. It is trying to determine:
- whether the day is expandable
- whether open time is actually recoverable
- how much additional tasking is safe
- what operating mode the day belongs to
That is why tiering, guardrails, debt, and recovery windows exist. The purpose of the implementation is not to produce an elegant score. The purpose is to prevent avoidable failure.
Summary
My personal implementation of Work Units is a deterministic operational model built on top of the original structural accounting concept.
At the lowest level, a Work Unit is still thirty minutes of ideal work adjusted for structural reality. But at the system level, Work Units also feed:
- day-load scoring
- fragmentation penalties
- baseline drift detection
- fatigue debt tracking
- daily operating modes
- recovery protection logic
The implementation is therefore not just "counting how hard a task is." It is a way of translating lived structural demand into machine-readable policy.
That is the real goal of the system: not productivity theater, but protective planning. Constraint does not disappear when it is quantified. It becomes governable.
Footnotes
- This post is the implementation layer. If you want the general conceptual specification first, read Work Units.