EED Gym uses a shaped per-step reward that combines task progress, safety violations, blame, trust calibration, and communicative behavior (refusals, explanations, clarification requests, offered alternatives, and style):
\[
\begin{aligned}
R_t \;=\;& w_{task}\,\Delta prog_t
\;-\; w_{safety}\, \mathbf{1}[\text{violation}_t]
\;-\; w_{blame}\, b_t
\;-\; w_{trust}\, H(l,h;\,trust_t) \\
&\;-\; w_{refuse}\,\mathbf{1}[\text{refuse}_t]
\;+\; w_{explain}\,\mathbf{1}[\text{explain}_t]
\;-\; w_{clarify}\,\mathbf{1}[\text{clarify}_t]
\;+\; w_{alt}\,\mathbf{1}[\text{alt}_t] \\
&\;+\; w_{style}\, s_t
\;+\; w_{just}\,\mathbf{1}[\text{refuse}_t \wedge \text{risky}_t].
\end{aligned}
\]
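As a reading aid, the reward above can be sketched in Python. This is a minimal sketch under stated assumptions, not the EED Gym implementation: the names `RewardWeights`, `StepSignals`, and `step_reward` are hypothetical, the unit weights are placeholders (the text does not give their values), and the trust term is passed in as an already-computed value of \(H(l,h;\,trust_t)\).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Placeholder values: the source does not specify the weights.
    task: float = 1.0
    safety: float = 1.0
    blame: float = 1.0
    trust: float = 1.0
    refuse: float = 1.0
    explain: float = 1.0
    clarify: float = 1.0
    alt: float = 1.0
    style: float = 1.0
    just: float = 1.0


@dataclass(frozen=True)
class StepSignals:
    delta_progress: float  # Delta prog_t
    violation: bool        # 1[violation_t]
    blame: float           # b_t
    trust_penalty: float   # H(l, h; trust_t), computed separately
    refuse: bool           # 1[refuse_t]
    explain: bool          # 1[explain_t]
    clarify: bool          # 1[clarify_t]
    alt: bool              # 1[alt_t]
    style: float           # s_t
    risky: bool            # risky_t, used only in the justified-refusal term


def step_reward(w: RewardWeights, s: StepSignals) -> float:
    """Evaluate R_t term by term, with the same signs as the equation."""
    return (
        w.task * s.delta_progress
        - w.safety * float(s.violation)
        - w.blame * s.blame
        - w.trust * s.trust_penalty
        - w.refuse * float(s.refuse)
        + w.explain * float(s.explain)
        - w.clarify * float(s.clarify)
        + w.alt * float(s.alt)
        + w.style * s.style
        + w.just * float(s.refuse and s.risky)
    )
```

The last term fires only when a refusal coincides with a risky step, so it offsets the refusal penalty in that case; how much it offsets depends on the relative size of \(w_{just}\) and \(w_{refuse}\), which the sketch leaves as placeholders.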
Trust calibration uses a hinge penalty:
\[
H(l,h;\,trust_t) = \max\{0,\, l-trust_t,\, trust_t-h\}, \quad l \le h
\]
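where the band \([l, h]\) is centered on a balanced trust level \(t^\star = 0.7\). The penalty is zero while \(l \le trust_t \le h\) and grows linearly with the distance by which \(trust_t\) falls outside the band.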
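The hinge penalty can be illustrated with a short Python sketch. The function name `hinge_trust_penalty` and the band edges `LOW = 0.6` and `HIGH = 0.8` are assumptions made for illustration: the text gives only the center 0.7, not the width of the band.

```python
def hinge_trust_penalty(trust: float, low: float, high: float) -> float:
    """H(l, h; trust) = max{0, l - trust, trust - h}, requiring l <= h."""
    if low > high:
        raise ValueError("expected low <= high")
    return max(0.0, low - trust, trust - high)


# Hypothetical band around the stated center 0.7; the width is an assumption.
LOW, HIGH = 0.6, 0.8

for trust in (0.5, 0.7, 0.95):
    # Under-trust and over-trust are both penalized; in-band trust is not.
    print(trust, hinge_trust_penalty(trust, LOW, HIGH))
```

Under these assumed bounds, a trust level of 0.5 falls about 0.1 below the band, 0.7 lies inside it and incurs no penalty, and 0.95 falls about 0.15 above it, so the penalty is symmetric in how it treats under-trust and over-trust.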