scml.oneshot.rl.reward
Module Contents
Classes
- RewardFunction: Represents a reward function.
- DefaultRewardFunction: The default reward function of SCML.
- class scml.oneshot.rl.reward.RewardFunction
Bases: Protocol
Represents a reward function.
- Remarks:
before_action is called before the action is executed, for initialization, and should return info to be passed to __call__.
__call__ is called with the awi (to get the state), the action, and that info, and should return the reward.
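The call order described in the remarks can be illustrated with a structural `typing.Protocol`. This is a minimal, self-contained sketch, not the SCML source: the AWI and response types are simplified to `Any`, and `rewarded_step` and `CountingReward` are hypothetical names used only to show when each method runs.

```python
from types import SimpleNamespace
from typing import Any, Callable, Protocol


class RewardFunction(Protocol):
    """Structural sketch of the protocol (AWI and response types simplified to Any)."""

    def before_action(self, awi: Any) -> Any: ...

    def __call__(self, awi: Any, action: dict[str, Any], info: Any) -> float: ...


def rewarded_step(
    reward_fn: RewardFunction,
    awi: Any,
    action: dict[str, Any],
    execute: Callable[[dict[str, Any]], None],
) -> float:
    """Hypothetical driver showing the order in which the two methods are called."""
    info = reward_fn.before_action(awi)  # 1. record whatever the reward needs
    execute(action)  # 2. apply the RL agent's decoded action
    return reward_fn(awi, action, info)  # 3. compute the reward from the new state


class CountingReward:
    """Toy reward (hypothetical): number of partners responded to in this step."""

    def before_action(self, awi: Any) -> None:
        return None  # this toy reward needs no baseline

    def __call__(self, awi: Any, action: dict[str, Any], info: Any) -> float:
        return float(len(action))


fake_awi = SimpleNamespace()  # stand-in for a OneShotAWI
reward = rewarded_step(
    CountingReward(), fake_awi, {"p1": "accept", "p2": "reject"}, lambda a: None
)
print(reward)  # 2.0
```

Because the protocol is structural, `CountingReward` does not need to inherit from `RewardFunction`; any object with matching `before_action` and `__call__` methods fits.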
- before_action(awi: scml.oneshot.awi.OneShotAWI) -> Any
Called before executing the action from the RL agent, to save in its return value any information required for calculating the reward.
- Remarks:
The returned value will be passed as info to __call__() when it is time to calculate the reward.
- __call__(awi: scml.oneshot.awi.OneShotAWI, action: dict[str, negmas.SAOResponse], info: Any) -> float
Called to calculate the reward to be given to the agent at the end of a step.
- Parameters:
awi – The OneShotAWI, used to access the agent’s state.
action – The decoded action, as a mapping from each partner ID to the response to that partner’s last offer.
info – Information generated by before_action(). You can use this to store baselines for calculating the reward.
- Returns:
The reward (a number) to be given to the agent at the end of the step.
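Because info can be any value, before_action() can return several baselines at once. The sketch below is a hypothetical custom reward, not part of SCML: it stores the starting balance in a dict and rewards the relative balance change. It assumes the AWI exposes the balance as `current_balance`; a stand-in `FakeAWI` is used so the snippet runs without SCML installed.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeAWI:
    """Stand-in for OneShotAWI; only a `current_balance` attribute is assumed."""

    current_balance: float


class RelativeBalanceReward:
    """Hypothetical custom reward: balance change relative to the starting balance."""

    def before_action(self, awi: Any) -> dict[str, float]:
        # Store the baseline(s) needed later; a dict leaves room for more entries.
        return {"balance": awi.current_balance}

    def __call__(self, awi: Any, action: dict[str, Any], info: dict[str, float]) -> float:
        before = info["balance"]
        # Normalize by the old balance, guarding against values near zero.
        return (awi.current_balance - before) / max(abs(before), 1.0)


awi = FakeAWI(current_balance=200.0)
rf = RelativeBalanceReward()
info = rf.before_action(awi)
awi.current_balance = 150.0  # pretend the step lost 50
print(rf(awi, {}, info))  # -0.25
```

Normalizing keeps reward magnitudes comparable across agents with very different balances; whether that helps training is a design choice to evaluate, not a given.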
- class scml.oneshot.rl.reward.DefaultRewardFunction
Bases: RewardFunction
The default reward function of SCML
- Remarks:
The reward is the change in the agent’s balance over the step: the balance after the action minus the balance before it.
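The balance-difference rule can be sketched as follows. This is an illustration under stated assumptions, not the SCML source: it assumes the AWI exposes the agent’s balance as `current_balance` (a stand-in `FakeAWI` keeps the snippet runnable without SCML), and `BalanceChangeReward` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeAWI:
    """Stand-in for OneShotAWI; assumes only a `current_balance` attribute."""

    current_balance: float


class BalanceChangeReward:
    """Sketch of the balance-difference reward (hypothetical name)."""

    def before_action(self, awi: Any) -> float:
        # Baseline: the balance before the RL agent's action is executed.
        return awi.current_balance

    def __call__(self, awi: Any, action: dict[str, Any], info: float) -> float:
        # Reward: balance after the step minus the balance recorded beforehand.
        return awi.current_balance - info


awi = FakeAWI(current_balance=100.0)
rf = BalanceChangeReward()
info = rf.before_action(awi)
awi.current_balance = 112.5  # pretend the step earned 12.5
print(rf(awi, {}, info))  # 12.5
```

Here info is a plain float, matching the float return type documented for before_action() below.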
- before_action(awi: scml.oneshot.awi.OneShotAWI) -> float
Called before executing the action from the RL agent, to save in its return value any information required for calculating the reward.
- Remarks:
The returned value will be passed as info to __call__() when it is time to calculate the reward.
- __call__(awi: scml.oneshot.awi.OneShotAWI, action: dict[str, negmas.SAOResponse], info: float)
Called to calculate the reward to be given to the agent at the end of a step.
- Parameters:
awi – The OneShotAWI, used to access the agent’s state.
action – The decoded action, as a mapping from each partner ID to the response to that partner’s last offer.
info – Information generated by before_action(). You can use this to store baselines for calculating the reward.
- Returns:
The reward (a number) to be given to the agent at the end of the step.