scml.std.rl.reward

Module Contents

Classes

DefaultRewardFunction

The default reward function of SCML.

RewardFunction

Represents a reward function.

class scml.std.rl.reward.DefaultRewardFunction

Bases: RewardFunction

The default reward function of SCML.

Remarks:
  • The reward is the change in the agent’s balance over the step: the balance after the action minus the balance before it.
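
A minimal sketch of this balance-difference idea, assuming the reward is the balance after the action minus the balance before it. `FakeAWI` and `BalanceDifferenceReward` are hypothetical names used only for illustration, and `current_balance` is assumed to be the attribute that holds the balance; this is not the library's source.

```python
from dataclasses import dataclass


@dataclass
class FakeAWI:
    """Hypothetical stand-in for OneShotAWI that only exposes a balance."""

    current_balance: float


class BalanceDifferenceReward:
    """Illustrative balance-difference reward (an assumption, not SCML's code)."""

    def before_action(self, awi) -> float:
        # Save the balance before the action is executed.
        return awi.current_balance

    def __call__(self, awi, action, info: float) -> float:
        # Reward = current balance minus the balance saved in before_action().
        return awi.current_balance - info


awi = FakeAWI(current_balance=1000.0)
reward_fn = BalanceDifferenceReward()
baseline = reward_fn.before_action(awi)
awi.current_balance = 1040.0  # pretend the step changed the balance
reward = reward_fn(awi, action={}, info=baseline)
print(reward)
```

Under this sign convention, a step that loses money yields a negative reward.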

before_action(awi: scml.oneshot.awi.OneShotAWI) -> float

Called before the RL agent’s action is executed. Returns any information that must be saved to calculate the reward later.

Remarks:

The returned value will be passed as info to __call__() when it is time to calculate the reward.

__call__(awi: scml.oneshot.awi.OneShotAWI, action: dict[str, negmas.SAOResponse], info: float)

Called to calculate the reward to be given to the agent at the end of a step.

Parameters:
  • awi – OneShotAWI to access the agent’s state

  • action – The action (decoded) as a mapping from partner ID to responses to their last offer.

  • info – Information generated from before_action(). You can use this to store baselines for calculating the reward.

Returns:

The reward (a number) to be given to the agent at the end of the step.

class scml.std.rl.reward.RewardFunction

Bases: Protocol

Represents a reward function.

Remarks:
  • before_action is called before the action is executed, for initialization, and should return the info to be passed to __call__.

  • __call__ is called with the awi (to get the state), the action, and the info, and should return the reward.
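
Because the base is a Protocol, any class with these two methods should fit structurally, without inheriting from anything. The sketch below illustrates that with a toy reward that stores a dict as its info. `RewardFunctionLike` is a hypothetical local mirror of the two-method shape (not the library class), `RelativeBalanceReward` is an invented example, and `current_balance` is assumed to be available on the awi.

```python
from types import SimpleNamespace
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RewardFunctionLike(Protocol):
    """Hypothetical local mirror of the two-method shape described above."""

    def before_action(self, awi: Any) -> Any: ...

    def __call__(self, awi: Any, action: dict, info: Any) -> float: ...


class RelativeBalanceReward:
    """Toy reward: balance change relative to the balance before the action."""

    def before_action(self, awi: Any) -> dict:
        # info can be any object; here it is a dict holding the baseline.
        return {"balance": awi.current_balance}

    def __call__(self, awi: Any, action: dict, info: dict) -> float:
        before = info["balance"]
        # Guard against dividing by a balance close to zero.
        return (awi.current_balance - before) / max(abs(before), 1.0)


reward_fn = RelativeBalanceReward()
awi = SimpleNamespace(current_balance=200.0)  # stand-in for a real awi
info = reward_fn.before_action(awi)
awi.current_balance = 250.0
relative_reward = reward_fn(awi, {}, info)
print(isinstance(reward_fn, RewardFunctionLike), relative_reward)
```

The runtime check only verifies that both methods exist; it does not validate their signatures or return types.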

before_action(awi: scml.oneshot.awi.OneShotAWI) -> Any

Called before the RL agent’s action is executed. Returns any information that must be saved to calculate the reward later.

Remarks:

The returned value will be passed as info to __call__() when it is time to calculate the reward.

__call__(awi: scml.oneshot.awi.OneShotAWI, action: dict[str, negmas.SAOResponse], info: Any) -> float

Called to calculate the reward to be given to the agent at the end of a step.

Parameters:
  • awi – OneShotAWI to access the agent’s state

  • action – The action (decoded) as a mapping from partner ID to responses to their last offer.

  • info – Information generated from before_action(). You can use this to store baselines for calculating the reward.

Returns:

The reward (a number) to be given to the agent at the end of the step.
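
The calling order described above (save info, execute the action, then compute the reward) can be sketched as a small driver. Everything here is hypothetical scaffolding: `run_step`, `fake_execute`, and `BalanceDelta` are invented for illustration, the awi is a plain namespace, and the real environment may wire these calls differently.

```python
from types import SimpleNamespace
from typing import Any, Callable


def run_step(
    awi: Any,
    action: dict,
    reward_fn: Any,
    execute: Callable[[Any, dict], None],
) -> float:
    """Hypothetical driver: one step wrapped by the two reward hooks."""
    info = reward_fn.before_action(awi)  # 1. save what the reward needs
    execute(awi, action)                 # 2. apply the action
    return reward_fn(awi, action, info)  # 3. compute the reward


class BalanceDelta:
    """Toy reward function: change in balance over the step."""

    def before_action(self, awi: Any) -> float:
        return awi.current_balance

    def __call__(self, awi: Any, action: dict, info: float) -> float:
        return awi.current_balance - info


def fake_execute(awi: Any, action: dict) -> None:
    # Toy dynamics: every partner in the action adds a fixed profit of 5.
    awi.current_balance += 5.0 * len(action)


awi = SimpleNamespace(current_balance=100.0)
action = {"partner_a": None, "partner_b": None}  # placeholder responses
rewards = [run_step(awi, action, BalanceDelta(), fake_execute) for _ in range(3)]
print(rewards, awi.current_balance)
```

Since before_action() is called again at the start of every step, each reward reflects only that step's change rather than the cumulative change.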