scml.oneshot.rl.reward
======================

.. py:module:: scml.oneshot.rl.reward


Classes
-------

.. autoapisummary::

   scml.oneshot.rl.reward.RewardFunction
   scml.oneshot.rl.reward.DefaultRewardFunction


Module Contents
---------------

.. py:class:: RewardFunction

   Bases: :py:obj:`Protocol`


   Represents a reward function.

   Remarks:
       - `before_action` is called before the action is executed. It should return any information needed later to calculate the reward.
       - `__call__` is called with the AWI (to access the state), the action, and the information returned by `before_action`, and should return the reward.
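
   The two-phase call order described in the remarks can be sketched as follows. This is a minimal
   illustration, not SCML's implementation: ``StubAWI``, ``RelativeProfitReward`` and ``run_step``
   are hypothetical names, and the stub only assumes that the agent's state exposes a
   ``current_balance`` attribute. The ``info`` value is a dictionary here to show that it can be
   any object.

   .. code-block:: python

      from dataclasses import dataclass
      from typing import Any


      @dataclass
      class StubAWI:
          """Hypothetical stand-in for OneShotAWI; it only carries a balance."""

          current_balance: float


      class RelativeProfitReward:
          """Hypothetical reward: balance change relative to the saved baseline."""

          def before_action(self, awi: Any) -> dict[str, float]:
              # Snapshot the baseline; it is handed back later as `info`.
              return {"balance": awi.current_balance}

          def __call__(self, awi: Any, action: dict, info: dict[str, float]) -> float:
              baseline = info["balance"]
              # Guard against division by zero when the baseline balance is 0.
              return (awi.current_balance - baseline) / max(abs(baseline), 1.0)


      def run_step(reward_fn: Any, awi: StubAWI, action: dict, balance_change: float) -> float:
          """Mimic one step: snapshot, apply the action's effect, then score."""
          info = reward_fn.before_action(awi)
          awi.current_balance += balance_change  # stands in for executing the action
          return reward_fn(awi, action, info)


      reward = run_step(RelativeProfitReward(), StubAWI(100.0), {}, 20.0)

   Any object with these two methods satisfies the protocol structurally, so no subclassing is
   needed in this sketch.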


   .. py:method:: before_action(awi: scml.oneshot.awi.OneShotAWI) -> Any

      Called before the RL agent's action is executed to save any information required for
      calculating the reward. This information is the method's return value.

      Remarks:
          The returned value will be passed as `info` to `__call__()` when it is time to calculate
          the reward.


   .. py:method:: __call__(awi: scml.oneshot.awi.OneShotAWI, action: dict[str, negmas.SAOResponse], info: Any) -> float

      Called to calculate the reward to be given to the agent at the end of a step.

      :param awi: `OneShotAWI` to access the agent's state
      :param action: The action (decoded) as a mapping from partner ID to responses to their last offer.
      :param info: Information generated from `before_action()`. You can use this to store baselines for calculating the reward.

      :returns: The reward (a number) to be given to the agent at the end of the step.


.. py:class:: DefaultRewardFunction

   Bases: :py:obj:`RewardFunction`


   The default reward function of SCML.

   Remarks:
       - The reward is the difference between the balance before the action and after it.
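
   Read literally, the remark above amounts to something like the sketch below. It is an
   approximation under stated assumptions, not the library's source: ``BalanceDifferenceReward``
   is a hypothetical name, the balance is assumed to be read from a ``current_balance`` attribute,
   and the sign convention (balance after minus balance before) is assumed.

   .. code-block:: python

      from types import SimpleNamespace
      from typing import Any


      class BalanceDifferenceReward:
          """Hypothetical re-creation of a balance-difference reward."""

          def before_action(self, awi: Any) -> float:
              # Remember the balance before the action is executed.
              return awi.current_balance

          def __call__(self, awi: Any, action: dict, info: float) -> float:
              # Assumed sign convention: positive when the balance grew.
              return awi.current_balance - info


      # A stub object standing in for the AWI (assumption: only the balance matters).
      awi = SimpleNamespace(current_balance=50.0)
      reward_fn = BalanceDifferenceReward()

      baseline = reward_fn.before_action(awi)
      awi.current_balance = 42.5  # the step lost money in this example
      loss_reward = reward_fn(awi, {}, baseline)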


   .. py:method:: before_action(awi: scml.oneshot.awi.OneShotAWI) -> float

      Called before the RL agent's action is executed to save any information required for
      calculating the reward. This information is the method's return value.

      Remarks:
          The returned value will be passed as `info` to `__call__()` when it is time to calculate
          the reward.


   .. py:method:: __call__(awi: scml.oneshot.awi.OneShotAWI, action: dict[str, negmas.SAOResponse], info: float)

      Called to calculate the reward to be given to the agent at the end of a step.

      :param awi: `OneShotAWI` to access the agent's state
      :param action: The action (decoded) as a mapping from partner ID to responses to their last offer.
      :param info: Information generated from `before_action()`. You can use this to store baselines for calculating the reward.

      :returns: The reward (a number) to be given to the agent at the end of the step.