scml.std.rl.policies
====================

.. py:module:: scml.std.rl.policies


Functions
---------

.. autoapisummary::

   scml.std.rl.policies.greedy_policy
   scml.std.rl.policies.random_action
   scml.std.rl.policies.random_policy


Module Contents
---------------

.. py:function:: greedy_policy(obs: numpy.ndarray, awi: scml.oneshot.awi.OneShotAWI, obs_manager: scml.oneshot.rl.observation.ObservationManager, action_manager: scml.oneshot.rl.action.ActionManager = FlexibleActionManager(ANACOneShotContext()), debug=False, distributor: Callable[[int, int], list[int]] = all_but_concentrated) -> numpy.ndarray

   A simple greedy policy.

   :param obs: The current observation.
   :param awi: The AWI of the agent running the policy.
   :param obs_manager: The observation manager used to encode the observation.
   :param action_manager: The action manager used to encode the action.
   :param debug: If ``True``, extra assertions are checked.
   :param distributor: A callable that receives a total quantity and a number of partners ``n``
                       and returns a list of ``n`` values that sum to that total quantity.
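
   To illustrate the ``distributor`` contract, the sketch below defines a hypothetical
   ``even_distributor``. It is not part of the library and is not the default
   ``all_but_concentrated``; it simply splits a total quantity over ``n`` partners as evenly as
   possible while keeping the required ``(total, n) -> list[int]`` shape.

   .. code-block:: python

      def even_distributor(total: int, n: int) -> list[int]:
          """Hypothetical distributor: split ``total`` over ``n`` partners as evenly as possible."""
          if n <= 0:
              # No partners: nothing to distribute to.
              return []
          base, extra = divmod(total, n)
          # The first ``extra`` partners get one unit more than the rest,
          # so the returned values always sum to ``total``.
          return [base + 1 if i < extra else base for i in range(n)]


      print(even_distributor(7, 3))  # [3, 2, 2]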

   Remarks:
       - Accepts the subset of offers with maximum total quantity under current needs.
       - The remaining quantity is distributed over the remaining partners using ``distributor``.
       - Prices are set to the worst value for the agent if the price range is small; otherwise they are set randomly.
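
   The first and last remarks can be sketched as follows. This is a minimal illustration, not the
   library's implementation: it assumes "under current needs" means "not exceeding the needed
   quantity", brute-forces the offer subsets, and uses a hypothetical ``small`` threshold to decide
   when the price range counts as small. ``best_subset`` and ``choose_price`` are hypothetical names.

   .. code-block:: python

      import random
      from itertools import combinations


      def best_subset(quantities: list[int], needs: int) -> tuple[int, ...]:
          """Indices of the offers with the largest total quantity not exceeding ``needs``."""
          best: tuple[int, ...] = ()
          best_total = 0
          # Brute force over all non-empty subsets; fine for a handful of partners.
          for k in range(1, len(quantities) + 1):
              for idx in combinations(range(len(quantities)), k):
                  total = sum(quantities[i] for i in idx)
                  if best_total < total <= needs:
                      best, best_total = idx, total
          return best


      def choose_price(
          pmin: int, pmax: int, is_seller: bool, rng: random.Random, small: int = 2
      ) -> int:
          """Worst price for the agent if the range is small, else a uniformly random price."""
          if pmax - pmin <= small:  # hypothetical "small range" threshold
              # A seller is worst off at the lowest price, a buyer at the highest.
              return pmin if is_seller else pmax
          return rng.randint(pmin, pmax)  # inclusive on both ends


      offers = [4, 3, 5]
      accepted = best_subset(offers, needs=8)
      print(accepted)  # (1, 2): quantities 3 + 5 = 8
      remaining = 8 - sum(offers[i] for i in accepted)
      print(remaining)  # 0, so nothing is left to distribute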


.. py:function:: random_action(obs: numpy.ndarray, env: scml.oneshot.rl.env.OneShotEnv) -> numpy.ndarray

   Samples a random action from the action space of the given environment ``env``.
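
   Conceptually this is uniform sampling from the environment's action space; for a
   Gymnasium-style space that is ``env.action_space.sample()``. The dependency-light sketch below
   assumes, for illustration only, a ``MultiDiscrete``-like space described by a vector ``nvec`` of
   per-component sizes; ``sample_multidiscrete`` is a hypothetical helper, not part of the library.

   .. code-block:: python

      import numpy as np


      def sample_multidiscrete(nvec, rng: np.random.Generator) -> np.ndarray:
          """Draw one action uniformly: component ``i`` lies in ``[0, nvec[i])``."""
          nvec = np.asarray(nvec, dtype=np.int64)
          # ``high`` is exclusive and is broadcast per component.
          return rng.integers(0, nvec)


      rng = np.random.default_rng(0)
      action = sample_multidiscrete([3, 5, 2], rng)
      print(action.shape)  # (3,)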


.. py:function:: random_policy(obs: numpy.ndarray, env: scml.oneshot.rl.env.OneShotEnv, pend: float = 0.05, paccept: float = 0.15) -> numpy.ndarray

   Ends the negotiation with probability ``pend``, accepts with probability ``paccept``, and otherwise samples a random response.
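
   Assuming ``pend`` and ``paccept`` are the probabilities of ending and accepting, the branching
   can be sketched with a single uniform draw. The hypothetical ``random_response`` helper, its
   ``end`` and ``accept`` placeholders, and the ``sample`` callable are assumptions for
   illustration; how the library encodes ending or accepting in the returned array is not shown.

   .. code-block:: python

      import random
      from collections import Counter
      from typing import Callable, TypeVar

      T = TypeVar("T")


      def random_response(
          sample: Callable[[], T],
          end: T,
          accept: T,
          rng: random.Random,
          pend: float = 0.05,
          paccept: float = 0.15,
      ) -> T:
          """End with prob. ``pend``, accept with prob. ``paccept``, else sample a response."""
          r = rng.random()  # uniform in [0, 1)
          if r < pend:
              return end
          if r < pend + paccept:
              return accept
          return sample()


      rng = random.Random(42)
      counts = Counter(
          random_response(lambda: "random", "end", "accept", rng) for _ in range(10_000)
      )
      print(counts)  # roughly 5% "end", 15% "accept", 80% "random"

   Under this reading, the default parameters leave about 80% of the calls to random sampling.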