scml.std.rl.policies
====================

.. py:module:: scml.std.rl.policies


Functions
---------

.. autoapisummary::

   scml.std.rl.policies.greedy_policy
   scml.std.rl.policies.random_action
   scml.std.rl.policies.random_policy


Module Contents
---------------

.. py:function:: greedy_policy(obs: numpy.ndarray, awi: scml.oneshot.awi.OneShotAWI, obs_manager: scml.oneshot.rl.observation.ObservationManager, action_manager: scml.oneshot.rl.action.ActionManager = FlexibleActionManager(ANACOneShotContext()), debug=False, distributor: Callable[[int, int], list[int]] = all_but_concentrated) -> numpy.ndarray

   A simple greedy policy.

   :param obs: The current observation.
   :param awi: The AWI of the agent running the policy.
   :param obs_manager: The observation manager used to encode the observation.
   :param action_manager: The action manager used to encode the action.
   :param debug: If True, extra assertions are tested.
   :param distributor: A callable that receives a total quantity to be distributed
                       over ``n`` partners and returns a list of ``n`` values that
                       sum to this total quantity.

   Remarks:

   - Accepts the subset of offers with the maximum total quantity under current needs.
   - The remaining quantity is distributed over the remaining partners using the
     ``distributor`` function.
   - Prices are set to the worst value for the agent if the price range is small;
     otherwise they are set randomly.


.. py:function:: random_action(obs: numpy.ndarray, env: scml.oneshot.rl.env.OneShotEnv) -> numpy.ndarray

   Samples a random action from the action space of the environment.


.. py:function:: random_policy(obs: numpy.ndarray, env: scml.oneshot.rl.env.OneShotEnv, pend: float = 0.05, paccept: float = 0.15) -> numpy.ndarray

   Ends the negotiation with probability ``pend``, accepts with probability
   ``paccept``, or otherwise samples a random response.
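
The ``distributor`` parameter of ``greedy_policy`` is described only by its contract: it takes a total quantity and a number of partners, and returns one value per partner such that the values sum to the total. The following is a minimal sketch of a callable that satisfies that contract. The name ``even_distributor`` is hypothetical and is not part of this module; it does not reproduce the behavior of the default ``all_but_concentrated``, which is not documented here.

```python
def even_distributor(q: int, n: int) -> list[int]:
    """Hypothetical distributor: split q units over n partners as evenly as possible.

    Mirrors the documented contract Callable[[int, int], list[int]]:
    the result has n entries and they sum to q.
    """
    if n <= 0:
        # No partners to distribute over; nothing can be assigned.
        return []
    base, extra = divmod(q, n)
    # The first `extra` partners receive one additional unit so the total is exactly q.
    return [base + 1 if i < extra else base for i in range(n)]


if __name__ == "__main__":
    shares = even_distributor(7, 3)
    print(shares, sum(shares))
```

Assuming the contract above is all that ``greedy_policy`` requires, such a callable could be supplied through the ``distributor`` keyword argument in place of the default.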