scml.std.rl.policies

Module Contents

Functions

greedy_policy(→ numpy.ndarray)

A simple greedy policy.

random_action(→ numpy.ndarray)

Samples a random action from the action space of the environment.

random_policy(→ numpy.ndarray)

Ends the negotiation or accepts with a predefined probability or samples a random response.

scml.std.rl.policies.greedy_policy(obs: numpy.ndarray, awi: scml.oneshot.awi.OneShotAWI, obs_manager: scml.oneshot.rl.observation.ObservationManager, action_manager: scml.oneshot.rl.action.ActionManager = FlexibleActionManager(ANACOneShotContext()), debug=False, distributor: Callable[[int, int], list[int]] = all_but_concentrated) → numpy.ndarray

A simple greedy policy.

Parameters:
  • obs – The current observation

  • awi – The AWI of the agent running the policy

  • obs_manager – The observation manager used to encode the observation

  • action_manager – The action manager to be used to encode the action

  • debug – If True, extra assertions are tested

  • distributor – A callable that receives a total quantity and the number of partners n over which to distribute it, and returns a list of n values that sum to this total quantity
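
As an illustration of the distributor contract described above, here is a minimal sketch of a callable matching Callable[[int, int], list[int]]. The name even_distributor is hypothetical and not part of scml, and the argument order (total quantity first, number of partners second) is an assumption read off the parameter description. The default all_but_concentrated may behave quite differently.

```python
def even_distributor(total: int, n: int) -> list[int]:
    """Hypothetical distributor: split `total` as evenly as possible over `n` partners.

    Assumes the first argument is the total quantity and the second is the
    number of partners; this ordering is not confirmed by the documentation.
    """
    if n <= 0:
        # No partners to distribute over, so nothing can be assigned.
        return []
    base, extra = divmod(total, n)
    # The first `extra` partners receive one additional unit so the sum is exact.
    return [base + 1 if i < extra else base for i in range(n)]
```

Under these assumptions, such a function could be passed as distributor=even_distributor when calling greedy_policy.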

Remarks:
  • Accepts the subset of offers with the maximum total quantity that stays within current needs.

  • The remaining quantity is distributed over the remaining partners using the distributor function.

  • Prices are set to the worst value for the agent if the price range is small; otherwise they are set randomly.
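
The acceptance step in the remarks above can be sketched as a small subset search. This is not the library's implementation: best_subset is a hypothetical helper, offers are reduced to plain integer quantities, and "under current needs" is read as "total quantity does not exceed needs", which is an assumption.

```python
from itertools import combinations


def best_subset(quantities: list[int], needs: int) -> tuple[int, ...]:
    """Hypothetical helper: indices of the offers whose total quantity is
    the largest one that does not exceed `needs` (empty tuple if none fits).

    Brute force over all subsets, which is fine for a handful of partners.
    """
    best: tuple[int, ...] = ()
    best_total = 0
    indices = range(len(quantities))
    for k in range(1, len(quantities) + 1):
        for subset in combinations(indices, k):
            total = sum(quantities[i] for i in subset)
            # Keep the subset only if it is strictly better and still within needs.
            if best_total < total <= needs:
                best, best_total = subset, total
    return best
```

Whatever quantity is still needed after accepting this subset would then be handed to the distributor to split over the remaining partners.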

scml.std.rl.policies.random_action(obs: numpy.ndarray, env: scml.oneshot.rl.env.OneShotEnv) → numpy.ndarray

Samples a random action from the action space of the environment.

scml.std.rl.policies.random_policy(obs: numpy.ndarray, env: scml.oneshot.rl.env.OneShotEnv, pend: float = 0.05, paccept: float = 0.15) → numpy.ndarray

Ends the negotiation (probability pend) or accepts (probability paccept) with a predefined probability; otherwise samples a random response.
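
The three-way choice described above can be sketched as follows. This is a simplified illustration, not the scml code: choose_response is a hypothetical name, it returns a label rather than an encoded numpy action, and it assumes a single uniform draw is compared against pend and then pend + paccept. The library may apply the probabilities differently, for example per partner.

```python
import random


def choose_response(pend: float = 0.05, paccept: float = 0.15, rng=None) -> str:
    """Hypothetical sketch: pick "end", "accept", or "random".

    Assumes one uniform draw in [0, 1) decides the branch; the defaults
    mirror the documented defaults of random_policy.
    """
    if rng is None:
        rng = random.Random()
    r = rng.random()
    if r < pend:
        return "end"  # end the negotiation
    if r < pend + paccept:
        return "accept"  # accept the current offer
    return "random"  # fall back to sampling a random response
```

With the default values, this sketch ends about 5% of the time, accepts about 15% of the time, and samples a random response otherwise.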