`scml.std.rl.policies`

Module Contents

`greedy_policy`(→ numpy.ndarray)	A simple greedy policy.
`random_action`(→ numpy.ndarray)	Samples a random action from the action space of the
`random_policy`(→ numpy.ndarray)	Ends the negotiation or accepts with a predefined probability or samples a random response.

scml.std.rl.policies.greedy_policy(obs: numpy.ndarray, awi: scml.oneshot.awi.OneShotAWI, obs_manager: scml.oneshot.rl.observation.ObservationManager, action_manager: scml.oneshot.rl.action.ActionManager = FlexibleActionManager(ANACOneShotContext()), debug=False, distributor: Callable[[int, int], list[int]] = all_but_concentrated) → numpy.ndarray

A simple greedy policy.

Parameters:

obs – The current observation
awi – The AWI of the agent running the policy
obs_manager – The observation manager used to encode the observation
action_manager – The action manager to be used to encode the action
debug – If True, extra assertions are tested
distributor – A callable that receives a total quantity to be distributed over n partners and returns a list of n values that sum to this total quantity
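A minimal sketch of a callable that satisfies the `distributor` contract described above: it takes a total quantity and a number of partners, and returns one value per partner such that the values sum to the total. The name `even_distributor` is hypothetical and is not part of the library; it is not the default `all_but_concentrated`, whose behavior is not described here.

```python
def even_distributor(q: int, n: int) -> list[int]:
    """Hypothetical distributor: split q as evenly as possible over n partners."""
    if n <= 0:
        # No partners to distribute over.
        return []
    base, extra = divmod(q, n)
    # The first `extra` partners receive one additional unit so the sum equals q.
    return [base + 1 if i < extra else base for i in range(n)]


if __name__ == "__main__":
    shares = even_distributor(10, 3)
    print(shares, sum(shares))
```

Any function with this shape could in principle be passed as `distributor=...`, provided it keeps the sum-to-total property.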

Remarks:

Accepts the subset of offers with the maximum total quantity under current needs.
The remaining quantity is distributed over the remaining partners using the distributor function.
Prices are set to the worst price for the agent if the price range is small; otherwise they are set randomly.
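The first two remarks can be illustrated with a small brute-force sketch. It assumes that "under current needs" means the accepted total must not exceed the needed quantity. The function `best_subset` and the partner ids are hypothetical; the library's actual selection code may differ.

```python
from itertools import combinations


def best_subset(quantities: dict[str, int], needs: int) -> tuple[str, ...]:
    """Pick the partners whose offered quantities have the largest total
    that does not exceed `needs` (assumed reading of the remark)."""
    best: tuple[str, ...] = ()
    best_total = 0
    partners = sorted(quantities)
    # Enumerate every non-empty subset; fine for a handful of partners.
    for r in range(1, len(partners) + 1):
        for subset in combinations(partners, r):
            total = sum(quantities[p] for p in subset)
            if best_total < total <= needs:
                best, best_total = subset, total
    return best


if __name__ == "__main__":
    offers = {"a": 4, "b": 3, "c": 5}  # hypothetical offered quantities
    needs = 6
    accepted = best_subset(offers, needs)
    remaining = needs - sum(offers[p] for p in accepted)
    # `remaining` is what would then be spread over the other partners.
    print(accepted, remaining)
```

Brute-force enumeration is exponential in the number of partners, so this is only meant to make the selection criterion concrete.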

scml.std.rl.policies.random_action(obs: numpy.ndarray, env: scml.oneshot.rl.env.OneShotEnv) → numpy.ndarray

Samples a random action from the action space of the

scml.std.rl.policies.random_policy(obs: numpy.ndarray, env: scml.oneshot.rl.env.OneShotEnv, pend: float = 0.05, paccept: float = 0.15) → numpy.ndarray

Ends the negotiation or accepts with a predefined probability or samples a random response.
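The three-way choice described for `random_policy` can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes `pend` and `paccept` are the probabilities of ending and accepting, that a single uniform draw decides among the three outcomes, and it returns plain strings instead of an encoded `numpy.ndarray` action. The name `random_response` and the `sample` callback are hypothetical.

```python
import random
from typing import Callable


def random_response(
    rng: random.Random,
    sample: Callable[[], str],
    pend: float = 0.05,
    paccept: float = 0.15,
) -> str:
    """Hypothetical stand-in: end, accept, or fall back to a random response."""
    r = rng.random()  # uniform draw in [0, 1)
    if r < pend:
        return "end"
    if r < pend + paccept:
        return "accept"
    # Otherwise delegate to a sampler of random responses (assumed callback).
    return sample()


if __name__ == "__main__":
    rng = random.Random(0)
    outcomes = [random_response(rng, lambda: "counter") for _ in range(1000)]
    # Report how often each outcome occurred.
    print({k: outcomes.count(k) for k in ("end", "accept", "counter")})
```

With the default values, roughly 5% of calls would end the negotiation and 15% would accept under this reading; the rest sample a random response.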