Developing an agent for SCML2024 (OneShot)

In 2021, we introduced a new track called SCML-OneShot, which implements a simplified problem in which the agent can focus on the many-to-many concurrent negotiation problem without needing to worry about long-term planning or production planning, as is the case in the standard and collusion tracks.

SCML-OneShot brief introduction

Please refer to the first tutorial for a brief introduction to the world simulated in this game as well as pointers to more information. We will assume knowledge of at least this brief introduction in the remainder of this tutorial.

First things first, let’s create some helper functions that will allow us to evaluate different agents we develop in this tutorial:

from negmas import SAOResponse, ResponseType, Outcome, SAOState

from scml.oneshot.world import SCML2024OneShotWorld as W
from scml.oneshot import *
from scml.runner import WorldRunner
import pandas as pd
from rich.jupyter import print
# create a runner that encapsulates a number of configs to evaluate agents
# in the same conditions every time

CONFIGS, REPS, STEPS = 10, 3, 50
context = ANACOneShotContext(n_steps=STEPS, world_params=dict(construct_graphs=True))
single_agent_runner = WorldRunner(
    context, n_configs=CONFIGS, n_repetitions=REPS, save_worlds=True
)
full_market_runner = WorldRunner.from_runner(
    single_agent_runner, control_all_agents=True
)

Here we use the WorldRunner class, which is designed to allow us to compare multiple agents under exactly the same conditions. To create a WorldRunner, you need to pass a context which is used for generating the worlds in which the agents are evaluated. The SCML package defines several contexts that allow us to control these worlds. For example, the StrongSupplierContext will always create worlds in which the agent being evaluated is in the first production level \(L_0\) with more agents on this level than on the next level. You can define your own contexts for experimenting with specific conditions (e.g. a specific exogenous contract distribution, market structure, etc.). The most general contexts, which will produce any world that your agent may encounter in the ANAC competition, are ANACOneShotContext for one-shot worlds and ANACStdContext for standard worlds.
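As a sketch, if you wanted to evaluate your agent only as a strong \(L_0\) supplier, you could build a runner from that context instead (assuming here that StrongSupplierContext accepts the same construction arguments as ANACOneShotContext above):

# a sketch; StrongSupplierContext construction arguments are assumed
# to mirror ANACOneShotContext
strong_runner = WorldRunner(
    StrongSupplierContext(n_steps=STEPS),
    n_configs=CONFIGS,
    n_repetitions=REPS,
    save_worlds=True,
)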

We create two runners:

  1. single_agent_runner in which a single agent is being evaluated while the rest of the agents are sampled randomly from a subset of SCML built-in agents.

  2. full_market_runner in which all agents in the market are controlled by the agent type being evaluated. This may be helpful in understanding how your agent behaves in this extreme condition but can be misleading as an estimate of the agent’s performance in the official ANAC competition.

It is time to describe some of the tools that the WorldRunner gives you to evaluate agents. The WorldRunner is a Callable: you just call it with the class (agent type) you want to evaluate. You can optionally pass parameters if, for example, you would like to compare different parameter settings. Remember in this case to also pass a name to differentiate between the different parameter choices, as in the sketch below.
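For example, something like the following evaluates the same class under two parameter settings (MyAgent and my_exponent are hypothetical placeholders, and the params/name keyword names are our assumption; check the WorldRunner documentation for the exact signature):

# MyAgent and my_exponent are hypothetical placeholders
single_agent_runner(MyAgent, params=dict(my_exponent=0.2), name="MyAgent-0.2")
single_agent_runner(MyAgent, params=dict(my_exponent=1.0), name="MyAgent-1.0")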

We are using a relatively large number of configurations, repetitions per configuration, and steps (days) per repetition. If you are running this notebook for the first time, consider reducing CONFIGS, REPS, and STEPS above to make it run faster.

Testing a completely random agent

Let’s try an agent that behaves randomly:

full_market_runner(RandomOneShotAgent);

We can use the runner now to display the worlds for a given type that we evaluated:

full_market_runner.draw_worlds_of(RandomOneShotAgent);
../_images/02.develop_agent_scml2024_oneshot_7_0.png

Lots of contracts have been reached, but were they good contracts? We can use the runner now to plot several statistics (e.g. shortfall-penalty, disposal-cost, productivity, and score in this example). There are more than 42 such statistics that you can display.

full_market_runner.plot_stats(agg=False);
../_images/02.develop_agent_scml2024_oneshot_9_0.png

The score is going down monotonically, showing that this random agent is losing money every step. Note that in this test the agent controlled every factory in the market. This is very different from the ANAC competition, in which you control a single agent. Luckily, we already have a runner that tests exactly this case. Let’s see how RandomOneShotAgent behaves in such cases:

single_agent_runner(RandomOneShotAgent)
single_agent_runner.draw_worlds_of(RandomOneShotAgent);
../_images/02.develop_agent_scml2024_oneshot_11_0.png
single_agent_runner.plot_stats(agg=False);
../_images/02.develop_agent_scml2024_oneshot_12_0.png

Still losing money, but much less than before. The fact that the other agents behaved sensibly actually helped our random agent get a higher score. We can now check the distribution of scores for our agent using the score_summary method:

single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
0 RandomOneShotAgent 0.65793 30.0 0.65793 0.185348 0.336662 0.517957 0.71734 0.780287 0.954094

You may have noticed that in some worlds multiple agents were of this random type (having Ra in their names). We can check which agent in each world was evaluated using the agents_per_world_of method:

single_agent_runner.agents_per_world_of(RandomOneShotAgent)
{'c0_RandomOneShotAgent_2/20240307H102843381773DM6bvVuL': [00Ra@0],
 'c1_RandomOneShotAgent_2/20240307H102843395130xCD2AEh6': [09Ra@1],
 'c2_RandomOneShotAgent_2/20240307H1028434068309wSk61hc': [03Ra@0],
 'c3_RandomOneShotAgent_2/20240307H1028434193944qJf2mRi': [10Ra@1],
 'c4_RandomOneShotAgent_2/20240307H1028434348725FI408iN': [02Ra@0],
 'c5_RandomOneShotAgent_2/20240307H102843446517qRCJQARL': [06Ra@1],
 'c6_RandomOneShotAgent_2/20240307H1028434575457oq3tBfI': [07Ra@1],
 'c7_RandomOneShotAgent_2/20240307H1028434712435BW7DlOy': [10Ra@1],
 'c8_RandomOneShotAgent_2/20240307H1028434854184xqGXFax': [12Ra@1],
 'c9_RandomOneShotAgent_2/20240307H102843497378K9hijmDw': [09Ra@1]}

This random agent always loses money. Can we do better? Let’s start with an agent that does absolutely nothing.

An agent that does nothing

class MyOneShotDoNothing(OneShotPolicy):
    """My Agent that does nothing"""

    def act(self, state):
        return {}


""
single_agent_runner(MyOneShotDoNothing)
single_agent_runner.draw_worlds_of(MyOneShotDoNothing);
../_images/02.develop_agent_scml2024_oneshot_18_0.png

In all of the graphs representing world simulations, we use short names that represent the type of the agent. For example, an agent named 03Ra@1 is an agent of type RandomOneShotAgent at production level 1 that was created with index 03. MDN here is shorthand for MyOneShotDoNothing (we will usually remove OneShot and Agent from the name before shortening it).

Notice how there is exactly one agent of our type (MDN) in each simulation. Moreover, these are in exactly the same places in which the evaluated random agents were before. This is how we can guarantee that the comparison is fair.

Looking at the contracts signed, we can see that none of the concluded contracts involved our do-nothing agent. Nevertheless, these agents still had exogenous contracts, which means that they will lose money. A do-nothing agent will usually lose money in this game.

Let’s check the scores of different agents to confirm:

single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
0 MyOneShotDoNothing 0.716924 30.0 0.716924 0.128240 0.518197 0.587324 0.711476 0.852739 0.885616
1 RandomOneShotAgent 0.657930 30.0 0.657930 0.185348 0.336662 0.517957 0.717340 0.780287 0.954094
single_agent_runner.plot_stats(agg=False);
../_images/02.develop_agent_scml2024_oneshot_21_0.png

It is clear that our do-nothing agent always loses money and is even worse than acting randomly. That is because it cannot get any contracts through negotiation to satisfy the needs implied by its exogenous contracts, yet it still has to pay disposal costs and shortfall penalties. This is by design: we set the penalties so that this is almost always the case to encourage agents to trade.

We can also have a look at the exogenous contracts that drive the market.

import math
from typing import Iterable


def analyze_contracts(worlds, exogenous_only=True):
    """
    Analyzes the contracts signed in the given worlds
    """
    dfs = []
    for world in worlds:
        dfs.append(pd.DataFrame.from_records(world.saved_contracts))
    data = pd.concat(dfs)
    if exogenous_only:
        data = data.loc[
            (data["seller_name"] == "SELLER") | (data["buyer_name"] == "BUYER"), :
        ]
    return data.groupby(["seller_name", "buyer_name"])[["quantity", "unit_price"]].agg(
        dict(quantity=("sum", "count"), unit_price="mean")
    )


analyze_contracts(single_agent_runner.worlds_of())
quantity unit_price
sum count mean
seller_name buyer_name
04Eq@1 BUYER 2442 300 28.000000
05Eq@1 BUYER 5406 600 27.550000
05Ra@1 BUYER 4344 852 26.542254
06Eq@1 BUYER 2106 300 29.160000
06Gr@1 BUYER 6834 900 27.133333
06MDN@1 BUYER 1371 150 28.200000
06Ra@1 BUYER 6375 750 28.240000
07Eq@1 BUYER 2742 300 27.460000
07Gr@1 BUYER 13638 1500 27.752000
07MDN@1 BUYER 774 150 27.860000
07Ra@1 BUYER 3630 744 27.229839
08Eq@1 BUYER 5460 600 27.780000
08Gr@1 BUYER 8766 1200 27.085000
08Ra@1 BUYER 10638 1200 28.075000
09Gr@1 BUYER 4668 888 28.250000
09MDN@1 BUYER 2382 300 26.990000
09Ra@1 BUYER 9996 1200 27.562500
10Eq@1 BUYER 2910 480 27.412500
10Gr@1 BUYER 3054 546 28.241758
10MDN@1 BUYER 2817 300 27.930000
10Ra@1 BUYER 4341 600 27.715000
11Eq@1 BUYER 4182 600 27.690000
11Gr@1 BUYER 2178 300 29.160000
11Ra@1 BUYER 1890 300 29.440000
12Gr@1 BUYER 4320 600 27.930000
12MDN@1 BUYER 1143 150 28.480000
12Ra@1 BUYER 4023 450 28.360000
13Ra@1 BUYER 4824 600 29.290000
SELLER 00Eq@0 9894 1200 9.940000
00Gr@0 2658 300 10.260000
00MDN@0 1338 150 10.100000
00Ra@0 12636 1350 9.940000
01Eq@0 5304 600 9.930000
01Gr@0 8532 900 9.893333
01Ra@0 13176 1500 10.064000
02Eq@0 11064 1200 10.035000
02Gr@0 7554 900 9.893333
02MDN@0 1317 150 10.020000
02Ra@0 7059 750 10.108000
03Eq@0 7608 900 10.013333
03Gr@0 8688 900 10.133333
03MDN@0 1446 150 10.220000
03Ra@0 9660 1050 10.145714
04Eq@0 10392 1200 9.985000
04Gr@0 10854 1200 9.960000
04Ra@0 2352 300 9.840000
05Eq@0 5448 600 9.850000
05Gr@0 8400 900 9.986667
06Eq@0 2904 300 9.720000
06Gr@0 2400 300 10.340000
06Ra@0 2898 300 10.120000
07Ra@0 2760 300 10.160000

There are a few things to note about the distribution of the exogenous contracts:

  • The unit price of the raw material is always lower than that of the final product. This is the source of profitability in this market.

  • Each agent has a different mean and standard deviation for the quantities in its exogenous contracts. This means that different agents will have different utility functions, but each agent’s utility functions at different steps are related because its exogenous contracts are sampled from a common distribution across all steps, which makes learning more useful in the game.

Building your own agent

A one-shot agent needs only to do negotiation. The simplest possible version (MyOneShotDoNothing above) just responded to offers from its partners and proposed new offers to them.

Looking at the graph for the world simulation, we can see immediately some features of the one-shot simulation that are not replicated in the full SCML game:

  • All negotiation requests are accepted. In fact, in the one-shot game the agent need not consider requesting negotiations or deciding the negotiation agenda, as the system takes care of this, ensuring that on every simulated day every agent is negotiating with its suppliers and/or consumers about trade on that day (and only that day).

  • Contracts in the one-shot game are always executed (despite not showing that in the graph). There is no concept of a breach. Failure to honor contracts is instead penalized monetarily. Contracts are also never cancelled or nullified. This greatly simplifies the problem as the agent does not need to keep track of contract execution.

  • Production is fast enough that it does not affect the agent’s reasoning. In the terminology to be presented in the following tutorial, there is no need for an explicit production strategy.

  • There is no need to consider future negotiations while reasoning about the current set of negotiations. This greatly simplifies agent design as there is no long-term planning. In the terminology to be presented in the following section, there is no need for a trading strategy.

Your AWI

As described in the previous tutorial, your agent can sense and act in the simulation by accessing methods and properties of its AWI which is accessible at any time as:

self.awi

You can see all of the methods and properties specific to the OneShotAWI and its descendants here.
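For example, here is a sample (not exhaustive) of AWI properties that we will use throughout this tutorial:

# inside any agent method:
self.awi.current_step      # index of the current day
self.awi.n_lines           # number of production lines in my factory
self.awi.my_suppliers      # IDs of my suppliers
self.awi.my_consumers      # IDs of my consumers
self.awi.needed_supplies   # quantity I still need to buy today
self.awi.needed_sales      # quantity I still need to sell today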

Your ufun

The OneShot game has the advantage that it is possible, at the end of each simulation step (day), to calculate exactly the profit you will be getting for the set of contracts you have (either through negotiation or as exogenous contracts). We provide a utility function class (OneShotUtilityFunction) which can be used like any NegMAS UtilityFunction. This ufun is available to you at all times (a new one is created for each simulation step) and is accessible as:

self.ufun

The most important services this ufun class provides for you are the following:

  • from_offers: This method receives a list of outcomes and a list of booleans indicating whether each of them is for buying or for selling. It returns the profit you would get if all of these outcomes, and nothing else, became contracts. An outcome is just a tuple (quantity, delivery time, unit price). You can use this callback during negotiation to judge hypothetical agreements with your partners (see the sketch after this list).

  • from_contracts: This method is the same as from_offers but it receives a list of Contract objects. It is useful after all negotiations are finished to calculate the profit you will be getting for this step.

  • is_breach: will tell you whether or not getting the given total input and output quantities will make you cause a breach. Notice that breaches are expected in the OneShot track as any mismatch in the quantities of inputs and outputs will constitute a breach.

  • breach_level: returns a value between zero and one specifying the level of breach that will be recorded for a given total input and output quantities.

  • find_limit: finds either the maximum or the minimum possible profit (the minimum profit is the maximum loss) attainable in the current simulation step (day). This is useful when you want to normalize utility values between zero and one. Two of the agents we will develop during this tutorial will use this feature.

  • max_utility, min_utility: give the maximum and minimum utilities/profits attainable. Note that you must prepare them by calling find_limit. We will go into how to do that later.

  • best, worst: give more information about the cases of maximum and minimum profit (i.e. the total input and output quantity needed, the producible quantity, the best possible prices for buying and selling, etc.). Again, these are not available except after calling find_limit.
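Here is a minimal sketch of using these services inside an agent’s before_step (the specific offer is made up for illustration):

def before_step(self):
    # prepare max_utility/min_utility (and best/worst)
    self.ufun.find_limit(True)
    self.ufun.find_limit(False)
    # profit if we sold exactly 5 units at unit price 20 today and nothing else
    offer = (5, self.awi.current_step, 20)
    profit = self.ufun.from_offers((offer,), (True,))
    # the same profit normalized between zero and one
    rng = self.ufun.max_utility - self.ufun.min_utility
    normalized = (profit - self.ufun.min_utility) / rng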

Your callbacks

Your agent needs to implement methods that are called by the system at various times during the simulation. You can find a full list in the game description.

The most important ones are:

  • init() called once at the beginning of the simulation (i.e. before the first day starts). At this point, your AWI is set but you should not assume anything else.

  • before_step() called at the beginning of every day. At this point, your ufun is set and market information is available.

  • step() called at the end of every day. You can use this to analyze what happened during the day and modify your strategy in the future.

  • on_negotiation_success()/on_negotiation_failure() called after each negotiation is concluded to let you know what happened in it.

  • Depending on your base class, you will also need to implement methods that allow you to control negotiations. These will be explained in detail in the following sections, but here is a summary:

    • OneShotAgent If your agent is based on OneShotAgent, you will get a propose() call when you need to offer something to one of your partners during negotiation and a respond() call when asked to respond to one of their offers.

    • OneShotSyncAgent If your agent is based on OneShotSyncAgent you will get a call to first_proposals() once every day to set your first proposal in all negotiations and a counter_all() call to counter offers from your partners. The system will try to always give you one offer from each partner in the counter_all() call but that is not guaranteed and sometimes it may be called with a subset of the offers.

    • OneShotPolicy This is very similar to OneShotSyncAgent with only one callback act() which receives the AWI (as state) and returns a mapping from each partner to an SAOResponse (i.e. acceptance, ending negotiation, or rejection and a counter offer). This is mostly there to help build RL agents (see next tutorial).

    • OneShotSingleAgreementAgent If your agent is based on OneShotSingleAgreementAgent, you will have to implement is_acceptable() to decide if a given offer is acceptable to you, best_offer() to find the best offer for your agent in a given negotiation, and is_better() to compare two offers. Once you implement these, the agent will implement all callbacks for you, trying to get a single agreement that maximizes your utility. Note that, again, it is not guaranteed that you will get a single agreement at the end, but the system will try its best to achieve that.
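As a sketch, here is a skeleton (based on OneShotAgent) showing where the lifecycle callbacks fit; the negotiation callbacks here just end every negotiation:

class LifecycleSketch(OneShotAgent):
    """Does nothing useful; only shows the callback structure."""

    def init(self):  # once, before the first day; only the AWI is ready
        self.n_failures = 0

    def before_step(self):  # start of every day; the ufun is now ready
        ...

    def step(self):  # end of every day; analyze what happened and adapt
        ...

    def on_negotiation_failure(self, partners, annotation, mechanism, state):
        self.n_failures += 1

    def propose(self, negotiator_id, state):
        return None  # offering None ends the negotiation

    def respond(self, negotiator_id, state, source=""):
        return ResponseType.END_NEGOTIATION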

Now we can start working on our agent.

We will discuss these base classes, and basing your agent on each of them, in more detail in what follows.

OneShotAgent

This is the base class of all agents for SCML-OneShot. Both OneShotSyncAgent and OneShotSingleAgreementAgent inherit from this class and provide support for a simplified way of developing your agent (or so we think). It is perfectly OK to use OneShotAgent directly as the base of your agent.

As discussed earlier, you will receive a propose and respond call for each round in each negotiation. The propose method receives the negotiation state (an object of the type `SAOState <https://negmas.readthedocs.io/en/latest/api/negmas.sao.SAOState.html>`__ including, among other things, the current negotiation step, relative time, and last offer) and is required to return an Outcome as an offer, which is just a tuple of a quantity, delivery time (must be this simulation step), and unit price, in that order (see the `negmas documentation <https://negmas.readthedocs.io/en/latest>`__). The respond method receives a negotiation state and an offer (Outcome) from the opponent and needs to respond to it with a decision from the `ResponseType enumeration <https://negmas.readthedocs.io/en/latest/api/negmas.gb.ResponseType.html>`__ (REJECT_OFFER, ACCEPT_OFFER, or END_NEGOTIATION). Other than these two negotiation-related callbacks, the agent receives an init call just after it joins the simulation and a before_step/step call before/after each simulation step. The agent is also informed about the failure/success of negotiations through the on_negotiation_failure/on_negotiation_success callbacks. That is all. A one-shot agent only needs to think about how to respond to each of these seven callbacks. All of these callbacks except propose and respond are optional.

Simple OneShotAgent

We have already seen how to develop a do-nothing agent. Let’s try to develop a more meaningful agent using OneShotAgent as the base class.

class SimpleAgent(OneShotAgent):
    """A greedy agent based on OneShotAgent"""

    def propose(self, negotiator_id: str, state) -> "Outcome":
        return self.best_offer(negotiator_id)

    def respond(self, negotiator_id, state, source=""):
        offer = state.current_offer
        my_needs = self._needed(negotiator_id)
        if my_needs <= 0:
            return ResponseType.END_NEGOTIATION
        return (
            ResponseType.ACCEPT_OFFER
            if offer[QUANTITY] <= my_needs
            else ResponseType.REJECT_OFFER
        )

    def best_offer(self, negotiator_id):
        my_needs = self._needed(negotiator_id)
        if my_needs <= 0:
            return None
        ami = self.get_nmi(negotiator_id)
        if not ami:
            return None
        quantity_issue = ami.issues[QUANTITY]

        offer = [-1] * 3
        offer[QUANTITY] = max(
            min(my_needs, quantity_issue.max_value), quantity_issue.min_value
        )
        offer[TIME] = self.awi.current_step
        offer[UNIT_PRICE] = self._find_good_price(ami)
        return tuple(offer)

    def _find_good_price(self, ami):
        """Finds a good-enough price."""
        unit_price_issue = ami.issues[UNIT_PRICE]
        if self._is_selling(ami):
            return unit_price_issue.max_value
        return unit_price_issue.min_value

    def is_seller(self, negotiator_id):
        return negotiator_id in self.awi.current_negotiation_details["sell"].keys()

    def _needed(self, negotiator_id=None):
        return (
            self.awi.needed_sales
            if self.is_seller(negotiator_id)
            else self.awi.needed_supplies
        )

    def _is_selling(self, ami):
        return ami.annotation["product"] == self.awi.my_output_product

Let’s see how well this agent behaves:

single_agent_runner(SimpleAgent)
single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
2 SimpleAgent 0.905806 30.0 0.905806 0.243713 0.256752 0.857049 0.969827 1.038091 1.176166
0 MyOneShotDoNothing 0.716924 30.0 0.716924 0.128240 0.518197 0.587324 0.711476 0.852739 0.885616
1 RandomOneShotAgent 0.657930 30.0 0.657930 0.185348 0.336662 0.517957 0.717340 0.780287 0.954094

We can check how the score and other statistics of this agent type change over time:

single_agent_runner.plot_stats(agg=False);
../_images/02.develop_agent_scml2024_oneshot_31_0.png

This simple agent is better than the random agent and our do-nothing agent, coming close to breaking even. Let’s understand how it works:

The main idea of this agent is pretty simple. It tries to secure as much of its needs (sales/supplies) as possible in every negotiation at the best possible price for itself.

To achieve this goal, the agent uses the fact that the AWI already keeps track of this information as needed_supplies and needed_sales. It therefore defines a helper that returns the quantity it still needs to buy or sell:

def _needed(self, negotiator_id=None):
    return (
        self.awi.needed_sales
        if self.is_seller(negotiator_id)
        else self.awi.needed_supplies
    )

where it uses needed_sales if the current negotiation is for selling and needed_supplies otherwise. Now that the agent can calculate how much it needs to buy/sell, it implements the negotiation-related callbacks (propose and respond).

Here is the full implementation of propose:

def propose(self, negotiator_id: str, state) -> "Outcome":
    return self.best_offer(negotiator_id)

The agent always offers its best offer, which is calculated in the best_offer method discussed below. It does not concede at all.

Responding to opponent offers is also simple:

  • it starts by calculating its needs using the helper _needed, and ends the negotiation if it needs no more sales/supplies

my_needs = self._needed(negotiator_id)
if my_needs <= 0:
    return ResponseType.END_NEGOTIATION
  • If the offered quantity is no more than its needs, accept the offer. Otherwise reject the offer.

return (
    ResponseType.ACCEPT_OFFER
    if offer[QUANTITY] <= my_needs
    else ResponseType.REJECT_OFFER
)

Most of the code is in the best_offer method which calculates the best offer for a negotiation given the agreements reached so far. Let’s check it line by line:

  • The agent checks its needs and returns None (ending the negotiation) if it needs no more sales/supplies. We also get access to the NMI (stored in a variable named ami for historical reasons; it was previously called the AMI).

my_needs = self._needed(negotiator_id)
if my_needs <= 0:
    return None
ami = self.get_nmi(negotiator_id)
if not ami:
    return None
  • It then finds the Issue objects corresponding to the quantity and the unit price for this negotiation and initializes an offer (there are 3 issues):

quantity_issue = ami.issues[QUANTITY]
unit_price_issue = ami.issues[UNIT_PRICE]
offer = [-1] * 3
  • The time is always the current step.

offer[TIME] = self.awi.current_step
  • The quantity to offer is simply the agent’s needs clamped to the range of quantities in the negotiation agenda (note that the lower bound means this may lead the agent to buy more than its needs):

offer[QUANTITY] = max(min(my_needs, quantity_issue.max_value), quantity_issue.min_value)
  • Finally, the unit price is the maximum possible unit price if the agent is selling; otherwise it is the minimum possible price. Note that _is_selling() assumes that the agent will never find itself in a middle layer of a deeper chain. We will alleviate this issue later.

if self._is_selling(ami):
    offer[UNIT_PRICE] = unit_price_issue.max_value
else:
    offer[UNIT_PRICE] = unit_price_issue.min_value
return tuple(offer)

A (supposedly) better greedy agent

One problem with our SimpleAgent is that it does not take price into account in two ways:

  • When asked to propose, it always proposes an offer with the best price for itself. It never concedes on prices. In many cases this will lead to disagreement.

  • When asked to respond to an offer, it does not even check the price. This may lead to bad agreements (i.e. very high buying prices/very low selling prices).

We will try to remedy both of these issues in the following agent:

class BetterAgent(SimpleAgent):
    """A greedy agent based on OneShotAgent with more sane strategy"""

    def __init__(self, *args, concession_exponent=0.2, **kwargs):
        super().__init__(*args, **kwargs)
        self._e = concession_exponent

    def respond(self, negotiator_id, state, source=""):
        offer = state.current_offer
        if offer is None:
            return ResponseType.REJECT_OFFER
        response = super().respond(negotiator_id, state, source)
        if response != ResponseType.ACCEPT_OFFER:
            return response
        nmi = self.get_nmi(negotiator_id)
        return (
            response
            if self._is_good_price(nmi, state, offer[UNIT_PRICE])
            else ResponseType.REJECT_OFFER
        )

    def _is_good_price(self, nmi, state, price):
        """Checks if a given price is good enough at this stage"""
        mn, mx = self._price_range(nmi)
        th = self._th(state.step, nmi.n_steps)
        # a good price is one better than the threshold
        if self._is_selling(nmi):
            return (price - mn) >= th * (mx - mn)
        else:
            return (mx - price) >= th * (mx - mn)

    def _find_good_price(self, nmi):
        """Finds a good-enough price conceding linearly over time"""
        state = nmi.state
        mn, mx = self._price_range(nmi)
        th = self._th(state.step, nmi.n_steps)
        # offer a price that is around th of your best possible price
        if self._is_selling(nmi):
            return int(mn + th * (mx - mn))
        else:
            return int(mx - th * (mx - mn))

    def _price_range(self, nmi):
        """Finds the minimum and maximum prices"""
        mn = nmi.issues[UNIT_PRICE].min_value
        mx = nmi.issues[UNIT_PRICE].max_value
        return mn, mx

    def _th(self, step, n_steps):
        """calculates a descending threshold (0 <= th <= 1)"""
        return ((n_steps - step - 1) / (n_steps - 1)) ** self._e

Let’s see how well this agent behaves:

single_agent_runner(BetterAgent)
single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
3 SimpleAgent 0.905806 30.0 0.905806 0.243713 0.256752 0.857049 0.969827 1.038091 1.176166
0 BetterAgent 0.742522 30.0 0.742522 0.355723 0.087694 0.320086 0.871921 0.978388 1.181927
1 MyOneShotDoNothing 0.716924 30.0 0.716924 0.128240 0.518197 0.587324 0.711476 0.852739 0.885616
2 RandomOneShotAgent 0.657930 30.0 0.657930 0.185348 0.336662 0.517957 0.717340 0.780287 0.954094

It seems that BetterAgent is much worse than SimpleAgent. It is barely better than the do-nothing agent!! We failed :-(

Still, let’s dive into the agent and analyze how it works:

The main idea in BetterAgent is to treat the price issue separately to avoid the two issues presented earlier:

  • Never conceding during proposal This is solved by overriding _find_good_price, which SimpleAgent’s propose uses (through best_offer) to set the unit price, so that it returns a good-enough price that concedes over time instead of the best price:

    offer[UNIT_PRICE] = self._find_good_price(ami)
    

    As an aside, notice that best_offer builds the offer as a list (so that each issue value can be set) and then converts it into a tuple before sending it to the partner.

  • Never checking prices of offers This is solved in the respond method by checking whether or not the offered price is good enough:

    return (
        response
        if self._is_good_price(nmi, state, offer[UNIT_PRICE])
        else ResponseType.REJECT_OFFER
    )
    

    As we will see later, this is not much of an issue in SCML OneShot 2024, though.

What we mean by a good-enough price is defined in the _is_good_price and _find_good_price methods. Both start by getting the limits of the unit price in the negotiation agenda and a threshold value th:

mn, mx = self._price_range(nmi)
th = self._th(state.step, nmi.n_steps)

The price range is clear enough. The threshold th starts at \(1.0\) and decreases toward \(0.0\) over the negotiation time under the control of an agent-specific parameter _e called the concession exponent: \(th(t) = \left(\frac{n - t - 1}{n - 1}\right)^{e}\), where \(t\) is the current negotiation round and \(n\) is the total number of rounds. Let’s see how this looks for different concession exponents:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(20)
fig = plt.figure()
for e in [0.1, 0.2, 1.0, 5, 10]:
    a = BetterAgent(concession_exponent=e)
    y = [a._th(i, 20) for i in x]
    plt.plot(x, y, label=f"Concession Exponent: {e}")
plt.xlabel("Step (Of 20)")
plt.ylabel("Threshold $th$")
plt.legend()
../_images/02.develop_agent_scml2024_oneshot_38_0.png

You can see that the smaller the exponent, the more hard-headed the agent will be. Setting the concession exponent to \(0\) will recover the behavior of SimpleAgent when offering but will make the agent insist on an unrealistic best price when responding to partner offers (can you see why?), which is definitely a bad idea. Setting it to \(\infty\) will recover the behavior of SimpleAgent when responding to offers but will make its offers least favorable for itself in terms of price (can you see why?).
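A quick numeric check of the two extremes (using a very large exponent to approximate \(\infty\)):

hard = BetterAgent(concession_exponent=0.0)
soft = BetterAgent(concession_exponent=100.0)
# e = 0: th is always 1, so the agent demands the best price in every round
print([hard._th(i, 4) for i in range(4)])  # [1.0, 1.0, 1.0, 1.0]
# very large e: th collapses to (almost) 0 right after the first round
print([soft._th(i, 4) for i in range(4)])  # [1.0, ~2e-18, ~2e-48, 0.0]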

Given this threshold function, we can now define _is_good_price and _find_good_price:

  • _is_good_price simply compares the price given to it with the current threshold defined by multiplying th by the price range mx - mn:

    • When selling, this is achieved by comparing the difference between the price and the minimum price to the current threshold:

    return (price - mn) >= th * (mx - mn)
    

    You can check that this accepts only the maximum unit price in the first step and gradually goes down to accepting the minimum unit price in the last step (n_steps - 1).

    • When buying we go the other way around (starting at minimum price and going up over time to the maximum price):

    return (mx - price) >= th * (mx - mn)
    
  • _find_good_price works in the same fashion but rather than checking the goodness of a price, it simply uses the threshold to generate a good-enough price:

    if self._is_selling(nmi):
        return int(mn + th * (mx - mn))
    else:
        return int(mx - th * (mx - mn))
    

Why didn’t this approach work?

As you may have noticed, BetterAgent is not really better than SimpleAgent. Why? The main reason is that price does not matter that much in the settings of SCML 2024 OneShot: the price range is limited to only two consecutive values (e.g. (9, 10)), which increases the relative importance of avoiding penalties by matching demand and supply.

Thinking about other negotiations

So far, our agents behaved independently in each negotiation without considering what is happening in the others (except that concluding one of them changes the amount secured). A simple way to consider other negotiations is to use the prices offered in them to limit our concessions. The following agent implements this idea:

class AdaptiveAgent(BetterAgent):
    """Considers best price offers received when making its decisions"""

    def before_step(self):
        self._best_selling, self._best_buying = 0.0, float("inf")

    def respond(self, negotiator_id, state, source=""):
        """Save the best price received"""
        offer = state.current_offer
        response = super().respond(negotiator_id, state, source)
        # nothing to record if we received no offer yet
        if offer is None:
            return response
        nmi = self.get_nmi(negotiator_id)
        if self._is_selling(nmi):
            self._best_selling = max(offer[UNIT_PRICE], self._best_selling)
        else:
            self._best_buying = min(offer[UNIT_PRICE], self._best_buying)
        return response

    def _price_range(self, nmi):
        """Limits the price by the best price received"""
        mn, mx = super()._price_range(nmi)
        if self._is_selling(nmi):
            mn = max(mn, self._best_selling)
        else:
            mx = min(mx, self._best_buying)
        return mn, mx

Let’s see how well this agent behaves:

single_agent_runner(AdaptiveAgent)
single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
4 SimpleAgent 0.905806 30.0 0.905806 0.243713 0.256752 0.857049 0.969827 1.038091 1.176166
0 AdaptiveAgent 0.756491 30.0 0.756491 0.367015 0.064872 0.318639 0.897773 1.008595 1.181927
1 BetterAgent 0.742522 30.0 0.742522 0.355723 0.087694 0.320086 0.871921 0.978388 1.181927
2 MyOneShotDoNothing 0.716924 30.0 0.716924 0.128240 0.518197 0.587324 0.711476 0.852739 0.885616
3 RandomOneShotAgent 0.657930 30.0 0.657930 0.185348 0.336662 0.517957 0.717340 0.780287 0.954094

Only marginally better than BetterAgent, at least in this simulation, and still worse than SimpleAgent. One possibility here is that the agent became too hard-headed again: whenever it sees a good price in one negotiation, it insists on it for all the rest. This may not be a good idea as it may lead to more disagreements. In general, the agent must balance getting good prices with matching its input and output quantities.

Let’s now see what happens if we are generous enough to grant our partners their best price half of the time. This should work because price is not important in SCML-OneShot:

import random


class GenerousAgent(SimpleAgent):
    """A greedy agent that gives the opponent its best price half of the time"""

    def _find_good_price(self, nmi):
        """Offers either the best or the worst price, each half of the time"""
        i = nmi.issues[UNIT_PRICE]
        return i.min_value if random.random() < 0.5 else i.max_value
single_agent_runner(GenerousAgent);
single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
5 SimpleAgent 0.905806 30.0 0.905806 0.243713 0.256752 0.857049 0.969827 1.038091 1.176166
2 GenerousAgent 0.850194 30.0 0.850194 0.240858 0.260795 0.819196 0.886290 0.989907 1.171139
0 AdaptiveAgent 0.756491 30.0 0.756491 0.367015 0.064872 0.318639 0.897773 1.008595 1.181927
1 BetterAgent 0.742522 30.0 0.742522 0.355723 0.087694 0.320086 0.871921 0.978388 1.181927
3 MyOneShotDoNothing 0.716924 30.0 0.716924 0.128240 0.518197 0.587324 0.711476 0.852739 0.885616
4 RandomOneShotAgent 0.657930 30.0 0.657930 0.185348 0.336662 0.517957 0.717340 0.780287 0.954094

We finally more or less recover the performance of SimpleAgent. That is how unimportant reasoning about prices is in SCML-OneShot. The situation changes dramatically in SCML-Std, though, as prices become more important.

OneShotSyncAgent

One problem that plagued all of our agents so far is that they have to make decisions (respond, propose) about each negotiation on the spot. This makes it difficult to consider all the other negotiations while making each decision.

Because the utility function is defined over a complete set of negotiation agreements, and not for any single negotiation by itself, it makes sense to try to make decisions centrally by collecting offers from partners and then responding to all of them at once. It is possible to do that by utilizing the response type ResponseType.WAIT supported by NegMAS, but this entails a lot of housekeeping.

To simplify this task, we provide another base class that does all of this housekeeping for you, exposing a simple interface that synchronizes all negotiations (as much as the underlying platform allows). The main goal of this base class is to let the developer think about all negotiations together, but it has some important caveats which we will discuss later. Here is an example of writing the do-nothing agent in this form:

class MySyncOneShotDoNothing(OneShotSyncAgent):
    """My Agent that does nothing"""

    def counter_all(self, offers, states):
        """Respond to a set of offers given the negotiation state of each."""
        return dict()

    def first_proposals(self):
        """Decide a first proposal on every negotiation.
        Returning None for a negotiation means ending it."""
        return dict()
single_agent_runner(MySyncOneShotDoNothing)
single_agent_runner.draw_worlds_of(MySyncOneShotDoNothing);
../_images/02.develop_agent_scml2024_oneshot_50_0.png

As you can see, in this case we need to override counter_all to counter the offers received from all partners and first_proposals to decide a first offer for each partner.

Other than these two negotiation-related callbacks, the agent receives an init call just after it joins the simulation and a step call after each step. The agent is also informed about the failure/success of negotiations through the on_negotiation_success/on_negotiation_failure callbacks. That is all. A one-shot agent only needs to think about how to respond to each of these six callbacks. All of these callbacks except counter_all and first_proposals are optional.

A not-so-good SyncAgent

The main advantage of using OneShotSyncAgent is that you do not need to keep track of state variables across negotiation callbacks (like the quantities you secured so far) and you have a single place in which to make your decisions about all negotiations at the same time. Here is a simple greedy agent using this approach.

class NaiveSyncAgent(OneShotSyncAgent):
    """A greedy agent based on OneShotSyncAgent"""

    def __init__(self, *args, threshold=0.5, **kwargs):
        super().__init__(*args, **kwargs)
        self._threshold = threshold

    def before_step(self):
        super().before_step()
        self.ufun.find_limit(True)
        self.ufun.find_limit(False)

    def first_proposals(self):
        """Decide a first proposal on every negotiation.
        Returning None for a negotiation means ending it."""
        return dict(
            zip(
                self.negotiators.keys(),
                (self.best_offer(_) for _ in self.negotiators.keys()),
            )
        )

    def counter_all(self, offers, states):
        """Respond to a set of offers given the negotiation state of each."""

        # Initialize all responses by my best options
        responses = {
            k: SAOResponse(ResponseType.REJECT_OFFER, v)
            for k, v in self.first_proposals().items()
        }

        # find how much quantity I still need
        my_needs = self._needed()

        # am I selling in each negotiation?
        is_selling = (self._is_selling(self.get_nmi(_)) for _ in offers.keys())

        # sort the offers by price (descending for sell offers, ascending for buy offers),
        # keeping each offer paired with its partner's ID
        sorted_offers = sorted(
            zip(offers.keys(), offers.values(), is_selling),
            key=lambda x: (-x[1][UNIT_PRICE]) if x[2] else x[1][UNIT_PRICE],
        )

        # greedily choose offers until my needs are satisfied
        secured, outputs, chosen = 0, [], dict()
        for k, offer, is_output in sorted_offers:
            secured += offer[QUANTITY]
            if secured >= my_needs:
                break
            chosen[k] = offer
            outputs.append(is_output)

        # calculate the utility of selected offers
        u = self.ufun.from_offers(tuple(chosen.values()), tuple(outputs))

        # if the utility of the selected offers is high enough, accept them
        rng = self.ufun.max_utility - self.ufun.min_utility
        threshold = self._threshold * rng + self.ufun.min_utility
        if u >= threshold:
            for k, v in chosen.items():
                responses[k] = SAOResponse(ResponseType.ACCEPT_OFFER, v)
        return responses

    def best_offer(self, negotiator_id):
        my_needs = self._needed(negotiator_id)
        if my_needs <= 0:
            return None
        ami = self.get_nmi(negotiator_id)
        if not ami:
            return None
        quantity_issue = ami.issues[QUANTITY]

        offer = [-1] * 3
        offer[QUANTITY] = max(
            min(my_needs, quantity_issue.max_value), quantity_issue.min_value
        )
        offer[TIME] = self.awi.current_step
        offer[UNIT_PRICE] = self._find_good_price(ami)
        return tuple(offer)

    def is_seller(self, negotiator_id):
        return negotiator_id in self.awi.current_negotiation_details["sell"].keys()

    def _needed(self, negotiator_id=None):
        return (
            self.awi.needed_sales
            if self.is_seller(negotiator_id)
            else self.awi.needed_supplies
        )

    def _find_good_price(self, nmi):
        """Offers the partner's best price (recall that price matters little here)"""
        if self._is_selling(nmi):
            return nmi.issues[UNIT_PRICE].min_value
        return nmi.issues[UNIT_PRICE].max_value

    def _is_selling(self, ami):
        return ami.annotation["product"] == self.awi.my_output_product

This agent shows a case of parameterizing your agent so that it can be tested with different hyper-parameters. You do that by passing whatever parameters you like as keyword arguments to the constructor:

def __init__(self, *args, threshold=0.5, **kwargs):
    super().__init__(*args, **kwargs)
    self._threshold = threshold

The one parameter we have is a threshold: the fraction of the way from the minimum to the maximum possible utility below which we are not willing to accept a set of offers.

This agent also shows a case in which we use the built-in utility function implemented by the system (see Section 2.3 of the game description). This ufun is accessible as ufun. By default, the ufun returns the profit in dollars for a given set of negotiation outcomes, offers, agreements, or contracts. Note that the ufun assumes that what it is given is the complete set of agreements and that no others will be added later. This value may be positive or negative (a loss). In some cases you need the utility value normalized to a range between zero and one; this agent does that. To do this normalization, we need to know the values of the maximum and minimum utilities. You could of course solve the corresponding optimization problem yourself, but we did that for you. All you need to do is call find_limit and pass it a boolean (True for calculating the highest possible utility and False for the lowest). To avoid repeating this calculation, the ufun stores the results in ufun.best and ufun.worst for the highest and lowest utility respectively. After that, you can access the maximum possible utility as max_utility and the minimum possible utility as min_utility. We do that in the before_step() method (called at the beginning of every day):

def before_step(self):
    super().before_step()
    self.ufun.find_limit(True)
    self.ufun.find_limit(False)

After this call, we can access the max_utility, min_utility, best, and worst members of the ufun. As explained earlier, best and worst give extra information about the conditions under which the maximum and minimum utilities are achieved.

We need to implement two methods: first_proposals (to generate a good first proposal for each negotiation) and counter_all (for countering a set of offers). We also repeat the best_offer and _is_selling helpers from SimpleAgent. (We could instead have inherited them from SimpleAgent; note that, because of the way inheritance works in Python, we would have to inherit from OneShotSyncAgent before SimpleAgent in that case.)

The first set of proposals in first_proposals is simply the best_offer for each negotiation which is calculated using this generator expression:

(self.best_offer(_) for _ in self.negotiators.keys())

Almost all the code now resides in the counter_all method. We will go over it here:

  • We start by initializing our responses with the best offer for each negotiation using first_proposals and calculating our needs using _needed:

responses = {
    k: SAOResponse(ResponseType.REJECT_OFFER, v)
    for k, v in self.first_proposals().items()
}
my_needs = self._needed()
  • We then sort the offers so that earlier offers have better prices for us: descending for sell offers and ascending for buy offers.

is_selling = (self._is_selling(self.get_nmi(_)) for _ in offers.keys())
sorted_offers = sorted(
    zip(offers.keys(), offers.values(), is_selling),
    key=lambda x: (-x[1][UNIT_PRICE]) if x[2] else x[1][UNIT_PRICE],
)
  • We greedily choose offers until they satisfy all our needs (or as much of them as possible).

secured, outputs, chosen = 0, [], dict()
for k, offer, is_output in sorted_offers:
    secured += offer[QUANTITY]
    if secured >= my_needs:
        break
    chosen[k] = offer
    outputs.append(is_output)
  • Finally, we calculate the utility of accepting these and only these offers, and accept the chosen offers if this utility is at least the fraction _threshold (50% by default) of the way from the minimum to the maximum possible utility. Otherwise, we reject all offers, sending the default best_offer value back.

u = self.ufun.from_offers(tuple(chosen.values()), tuple(outputs))
rng = self.ufun.max_utility - self.ufun.min_utility
threshold = self._threshold * rng + self.ufun.min_utility
if u >= threshold:
    for k, v in chosen.items():
        responses[k] = SAOResponse(ResponseType.ACCEPT_OFFER, v)
return responses

Let’s see how it did:

single_agent_runner(NaiveSyncAgent)
single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
7 SimpleAgent 0.905806 30.0 0.905806 0.243713 0.256752 0.857049 0.969827 1.038091 1.176166
2 GenerousAgent 0.850194 30.0 0.850194 0.240858 0.260795 0.819196 0.886290 0.989907 1.171139
0 AdaptiveAgent 0.756491 30.0 0.756491 0.367015 0.064872 0.318639 0.897773 1.008595 1.181927
1 BetterAgent 0.742522 30.0 0.742522 0.355723 0.087694 0.320086 0.871921 0.978388 1.181927
4 MySyncOneShotDoNothing 0.717437 30.0 0.717437 0.128590 0.518223 0.587364 0.712251 0.853799 0.885967
3 MyOneShotDoNothing 0.716924 30.0 0.716924 0.128240 0.518197 0.587324 0.711476 0.852739 0.885616
6 RandomOneShotAgent 0.657930 30.0 0.657930 0.185348 0.336662 0.517957 0.717340 0.780287 0.954094
5 NaiveSyncAgent 0.578200 30.0 0.578200 0.288741 0.087481 0.308480 0.600469 0.810890 1.034915

OK, it works, but you did not expect it to work well, right? We called it Naive for a reason.

This base class simplifies the job of the agent developer by providing a single function (counter_all) in which to handle all the offers it receives (most of the time; remember that sometimes you will receive only a subset of the offers in the call). In principle, the agent can then decide to accept a few of these offers and keep negotiating.

The problem with this agent is that it defines a good offer independently for each negotiation, which defeats the purpose of having the chance to decide centrally what to do in all negotiations. That is made even less effective by the fact that, in SCML 2024, price does not matter that much. In the following section, we design a very simple alternative that tries to resolve this issue.

A better SyncAgent

We start by defining a simple helper function that distributes a given quantity \(q\) over \(n\) partners.

import random
from collections import Counter

from numpy.random import choice


def distribute(q: int, n: int) -> list[int]:
    """Distributes q units over n bins, giving each bin
    at least one unit whenever q >= n"""
    if q < n:
        lst = [0] * (n - q) + [1] * q
        random.shuffle(lst)
        return lst

    if q == n:
        return [1] * n
    r = Counter(choice(n, q - n))
    return [r.get(_, 0) + 1 for _ in range(n)]

Here are a few examples of how it would distribute \(10\) units, and then \(2\) units, over \(4\) partners:

[distribute(10, 4) for _ in range(5)]
[[3, 2, 2, 3], [2, 2, 4, 2], [3, 3, 2, 2], [2, 4, 2, 2], [1, 3, 2, 4]]
[distribute(2, 4) for _ in range(5)]
[[1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0]]

We will also need a helper function to find all subsets of a given set (powerset):

from itertools import chain, combinations


def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
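For example:

list(powerset([1, 2, 3]))
# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]

Now for the agent itself: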
class BetterSyncAgent(OneShotSyncAgent):
    """An agent that distributes its needs over its partners randomly."""

    def distribute_needs(self) -> dict[str, int]:
        """Distributes my needs randomly over all my partners"""

        dist = dict()
        for needs, all_partners in [
            (self.awi.needed_supplies, self.awi.my_suppliers),
            (self.awi.needed_sales, self.awi.my_consumers),
        ]:
            # find suppliers and consumers still negotiating with me
            partner_ids = [_ for _ in all_partners if _ in self.negotiators.keys()]
            partners = len(partner_ids)

            # if I need nothing, end all negotiations
            if needs <= 0:
                dist.update(dict(zip(partner_ids, [0] * partners)))
                continue

            # distribute my needs over my (remaining) partners.
            dist.update(dict(zip(partner_ids, distribute(needs, partners))))
        return dist

    def first_proposals(self):
        # just randomly distribute my needs over my partners (with best price for me).
        s, p = self._step_and_price(best_price=True)
        distribution = self.distribute_needs()
        d = {k: (q, s, p) if q > 0 else None for k, q in distribution.items()}
        return d

    def counter_all(self, offers, states):
        response = dict()
        # process for sales and supplies independently
        for needs, all_partners, issues in [
            (
                self.awi.needed_supplies,
                self.awi.my_suppliers,
                self.awi.current_input_issues,
            ),
            (
                self.awi.needed_sales,
                self.awi.my_consumers,
                self.awi.current_output_issues,
            ),
        ]:
            # get a random price
            price = issues[UNIT_PRICE].rand()
            # find active partners
            partners = {_ for _ in all_partners if _ in offers.keys()}

            # find the set of partners that gave me the best offer set
            # (i.e. total quantity nearest to my needs)
            plist = list(powerset(partners))
            best_diff, best_indx = float("inf"), -1
            for i, partner_ids in enumerate(plist):
                offered = sum(offers[p][QUANTITY] for p in partner_ids)
                diff = abs(offered - needs)
                if diff < best_diff:
                    best_diff, best_indx = diff, i
                if diff == 0:
                    break

            # If the best combination of offers is good enough, accept them and end all
            # other negotiations
            th = self._current_threshold(
                min([_.relative_time for _ in states.values()])
            )
            if best_diff <= th:
                partner_ids = plist[best_indx]
                others = list(partners.difference(partner_ids))
                response |= {
                    k: SAOResponse(ResponseType.ACCEPT_OFFER, offers[k])
                    for k in partner_ids
                } | {k: SAOResponse(ResponseType.END_NEGOTIATION, None) for k in others}
                continue

            # If I still do not have a good enough offer, distribute my current needs
            # randomly over my partners.
            distribution = self.distribute_needs()
            response.update(
                {
                    k: SAOResponse(ResponseType.END_NEGOTIATION, None)
                    if q == 0
                    else SAOResponse(
                        ResponseType.REJECT_OFFER, (q, self.awi.current_step, price)
                    )
                    for k, q in distribution.items()
                }
            )
        return response

    def _current_threshold(self, r: float):
        mn, mx = 0, self.awi.n_lines // 2
        return mn + (mx - mn) * (r**4.0)

    def _step_and_price(self, best_price=False):
        """Returns current step and a random (or max) price"""
        s = self.awi.current_step
        seller = self.awi.is_first_level
        issues = (
            self.awi.current_output_issues if seller else self.awi.current_input_issues
        )
        pmin = issues[UNIT_PRICE].min_value
        pmax = issues[UNIT_PRICE].max_value
        if best_price:
            return s, pmax if seller else pmin
        return s, random.randint(pmin, pmax)
single_agent_runner(BetterSyncAgent)
single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
2 BetterSyncAgent 1.045027 30.0 1.045027 0.058560 0.954717 1.003720 1.039322 1.076773 1.155156
8 SimpleAgent 0.905806 30.0 0.905806 0.243713 0.256752 0.857049 0.969827 1.038091 1.176166
3 GenerousAgent 0.850194 30.0 0.850194 0.240858 0.260795 0.819196 0.886290 0.989907 1.171139
0 AdaptiveAgent 0.756491 30.0 0.756491 0.367015 0.064872 0.318639 0.897773 1.008595 1.181927
1 BetterAgent 0.742522 30.0 0.742522 0.355723 0.087694 0.320086 0.871921 0.978388 1.181927
5 MySyncOneShotDoNothing 0.717437 30.0 0.717437 0.128590 0.518223 0.587364 0.712251 0.853799 0.885967
4 MyOneShotDoNothing 0.716924 30.0 0.716924 0.128240 0.518197 0.587324 0.711476 0.852739 0.885616
7 RandomOneShotAgent 0.657930 30.0 0.657930 0.185348 0.336662 0.517957 0.717340 0.780287 0.954094
6 NaiveSyncAgent 0.578200 30.0 0.578200 0.288741 0.087481 0.308480 0.600469 0.810890 1.034915

This is the highest score we have gotten so far, even though this agent is not that intelligent in its decision making. Let’s check it in detail:

The main idea is to generate offers that would (assuming all were accepted) give us all the quantity we need (to buy/sell). Moreover, we accept a set of offers if the total quantity they provide is within some small margin of the quantity we need.

Helpers

We have a small helper function (_step_and_price) that returns the current step and either the best or a random price.

The core computation of the agent is implemented in the distribute_needs() method, which is responsible for calculating a quantity to offer each partner (notice that price is completely ignored here).

We treat suppliers and consumers independently here by looping twice, once for each:

for needs, all_partners in [
    (self.awi.needed_supplies, self.awi.my_suppliers),
    (self.awi.needed_sales, self.awi.my_consumers),
]:
    ...

The process for distributing my needs is straightforward:

  1. find suppliers and consumers still negotiating with me

    partner_ids = [_ for _ in all_partners if _ in self.negotiators.keys()]
    partners = len(partner_ids)
    

  2. if I need nothing, end all negotiations

    if needs <= 0:
        dist.update(dict(zip(partner_ids, [0] * partners)))
        continue
    
  3. otherwise, distribute my needs randomly using the distribute function defined earlier:

    dist.update(dict(zip(partner_ids, distribute(needs, partners))))
    

Now we can move on to the main part of the agent, which consists of the two abstract method implementations (first_proposals and counter_all).

First set of offers

The first set of proposals from the agent uses the best price and distributes the total quantity needed randomly over all partners:

s, p = self._step_and_price(best_price=True)
distribution = self.distribute_needs()

We then just return the quantity for each partner, or None to end the negotiation if the quantity is \(0\):

return {k: (q, s, p) if q > 0 else None for k, q in distribution.items()}

Countering offers

When receiving offers, we again treat suppliers and consumers independently:

for needs, all_partners, issues in [
    (
        self.awi.needed_supplies,
        self.awi.my_suppliers,
        self.awi.current_input_issues,
    ),
    (
        self.awi.needed_sales,
        self.awi.my_consumers,
        self.awi.current_output_issues,
    ),
]:
    ...

By treating our suppliers and consumers independently, our agent can work, in principle, even if it finds itself in the middle of a deep supply chain (i.e. one with more than two production levels, as in SCML-Std). Strictly speaking, this is not necessary for SCML-OneShot, but it is a form of future-proofing that we get at a small cost.

When we receive offers (in counter_all), we start by searching over all subsets of them for the one whose total quantity, taken together, comes closest to our needs:

plist = list(powerset(partners))
best_diff, best_indx = float("inf"), -1
# exhaustive search over all subsets of partners; feasible because the
# number of partners in a negotiation round is small
for i, partner_ids in enumerate(plist):
    offered = sum(offers[p][QUANTITY] for p in partner_ids)
    diff = abs(offered - needs)
    if diff < best_diff:
        best_diff, best_indx = diff, i
    if diff == 0:
        break

If the best subset satisfies our needs up to a threshold (zero by default), we simply accept all of its offers and end all other negotiations:

th = self._current_threshold(min([_.relative_time for _ in states.values()]))
if best_diff <= th:
    partner_ids = plist[best_indx]
    others = list(partners.difference(partner_ids))
    response |= {
        k: SAOResponse(ResponseType.ACCEPT_OFFER, offers[k]) for k in partner_ids
    } | {k: SAOResponse(ResponseType.END_NEGOTIATION, None) for k in others}
    continue
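
The _current_threshold helper maps the negotiation's relative time (0 at the start, 1 at the deadline) to the quantity mismatch we are willing to tolerate. One hypothetical shape, shown purely for illustration, tolerates almost nothing early on and relaxes as the deadline approaches:

def _current_threshold(self, r: float) -> float:
    # Hypothetical schedule (an assumption, not the tutorial's definition):
    # near zero for most of the negotiation, rising to two units at the end.
    return 2.0 * r**4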

Note that we could improve this slightly by only rejecting the remaining offers (instead of ending those negotiations) and offering their senders whatever we still need to buy/sell, whenever the threshold is nonzero and the best subset's total quantity is less than our needs. This may improve our results slightly but would complicate the code, so we do not pursue it in this tutorial.

If the best subset does not satisfy our needs up to the predefined threshold, we simply reject all offers and generate new random offers for our partners:

distribution = self.distribute_needs()
return {
    k: SAOResponse(ResponseType.END_NEGOTIATION, None)
    if q == 0
    else SAOResponse(ResponseType.REJECT_OFFER, (q, s, p))
    for k, q in distribution.items()
}

Note that we simply end the negotiation with some partners (selected randomly) if our needs are less than the number of our partners (see ``distribute_needs()`` above).
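
Assembling the fragments above, a minimal sketch of counter_all could look as follows. This is a sketch under stated assumptions: how the partners set is built from the incoming offers, the use of a good-enough (rather than best) price in counter-offers, and the filtering of the redistributed needs to this side's partners are our own choices, not necessarily identical to the tutorial's full listing:

def counter_all(self, offers, states) -> dict[str, SAOResponse]:
    response = dict()
    for needs, all_partners, issues in [
        (self.awi.needed_supplies, self.awi.my_suppliers, self.awi.current_input_issues),
        (self.awi.needed_sales, self.awi.my_consumers, self.awi.current_output_issues),
    ]:
        # issues are unused in this sketch because _step_and_price recomputes them
        # partners on this side that sent us offers this round (assumption)
        partners = {_ for _ in all_partners if _ in offers.keys()}
        # find the subset of offers with total quantity closest to our needs
        plist = list(powerset(partners))
        best_diff, best_indx = float("inf"), -1
        for i, partner_ids in enumerate(plist):
            offered = sum(offers[p][QUANTITY] for p in partner_ids)
            diff = abs(offered - needs)
            if diff < best_diff:
                best_diff, best_indx = diff, i
            if diff == 0:
                break
        # accept the best subset if it is close enough and end the rest
        th = self._current_threshold(min([_.relative_time for _ in states.values()]))
        if best_diff <= th:
            partner_ids = plist[best_indx]
            others = list(partners.difference(partner_ids))
            response |= {
                k: SAOResponse(ResponseType.ACCEPT_OFFER, offers[k])
                for k in partner_ids
            } | {k: SAOResponse(ResponseType.END_NEGOTIATION, None) for k in others}
            continue
        # otherwise reject everything and redistribute our needs randomly
        s, p = self._step_and_price()  # assumption: a good-enough price here
        distribution = self.distribute_needs()
        response |= {
            k: SAOResponse(ResponseType.END_NEGOTIATION, None)
            if q == 0
            else SAOResponse(ResponseType.REJECT_OFFER, (q, s, p))
            for k, q in distribution.items()
            if k in partners  # respond only to this side's partners
        }
    return response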

Possible Improvements

There are obvious ways to improve this agent:

  1. When countering offers, we should take into account the history of negotiation with each partner (in this round and in previous ones) to make a more meaningful distribution of quantities over partners; currently this is just random (see the sketch after this list). We should also consider the probability that our offers will be accepted when deciding how to distribute the quantity we still need over our partners.

  2. Choosing which negotiations to end when we need a smaller quantity to buy/sell than we have partners is currently random. We could instead end negotiations only with the negotiators least likely to provide our remaining needs.

  3. As indicated earlier, when the threshold is nonzero and the total quantity we are accepting is not enough to satisfy our needs, we should not simply end the negotiation with all unselected partners.

  4. We should take the number of rounds remaining in the negotiation into account both when deciding whether to accept a subset of offers (e.g. use a higher threshold near the end of the negotiation) and when deciding what quantities to distribute over our partners (e.g. offer more than what we need near the end of the negotiation under the assumption that only some offers will be accepted).

  5. Maybe consider prices more as we approach our total needs.
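
As an example of the first improvement, the hypothetical helper below (the names AcceptanceTracker and weighted_distribute are ours; nothing here is part of BetterSyncAgent) tracks how often each partner accepted our offers and biases the distribution toward the more agreeable partners:

import random
from collections import defaultdict

class AcceptanceTracker:
    # Tracks per-partner acceptance counts (illustrative only).
    def __init__(self):
        self.accepted = defaultdict(int)
        self.total = defaultdict(int)

    def update(self, partner: str, accepted: bool) -> None:
        self.total[partner] += 1
        self.accepted[partner] += int(accepted)

    def weight(self, partner: str) -> float:
        # Laplace smoothing: unseen partners get a neutral weight of 0.5
        return (self.accepted[partner] + 1) / (self.total[partner] + 2)

def weighted_distribute(q, partner_ids, tracker):
    # Distributes q units over partners proportionally to their weights
    # (a drop-in alternative to the uniform distribute used earlier).
    weights = [tracker.weight(p) for p in partner_ids]
    dist = {p: 0 for p in partner_ids}
    for _ in range(q):
        dist[random.choices(partner_ids, weights=weights)[0]] += 1
    return dist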

Comparing all agents

We can now summarize the results of comparing all the agents developed so far and, while we are at it, compare them with three built-in agents from the scml package:

for t in (RandDistOneShotAgent, EqualDistOneShotAgent, GreedySyncAgent):
    single_agent_runner(t)
single_agent_runner.plot_stats(notch=True);
../_images/02.develop_agent_scml2024_oneshot_69_0.png

or in more detail:

single_agent_runner.plot_stats(agg=False, stats="score", legend_ncols=4, ylegend=1.4);
../_images/02.develop_agent_scml2024_oneshot_71_0.png

You can easily see that our SimpleAgent is hard to beat: none of the built-in agents beats it. The one that comes closest is `RandDistOneShotAgent <https://scml.readthedocs.io/en/latest/autoapi/scml/oneshot/agents/rand/index.html#scml.oneshot.agents.rand.RandDistOneShotAgent>`__.

The way we just compared these agents is unbiased because all agents control the same factories in the same simulation environment. Nevertheless, it is not the exact method used in the ANAC competition. The best way to compare these agents is to run a tournament between them. You already learned how to do that in the previous tutorial, so we will not repeat it here.

If you are running this notebook, please note that the tournament running methods ``anac2024_*`` may not work within a notebook environment. You can simply move your code to a normal python script and it will run correctly.
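
For example, a minimal script could look like the sketch below; we assume here that anac2024_oneshot follows the same calling convention as the anac*_oneshot helpers of previous years (check scml.utils for the exact signature):

from scml.utils import anac2024_oneshot

# SimpleAgent and BetterSyncAgent are the agents developed in this
# tutorial; import or paste their definitions into the script.

if __name__ == "__main__":
    # run a small official-style tournament between two of our agents
    results = anac2024_oneshot(
        competitors=[SimpleAgent, BetterSyncAgent],
        n_configs=5,  # number of world configurations to sample
    )
    print(results.total_scores)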

Just out of curiosity, let's see how these agents compare against each other when each is allowed to control the whole market instead of a single factory:

full_market_runner = WorldRunner.from_runner(
    single_agent_runner, control_all_agents=True
)
for a in (
    BetterSyncAgent,
    SimpleAgent,
    GenerousAgent,
    BetterAgent,
    AdaptiveAgent,
    MyOneShotDoNothing,
    MySyncOneShotDoNothing,
    NaiveSyncAgent,
    RandDistOneShotAgent,
    EqualDistOneShotAgent,
    GreedySyncAgent,
):
    full_market_runner(a)
full_market_runner.plot_stats();
../_images/02.develop_agent_scml2024_oneshot_74_0.png
full_market_runner.score_summary()
type score count mean std min 25% 50% 75% max
4 GenerousAgent 1.070493 345.0 1.070493 0.135989 0.536034 1.032085 1.085166 1.142087 1.300736
10 SimpleAgent 1.068242 345.0 1.068242 0.139472 0.536031 1.028666 1.080434 1.147335 1.294385
9 RandDistOneShotAgent 1.027933 345.0 1.027933 0.093962 0.712678 0.965100 1.032097 1.100143 1.242283
3 EqualDistOneShotAgent 1.024942 345.0 1.024942 0.096280 0.706985 0.975430 1.034344 1.090284 1.249036
2 BetterSyncAgent 1.008757 345.0 1.008757 0.091677 0.710329 0.952090 1.021285 1.072802 1.228448
1 BetterAgent 0.931797 345.0 0.931797 0.264753 -0.006554 0.720905 1.023227 1.139277 1.353749
0 AdaptiveAgent 0.931182 345.0 0.931182 0.267260 0.002970 0.721272 1.015031 1.145970 1.358184
8 NaiveSyncAgent 0.737871 345.0 0.737871 0.181944 0.282766 0.608280 0.696524 0.886265 1.053605
5 GreedySyncAgent 0.726222 345.0 0.726222 0.128911 0.251198 0.642664 0.723267 0.824934 1.008826
7 MySyncOneShotDoNothing 0.628966 345.0 0.628966 0.156728 0.170207 0.516510 0.637839 0.740412 0.941539
6 MyOneShotDoNothing 0.628928 345.0 0.628928 0.156759 0.172867 0.516478 0.642465 0.740913 0.942300

You can find all the agents available in the scml package for the one-shot game under scml.oneshot.agents, including the ones developed in this tutorial (with some modifications):

import scml.oneshot.agents as agents

print([_ for _ in agents.__dir__() if _.endswith("Agent")])
[
    'SingleAgreementAspirationAgent',
    'GreedyOneShotAgent',
    'GreedySyncAgent',
    'GreedySingleAgreementAgent',
    'OneshotDoNothingAgent',
    'RandomOneShotAgent',
    'RandDistOneShotAgent',
    'EqualDistOneShotAgent',
    'SyncRandomOneShotAgent',
    'SingleAgreementRandomAgent'
]

Running against winners from previous years

You can compare your agent against any agents previously submitted to SCML (same track). To do that, you need to install the scml-agents package from pip:

pip install scml-agents

You can then get agents using the get_agents() function from this package:

from scml_agents import get_agents

winners = [
    get_agents(y, track="oneshot", winners_only=True, as_class=True)[0]
    for y in (2021, 2022, 2023)
]
print(winners)
[
    <class 'scml_agents.scml2021.oneshot.team_86.agent112.Agent112'>,
    <class 'scml_agents.scml2022.oneshot.team_134.agent119.PatientAgent'>,
    <class 'scml_agents.scml2023.oneshot.team_poli_usp.quantity_oriented_agent.QuantityOrientedAgent'>
]

Let's add them to the mix:

for t in winners:
    single_agent_runner(t)
single_agent_runner.score_summary()
type score count mean std min 25% 50% 75% max
12 RandDistOneShotAgent 1.050543 30.0 1.050543 0.056464 0.943806 1.009891 1.043736 1.083475 1.149113
4 EqualDistOneShotAgent 1.050253 30.0 1.050253 0.063765 0.941294 1.010509 1.041863 1.093314 1.157501
3 BetterSyncAgent 1.045027 30.0 1.045027 0.058560 0.954717 1.003720 1.039322 1.076773 1.155156
11 QuantityOrientedAgent 0.991510 30.0 0.991510 0.137315 0.635187 0.942474 1.033218 1.056788 1.175083
10 PatientAgent 0.909359 30.0 0.909359 0.237183 0.306590 0.832855 0.951183 1.062123 1.181927
14 SimpleAgent 0.905806 30.0 0.905806 0.243713 0.256752 0.857049 0.969827 1.038091 1.176166
5 GenerousAgent 0.850194 30.0 0.850194 0.240858 0.260795 0.819196 0.886290 0.989907 1.171139
0 AdaptiveAgent 0.756491 30.0 0.756491 0.367015 0.064872 0.318639 0.897773 1.008595 1.181927
6 GreedySyncAgent 0.753260 30.0 0.753260 0.198388 0.198635 0.708680 0.837813 0.858390 0.951864
1 Agent112 0.742568 30.0 0.742568 0.358775 0.065792 0.334495 0.870720 1.025299 1.181927
2 BetterAgent 0.742522 30.0 0.742522 0.355723 0.087694 0.320086 0.871921 0.978388 1.181927
8 MySyncOneShotDoNothing 0.717437 30.0 0.717437 0.128590 0.518223 0.587364 0.712251 0.853799 0.885967
7 MyOneShotDoNothing 0.716924 30.0 0.716924 0.128240 0.518197 0.587324 0.711476 0.852739 0.885616
13 RandomOneShotAgent 0.657930 30.0 0.657930 0.185348 0.336662 0.517957 0.717340 0.780287 0.954094
9 NaiveSyncAgent 0.578200 30.0 0.578200 0.288741 0.087481 0.308480 0.600469 0.810890 1.034915
single_agent_runner.plot_stats(stats="score");
../_images/02.develop_agent_scml2024_oneshot_83_0.png

QuantityOrientedAgent, the winner of the SCML 2023 OneShot track, is the best performing of these previous winners, followed by PatientAgent, the winner of the SCML 2022 OneShot competition. Nevertheless, the differences between these agents and our SimpleAgent are not large. Can you beat them? The next tutorial explains how to try to achieve that using Reinforcement Learning, but you already have enough information to build your own agent for SCML OneShot. Happy hacking!
