Commit 043e61fd authored by Jean-Christophe Routier

fix some typos

parent 3053bce7
doc/report/figures/dcop_taxo_fioretto.png: image replaced (51 KiB, now 81.3 KiB)
doc/report/figures/dcop_taxo_fioretto_old.png: image added (51 KiB)
@@ -199,7 +199,7 @@ Jean-Christophe \textsc{Routier} \\
DCOPs are a multi-agent approach to optimisation problems. Among existing DCOP algorithms, Maximum-Gain-Message (MGM) and its 2-coordinated counterpart MGM-2 are popular options.
Other similar algorithms also exist, such as DSA, which is considered a stochastic variant of MGM \cite{fioretto18jair}, and BE-Rebid \cite{taylor2010}, which extends MGM by calculating and communicating expected gain \cite{fioretto18jair}.
Several open-source implementations of MGM exist, in Python with the \href{https://github.com/Orange-OpenSource/pyDcop}{pyDCOP framework} developed by \cite{rust19} and in Java with the \href{https://api.frodo-ai.tech/d3/d4c/a01724.html}{Frodo project} offered by \cite{Leaute2009}.
Though not necessarily always the best performers, MGM and its variants are deemed robust and efficient algorithms on average, making them suitable benchmark options \cite{fioretto18jair}.
We hereby present the MGM and MGM-2 algorithms and detail two toy examples to facilitate their analysis.
\include{pb}
@@ -10,7 +10,7 @@ There are at least as many variables as agents and possibly more variables than
However, in most DCOPs, control of a single variable by a single agent is assumed.
The optimal solution is the minimum/maximum of the global objective function.
Each variable has a domain of values it can take; this domain is known only to the agent in control of said variable.
However, the value of a variable is known to the controlling agent's neighbours.
The notion of neighbourhood is key because DCOPs rely on locality.
Therefore, each agent can only communicate with its neighbours and only knows about cost functions which involve at least one of the variables it controls.
@@ -33,7 +33,7 @@ These are typically combinatorial problems \cite{fioretto18jair}
\end{definition}
COPs, sometimes called Weighted Constraint Satisfaction Problems, take a step further than CSPs.
Here the solution is not binary: rather than simply stating whether the constraints can be satisfied or not, results are quantifiable.
COPs are akin to CSPs with an additional objective function, either a maximisation in the case of rewards or a minimisation in the case of costs.
In this setting, constraints can be either hard or soft depending on whether respecting them is vital or merely preferable: typically, a constraint which \textit{needs} to be satisfied is coined \textit{hard} while a constraint which \textit{should} be satisfied is coined \textit{soft}.
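A common encoding, sketched here in Python with hypothetical names, gives soft constraints finite costs and hard constraints an infinite one, so that any assignment violating a hard constraint can never be optimal under minimisation (the infinite cost plays the role of $\bot$ in the formal definition below):
\begin{verbatim}
INF = float("inf")  # stands for the unsatisfiable cost

# soft constraint: u and v should differ, a violation merely costs 1
soft = lambda u, v: 0 if u != v else 1

# hard constraint: u and v must differ, a violation can never be optimal
hard = lambda u, v: 0 if u != v else INF
\end{verbatim}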
@@ -96,7 +96,7 @@ that a constraint (also called cost function) $f_i$ is satisfied by
$\sigma$ if $f_i(\sigma(x_i)) \neq \bot$. A complete assignment is a
solution of a DCOP if it satisfies all its cost functions.
In the case of anytime algorithms, each assignment is complete and gradually improves over time, which makes it possible to stop the algorithm at any point.
The optimisation objective is represented by the function $\Obj$, which
can be of different nature, either a minimisation or a maximisation. A solution is a complete assignment with
@@ -104,12 +104,12 @@ cost different from $\bot$, and an optimal solution is a solution with
minimal cost (resp. maximal utility). In general, this function is a sum of cost (resp. utility) constraints:
$F = \sum_i f_i$; but some approaches can use other kinds of aggregation.
Additionally, in the following we consider that
\begin{itemize}
\item $n = m$,
i.e. each agent controls only one variable. This restriction is often considered in the literature;
\item constraints are binary. More precisely, $F$ contains at most one function $f_{ij}$ per pair $\{i,j\}$;
\item there can be as many constraints as needed for each variable.
\end{itemize}
With these assumptions, a DCOP can be easily represented as a graph where vertices are agents (each agent owns one variable) and
@@ -120,7 +120,7 @@ edges are binary constraints. Note that since constraints are binary and can onl
\begin{itemize}
\item $A = \{a_1, \ldots, a_m\}$;
\item $X = \{x_1, \ldots, x_m\}$;
\item $F$ contains at most one function $f_{ij}(x_i,x_j)$ per pair $\{i,j\}$, $1 \leq i < j \leq m$; to simplify, $f_{ij}(x_i,x_j)$ and $f_{ji}(x_j,x_i)$ denote the same function;
\item $\alpha(x_i) = a_i$: each agent $a_i$ controls the variable $x_i$.
\end{itemize}
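To make this graph representation concrete, the following minimal Python sketch builds a hypothetical three-agent instance (names and costs are ours, not taken from the report): each agent owns one variable, and each edge of the constraint graph carries one binary cost function, stored once per unordered pair.
\begin{verbatim}
from itertools import product

# hypothetical toy instance: one binary variable per agent
domains = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1]}

# edges of the constraint graph: at most one f_ij per pair {i, j}
costs = {
    ("x1", "x2"): lambda u, v: 1 if u == v else 0,
    ("x2", "x3"): lambda u, v: 1 if u == v else 0,
}

def global_cost(assignment):
    """F = sum of the binary cost functions (minimisation)."""
    return sum(f(assignment[i], assignment[j]) for (i, j), f in costs.items())

# exhaustive search, viable only on toy instances: the optimal solution is
# the complete assignment minimising F
best = min((dict(zip(domains, vs)) for vs in product(*domains.values())),
           key=global_cost)
\end{verbatim}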
@@ -150,15 +150,15 @@ In the multi-agent domain, DCOPs are classified according to their set of charac
Generally speaking, most DCOPs rely on deterministic action effects, a cooperative group behaviour and total knowledge.
MGM and MGM-2 are deterministic both in terms of actions and environment, with cooperative group behaviour sharing incomplete (local) knowledge.
DCOPs are used for various applications, for example:
\begin{itemize}
\item Disaster management
\item Radio frequency allocation problems
\item Recommendation systems
\item Scheduling
\item Sensor networks
\item Service-oriented management (cloud, server, power supply)
\item Supply chain management
\end{itemize}
@@ -3,7 +3,7 @@
\subsection{General overview}
DCOPs can be addressed by a series of algorithms described in \cite{fioretto18jair} and presented in Figure~\ref{fioretto_taxo}.
The first distinction made is in terms of solution optimality, with complete algorithms guaranteeing that the optimal solution is found while incomplete ones offer no guarantee but have shorter execution times.
Then, they are classified according to their (lack or presence of) centralisation and synchronicity.
Finally, they are divided based on their exploration process, revolving around three main frames, namely search, inference and sampling.
@@ -21,8 +21,8 @@ In our case, we focus on MGM and its coordinated variant MGM-2 which are incompl
Both MGM and MGM-2 are extensively described by \cite{maheswaran04pdcs}.
In this paper, the 2-coordinated algorithm is detailed and hints are given at k-coordinated versions.
Historically, MGM evolved from DBA, with the difference that there is no change on constraint costs to exit local minima and no need for DBA's global knowledge of solution quality.
Another algorithm MGM is often compared with is DSA. The difference lies in the guarantees MGM offers: MGM's gain never drops below 0 while DSA's might.
In terms of solution quality, both MGM and MGM-2 are proved to be monotone: intuitively, since the global utility function is a sum of local utilities and only the winning agent in each neighbourhood moves, every accepted move increases the sum.
@@ -38,19 +38,19 @@ As opposed with coalition scenarios where a manager handles agent's decisions, t
A notion of solidarity is necessary here, hinting at a cooperative environment, but it could be replaced by compensations between agents in a competitive environment.
In MGM-2, coordinated pairs consider the overall gain they and their partner can achieve, irrespective of whether their own gain is better or not.
This is possible because we base the interaction on a cooperative framework: a joint action is considered useful if the sum of the two partnered agents' utility functions increases, even if one of them diminishes. Were it competitive, this would not hold.
\subsubsection{General overview of the algorithms}
Globally speaking, the process of MGM can be summed up as follows:
at the beginning of each round, agents inform their neighbours of their current value.
Thanks to this information received from each of its neighbours, each agent is then capable of computing the changes it can make to its own value to improve its utility, taking into account each neighbour's value.
Once this is done, each agent selects the best move it can make and the corresponding gain, and informs each of its neighbours of this.
Within each neighbourhood, a single agent will be allowed to act: the one having made the best offer of move.
The agent thus selected updates its value and another round can begin.
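The sketch below illustrates one such round as a synchronous Python simulation on a hypothetical toy instance; the names and the tie-breaking rule are our own assumptions, not prescribed by \cite{maheswaran04pdcs}.
\begin{verbatim}
# hypothetical toy minimisation instance: three agents, one variable each
costs = {("a1", "a2"): lambda u, v: int(u == v),
         ("a2", "a3"): lambda u, v: int(u == v)}
domains = {"a1": [0, 1], "a2": [0, 1], "a3": [0, 1]}
values = {"a1": 0, "a2": 0, "a3": 0}

def neighbours(a):
    return [b for e in costs for b in e if a in e and b != a]

def local_cost(a, v, vals):
    return sum(f(v if i == a else vals[i], v if j == a else vals[j])
               for (i, j), f in costs.items() if a in (i, j))

def mgm_round(vals):
    # step 1: value messages (implicit here, vals is what everyone heard)
    # step 2: each agent computes its best unilateral move and its gain
    moves = {a: min(domains[a], key=lambda v: local_cost(a, v, vals))
             for a in domains}
    gains = {a: local_cost(a, vals[a], vals) - local_cost(a, moves[a], vals)
             for a in domains}
    # step 3: gain messages; in each neighbourhood only the best offer acts
    # (ties broken by agent name, a deterministic choice we assume here)
    new_vals = dict(vals)
    for a in domains:
        if gains[a] > 0 and all((gains[a], a) > (gains[b], b)
                                for b in neighbours(a)):
            new_vals[a] = moves[a]
    return new_vals

for _ in range(10):  # fixed number of cycles as the stopping criterion
    values = mgm_round(values)
\end{verbatim}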
In MGM-2, the process is slightly more complex since coordinated moves come into play.
The difference starts from the beginning of the round, where agents are randomly split into 2 sets, \emph{Offerers} and \emph{Receivers}.
Each set will have a very different behaviour.
Offerers select a neighbour at random, make an offer to said neighbour and wait for their neighbour's response.
If the neighbour declines the offer, they switch back to an MGM-like behaviour where they compute their best solo move (like they would in MGM) and so on.
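The offerer side can be sketched as follows; the Agent API used here (neighbours, gain_if, send_offer, best_solo_move, compete_with_joint_gain) is entirely hypothetical, and we assume the offerer can enumerate its partner's candidate values.
\begin{verbatim}
import random

def mgm2_offerer_phase(agent):
    """Offerer behaviour at the start of an MGM-2 round (sketch only)."""
    partner = random.choice(agent.neighbours)
    # offer every coordinated move together with the gain it would bring
    # the offerer; the receiver adds its own gain to evaluate each move
    offer = [(mine, theirs, agent.gain_if(mine, theirs, partner))
             for mine in agent.domain
             for theirs in agent.candidate_values(partner)]
    reply = agent.send_offer(partner, offer)  # assumed primitive
    if reply is None:                         # offer declined:
        return agent.best_solo_move()         # fall back to MGM behaviour
    # offer accepted: the pair competes with its joint gain in the gain
    # phase, then a Go/No-Go exchange confirms or cancels the joint move
    return agent.compete_with_joint_gain(partner, reply)
\end{verbatim}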
@@ -72,15 +72,15 @@ A No-Go means neither of them will make a move this round and so they move on to
All variables are in fact agents in a DCOP game; the optimal solution corresponds to a Nash equilibrium of the specified game.
The notion of vicinity is crucial and should be considered fixed once and for all: an agent's neighbours do not vary during the game.
A round starts with all agents broadcasting their current value to their vicinity.
This means each agent sends one message and receives $|\text{\emph{vicinity}}|$ messages, each containing one particular neighbour's current value.
At this stage, all agents are aware of their own value as well as of their vicinity's values.
Now the stake is to select which agents will be allowed to act, i.e. modify their value; the set of these agents will be called $M$.
To select said agents, each of them broadcasts a gain message, stating the $\epsilon$ by which it can improve its current local utility value \textit{if} allowed to act.
At this stage, each agent knows the $\epsilon$ by which it can improve but also all of its vicinity's $\epsilon$s.
The winner is simply the one which yields the highest $\epsilon$ (potential improvement).
Implementations should take into account possible ties and how to break them.
In the case of MGM-2, it is the pair of agents whose joint gain is highest which is allowed to act.
The question of which pair gain is highest in MGM-2 depends on which actions agents find acceptable: in cooperative environments, a joint gain can be considered acceptable even if one of the agents actually loses; in other settings this might not be the case, and acceptable actions would then be limited to coordinated actions which improve both agents' values.
The winner (or winning pair in MGM-2) acts and updates its value(s) accordingly.
Both processes go on until the algorithm is stopped, either by having reached the predefined number of cycles or by reaching a Nash equilibrium.
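Consistently with this description, an early-stopping test can be sketched as below (hypothetical names; gains maps each agent to the gain it broadcast last): once no agent announces a strictly positive gain, the state can no longer change and the run has reached an equilibrium.
\begin{verbatim}
def at_equilibrium(gains, eps=0.0):
    # True when no broadcast gain exceeds eps: no agent (or pair, for
    # MGM-2) can still improve, so the algorithm may stop early
    return all(g <= eps for g in gains.values())
\end{verbatim}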