Beyond $log^2(T)$ regret for decentralized bandits in matching markets |
We design decentralized algorithms for regret minimization in the two sided matching market with one-sided bandit feedback that significantly improves upon the prior works (Liu et al.\,2020a, Sankararaman et al.\,2020, Liu et al.\,2020b). First, for general markets, for any $\varepsilon > 0$, we design an algorithm that achieves a $O(\log^{1+\varepsilon}(T))$ regret to the agent-optimal stable matching, with unknown time horizon $T$, improving upon the $O(\log^{2}(T))$ regret achieved in (Liu et al.\,2020b). Second, we provide the optimal $\Theta(\log(T))$ agent-optimal regret for markets satisfying {\em uniqueness consistency} – markets where leaving participants don’t alter the original stable matching. Previously, $\Theta(\log(T))$ regret was achievable (Sankararaman et al.\,2020, Liu et al.\,2020b) in the much restricted {\em serial dictatorship} setting, when all arms have the same preference over the agents. We propose a phase based algorithm, where in each phase, besides deleting the globally communicated dominated arms the agents locally delete arms with which they collide often. This \emph{local deletion} is pivotal in breaking deadlocks arising from rank heterogeneity of agents across arms. We further demonstrate superiority of our algorithm over existing works through simulations. |
inproceedings |
Proceedings of Machine Learning Research |
PMLR |
2640-3498 |
basu21a |
0 |
Beyond $log^2(T)$ regret for decentralized bandits in matching markets |
705 |
715 |
705-715 |
705 |
false |
Basu, Soumya and Sankararaman, Karthik Abinav and Sankararaman, Abishek |
|
given |
family |
Karthik Abinav |
Sankararaman |
|
given |
family |
Abishek |
Sankararaman |
|
|
2021-07-01 |
|
Proceedings of the 38th International Conference on Machine Learning |
139 |
inproceedings |
|
|
label |
link |
Supplementary PDF |
|
|
|