$n$-LIPO: Framework for Diverse Cooperative Agent Generation Using Policy Compatibility

Publisher

IEEE Transactions on Artificial Intelligence

Abstract

Diverse training partners in multi-agent tasks are crucial for training a robust and adaptable cooperative agent. Prior methods often rely on state-action information to diversify partners' behaviors, but this can produce minor variations rather than genuinely diverse behaviors and solutions. We address this limitation by introducing a novel training objective based on "policy compatibility." Our method learns diverse behaviors by encouraging agents within a team to be compatible with each other while being incompatible with agents from other teams. We theoretically prove that incompatible policies are inherently dissimilar, allowing us to use policy compatibility as a proxy for diversity. We call this method Learning Incompatible Policies for $n$-Player Cooperative Games ($n$-LIPO). We further diversify individual policies by incorporating a mutual information objective that uses state-action information. We empirically demonstrate that $n$-LIPO effectively generates diverse joint policies in various two-player and multi-player cooperative environments. In a complex cooperative task, two-player multi-recipe Overcooked, we find that $n$-LIPO generates a population of behaviorally diverse partners. These populations are then used to train robust generalist agents that generalize better than agents trained with baseline populations.
Finally, we demonstrate that $n$-LIPO can be applied to a high-dimensional StarCraft Multi-Agent Challenge (SMAC) multi-player cooperative environment to discover diverse winning strategies even when only a single goal exists. Additional visualizations can be accessed at https://sites.google.com/view/n-lipo/home.
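The compatibility-based objective described in the abstract can be illustrated with a small sketch: each population maximizes its own self-play (compatibility) return while minimizing its return when paired with the most compatible other population. The matrix layout, the `lam` weight, and the function name below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def lipo_style_objective(xp_returns, lam=0.5):
    """Sketch of a compatibility-based diversity objective.

    xp_returns[i, j]: estimated cooperative return when population i's
    policy is paired with partners from population j (the diagonal is
    self-play). lam is a hypothetical weight on the incompatibility term.
    Returns the per-population objective to be maximized.
    """
    xp = np.asarray(xp_returns, dtype=float)
    self_play = np.diag(xp)              # compatibility within a team
    cross = xp.copy()
    np.fill_diagonal(cross, -np.inf)     # exclude a team's own partners
    worst_case = cross.max(axis=1)       # most compatible *other* team
    # High self-play return, low best cross-play return -> diverse teams.
    return self_play - lam * worst_case

# Example: team 0 cooperates too well with team 1's partners,
# so its objective is penalized more heavily than team 1's.
R = [[10.0, 9.0],
     [4.0, 8.0]]
print(lipo_style_objective(R, lam=0.5))  # -> [5.5 6. ]
```

Under this sketch, gradient-based training would increase each policy's self-play term while decreasing its best cross-play return, pushing populations toward mutually incompatible, and hence behaviorally distinct, conventions.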
