Article abstract, STUDIA UNIVERSITATIS BABEŞ-BOLYAI issue


STUDIA INFORMATICA - Issue No. 2 of 2021
         
Article: DEEP REINFORCEMENT LEARNING FROM SELF-PLAY IN NO-LIMIT TEXAS HOLD’EM POKER

Author: TIDOR-VLAD PRICOPE
Abstract:
DOI: 10.24193/subbi.2021.2.04

Published Online: 2021-12-20
Published Print: 2021-12-30
pp. 51-68

Imperfect information games describe many practical applications found in the real world, since the information space is rarely fully available. This class of problems is challenging because randomness can cause even adaptive methods to model the problem incorrectly and miss the best solution. Neural Fictitious Self-Play (NFSP) is a powerful algorithm for learning an approximate Nash equilibrium of imperfect-information games from self-play. However, it uses only crude data as input, and its most successful experiment was on the limit version of Texas Hold’em Poker. In this paper, we develop a new variant of NFSP that combines established fictitious self-play with neural gradient play, in an attempt to improve performance on large-scale zero-sum imperfect-information games and to solve the more complex no-limit version of Texas Hold’em Poker, using powerful handcrafted metrics and heuristics alongside crude, raw data. When applied to no-limit Hold’em Poker, the agents trained through self-play outperformed those that used fictitious play with a normal-form, single-step approach to the game. Moreover, we showed that our algorithm converges close to a Nash equilibrium within the limited training budget of our agents, on very limited hardware. Finally, our best self-play-based agent learnt a strategy that rivals expert human play.
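To make the learning scheme concrete, the following is a minimal Python sketch of the action-selection rule used by NFSP-style agents: with a small anticipatory probability the agent acts with its best-response policy (epsilon-greedy over a Q-network), otherwise it follows its average-policy network. The network stand-ins, the action set, and all parameter values are illustrative assumptions, not the implementation evaluated in the paper.

import random
import numpy as np

N_ACTIONS = 3    # simplified action set (fold / call / raise), assumed for illustration
ETA = 0.1        # anticipatory parameter: how often to act with the best response
EPSILON = 0.06   # exploration rate of the best-response (DQN-like) policy

def q_values(state):
    # Stand-in for the best-response Q-network (a trained DQN in NFSP).
    return np.random.randn(N_ACTIONS)

def average_policy(state):
    # Stand-in for the average-policy network (trained by supervised learning
    # on the agent's own best-response behaviour).
    logits = np.random.randn(N_ACTIONS)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def select_action(state):
    # NFSP-style mixing: mostly play the average policy (which approximates
    # the equilibrium strategy), occasionally play the greedy best response.
    if random.random() < ETA:
        if random.random() < EPSILON:
            return random.randrange(N_ACTIONS)
        return int(np.argmax(q_values(state)))
    return int(np.random.choice(N_ACTIONS, p=average_policy(state)))

state = np.zeros(57)   # placeholder feature vector (hypothetical state encoding)
print(select_action(state))

In the full algorithm, transitions generated while acting with the best response are additionally stored in a reservoir buffer and used to fit the average-policy network, which is what drives the agent's empirical strategy toward an approximate Nash equilibrium.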

Keywords and phrases: Artificial Intelligence, Computer Poker, Adaptive Learning, Fictitious Play, Self-Play, Deep Reinforcement Learning, Neural Networks.

2010 Mathematics Subject Classification. 68T05.

1998 CR Categories and Descriptors. I.2.1 [Artificial Intelligence]: Learning – Applications and Expert Systems - Games.