Pit.py policy #4

Open
Bobingstern opened this issue Oct 23, 2022 · 3 comments

@Bobingstern

In pit.py inside the function n1p, shouldn't the return value be the argmax of the policy rather than a random choice?

@bhansconnect
Owner

bhansconnect commented Oct 24, 2022

If you use the argmax, both agents become deterministic, so two agents would play the exact same game over and over again. That wouldn't be a good benchmark of their performance.

There are a few options that do work instead:

  • Start with some sort of opening book to get the agents into a unique subgame, then use argmax for the rest of the turns.
  • Play totally randomly for the first few turns before switching to the argmax agent.
  • Play with some randomness, using temperature so that the agents generally pick better moves, especially towards the end of the game. (This is what is happening in this repo.) The agents will still be heavily weighted toward the best move.

Extra note: a more aggressive temperature setting is used in pit compared with training, so there is less exploring and more exploiting.
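
For illustration, here is a minimal sketch of the temperature option, not the repo's actual code. The name `pick_move`, its arguments, and the example numbers are hypothetical. It assumes `policy` is a list of non-negative move probabilities with at least one positive entry (illegal moves set to 0).

```python
import random


def pick_move(policy, temperature, rng=random):
    """Return a move index sampled from `policy` sharpened by `temperature`.

    temperature <= 0 is treated as plain argmax; temperature == 1 samples
    from the policy unchanged; values in between favor the best move more
    and more strongly.
    """
    if temperature <= 0:
        # Deterministic limit: always the highest-probability move.
        return max(range(len(policy)), key=lambda i: policy[i])
    # Scale by the largest entry first so that p ** (1 / T) cannot
    # underflow to zero for every move when T is small.
    top = max(policy)
    weights = [(p / top) ** (1.0 / temperature) for p in policy]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(policy)), weights=weights, k=1)[0]


if __name__ == "__main__":
    example_policy = [0.1, 0.6, 0.3]  # made-up numbers
    print(pick_move(example_policy, temperature=0.25))
```

With a low temperature the result is almost always the argmax move, but two agents can still diverge occasionally, which is what keeps repeated games from being identical.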

@Bobingstern
Author

Ah, so it's used for benchmarking. I assume that if you were to deploy it against a human, you would use argmax then, correct?

@bhansconnect
Owner

Oh, actually, I haven't looked at this version of the repo in a while. pit-multi is for benchmarking; pit is for single-game tests. So I guess to be optimal against a human you would use argmax. Though you may still want a little bit of randomness at the beginning in some games. Otherwise, it might keep going for the same opening, which could get dull.
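
A rough sketch of that idea, again not code from this repo: sample from the policy for the first couple of turns, then switch to argmax. The function name, the `turn` counter, and the default of two random opening turns are all assumptions made for illustration. `policy` is assumed to be a list of non-negative probabilities with illegal moves set to 0.

```python
import random


def pick_move_vs_human(policy, turn, random_opening_turns=2, rng=random):
    """Sample from the policy during the opening, then always play argmax."""
    if turn < random_opening_turns:
        # Opening: vary the choice so games do not all start the same way.
        return rng.choices(range(len(policy)), weights=policy, k=1)[0]
    # Rest of the game: play the move the policy rates highest.
    return max(range(len(policy)), key=lambda i: policy[i])
```

Sampling from the policy (rather than uniformly over legal moves) keeps the opening varied while still avoiding moves the agent considers clearly bad.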
