Pit.py policy #4
In pit.py, inside the function n1p, shouldn't the return value be the argmax of the policy rather than a random choice?

Comments
If you use argmax, two agents would play the exact same game over and over again, so it wouldn't be a good benchmark of their performance. There are a few options that work instead, such as sampling moves from the policy rather than always taking its argmax.
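As a rough sketch of that contrast (illustrative only; `pick_action` and `action_probs` are hypothetical names standing in for whatever pit.py actually passes around, not the repo's code):

```python
import numpy as np

def pick_action(action_probs, deterministic=False):
    """Pick a move from a (normalized) policy vector.

    Sampling keeps repeated matches between the same two agents
    varied; argmax makes every such game identical, which is the
    problem described above.
    """
    action_probs = np.asarray(action_probs, dtype=np.float64)
    if deterministic:
        return int(np.argmax(action_probs))  # always the single strongest move
    # draw an index with probability proportional to the policy
    return int(np.random.choice(len(action_probs), p=action_probs))
```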
Extra note: a more aggressive temperature setting is used in pit compared with training, so there is less exploring and more exploiting.
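For context, here is the usual AlphaZero-style way a temperature parameter shapes an MCTS policy (a sketch of the standard formula, not necessarily the exact code in this fork): visit counts are raised to the power 1/temp before normalizing, so a lower temperature concentrates probability on the most-visited move.

```python
import numpy as np

def apply_temperature(visit_counts, temp):
    """Turn MCTS visit counts into a move distribution.

    temp = 1.0 keeps the counts' proportions (more exploring);
    temp -> 0 sharpens toward argmax (more exploiting), which is
    what a "more aggressive" pit-time temperature achieves.
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temp == 0:
        probs = np.zeros_like(counts)
        probs[np.argmax(counts)] = 1.0  # degenerate case: pure argmax
        return probs
    scaled = counts ** (1.0 / temp)
    return scaled / scaled.sum()
```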
Ah, so it's used for benchmarking. I assume that if you were to deploy it against a human, you would use argmax then, correct?
Oh, actually, I haven't looked at this version of the repo in a while. pit-multi is for benchmarking; pit is for single-game tests. So I guess to be optimal against a human you would use argmax, though you may still want a little bit of randomness at the beginning in some games. Otherwise, it might keep going for the same opening, which could get dull.
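One simple way to get that behavior (a hypothetical sketch; `opening_moves` and its cutoff value are illustrative, not from the repo): sample from the policy for the first few moves, then switch to argmax.

```python
import numpy as np

def pick_action(action_probs, move_number, opening_moves=4):
    """Sample during the opening, then play greedily.

    opening_moves is an illustrative cutoff: before it, moves are
    drawn from the policy so openings differ between games; after
    it, argmax always plays the strongest move.
    """
    action_probs = np.asarray(action_probs, dtype=np.float64)
    if move_number < opening_moves:
        return int(np.random.choice(len(action_probs), p=action_probs))
    return int(np.argmax(action_probs))
```

This keeps openings varied between games while still playing the strongest known move once the position has diverged.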