Pit.py policy #4
In pit.py, inside the function n1p, shouldn't the return value be the argmax of the policy rather than a random choice?

Comments
If you use argmax, two agents would play the exact same game over and over again, so it wouldn't be a good benchmark of their performance. There are a few options that work instead, such as sampling moves from the policy rather than always taking its argmax.
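As a rough sketch of that contrast (illustrative only; `pick_action` and `action_probs` are hypothetical names standing in for whatever pit.py actually passes around, not the repo's code):

```python
import numpy as np

def pick_action(action_probs, deterministic=False):
    """Pick a move from a (normalized) policy vector.

    Sampling keeps repeated matches between the same two agents
    varied; argmax makes every such game identical, which is the
    problem described above.
    """
    action_probs = np.asarray(action_probs, dtype=np.float64)
    if deterministic:
        return int(np.argmax(action_probs))  # always the single strongest move
    # draw an index with probability proportional to the policy
    return int(np.random.choice(len(action_probs), p=action_probs))
```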
Extra note: a more aggressive temperature setting is used in pit compared with training, so there is less exploring and more exploiting.
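For context, here is the usual AlphaZero-style way a temperature parameter shapes an MCTS policy (a sketch of the standard formula, not necessarily the exact code in this fork): visit counts are raised to the power 1/temp before normalizing, so a lower temperature concentrates probability on the most-visited move.

```python
import numpy as np

def apply_temperature(visit_counts, temp):
    """Turn MCTS visit counts into a move distribution.

    temp = 1.0 keeps the counts' proportions (more exploring);
    temp -> 0 sharpens toward argmax (more exploiting), which is
    what a "more aggressive" pit-time temperature achieves.
    """
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temp == 0:
        probs = np.zeros_like(counts)
        probs[np.argmax(counts)] = 1.0  # degenerate case: pure argmax
        return probs
    scaled = counts ** (1.0 / temp)
    return scaled / scaled.sum()
```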
Ah, so it's used for benchmarking. I assume that if you were to deploy it against a human, you would use argmax then, correct?
Oh, actually, I haven't looked at this version of the repo in a while. pit-multi is for benchmarking; pit is for single-game tests. So I guess to be optimal against a human you would use argmax, though you may still want a little bit of randomness at the beginning in some games. Otherwise, it might keep going for the same opening, which could get dull.
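One simple way to get that behavior (a hypothetical sketch; `opening_moves` and its cutoff value are illustrative, not from the repo): sample from the policy for the first few moves, then switch to argmax.

```python
import numpy as np

def pick_action(action_probs, move_number, opening_moves=4):
    """Sample during the opening, then play greedily.

    opening_moves is an illustrative cutoff: before it, moves are
    drawn from the policy so openings differ between games; after
    it, argmax always plays the strongest move.
    """
    action_probs = np.asarray(action_probs, dtype=np.float64)
    if move_number < opening_moves:
        return int(np.random.choice(len(action_probs), p=action_probs))
    return int(np.argmax(action_probs))
```

This keeps openings varied between games while still playing the strongest known move once the position has diverged.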