Merge pull request #10 from sgrvinod/0.3.0
0.3.0
sgrvinod authored Aug 17, 2024
2 parents 1dcf110 + 34d648f commit 30ebfa1
Showing 42 changed files with 72,744 additions and 107 deletions.
23 changes: 22 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,26 @@
# Change Log

## v0.3.0

### Added

* There are 3 new datasets: [ML23c](https://github.com/sgrvinod/chess-transformers#ml23c), [GC22c](https://github.com/sgrvinod/chess-transformers#gc22c), and [ML23d](https://github.com/sgrvinod/chess-transformers#ml23d).
* A new naming convention for datasets is used. Datasets are now named in the format "[*PGN Fileset*][*Filters*]". For example, *LE1222* is now called [*LE22ct*](https://github.com/sgrvinod/chess-transformers#le22ct), where *LE22* is the name of the PGN fileset from which this dataset was derived, and "*c*" and "*t*" are filters for games that ended in checkmate and games that used a specific time control, respectively.
* [*CT-EFT-85*](https://github.com/sgrvinod/chess-transformers#ct-eft-85) is a new trained model with about 85 million parameters.
* **`chess_transformers.train.utils.get_lr()`** now accepts new arguments, `schedule` and `decay`, to accommodate a new learning rate schedule: exponential decay after warmup.
* **`chess_transformers.data.prepare_data()`** now handles cases where the number of moves does not match the number of FENs, or where the result recorded in the PGN file is incorrect. Such games are now reported and excluded during dataset creation.

### Changed

* The *LE1222* and *LE1222x* datasets are now renamed to [*LE22ct*](https://github.com/sgrvinod/chess-transformers#le22ct) and [*LE22c*](https://github.com/sgrvinod/chess-transformers#le22c) respectively.
* All calls to **`chess_transformers.train.utils.get_lr()`** now use the `schedule` and `decay` arguments, even in cases where a user-defined decay is not required.
* **`chess_transformers.train.datasets.ChessDatasetFT`** was optimized for large datasets. A list of indices for the data split is no longer maintained or indexed in the dataset.
* Dependencies in [**`setup.py`**](https://github.com/sgrvinod/chess-transformers/blob/main/setup.py) have been updated to newer versions.
* Fixed an error in **`chess_transformers.play.model_v_model()`** where a move would be attempted by the model playing black even after white won the game with a checkmate.
* Fixed the `EVAL_GAMES_FOLDER` parameter in the model configuration files, which pointed to the incorrect folder **`chess_transformers/eval`** instead of **`chess_transformers/evaluate`**.
* Fixed an error in **`chess_transformers.evaluate.metrics.elo_delta_margin()`** where the upper limit of the winrate for the confidence interval was not capped at a value of 1.
* All calls to `torch.load()` now use `weights_only=True` in compliance with its updated API.

## v0.2.1

### Changed
@@ -13,7 +34,7 @@
* **`ChessTransformerEncoderFT`** is an encoder-only transformer that predicts the source (*From*) and destination (*To*) squares for the next half-move, instead of the half-move in UCI notation.
* [*CT-EFT-20*](https://github.com/sgrvinod/chess-transformers#ct-eft-20) is a new trained model of this type with about 20 million parameters.
* **`ChessDatasetFT`** is a PyTorch dataset class for this model type.
-* [**`chess_transformer.data.levels`**](https://github.com/sgrvinod/chess-transformers/blob/main/chess_transformers/data/levels.py) provides a standardized vocabulary (with indices) for oft-used categorical variables. All models and datasets will hereon use this standard vocabulary instead of a dataset-specific vocabulary.
+* [**`chess_transformers.data.levels`**](https://github.com/sgrvinod/chess-transformers/blob/main/chess_transformers/data/levels.py) provides a standardized vocabulary (with indices) for oft-used categorical variables. All models and datasets will hereon use this standard vocabulary instead of a dataset-specific vocabulary.

### Changed

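As a rough check on the changelog's "about 85 million parameters" for *CT-EFT-85*, the encoder stack alone can be counted from the hyperparameters in the CT-EFT-85 config in this commit (`D_MODEL = 768`, `N_HEADS = 12`, `D_QUERIES = D_VALUES = 64`, `D_INNER = 4 * D_MODEL`, `N_LAYERS = 12`). This is a back-of-the-envelope sketch, not the repository's model code: it assumes a standard encoder layer (biased query/key/value/output projections, a two-layer position-wise FC block, two LayerNorms) and ignores embeddings and output heads.

```python
def encoder_layer_params(d_model, n_heads, d_queries, d_values, d_inner):
    """Count weights and biases in one assumed, standard transformer encoder layer."""
    queries = d_model * n_heads * d_queries + n_heads * d_queries
    keys = d_model * n_heads * d_queries + n_heads * d_queries
    values = d_model * n_heads * d_values + n_heads * d_values
    attn_out = n_heads * d_values * d_model + d_model
    fc = (d_model * d_inner + d_inner) + (d_inner * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # gain + bias for each of two LayerNorms
    return queries + keys + values + attn_out + fc + layer_norms


# Hyperparameters copied from the CT-EFT-85 config in this commit
D_MODEL, N_HEADS, D_QUERIES, D_VALUES, N_LAYERS = 768, 12, 64, 64, 12
per_layer = encoder_layer_params(D_MODEL, N_HEADS, D_QUERIES, D_VALUES, 4 * D_MODEL)
print(per_layer, N_LAYERS * per_layer)  # 7087872 85054464
```

Under these assumptions the 12 encoder layers come to roughly 85.05 million parameters, consistent with the model's name; embeddings for the 70-position input (`BOARD_STATUS_LENGTH`) at width 768 add comparatively little.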
195 changes: 165 additions & 30 deletions README.md

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions chess_transformers/configs/data/GC22c.py
@@ -0,0 +1,21 @@
import os

###############################
############ Name #############
###############################

NAME = "GC22c" # name and identifier for this configuration

###############################
############ Data #############
###############################

DATA_FOLDER = (
    os.path.join(os.environ.get("CT_DATA_FOLDER"), NAME)
    if os.environ.get("CT_DATA_FOLDER")
    else None
) # folder containing all data files
H5_FILE = NAME + ".h5" # H5 file containing data
MAX_MOVE_SEQUENCE_LENGTH = 10 # expected maximum length of move sequences
EXPECTED_ROWS = 27000000 # expected number of rows, approximately, in the data
VAL_SPLIT_FRACTION = 0.95 # marker (% into the data) where the validation split begins
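The changelog notes that `ChessDatasetFT` no longer keeps a list of indices for the data split. One way to do that, sketched here under the assumption that `VAL_SPLIT_FRACTION` marks the row where a single contiguous validation block begins, is to store only each split's start and stop rows and compute row numbers arithmetically. `ContiguousSplit` and `make_splits` are hypothetical names, not the repository's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContiguousSplit:
    """Hypothetical view of rows [start, stop) in the H5 table; no index list is stored."""

    start: int
    stop: int

    def __len__(self) -> int:
        return self.stop - self.start

    def row(self, i: int) -> int:
        """Map a split-local index to a global row number with arithmetic alone."""
        if not 0 <= i < len(self):
            raise IndexError(f"index {i} out of range for a split of {len(self)} rows")
        return self.start + i


def make_splits(n_rows: int, val_split_fraction: float) -> tuple[ContiguousSplit, ContiguousSplit]:
    """ASSUMPTION: validation rows form one block starting at val_split_fraction * n_rows."""
    boundary = round(val_split_fraction * n_rows)
    return ContiguousSplit(0, boundary), ContiguousSplit(boundary, n_rows)
```

With GC22c's `EXPECTED_ROWS = 27000000` and `VAL_SPLIT_FRACTION = 0.95`, `make_splits` yields 25,650,000 training rows and 1,350,000 validation rows (the row count in the config is approximate). Memory use stays constant regardless of dataset size, which matters for ML23d's roughly 170 million rows.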
@@ -4,7 +4,7 @@
############ Name #############
###############################

-NAME = "LE1222x" # name and identifier for this configuration
+NAME = "LE22c" # name and identifier for this configuration

###############################
############ Data #############
@@ -4,7 +4,7 @@
############ Name #############
###############################

-NAME = "LE1222" # name and identifier for this configuration
+NAME = "LE22ct" # name and identifier for this configuration

###############################
############ Data #############
21 changes: 21 additions & 0 deletions chess_transformers/configs/data/ML23c.py
@@ -0,0 +1,21 @@
import os

###############################
############ Name #############
###############################

NAME = "ML23c" # name and identifier for this configuration

###############################
############ Data #############
###############################

DATA_FOLDER = (
    os.path.join(os.environ.get("CT_DATA_FOLDER"), NAME)
    if os.environ.get("CT_DATA_FOLDER")
    else None
) # folder containing all data files
H5_FILE = NAME + ".h5" # H5 file containing data
MAX_MOVE_SEQUENCE_LENGTH = 10 # expected maximum length of move sequences
EXPECTED_ROWS = 11000000 # expected number of rows, approximately, in the data
VAL_SPLIT_FRACTION = 0.925 # marker (% into the data) where the validation split begins
22 changes: 22 additions & 0 deletions chess_transformers/configs/data/ML23d.py
@@ -0,0 +1,22 @@
import os

###############################
############ Name #############
###############################

NAME = "ML23d" # name and identifier for this configuration

###############################
############ Data #############
###############################

DATA_FOLDER = (
    os.path.join(os.environ.get("CT_DATA_FOLDER"), NAME)
    if os.environ.get("CT_DATA_FOLDER")
    else None
) # folder containing all data files
H5_FILE = NAME + ".h5" # H5 file containing data
MAX_MOVE_SEQUENCE_LENGTH = 10 # expected maximum length of move sequences
EXPECTED_ROWS = 170000000 # expected number of rows, approximately, in the data
VAL_SPLIT_FRACTION = 0.98 # marker (% into the data) where the validation split begins
ADD_LOSS_TOKEN = False
2 changes: 1 addition & 1 deletion chess_transformers/configs/data/__init__.py
@@ -1 +1 @@
-__all__ = ["LE1222", "LE1222x"]
+__all__ = ["LE22c", "LE22ct", "ML23c", "ML23d", "GC22c"]
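The dataset names exported here follow the "[*PGN Fileset*][*Filters*]" convention described in the changelog. A minimal parser sketch, assuming a fileset name is uppercase letters followed by two digits (as in LE22, ML23, GC22) and each trailing lowercase letter is one filter. Only "c" and "t" are explained in this commit, so any other letter (such as the "d" in ML23d) is reported as undocumented rather than guessed. `parse_dataset_name` is a hypothetical helper, not part of `chess_transformers`.

```python
import re

# Filter letters explained in the v0.3.0 changelog; others are left undecoded.
KNOWN_FILTERS = {
    "c": "games that ended in checkmate",
    "t": "games that used a specific time control",
}

# ASSUMPTION: fileset = uppercase letters + two digits; filters = trailing lowercase letters.
NAME_PATTERN = re.compile(r"([A-Z]+\d{2})([a-z]*)")


def parse_dataset_name(name: str) -> tuple[str, list[str]]:
    """Hypothetical helper: split e.g. 'LE22ct' into ('LE22', [filter descriptions])."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not of the form [PGN Fileset][Filters]: {name!r}")
    fileset, filters = match.groups()
    descriptions = [KNOWN_FILTERS.get(f, f"undocumented filter {f!r}") for f in filters]
    return fileset, descriptions
```

For example, `parse_dataset_name("LE22ct")` returns `("LE22", ["games that ended in checkmate", "games that used a specific time control"])`, while an old-style name such as `"LE1222"` raises `ValueError`.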
16 changes: 11 additions & 5 deletions chess_transformers/configs/models/CT-E-20.py
@@ -2,7 +2,7 @@
import pathlib

from chess_transformers.train.utils import get_lr
-from chess_transformers.configs.data.LE1222 import *
+from chess_transformers.configs.data.LE22ct import *
from chess_transformers.configs.other.stockfish import *
from chess_transformers.train.datasets import ChessDataset
from chess_transformers.configs.other.fairy_stockfish import *
@@ -64,10 +64,16 @@
PRINT_FREQUENCY = 1 # print status once every so many steps
N_STEPS = 100000 # number of training steps
WARMUP_STEPS = 8000 # number of warmup steps where learning rate is increased linearly; twice the value in the paper, as in the official transformer repo.
-STEP = 1 # the step number, start from 1 to prevent math error in the next line
+STEP = 1 # the step number, start from 1 to prevent math error in the 'LR' line
+LR_SCHEDULE = "vaswani" # the learning rate schedule; see utils.py for learning rate schedule
+LR_DECAY = None # the decay rate for 'exp_decay' schedule
LR = get_lr(
-    step=STEP, d_model=D_MODEL, warmup_steps=WARMUP_STEPS
-) # see utils.py for learning rate schedule; twice the schedule in the paper, as in the official transformer repo.
+    step=STEP,
+    d_model=D_MODEL,
+    warmup_steps=WARMUP_STEPS,
+    schedule=LR_SCHEDULE,
+    decay=LR_DECAY,
+) # see utils.py for learning rate schedule
START_EPOCH = 0 # start at this epoch
BETAS = (0.9, 0.98) # beta coefficients in the Adam optimizer
EPSILON = 1e-9 # epsilon term in the Adam optimizer
@@ -105,5 +111,5 @@
################################

EVAL_GAMES_FOLDER = str(
-    pathlib.Path(__file__).parent.parent.parent.resolve() / "eval" / "games" / NAME
+    pathlib.Path(__file__).parent.parent.parent.resolve() / "evaluate" / "games" / NAME
) # folder where evaluation games are saved in PGN files
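The `EVAL_GAMES_FOLDER` fix depends on where the three `.parent` hops land. A small sketch with a hypothetical checkout path; `PurePosixPath` is used so nothing touches the filesystem, unlike the `.resolve()` call in the actual config.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; only the relative layout matters here.
config_file = PurePosixPath("/repo/chess_transformers/configs/models/CT-E-20.py")

# CT-E-20.py -> models/ -> configs/ -> chess_transformers/
package_root = config_file.parent.parent.parent
eval_games_folder = package_root / "evaluate" / "games" / "CT-E-20"
print(eval_games_folder)  # /repo/chess_transformers/evaluate/games/CT-E-20
```

Three hops reach the package root, so the folder name appended after it must match the package's actual subfolder, `evaluate`, not `eval`.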
16 changes: 11 additions & 5 deletions chess_transformers/configs/models/CT-ED-45.py
@@ -2,7 +2,7 @@
import pathlib

from chess_transformers.train.utils import get_lr
-from chess_transformers.configs.data.LE1222 import *
+from chess_transformers.configs.data.LE22ct import *
from chess_transformers.configs.other.stockfish import *
from chess_transformers.train.datasets import ChessDataset
from chess_transformers.configs.other.fairy_stockfish import *
@@ -64,10 +64,16 @@
PRINT_FREQUENCY = 1 # print status once every so many steps
N_STEPS = 100000 # number of training steps
WARMUP_STEPS = 8000 # number of warmup steps where learning rate is increased linearly; twice the value in the paper, as in the official transformer repo.
-STEP = 1 # the step number, start from 1 to prevent math error in the next line
+STEP = 1 # the step number, start from 1 to prevent math error in the 'LR' line
+LR_SCHEDULE = "vaswani" # the learning rate schedule; see utils.py for learning rate schedule
+LR_DECAY = None # the decay rate for 'exp_decay' schedule
LR = get_lr(
-    step=STEP, d_model=D_MODEL, warmup_steps=WARMUP_STEPS
-) # see utils.py for learning rate schedule; twice the schedule in the paper, as in the official transformer repo.
+    step=STEP,
+    d_model=D_MODEL,
+    warmup_steps=WARMUP_STEPS,
+    schedule=LR_SCHEDULE,
+    decay=LR_DECAY,
+) # see utils.py for learning rate schedule
START_EPOCH = 0 # start at this epoch
BETAS = (0.9, 0.98) # beta coefficients in the Adam optimizer
EPSILON = 1e-9 # epsilon term in the Adam optimizer
@@ -105,5 +111,5 @@
################################

EVAL_GAMES_FOLDER = str(
-    pathlib.Path(__file__).parent.parent.parent.resolve() / "eval" / "games" / NAME
+    pathlib.Path(__file__).parent.parent.parent.resolve() / "evaluate" / "games" / NAME
) # folder where evaluation games are saved in PGN files
16 changes: 11 additions & 5 deletions chess_transformers/configs/models/CT-EFT-20.py
@@ -2,7 +2,7 @@
import pathlib

from chess_transformers.train.utils import get_lr
-from chess_transformers.configs.data.LE1222 import *
+from chess_transformers.configs.data.LE22ct import *
from chess_transformers.configs.other.stockfish import *
from chess_transformers.train.datasets import ChessDatasetFT
from chess_transformers.configs.other.fairy_stockfish import *
@@ -64,10 +64,16 @@
PRINT_FREQUENCY = 1 # print status once every so many steps
N_STEPS = 100000 # number of training steps
WARMUP_STEPS = 8000 # number of warmup steps where learning rate is increased linearly; twice the value in the paper, as in the official transformer repo.
-STEP = 1 # the step number, start from 1 to prevent math error in the next line
+STEP = 1 # the step number, start from 1 to prevent math error in the 'LR' line
+LR_SCHEDULE = "vaswani" # the learning rate schedule; see utils.py for learning rate schedule
+LR_DECAY = None # the decay rate for 'exp_decay' schedule
LR = get_lr(
-    step=STEP, d_model=D_MODEL, warmup_steps=WARMUP_STEPS
-) # see utils.py for learning rate schedule; twice the schedule in the paper, as in the official transformer repo.
+    step=STEP,
+    d_model=D_MODEL,
+    warmup_steps=WARMUP_STEPS,
+    schedule=LR_SCHEDULE,
+    decay=LR_DECAY,
+) # see utils.py for learning rate schedule
START_EPOCH = 0 # start at this epoch
BETAS = (0.9, 0.98) # beta coefficients in the Adam optimizer
EPSILON = 1e-9 # epsilon term in the Adam optimizer
@@ -105,5 +111,5 @@
################################

EVAL_GAMES_FOLDER = str(
-    pathlib.Path(__file__).parent.parent.parent.resolve() / "eval" / "games" / NAME
+    pathlib.Path(__file__).parent.parent.parent.resolve() / "evaluate" / "games" / NAME
) # folder where evaluation games are saved in PGN files
118 changes: 118 additions & 0 deletions chess_transformers/configs/models/CT-EFT-85.py
@@ -0,0 +1,118 @@
import torch
import pathlib

from chess_transformers.train.utils import get_lr
from chess_transformers.configs.data.LE22c import *
from chess_transformers.configs.other.stockfish import *
from chess_transformers.train.datasets import ChessDatasetFT
from chess_transformers.configs.other.fairy_stockfish import *
from chess_transformers.transformers.criteria import LabelSmoothedCE
from chess_transformers.data.levels import TURN, PIECES, UCI_MOVES, BOOL
from chess_transformers.transformers.models import ChessTransformerEncoderFT


###############################
############ Name #############
###############################

NAME = "CT-EFT-85" # name and identifier for this configuration

###############################
######### Dataloading #########
###############################

DATASET = ChessDatasetFT # custom PyTorch dataset
BATCH_SIZE = 512 # batch size
NUM_WORKERS = 8 # number of workers to use for dataloading
PREFETCH_FACTOR = 2 # number of batches to prefetch per worker
PIN_MEMORY = False # pin to GPU memory when dataloading?

###############################
############ Model ############
###############################

VOCAB_SIZES = {
    "moves": len(UCI_MOVES),
    "turn": len(TURN),
    "white_kingside_castling_rights": len(BOOL),
    "white_queenside_castling_rights": len(BOOL),
    "black_kingside_castling_rights": len(BOOL),
    "black_queenside_castling_rights": len(BOOL),
    "board_position": len(PIECES),
} # vocabulary sizes
D_MODEL = 768 # size of vectors throughout the transformer model
N_HEADS = 12 # number of heads in the multi-head attention
D_QUERIES = 64 # size of query vectors (and also the size of the key vectors) in the multi-head attention
D_VALUES = 64 # size of value vectors in the multi-head attention
D_INNER = 4 * D_MODEL # an intermediate size in the position-wise FC
N_LAYERS = 12 # number of layers in the Encoder
DROPOUT = 0.2 # dropout probability
N_MOVES = 1 # expected maximum length of move sequences in the model, <= MAX_MOVE_SEQUENCE_LENGTH
DISABLE_COMPILATION = False # disable model compilation?
COMPILATION_MODE = "default" # mode of model compilation (see torch.compile())
DYNAMIC_COMPILATION = True # expect tensors with dynamic shapes?
SAMPLING_K = 1 # k in top-k sampling model predictions during play
MODEL = ChessTransformerEncoderFT # custom PyTorch model to train

###############################
########### Training ##########
###############################

BATCHES_PER_STEP = (
    4 # perform a training step, i.e. update parameters, once every so many batches
)
PRINT_FREQUENCY = 1 # print status once every so many steps
N_STEPS = 500000 # number of training steps
WARMUP_STEPS = 8000 # number of warmup steps where learning rate is increased linearly; twice the value in the paper, as in the official transformer repo.
STEP = 1 # the step number, start from 1 to prevent math error in the 'LR' line
LR_SCHEDULE = "exp_decay" # the learning rate schedule; see utils.py for learning rate schedule
LR_DECAY = 0.06 # the decay rate for 'exp_decay' schedule
LR = get_lr(
    step=STEP,
    d_model=D_MODEL,
    warmup_steps=WARMUP_STEPS,
    schedule=LR_SCHEDULE,
    decay=LR_DECAY,
) # see utils.py for learning rate schedule
START_EPOCH = 0 # start at this epoch
BETAS = (0.9, 0.98) # beta coefficients in the Adam optimizer
EPSILON = 1e-9 # epsilon term in the Adam optimizer
LABEL_SMOOTHING = 0.1 # label smoothing co-efficient in the Cross Entropy loss
BOARD_STATUS_LENGTH = 70 # total length of input sequence
USE_AMP = True # use automatic mixed precision training?
CRITERION = LabelSmoothedCE # training criterion (loss)
OPTIMIZER = torch.optim.Adam # optimizer
LOGS_FOLDER = str(
    pathlib.Path(__file__).parent.parent.parent.resolve() / "train" / "logs" / NAME
) # logs folder

###############################
######### Checkpoints #########
###############################

CHECKPOINT_FOLDER = str(
    pathlib.Path(__file__).parent.parent.parent.resolve() / "checkpoints" / NAME
) # folder containing checkpoints
TRAINING_CHECKPOINT = None # path to model checkpoint to resume training, None if none
AVERAGE_STEPS = {491000, 492500, 494000, 495500, 497000, 498500, 500000}
CHECKPOINT_AVG_PREFIX = (
    "step" # prefix to add to checkpoint name when saving checkpoints for averaging
)
CHECKPOINT_AVG_SUFFIX = (
    ".pt" # checkpoint end string to match checkpoints saved for averaging
)
FINAL_CHECKPOINT = (
    "averaged_" + NAME + ".pt"
) # final checkpoint to be used for eval/inference
FINAL_CHECKPOINT_GDID = (
    "1OHtg336ujlOjp5Kp0KjE1fAPF74aZpZD" # Google Drive ID for download
)


################################
########## Evaluation ##########
################################

EVAL_GAMES_FOLDER = str(
    pathlib.Path(__file__).parent.parent.parent.resolve() / "evaluate" / "games" / NAME
) # folder where evaluation games are saved in PGN files
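CT-EFT-85 is the first config in this commit to set `LR_SCHEDULE = "exp_decay"`. The body of `get_lr()` is not part of this diff, so the following is a hypothetical stand-in (`sketch_lr`) for the two schedules as the configs describe them. "vaswani" is the warmup-then-inverse-square-root schedule from the original Transformer paper, doubled as the older config comments state; "exp_decay" is assumed to share that linear warmup and peak, then decay exponentially. How the real function scales `decay` is unknown, and the `decay_every` parameter is invented purely for illustration.

```python
import math


def sketch_lr(step, d_model, warmup_steps, schedule="vaswani", decay=None, decay_every=10_000):
    """Hypothetical stand-in for get_lr(); not the repository's implementation."""
    # Factor 2 follows the old config comment: "twice the schedule in the paper".
    scale = 2.0 * d_model ** -0.5
    if schedule == "vaswani":
        # Linear warmup, then decay proportional to 1 / sqrt(step).
        return scale * min(step ** -0.5, step * warmup_steps ** -1.5)
    if schedule == "exp_decay":
        if decay is None:
            raise ValueError("the 'exp_decay' schedule needs a decay rate")
        peak = scale * warmup_steps ** -0.5  # same peak as the 'vaswani' schedule
        if step <= warmup_steps:
            return peak * step / warmup_steps  # same linear warmup
        # ASSUMPTION: exponent grows by `decay` every `decay_every` steps after warmup.
        return peak * math.exp(-decay * (step - warmup_steps) / decay_every)
    raise ValueError(f"unknown schedule: {schedule!r}")
```

Both schedules meet at the same peak when `step == warmup_steps` (8,000 here), so switching a config between them only changes behavior after warmup.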
2 changes: 1 addition & 1 deletion chess_transformers/configs/models/__init__.py
@@ -1 +1 @@
-__all__ = ["CT-E-19", "CT-ED-45", "CT-EFT-20"]
+__all__ = ["CT-E-19", "CT-ED-45", "CT-EFT-20", "CT-EFT-85"]
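The model config names listed here contain hyphens (e.g. `CT-EFT-85`), so a plain `import` statement cannot spell them. A minimal sketch of loading such a file by path with the standard library's `importlib.util`; `load_config` is a hypothetical helper, not a function from `chess_transformers`, and loading the real configs still requires `torch` and `chess_transformers` to be importable, since the configs import them.

```python
import importlib.util
import pathlib
from types import ModuleType


def load_config(path) -> ModuleType:
    """Hypothetical helper: execute a config file and return it as a module object."""
    file = pathlib.Path(path)
    # Underscores keep the module name a valid identifier for reprs and pickling.
    spec = importlib.util.spec_from_file_location(file.stem.replace("-", "_"), file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a config from {file}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the config's top-level assignments
    return module
```

For example, `load_config("chess_transformers/configs/models/CT-EFT-85.py").D_MODEL` would read the model width from that file.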