Sourcery refactored main branch #1

Open
wants to merge 1 commit into main from sourcery/main

Conversation

sourcery-ai[bot] commented Aug 9, 2023

Branch main refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the main branch, then run:

git fetch origin sourcery/main    # fetch the refactored branch
git merge --ff-only FETCH_HEAD    # fast-forward your local main onto it
git reset HEAD^                   # un-commit so the changes sit in the working tree for review

Help us improve this pull request!

sourcery-ai[bot] requested a review from X-oss-byte on August 9, 2023 at 23:46
stackblitz bot commented Aug 9, 2023

Run & review this pull request in StackBlitz Codeflow.

sourcery-ai[bot] left a comment


Sourcery timed out performing refactorings.

Due to GitHub API limits, only the first 60 comments can be shown.

Comment on lines -18 to +42
- extras = {}
- extras["quality"] = ["black ~= 23.1", "ruff >= 0.0.241", "hf-doc-builder >= 0.3.0", "urllib3 < 2.0.0"]
- extras["docs"] = []
- extras["test_prod"] = ["pytest", "pytest-xdist", "pytest-subtests", "parameterized"]
- extras["test_dev"] = ["datasets", "evaluate", "transformers", "scipy", "scikit-learn", "deepspeed", "tqdm", "bitsandbytes"]
+ extras = {
+     "quality": [
+         "black ~= 23.1",
+         "ruff >= 0.0.241",
+         "hf-doc-builder >= 0.3.0",
+         "urllib3 < 2.0.0",
+     ],
+     "docs": [],
+     "test_prod": [
+         "pytest",
+         "pytest-xdist",
+         "pytest-subtests",
+         "parameterized",
+     ],
+     "test_dev": [
+         "datasets",
+         "evaluate",
+         "transformers",
+         "scipy",
+         "scikit-learn",
+         "deepspeed",
+         "tqdm",
+         "bitsandbytes",
+     ],
+ }
Lines 18-22 refactored with the following changes:
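
As a side note (not part of the Sourcery output, values are made up): building the mapping as a single dict literal is equivalent to the incremental assignments it replaces, just expressed in one statement.

a = {}
a["quality"] = ["black ~= 23.1"]
b = {"quality": ["black ~= 23.1"]}
assert a == b  # same mapping, whether built incrementally or as one literal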

- id_to_label = list(set(all_labels))
- id_to_label.sort()
+ id_to_label = sorted(set(all_labels))
Function training_function refactored with the following changes:
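
For context, an illustrative standalone snippet (hypothetical label values, not from the diff): sorted() accepts any iterable and returns a new list, so it folds the list(set(...)) plus .sort() pair into a single call with the same result.

all_labels = ["neg", "pos", "neg", "neutral"]  # hypothetical labels
id_to_label = sorted(set(all_labels))          # deduplicate, then sort, in one step
assert id_to_label == ["neg", "neutral", "pos"]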

- id_to_label = list(set(all_labels))
- id_to_label.sort()
+ id_to_label = sorted(set(all_labels))
Function training_function refactored with the following changes:

- for step, batch in enumerate(eval_dataloader):
+ for batch in eval_dataloader:
Function training_function refactored with the following changes:

- @find_executable_batch_size(starting_batch_size=int(observed_batch_size))
+ @find_executable_batch_size(starting_batch_size=observed_batch_size)
Function training_function refactored with the following changes:

Comment on lines -23 to +24
- cpu_left_col_base = [mem.copy() for i in range(6)]
- cpu_right_col_base = [mem.copy() for i in range(6)]
+ cpu_left_col_base = [mem.copy() for _ in range(6)]
+ cpu_right_col_base = [mem.copy() for _ in range(6)]
Function Stage3.construct refactored with the following changes:
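
The underscore is only a naming convention for a loop variable that is never read; behaviour is unchanged. A minimal standalone sketch (made-up data):

mem = {"used": 0}
cols = [mem.copy() for _ in range(6)]  # six independent copies; the counter itself is never used
assert len(cols) == 6 and cols[0] is not cols[1]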

Comment on lines -23 to +24
- cpu_left_col_base = [mem.copy() for i in range(6)]
- cpu_right_col_base = [mem.copy() for i in range(6)]
+ cpu_left_col_base = [mem.copy() for _ in range(6)]
+ cpu_right_col_base = [mem.copy() for _ in range(6)]
Function Stage4.construct refactored with the following changes:

Comment on lines -24 to +25
- cpu_left_col_base = [mem.copy() for i in range(6)]
- cpu_right_col_base = [mem.copy() for i in range(6)]
+ cpu_left_col_base = [mem.copy() for _ in range(6)]
+ cpu_right_col_base = [mem.copy() for _ in range(6)]
Function Stage5.construct refactored with the following changes:

- for torch_function_name in tensor_constructors_to_patch.keys():
+ for torch_function_name in tensor_constructors_to_patch:
Function init_on_device refactored with the following changes:

Comment on lines -336 to +358
- if set(device_map.values()) == {"cpu"} or set(device_map.values()) == {"cpu", "disk"}:
+ if set(device_map.values()) in [{"cpu"}, {"cpu", "disk"}]:
      main_device = "cpu"
  else:
      main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]

  if main_device != "cpu":
      cpu_modules = [name for name, device in device_map.items() if device == "cpu"]
-     if state_dict is None and len(cpu_modules) > 0:
+     if state_dict is None and cpu_modules:
          state_dict = extract_submodules_state_dict(model.state_dict(), cpu_modules)

      disk_modules = [name for name, device in device_map.items() if device == "disk"]
-     if offload_dir is None and offload_index is None and len(disk_modules) > 0:
+     if offload_dir is None and offload_index is None and disk_modules:
          raise ValueError(
              "We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules "
              f"need to be offloaded: {', '.join(disk_modules)}."
          )
      if (
-         len(disk_modules) > 0
+         disk_modules
          and offload_index is None
-         and (not os.path.isdir(offload_dir) or not os.path.isfile(os.path.join(offload_dir, "index.json")))
+         and (
+             not os.path.isdir(offload_dir)
+             or not os.path.isfile(os.path.join(offload_dir, "index.json"))
+         )
Function dispatch_model refactored with the following changes:
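
One detail behind the first change above (illustrative snippet, not from the PR): `in` against a list compares elements with ==, and set equality ignores order, so the membership test covers both of the original comparisons.

device_map = {"layer1": "cpu", "layer2": "disk"}  # hypothetical device_map
assert set(device_map.values()) in [{"cpu"}, {"cpu", "disk"}]
# equivalent to: set(...) == {"cpu"} or set(...) == {"cpu", "disk"}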

Comment on lines -98 to +99
- # Random number generator states
- states = {}
  states_name = f"{RNG_STATE_NAME}_{process_index}.pkl"
- states["random_state"] = random.getstate()
+ states = {"random_state": random.getstate()}
Function save_accelerator_state refactored with the following changes:

This removes the following comments ( why? ):

# Random number generator states

- _PYTORCH_DATALOADER_KWARGS.update(additional_kwargs)
+ _PYTORCH_DATALOADER_KWARGS |= additional_kwargs
Lines 64-64 refactored with the following changes:
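
A quick standalone check (hypothetical kwargs) that the in-place union operator behaves like update(); note that |= on dicts is only available on Python 3.9+ (PEP 584), while .update() works on all versions.

defaults = {"num_workers": 0, "pin_memory": False}
extra = {"pin_memory": True}
defaults |= extra  # in-place union, same effect as defaults.update(extra)
assert defaults == {"num_workers": 0, "pin_memory": True}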

Comment on lines -160 to +168
- # If drop_last is True of the last batch was full, iteration is over, otherwise...
  if not self.drop_last and len(initial_data) > 0 and len(batch) < self.batch_size:
-     if not self.even_batches:
-         if len(batch) > batch_length * self.process_index:
-             yield batch[batch_length * self.process_index : batch_length * (self.process_index + 1)]
-     else:
+     if self.even_batches:
          # For degenerate cases where the dataset has less than num_process * batch_size samples
          while len(initial_data) < self.batch_size:
              initial_data += initial_data
          batch = batch + initial_data
          yield batch[batch_length * self.process_index : batch_length * (self.process_index + 1)]
+     elif len(batch) > batch_length * self.process_index:
+         yield batch[batch_length * self.process_index : batch_length * (self.process_index + 1)]
Function BatchSamplerShard._iter_with_split refactored with the following changes:

This removes the following comments ( why? ):

# If drop_last is True of the last batch was full, iteration is over, otherwise...

Comment on lines -189 to +188
- # If drop_last is True, iteration is over, otherwise...
  if not self.drop_last and len(initial_data) > 0:
-     if not self.even_batches:
-         if len(batch_to_yield) > 0:
-             yield batch_to_yield
-     else:
+     if self.even_batches:
Function BatchSamplerShard._iter_with_no_split refactored with the following changes:

This removes the following comments ( why? ):

# If drop_last is True, iteration is over, otherwise...

Comment on lines -515 to +512
- # num_processes batches of the main iterator are concatenated then dispatched and split.
- # We add the batches one by one so we have the remainder available when drop_last=False.
- batches = []
- for _ in range(self.state.num_processes):
-     batches.append(next(iterator))
+ batches = [next(iterator) for _ in range(self.state.num_processes)]
Function DataLoaderDispatcher._fetch_batches refactored with the following changes:

This removes the following comments ( why? ):

# num_processes batches of the main iterator are concatenated then dispatched and split.
# If drop_last is False and split_batches is False, we may have a remainder to take care of.
# We add the batches one by one so we have the remainder available when drop_last=False.

Comment on lines -1004 to +998
- if not self.in_dataloader:
-     return -1
- return self.active_dataloader.remainder
+ return -1 if not self.in_dataloader else self.active_dataloader.remainder
Function GradientState.remainder refactored with the following changes:
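
As a generic illustration (not taken from the diff), a conditional expression returns one of two values in a single statement, which is what collapses the early return above:

def remainder(in_dataloader, active_remainder):
    # hypothetical standalone version of the same pattern
    return -1 if not in_dataloader else active_remainder

assert remainder(False, 5) == -1
assert remainder(True, 5) == 5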

Comment on lines -116 to +136
- if not _blank:
-     err = ""
-     if not hasattr(self, "name"):
-         err += "`name`"
-     if not hasattr(self, "requires_logging_directory"):
-         if len(err) > 0:
-             err += ", "
-         err += "`requires_logging_directory`"
+ if _blank:
+     return
+ err = ""
+ if not hasattr(self, "name"):
+     err += "`name`"
+ if not hasattr(self, "requires_logging_directory"):
+     if err != "":
+         err += ", "
+     err += "`requires_logging_directory`"

  # as tracker is a @property that relies on post-init
- if "tracker" not in dir(self):
-     if len(err) > 0:
-         err += ", "
-     err += "`tracker`"
- if len(err) > 0:
-     raise NotImplementedError(
-         f"The implementation for this tracker class is missing the following "
-         f"required attributes. Please define them in the class definition: "
-         f"{err}"
-     )
+ if "tracker" not in dir(self):
+     if err != "":
+         err += ", "
+     err += "`tracker`"
+ if err != "":
+     raise NotImplementedError(
+         f"The implementation for this tracker class is missing the following "
+         f"required attributes. Please define them in the class definition: "
+         f"{err}"
+     )
Function GeneralTracker.__init__ refactored with the following changes:

Comment on lines -92 to +122
- opts = parser._actions
- titles = [
-     "Hardware Selection Arguments",
-     "Resource Selection Arguments",
-     "Training Paradigm Arguments",
-     "positional arguments",
-     "optional arguments",
- ]
  if len(args) > 1:
      used_platforms = [arg for arg in args if arg in options_to_group.keys()]
      args = list(map(clean_option, args))
      used_titles = [options_to_group[o] for o in used_platforms]
+     opts = parser._actions
+     titles = [
+         "Hardware Selection Arguments",
+         "Resource Selection Arguments",
+         "Training Paradigm Arguments",
+         "positional arguments",
+         "optional arguments",
+     ]
      for i, arg in enumerate(opts):
          # If the argument's container is outside of the used titles, hide it
          if arg.container.title not in titles + used_titles:
              setattr(opts[i], "help", argparse.SUPPRESS)
-         # If the argument is hardware selection, but not being passed, hide it
          elif arg.container.title == "Hardware Selection Arguments":
              if set(arg.option_strings).isdisjoint(set(args)):
                  setattr(opts[i], "help", argparse.SUPPRESS)
              else:
-                 setattr(opts[i], "help", arg.help + " (currently selected)")
-         # If the argument is a training paradigm, but not being passed, hide it
+                 setattr(opts[i], "help", f"{arg.help} (currently selected)")
          elif arg.container.title == "Training Paradigm Arguments":
              if set(arg.option_strings).isdisjoint(set(used_platforms)):
                  setattr(opts[i], "help", argparse.SUPPRESS)
              else:
-                 setattr(opts[i], "help", arg.help + " (currently selected)")
-     for i, group in enumerate(list(parser._action_groups)):
+                 setattr(opts[i], "help", f"{arg.help} (currently selected)")
+     for group in list(parser._action_groups):
          # If all arguments in the group are hidden, hide the group
-         if all([arg.help == argparse.SUPPRESS for arg in group._group_actions]):
+         if all(
+             arg.help == argparse.SUPPRESS for arg in group._group_actions
+         ):
Function _CustomHelpAction.__call__ refactored with the following changes:

This removes the following comments ( why? ):

# If the argument is a training paradigm, but not being passed, hide it
# If the argument is hardware selection, but not being passed, hide it
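
A side note on the all(...) change in this hunk (illustrative only): passing a generator expression instead of a list comprehension avoids building an intermediate list and lets all() stop at the first falsy element.

flags = (n % 2 == 0 for n in [2, 4, 5, 6, 8])  # generator: evaluated lazily
assert all(flags) is False                     # short-circuits as soon as 5 fails the test
# all([n % 2 == 0 for n in ...]) would materialise the whole list first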

Comment on lines -689 to +693
- if is_rich_available() and debug:
-     console = get_console()
-     console.print("\n[bold red]Using --debug, `torch.distributed` Stack Trace:[/bold red]")
-     console.print_exception(suppress=[__file__], show_locals=False)
- else:
+ if not is_rich_available() or not debug:
      raise
+ console = get_console()
+ console.print("\n[bold red]Using --debug, `torch.distributed` Stack Trace:[/bold red]")
+ console.print_exception(suppress=[__file__], show_locals=False)
Function deepspeed_launcher refactored with the following changes:

Comment on lines -741 to +740
- if args.tpu_use_sudo:
-     new_cmd = ["sudo"]
- else:
-     new_cmd = []
+ new_cmd = ["sudo"] if args.tpu_use_sudo else []
Function tpu_pod_launcher refactored with the following changes:

Comment on lines -831 to +835
- args.multi_gpu = (
-     True
-     if defaults.distributed_type
-     in (DistributedType.MULTI_GPU, DistributedType.MULTI_NPU, DistributedType.MULTI_XPU)
-     else False
+ args.multi_gpu = defaults.distributed_type in (
+     DistributedType.MULTI_GPU,
+     DistributedType.MULTI_NPU,
+     DistributedType.MULTI_XPU,
  )
  args.tpu = defaults.distributed_type == DistributedType.TPU
  args.use_fsdp = defaults.distributed_type == DistributedType.FSDP
  args.use_megatron_lm = defaults.distributed_type == DistributedType.MEGATRON_LM
  args.tpu_use_cluster = defaults.tpu_use_cluster if args.tpu else False
  if args.gpu_ids is None:
-     if defaults.gpu_ids is not None:
-         args.gpu_ids = defaults.gpu_ids
-     else:
-         args.gpu_ids = "all"
+     args.gpu_ids = defaults.gpu_ids if defaults.gpu_ids is not None else "all"
Function _validate_launch_command refactored with the following changes:
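
The first change in this hunk relies on the fact that a membership test already evaluates to a bool, so wrapping it in True if ... else False is redundant. A minimal standalone check (hypothetical values):

distributed_type = "MULTI_GPU"
multi_gpu = distributed_type in ("MULTI_GPU", "MULTI_NPU", "MULTI_XPU")
assert multi_gpu is True  # the `in` test itself returns a bool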

Comment on lines -37 to +41
- if compute_environment == ComputeEnvironment.AMAZON_SAGEMAKER:
-     config = get_sagemaker_input()
- else:
-     config = get_cluster_input()
- return config
+ return (
+     get_sagemaker_input()
+     if compute_environment == ComputeEnvironment.AMAZON_SAGEMAKER
+     else get_cluster_input()
+ )
Function get_user_input refactored with the following changes:

Comment on lines -44 to +52
- if config_file is not None:
-     if not os.path.isfile(config_file):
-         raise FileNotFoundError(
-             f"The passed configuration file `{config_file}` does not exist. "
-             "Please pass an existing file to `accelerate launch`, or use the the default one "
-             "created through `accelerate config` and run `accelerate launch` "
-             "without the `--config_file` argument."
-         )
- else:
+ if config_file is None:
      config_file = default_config_file
+ elif not os.path.isfile(config_file):
+     raise FileNotFoundError(
+         f"The passed configuration file `{config_file}` does not exist. "
+         "Please pass an existing file to `accelerate launch`, or use the the default one "
+         "created through `accelerate config` and run `accelerate launch` "
+         "without the `--config_file` argument."
+     )
Function load_config_from_file refactored with the following changes:

Comment on lines -91 to +90
- result = {k: v for k, v in result.items() if v is not None}
- return result
+ return {k: v for k, v in result.items() if v is not None}
Function BaseConfig.to_dict refactored with the following changes:

Comment on lines -64 to +74
-     if num_gpus > 1:
-         config["distributed_type"] = "MULTI_GPU"
-     else:
-         config["distributed_type"] = "NO"
+     config["distributed_type"] = "MULTI_GPU" if num_gpus > 1 else "NO"
  elif is_xpu_available() and use_xpu:
      num_xpus = torch.xpu.device_count()
      config["num_processes"] = num_xpus
      config["use_cpu"] = False
-     if num_xpus > 1:
-         config["distributed_type"] = "MULTI_XPU"
-     else:
-         config["distributed_type"] = "NO"
+     config["distributed_type"] = "MULTI_XPU" if num_xpus > 1 else "NO"
  elif is_npu_available():
      num_npus = torch.npu.device_count()
      config["num_processes"] = num_npus
      config["use_cpu"] = False
-     if num_npus > 1:
-         config["distributed_type"] = "MULTI_NPU"
-     else:
-         config["distributed_type"] = "NO"
+     config["distributed_type"] = "MULTI_NPU" if num_npus > 1 else "NO"
Function write_basic_config refactored with the following changes:

- if torch.cuda.is_available():
-     num_gpus = torch.cuda.device_count()
- else:
-     num_gpus = 0
+ num_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
Function main refactored with the following changes:

- for element in self.data:
-     yield element
+ yield from self.data
Function DummyIterableDataset.__iter__ refactored with the following changes:

  • Replace yield inside for loop with yield from (yield-from)
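
For reference, a self-contained sketch of the yield-from rewrite (made-up data, not the project's class): yield from delegates to the inner iterable and produces exactly the same elements as the explicit loop.

class DummyIterable:
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        yield from self.data  # same as: for element in self.data: yield element

assert list(DummyIterable([1, 2, 3])) == [1, 2, 3]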

- result = []
- for batch in dl:
-     result.append(gather(batch))
+ result = [gather(batch) for batch in dl]
Function dl_preparation_check refactored with the following changes:
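
As an illustration only (gather is distributed-specific, so a hypothetical stand-in is used here): the comprehension builds the same list as the append loop in a single expression.

def fake_gather(batch):  # hypothetical stand-in for accelerate's gather()
    return batch * 2

dl = [1, 2, 3]
result = [fake_gather(batch) for batch in dl]
assert result == [2, 4, 6]  # identical to appending inside a for loop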

Comment on lines -234 to +226
- result = []
- for batch in dl:
-     result.append(gather(batch))
+ result = [gather(batch) for batch in dl]
Function central_dl_preparation_check refactored with the following changes:

Comment on lines -298 to +282
- for epoch in range(3):
+ for _ in range(3):
Function mock_training refactored with the following changes:
