[WIP] Upgrades #150

Erotemic · 2025-01-05T18:20:38Z

I'm working on improving the repo so it can be used seamlessly with arbitrary COCO manifests (currently using the kwcoco package, but that can be factored out).

I'm also making other changes / fixing issues / style as I go along. I'm attempting to keep the git commit messages clean so pieces of this can be broken off if they want to be merged upstream. But my end goal is to just make this usable as a command line tool that doesn't require messing with config files and putting them in the right place.

henrytsui000 · 2025-03-10T12:48:26Z

Hi,

Apologies for the delay—I’m currently serving my country’s mandatory military duty, so code reviews will be slower than usual. I’ll gradually review the code over the coming days. I can still check my email daily, so feel free to reach out if anything is urgent.

Best regards,
Henry Tsui

Erotemic · 2025-03-10T13:46:01Z

Of course! Most of my work on this is as a hobby project, so I'm not available all the time either.

That being said, this PR has gotten very big and should be broken down into smaller, more manageable pieces for review. In my most recent work, I've been less disciplined about keeping things non-experimental, so some commits change hard-coded defaults. Those experimental sections should not be merged.

To move forward it would be best to prioritize a list of features added in this branch to separate into a standalone branch. Reviewing what I have done so far could help build that list. In the meantime here is a list of what I think some of the more important features added here are:

- Support for kwcoco, which makes it much easier to train on a custom dataset. The new file train_kwcoco_demo.sh provides an end-to-end example that generates a dataset, trains on it, predicts with the new model, and then evaluates it.
- Exposing all options for lightning's Trainer. I did this in a more recent commit, and it makes a big difference. Giving the user access to accumulate_grad_batches and control over how gradient clipping is handled makes this repo applicable to many more problems.
- Remove the dependency on opencv-python, for reasons I've discussed in other locations: Add cv2 graphics / headless extras to setup.py open-mmlab/mmcv#2775
- Re-add HorizontalFlip and VerticalFlip augmentations, but default them to zero.
- Write the training config yaml to the training directory (answers the question: what model do these weights correspond to?)
- Fix an issue with loading pretrained weights, and handle more cases where the model has changed slightly (by including torch-liberator we can do even better than this!)
- Add more logging statements
- Improve the output of forward functions to be a dictionary instead of a tuple (which makes them much easier to change in the future)
- Fix issues with logging (at least when wandb is disabled)
- Visualize training and validation batches on disk (needs a little cleanup, but is getting there, and very useful for debugging)
- Visualize tensorboard graphs on disk
- Support for non-SGD optimizers (I have AdamW working, but the custom scheduler code seems not to work right yet).
- Customize Trainer to leverage CUDA cores on 3090 GPUs.
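
As a rough illustration of the dictionary-versus-tuple point in the list above, here is a minimal sketch of why dictionary outputs tend to be easier to change later. Everything in it is hypothetical: the function names and the keys "main", "aux", and "features" are made up for the example and are not taken from this repo, and plain lists stand in for tensors.

```python
from typing import Dict, List, Tuple

Vector = List[float]  # plain lists stand in for tensors in this sketch


def forward_tuple(x: Vector) -> Tuple[Vector, Vector]:
    """Tuple-style output: callers depend on position and length."""
    return [2.0 * v for v in x], [v + 1.0 for v in x]


def forward_tuple_extended(x: Vector) -> Tuple[Vector, Vector, Vector]:
    """Adding a third output changes the tuple's length."""
    main, aux = forward_tuple(x)
    return main, aux, list(x)


def forward_dict(x: Vector) -> Dict[str, Vector]:
    """Dict-style output: callers depend only on the keys they read."""
    return {"main": [2.0 * v for v in x], "aux": [v + 1.0 for v in x]}


def forward_dict_extended(x: Vector) -> Dict[str, Vector]:
    """Adding a new key leaves existing lookups untouched."""
    outputs = forward_dict(x)
    outputs["features"] = list(x)
    return outputs


def main_prediction(outputs: Dict[str, Vector]) -> Vector:
    """A caller written before the "features" key existed."""
    return outputs["main"]


def tuple_caller_breaks(x: Vector) -> bool:
    """Return True if two-value unpacking fails on the extended tuple."""
    try:
        main, aux = forward_tuple_extended(x)
    except ValueError:
        return True
    return False
```

In this sketch, main_prediction works unchanged on both forward_dict and forward_dict_extended, while a caller that unpacks two values from the tuple version fails as soon as a third output is added.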

Let me know which of these you want to prioritize and I'll split them into separate PRs. Feel free to give any general reviews here.

I do have one minimal open PR here: #160

salvaba94 · 2025-06-16T18:46:20Z

Hi @Erotemic, did you manage to reproduce the results of the original YOLOv9 with this set of upgrades?

Erotemic · 2025-06-16T19:14:53Z

I haven't tried, but I also didn't do anything to modify algorithm behavior or defaults. Just quality of life improvements.

This was referenced Jan 23, 2025

Upgrades VIAME/YOLO#1

Merged

Add a COCO to YOLO converter. #80

Open

Type Annotation for Config is incorrect. #159

Open

feat: add lazy imports for faster startup time #160

Open

Erotemic added 26 commits March 8, 2025 13:14

feat: Basic support for kwcoco files

675e7a6

refactor: cleanup code golf

96aed7a

change: disable determinism by default

1a0a31d

docs: add fixme note

90e15fd

change: other deterministic disable

b50209e

refactor: Remove import *, and use getattr to avoid an unsafe eval

f0b17d0

fix: handle case where classes is not in epoch_metrics

056cd57

fix: error when v_num is not in the loss dict

b78a7f3

lint: remove unused f-string

1d9e692

fix: type error in main

623264a

test: add doctest for DualLoss with helper config_utils

4a299ff

feat: add lazy imports for faster startup time

444d5c8

feat: allow user to specify accelerator

c63660b

feat: allow user to simplify output with environ

b34ef90

refactor: disable validation sanity check for faster training response time

585dc34

fix: ensure categories are remapped with kwcoco

fbcd5c0

doc: add todo about data / classes

1665a0e

fix: dont assume iscrowd exists

0815249

fix: valid points check was incorrect

439c631

feat: use kwimage to handle more polygon reprs

039ff67

add: kwcoco training tutorial

cc60ee0

refactor: improve on-disk batch viz

8d899fa

feat: add weights loading log statement

3ccd4f3

fix: workaround weight loading issue at inference time

9c3eba6

feat: basic inference on a coco file

975ae5f

refactor: remove opencv-python requirement

aaed2a3

Erotemic added 11 commits March 8, 2025 13:14

add: callback for 3090 optimization

f780250

refactor: Use more callbacks

b0efc4e

save checkpoints with train loss

de2a4c5

fix typo

81c2ed7

rework float32_matmul_precision hack

a1b7426

add: tensorboad plotting callbacks

0ff0639

Better tensorboard plotter, training on demo works now

ff12c1f

log more than 1 image

4b40050

try to use overviews, but disable because it caused a crash

77e089b

remove assert for image logger

442689f

minor debug tweaks

8d7a738

Erotemic force-pushed the upgrades branch from 2736c1c to 8d7a738 Compare March 8, 2025 18:15

Erotemic mentioned this pull request Mar 8, 2025

I've no idea how to work with this #177

Open

Erotemic added 11 commits March 8, 2025 14:32

add: expose all lightning trainer args via hydra

5a69084

add trainer config yaml

40d86ee

Update logging location and log train batches (todo: make optional)

f8ebe45

improve image logger to show train and val

4137596

Fix outputs now being a dict

b81f5e2

Re-expose horizontal and vertical flips

238b44c

Add note

33781e0

Add notes

7b28d45

Rework optimizer creation to handle more optimizers

4bd3823

Replace lambda with def

a6e1be4

better schedule logging

4b51b9f

Erotemic added 2 commits March 11, 2025 18:20

Add doctest examples to BoxMatcher, refactor collate_fn

e0ce8c0

Fix issue with _is_coco

543599d
