67 commits
c850b78
Automated quantitative evaluation
KiritoFD Jan 23, 2026
4a3dbd1
loss of details
KiritoFD Jan 24, 2026
4397ed0
Add auxiliary MSE to preserve content; velocity regularization to fix brightness changes
KiritoFD Jan 24, 2026
5bc85f0
balance loss
KiritoFD Jan 25, 2026
3a3dad8
good results 250epoch
KiritoFD Jan 25, 2026
3fd33db
Patch 3 and 7 curves are very close; they need larger weights to pull them apart
KiritoFD Jan 26, 2026
2e075c0
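Brightness keeps shifting because the high frequencies are gone; tried video, got nothing but drifting color blobs. Validation script: FFT in latent space does not work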
KiritoFD Jan 26, 2026
db7b953
Validated CNN filtering: not good, needs to be changed to a learnable CNN
KiritoFD Jan 26, 2026
e7afe09
Train a proxy network for filtering 📊 Proxy Benchmark Report (N=50) ---------------------------…
KiritoFD Jan 26, 2026
77de61d
100epoch best
KiritoFD Jan 27, 2026
d21c00f
verified lora on latents : [Test 1] Zero Initialization Safety Max…
KiritoFD Jan 27, 2026
9544cb4
Verified Edge IoU: 0.5412 ✅ Correct: Identity path with LoRA perturba…
KiritoFD Jan 27, 2026
2e77e7b
LoRA was not actually added; style was learned surprisingly well, but image quality is not great
KiritoFD Jan 27, 2026
578ba74
Split up the overly long train.py
KiritoFD Jan 27, 2026
d82ea9a
250 epochs: improved, but heavy noise and the change is too weak
KiritoFD Jan 28, 2026
c043767
Style is much stronger; added cross_attn, and AdaGN in the encoder as well; the old config weights need retuning
KiritoFD Jan 28, 2026
11798d4
MSE completely exploded
KiritoFD Jan 28, 2026
2883313
Learning rate too high, training diverged; next time the loss rises, adjust the learning rate first
KiritoFD Jan 28, 2026
df7d210
100 epochs: very stable, ready to strengthen style
KiritoFD Jan 28, 2026
154cffe
Converged at 400 epochs
KiritoFD Jan 29, 2026
9110d69
Evaluate frequency-splitting approaches
KiritoFD Jan 29, 2026
fe6acca
Optimize eval infra; move loss computation to Conv instead of unfold to save VRAM bandwidth; batch the inference
KiritoFD Jan 29, 2026
c6f2d13
Fix resuming training from ckpt; cosine annealing LR schedule
KiritoFD Jan 30, 2026
8c0efff
Normalize the weights of the two losses
KiritoFD Jan 30, 2026
32b25f5
Reduce LR, increase MSE weight
KiritoFD Jan 31, 2026
901a9fb
swd scale
KiritoFD Jan 31, 2026
7ef0105
Remove cross_attn, go back to AdaGN
KiritoFD Jan 31, 2026
a433d3e
Remove the ineffective CA Grad; the 1259 patch size is correct
KiritoFD Jan 31, 2026
c39129d
Metrics improved again but the output images are poor; the noise schedule has a problem
KiritoFD Jan 31, 2026
5605964
Cross-Attention in pure style transfer may over-entangle content semantics (overfitting semantics rather than texture)
KiritoFD Jan 31, 2026
bd3bce0
Set inference batch size to 24
KiritoFD Jan 31, 2026
10deda1
Increase network capacity, reduce the number of layers
KiritoFD Feb 1, 2026
cd9a6bc
Increase channel width
KiritoFD Feb 1, 2026
a08aa6d
CNN classifier evaluation: very poor results
KiritoFD Feb 4, 2026
c25c46d
Simplify code; fix the training objective: structure and style losses no longer fight each other
KiritoFD Feb 6, 2026
5aaee8d
debug VRAM
KiritoFD Feb 7, 2026
2cd57ec
Results not ideal; usable as a baseline
KiritoFD Feb 7, 2026
e4f9017
6M model tryout
KiritoFD Feb 7, 2026
d54006d
complete fail
KiritoFD Feb 8, 2026
38e00fc
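Small-scale validation: classification scores are very good but the images break down a bit; increased channel count; doing style on more than 4 channels is right, structure just needs strengthening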
KiritoFD Feb 8, 2026
b0c432f
Distill style into the model; inference no longer needs a reference image
KiritoFD Feb 8, 2026
f83537a
Work backward from ckpt; roll back to the version where style was taking effect
KiritoFD Feb 9, 2026
eddaa5c
Finally got it moving, but it overshoots a bit
KiritoFD Feb 9, 2026
b085f94
Style classification is strong; structure completely blew up
KiritoFD Feb 9, 2026
8ce6d8e
Classification fully learned
KiritoFD Feb 9, 2026
d66ecbc
Full experiment
KiritoFD Feb 9, 2026
9e7362b
Style is indeed better and the fog is fixed; only image quality is left to improve
KiritoFD Feb 10, 2026
29ef531
Quite good results; a little fog remains but it is not severe; switching Cycle to MSE was the right call
KiritoFD Feb 10, 2026
9acbefc
The classifier seems to have been hacked (gamed)
KiritoFD Feb 10, 2026
9a2b1b3
Balanced; running the full experiment
KiritoFD Feb 10, 2026
96804f9
Style looks OK, but grid artifacts suddenly appeared
KiritoFD Feb 11, 2026
c19371e
Brushstroke style
KiritoFD Feb 11, 2026
27921b4
Fix infra; style is slightly weak, strengthen it
KiritoFD Feb 12, 2026
58af1eb
Style injection: mid-frequency large blocks at map16, high-frequency brushstrokes at map32
KiritoFD Feb 12, 2026
e23409a
Simplify and optimize style injection
KiritoFD Feb 12, 2026
e2aa996
fixed infra
KiritoFD Feb 12, 2026
1599a2f
Style ablation experiments
KiritoFD Feb 13, 2026
e3358c2
Consolidate experiments into the experiments directory
KiritoFD Feb 13, 2026
b4fa07c
Fix ablation
KiritoFD Feb 13, 2026
3d4af4c
Ablation experiment results
KiritoFD Feb 14, 2026
17365cf
Full ablation
KiritoFD Feb 14, 2026
c505d3d
Gram whitening
KiritoFD Feb 14, 2026
c9b2cde
no-edge
KiritoFD Feb 15, 2026
c9f17b5
no-edge
KiritoFD Feb 15, 2026
8123db1
semigroup=+5507.2MB, absurd memory usage
KiritoFD Feb 15, 2026
72f6c10
ablation
KiritoFD Feb 15, 2026
f055c13
ablation
KiritoFD Feb 16, 2026
The diff you're trying to view is too large. We only load the first 3000 changed files.
9 changes: 9 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -7,3 +7,12 @@
*.onnx
!/src/checkpoints-lsfm/
!/src/checkpoints-lsfm/**
data/*
*.zip
Thermal/.compile_cache/*
compile_cache/*
torch_compile_cache/*
Cycle-NCE/torch_compile_cache/*
*.bin
*.safetensors
eval_cache/*
38 changes: 38 additions & 0 deletions Cycle-NCE/.gitignore
@@ -0,0 +1,38 @@
# Python
__pycache__/
*.py[cod]
*.pyo
*.pyd

# Virtual envs
.venv/
venv/
env/

# OS / editor
.DS_Store
Thumbs.db
.idea/
.vscode/

# Experiment artifacts
logs/
full_eval/
inference/
experiments/
*_cache/
eval_cache/
torch_compile_cache/
comile_cache/

# Local run folders (repo-specific)
full_300-*/
full_300_*/
full_strong_style/
overfit50-*/

# Large binaries
*.pt
*.ckpt
*.bin

1 change: 1 addition & 0 deletions Cycle-NCE/.python-version
@@ -0,0 +1 @@
3.14
190 changes: 190 additions & 0 deletions Cycle-NCE/MODEL_AND_TRAINING_DESIGN.md
@@ -0,0 +1,190 @@
# Latent AdaCUT: Model and Training Design (Current)

## 1. Scope and Goal

This document explains the current model/training design in `Cycle-NCE/src`:

- how style is injected into latent space;
- how losses are scheduled over long runs (300 epochs);
- why image quality can degrade (fog/checkerboard-like artifacts);
- why the current learning rate was chosen and what stability risks remain.

The target here is practical: keep style transfer effective while preventing structure collapse and visual artifacts.

## 2. Model Architecture (Core Principle)

Main file: `src/model.py`

### 2.1 Input/Output Domain

- Input and output are latent tensors (`[B, 4, 32, 32]` for SD-style 256px latent grid).
- The network predicts a residual latent delta and outputs `pred = content + delta`.

This keeps content identity easier to preserve than direct latent synthesis.

### 2.2 Style Injection Pathways

The model has both global and spatial style paths:

- Global style code:
  - `style_ref -> style_enc -> style_proj`
  - `style_id -> style_emb`
  - mixed by `style_mix_alpha`.
- Spatial style maps:
  - reference-style spatial features (`32x32`, `16x16`)
  - style-id spatial priors (`style_spatial_id_32`, `style_spatial_id_16`).
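As a rough illustration of the global pathway, the sketch below blends a reference-derived code with a style-id code. The function name `mix_style_code` and the convention that `style_mix_alpha` weights the reference branch are assumptions for illustration; the actual mixing rule in `src/model.py` may differ.

```python
from typing import List, Optional, Sequence


def mix_style_code(
    id_code: Sequence[float],
    ref_code: Optional[Sequence[float]],
    style_mix_alpha: float,
) -> List[float]:
    """Blend a reference-derived style code with a learned style-id code.

    Assumption: alpha weights the reference branch and (1 - alpha) the
    style-id branch. This is a hypothetical helper, not the repo's code.
    """
    if ref_code is None:
        # Deployment path: no reference image, so only the learned prior is used.
        return list(id_code)
    if len(ref_code) != len(id_code):
        raise ValueError("style codes must have the same length")
    a = min(max(style_mix_alpha, 0.0), 1.0)  # clamp alpha to [0, 1]
    return [a * r + (1.0 - a) * i for r, i in zip(ref_code, id_code)]


id_code = [1.0, 0.0, -1.0]   # stand-in for style_emb(style_id)
ref_code = [0.0, 2.0, 1.0]   # stand-in for style_proj(style_enc(style_ref))

mixed = mix_style_code(id_code, ref_code, style_mix_alpha=0.25)
student = mix_style_code(id_code, None, style_mix_alpha=0.25)
print(mixed, student)
```

With no reference the function returns the style-id code unchanged, which mirrors the reference-free path described in this section.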

Injection points:

- pre-block and in-block injection at 32/16 scales;
- decoder spatial injection (`style_spatial_dec_gain_*`);
- optional texture head and force path.

This is why style can be injected without a reference image at inference: the style-id priors are learned during training.

### 2.3 Frequency Bias and Output Control

The current model includes:

- optional high-frequency bias on delta (`use_delta_highpass_bias`);
- style gate floor / style force path controls;
- bilinear upsampling and optional blur in style/downsample branches.

The intent is to avoid pure low-frequency color-shift shortcuts and keep local texture changes.

## 3. Objective Design (Current)

Main file: `src/losses.py`

### 3.1 Teacher-Student Setup

Two forward passes per batch:

- Teacher: uses style reference (`style_ref=target_style`).
- Student: deployment path (`style_ref=None`, style-id only).

This trains the inference path directly instead of relying only on reference-guided outputs.
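The two-pass setup can be sketched with a toy stand-in for the network. Everything here is hypothetical: `toy_model`, its behavior, and the use of a plain MSE to compare the two outputs are assumptions chosen only to show the call pattern (`style_ref=target_style` for the teacher, `style_ref=None` for the student).

```python
from typing import List, Optional, Tuple

Latent = List[float]  # flattened stand-in for a [B, 4, 32, 32] tensor


def toy_model(content: Latent, style_id: int,
              style_ref: Optional[Latent] = None) -> Latent:
    """Hypothetical stand-in for the real network: returns content + delta."""
    if style_ref is not None:
        # Teacher-like behavior: the delta follows the reference.
        delta = [0.1 * (r - c) for r, c in zip(style_ref, content)]
    else:
        # Student-like behavior: a per-style constant plays the learned prior.
        delta = [0.05 * (style_id + 1)] * len(content)
    return [c + d for c, d in zip(content, delta)]


def teacher_student_pass(content: Latent, style_id: int,
                         target_style: Latent) -> Tuple[Latent, Latent]:
    """Run both forward passes on the same batch."""
    teacher = toy_model(content, style_id, style_ref=target_style)
    student = toy_model(content, style_id, style_ref=None)
    return teacher, student


def mse(a: Latent, b: Latent) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


content = [0.0, 1.0]
target_style = [1.0, 1.0]
teacher, student = teacher_student_pass(content, style_id=0,
                                        target_style=target_style)
distill = mse(student, teacher)  # student is pulled toward the teacher output
print(teacher, student, distill)
```

In a real training step the teacher output would typically be treated as a fixed target for the distillation term, so that gradients from that term update only the student path; whether the repository does this is not stated here.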

### 3.2 Distill and Style Closure

- Distill:
  - now supports low-pass-only distillation (`distill_low_only=true`),
  - and cross-domain-only aggregation (`distill_cross_domain_only=true`).
- Code loss:
  - teacher output code -> reference code,
  - student output code -> style-id prototype.

This keeps style conditioning active and reduces the risk of collapsing to an identity mapping.

### 3.3 Structure Constraints (Reworked)

`cycle` and `struct` now share the same configurable alignment form:

- loss type: `l1` or `mse`;
- low-pass blend strength: `[0, 1]` numeric parameter.

Config keys:

- `cycle_loss_type`, `cycle_lowpass_strength`
- `struct_loss_type`, `struct_lowpass_strength`
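A minimal sketch of such an alignment form, assuming the low-pass blend is `(1 - s) * x + s * blur(x)` applied to both sides and using a 3x3 box blur as the low-pass. The function names and the blur choice are illustrative assumptions, not the code in `src/losses.py`.

```python
from typing import List

Grid = List[List[float]]  # one latent channel as a 2-D grid


def box_blur3(x: Grid) -> Grid:
    """3x3 mean filter with edge replication, used here as the low-pass."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += x[ii][jj]
            out[i][j] = acc / 9.0
    return out


def lowpass_blend(x: Grid, strength: float) -> Grid:
    """Return (1 - s) * x + s * blur(x), with s clamped to [0, 1]."""
    s = min(max(strength, 0.0), 1.0)
    blurred = box_blur3(x)
    return [
        [(1.0 - s) * v + s * b for v, b in zip(row_x, row_b)]
        for row_x, row_b in zip(x, blurred)
    ]


def alignment_loss(pred: Grid, target: Grid, loss_type: str = "l1",
                   lowpass_strength: float = 0.0) -> float:
    """Mean L1 or MSE distance between low-pass-blended grids."""
    if loss_type not in ("l1", "mse"):
        raise ValueError("loss_type must be 'l1' or 'mse'")
    p = lowpass_blend(pred, lowpass_strength)
    t = lowpass_blend(target, lowpass_strength)
    diffs = [pv - tv for rp, rt in zip(p, t) for pv, tv in zip(rp, rt)]
    if loss_type == "l1":
        return sum(abs(d) for d in diffs) / len(diffs)
    return sum(d * d for d in diffs) / len(diffs)


target = [[0.0] * 3 for _ in range(3)]
pred = [[0.0] * 3 for _ in range(3)]
pred[1][1] = 0.9  # a single-pixel, high-frequency deviation

mse_sharp = alignment_loss(pred, target, "mse", lowpass_strength=0.0)
mse_smooth = alignment_loss(pred, target, "mse", lowpass_strength=1.0)
print(mse_sharp, mse_smooth)
```

In this toy case the same single-pixel deviation is penalized roughly nine times less at strength 1 than at strength 0, which illustrates how a high low-pass strength leaves fine detail only weakly constrained.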

Extra structure terms:

- edge term (`w_edge`, Sobel magnitude),
- cycle edge blending (`cycle_edge_strength`),
- delta TV penalty (`w_delta_tv`) to suppress periodic artifacts.
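To show why a total-variation penalty on the delta targets periodic artifacts, here is a small sketch of an anisotropic TV measure (mean absolute difference between horizontal and vertical neighbors). The exact TV form behind `w_delta_tv` is not specified in this document, so treat `delta_tv` as an assumed, illustrative definition.

```python
from typing import List

Grid = List[List[float]]


def delta_tv(delta: Grid) -> float:
    """Anisotropic total variation: mean |difference| over neighbor pairs."""
    h, w = len(delta), len(delta[0])
    total = 0.0
    count = 0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:  # horizontal neighbor
                total += abs(delta[i][j + 1] - delta[i][j])
                count += 1
            if i + 1 < h:  # vertical neighbor
                total += abs(delta[i + 1][j] - delta[i][j])
                count += 1
    return total / count if count else 0.0


# A checkerboard delta alternates sign at every pixel; a flat delta does not.
checker = [[1.0 if (i + j) % 2 == 0 else -1.0 for j in range(4)]
           for i in range(4)]
flat = [[0.5] * 4 for _ in range(4)]

tv_checker = delta_tv(checker)
tv_flat = delta_tv(flat)
print(tv_checker, tv_flat)
```

A checkerboard pattern maximizes this quantity while a uniform shift scores zero, so the penalty discourages grid-like deltas but does nothing against plain color shifts.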

### 3.4 NCE and Scheduling

- NCE is optional and ramped in with warmup+ramp.
- All major structure terms are scheduled by `_ramp_weight`.

This ordering is critical: style must be established first, and only then does structure regularization take over.
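A warmup+ramp weight schedule of this kind can be sketched as follows. The name `ramp_weight_sketch`, its signature, and the linear ramp shape are assumptions; the real `_ramp_weight` may use different arguments or a different curve.

```python
def ramp_weight_sketch(epoch: int, base_weight: float,
                       warmup_epochs: int, ramp_epochs: int) -> float:
    """Zero during warmup, then a linear ramp up to base_weight.

    Hypothetical illustration of a warmup+ramp schedule, not the
    repository's _ramp_weight.
    """
    if epoch < warmup_epochs:
        return 0.0  # term is disabled while style establishes itself
    if ramp_epochs <= 0:
        return base_weight  # no ramp: switch on at full strength
    progress = (epoch - warmup_epochs) / ramp_epochs
    return base_weight * min(progress, 1.0)


# Illustrative numbers only: 20 warmup epochs, 40 ramp epochs, weight 2.0.
w_early = ramp_weight_sketch(10, 2.0, warmup_epochs=20, ramp_epochs=40)
w_mid = ramp_weight_sketch(40, 2.0, warmup_epochs=20, ramp_epochs=40)
w_late = ramp_weight_sketch(100, 2.0, warmup_epochs=20, ramp_epochs=40)
print(w_early, w_mid, w_late)
```

Logging the resulting effective weight alongside the raw loss value makes it easier to tell whether a flat loss curve means the term is satisfied or simply not active yet.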

## 4. Why Quality Degrades (Fog/Artifacts)

Observed failure modes were consistent with design tradeoffs:

1. Over-constrained structure stack:
   - with cycle + struct + edge + NCE + TV all weighted high and enabled early,
   - the model is pushed toward safe, low-variance outputs (fog/softness).
2. Low-pass-dominant constraints:
   - if low-pass strength is high, high-frequency details are only weakly protected.
3. Strong MSE structure terms:
   - MSE can over-penalize deviations and bias the model toward conservative, smooth outputs.

Mitigation already applied:

- reduced structure weights and delayed warmup/ramp in `config.json`;
- switched distill to low-pass-only + cross-domain-only;
- reduced TV strength and low-pass blend strength.

## 5. Learning Rate Assessment (Current 300-Epoch Config)

Config: `src/config.json`

- `learning_rate = 1.5e-4` (reduced from `2.0e-4`);
- cosine decay to `5e-6`;
- `grad_clip_norm = 1.0`;
- AdamW + bf16 + TF32.
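For reference, the learning rate implied by a cosine decay from `1.5e-4` to `5e-6` can be computed in closed form. The sketch assumes a single cosine cycle over the whole run, stepped per epoch, with no warmup; the helper name `cosine_lr` is hypothetical and the repository's scheduler may step differently.

```python
import math


def cosine_lr(epoch: int, total_epochs: int,
              lr_max: float = 1.5e-4, lr_min: float = 5e-6) -> float:
    """Single-cycle cosine decay from lr_max (epoch 0) to lr_min (last epoch)."""
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    t = min(max(epoch, 0), total_epochs) / total_epochs  # progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


lr_start = cosine_lr(0, 300)
lr_half = cosine_lr(150, 300)
lr_end = cosine_lr(300, 300)
print(lr_start, lr_half, lr_end)
```

This is the same closed form used by `torch.optim.lr_scheduler.CosineAnnealingLR`. Note that the step size stays close to its maximum for the first several dozen epochs, which is relevant when there is no explicit warmup.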

### 5.1 Is it likely to explode?

Numerical explosion risk is low because:

- no adversarial discriminator instability here;
- gradient clipping is enabled;
- long cosine decay lowers step size over time.
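The effect of clipping by global gradient norm can be sketched in plain Python. The helper below is a hypothetical illustration that follows the usual rule (scale all gradients by `max_norm / (total_norm + eps)` when the norm exceeds the limit); it is not the training loop's actual code.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(grads: List[List[float]], max_norm: float = 1.0,
                        eps: float = 1e-6) -> Tuple[List[List[float]], float]:
    """Rescale all gradients together so their joint L2 norm is <= max_norm.

    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for group in grads for g in group))
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in group] for group in grads]
    return clipped, total_norm


def global_norm(grads: List[List[float]]) -> float:
    return math.sqrt(sum(g * g for group in grads for g in group))


big = [[3.0], [4.0]]        # joint norm 5.0, above the limit
small = [[0.3, 0.4]]        # joint norm 0.5, below the limit

big_clipped, big_norm = clip_by_global_norm(big, max_norm=1.0)
small_clipped, small_norm = clip_by_global_norm(small, max_norm=1.0)
print(big_norm, global_norm(big_clipped), small_clipped)
```

Clipping preserves the direction of the update and bounds only its magnitude, so it limits the damage of a bad step but does not correct a poorly balanced objective.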

### 5.2 Real risk at this LR

The practical risk is not NaN/overflow but early optimization bias:

- with a large batch (`128`) and no explicit LR warmup, the optimizer can settle into conservative local minima before the style path matures.

So the LR choice is "safe but still needs schedule discipline", not "unsafe runaway".

## 6. Current 300-Epoch Strategy (Rationale)

In `src/config.json`:

- style phase first:
  - style terms active from start (`gram/code/push/distill`);
- structure phase later:
  - `struct/edge` warmup+ramp delayed;
  - `cycle/nce` warmup+ramp delayed further.

This prioritizes learning "how to change style" before enforcing strict structure consistency.

## 7. Practical Monitoring Checklist

Per epoch, watch these together rather than any single scalar:

- `distill`, `code`, `gram`, `push`
- `cycle`, `struct`, `edge`, `delta_tv`, `nce`
- effective weights: `w_cycle_eff`, `w_struct_eff`, `w_edge_eff`, `w_nce_eff`

And in full eval:

- `photo->Hayao classifier_acc` (style transfer direction)
- `clip_style` and `clip_content`
- visual collage for fog/checkerboard artifacts.

If transfer collapses:

- reduce structure stack first (weights or warmup timing),
- do not immediately increase all style terms simultaneously.

## 8. Recommended Next Iteration

For the next main run:

- keep the current `1.5e-4` LR and 300-epoch schedule;
- run until the first two full-eval checkpoints (epochs 50 and 100) before judging;
- if fog persists:
  - lower `cycle_lowpass_strength`,
  - reduce `w_delta_tv`,
  - reduce early `w_struct/w_edge` further before touching style terms.
