42 changes: 19 additions & 23 deletions recipes_source/compiling_optimizer.rst
@@ -1,25 +1,24 @@
(beta) Compiling the optimizer with torch.compile
==========================================================================================

**Author:** `Michael Lazos <https://github.com/mlazos>`_
**Translator:** `김승환 <https://github.com/7SH7>`_

The optimizer is a key algorithm for training any deep learning model.
Since it is responsible for updating every model parameter, it can often
become the bottleneck in training performance for large models. In this recipe,
we will apply ``torch.compile`` to the optimizer to observe the GPU performance
improvement.

.. note::

This tutorial requires PyTorch 2.2.0 or later.

Model Setup
~~~~~~~~~~~~~~~~~~~~~
For this example, we'll use a simple sequence of linear layers.
Since we are only benchmarking the optimizer, the choice of model doesn't matter
because optimizer performance is a function of the number of parameters.

Depending on what machine you are using, your exact results may vary.

.. code-block:: python

@@ -32,19 +31,17 @@ Depending on what machine you are using, your exact results may vary.
output = model(input)
output.sum().backward()
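
The diff collapses the lines that construct the model and its input. A minimal
sketch of what such a setup could look like, assuming a stack of
``torch.nn.Linear`` layers on a CUDA device (the layer sizes and count here are
illustrative assumptions, not necessarily the recipe's original values):

.. code-block:: python

    import torch

    # A simple stack of linear layers; only the total parameter count matters
    # when benchmarking the optimizer.
    model = torch.nn.Sequential(
        *[torch.nn.Linear(1024, 1024, bias=False, device="cuda") for _ in range(10)]
    )
    input = torch.rand(1024, device="cuda")

    # Run one forward and backward pass so every parameter has a gradient
    # for the optimizer to consume.
    output = model(input)
    output.sum().backward()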

Setting up and running the optimizer benchmark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this example, we'll use the Adam optimizer
and create a helper function to wrap ``step()``
in ``torch.compile()``.

.. note::

``torch.compile`` is only supported on CUDA devices with compute capability >= 7.0.

.. code-block:: python

# exit cleanly if we are on a device that doesn't support torch.compile
if torch.cuda.get_device_capability() < (7, 0):
print("Exiting because torch.compile is not supported on this device.")
import sys
@@ -59,7 +56,7 @@ in ``torch.compile()``.
opt.step()


# Let's define a helpful benchmarking function:
import torch.utils.benchmark as benchmark


@@ -70,7 +67,7 @@ in ``torch.compile()``.
return t0.blocked_autorange().mean * 1e6


# Warmup runs to compile the function
for _ in range(5):
fn()

@@ -82,13 +79,12 @@ in ``torch.compile()``.
print(f"eager runtime: {eager_runtime}us")
print(f"compiled runtime: {compiled_runtime}us")

Sample Results:

* Eager runtime: 747.2437149845064us
* Compiled runtime: 392.07384741178us

See Also
~~~~~~~~~

* For an in-depth technical overview, see `Compiling the optimizer with PT2 <https://dev-discuss.pytorch.org/t/compiling-the-optimizer-with-pt2/1669>`__