[Tensorflow] Using Multiple GPUs (GPU Parallel Processing)

When the dataset is large

and several GPUs are available,

there is a more efficient way to train a model.

If you use a distribution strategy that spreads training across several GPUs instead of just one, you can train the model faster.

(Of course, with very small datasets, the overhead of splitting the data across devices can outweigh the gain. In that case, just use a single GPU.)

There are several approaches to GPU parallelism.

Let's briefly introduce the main ones and look at them in code.

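Method 1. MirroredStrategy

This is the approach TensorFlow recommends for training on multiple GPUs within a single machine.

It uses all of the specified GPUs (by default, every available GPU) at the same time. The input data is split across the GPUs and processed in parallel.

Each GPU holds a copy of the model and processes its share of the data. The gradients computed on each GPU are then aggregated (all-reduced) and used to update the model weights, so all copies stay identical.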
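To make the gradient aggregation concrete, here is a toy simulation of synchronous data parallelism in plain Python. No TensorFlow or GPUs are involved. The linear model, the mean-squared-error loss, and the helper names (grad_mse, split_batch, sync_step) are illustrative assumptions, not TensorFlow internals. Each simulated replica computes a gradient on its own shard, the gradients are averaged, and every replica applies the same update.

```python
# Toy sketch of synchronous data parallelism (the idea behind MirroredStrategy),
# simulated in plain Python; helper names are hypothetical, not TensorFlow APIs.

def grad_mse(w, b, xs, ys):
    """Gradient of the mean squared error of y_hat = w * x + b over one shard."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def split_batch(xs, ys, num_replicas):
    """Split the global batch into equal shards, one per replica (assumes divisibility)."""
    size = len(xs) // num_replicas
    return [(xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
            for i in range(num_replicas)]

def sync_step(w, b, xs, ys, num_replicas, lr=0.01):
    """One synchronous training step across num_replicas simulated replicas."""
    # 1) each replica computes gradients on its own shard
    grads = [grad_mse(w, b, sx, sy) for sx, sy in split_batch(xs, ys, num_replicas)]
    # 2) all-reduce: average the per-replica gradients
    dw = sum(g[0] for g in grads) / num_replicas
    db = sum(g[1] for g in grads) / num_replicas
    # 3) every replica applies the same update, so the model copies stay identical
    return w - lr * dw, b - lr * db

w2, b2 = sync_step(0.0, 0.0, [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0], num_replicas=2)
w1, b1 = sync_step(0.0, 0.0, [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0], num_replicas=1)
print(w2, b2, w1, b1)
```

When the shards are the same size, the average of the per-replica gradients equals the full-batch gradient. That is why synchronous training on N GPUs behaves like single-device training with an N-times larger batch.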

To use MirroredStrategy:

First, create a MirroredStrategy instance.

Second, build and compile the model inside the instance's scope.

Finally, train the model.
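The three steps above can be sketched end to end as follows. This is a minimal example that assumes TensorFlow 2.x. It uses a tiny hypothetical model, and random data stands in for a real dataset. If no GPU is visible, MirroredStrategy falls back to the CPU, so the sketch also runs on a machine without GPUs.

```python
import numpy as np
import tensorflow as tf

# Step 1: create the strategy. With no arguments it uses every visible GPU
# (and falls back to the CPU when none is found).
strategy = tf.distribute.MirroredStrategy()
print('Number of replicas:', strategy.num_replicas_in_sync)

# Step 2: build AND compile the model inside strategy.scope(),
# so its variables are created as mirrored variables on every replica.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

# Step 3: train as usual. The batch size passed to fit() is the GLOBAL batch,
# which is split evenly across the replicas.
per_replica_batch = 32
global_batch = per_replica_batch * strategy.num_replicas_in_sync

x = np.random.rand(256, 4).astype('float32')   # hypothetical random features
y = (x.sum(axis=1) > 2.0).astype('float32')     # hypothetical binary labels
history = model.fit(x, y, batch_size=global_batch, epochs=2, verbose=0)
```

Scaling the batch size by num_replicas_in_sync keeps the per-GPU batch constant as you add GPUs. If you keep the global batch fixed instead, each GPU simply receives a smaller slice.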

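# List the devices TensorFlow can see, to check the available GPUs
# (device_lib is an internal module; tf.config.list_physical_devices('GPU') is the public alternative)
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())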


# ------ Example output -----
# [name: "/device:CPU:0"
#  device_type: "CPU"
#  memory_limit: 268435456
#  locality {
#  }
#  incarnation: 5201132580995199256,
#  name: "/device:GPU:0"
#  device_type: "GPU"
#  memory_limit: 10770322560
#  locality {
#    bus_id: 1
#    links {
#    }
#  }
#  incarnation: 7702391586114156914
#  physical_device_desc: "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1",
#  name: "/device:GPU:1"
#  device_type: "GPU"
#  memory_limit: 10770692224

Now let's create a MirroredStrategy instance and check it.

import tensorflow as tf

# Logical GPU devices visible to TensorFlow
gpus = tf.config.list_logical_devices('GPU')

# With two or more GPUs, create a MirroredStrategy instance over all of them
if len(gpus) > 1:
    strategy = tf.distribute.MirroredStrategy([gpu.name for gpu in gpus])
    print('\n\n Running on multiple GPUs ', [gpu.name for gpu in gpus])

# With one GPU or none, use the default strategy
else:
    strategy = tf.distribute.get_strategy() # default strategy that works on CPU and single GPU
    # gpus[0] would raise an IndexError on a CPU-only machine, so check first
    if gpus:
        print('\n\n Running on single GPU', gpus[0].name)
    else:
        print('\n\n Running on CPU')

# Number of replicas that process each batch in parallel
print('\n\n #accelerators: ', strategy.num_replicas_in_sync, '\n\n')


# ----- Output ---
# INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
# Running on multiple GPUs  ['/device:GPU:0', '/device:GPU:1']

Build and train the deep learning model (design the model to suit your own data and analysis task).

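from tensorflow.keras import layers, models, optimizers

# train_generator and validation_generator are assumed to be defined beforehand
# (generators yielding 150x150 RGB images with binary labels)

with strategy.scope():

    # Build the model inside the scope so its variables are mirrored on every GPU
    model = models.Sequential()

    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))

    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))

    # Compile inside the scope as well ('lr' is deprecated; use 'learning_rate')
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizers.Adam(learning_rate=1e-4),
                  metrics=['acc'])

# fit() may be called outside the scope; each batch is split across the replicas
history = model.fit(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50
)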

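The most important point is that the model must be built and compiled inside the with strategy.scope() block. That is what allows TensorFlow to properly replicate the model's variables across the GPUs.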

Next, let's use nvidia-smi to check that the GPUs are actually being used.

The first screenshot was taken before training.

The next one was taken during training.
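If you would rather check GPU usage from a script than read the screen, a small helper can query nvidia-smi in CSV form. This sketch assumes the NVIDIA driver, and therefore nvidia-smi, is installed. The queried fields are index, utilization.gpu and memory.used. The function names are hypothetical helpers.

```python
import subprocess

# nvidia-smi query; columns: GPU index, utilization (%), memory used (MiB)
QUERY = ['nvidia-smi',
         '--query-gpu=index,utilization.gpu,memory.used',
         '--format=csv,noheader,nounits']

def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into a list of dicts, one per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(','))
        stats.append({'index': int(index),
                      'utilization_pct': int(util),
                      'memory_used_mib': int(mem)})
    return stats

def gpu_stats():
    """Run nvidia-smi (requires an NVIDIA driver) and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

Call gpu_stats() once before training and once while model.fit is running, for example from a separate process. With MirroredStrategy working as intended, the utilization of every GPU should rise, not just that of GPU 0.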

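Method 2. Parameter Server Strategy

From this method on, I won't include complete working examples. I'll explain with short snippets instead.

This method differs from MirroredStrategy in that it updates the parameters asynchronously (asynchronous replication).

As soon as a worker finishes computing its gradients, they are used to update the parameters, without waiting for the other workers.

As a result, convergence can be relatively slow, because some gradients are computed from parameters that other workers have already changed (stale gradients).

However, if the model is very heavy and large, this can be a better choice than MirroredStrategy. MirroredStrategy waits until every GPU has finished its computation, whereas this method updates the parameters with each GPU's result as soon as that GPU is done.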
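The trade-off can be illustrated with a toy, single-process simulation of asynchronous parameter-server updates. This is plain Python and not how TensorFlow implements ParameterServerStrategy. The compute times, the quadratic loss, and the function name are hypothetical. A server holds one parameter. Each worker pulls it, spends its own compute time on a gradient, and pushes the update as soon as it is done. The log records how stale each gradient was, meaning how many other updates landed while that worker was computing.

```python
import heapq

def async_ps_train(compute_times, steps_per_worker, lr=0.1, target=3.0):
    """Simulate asynchronous parameter-server updates for f(w) = (w - target)**2.

    compute_times[i] is how long worker i needs per gradient (hypothetical units).
    Returns the final parameter and a log of (time, worker, staleness) per update.
    """
    w, version = 0.0, 0          # parameter held by the server and its update count
    log = []
    # Each event: (finish_time, worker_id, step, w_read, version_read)
    events = [(t, i, 1, w, version) for i, t in enumerate(compute_times)]
    heapq.heapify(events)
    while events:
        now, i, step, w_read, version_read = heapq.heappop(events)
        grad = 2 * (w_read - target)           # gradient from a possibly stale copy
        w -= lr * grad                         # server applies it at once, no barrier
        log.append((now, i, version - version_read))  # updates missed meanwhile
        version += 1
        if step < steps_per_worker:            # worker pulls fresh params, starts again
            heapq.heappush(events, (now + compute_times[i], i, step + 1, w, version))
    return w, log

w, log = async_ps_train([1.0, 3.0], steps_per_worker=3)
for entry in log:
    print(entry)
```

With compute_times=[1.0, 3.0], the fast worker makes three updates before the slow worker's first gradient arrives. That gradient was computed from parameters that are three updates old, and no worker ever waits for another. A synchronous strategy would instead make the fast worker idle until the slow one finishes.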

A Parameter Server Strategy instance can be created as follows:

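cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()  # cluster layout is read from the TF_CONFIG env var
strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)  # TF 2.4+ requires a cluster resolver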

Method 3. MultiWorkerMirroredStrategy

Like Method 1, this method processes the data on every GPU simultaneously (a synchronous data-parallelism approach).

The only difference is that it is used to train a model across multiple machines.

The instance is created as follows:

# Stable name since TF 2.4; older versions use tf.distribute.experimental.MultiWorkerMirroredStrategy
strategy = tf.distribute.MultiWorkerMirroredStrategy()
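MultiWorkerMirroredStrategy finds out about the other machines through the TF_CONFIG environment variable. That variable must be set on every machine before the strategy is created. The sketch below builds the JSON. The host addresses are hypothetical placeholders, and make_tf_config is an illustrative helper rather than a TensorFlow function. Every machine gets the same cluster list and differs only in its task index.

```python
import json
import os

def make_tf_config(worker_addresses, task_index):
    """Build the TF_CONFIG JSON string read by MultiWorkerMirroredStrategy.

    worker_addresses: list of "host:port" strings, identical on every machine.
    task_index: position of THIS machine in that list (index 0 acts as the chief).
    """
    return json.dumps({
        'cluster': {'worker': list(worker_addresses)},
        'task': {'type': 'worker', 'index': task_index},
    })

# Hypothetical two-machine cluster; replace with your own hosts and ports.
WORKERS = ['10.0.0.1:12345', '10.0.0.2:12345']

# Use index 0 on the first machine and index 1 on the second.
# TF_CONFIG must be set before the strategy is created.
os.environ['TF_CONFIG'] = make_tf_config(WORKERS, task_index=0)
# strategy = tf.distribute.MultiWorkerMirroredStrategy()
```

Training does not start until every worker listed in the cluster has come up. Launch the same script on each machine, changing only task_index.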

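We have now looked at how these methods differ.

Choose the one that fits your available resources and the complexity of your data and model.

For a more detailed explanation of distributed training, this well-organized blog post is worth a read: https://wooono.tistory.com/331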
