Graduate Theses & Dissertations

Exploring the Scalability of Deep Learning on GPU Clusters
In recent years, we have observed an unprecedented rise in popularity of AI-powered systems. They have become ubiquitous in modern life, being used by countless people every day. Many of these AI systems are powered, entirely or partially, by deep learning models. From language translation to image recognition, deep learning models are being used to build systems with unprecedented accuracy. The primary downside, is the significant time required to train the models. Fortunately, the time needed for training the models is reduced through the use of GPUs rather than CPUs. However, with model complexity ever increasing, training times even with GPUs are on the rise. One possible solution to ever-increasing training times is to use parallelization to enable the distributed training of models on GPU clusters. This thesis investigates how to utilise clusters of GPU-accelerated nodes to achieve the best scalability possible, thus minimising model training times. Author Keywords: Compute Canada, Deep Learning, Distributed Computing, Horovod, Parallel Computing, TensorFlow

Search Our Digital Collections

Query

Enabled Filters

  • (-) ≠ Reid
  • (-) ≠ Bowman
  • (-) = Computer science
  • (-) ≠ Weygang
  • (-) = Applied Modeling and Quantitative Methods
  • (-) ≠ Mathematics
  • (-) = Williams, Taylor Alan

Filter Results

Date

2011 - 2021
(decades)
Specify date range: Show
Format: 2021/10/21

Author Name

Degree