Graduate Theses & Dissertations


Support Vector Machines for Automated Galaxy Classification
Support Vector Machines (SVMs) are a deterministic, supervised machine learning algorithm that has been successfully applied to many areas of research. They are heavily grounded in mathematical theory and are effective at processing high-dimensional data. This thesis models a variety of galaxy classification tasks using SVMs and data from the Galaxy Zoo 2 project. SVM parameters were tuned in parallel using resources from Compute Canada, and a total of four experiments were completed to determine if invariance training and ensembles can be utilized to improve classification performance. It was found that SVMs performed well at many of the galaxy classification tasks examined, and the additional techniques explored did not provide a considerable improvement. Author Keywords: Compute Canada, Kernel, SDSS, SHARCNET, Support Vector Machine, SVM
Predicting Irregularities in Arrival Times for Toronto Transit Buses with LSTM Recurrent Neural Networks Using Vehicle Locations and Weather Data
Public transportation systems play an important role in the quality of life of citizens in any metropolitan city. However, public transportation authorities face criticism from commuters due to irregularities in bus arrival times. For example, transit bus users often complain when they miss the bus because it arrived too early or too late at the bus stop. Due to these irregularities, commuters may miss important appointments, wait too long at the bus stop, or arrive late for work. This thesis seeks to predict the occurrence of irregularities in bus arrival times by developing machine learning models that use GPS locations of transit buses provided by the Toronto Transit Commission (TTC) and hourly weather data. We found that nearly 37% of the time, buses arrive either early or late by more than 5 minutes, suggesting room for improvement in the current strategies employed by transit authorities. We compared the performance of three machine learning models, of which our Long Short-Term Memory (LSTM) [13] model outperformed the others in terms of accuracy. The LSTM model had a lower error rate than both the Artificial Neural Network (ANN) and support vector regression (SVR) models. The improved accuracy achieved by LSTM is due to its ability to adjust and update the weights of neurons while maintaining long-term dependencies when encountering a new stream of data. Author Keywords: ANN, LSTM, Machine Learning
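The irregularity criterion mentioned in this abstract (an arrival more than 5 minutes early or late) can be illustrated with a minimal sketch. The function names, the minutes-based inputs, and the sample values are assumptions made for illustration; they are not taken from the thesis and do not reproduce its data or models.

```python
def is_irregular(scheduled_min, actual_min, threshold_min=5.0):
    """Return True when the arrival deviates from schedule by more than the threshold.

    Times are minutes on a shared clock (an assumed representation).
    """
    deviation = actual_min - scheduled_min  # negative means early, positive means late
    return abs(deviation) > threshold_min


def irregular_share(arrivals, threshold_min=5.0):
    """Fraction of (scheduled, actual) pairs that count as irregular."""
    if not arrivals:
        return 0.0
    flagged = sum(
        1 for scheduled, actual in arrivals
        if is_irregular(scheduled, actual, threshold_min)
    )
    return flagged / len(arrivals)


# Hypothetical sample: two of the four arrivals are off by more than 5 minutes.
sample = [(0, 6), (0, 0), (10, 4), (10, 12)]
share = irregular_share(sample)
```

A label of this kind could serve as the target for a classifier, although the abstract does not say how the thesis encodes its prediction target.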
Relationship Between Precarious Employment, Behaviour Addictions and Substance Use Among Canadian Young Adults
This thesis utilized a unique data-set, the Quinte Longitudinal Survey, to explore relationships among precarious employment and a range of mental health problems in a representative sample of Ontario young adults. Study 1 focused on various behavioural addictions (such as problem gambling, video gaming, internet use, exercise, compulsive shopping, and sex) and precarious employment. The results showed that precariously employed men were preoccupied with gambling and sex while their female counterparts preferred shopping. Gambling and excessive shopping diminished over time while excessive sexual practices increased. Study 2 focused on the association between precarious employment and substance abuse (such as tobacco, alcohol, cannabis, hallucinogens, stimulants, and other substances). The results showed that men used cannabis more than women, and the non-precarious employed group abused alcohol more than individuals in the precarious group. This research has implications for both health care professionals and intervention program developers when working with young adults in precarious jobs. Author Keywords: Behaviour Addictions, Precarious Employment, Substance Abuse, Young Adults
Exploring the Scalability of Deep Learning on GPU Clusters
In recent years, we have observed an unprecedented rise in popularity of AI-powered systems. They have become ubiquitous in modern life, being used by countless people every day. Many of these AI systems are powered, entirely or partially, by deep learning models. From language translation to image recognition, deep learning models are being used to build systems with unprecedented accuracy. The primary downside is the significant time required to train the models. Fortunately, the time needed for training the models is reduced through the use of GPUs rather than CPUs. However, with model complexity ever increasing, training times even with GPUs are on the rise. One possible solution to ever-increasing training times is to use parallelization to enable the distributed training of models on GPU clusters. This thesis investigates how to utilise clusters of GPU-accelerated nodes to achieve the best scalability possible, thus minimising model training times. Author Keywords: Compute Canada, Deep Learning, Distributed Computing, Horovod, Parallel Computing, TensorFlow
Psychometric Properties of a Scale Developed from a Three-Factor Model of Social Competency
While existing models of emotional intelligence (EI) generally recognize the importance of social competencies (SC), there is a tendency in the literature to narrow the focus to competencies that pertain to the self. Given the experiential and perceptual differences between self- vs. other-oriented emotional abilities, this is an important limitation of existing EI models and assessment tools. This thesis explores the psychometric properties of a multidimensional model for SC. Chapter 1 describes the evolution of work on SCs in modern psychology and describes the multidimensional model of SC under review. Chapter 2 replicates this model across a variety of samples and explores the model’s construct validity via basic personality and EI constructs. Chapter 3 further explores the predictive validity of the SC measure within a group of project managers and several success and wellness variables. Chapter 4 examines potential applications for the model and suggestions for further research. Author Keywords: emotional intelligence, project management, social competency, work readiness
Cloud Versus Bare Metal
A comparison of two high performance computing clusters running on AWS and Sharcnet was done to determine which scenarios yield the best performance. Algorithm complexity ranged from O(n) to O(n^3). Data sizes ranged from 195 KB to 2 GB. The Sharcnet hardware consisted of Intel E5-2683 and Intel E7-4850 processors with memory sizes ranging from 256 GB to 3072 GB. On AWS, C4.8xlarge instances were used, which run on Intel Xeon E5-2666 processors with 60 GB per instance. AWS was able to launch jobs immediately regardless of job size. The only limiting factors on AWS were algorithm complexity and memory usage, suggesting a memory bottleneck. Sharcnet had the best performance but could be hampered by the job scheduler. In conclusion, Sharcnet is best used when the algorithm is complex and has high memory usage. AWS is best used when immediate processing is required. Author Keywords: AWS, cloud, HPC, parallelism, Sharcnet
Predicting the Pursuit of Post-Secondary Education
Trait Emotional Intelligence (EI) includes competencies and dispositions related to identifying, understanding, using and managing emotions. Higher trait EI has been implicated in post-secondary success, and better career-related decision-making. However, there is no evidence for whether it predicts the pursuit of post-secondary education (PSE) in emerging adulthood. This study investigated the role of trait EI in PSE pursuit using a large, nationally-representative sample of Canadian young adults who participated in the National Longitudinal Survey for Children and Youth (NLSCY). Participants in this dataset reported on their PSE status at three biennial waves (age 20-21, 22-23, and 24-25), and completed a four-factor self-report scale for trait EI (Emotional Quotient Inventory: Mini) at ages 20-21 and 24-25. Higher trait EI subscale scores were significantly associated with greater likelihood of PSE participation both concurrently, and at 2- and 4-year follow-ups. Overall, these associations were larger for men than women. Trait EI scores also showed moderate levels of temporal stability over four years, including full configural and at least partial metric invariance between time points. This suggests that the measure stays conceptually consistent over the four years of emerging adulthood, and that trait EI is a relatively malleable attribute, susceptible to change with interventions during this age period. Author Keywords: Emerging Adulthood, Longitudinal, Post-Secondary Pursuit, Trait Emotional Intelligence
An Ethical Analysis of Bell's Targeted Ad Program
Online behavioural advertising (OBA) is an advertising technique which relies on collected customer information and online activity to serve people with more relevant ads. On November 16th, 2013, Bell Canada launched their first OBA program via Bell Mobility: the Bell Targeted Ads Program, or BTAP. My thesis provides an ethical analysis of BTAP and shows that Bell undermined and violated customer privacy, stifled customer autonomy, and harmed customer identity. Relevant moral problems include typification, a disrespecting of customer autonomy, and identity commodification. I show that BTAP was unethical by grounding my arguments within the moral framework of Information Ethics (IE). IE is an ethical system which focuses on the role of information in ethical dilemmas. IE also justifies the self-constitutive theory of privacy (SCP) which argues that our information and privacy are entangled with our identities. This gives us strong reason to defend our privacy/identity within BTAP. After making several arguments which demonstrate that BTAP was unethical, I will then turn my attention to showing how it is possible to rectify and mitigate many of BTAP’s ethical problems by installing a two-stage opt-in (TSOI) which provides customers with a greater degree of autonomy, and the ability to remove themselves from BTAP. Author Keywords: Bell Canada, Ethics, Identity, Online Behavioural Advertising, Privacy, Targeted Advertising
Range-Based Component Models for Conditional Volatility and Dynamic Correlations
Volatility modelling is an important task in the financial markets. This paper first evaluates the range-based DCC-CARR model of Chou et al. (2009) in modelling larger systems of assets, vis-à-vis the traditional return-based DCC-GARCH. Extending Colacito, Engle and Ghysels (2011), range-based volatility specifications are then employed in the first-stage of DCC-MIDAS conditional covariance estimation, including the CARR model of Chou et al. (2005). A range-based analog to the GARCH-MIDAS model of Engle, Ghysels and Sohn (2013) is also proposed and tested - which decomposes volatility into short- and long-run components and corrects for microstructure biases inherent to high-frequency price-range data. Estimator forecasts are evaluated and compared in a minimum-variance portfolio allocation experiment following the methodology of Engle and Colacito (2006). Some consistent inferences are drawn from the results, supporting the models proposed here as empirically relevant alternatives. Range-based DCC-MIDAS estimates produce efficiency gains over DCC-CARR which increase with portfolio size. Author Keywords: asset allocation, DCC MIDAS, dynamic correlations, forecasting, portfolio risk management, volatility
Agro-Ecological Zoning (AEZ) of Southern Ontario and the Projected Shifts Caused by Climate Change in the Long-term Future
This thesis proposes an agro-ecological zoning (AEZ) methodology of southern Ontario for the characterization and mapping of agro-ecological zones during the historical term (1981-2010), and their shifts into the long-term (2041-2070) projected climate period. Agro-ecological zones are homogeneous areas with a unique combination of climate, soil, and landscape features that are important for crop growth. Future climate variables were derived from Earth System Models (ESMs) using a high emission climate forcing scenario from the Intergovernmental Panel on Climate Change 5th Assessment Report. The spatiotemporal shifts in agro-ecological zones with projected climate change are analyzed using the changes to the length of growing period (LGP) and crop heat units (CHU), and their manifestation in agro-climatic zones (ACZ). There are significant increases to the LGP and CHU into the long-term future. Two historical ACZs exist in the long-term future, and have decreased in area and shifted northward from their historical locations. Author Keywords: Agro-climatic Zones, Agro-ecological Zones, Agro-ecological Zoning, Climate Change, Crop Heat Units, Length of Growing Period
Disability-Mitigating Effects of Education on Post-Injury Employment Dynamics
Using data drawn from the Workplace Safety and Insurance Board’s (WSIB) Survey of Workers with Permanent Impairments, this thesis explores if and how the human capital associated with education mitigates the realized work-disabling effects of permanent physical injury. Using Cater’s (2000) model of post-injury adaptive behaviour and employment dynamics as the structural, theoretical, and interpretative framework, this thesis jointly studies, by injury type, the effects of education on both the post-injury probability of transitioning from non-employment into employment and the post-injury probability of remaining in employment once employed. The results generally show that, for a given injury type, other things being equal, higher levels of education are associated with higher probabilities of both obtaining and sustaining employment. Author Keywords: permanent impairment, permanent injury, post-injury employment
Machine Learning Using Topology Signatures For Associative Memory
This thesis presents a technique to produce signatures from topologies generated by the Growing Neural Gas algorithm. The generated signatures have the following characteristics: The signature's memory footprint is smaller than the "real object" and it represents a point in the n x m multidimensional space. Signatures can be compared based on Euclidean distance, and distances between signatures provide measurements of differences between models. Signatures can be associated with a concept and then be used as a learning step for a classification algorithm. The signatures are normalized and vectorized to be used in multidimensional space clustering. Although the technique is generic in essence, it was tested by classifying alphabetic and numerical handwritten characters and 2D figures, obtaining good accuracy and precision. It can be used for many other purposes related to the classification of shapes and abstract topologies and to associative memory. Future work could incorporate other classifiers. Author Keywords: Associative memory, Character recognition, Machine learning, Neural gas, Topological signatures, Unsupervised learning
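The comparison step described in this abstract (vectorized, normalized signatures compared by Euclidean distance and associated with a concept) can be illustrated with a minimal sketch. It does not implement Growing Neural Gas or the thesis's signature construction. The unit-length normalization, the nearest-signature rule, and all function names and sample values are assumptions chosen for illustration.

```python
import math


def vectorize(matrix):
    """Flatten an n x m signature (list of rows) into a single vector."""
    return [value for row in matrix for value in row]


def normalize(vector):
    """Scale a vector to unit Euclidean length (assumed normalization scheme)."""
    norm = math.sqrt(sum(value * value for value in vector))
    if norm == 0:
        return list(vector)
    return [value / norm for value in vector]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(signature, labelled_signatures):
    """Return the concept label of the closest stored signature."""
    query = normalize(vectorize(signature))
    best_label, best_distance = None, math.inf
    for label, reference in labelled_signatures:
        distance = euclidean(query, normalize(vectorize(reference)))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Hypothetical 2 x 2 signatures, each associated with a concept label.
known = [("A", [[1, 0], [0, 1]]), ("B", [[0, 1], [1, 0]])]
label = classify([[0.9, 0.1], [0.1, 0.8]], known)
```

A nearest-signature rule is only one way to use the distances; the abstract notes that other classifiers could be incorporated.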

