Graduate Theses & Dissertations

Utilizing Class-Specific Thresholds Discovered by Outlier Detection
We investigated whether the performance of selected supervised machine-learning techniques could be improved by combining univariate outlier-detection techniques with machine-learning methods. We developed a framework to discover class-specific thresholds in class probability estimates using univariate outlier detection and proposed two novel techniques to utilize these class-specific thresholds. These techniques were applied to various data sets and the results were evaluated. Our experimental results suggest that some of our techniques may improve the recall of the base learner. Additional results suggest that one technique may produce higher accuracy and precision than AdaBoost.M1, while another may produce higher recall. Finally, our results suggest that our techniques can achieve higher accuracy, precision, or recall in cases where AdaBoost.M1 fails to outperform the base learner. Author Keywords: AdaBoost, Boosting, Classification, Class-Specific Thresholds, Machine Learning, Outliers
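The abstract does not name the univariate outlier-detection method, so the following Python sketch assumes Tukey's lower fence (Q1 - 1.5 IQR). For each class, the fence is computed over the probabilities that the base learner assigns to each training instance's true class, and it then serves as that class's threshold. The decision rule in `predict_with_thresholds` is a hypothetical illustration of how such thresholds might be used. It is not either of the two techniques proposed in the thesis, and all function names are illustrative.

```python
from statistics import quantiles

def tukey_lower_fence(values, k=1.5):
    """Lower Tukey fence Q1 - k * IQR of a univariate sample (needs >= 2 values)."""
    q1, _, q3 = quantiles(values, n=4)
    return q1 - k * (q3 - q1)

def class_thresholds(probas, labels, k=1.5):
    """One threshold per class, derived from the probability that each training
    instance of that class receives for its own (true) class.

    probas: list of dicts mapping class label -> estimated probability
    labels: list of true class labels, aligned with probas
    """
    thresholds = {}
    for c in set(labels):
        own = [p[c] for p, y in zip(probas, labels) if y == c]
        # Probabilities below the lower fence are univariate outliers for class c;
        # the fence itself (clipped at 0) becomes the class-specific threshold.
        thresholds[c] = max(0.0, tukey_lower_fence(own, k))
    return thresholds

def predict_with_thresholds(proba, thresholds):
    """Hypothetical rule: among classes whose probability reaches their own
    threshold, pick the most probable; if none qualifies, fall back to argmax."""
    eligible = {c: p for c, p in proba.items() if p >= thresholds.get(c, 0.0)}
    pool = eligible or proba
    return max(pool, key=pool.get)
```

A class whose correctly labelled instances tend to receive low probabilities ends up with a low threshold, so a rule like this one can favour that class more often. That is one plausible route to the recall gains the abstract reports.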
Time Series Algorithms in Machine Learning - A Graph Approach to Multivariate Forecasting
Forecasting future values of time series has long been a field with many and varied applications, from climate and weather forecasting to stock prediction, economic planning, and the control of industrial processes. Many of these problems involve not only a single time series but many simultaneous series that may influence each other. This thesis provides machine-learning methods for handling such problems. We first consider single time series with both single and multiple features. We review the algorithms and the unique challenges involved in applying machine learning to time series. When used for regression, many machine learning algorithms are designed to produce a single output value for each timestamp of interest, with no measure of confidence; however, evaluating the uncertainty of the predictions is an important component of practical forecasting. We therefore discuss methods of constructing uncertainty estimates in the form of prediction intervals for each prediction. Stability over long time horizons is also a concern for these algorithms, because recursion is a common method of generating predictions over long time intervals. To address this, we present methods of maintaining stability in the forecast even over large time horizons. These methods are applied to an electricity forecasting problem, where we demonstrate their effectiveness for support vector machines, neural networks, and gradient boosted trees. We next consider spatiotemporal problems, which consist of multiple interlinked time series, each of which may contain multiple features. We represent these problems using graphs, allowing us to learn relationships using graph neural networks. Existing methods generally use separate temporal and spatial (graph) layers, or simply replace operations in temporal layers with graph operations. We show that these approaches have difficulty learning relationships that contain time lags of several time steps. 
To address this, we propose a new layer inspired by the long short-term memory (LSTM) recurrent neural network, which adds a distinct memory state dedicated to learning graph relationships while keeping the original memory state. This allows the model to consider temporally distant events at other nodes without affecting its ability to model long-term relationships at a single node. We show that this model is capable of learning the long-term patterns that existing models struggle with. We then apply this model to a number of real-world bike-share and traffic datasets, where we observe improved performance compared with other models with similar numbers of parameters. Author Keywords: forecasting, graph neural network, LSTM, machine learning, neural network, time series
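The recursive multi-step forecasting and prediction intervals discussed in this abstract can be illustrated with a small Python sketch. Here a least-squares AR(1) model stands in for the support vector machines, neural networks, and gradient boosted trees used in the thesis. The intervals come from empirical quantiles of in-sample horizon-k residuals, which is one common approach but not necessarily the thesis's. The stability techniques and the graph-based LSTM layer are not shown, and every function name is hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; a stand-in one-step model."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def recursive_forecast(model, last_value, horizon):
    """Multi-step forecast: each prediction is fed back as the next input."""
    a, b = model
    preds, value = [], last_value
    for _ in range(horizon):
        value = a + b * value
        preds.append(value)
    return preds

def empirical_quantile(values, q):
    """Nearest-rank style quantile of a list (simple, no interpolation)."""
    s = sorted(values)
    return s[min(len(s) - 1, max(0, round(q * (len(s) - 1))))]

def forecast_with_intervals(series, horizon, alpha=0.1):
    """Point forecasts with (1 - alpha) intervals built from in-sample
    horizon-k residuals, so each step ahead gets its own band."""
    model = fit_ar1(series)
    residuals = [[] for _ in range(horizon)]
    for t in range(len(series) - horizon):
        preds = recursive_forecast(model, series[t], horizon)
        for k in range(horizon):
            residuals[k].append(series[t + k + 1] - preds[k])
    point = recursive_forecast(model, series[-1], horizon)
    return [(p + empirical_quantile(residuals[k], alpha / 2), p,
             p + empirical_quantile(residuals[k], 1 - alpha / 2))
            for k, p in enumerate(point)]
```

Computing residuals separately for each horizon lets errors that accumulate through recursion show up as typically wider bands at later steps. A single set of one-step residuals would hide that effect.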
THE PROPENSITY TOWARD EXTREMIST MIND-SET AS PREDICTED BY PERSONALITY, MOTIVATION, AND SELF-CONSTRUAL
Nick Fauset. Multivariate regression analyses were used to determine the effects of Personality (Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness), Motivation (External, Amotivation, Intrinsic, and Identified), and Self-Construal (Independent and Interdependent) on three domains of Extremist Mind-Set (Proviolence, Vile World, and Divine Power). Participants were first-year undergraduate students (209 females, 76 males) enrolled in Introductory Psychology (N=279) and/or Introductory Economics (N=7), who participated for course credit. Students found the Motivation measure difficult to complete, and this variable was dropped from the model because of missing data. Decreases in Neuroticism, Openness, Agreeableness, and Interdependent Self-Construal were significantly correlated with increases in Proviolence. Decreases in Agreeableness were correlated with increases in Vile World. Decreases in Openness and increases in Agreeableness and Interdependent Self-Construal were significantly correlated with increases in Divine Power. These observations provide an interesting perspective on the types of Canadian undergraduate students who are more likely to score highly on measures of extremism. Keywords: Militant Extremist Mental Mind-Set, Extremism, Personality, Five Factor Model, Motivation, Intrinsic, Extrinsic, Self-Construal, Independent, Interdependent Author Keywords: Extremism, Militant Extremist Mental Mind-Set, Motivation, Personality, Self-Construal
Support Vector Machines for Automated Galaxy Classification
Support Vector Machines (SVMs) are deterministic, supervised machine learning algorithms that have been successfully applied to many areas of research. They are heavily grounded in mathematical theory and are effective at processing high-dimensional data. This thesis models a variety of galaxy classification tasks using SVMs and data from the Galaxy Zoo 2 project. SVM parameters were tuned in parallel using resources from Compute Canada, and four experiments were completed to determine whether invariance training and ensembles can improve classification performance. SVMs were found to perform well at many of the galaxy classification tasks examined, and the additional techniques explored did not provide a considerable improvement. Author Keywords: Compute Canada, Kernel, SDSS, SHARCNET, Support Vector Machine, SVM
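The thesis tunes kernel SVMs. As a self-contained illustration of what SVM training optimizes, the Python sketch below swaps in the Pegasos sub-gradient method for a linear SVM (hinge loss plus L2 regularization). It does not use kernels, parameter grids, or Compute Canada resources, and the function names and regularization value `lam` are illustrative assumptions.

```python
def pegasos_train(X, y, lam=0.1, epochs=1000):
    """Linear SVM via the Pegasos sub-gradient method (hinge loss + L2).

    A constant 1 is appended to each sample so the bias is learned
    (and, in this simplified version, regularized) with the weights.
    Samples are visited in a fixed cyclic order so runs are reproducible.
    Labels must be +1 or -1.
    """
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xa, y):
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink toward zero (gradient of the L2 term) ...
            w = [(1 - eta * lam) * wj for wj in w]
            # ... and, if the hinge loss is active, step toward the sample.
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Each candidate value of `lam` (or, for kernel SVMs, each (C, gamma) pair) can be trained and validated independently. That independence is what makes the kind of parallel parameter tuning described in the abstract straightforward.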
Stability Properties of Disease Models under Economic Expectations
Understanding the dynamics of infectious diseases is very important in formulating public health policies to tackle their prevalence. Mathematical epidemiology (ME) has played a vital role in this effort. Nevertheless, classical mathematical epidemiological models do not explicitly model the behavioural responses of individuals to the prevalence of these diseases. Economic epidemiology (EE) has stepped in to fill this gap by integrating economic and mathematical concepts within one framework. This thesis investigated two issues in this area, using standard linear stability analysis of dynamical systems and numerical simulation. The investigations and findings are as follows. First, an investigation into the stability properties of the equilibria of EE models is carried out. We investigated the stability properties of modified versions of the EE systems studied by Aadland et al. [6], introducing a parametric quadratic utility function so that the maximum number of contacts made by rational individuals is determined by a parameter. This parameter in particular influences the level of utility of rational individuals. We have shown that if rational individuals can choose from a range of possible contacts, with the maximum number of allowable contacts depending on a parameter, then variation in this parameter tends to affect the stability properties of the system. We also showed that, under the assumption of permanent recovery from the disease, and whether or not individuals observe their immunity, death and birth rates can affect the stability of the system. These parameters also affect the dynamics of the EE SIS system. 
Second, an EE model of syphilis infectivity among “men who have sex with men” (MSM) in detention centres is developed to examine the effect of behavioural responses on the disease dynamics among MSM. This was done by explicitly incorporating the interplay between the biology of the disease and the behaviour of the inmates. We investigated the stability properties of the system under rational expectations and showed that: (1) Behavioural responses to the prevalence of the disease affect the stability of the system. Therefore, public health policies tend to put the system on indeterminate paths if rational MSM have complete knowledge of the laws governing the motion of the disease states as well as a complete understanding of how others in the system behave when faced with risk-benefit trade-offs. (2) The long-run prevalence of the disease is influenced by incentives that drive the utility of the MSM inmates. (3) When subjected to a small perturbation, the system that combines the biology of the disease with the behavioural responses of rational MSM tends to return to equilibrium more quickly than its counterpart, in which the system depends solely on the biology of the disease. Author Keywords: economic and mathematical epidemiology models, explosive path, indeterminate-path stability, numerical solution, health gap, saddle-path stability, syphilis
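The linear stability analysis mentioned in this abstract can be illustrated on the classical SIS model without behavioural responses, dI/dt = f(I) = βI(N - I)/N - γI. Its equilibria are I* = 0 and, when β > γ, I* = N(1 - γ/β). An equilibrium is locally stable when f'(I*) = β(1 - 2I*/N) - γ is negative. At the endemic equilibrium this derivative equals γ - β. The Python sketch below checks the sign numerically. The EE systems in the thesis are multidimensional, so there one would examine the eigenvalues of the Jacobian instead (and, under rational expectations, saddle-path conditions). The function names are illustrative.

```python
def sis_rhs(I, beta, gamma, N=1.0):
    """Right-hand side f(I) = beta*I*(N - I)/N - gamma*I of the classical SIS model."""
    return beta * I * (N - I) / N - gamma * I

def sis_equilibria(beta, gamma, N=1.0):
    """Disease-free equilibrium, plus the endemic one when beta > gamma."""
    eq = [0.0]
    if beta > gamma:
        eq.append(N * (1 - gamma / beta))
    return eq

def classify(I_star, beta, gamma, N=1.0, h=1e-6):
    """Linear stability of a scalar equilibrium from the sign of f'(I*),
    estimated by a central difference."""
    slope = (sis_rhs(I_star + h, beta, gamma, N)
             - sis_rhs(I_star - h, beta, gamma, N)) / (2 * h)
    if slope < 0:
        return "stable"
    if slope > 0:
        return "unstable"
    return "inconclusive"
```

With β = 0.5 and γ = 0.2, the disease-free state is unstable (f'(0) = 0.3) and the endemic state I* = 0.6 is stable (f'(0.6) = -0.3).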
Solving Differential and Integro-Differential Boundary Value Problems using a Numerical Sinc-Collocation Method Based on Derivative Interpolation
In this thesis, a new sinc-collocation method based upon derivative interpolation is developed for solving linear and nonlinear boundary value problems (BVPs) involving differential as well as integro-differential equations. The sinc-collocation method is chosen for its ease of implementation, exponential convergence of error, and ability to handle singularities in the BVP. We present a unique method of treating boundary conditions and introduce the concept of a stretch factor into the conformal mappings of domains. The result is a method that achieves high accuracy while reducing computational cost. In most cases, the results from the method greatly exceed the published results of comparable methods in both accuracy and efficiency. The method is tested on the Blasius and Lane-Emden problems and is generalised to cover Fredholm-Volterra integro-differential problems. The results show that the sinc-collocation method with derivative interpolation is a viable and preferable method for solving nonlinear BVPs. Author Keywords: Blasius, Boundary Value Problem, Exponential convergence, Integro-differential, Nonlinear, Sinc
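For readers unfamiliar with sinc methods, the Python sketch below shows the classical building block. A function on (a, b) is interpolated through the conformal map φ(x) = ln((x - a)/(b - x)), using sinc basis functions centred at the nodes x_k = φ⁻¹(kh). This is plain sinc interpolation, not the thesis's derivative-interpolation scheme, boundary treatment, or stretch factor. The step size h = π/√(2N) is an assumed choice for illustration.

```python
import math

def sinc(t):
    """Normalized sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def phi(x, a=0.0, b=1.0):
    """Conformal map of the interval (a, b) onto the real line."""
    return math.log((x - a) / (b - x))

def sinc_nodes(N, h, a=0.0, b=1.0):
    """Sinc points x_k = phi^{-1}(k h), k = -N..N."""
    return [(a + b * math.exp(k * h)) / (1 + math.exp(k * h)) for k in range(-N, N + 1)]

def sinc_interpolate(f, x, N, a=0.0, b=1.0):
    """Approximate f(x) by sum_k f(x_k) * sinc((phi(x) - k h) / h)."""
    h = math.pi / math.sqrt(2 * N)        # assumed step-size rule
    u = phi(x, a, b)
    nodes = sinc_nodes(N, h, a, b)
    return sum(f(xk) * sinc((u - k * h) / h) for k, xk in zip(range(-N, N + 1), nodes))
```

This basic form assumes that f vanishes at both endpoints. Problems with nonzero boundary values need a separate boundary treatment, which is where the thesis presents its own method.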
SMOTE and Performance Measures for Machine Learning Applied to Real-Time Bidding
In the context of Real-Time Bidding (RTB), the machine learning problems of imbalanced classes and model selection are investigated. The Synthetic Minority Oversampling Technique (SMOTE) is commonly used to combat imbalanced classes, but a shortcoming is identified. A distance threshold is proposed as a solution, and testing in a live RTB environment shows significant improvement. For model selection, the statistical measure Critical Success Index (CSI) is modified to place more emphasis on recall. This new measure (CSI-R) is empirically compared with other measures such as accuracy, lift, efficiency, true skill score, Heidke's skill score, and Gilbert's skill score. In all cases, CSI-R is shown to be better suited to the RTB industry. Author Keywords: imbalanced classes, machine learning, online advertising, performance measures, real-time bidding, SMOTE
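Classical SMOTE creates a synthetic minority sample by picking one of a sample's k nearest minority neighbours and interpolating a random fraction of the way toward it. The abstract does not specify how the distance threshold is applied, so the Python sketch below assumes the simplest version. Neighbours farther than a hypothetical `max_distance` are discarded, and isolated samples produce no synthetic points. The sketch also computes the standard Critical Success Index, CSI = TP/(TP + FP + FN). The recall-weighted CSI-R is not reproduced because its definition is not given here.

```python
import math
import random

def smote(minority, n_per_sample, k=5, max_distance=None, seed=0):
    """Classic SMOTE: x_new = x + u * (neighbour - x), with u ~ U(0, 1).

    max_distance (hypothetical) discards neighbours farther than this,
    so no synthetic point is interpolated across a large gap.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbours = others[:k]
        if max_distance is not None:
            neighbours = [p for p in neighbours if math.dist(x, p) <= max_distance]
        if not neighbours:
            continue  # isolated sample: generate nothing from it
        for _ in range(n_per_sample):
            nb = rng.choice(neighbours)
            u = rng.random()
            synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

def csi(tp, fp, fn):
    """Critical Success Index (threat score): TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0
```

Without a threshold, an outlying minority point still gets paired with distant neighbours, and the synthetic points generated between them can land inside majority-class regions. The threshold prevents this.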
Sinc-Collocation Difference Methods for Solving the Gross-Pitaevskii Equation
The time-dependent Gross-Pitaevskii Equation, describing the movement of particles in quantum mechanics, cannot in general be solved analytically because of its inherent nonlinearity. Hence numerical methods are important for approximating the solution. This study develops a discrete scheme in time and space to simulate the solution on a finite domain, using the Crank-Nicolson difference method in time and Sinc Collocation Methods (SCM) in space. In theory and in practice, the time discretization is second-order accurate, while the SCM errors decay exponentially. A new SCM with a unique boundary treatment is proposed and compared with the original SCM and other similar numerical techniques in terms of time cost and numerical error. The errors of the new SCM decay faster than those of the original. Also, to attain the same accuracy, the new SCM interpolates fewer nodes than the original SCM, which saves computational cost. The new SCM is capable of approximating partial differential equations under different boundary conditions and can be widely applied in fitting theory. Author Keywords: Crank-Nicolson difference method, Gross-Pitaevskii Equation, Sinc-Collocation methods
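The time-stepping side of this scheme can be sketched in Python for the 1D equation i ψ_t = -½ψ_xx + V(x)ψ + g|ψ|²ψ with ψ = 0 at both ends. Two assumptions differ from the thesis. First, second-order finite differences replace sinc collocation in space. Second, the nonlinear term is frozen at the current time level, so each Crank-Nicolson step reduces to a linear tridiagonal solve. Because the frozen Hamiltonian is real symmetric, each step is a Cayley transform and conserves the discrete norm up to rounding, which gives a simple correctness check.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system (complex entries allowed) by the Thomas algorithm.
    lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0j] * n, [0j] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0j
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0j] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def cn_step(psi, dx, dt, g, V):
    """One Crank-Nicolson step (I + r H) psi_new = (I - r H) psi, r = i*dt/2,
    on interior grid points with psi = 0 at both boundaries.
    H = -0.5 d^2/dx^2 + V + g|psi|^2, with |psi|^2 frozen at the current level."""
    n = len(psi)
    off = -0.5 / dx ** 2                                   # off-diagonal of H
    hd = [1.0 / dx ** 2 + V[j] + g * abs(psi[j]) ** 2 for j in range(n)]  # diagonal of H
    r = 0.5j * dt
    rhs = []
    for j in range(n):
        left = psi[j - 1] if j > 0 else 0j                 # Dirichlet boundary
        right = psi[j + 1] if j < n - 1 else 0j
        rhs.append(psi[j] - r * (hd[j] * psi[j] + off * (left + right)))
    diag = [1 + r * h for h in hd]
    return thomas([r * off] * n, diag, [r * off] * n, rhs)

def norm(psi, dx):
    """Discrete squared L2 norm, sum |psi_j|^2 dx."""
    return sum(abs(p) ** 2 for p in psi) * dx
```

The matrix I + rH is strictly diagonally dominant here, so the Thomas algorithm is stable without pivoting. A full GPE solver would typically iterate the nonlinear term within each step or use a splitting method instead of freezing it.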
Self-Organizing Maps and Galaxy Evolution
Artificial Neural Networks (ANNs) have been applied to many areas of research. These techniques use a series of object attributes and can be trained to recognize different classes of objects. The Self-Organizing Map (SOM) is an unsupervised machine learning technique that has been shown to be successful in mapping high-dimensional data into a 2D representation referred to as a map. These maps are easier to interpret and aid in the classification of data. In this work, the existing algorithms for the SOM have been extended to generate 3D maps. The higher dimensionality of the map makes more information available for interpreting classifications. The effectiveness of the implementation was verified using three separate standard datasets. Results from these investigations supported the expectation that a 3D SOM would result in a more effective classifier. The 3D SOM algorithm was then applied to an analysis of galaxy morphology classifications. It is postulated that the morphology of a galaxy relates directly to how it will evolve over time. In this work, the Spectral Energy Distribution (SED) is used as a source of galaxy attributes. The SED data were extracted from the NASA/IPAC Extragalactic Database (NED). The data were grouped into sample sets of matching frequencies, and the 3D SOM was applied as a morphological classifier. The SOMs created were shown to be effective as an unsupervised machine learning technique for classifying galaxies based solely on their SED. Morphological predictions for a number of galaxies were shown to agree with classifications obtained from new observations in NED. Author Keywords: Galaxy Morphology, Multi-wavelength, parallel, Self-Organizing Maps
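Extending a SOM from a 2D map to a 3D map mainly changes how units are indexed. Each unit sits at a lattice position (i, j, k), and the neighbourhood function is evaluated with distances on that 3D grid. The Python sketch below shows one online training loop under assumed linear decay schedules for the learning rate and neighbourhood width. It is not the thesis's implementation, and the function names and parameter values are illustrative.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position (i, j, k) of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))

def train_som_3d(data, shape=(3, 3, 3), epochs=100, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training on a 3D lattice of units.

    Learning rate and neighbourhood width decay linearly (assumed schedules);
    sigma is floored at 0.5 so the BMU's immediate neighbours still move a little.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    units = [(i, j, k) for i in range(shape[0])
             for j in range(shape[1]) for k in range(shape[2])]
    weights = {u: [rng.random() for _ in range(dim)] for u in units}
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data:
            frac = t / total
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for u in units:
                # Squared distance between lattice positions, not weight vectors.
                d2 = sum((a - b) ** 2 for a, b in zip(u, bmu))
                h = math.exp(-d2 / (2 * sigma ** 2))
                # Convex step toward x (lr * h <= 1), pulling grid neighbours along.
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            t += 1
    return weights
```

After training, `best_matching_unit(weights, x)` maps a new attribute vector (for example, an SED sample) to a position in the 3D map. That position could then be labelled by the classes of training samples that land there.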
SPAF-network with Saturating Pretraining Neurons
In this work, various aspects of neural networks pre-trained with denoising autoencoders (DAEs) are explored. To saturate neurons more quickly for feature learning in DAEs, an activation function that offers higher gradients is introduced. Moreover, sparsity functions applied to the hidden-layer representations are studied. More importantly, a technique that swaps the activation functions of a fully trained DAE to logistic functions is studied; networks trained using this technique are referred to as SPAF-networks. For evaluation, the popular MNIST dataset and all three sub-datasets of the Chars74k dataset are used for classification. The SPAF-network is also analyzed for the features it learns with a logistic, a ReLU, and a custom activation function. Lastly, a roadmap for future enhancements to the SPAF-network is proposed. Author Keywords: Artificial Neural Network, AutoEncoder, Machine Learning, Neural Networks, SPAF network, Unsupervised Learning
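Two of the pretraining ingredients named above can be sketched in plain Python. The first is masking noise, which corrupts the input that a denoising autoencoder learns to reconstruct. The second is the SPAF step, which keeps a trained layer's weights but swaps its activation for the logistic function. Training is omitted, ReLU stands in for the unspecified custom high-gradient activation, and `DenseLayer`, `mask_noise`, and `swap_activation` are hypothetical names.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def mask_noise(x, p, rng):
    """DAE-style masking corruption: zero each input independently with probability p."""
    return [0.0 if rng.random() < p else xi for xi in x]

class DenseLayer:
    """A fully connected layer whose activation can be replaced after training."""

    def __init__(self, weights, bias, activation):
        self.weights, self.bias, self.activation = weights, bias, activation

    def forward(self, x):
        return [self.activation(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.weights, self.bias)]

def swap_activation(layers, new_activation=logistic):
    """SPAF-style step as described in the abstract: keep the pretrained
    weights, replace each layer's activation with the logistic function."""
    for layer in layers:
        layer.activation = new_activation
    return layers
```

The swap leaves every weight untouched, so any change in behaviour comes only from the new activation's shape. For example, negative pre-activations that ReLU clips to zero now produce small positive outputs.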
Representation Learning with Restorative Autoencoders for Transfer Learning
Deep Neural Networks (DNNs) have reached human-level performance in numerous tasks in the domain of computer vision. DNNs are effective for both classification and the more complex task of image segmentation. These networks are typically trained on thousands of images, which are often hand-labelled by domain experts. This bottleneck creates a promising research area: training accurate segmentation networks with fewer labelled samples. This thesis explores effective methods for learning deep representations from unlabelled images. We train a Restorative Autoencoder Network (RAN) to denoise synthetically corrupted images. The weights of the RAN are then fine-tuned on a labelled dataset from the same domain for image segmentation. We use three different segmentation datasets to evaluate our methods. In our experiments, we demonstrate that with our methods only a fraction of the labelled data is required to achieve the same accuracy as a network trained on a large labelled dataset. Author Keywords: deep learning, image segmentation, representation learning, transfer learning
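Self-supervised pretraining of this kind needs pairs of corrupted inputs and clean targets. The abstract does not list the synthetic corruptions used, so the Python sketch below assumes additive Gaussian noise plus one zeroed square patch, applied to a grayscale image stored as nested lists. The restorative network would then be trained to map each corrupted image back to its original before its weights are fine-tuned for segmentation. The function names and parameters are hypothetical.

```python
import random

def corrupt(image, patch, noise_std, rng):
    """Hypothetical corruption: additive Gaussian noise plus one zeroed square patch."""
    h, w = len(image), len(image[0])
    out = [[px + rng.gauss(0.0, noise_std) for px in row] for row in image]
    top = rng.randrange(0, h - patch + 1)
    left = rng.randrange(0, w - patch + 1)
    for r in range(top, top + patch):
        for c in range(left, left + patch):
            out[r][c] = 0.0
    return out

def restoration_pairs(images, patch=4, noise_std=0.1, seed=0):
    """(corrupted input, clean target) pairs for restorative pretraining."""
    rng = random.Random(seed)
    return [(corrupt(img, patch, noise_std, rng), img) for img in images]
```

Because the targets are the unlabelled images themselves, this stage needs no annotations. Labels are only required for the later fine-tuning step.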
Relationship Between Precarious Employment, Behaviour Addictions and Substance Use Among Canadian Young Adults
This thesis used a unique dataset, the Quinte Longitudinal Survey, to explore relationships between precarious employment and a range of mental health problems in a representative sample of Ontario young adults. Study 1 focused on precarious employment and various behavioural addictions (such as problem gambling, video gaming, internet use, exercise, compulsive shopping, and sex). The results showed that precariously employed men were preoccupied with gambling and sex, while their female counterparts preferred shopping. Gambling and excessive shopping diminished over time, while excessive sexual practices increased. Study 2 focused on the association between precarious employment and substance abuse (such as tobacco, alcohol, cannabis, hallucinogens, stimulants, and other substances). The results showed that men used cannabis more than women, and that the non-precariously employed group abused alcohol more than individuals in the precarious group. This research has implications for both health care professionals and intervention program developers when working with young adults in precarious jobs. Author Keywords: Behaviour Addictions, Precarious Employment, Substance Abuse, Young Adults
