Graduate Theses & Dissertations


Time Series Algorithms in Machine Learning - A Graph Approach to Multivariate Forecasting
Forecasting future values of time series has long been a field with many and varied applications, from climate and weather forecasting to stock prediction and economic planning to the control of industrial processes. Many of these problems involve not only a single time series but many simultaneous series which may influence each other. This thesis provides machine learning methods for handling such problems. We first consider single time series with both single and multiple features. We review the algorithms and unique challenges involved in applying machine learning to time series. Many machine learning algorithms, when used for regression, are designed to produce a single output value for each timestamp of interest with no measure of confidence; however, evaluating the uncertainty of the predictions is an important component of practical forecasting. We therefore discuss methods of constructing uncertainty estimates in the form of prediction intervals for each prediction. Stability over long time horizons is also a concern for these algorithms, as recursion is a common method used to generate predictions over long time intervals. To address this, we present methods of maintaining stability in the forecast even over large time horizons. These methods are applied to an electricity forecasting problem, where we demonstrate their effectiveness for support vector machines, neural networks and gradient boosted trees. We next consider spatiotemporal problems, which consist of multiple interlinked time series, each of which may contain multiple features. We represent these problems using graphs, allowing us to learn relationships using graph neural networks. Existing methods of doing this generally make use of separate time and spatial (graph) layers, or simply replace operations in temporal layers with graph operations. We show that these approaches have difficulty learning relationships that contain time lags of several time steps.
To address this, we propose a new layer inspired by the long short-term memory (LSTM) recurrent neural network, which adds a distinct memory state dedicated to learning graph relationships while keeping the original memory state. This allows the model to consider temporally distant events at other nodes without affecting its ability to model long-term relationships at a single node. We show that this model is capable of learning the long-term patterns that existing models struggle with. We then apply this model to a number of real-world bike-share and traffic datasets, where we observe improved performance when compared to other models with similar numbers of parameters.
Author Keywords: forecasting, graph neural network, LSTM, machine learning, neural network, time series
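The recursive strategy mentioned in the abstract above, in which each prediction is fed back as an input for the next step, can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the function `recursive_forecast` and the toy model `halving_model` are hypothetical names introduced here, and any one-step regressor (a support vector machine, neural network or boosted tree) could stand in for the toy model.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    one_step_model: Callable[[List[float]], float],
    n_lags: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps ahead by feeding predictions back as inputs.

    `one_step_model` is any function mapping the last `n_lags` values to the
    next value (hypothetical interface, assumed for this sketch).
    """
    if n_lags < 1 or len(history) < n_lags:
        raise ValueError("need n_lags >= 1 and at least n_lags observations")
    window = list(history[-n_lags:])
    forecasts: List[float] = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return forecasts


def halving_model(window: List[float]) -> float:
    """Toy one-step model: the next value is half of the most recent one."""
    return 0.5 * window[-1]


print(recursive_forecast([8.0], halving_model, n_lags=1, horizon=3))
# prints [4.0, 2.0, 1.0]
```

Because every step after the first consumes earlier predictions rather than observations, errors can compound over the horizon, which is why stability over long horizons is a concern for this strategy.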
Assessing the Cost of Reproduction between Male and Female Sex Functions in Hermaphroditic Plants
The cost of reproduction refers to the use of resources for the production of offspring that decreases the availability of resources for future reproductive events and other biological processes. Models of sex-allocation provide insights into optimal patterns of resource investment in male and female sex functions and have been extended to include other components of the life history, enabling assessment of the costs of reproduction. These models have shown that costs of reproduction through female function should generally exceed costs through male function. However, those previous models only considered allocations from a single pool of shared resources. Recent studies have indicated that the type of resource currency can differ for female and male sex functions, and that this might affect costs of reproduction via effects on other components of the life history. Using multiple invasibility analysis, this study examined resource allocation to male and female sex functions, while simultaneously considering allocations to survival and growth. Allocation patterns were modelled using both shared and separate resource pools. Under shared resources, allocation patterns to male and female sex function followed the results of earlier models. When resource pools were separate, however, allocations to male function often exceeded allocations to female function, even if fitness gains increased less strongly with investment in male function than with investment in female function. These results demonstrate that the costs of reproduction are affected by (1) the types of resources needed for reproduction via female or male function and (2) trade-offs with other components of the life history. Future studies of the costs of reproduction should examine whether allocations to reproduction via female versus male function usually entail the use of different types of resources.
Author Keywords: Cost of Reproduction, Gain Curve, Life History, Resource Allocation Patterns, Resource Currencies
Modelling Depressive Symptoms in Emerging Adulthood
Depression during the transition into adulthood is a growing mental health concern, with overwhelming evidence linking the developmental risk for depressive symptoms with maternal depression. However, there is a lack of research on the protective role of socioemotional competencies in this context. This study examines independent and joint effects of maternal depression and trait emotional intelligence (TEI) on the longitudinal trajectory of depressive symptoms during emerging adulthood. A series of latent growth models was applied to three biennial cycles of data from a nationally representative sample (N=933) from the Canadian National Longitudinal Survey of Children and Youth. We assessed the trajectory of self-reported depressive symptoms from age 20 to 24 years, as well as whether it was moderated by maternal depression at age 10 to 11 and TEI at age 20, separately by gender. The results indicated that mean levels of depression declined during emerging adulthood in females, but remained relatively stable in males. Maternal depressive symptoms significantly positively predicted depressive symptoms across the whole of emerging adulthood in females, but only at age 20-21 in males. In addition, the likelihood of developing depressive symptoms was attenuated by higher global TEI in both females and males, and additionally by higher interpersonal skills in males. Our findings suggest that interventions for depressive symptoms in emerging adulthood should consider development of socioemotional competencies.
Author Keywords: Depression, Depressive Symptoms, Emerging Adulthood, Intergenerational Risk, Longitudinal, Trait Emotional Intelligence
Characteristics of Models for Representation of Mathematical Structure in Typesetting Applications and the Cognition of Digitally Transcribing Mathematics
The digital typesetting of mathematics can present many challenges to users, especially those of novice to intermediate experience levels. Through a series of experiments, we show that two models used to represent mathematical structure in these typesetting applications, the 1-dimensional structure-based model and the 2-dimensional freeform model, cause interference with users' working memory during the process of transcribing mathematical content. This is a notable finding, as a connection between working memory and mathematical performance has been established in the literature. Furthermore, we find that elements of these models allow them to handle various types of mathematical notation with different degrees of success. Notably, the 2-dimensional freeform model allows users to insert and manipulate exponents with increased efficiency and reduced cognitive load and working memory interference, while the 1-dimensional structure-based model allows for handling of the fraction structure with greater efficiency and decreased cognitive load.
Author Keywords: mathematical cognition, mathematical software, user experience, working memory
Development of a Cross-Platform Solution for Calculating Certified Emission Reduction Credits in Forestry Projects under the Kyoto Protocol of the UNFCCC
This thesis presents an exploration of the requirements for and development of a software tool to calculate Certified Emission Reduction (CER) credits for afforestation and reforestation projects conducted under the Clean Development Mechanism (CDM). We examine the relevant methodologies and tools to determine what is required to create a software package that can support a wide variety of projects involving a large variety of data and computations. During requirements gathering, it was determined that the software package would need to support entering and editing equations at runtime. To create the software, we used Java as the programming language, an H2 database to store our data, and an XML file to store our configuration settings. Through these choices, we can build a cross-platform software solution for the purpose outlined above. The end result is a versatile software tool through which users can create and customize projects to meet their unique needs as well as utilize the features provided to streamline the management of their CDM projects.
Author Keywords: Carbon Emissions, Climate Change, Forests, Java, UNFCCC, XML
Educational Data Mining and Modelling on Trent University Students’ Academic Performance
Higher education is important. It enhances both individual and social welfare by improving productivity, life satisfaction, and health outcomes, and by reducing rates of crime. Universities play a critical role in providing that education. Because academic institutions face resource constraints, it is important that they deploy resources in support of student success as efficiently as possible. To inform that efficient deployment, this research analyzes institutional data reflecting undergraduate student performance to identify predictors of student success as measured by GPA, rates of credit accumulation, and graduation rates. Using methods of cluster analysis and machine learning, the analysis yields predictions for the probabilities of individual success.
Author Keywords: Educational data mining, Students’ academic performance modelling
Sinc-Collocation Difference Methods for Solving the Gross-Pitaevskii Equation
The time-dependent Gross-Pitaevskii Equation, describing the movement of particles in quantum mechanics, cannot be solved analytically due to its inherent nonlinearity. Hence numerical methods are important for approximating the solution. This study develops a scheme, discrete in time and space, to simulate the solution defined in a finite domain by using the Crank-Nicolson difference method and Sinc Collocation Methods (SCM), respectively. In theory and practice, the time discretization is second-order accurate, and the SCM errors decay exponentially. A new SCM with a unique boundary treatment is proposed and compared with the original SCM and other similar numerical techniques in terms of time cost and numerical error. As a result, the errors of the new SCM decay faster than those of the original one. Also, to attain the same accuracy, the new SCM requires fewer interpolation nodes than the original SCM, which saves computational cost. The new SCM is capable of approximating partial differential equations under different boundary conditions, and can be extensively applied in fitting theory.
Author Keywords: Crank-Nicolson difference method, Gross-Pitaevskii Equation, Sinc-Collocation methods
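The second-order accuracy in time attributed to the Crank-Nicolson method above can be checked on a much simpler problem. The sketch below is an assumption-laden illustration, not the thesis's scheme: it applies Crank-Nicolson to the scalar linear test equation y' = -i y, y(0) = 1 (a stand-in with Schrödinger-like oscillatory behaviour, chosen here for illustration), and the function name `crank_nicolson_error` is hypothetical.

```python
import cmath


def crank_nicolson_error(n_steps: int, t_end: float = 1.0) -> float:
    """Error at t_end of Crank-Nicolson applied to y' = lam * y, y(0) = 1.

    The test equation with lam = -1j is an assumption of this sketch; it is
    not the Gross-Pitaevskii Equation itself.
    """
    h = t_end / n_steps
    lam = -1j
    # Crank-Nicolson averages the right-hand side at the old and new time:
    # (y_new - y_old) / h = lam * (y_new + y_old) / 2
    growth = (1 + 0.5 * h * lam) / (1 - 0.5 * h * lam)
    y = 1.0 + 0.0j
    for _ in range(n_steps):
        y *= growth
    return abs(y - cmath.exp(lam * t_end))


coarse = crank_nicolson_error(50)
fine = crank_nicolson_error(100)
ratio = coarse / fine
print(f"error ratio when the step is halved: {ratio:.3f}")
```

For a second-order method, halving the time step should reduce the error by a factor close to four, which is what the printed ratio is meant to show.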
Combinatorial Collisions in Database Matching
Databases containing information such as location points, web searches and financial transactions are becoming the new normal as technology advances. Consequently, searches and cross-referencing in big data are becoming a common problem as computing and statistical analysis increasingly allow for the contents of such databases to be analyzed and dredged for data. Searches through big data are frequently done without a hypothesis formulated beforehand, and as these databases grow and become more complex, the room for error also increases. Regardless of how these searches are framed, the data they collect may lead to false convictions. DNA databases may be of particular interest, since DNA is often viewed as significant evidence; however, such evidence is sometimes not interpreted properly in the courtroom. In this thesis, we present and validate a framework for investigating various collisions within databases using Monte Carlo simulations, with examples from DNA. We also discuss how DNA evidence may be wrongly portrayed in the courtroom, and the explanation behind this. We then outline the problems which may occur when numerous types of databases are searched for suspects, and a framework to address these problems.
Author Keywords: big data analysis, collisions, database searches, DNA databases, Monte Carlo simulation
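The idea of estimating collision probabilities by Monte Carlo simulation can be illustrated with a simplified model. The sketch below assumes that every record in a database is assigned one of a fixed number of equally likely profiles, which is the classical birthday problem and not the framework developed in the thesis; the function names and the uniform-profile assumption are introduced here for illustration only.

```python
import random


def exact_collision_probability(n_records: int, n_profiles: int) -> float:
    """Exact probability that at least two of n_records share a profile,
    assuming profiles are independent and uniformly distributed."""
    p_no_collision = 1.0
    for i in range(n_records):
        p_no_collision *= (n_profiles - i) / n_profiles
    return 1.0 - p_no_collision


def simulate_collision_probability(
    n_records: int, n_profiles: int, n_trials: int = 20000, seed: int = 0
) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_trials):
        seen = set()
        for _ in range(n_records):
            profile = rng.randrange(n_profiles)
            if profile in seen:
                hits += 1  # first repeated profile in this simulated database
                break
            seen.add(profile)
    return hits / n_trials


exact = exact_collision_probability(23, 365)
estimate = simulate_collision_probability(23, 365)
print(f"exact: {exact:.4f}, simulated: {estimate:.4f}")
```

Comparing the simulated estimate with the closed-form value is one simple way to validate a simulation before applying it to cases where no closed form is available.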
Pathways to Innovation
Research and development activities conducted at universities and firms fuel economic growth and play a key role in the process of innovation. Specifically, prior research has investigated the widespread university-to-firm research development path and concluded that universities are better suited for the early stages of research while firms are better positioned for later stages. This thesis aims to present a novel explanation for the pervasive university-to-firm research development path. The model developed uses game theory to visualize and analyze interactions between a firm and a university under different strategies. The results reveal that, as academic research signals knowledge, it helps attract tuition-paying students. Generating these tuition revenues is facilitated by university research discoveries, which, once published, a firm can build upon to make new innovative products. In an environment of weak intellectual property rights, moreover, the university-to-firm research development path enables firms to bypass the hefty costs that are involved in basic research activities. The model also provides a range of solution scenarios where a university and firm may find it viable to initiate a research line.
Author Keywords: Game theory, Intellectual property rights, Nash equilibrium, Research and development, University-to-firm research path
Automated Grading of UML Class Diagrams
Learning how to model the structural properties of a problem domain or an object-oriented design in the form of a class diagram is an essential learning task in many software engineering courses. Since grading UML assignments is a cumbersome and time-consuming task, there is a need for an automated grading approach that can assist instructors by speeding up the grading process, as well as ensuring consistency and fairness for large classrooms. This thesis presents an approach for automated grading of UML class diagrams. A metamodel is proposed to establish mappings between the instructor solution and all the solutions for a class, which allows the instructor to easily adjust the grading scheme. The approach employs a grading algorithm that uses syntactic, semantic and structural matching to match a student's solution with the instructor's solution. The efficiency of this automated grading approach has been empirically evaluated in two real-world settings: a beginner undergraduate class of 103 students required to create an object-oriented design model, and an advanced undergraduate class of 89 students elaborating a domain model. The experimental results show that the grading approach should be configurable so that it can adapt the grading strategy and strictness to the level of the students and the grading styles of the different instructors. It is also important to consider multiple solution variants in the grading process. The grading algorithm and tool are proposed and validated experimentally.
Author Keywords: automated grading, class diagrams, model comparison
Solving Differential and Integro-Differential Boundary Value Problems using a Numerical Sinc-Collocation Method Based on Derivative Interpolation
In this thesis, a new sinc-collocation method based upon derivative interpolation is developed for solving linear and nonlinear boundary value problems (BVPs) involving differential as well as integro-differential equations. The sinc-collocation method is chosen for its ease of implementation, exponential convergence of error, and ability to handle singularities in the BVP. We present a unique method of treating boundary conditions and introduce the concept of the stretch factor into the conformal mappings of domains. The result is a method that achieves great accuracy while reducing computational cost. In most cases, the results from the method greatly exceed the published results of comparable methods in both accuracy and efficiency. The method is tested on the Blasius problem and the Lane-Emden problem, and generalised to cover Fredholm-Volterra integro-differential problems. The results show that the sinc-collocation method with derivative interpolation is a viable and preferable method for solving nonlinear BVPs.
Author Keywords: Blasius, Boundary Value Problem, Exponential convergence, Integro-differential, Nonlinear, Sinc
Fraud Detection in Financial Businesses Using Data Mining Approaches
The purpose of this research is to apply four methods to two datasets, a Synthetic dataset and a Real-World dataset, and compare the results to each other with the intention of arriving at methods to prevent fraud. The methods used are Logistic Regression, Isolation Forest, Ensemble Method and Generative Adversarial Networks (GAN). Results show that all four models achieve accuracies between 91% and 99%, except that Isolation Forest achieved 69% accuracy on the Synthetic dataset. The four models detect fraud well when built on a training set and tested with a test set. Logistic Regression achieves good results with less computational effort. Isolation Forest achieves lower accuracy when the data is sparse and not preprocessed correctly. Ensemble Models achieve the highest accuracy for both datasets. GAN achieves good results but overfits if a large number of epochs is used. Future work could incorporate other classifiers.
Author Keywords: Ensemble Method, GAN, Isolation forest, Logistic Regression, Outliers

