Graduate Theses & Dissertations

Machine Learning for Aviation Data
This thesis is part of an industry project, carried out in collaboration with an aviation technology company, on pilot performance assessment. In this project, we propose using pilots' training data to develop a model that can recognize pilots' activity patterns for evaluation. The data are time series representing a pilot's actions during maneuvers. The main contribution of this thesis concerns a multivariate time series dataset, including its preprocessing and transformation. The main difficulty in time series classification lies in the sequential structure of the data along the time dimension. In this thesis, I developed an algorithm that formats time series data into sequences of equal length. Three classification methods and two transformation methods were used, giving six models in total for comparison. The initial accuracy was 40%; optimization through resampling increased the accuracy to 60%. Author Keywords: Data Mining, K-NN, Machine Learning, Multivariate Time Series Classification, Time Series Forest
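The equal-length formatting step, followed by a K-NN classifier (one of the listed keyword methods), can be sketched as below. This is a minimal illustration assuming each maneuver is a (time steps × channels) array and that linear interpolation onto a fixed number of steps is an acceptable way to equalize lengths; the thesis's own formatting algorithm, its transformations and the Time Series Forest models are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def to_equal_length(series, target_len):
    """Resample a (T, d) multivariate series onto target_len evenly spaced steps.

    Assumption: linear interpolation over a normalized [0, 1] time axis; the
    thesis's own formatting algorithm is not reproduced here.
    """
    series = np.asarray(series, dtype=float)
    t_old = np.linspace(0.0, 1.0, series.shape[0])
    t_new = np.linspace(0.0, 1.0, target_len)
    # Interpolate each channel (e.g., one cockpit signal) independently.
    return np.column_stack(
        [np.interp(t_new, t_old, series[:, j]) for j in range(series.shape[1])]
    )

def knn_predict(train_X, train_y, query, k=1):
    """Label one equal-length series by majority vote of its k nearest neighbours."""
    # Euclidean (Frobenius) distance between whole (target_len, d) arrays.
    dists = np.array([np.linalg.norm(x - query) for x in train_X])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Usage with hypothetical data: three maneuvers of different lengths, 3 channels each.
rng = np.random.default_rng(0)
raw = [rng.normal(size=(n, 3)) for n in (50, 80, 65)]
X = [to_equal_length(s, 100) for s in raw]
print([x.shape for x in X])  # every series is now (100, 3)
```

Interpolating on a normalized time axis treats maneuvers of different durations as the same shape stretched in time, which is what makes a plain Euclidean distance between them meaningful.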
Particulate Matter Component Analyses in Relation to Public Health in Canada
This thesis explores the short-term relationship between exposure to ambient air pollution and human health, measured through metrics such as mortality and hospitalization in Canada. We begin by detailing the organization and interpolation of air pollution data from its partially quality-controlled source form. Analyses of seasonal, regional and temporal trends in all major components of PM2.5 were performed, showing seasonal variation across most regions and validating the dataset. A one-pollutant Generalized Additive Model was applied to the data to estimate the health risk associated with exposure to thirteen different components of PM2.5. The components were selected as those that comprised the majority of the mass and included: sulphate, nitrate, zinc, silicon, iron, nickel, vanadium, potassium, organic carbon, organic matter, elemental carbon, total carbon. Trends based on annual estimates of the association for PM2.5 and its constituents were compared, showing that carbonaceous compounds, sulphate and nitrate had similar estimates of association. As is common in population ecologic epidemiology, many association estimates were statistically indistinguishable from zero, but they showed clear features of interest, including evident differences between cold- and warm-season associations in Canada's temperate climate. A method to model two correlated pollutants (in this case, PM2.5 and O3) was developed using thin plate splines. In this approach, the change in the response surface (after accounting for temperature, a smooth function of time and day of week) between the average pollutant concentrations and the averages plus one unit was used as the estimate of the joint contribution of the pollutants due to a unit increase. The estimates from the thin plate spline (TPS) approach were compared to those of the single-pollutant models, with large increases and decreases in PM2.5 and O3 being captured in the TPS estimates. 
However, this approach produced significantly larger errors in the estimates than would be expected, indicating a possible area for future refinement. Author Keywords: Air pollution, Environmental Epidemiology, Generalized Additive Models, Human Health, Multivariate Models, Thin Plate Splines
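The thin plate spline (TPS) joint-effect idea can be illustrated with a small NumPy sketch: fit a TPS surface over (PM2.5, O3) and take the difference between its value at the mean concentrations and at the means plus one unit. This is a simplified stand-in, assuming the response values are already adjusted for temperature, time trend and day of week (the thesis fits the spline within a GAM alongside those terms, which is not shown here); function names and data are hypothetical.

```python
import numpy as np

def _tps_kernel(r):
    # Thin plate spline radial basis U(r) = r^2 log r, with U(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        out = r**2 * np.log(r)
    return np.where(r > 0, out, 0.0)

def fit_tps(points, values, smooth=0.0):
    """Fit f(x, y) = a0 + a1*x + a2*y + sum_i w_i U(|p - p_i|) to 2-D points.

    smooth = 0 interpolates exactly; smooth > 0 trades fit for smoothness.
    """
    points = np.asarray(points, float)
    values = np.asarray(values, float)
    n = len(points)
    K = _tps_kernel(np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1))
    K += smooth * np.eye(n)
    P = np.hstack([np.ones((n, 1)), points])   # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    sol = np.linalg.solve(A, np.concatenate([values, np.zeros(3)]))
    w, a = sol[:n], sol[n:]

    def f(query):
        q = np.atleast_2d(np.asarray(query, float))
        U = _tps_kernel(np.linalg.norm(q[:, None, :] - points[None, :, :], axis=-1))
        return U @ w + a[0] + q @ a[1:]
    return f

def joint_unit_effect(f, mean_pm25, mean_o3):
    """Change in the fitted surface when both pollutants rise one unit above their means."""
    base = f([mean_pm25, mean_o3])[0]
    bumped = f([mean_pm25 + 1.0, mean_o3 + 1.0])[0]
    return bumped - base

# Usage with hypothetical (PM2.5, O3) pairs and a hypothetical adjusted response.
rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 10.0, size=(40, 2))
resp = 0.02 * pts[:, 0] + 0.01 * pts[:, 1] + 0.005 * pts[:, 0] * pts[:, 1]
f = fit_tps(pts, resp)
print(joint_unit_effect(f, *pts.mean(axis=0)))
```

Because the surface is not forced to be additive, the estimate picks up any interaction between the two pollutants, which a pair of single-pollutant models cannot.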
Prescription Drugs
Medication used to treat human illness is one of the greatest developments in human history. In Canada, prescription drugs have been developed and made available to treat a wide variety of illnesses, from infections to heart disease. Records of prescription drug fulfillment at coarse Canadian geographic scales were obtained from Health Canada in order to track the use of these drugs by the Canadian population. The obtained records were in a variety of inconsistent formats, including many years for which only paper tabular records (hard copies) were available. In this work, we organize, digitize, proof and synthesize the full available data set of prescription drug records, from paper to final database. Extensive quality control was performed on the data before use. The data were then analyzed for temporal and spatial changes in prescription drug use across Canada from 1990 to 2013. In addition, one of the major research areas in environmental epidemiology is the study of population health risk associated with exposure to ambient air pollution. Prescription drugs can moderate public health risk by reducing users' physiological symptoms and preventing acute health effects (e.g., strokes and heart attacks). The cleaned prescription drug data were considered in the context of a common model to examine their influence on the association between air pollution exposure and various health outcomes. Since prescription drug data were available only at the provincial level, a Bayesian hierarchical model was employed to include prescription drugs as a covariate at the regional level; the regional estimates were then combined to estimate the association at the national level. Although further investigation is required, the results suggest that prescription drugs influenced the air-pollution-related public health risk. Author Keywords: Data, Error checking, Population health, Prescriptions
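The step of combining regional estimates into a national one can be illustrated with the simplest normal-normal hierarchical model. This is a minimal sketch, assuming a known between-region standard deviation `tau` and a flat prior on the national mean. It omits the prescription drug covariate and any prior on `tau`, so it should not be taken as the thesis's actual model; the function name is hypothetical.

```python
import numpy as np

def pool_regional_estimates(estimates, std_errors, tau):
    """Combine regional association estimates into a national estimate.

    Model (assumption): b_i | beta_i ~ N(beta_i, s_i^2), beta_i ~ N(mu, tau^2),
    flat prior on mu, tau known. The posterior of mu is normal with the
    precision-weighted mean and variance returned here.
    """
    b = np.asarray(estimates, float)
    s2 = np.asarray(std_errors, float) ** 2
    w = 1.0 / (s2 + tau**2)            # marginal precision of each regional estimate
    mu_hat = np.sum(w * b) / np.sum(w)
    var_mu = 1.0 / np.sum(w)
    # Regional effects shrunk toward mu_hat (posterior means given mu = mu_hat).
    shrink = tau**2 / (tau**2 + s2)
    beta_post = mu_hat + shrink * (b - mu_hat)
    return mu_hat, var_mu, beta_post

# Usage with hypothetical provincial log-relative-risk estimates and standard errors.
mu, var, post = pool_regional_estimates([0.004, 0.010, -0.002], [0.003, 0.006, 0.004], tau=0.002)
print(mu, np.sqrt(var), post)
```

Noisy regions (large `s_i`) get less weight in the national estimate and are pulled more strongly toward it, which is the main reason for pooling hierarchically rather than averaging provinces directly.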
Modelling Depressive Symptoms in Emerging Adulthood
Depression during the transition into adulthood is a growing mental health concern, with overwhelming evidence linking the developmental risk for depressive symptoms with maternal depression. In addition, there is a lack of research on the protective role of socioemotional competencies in this context. This study examines the independent and joint effects of maternal depression and trait emotional intelligence (TEI) on the longitudinal trajectory of depressive symptoms during emerging adulthood. A series of latent growth models was applied to three biennial cycles of data from a nationally representative sample (N=933) from the Canadian National Longitudinal Survey of Children and Youth. We assessed the trajectory of self-reported depressive symptoms from age 20 to 24 years, as well as whether it was moderated by maternal depression at age 10 to 11 and TEI at age 20, separately by gender. The results indicated that mean levels of depression declined during emerging adulthood in females but remained relatively stable in males. Maternal depressive symptoms significantly and positively predicted depressive symptoms across the entire emerging adulthood period in females, but only at age 20-21 in males. In addition, the likelihood of developing depressive symptoms was attenuated by higher global TEI in both females and males, and additionally by higher interpersonal skills in males. Our findings suggest that interventions for depressive symptoms in emerging adulthood should consider the development of socioemotional competencies. Author Keywords: Depression, Depressive Symptoms, Emerging Adulthood, Intergenerational Risk, Longitudinal, Trait Emotional Intelligence
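A conditional linear latent growth model of the kind described can be written as follows. This is a generic specification offered as a sketch, assuming three equally spaced waves coded $\lambda_t = 0, 1, 2$, maternal depression $\mathrm{MD}_i$ and $\mathrm{TEI}_i$ entered as time-invariant predictors of both growth factors, and a product term for their joint effect. The thesis's exact parameterization (for example, how TEI subscales such as interpersonal skills enter) may differ, and the model would be fitted separately for females and males.

```latex
\begin{aligned}
y_{ti} &= \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{ti}, \qquad \lambda_t \in \{0, 1, 2\},\\
\eta_{0i} &= \alpha_0 + \gamma_{01}\,\mathrm{MD}_i + \gamma_{02}\,\mathrm{TEI}_i + \gamma_{03}\,\mathrm{MD}_i\,\mathrm{TEI}_i + \zeta_{0i},\\
\eta_{1i} &= \alpha_1 + \gamma_{11}\,\mathrm{MD}_i + \gamma_{12}\,\mathrm{TEI}_i + \gamma_{13}\,\mathrm{MD}_i\,\mathrm{TEI}_i + \zeta_{1i}.
\end{aligned}
```

Here $y_{ti}$ is person $i$'s depressive symptom score at wave $t$, $\eta_{0i}$ is the latent level at the first wave (age 20-21), and $\eta_{1i}$ is the latent change per biennial cycle. The $\gamma$ coefficients carry the moderation: $\gamma_{01}$ and $\gamma_{11}$ describe how maternal depression shifts the level and slope, $\gamma_{02}$ and $\gamma_{12}$ do the same for TEI, and $\gamma_{03}$, $\gamma_{13}$ capture the joint effect.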
Framework for Testing Time Series Interpolators
The spectrum of a given time series is a characteristic function describing its frequency properties. Spectrum estimation methods require time series data to be contiguous in order for robust estimators to retain their performance. This poses a fundamental challenge, especially for real-world scientific data, which are often plagued by missing values and/or irregularly recorded measurements. One area of research devoted to this problem seeks to repair the original time series through interpolation. Several algorithms have proven successful for interpolating considerably large gaps of missing data, but most are valid only for stationary time series: processes whose statistical properties are time-invariant, which is not a common property of real-world data. The Hybrid Wiener interpolator is a method designed for repairing nonstationary data, rendering it suitable for spectrum estimation. This thesis presents a computational framework for systematically testing the statistical performance of this method under changes to gap structure and departures from the stationarity assumption. A comprehensive audit of the Hybrid Wiener interpolator against other state-of-the-art algorithms will also be explored. Author Keywords: applied statistics, hybrid wiener interpolator, imputation, interpolation, R statistical software, time series
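The structure of such a testing framework (punch gaps of controlled size into a known series, repair it, and score both the time-domain fit and the spectrum) can be sketched as below. This is an illustrative Python sketch, not the thesis's R framework: plain linear interpolation stands in for the Hybrid Wiener interpolator, and the gap model, error metrics and function names are assumptions.

```python
import numpy as np

def make_gappy(x, gap_prop, gap_len, rng):
    """Blank out runs of gap_len points (NaN), roughly gap_prop of the series in total.

    Gaps start at interior points only, so both endpoints stay observed;
    overlapping runs can make the realized proportion slightly smaller.
    """
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    n_gaps = max(1, int(round(gap_prop * n / gap_len)))
    starts = rng.choice(np.arange(1, n - gap_len), size=n_gaps, replace=False)
    for s in starts:
        x[s:s + gap_len] = np.nan
    return x

def linear_interpolator(x):
    """Baseline stand-in: fill NaNs with straight lines between observed neighbours."""
    idx = np.arange(len(x))
    ok = ~np.isnan(x)
    return np.interp(idx, idx[ok], x[ok])

def periodogram(x):
    """Raw periodogram of the mean-removed series."""
    x = x - x.mean()
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

def evaluate(interpolator, x, gap_props, gap_lens, n_reps=20, seed=0):
    """Mean RMSE on the gaps and mean relative spectral error, per gap structure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    p_true = periodogram(x)
    results = {}
    for p in gap_props:
        for g in gap_lens:
            rmse, spec_err = [], []
            for _ in range(n_reps):
                gappy = make_gappy(x, p, g, rng)
                filled = interpolator(gappy)
                miss = np.isnan(gappy)
                rmse.append(np.sqrt(np.mean((filled[miss] - x[miss]) ** 2)))
                spec_err.append(np.sum(np.abs(periodogram(filled) - p_true)) / np.sum(p_true))
            results[(p, g)] = (float(np.mean(rmse)), float(np.mean(spec_err)))
    return results

# Usage: a hypothetical nonstationary test series (periodic signal plus linear trend).
t = np.arange(500)
series = np.sin(2 * np.pi * t / 50) + 0.01 * t
print(evaluate(linear_interpolator, series, gap_props=[0.05, 0.2], gap_lens=[5, 25]))
```

Any interpolator with the same `array -> array` signature can be dropped into `evaluate`, so competing methods are compared on identical gap patterns when the same `seed` is used.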
Population-Level Ambient Pollution Exposure Proxies
The Air Health Trend Indicator (AHTI) is a joint Health Canada / Environment and Climate Change Canada initiative that seeks to model the Canadian national population health risk due to acute exposure to ambient air pollution. The common model in the field uses averages of local ambient air pollution monitors to produce a population-level exposure proxy variable. This method is applied to ozone, nitrogen dioxide, particulate matter, and other similar air pollutants. We examine the representative nature of these proxy averages on a large-scale Canadian data set, representing hundreds of monitors and dozens of city-level populations. The careful determination of temporal and spatial correlations between the disparate monitors allows for more precise estimation of population-level exposure, taking inspiration from the land-use regression models commonly used in geography. We conclude this work with an examination of the risk estimation differences between the original, simplistic population exposure metric and our new, revised metric. Author Keywords: Air Pollution, Population Health Risk, Spatial Process, Spatio-Temporal, Temporal Process, Time Series
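The common proxy described above, and the pairwise monitor correlations that the revised approach starts from, can be sketched as follows. This is a minimal illustration, assuming a days × monitors concentration array with `NaN` for missing readings and an arbitrary minimum-overlap threshold; the thesis's revised, land-use-regression-inspired metric is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def population_proxy(conc):
    """Common exposure proxy: daily mean over all monitors reporting that day.

    conc: (n_days, n_monitors) array with NaN for missing readings.
    Days with no reporting monitor yield NaN (NumPy emits a RuntimeWarning).
    """
    return np.nanmean(conc, axis=1)

def pairwise_correlation(conc, min_overlap=30):
    """Pearson correlation for each monitor pair, using only days both report.

    Pairs sharing fewer than min_overlap days are left as NaN.
    """
    n_mon = conc.shape[1]
    R = np.full((n_mon, n_mon), np.nan)
    for i in range(n_mon):
        for j in range(i, n_mon):
            both = ~np.isnan(conc[:, i]) & ~np.isnan(conc[:, j])
            if both.sum() >= min_overlap:
                r = np.corrcoef(conc[both, i], conc[both, j])[0, 1]
                R[i, j] = R[j, i] = r
    return R

# Usage with hypothetical data: one year, 5 monitors sharing a city-wide signal, 10% missing.
rng = np.random.default_rng(4)
shared = rng.normal(20.0, 5.0, size=365)
conc = shared[:, None] + rng.normal(0.0, 2.0, size=(365, 5))
conc[rng.random(conc.shape) < 0.1] = np.nan
print(population_proxy(conc)[:5])
print(np.round(pairwise_correlation(conc), 2))
```

Low or unstable correlations between a monitor and its neighbours are a signal that it does not represent the same population exposure, which is the kind of information the simple average ignores.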
Range-Based Component Models for Conditional Volatility and Dynamic Correlations
Volatility modelling is an important task in the financial markets. This paper first evaluates the range-based DCC-CARR model of Chou et al. (2009) in modelling larger systems of assets, vis-à-vis the traditional return-based DCC-GARCH. Extending Colacito, Engle and Ghysels (2011), range-based volatility specifications, including the CARR model of Chou et al. (2005), are then employed in the first stage of DCC-MIDAS conditional covariance estimation. A range-based analog to the GARCH-MIDAS model of Engle, Ghysels and Sohn (2013) is also proposed and tested; it decomposes volatility into short- and long-run components and corrects for microstructure biases inherent to high-frequency price-range data. Estimator forecasts are evaluated and compared in a minimum-variance portfolio allocation experiment following the methodology of Engle and Colacito (2006). Some consistent inferences are drawn from the results, supporting the models proposed here as empirically relevant alternatives. Range-based DCC-MIDAS estimates produce efficiency gains over DCC-CARR that increase with portfolio size. Author Keywords: asset allocation, DCC MIDAS, dynamic correlations, forecasting, portfolio risk management, volatility
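Three building blocks named above can be sketched in a few lines: a CARR(1,1) recursion for the conditional expected range, $\lambda_t = \omega + \alpha R_{t-1} + \beta \lambda_{t-1}$; a DCC(1,1) correlation recursion; and minimum-variance weights $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1}\mathbf{1})$ for comparing covariance forecasts. This is a minimal sketch under stated assumptions: parameters are taken as already estimated (no quasi-maximum-likelihood fitting is shown), the standardized residuals `z` come from some first-stage volatility model, the MIDAS long-run components are omitted, and a global minimum-variance portfolio is used as a simplification (the thesis's experiment may impose further constraints). Function names are hypothetical.

```python
import numpy as np

def carr_filter(ranges, omega, alpha, beta):
    """CARR(1,1): lam_t = omega + alpha * R_{t-1} + beta * lam_{t-1}."""
    ranges = np.asarray(ranges, float)
    lam = np.empty_like(ranges)
    lam[0] = ranges.mean()  # assumption: start the recursion at the sample mean range
    for t in range(1, len(ranges)):
        lam[t] = omega + alpha * ranges[t - 1] + beta * lam[t - 1]
    return lam

def dcc_correlations(z, a, b):
    """DCC(1,1): Q_t = (1-a-b) Qbar + a z_{t-1} z_{t-1}' + b Q_{t-1}, rescaled to correlations.

    z: (T, n) standardized residuals. Returns T + 1 matrices; R[t] uses data up
    to t - 1, so R[T] is the one-step-ahead correlation forecast.
    """
    T, n = z.shape
    Qbar = z.T @ z / T              # unconditional second-moment matrix of z
    Q = Qbar.copy()
    R = np.empty((T + 1, n, n))
    for t in range(T + 1):
        if t > 0:
            Q = (1 - a - b) * Qbar + a * np.outer(z[t - 1], z[t - 1]) + b * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)   # rescale so every diagonal entry is 1
    return R

def min_variance_weights(cov):
    """Global minimum-variance weights w = S^{-1} 1 / (1' S^{-1} 1)."""
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return x / x.sum()

# Usage with hypothetical inputs.
rng = np.random.default_rng(2)
z = rng.standard_normal((500, 3))          # hypothetical standardized residuals
R = dcc_correlations(z, a=0.05, b=0.90)
sd = np.array([0.010, 0.015, 0.020])       # hypothetical one-step volatility forecasts
H = np.outer(sd, sd) * R[-1]               # covariance forecast H = D R D
print(min_variance_weights(H))
```

Comparing the realized variance of portfolios built from each model's `H` is one way to rank covariance forecasts: a better forecast should, on average, produce a lower out-of-sample portfolio variance.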
