Graduate Theses & Dissertations


Utilizing Class-Specific Thresholds Discovered by Outlier Detection

We investigated whether the performance of selected supervised machine-learning techniques could be improved by combining univariate outlier-detection techniques with machine-learning methods. We developed a framework that discovers class-specific thresholds in class probability estimates using univariate outlier detection, and we proposed two novel techniques that use these class-specific thresholds. We applied the proposed techniques to various data sets and evaluated the results. Our experimental results suggest that some of our techniques may improve the recall of the base learner. Additional results suggest that one technique may produce higher accuracy and precision than AdaBoost.M1, while another may produce higher recall. Finally, our results suggest that we can achieve higher accuracy, precision, or recall when AdaBoost.M1 fails to produce higher metric values than the base learner.

Author Keywords: AdaBoost, Boosting, Classification, Class-Specific Thresholds, Machine Learning, Outliers
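The threshold-discovery step can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's framework: it assumes Tukey's fences (Q1 - 1.5 * IQR) as the univariate outlier detector, and the names `tukey_lower_fence`, `class_specific_thresholds`, and `is_low_outlier` are hypothetical. For each class, it collects the base learner's probability estimate for that class over the training instances that truly belong to it, and treats estimates below the lower fence as outliers for that class. How flagged instances are then handled is the subject of the two proposed techniques and is not reproduced here.

```python
import statistics
from typing import Dict, List, Sequence


def tukey_lower_fence(values: Sequence[float], k: float = 1.5) -> float:
    """Return Q1 - k * IQR for a univariate sample (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1 - k * (q3 - q1)


def class_specific_thresholds(
    probs: List[Dict[str, float]], labels: List[str], k: float = 1.5
) -> Dict[str, float]:
    """Hypothetical helper: derive one threshold per class from training data.

    probs[i][c] is the base learner's estimate P(c | x_i); labels[i] is the
    true class of instance i.
    """
    per_class: Dict[str, List[float]] = {}
    for p, y in zip(probs, labels):
        # Collect only the estimate for the instance's own (true) class.
        per_class.setdefault(y, []).append(p[y])
    # Probabilities cannot be negative, so a fence below 0 is clamped to 0.
    return {c: max(0.0, tukey_lower_fence(v, k)) for c, v in per_class.items()}


def is_low_outlier(
    p: Dict[str, float], predicted: str, thresholds: Dict[str, float]
) -> bool:
    """True if the estimate for the predicted class is below its threshold."""
    return p[predicted] < thresholds[predicted]


if __name__ == "__main__":
    # Illustrative, made-up probability estimates for a two-class problem.
    train_probs = [{"a": 0.9, "b": 0.1}, {"a": 0.8, "b": 0.2},
                   {"a": 0.95, "b": 0.05}, {"a": 0.3, "b": 0.7},
                   {"a": 0.4, "b": 0.6}, {"a": 0.25, "b": 0.75}]
    train_labels = ["a", "a", "a", "b", "b", "b"]
    thresholds = class_specific_thresholds(train_probs, train_labels)
    print(thresholds)
    print(is_low_outlier({"a": 0.55, "b": 0.45}, "a", thresholds))
```

Each class gets its own cut-off because a learner can be systematically more (or less) confident on some classes than others; a single global threshold would ignore that. `statistics.quantiles` requires Python 3.8 or later.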

An Investigation of Load Balancing in a Distributed Web Caching System

With the exponential growth of the Internet, performance is an issue because bandwidth is often limited. Web caching is a scalable way to reduce the amount of bandwidth required, and it has been shown to be quite successful at addressing this issue, especially at the proxy level. However, as the number of clients and their needs grow, relying on a single Web cache becomes infeasible and inefficient. To address this concern, the Web caching system can be set up in a distributed manner, allowing multiple machines to work together to meet the needs of the clients. Further efficiency may also be achieved by balancing the workload across all the Web caches in the system. This thesis investigates the benefits of load balancing in a distributed Web caching environment as a way to improve response times and reduce bandwidth usage.

Author Keywords: adaptive load sharing, Distributed systems, Load Balancing, Simulation, Web Caching
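Load balancing across cooperating caches can be sketched as a request router. This is a hypothetical illustration, not the scheme evaluated in the thesis: it assumes each URL has a "home" cache chosen by hashing (so repeat requests for an object tend to reach the cache that already holds it), and that a request is shared with the least-loaded cache whenever the home cache's count of outstanding requests exceeds the lightest cache's by more than an assumed `imbalance` parameter. The class `CacheCluster` and its methods are illustrative names.

```python
import hashlib
from typing import Dict, List


class CacheCluster:
    """Hypothetical router for a group of cooperating proxy caches."""

    def __init__(self, names: List[str], imbalance: int = 2) -> None:
        if not names:
            raise ValueError("need at least one cache")
        self.names = list(names)
        # Outstanding (in-progress) requests per cache, used as the load metric.
        self.load: Dict[str, int] = {n: 0 for n in self.names}
        self.imbalance = imbalance

    def home(self, url: str) -> str:
        """Deterministically map a URL to its home cache by hashing."""
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        return self.names[int.from_bytes(digest[:8], "big") % len(self.names)]

    def route(self, url: str) -> str:
        """Pick a cache for the request and record it as outstanding there."""
        target = self.home(url)
        lightest = min(self.names, key=lambda n: self.load[n])
        # Share the request only when the home cache is clearly overloaded.
        if self.load[target] - self.load[lightest] > self.imbalance:
            target = lightest
        self.load[target] += 1
        return target

    def finish(self, name: str) -> None:
        """Mark one outstanding request at cache `name` as completed."""
        if self.load[name] <= 0:
            raise ValueError(f"no outstanding requests at {name}")
        self.load[name] -= 1


if __name__ == "__main__":
    cluster = CacheCluster(["cache1", "cache2", "cache3"], imbalance=2)
    for _ in range(10):
        cluster.route("http://example.com/popular-object")
    print(cluster.load)
```

Hashing preserves locality (and therefore hit rate), while the imbalance check gives up some locality to keep a hot object from overloading one cache; a suitable threshold would have to be chosen empirically, for example by simulation.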

ADAPT

This thesis focuses on the design of a modelling framework, consisting of a loosely coupled sequence of spatial and process models and the procedures necessary to predict future flood events in Tabasco, Mexico, for the years 2030 and 2050. Temperature and precipitation data for those future years from the Hadley Centre Coupled Model, version 3 (HadCM3), were downscaled using the Statistical Downscaling Model (SDSM 4.2.9). These data were then used, along with a variety of digital spatial data and models (current land use, soil characteristics, surface elevation, and rivers), to parameterize the Soil and Water Assessment Tool (SWAT) model and predict flows. The flow data were then input into the Hydrologic Engineering Center's River Analysis System (HEC-RAS) model, which mapped the areas expected to be flooded based on the predicted flow values. This modelling sequence generates images of flood extents, which are then ported to an online tool (ADAPT) for display. The results of this thesis indicate that, under current climate-change predictions, the city of Villahermosa, Tabasco, Mexico, and the surrounding area will experience a substantial amount of flooding. Therefore, adaptation planning needs to begin immediately.

Author Keywords: Adaptation Planning, Climate Change, Extreme Weather Events, Flood Planning, Simulation Modelling

An Investigation of the Impact of Big Data on Bioinformatics Software

As the generation of genetic data accelerates, Big Data has an increasing impact on the way bioinformatics software is used. Experiments are becoming larger and more complex than software designers originally envisioned. One way to deal with this problem is to use parallel computing. Using the program Structure as a case study, we investigate ways to counteract the challenges created by growing datasets. We propose an OpenMP parallelization and a hybrid OpenMP-MPI parallelization of the MCMC steps, and analyse their performance in various scenarios. The results indicate that the parallelizations produce significant speedups over the serial version in all scenarios tested. Adapting the program to the parallel architecture allows the available hardware to be used more efficiently. This is important because it not only reduces the time required to perform existing analyses but also opens the door to new analyses that were previously impractical.

Author Keywords: Big Data, HPC, MCMC, parallelization, speedup, Structure
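Speedup is conventionally measured as serial runtime divided by parallel runtime; dividing it by the number of workers gives parallel efficiency, and Amdahl's law bounds the achievable speedup when only part of a program (here, the MCMC steps) is parallelized. The sketch below computes these quantities; the function names are hypothetical, and any timings passed to it are illustrative rather than results from the thesis. For instance, a run that takes 120 s serially and 30 s on 8 threads has a speedup of 4 and an efficiency of 0.5.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel (wall-clock times in the same units)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency E = S / N for N workers (threads or processes)."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return speedup(t_serial, t_parallel) / workers


def amdahl_bound(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: maximum speedup 1 / ((1 - p) + p / N).

    p is the fraction of the serial runtime that parallelizes perfectly.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(speedup(120.0, 30.0), efficiency(120.0, 30.0, 8))
    print(amdahl_bound(0.9, 10))
```

With 90% of the runtime parallelizable, 10 workers can give at most about a 5.26x speedup, which is why the remaining serial portion matters as core counts grow. For a hybrid OpenMP-MPI run, N would be the total number of cores used (MPI processes times OpenMP threads per process).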

In collections

Trent University Digital Collections


Trent University Library & Archives

