Graduate Theses & Dissertations

Utilizing Class-Specific Thresholds Discovered by Outlier Detection
We investigated whether the performance of selected supervised machine-learning techniques could be improved by combining them with univariate outlier-detection techniques. We developed a framework that uses univariate outlier detection to discover class-specific thresholds in class probability estimates, and we proposed two novel techniques that utilize these thresholds. The proposed techniques were applied to various data sets and the results were evaluated. Our experimental results suggest that some of our techniques may improve the recall of the base learner. Additional results suggest that one technique may produce higher accuracy and precision than AdaBoost.M1, while another may produce higher recall. Finally, our results suggest that we can achieve higher accuracy, precision, or recall when AdaBoost.M1 fails to produce higher metric values than the base learner. Author Keywords: AdaBoost, Boosting, Classification, Class-Specific Thresholds, Machine Learning, Outliers
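The abstract does not say which outlier test or decision rule the thesis uses, so the following is only a minimal sketch of the general idea, under stated assumptions: the threshold for each class is taken to be the lower Tukey fence (Q1 - 1.5 * IQR) of the probability estimates that correctly labelled training instances received for that class, and a prediction goes to the class whose estimate clears its own threshold by the largest margin. The function names, the fence rule, the decision rule, and the sample numbers are all hypothetical.

```python
from statistics import quantiles


def lower_fence(values, k=1.5):
    """Lower Tukey fence of a sample: Q1 - k * IQR (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1 - k * (q3 - q1)


def class_thresholds(probs_by_class, k=1.5):
    """One threshold per class, from the estimates its own members received."""
    return {label: lower_fence(p, k) for label, p in probs_by_class.items()}


def predict(estimates, thresholds):
    """Pick the class that clears its own threshold by the largest margin.

    Falls back to the usual argmax when no class clears its threshold.
    """
    margins = {
        label: p - thresholds[label]
        for label, p in estimates.items()
        if p >= thresholds[label]
    }
    if margins:
        return max(margins, key=margins.get)
    return max(estimates, key=estimates.get)


# Hypothetical estimates: P(pos) for true positives, P(neg) for true negatives.
training = {
    "pos": [0.30, 0.35, 0.40, 0.45, 0.50],
    "neg": [0.80, 0.85, 0.90, 0.95, 1.00],
}
thresholds = class_thresholds(training)
label = predict({"pos": 0.28, "neg": 0.72}, thresholds)
```

In this made-up example a plain argmax would label the instance "neg", while the class-specific thresholds label it "pos", which is the kind of shift that could raise recall for a class with habitually low probability estimates.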
An Investigation of Load Balancing in a Distributed Web Caching System
With the exponential growth of the Internet, performance is an issue because bandwidth is often limited. Web caching is a scalable way to reduce the amount of bandwidth required, and it has been shown to be quite successful at addressing this issue, especially at the proxy level. However, as the number and needs of the clients grow, it becomes infeasible and inefficient to rely on a single Web cache. To address this concern, the Web caching system can be set up in a distributed manner, allowing multiple machines to work together to meet the needs of the clients. Further efficiency may be achievable by balancing the workload across all the Web caches in the system. This thesis investigates the benefits of load balancing in a distributed Web caching environment in order to improve response times and help reduce bandwidth usage. Author Keywords: adaptive load sharing, Distributed systems, Load Balancing, Simulation, Web Caching
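The abstract does not describe the balancing policy that was studied, so the following is only an illustrative sketch of one possible policy, not the thesis's method. Each URL has a "home" cache chosen by hashing, which keeps repeated requests on the same machine, and a request is diverted to the least-loaded cache only when the home cache is more than `slack` requests busier. Load is simplified to a running request count that never drains. All names, the `slack` parameter, and the workload are assumptions.

```python
import zlib


def home_cache(url, n_caches):
    """Deterministic home cache for a URL (CRC32 hash modulo cache count)."""
    return zlib.crc32(url.encode("utf-8")) % n_caches


def route(url, loads, slack=2):
    """Send the request home unless home is overloaded relative to the idlest cache."""
    home = home_cache(url, len(loads))
    least = min(range(len(loads)), key=loads.__getitem__)
    target = home if loads[home] <= loads[least] + slack else least
    loads[target] += 1  # simplification: load is a cumulative request count
    return target


def simulate(urls, n_caches, slack=2):
    """Route every request in order and return the final per-cache load."""
    loads = [0] * n_caches
    for url in urls:
        route(url, loads, slack)
    return loads


# Hypothetical skewed workload: one very popular page plus fifty one-off pages.
urls = ["http://example.org/popular"] * 50
urls += [f"http://example.org/page{i}" for i in range(50)]

balanced = simulate(urls, n_caches=4, slack=2)
hash_only = simulate(urls, n_caches=4, slack=10**9)  # slack so large nothing is diverted
```

With this policy the gap between the busiest and idlest cache never exceeds `slack + 1`, whereas pure hashing sends all 50 requests for the popular page to a single cache. The trade-off is that diverted requests may miss in the cache they land on.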
ADAPT
This thesis focuses on the design of a modelling framework that loosely couples a sequence of spatial and process models and procedures needed to predict future flood events for the years 2030 and 2050 in Tabasco, Mexico. Temperature and precipitation data for those years from the Hadley Centre Coupled Model (HadCM3) were downscaled using the Statistical Downscaling Model (SDSM 4.2.9). These data were then used, along with a variety of digital spatial data and models (current land use, soil characteristics, surface elevation, and rivers), to parameterize the Soil and Water Assessment Tool (SWAT) model and predict flows. The flow data were then input into the Hydrologic Engineering Center's River Analysis System (HEC-RAS) model, which mapped the areas expected to be flooded based on the predicted flow values. This modelling sequence generates images of flood extents, which are then ported to an online tool (ADAPT) for display. The results of this thesis indicate that, under current climate change predictions, the city of Villahermosa, Tabasco, Mexico, and the surrounding area will experience a substantial amount of flooding. Therefore, there is a need for adaptation planning to begin immediately. Author Keywords: Adaptation Planning, Climate Change, Extreme Weather Events, Flood Planning, Simulation Modelling
An Investigation of the Impact of Big Data on Bioinformatics Software
As the generation of genetic data accelerates, Big Data has an increasing impact on the way bioinformatics software is used. Experiments are becoming larger and more complex than software designers originally envisioned. One way to deal with this problem is to use parallel computing. Using the program Structure as a case study, we investigate ways to counteract the challenges created by growing datasets. We propose an OpenMP parallelization and a hybrid OpenMP-MPI parallelization of the MCMC steps, and analyse their performance in various scenarios. The results indicate that the parallelizations produce significant speedups over the serial version in all scenarios tested. Adapting the program to the parallel architecture allows the available hardware to be used more efficiently. This is important because it not only reduces the time required to perform existing analyses, but also opens the door to new analyses that were previously impractical. Author Keywords: Big Data, HPC, MCMC, parallelization, speedup, Structure
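For readers unfamiliar with how such results are usually reported, speedup is conventionally the serial run time divided by the parallel run time, and parallel efficiency is speedup divided by the number of workers. The sketch below shows only those standard definitions. The timings are hypothetical and are not taken from the thesis.

```python
def speedup(t_serial, t_parallel):
    """Speedup of a parallel run over the serial baseline (same workload)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Fraction of ideal linear speedup achieved with the given worker count."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_serial, t_parallel) / workers


# Hypothetical timings in seconds for one analysis on 8 threads.
s = speedup(120.0, 30.0)
e = efficiency(120.0, 30.0, workers=8)
```

An efficiency well below 1.0 typically points to serial sections, synchronization, or communication overhead limiting the benefit of extra workers.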