Data Science

Big Data Analytics Capstone: Air Quality Analysis

This project examines 600,000 EPA air quality measurements taken from 2000 to 2021. The report and presentation include time-series visualization, geospatial mapping, non-parametric hypothesis testing, and machine learning methods such as cluster analysis, regression analysis, and classification.

This project demonstrates my skills in:

  • Data Mining & Visualization
  • Machine Learning (Supervised & Unsupervised)
  • Statistical Analysis
  • Distribution Analysis
Report | Slides | Interactive Map (2010)

CS-453 Final Project: What Makes a Song Popular?

Abstract:

A song can be described as a set of numerical attributes representing different features of its sound. This project uses a dataset of 50,000 songs and machine learning techniques to determine whether a song’s popularity or genre can be predicted from those attributes. The project found that neither a song’s popularity nor its genre (from a set of ten genres) could be predicted from the given attributes in this dataset. However, machine learning techniques were able to differentiate between Rock and Jazz songs.

This project demonstrates my skills in:

  • Data Mining & Visualization
  • Classification
  • Data Preprocessing
  • Distribution Analysis
Report | Slides

Geospatial Data

CS-483 Assignment: Create an Interactive Leaflet Map

Using the R programming language and the leaflet library, I constructed an interactive map displaying estimated food insecurity, unemployment rate, and population for every census tract in the Washington, DC area. The data was provided by Open Data DC.

This visualization and the corresponding code demonstrate my skills in:

  • R (Leaflet, sp libraries)
  • Layer Control
  • Data Visualization & UI Customization
  • Clean Coding Practices
Interactive Map | R Code

MA-304 Final Project: Effect of Cigarettes on Birth Weight

Abstract:

Following the collection of 1,000 observations, an investigation into the effects of cigarettes and alcohol on birthweight was approved by the North Carolina State Center for Health and Environmental Statistics. The investigation examined whether smoking cigarettes or drinking alcohol while pregnant corresponds to a lower birthweight. It concluded that while birthweight tended to be lower for children of mothers who drank alcohol or smoked cigarettes during pregnancy, only cigarettes correlated with a significantly lower birthweight when all other factors were accounted for.
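A comparison like this can be checked with a permutation test. The block below is a minimal sketch in Python and not the project's actual code: the function name `permutation_test`, the one-sided setup, and the birthweight values are all assumptions made up for illustration. It shuffles the group labels many times and counts how often the shuffled difference in means is at least as low as the observed one.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """One-sided permutation test: is mean(group_a) lower than mean(group_b)?

    Returns the observed difference in means (a minus b) and an
    approximate p-value based on random relabelings of the pooled data.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values simulates "group labels do not matter".
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff <= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical birthweights in pounds, invented for illustration only.
smokers = [6.1, 6.4, 5.9, 6.8, 6.0, 6.3, 5.7, 6.5]
nonsmokers = [7.2, 7.5, 6.9, 7.8, 7.1, 7.4, 6.8, 7.6]

observed, p_value = permutation_test(smokers, nonsmokers)
print(f"observed difference in means: {observed:.3f}")
print(f"approximate one-sided p-value: {p_value:.4f}")
```

A permutation test makes no assumption about the shape of the birthweight distribution, which is why it suits a question like this one. Note that this sketch compares only two groups and does not account for other factors.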

This project demonstrates my skills in:

  • Permutation Tests
  • Hypothesis Tests
  • Data Visualization
Report