Data Science, Analysis & Machine Learning Projects

Discover my latest projects, where I explore innovative ideas and review practical solutions in data, technology, and automation.

Data Cleaning Automation Framework

This project introduces a scalable and adaptable framework for cleaning large, real-world datasets. Using the Yelp dataset as a case study, it demonstrates reproducible methods for handling missing values, removing duplicates and inconsistencies, standardizing formats such as dates, text, and categories, and validating data quality before analysis.
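As a rough illustration of the cleaning steps described above, here is a minimal sketch that uses only the Python standard library. The field names (`business_id`, `city`, `date`), the list of date formats, and the sample rows are assumptions made up for this example. They are not taken from the framework itself or from the actual Yelp schema.

```python
from datetime import datetime

# Hypothetical date layouts; a real dataset may need a different list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def standardize_date(value):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records, required=("business_id", "date")):
    """Normalize text and dates, drop incomplete rows, then drop exact duplicates."""
    cleaned, seen = [], set()
    for row in records:
        # Trim stray whitespace from every text field.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if row.get("city"):
            row["city"] = row["city"].title()
        if row.get("date"):
            row["date"] = standardize_date(row["date"])
        # Treat empty strings and None as missing values.
        if any(not row.get(field) for field in required):
            continue
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Made-up sample rows, for illustration only.
raw = [
    {"business_id": "b1", "city": " las vegas ", "date": "2019-03-01"},
    {"business_id": "b1", "city": "Las Vegas", "date": "01/03/2019"},
    {"business_id": "b2", "city": "phoenix", "date": ""},
    {"business_id": "b3", "city": "Toronto", "date": "March 5, 2020"},
]
cleaned = clean_records(raw)
```

In this sketch the second row becomes a duplicate of the first only after standardization, which is why deduplication runs last. Dropping rows with missing required fields is one possible policy; imputing values instead would be equally reasonable, depending on the analysis.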

2025 Developers Salary Prediction App

This mini project is a simple yet effective Salary Prediction ML App built using Python and Streamlit. The app predicts a developer’s estimated salary based on their country, education level, and years of experience. It’s an interactive tool designed to provide quick insights into expected salary ranges in 2025 for developers in various regions.

Predicting Customer Churn or Attrition with a Call Center Dataset

This project uses a customer service call dataset to predict customer churn and identify the key features that influence it. The dataset comes from the modeldata package in R and originates as a challenge problem from the MLC++ machine learning software. It contains 19 input variables and 1 binary outcome, covering customer characteristics and company interactions.

Linear Prediction Model with Trend Analysis and Uncertainty Estimation

This project generates random numbers and uses them as training data to illustrate how a linear prediction model can be constructed and analyzed. It establishes a linear relationship between the input and output data, then visualizes how the model’s output changes with different inputs.
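The idea can be sketched in a few lines of standard-library Python. Everything specific here is an assumption chosen for illustration: the true slope and intercept, the noise level, the sample size, and the helper names `fit_line` and `predict`. The uncertainty band is a deliberately simple one based on the residual standard deviation, and it ignores the uncertainty in the fitted slope and intercept.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Synthetic training data: y = 2.5 * x + 1.0 plus Gaussian noise (assumed values).
TRUE_SLOPE, TRUE_INTERCEPT, NOISE_SD = 2.5, 1.0, 0.5
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + random.gauss(0, NOISE_SD) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for a single input variable."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)

# Residual standard deviation: a simple estimate of the noise around the trend line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
residual_sd = (sum(r ** 2 for r in residuals) / (len(xs) - 2)) ** 0.5


def predict(x, z=1.96):
    """Point prediction with a rough 95% band of +/- z residual standard deviations."""
    y_hat = slope * x + intercept
    return y_hat, y_hat - z * residual_sd, y_hat + z * residual_sd


y_hat, lower, upper = predict(4.0)
```

With enough samples, the fitted slope and intercept should land close to the values used to generate the data, and the band around each prediction should reflect the injected noise level.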

Analysing NYPD 911 Calls Data for Enhanced Public Safety 

This data analysis project aims to explore, analyze, and derive actionable insights from the NYPD 911 system data, compiled from the Integrated Computer Assisted Dispatch (ICAD) system. The ICAD system is an essential tool used by call takers and dispatchers to manage interactions with the public and coordinate the NYPD’s response.

Historical Bank Stock Data Analysis

In this exploratory data analysis of stock prices using Pandas, I retrieve and analyze stock information for the following banks:

  • Bank of America
  • Citigroup
  • Goldman Sachs
  • JPMorgan Chase
  • Morgan Stanley
  • Wells Fargo

Beyond Stars Sentiment Analysis

This study uses computational methods to detect and classify the sentiments, experiences, and opinions of buyers, and examines how these factors influence check-ins and purchase decisions. It categorizes business patronage based on review sentiment and review star ratings to uncover the underlying emotions driving buyers’ decision-making and business patronage.

CAPABILITIES

Tools and Programming Languages

ABOUT ME

A few words about me and my work

MY SKILLS

What I can do best & my experience

  • Dr. Mike Schmoker, Author
    Things get done only if the data we gather can inform and inspire those in a position to make a difference.
  • Stephen M. Stigler, Professor of Statistics
    Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion.
  • Andrew Ng, Computer Scientist and Global Leader in AI
    It is difficult to think of a major industry that AI will not transform. This includes healthcare, education, transportation, retail, communications, and agriculture. There are surprisingly clear paths for AI to make a big difference in all of these industries.