Data Science, Analysis & Machine Learning Projects
Discover my latest projects, where I explore innovative ideas and review practical solutions in data, technology, and automation.
Data Cleaning Automation Framework
This project introduces a scalable and adaptable framework for cleaning large, real-world datasets. Using the Yelp dataset as a case study, it demonstrates reproducible methods for handling missing values, removing duplicates and inconsistencies, standardizing formats such as dates, text, and categories, and validating data quality before analysis.
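The cleaning steps named above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the framework itself: the field names (name, category, date), the two accepted date formats, and the sample rows are all assumptions made up for the example.

```python
from datetime import datetime

# Assumed date formats for illustration; a real dataset may need others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Return an ISO date string, or None if no known format matches."""
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records):
    """Standardize text, categories and dates, then drop duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        # Collapse repeated whitespace and normalize capitalization.
        name = " ".join((row.get("name") or "").split()).title()
        # Missing categories get an explicit placeholder value.
        category = (row.get("category") or "unknown").strip().lower()
        date = parse_date(row.get("date"))
        key = (name, category, date)
        if not name or key in seen:
            continue  # skip rows with no name and exact duplicates
        seen.add(key)
        cleaned.append({"name": name, "category": category, "date": date})
    return cleaned


def validate(rows):
    """Share of rows with a usable date, as a basic quality check."""
    if not rows:
        return 0.0
    return sum(r["date"] is not None for r in rows) / len(rows)


# Hypothetical sample rows, not taken from the Yelp dataset.
sample = [
    {"name": "  corner   diner ", "category": "Restaurants", "date": "03/15/2021"},
    {"name": "Corner Diner", "category": "restaurants ", "date": "2021-03-15"},
    {"name": "City Gym", "category": None, "date": "not a date"},
    {"name": "", "category": "Bars", "date": "2020-01-01"},
]
rows = clean_records(sample)
print(rows, validate(rows))
```

Running the validation step after cleaning, rather than before, means the quality score reflects the data that would actually reach the analysis.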
2025 Developers Salary Prediction App
This mini project is a simple yet effective Salary Prediction ML App built using Python and Streamlit. The app predicts a developer’s estimated salary based on their country, education level, and years of experience. It’s an interactive tool designed to provide quick insights into expected salary ranges in 2025 for developers in various regions.
Predicting Customer Churn with a Call Center Dataset
This project uses a customer service call dataset to predict customer churn and identify the key features that influence it. The dataset is sourced from the modeldata package in R and originates from an MLC++ machine learning software challenge problem. It contains 19 input variables and 1 binary outcome, covering customer characteristics and company interactions.
Linear Prediction Model with Trend Analysis and Uncertainty Estimation
This project generates random numbers as training data to illustrate how a linear prediction model can be constructed and analyzed. It defines a linear relationship between the input and output data, then models the output behavior by visualizing how the model’s output changes with different inputs.
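As a rough sketch of the idea, the snippet below generates synthetic data, fits a line by ordinary least squares, and attaches a simple uncertainty band to a prediction. The true slope, intercept, noise level, and sample size are assumed values chosen for illustration, and the band uses a normal approximation based on the residual spread rather than a full prediction interval.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard error: spread of the data around the fitted line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return slope, intercept, sigma


def predict(x, slope, intercept, sigma, z=1.96):
    """Point prediction with a rough ~95% band (normal approximation)."""
    y_hat = slope * x + intercept
    return y_hat, y_hat - z * sigma, y_hat + z * sigma


random.seed(0)  # fixed seed so the synthetic data is reproducible
TRUE_SLOPE, TRUE_INTERCEPT, NOISE_SD = 2.0, 1.0, 1.0  # assumed values
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + random.gauss(0, NOISE_SD) for x in xs]

slope, intercept, sigma = fit_line(xs, ys)
y_hat, low, high = predict(5.0, slope, intercept, sigma)
print(f"slope={slope:.2f} intercept={intercept:.2f} sigma={sigma:.2f}")
print(f"prediction at x=5: {y_hat:.2f} (approx. 95% band {low:.2f} to {high:.2f})")
```

Because the data is generated from a known line, the fitted slope and intercept can be compared directly with the assumed true values, which makes the effect of noise on the estimate easy to see.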
Analysing NYPD 911 Calls Data for Enhanced Public Safety
This data analysis project aims to explore, analyze, and derive actionable insights from the NYPD 911 system data, compiled from the Integrated Computer Assisted Dispatch (ICAD) system. The ICAD system is an essential tool used by call takers and dispatchers to manage interactions with the public and coordinate the NYPD’s response.
Historical Bank Stock Data Analysis
This project is an exploratory data analysis of stock prices using Pandas.
I retrieve and analyze stock information for the following banks:
- Bank of America
- Citigroup
- Goldman Sachs
- JPMorgan Chase
- Morgan Stanley
- Wells Fargo
Beyond Stars Sentiment Analysis
This study uses computational methods to detect and classify buyers’ sentiments, experiences, and opinions, and examines how these factors influence check-ins and purchase decisions. It categorizes business patronage based on review sentiment and star ratings to uncover the underlying emotions driving buyers’ decision-making.

CAPABILITIES
Tools and Programming Languages

WHAT, WHY, WHEN, WHERE…
Case Studies

Things get done only if the data we gather can inform and inspire those in a position to make a difference.
Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion.
It is difficult to think of a major industry that AI will not transform. This includes healthcare, education, transportation, retail, communications, and agriculture. There are surprisingly clear paths for AI to make a big difference in all of these industries.