Machine Learning

Predicting traffic using neural networks  

In this project I attempt to predict NYC traffic using LightGBM, RNN, GRU and LSTM models.

Tools: Python, Pandas, NumPy, PyTorch, Scikit-learn, Neural networks, Seaborn, Matplotlib

Models used: LightGBM, RNN, GRU, LSTM

Creating an AI-powered WhatsApp chatbot

In this project I created a basic chatbot that detects audio messages and converts them to .wav format with a 16 kHz sampling rate. In addition, the bot detects images and saves only those that contain a human face.

Tools: Python, PyTorch, Flask, Twilio, Ngrok  

Models used: Single Shot MultiBox Detector

Experimenting with Diffusion models

Here I experiment with diffusion models, using Denoising Diffusion Probabilistic Models (DDPM) and Denoising Diffusion Implicit Models (DDIM).

Tools: Python, PyTorch, Diffusion models

Models used: DDPM and DDIM

Building an Autoencoder for denoising handwritten images

In this project I build an autoencoder model for denoising handwritten images using PyTorch.

Tools: Python, Pandas, NumPy, PyTorch, Matplotlib

Energy Forecasting Hackathon

This repository contains a data processing workflow for the Energy Forecasting Hackathon. For this forecasting task, the ARIMA (AutoRegressive Integrated Moving Average) model was selected. 

Tools: Python, Pandas, NumPy, PyTorch, Matplotlib

Models used: ARIMA (AutoRegressive Integrated Moving Average) 

Binary Classification Problem

In this project I created a model that helps estimate credit card approval from the features provided. I used the Kaggle dataset named Credit card Details Binary Classification Problem.

Tools: Python, Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib

Models used: Decision Tree, Random Forest, XGBClassifier, Logistic Regression

Predicting car prices using KNN

In this project, I explore the fundamentals of machine learning using the k-nearest neighbors (KNN) algorithm for univariate and multivariate models. In addition, I apply hyperparameter optimization and various cross-validation methods to assess the performance of the model.

Tools: Python, Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib

Models used: k-nearest neighbors (KNN)
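As a rough illustration of the idea (not the code from the project itself), KNN regression and a simple cross-validation loop can be sketched in plain Python. The feature values and prices below are made up, the features are assumed to be already normalised, and leave-one-out is just one of several possible validation schemes.

```python
import math


def knn_predict(train_rows, train_prices, query, k=3):
    """Predict a price as the mean price of the k nearest training rows."""
    # Euclidean distance works for one feature (univariate) or many (multivariate).
    distances = sorted(
        (math.dist(row, query), price)
        for row, price in zip(train_rows, train_prices)
    )
    nearest = distances[:k]
    return sum(price for _, price in nearest) / len(nearest)


def leave_one_out_rmse(rows, prices, k):
    """Hold out each row in turn, predict it from the rest, and report RMSE."""
    squared_errors = []
    for i in range(len(rows)):
        rest_rows = rows[:i] + rows[i + 1:]
        rest_prices = prices[:i] + prices[i + 1:]
        prediction = knn_predict(rest_rows, rest_prices, rows[i], k)
        squared_errors.append((prediction - prices[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: one normalised feature per car, price in dollars.
rows = [(0.1,), (0.2,), (0.8,), (0.9,)]
prices = [5000, 6000, 15000, 16000]

predicted = knn_predict(rows, prices, (0.15,), k=2)
# A tiny hyperparameter search: try a few values of k and keep the best RMSE.
rmse_by_k = {k: leave_one_out_rmse(rows, prices, k) for k in (1, 2, 3)}
best_k = min(rmse_by_k, key=rmse_by_k.get)
```

In practice Scikit-learn's KNeighborsRegressor and its cross-validation helpers do the same job with far less code; the sketch only shows what happens underneath.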

Data Visualisation

Star Wars Survey Visualisation

In this project, using mainly Matplotlib, I attempt to shed light on some interesting data about Star Wars episodes, characters and fans, and to answer some important questions, including 'Who shot first, Han or Greedo?'

Tools: Data Visualisation, Python, Pandas, NumPy, Matplotlib

Data Visualisation on Exchange Rates

During this project I applied both exploratory data visualization (graphs made for ourselves to better understand and explore the data) and explanatory data visualization (graphs made for others to inform, make a point, or tell a story). I maximized the data-ink ratio, created visual patterns with Gestalt principles and used pre-attentive attributes.

Tools: Data Visualisation, Python, Pandas, NumPy, Matplotlib

Probability

Building a Spam Filter with Naive Bayes Algorithm

In this project, I am going to study the practical side of the multinomial Naive Bayes algorithm by building a spam filter for SMS messages. 

Tools: Python, Pandas, regular expression (RE)

Methods used: Naive Bayes algorithm 
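For readers unfamiliar with the method, here is a minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is an illustration under stated assumptions, not the project's actual code: the four training messages are invented, and the function names and the smoothing value alpha=1 are my own choices.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Replace punctuation with spaces, lowercase, and split into words."""
    return re.sub(r"\W", " ", message).lower().split()


def train(messages):
    """Count words per label from a list of (label, text) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(text, word_counts, label_counts, vocabulary, alpha=1.0):
    """Return the label with the higher log-posterior score."""
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        # Start from the log prior P(label).
        score = math.log(label_counts[label] / total_messages)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Laplace-smoothed P(word | label), added in log space.
            score += math.log(
                (word_counts[label][word] + alpha)
                / (n_words + alpha * len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training messages, for illustration only.
training = [
    ("spam", "WINNER! Claim your free prize now"),
    ("spam", "Free entry, win cash now"),
    ("ham", "Are we still meeting for lunch today"),
    ("ham", "I will call you later today"),
]
model = train(training)
spam_label = classify("claim your free cash prize", *model)
ham_label = classify("call you later for lunch", *model)
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer messages.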

Hypothesis Testing

In this project, I work with a dataset of Jeopardy questions to identify patterns that could potentially improve one's chances of winning this famous US TV show. I will be using the chi-squared test for categorical data, which allows me to determine the statistical significance of observing a set of categorical values.

Tools: Python, Pandas, NumPy, SciPy

Methods used: Chi-squared test
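The chi-squared statistic itself is simple to compute by hand. The sketch below uses made-up counts rather than the Jeopardy data, and the function name is my own; it only shows the arithmetic behind the test.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: a term appears in 30 high-value and 70 low-value
# questions, where an even 50/50 split would be expected by chance.
observed = [30, 70]
expected = [50, 50]

statistic = chi_squared(observed, expected)  # (400 / 50) + (400 / 50) = 16.0

# With two categories there is 1 degree of freedom; the 5% critical value
# for 1 degree of freedom is about 3.841.
is_significant = statistic > 3.841
```

SciPy's scipy.stats.chisquare returns the same statistic together with a p-value, so the hand-rolled version is only useful for seeing what the test measures.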

Mobile App for Lottery Addiction

In this project I write the functions required for a mobile app that helps lottery addicts better estimate their chances of winning. I start with functions that calculate factorials and combinations, which are used repeatedly throughout the project.

Tools: Python, Pandas

Methods used: calculating factorials, calculating number of combinations
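A minimal sketch of these two helpers, and of how they yield the chance of winning, might look like this. The function names and the 6-from-49 lottery format are assumptions for illustration and may differ from the project's code.

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def combinations(n, k):
    """Number of ways to choose k items from n when order does not matter."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def one_ticket_probability(total_numbers=49, numbers_drawn=6):
    """Chance that a single ticket matches every drawn number.

    The 6-from-49 defaults are an assumed lottery format.
    """
    return 1 / combinations(total_numbers, numbers_drawn)


outcomes = combinations(49, 6)  # 13,983,816 possible tickets
chance = one_ticket_probability()
```

Python 3.8 and later also provide math.factorial and math.comb, which give the same results without custom code.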

Statistics

Finding the Best Markets for Advertising

In this project, I analyse survey data to find the best two markets to advertise in. An e-learning company offering programming courses wants to identify the two best markets for advertising their subscription product. To minimize costs, data-driven approaches should be explored before considering expensive market surveys.

Tools: Python, Pandas, Matplotlib, Seaborn

Investigating Fandango Movie Ratings

In this project I apply what I learned about sampling, variables, scales of measurement, and frequency distributions. Having studied each of these topics in isolation, in this guided project I go one step further and combine these skills to perform practical data analysis.

Tools: Python, Pandas, NumPy, Matplotlib

Web Scraping

Scraping Premier League site

In this project I attempt to scrape a sports stats website using the BeautifulSoup and Requests libraries to get the data needed for building a model that predicts winners of football games in the Premier League.

Tools: Python, Pandas, Requests, BeautifulSoup

Data Cleaning

Analyzing NYC High School Data

In this project I attempt to investigate the correlations between SAT scores and demographic factors in New York City public schools including race, gender, AP exams and more. 

Tools: Python, Pandas, NumPy, Matplotlib, regular expression (RE)

Clean and Analyse Employee Exit Surveys

This project is aimed at putting data cleaning skills with pandas into practice.

Tools: Python, Pandas, NumPy, Matplotlib

Cleaning The COUGHVID crowdsourcing dataset

In this repository I clean and prepare the data for the project aimed at identifying the severity of cough due to air pollution in New Delhi using audio analysis and Machine Learning.

Tools: Python, Pandas, NumPy, Matplotlib, Seaborn, Librosa

All images by Microsoft Designer