Machine Learning
Predicting traffic using neural networks
In this project I attempt to predict NYC traffic using LightGBM, RNN, GRU and LSTM models.
Tools: Python, Pandas, NumPy, PyTorch, Scikit-learn, Neural networks, Seaborn, Matplotlib
Models used: LightGBM, RNN, GRU, LSTM
Creating an AI-powered WhatsApp chatbot
In this project I created a basic chatbot that identifies audio messages and converts them to .wav format with a 16 kHz sampling rate. In addition, the bot identifies images and saves only those that contain a human face.
Tools: Python, PyTorch, Flask, Twilio, Ngrok
Models used: Single Shot MultiBox Detector
Experimenting with Diffusion models
Here I experiment with diffusion models, using Denoising Diffusion Probabilistic Models (DDPM) and Denoising Diffusion Implicit Models (DDIM).
Tools: Python, PyTorch, Diffusion models
Models used: DDPM and DDIM
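As a rough illustration of what a DDPM does during training, the forward (noising) process can be sketched in plain Python. This is a minimal sketch, not the code from the project, which uses PyTorch. The function names are hypothetical, and the linear schedule from 1e-4 to 0.02 is the common default from the DDPM paper, assumed here rather than taken from the repository.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule, needs num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng=random):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, 1)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
noisy = add_noise([0.5, -0.5, 0.1], t=500, alpha_bars=alpha_bars)
```

The denoising network is trained to predict the noise added at a random step t. DDIM reuses the same trained network but samples with a deterministic update that can skip steps.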
Building an Autoencoder for denoising handwritten images
In this project I create an autoencoder model for denoising handwritten images using PyTorch.
Tools: Python, Pandas, NumPy, PyTorch, Matplotlib
Energy Forecasting Hackathon
This repository contains a data processing workflow for the Energy Forecasting Hackathon. For this forecasting task, the ARIMA (AutoRegressive Integrated Moving Average) model was selected.
Tools: Python, Pandas, NumPy, PyTorch, Matplotlib
Models used: ARIMA (AutoRegressive Integrated Moving Average)
Binary Classification Problem
In this project I created a model that estimates the credit card approval rate based on the features provided. I used the Kaggle dataset named Credit card Details Binary Classification Problem.
Tools: Python, Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib
Models used: Decision Tree, Random Forest, XGBClassifier, Logistic Regression
Predicting car prices using KNN
In this project, I explore the fundamentals of machine learning using the k-nearest neighbors (KNN) algorithm for univariate and multivariate models. In addition, I apply hyperparameter optimization and various cross-validation methods to assess the performance of the model.
Tools: Python, Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib
Models used: k-nearest neighbors (KNN)
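The idea behind KNN regression can be sketched in a few lines of plain Python. This is an illustrative sketch only: the project itself uses Scikit-learn, and the function name and toy numbers below are hypothetical. A univariate model passes one feature per row, a multivariate model passes several.

```python
import math


def knn_predict(train_rows, train_prices, query, k=3):
    """Predict a price as the mean price of the k nearest training rows."""
    distances = []
    for row, price in zip(train_rows, train_prices):
        # Euclidean distance between the query and one training row
        distances.append((math.dist(row, query), price))
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(price for _, price in nearest) / len(nearest)


# Toy data: one feature per car (hypothetical values, not from the dataset)
rows = [(1.0,), (2.0,), (3.0,), (10.0,)]
prices = [100.0, 200.0, 300.0, 1000.0]
estimate = knn_predict(rows, prices, query=(2.1,), k=3)
```

With several features, each column should be rescaled to a comparable range first, otherwise the feature with the largest values dominates the distance. The value of k is the main hyperparameter to tune with cross-validation.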
Data Visualisation
Star Wars Survey Visualisation
In this project, using mainly Matplotlib, I attempt to shed light on some interesting data regarding Star Wars episodes, characters and fans, and to provide answers to some important questions, including 'Who shot first, Han or Greedo?'
Tools: Data Visualisation, Python, Pandas, NumPy, Matplotlib
Data Visualisation on Exchange Rates
During this project I applied both exploratory data visualization (graphs we build for ourselves to better understand and explore data) and explanatory data visualization (graphs we create for others to inform, make a point, or tell a story). I maximized the data-ink ratio, created visual patterns with Gestalt principles and used pre-attentive attributes.
Tools: Data Visualisation, Python, Pandas, NumPy, Matplotlib
Probability
Building a Spam Filter with Naive Bayes Algorithm
In this project, I am going to study the practical side of the multinomial Naive Bayes algorithm by building a spam filter for SMS messages.
Tools: Python, Pandas, regular expression (RE)
Methods used: Naive Bayes algorithm
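A multinomial Naive Bayes classifier for short messages can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function names, the tokenizer regex, the Laplace smoothing value alpha=1 and the four training messages are all hypothetical, and it assumes both classes appear in the training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """messages: list of (label, text) pairs, label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(text, model, alpha=1.0):
    """Return the label with the higher log-posterior score."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Log prior: share of training messages with this label
        score = math.log(label_counts[label] / total_messages)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Smoothed likelihood P(word | label)
            p = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocabulary))
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


training = [
    ("spam", "win money now"),
    ("spam", "free prize win"),
    ("ham", "see you at lunch"),
    ("ham", "are you home now"),
]
model = train(training)
label = classify("win a free prize", model)
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on longer messages.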
Hypothesis Testing
In this project, I work with a dataset of Jeopardy questions to identify patterns that could potentially improve one's chances of winning this famous US TV show. I will be using the chi-squared test for categorical data, which allows me to determine the statistical significance of observing a set of categorical values.
Tools: Python, Pandas, NumPy, SciPy
Methods used: Chi-squared test
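The chi-squared statistic compares observed category counts with the counts expected under the null hypothesis. Below is a minimal plain-Python sketch; the function name is hypothetical and the counts are made-up illustration values, not results from the Jeopardy dataset. SciPy's `scipy.stats.chisquare` computes the same statistic and also returns a p-value.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: a term appears 10 times in high-value questions
# and 20 times in low-value questions, while an even split of 15 and 15
# would be expected if question value made no difference.
statistic = chi_squared_statistic([10, 20], [15, 15])
```

A larger statistic means the observed counts deviate more from the expected ones. Whether the deviation is significant depends on the degrees of freedom and the chosen significance level.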
Mobile App for Lottery Addiction
In this project I write the functions required for a mobile app that helps lottery addicts better estimate their chances of winning a lottery. I write functions to calculate factorials and combinations, which are used repeatedly throughout the project.
Tools: Python, Pandas
Methods used: calculating factorials, calculating number of combinations
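The two helper functions described above could look roughly like this. It is a sketch with hypothetical names, and the 6-from-49 lottery format in the usage lines is an assumption chosen for illustration.

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def combinations(n, k):
    """Number of ways to choose k items from n, ignoring order."""
    if k < 0 or k > n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))


def one_ticket_probability(n, k):
    """Chance that a single ticket matches all k numbers drawn from n."""
    return 1 / combinations(n, k)


# Assumed format: 6 numbers drawn from 49
total_outcomes = combinations(49, 6)  # 13983816
chance = one_ticket_probability(49, 6)
```

Integer division keeps the combination count exact even for large n. The standard library's `math.factorial` and `math.comb` provide the same results.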
Statistics
Finding the Best Markets for Advertising
In this project, I analyse survey data to find the best two markets to advertise in. An e-learning company offering programming courses wants to identify the two best markets for advertising their subscription product. To minimize costs, data-driven approaches should be explored before considering expensive market surveys.
Tools: Python, Pandas, Matplotlib, Seaborn
Investigating Fandango Movie Ratings
In this project I apply what I have learned about sampling, variables, scales of measurement, and frequency distributions. I learned about each of these topics in isolation. In this guided project, I attempt to go one step further and combine these skills to perform practical data analysis.
Tools: Python, Pandas, NumPy, Matplotlib
Web Scraping
Scraping Premier League site
In this project I attempt to scrape a sports stats website using the BeautifulSoup and Requests libraries in order to get the data needed to build a model that predicts the winners of Premier League football games.
Tools: Python, Pandas, Requests, BeautifulSoup
Data Cleaning
Analyzing NYC High School Data
In this project I attempt to investigate the correlations between SAT scores and demographic factors in New York City public schools including race, gender, AP exams and more.
Tools: Python, Pandas, NumPy, Matplotlib, regular expression (RE)
Clean and Analyse Employee Exit Surveys
This project aims to put data cleaning skills with pandas into practice.
Tools: Python, Pandas, NumPy, Matplotlib
Cleaning the COUGHVID crowdsourcing dataset
In this repository I clean and prepare the data for the project aimed at identifying the severity of cough due to air pollution in New Delhi using audio analysis and Machine Learning.
Tools: Python, Pandas, NumPy, Matplotlib, Seaborn, Librosa