Selected projects from different stages of my academic career.

Cluster Analysis of Wine Types Data

The goal is to identify different types of wine from their chemical compositions. The plan is to start with basic exploratory data analysis (EDA) and then try several methods, moving from simple algorithms such as K-Means and K-Medoids clustering to Gaussian mixture and hierarchical clustering.

Ongoing

On a Nonparametric Test for Dependence

Literature review and discussion of a paper that aggregates two different correlation coefficients into one to obtain a distribution-free test for independence between two random variables.

Fall 2023

A Study on Non-Gaussian Graphical Models

We first try to understand what graphical models are and why they are needed. We then briefly discuss the existing literature on one of the simplest cases, in which the nodes of the graph jointly follow a multivariate normal distribution. Next, we study which methods can be applied when the data are not normally distributed, which is often the case in practice. For non-Gaussian scenarios, we carry out a comprehensive literature review of nonparametric methods for continuous data and also study discrete graphical models. Finally, we examine whether a common framework can handle discrete and continuous variables together. In most of the setups, we study the performance of the methods in different situations, such as large sample sizes, deviations from normality, and higher dimensions.

Spring 2023

On a Robust Correlation Coefficient

Review and discussion of a paper published in JRSSD that proposes a robust version of the usual correlation coefficient based on LMS regression. Simulations and applications to real data are demonstrated.

Spring 2023

Heart Disease Prediction using Supervised Machine Learning

Given health measurements of an individual, such as cholesterol, blood pressure, exercise angina, and ST slope, we try to predict whether that individual is vulnerable to heart disease. Logistic regression, tree-based methods, and KNN were used.

Fall 2022

A Study on Kernel Density Estimation

Starting with the simplest form of density estimation, i.e. histograms, we identify their drawbacks and overcome them step by step, eventually arriving at kernel density estimation (KDE). Simulations from different distributions with different choices of kernel show how well the estimates hold. Asymptotic properties are verified. Using a similar kernel-based approach, we also estimate the CDF. Detailed simulations are done for each case. Finally, we do a case study.
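As a rough illustration of the kernel density estimate described above, here is a minimal sketch in plain Python. It is not the code from the project: the Gaussian kernel, the normal reference bandwidth rule, and the simulated standard normal sample are all assumptions made for the example, and the function names are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth h = 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return 1.06 * sd * n ** (-1.0 / 5.0)


def kde(x, sample, bandwidth):
    """Estimate f(x) as (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample)
    return total / (n * bandwidth)


# Simulated data (an assumption for illustration): 200 draws from N(0, 1).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
h = normal_reference_bandwidth(sample)

# Evaluate the estimate on a grid and check that it integrates to about 1.
step = 0.01
grid = [-6.0 + step * i for i in range(1201)]
density = [kde(x, sample, h) for x in grid]
area = sum(density) * step

print("bandwidth:", h)
print("estimated density at 0:", kde(0.0, sample, h))
print("approximate area under the estimate:", area)
```

Unlike a histogram, the estimate does not depend on a choice of bin origin and is smooth wherever the kernel is smooth; the bandwidth plays the role that the bin width plays for histograms.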

Spring 2022

Regression Analysis of Petrol Consumption Data

Starting with the linear regression model with all predictors included, we performed regression diagnostics and applied appropriate transformations to the data to address problems such as heteroscedasticity, multicollinearity, and non-normality of errors. Variable selection was also performed. Finally, we presented the model that best fits the data.

Fall 2021

Modelling the UEFA Champions League

Here we try to analyse the number of goals scored in the UEFA Champions League over the years, a natural application of time series analysis. We try to build a simple mathematical model for the goal ratio, which is more reasonable to work with. After estimating the trend using mathematical curve fitting, we try to model the remaining part using ARMA.
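The two-step idea above (fit a trend, then model what remains) can be sketched as follows. This is only an illustration under stated assumptions: the series is synthetic rather than actual Champions League data, the trend is taken to be a straight line, and the remainder is modelled with AR(1), the simplest special case of ARMA, estimated from the lag-1 autocorrelation. The function names are hypothetical.

```python
import random


def fit_linear_trend(y):
    """Least-squares fit of y_t = intercept + slope * t for t = 0, 1, ..., n - 1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation; for an AR(1) model this estimates phi."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Synthetic series (an assumption): linear trend plus AR(1) noise with phi = 0.6.
random.seed(1)
noise = [0.0]
for _ in range(199):
    noise.append(0.6 * noise[-1] + random.gauss(0.0, 0.1))
series = [2.5 + 0.02 * t + e for t, e in enumerate(noise)]

# Step 1: estimate and remove the trend.
intercept, slope = fit_linear_trend(series)
residuals = [yt - (intercept + slope * t) for t, yt in enumerate(series)]

# Step 2: estimate the AR(1) coefficient of the detrended series.
phi_hat = lag1_autocorrelation(residuals)

print("estimated trend: intercept", intercept, "slope", slope)
print("estimated AR(1) coefficient:", phi_hat)
```

A full ARMA(p, q) fit would also need order selection and moving-average terms, which this sketch deliberately leaves out.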

Spring 2020