Research-Lab Projects 👨🏽‍🔬
Image Inpainting using Probabilistic Models
Yoojung Choi Lab
In this project, we reconstruct missing or corrupted regions of images using probabilistic models. Building on Sum-Product Networks (SPNs) and Probabilistic Factor Graphs (PFGs), we develop methods for image inpainting. Our approach involves:
Data Preparation: We prepare a diverse dataset of images with artificially introduced missing sections to train and test our models.
Model Development: Utilizing SPNs, we capture complex probabilistic dependencies within the image data. We also employ PFGs to model the relationships between different image regions.
Training and Optimization: We train our models on the prepared dataset, optimizing them for accurate and efficient inpainting.
Evaluation: We evaluate our models using various metrics to assess their performance in reconstructing high-quality images.
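The SPN inference step can be illustrated on a toy example. This is a minimal sketch, assuming binary pixels and a hand-built SPN over a 2x2 image (a weighted sum of two fully factorized product nodes). The structure, weights, leaf probabilities and the helper names mask and inpaint are hypothetical, not the project's learned model. It relies on a standard SPN property: marginalizing out a missing pixel amounts to setting its leaf to 1, so a single bottom-up pass gives the probability of the observed pixels, and each missing pixel can be filled with its more probable value given the observed ones.

```python
from math import prod

class Leaf:
    """Bernoulli leaf over one pixel; returns 1.0 if the pixel is missing (marginalised out)."""
    def __init__(self, var, p_on):
        self.var, self.p_on = var, p_on

    def value(self, x):
        v = x[self.var]
        if v is None:
            return 1.0
        return self.p_on if v == 1 else 1.0 - self.p_on

class Product:
    """Product node: children cover disjoint pixel sets (decomposability)."""
    def __init__(self, children):
        self.children = children

    def value(self, x):
        return prod(c.value(x) for c in self.children)

class Sum:
    """Sum node: weighted mixture of children over the same pixels (completeness)."""
    def __init__(self, weighted_children):
        self.weighted_children = weighted_children

    def value(self, x):
        return sum(w * c.value(x) for w, c in self.weighted_children)

# Hypothetical SPN over a 2x2 image, pixels indexed 0 1 / 2 3:
# one component favours a lit top row, the other a lit left column.
top_row = Product([Leaf(i, p) for i, p in enumerate([0.9, 0.9, 0.1, 0.1])])
left_col = Product([Leaf(i, p) for i, p in enumerate([0.9, 0.1, 0.9, 0.1])])
spn = Sum([(0.5, top_row), (0.5, left_col)])

def mask(image, missing):
    """Simulate corruption by replacing the given pixel indices with None."""
    return [None if i in missing else v for i, v in enumerate(image)]

def inpaint(spn, x):
    """Fill each missing pixel with its more probable value given the observed pixels."""
    filled, p_on = list(x), {}
    for i, v in enumerate(x):
        if v is None:
            # Other missing pixels stay None, so they are marginalised out.
            p1 = spn.value(x[:i] + [1] + x[i + 1:])
            p0 = spn.value(x[:i] + [0] + x[i + 1:])
            p_on[i] = p1 / (p0 + p1)  # P(pixel i = 1 | observed pixels)
            filled[i] = 1 if p1 >= p0 else 0
    return filled, p_on

corrupted = mask([1, 1, 0, 0], missing={2, 3})
print(inpaint(spn, corrupted))
```

Real inpainting SPNs are learned from data over thousands of pixels, but the same two operations (marginalize missing leaves, compare conditional probabilities) carry over.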
This project showcases the application of probabilistic models in computer vision, demonstrating their potential for solving real-world problems in image restoration and enhancement.
AI-Enhanced Digital Observability Platform
Luminosity Lab - State Street
I am currently collaborating with State Street, a leading financial services company, on a project to develop an AI-driven Observability and Digital Experience Framework. The initiative centers on an interactive Power BI dashboard whose primary objective is to deliver real-time data analysis and actionable insights that improve the end-user experience.
Our approach combines AI and machine learning algorithms with interactive visualizations to strengthen digital observability across State Street's operations. With these analytics, stakeholders can monitor services in real time, detect anomalies early, and optimize service delivery. The framework supports both ongoing service monitoring and continuous improvement through data-driven decision-making and personalized user experiences.
Through this collaboration, we aim to raise the standard for digital observability and improve service reliability and efficiency for State Street's clients, reinforcing its position as an industry leader in financial technology solutions.
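As a minimal sketch of what early anomaly detection on a single service metric can look like, assuming a stream of per-minute latency readings, a rolling z-score flags values that sit far outside the recent baseline. The function name, window size and threshold are hypothetical choices; this is not the framework's actual detection method or a Power BI feature.

```python
from collections import deque
from statistics import mean, stdev

def rolling_zscore_alerts(values, window=20, threshold=3.0):
    """Return indices whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` values."""
    history = deque(maxlen=window)  # trailing baseline, oldest values drop out
    alerts = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(v - mu) / sigma > threshold:
                alerts.append(i)
        history.append(v)
    return alerts

# Hypothetical latency series (ms): a steady baseline with one spike at index 40.
latencies_ms = [100 + (i % 5) for i in range(40)] + [400] + [100] * 5
print(rolling_zscore_alerts(latencies_ms))  # -> [40]
```

A trailing window keeps the baseline adaptive to gradual drift; a production system would typically add seasonality handling and alert de-duplication on top.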
Zoom-Apple Workspace
Luminosity Lab - Zoom/Apple
I am currently collaborating with Zoom and Apple to implement a virtual learning space using Apple products. I work closely with the UI/UX design team to enhance the user interface of the VR space, making it more intuitive and user-friendly. Additionally, I assist in Quality Assurance Testing by documenting bugs and usability issues to ensure a seamless virtual learning experience.
Personal Projects 🎒
ICM Recommender
This project aims to develop a machine learning-based system that recommends music tracks based on the emotional state of the user. The system analyzes audio features of songs to categorize them into emotional categories such as "Happy," "Sad," "Devotional," "Party," and "Romantic."
The system first collects as much Indian Classical Music from Spotify as it can find. It then extracts audio features from each track with Librosa, a Python library for music and audio analysis, and trains a neural network on those features to classify songs by emotion; given an emotion as input, it recommends tracks from that category. With more data and additional features, the approach could be scaled up and generalized to other kinds of music.
My main purpose, however, was to show that technology can help spread Indian Classical Music.
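The classification and recommendation steps can be sketched with scikit-learn's MLPClassifier. This is a minimal sketch under stated assumptions: each track is summarized by three hypothetical features (tempo, mean RMS energy, mean spectral centroid), which Librosa can compute with librosa.beat.beat_track, librosa.feature.rms and librosa.feature.spectral_centroid. Here the feature values are synthetic, the track titles are placeholders, only three of the five emotion labels are used, and the network size is an illustrative choice rather than the project's model.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic cluster centres per emotion: [tempo (BPM), mean RMS energy, mean spectral centroid (Hz)].
# Real values would come from Librosa features extracted from each track.
CENTRES = {
    "Devotional": [70.0, 0.05, 1200.0],
    "Happy": [120.0, 0.15, 2500.0],
    "Party": [140.0, 0.30, 3500.0],
}
NOISE = [5.0, 0.01, 100.0]

def make_catalog(n_per_emotion=20):
    """Build a toy catalog of (feature vector, emotion label, placeholder title)."""
    X, y, titles = [], [], []
    for emotion, centre in CENTRES.items():
        for i in range(n_per_emotion):
            X.append(rng.normal(centre, NOISE))
            y.append(emotion)
            titles.append(f"{emotion.lower()}_track_{i}")
    return np.array(X), np.array(y), titles

X, y, titles = make_catalog()

# Standardize the features, then fit a small one-hidden-layer network.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs", max_iter=2000, random_state=0),
)
model.fit(X, y)

def recommend(emotion, k=5):
    """Return up to k catalog titles whose predicted emotion matches the request."""
    predicted = model.predict(X)
    return [t for t, p in zip(titles, predicted) if p == emotion][:k]

print(recommend("Devotional"))
```

Standardizing first matters here because tempo and spectral centroid are on scales hundreds of times larger than RMS energy.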
https://github.com/aarora80/ICMReccomender.git
SpotifyCompatibility
I developed an application that uses Spotify's API to authenticate two users and retrieve their music data, including their favorite artists, playlists, and tracks. Within this application, I created and thoroughly tested an algorithm that analyzes and compares the two users' musical preferences. The algorithm determines their compatibility based on shared tastes and listening habits, and it could later be extended to support business features such as personalized music recommendations or social features.
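One simple way to score compatibility is a weighted overlap of the two users' top items. This is a minimal sketch, not the application's actual algorithm: it assumes each user's data has already been fetched (for example through the Spotify Web API's Get User's Top Items endpoint) and reduced to lists of artist, track and genre identifiers. The weights, the dictionary layout and the function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical weights: shared artists count most, then tracks, then genres.
DEFAULT_WEIGHTS = {"artists": 0.5, "tracks": 0.3, "genres": 0.2}

def compatibility(user_a, user_b, weights=DEFAULT_WEIGHTS):
    """Weighted Jaccard overlap across item types, scaled to a 0-100 score."""
    total = sum(weights.values())
    score = sum(
        w * jaccard(user_a.get(kind, []), user_b.get(kind, []))
        for kind, w in weights.items()
    )
    return round(100 * score / total, 1)

alice = {"artists": ["a1", "a2", "a3"], "tracks": ["t1", "t2"], "genres": ["indie"]}
bob = {"artists": ["a2", "a3", "a4"], "tracks": ["t2", "t3"], "genres": ["indie"]}
print(compatibility(alice, bob))  # -> 55.0
```

Jaccard similarity is used because it normalizes for list length, so a user with a long top-tracks list is not automatically "more compatible" with everyone.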
https://github.com/aarora80/SpotifyCompatibility.git
Distributed Hash Table (DHT) and Hot Potato Query Processing Protocol Project
This project implements a Distributed Hash Table (DHT) and a Hot Potato Query Processing Protocol in C. It was developed as part of the CSE 434 (Computer Networks) course at Arizona State University. The main goal of this project is to demonstrate the principles of socket programming, distributed systems, and efficient query processing in a networked environment.
Features:
Distributed Hash Table (DHT): Implements a DHT for storing and retrieving key-value pairs efficiently across a distributed network of nodes.
Hot Potato Query Processing Protocol: A query processing protocol in which each node immediately passes a query it cannot answer to a neighboring node, "hot potato" style, until the target node is found or a timeout occurs.
Fault Tolerance: The system is designed to handle node failures and ensures data redundancy and recovery.
Scalability: The architecture allows for dynamic addition and removal of nodes without significant performance degradation.
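The lookup and forwarding idea can be simulated without sockets. This is a minimal in-process sketch in Python, not the project's C implementation: it assumes nodes sit on a logical ring, a key's owner is chosen by hashing the key (SHA-1 here, a hypothetical choice) modulo the ring size, and a query is handed to the right-hand neighbor until it reaches the owner or exhausts a hop limit, which stands in for the timeout. The class name, method names and the sample record are illustrative.

```python
import hashlib

class RingDHT:
    """In-process simulation of a ring-structured DHT with hot-potato lookups (no networking)."""

    def __init__(self, n_nodes):
        self.n = n_nodes
        self.stores = [{} for _ in range(n_nodes)]  # one local hash table per node

    def owner(self, key):
        """Map a key to the node responsible for it (SHA-1 digest modulo ring size)."""
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.n

    def put(self, key, value):
        self.stores[self.owner(key)][key] = value

    def query(self, key, start, max_hops=None):
        """Pass the query to the right-hand neighbour until the owner answers or hops run out."""
        max_hops = self.n if max_hops is None else max_hops
        node, path = start, []
        for _ in range(max_hops):
            path.append(node)
            if node == self.owner(key):
                return self.stores[node].get(key), path
            node = (node + 1) % self.n  # "hot potato": hand the query on immediately
        return None, path  # hop limit reached; a real node would report a timeout

dht = RingDHT(n_nodes=5)
dht.put("Phoenix", {"state": "AZ"})
value, path = dht.query("Phoenix", start=0)
print(value, path)
```

In the networked version each `path` step corresponds to a message sent over a socket to the next node, which is why the hop limit (or a wall-clock timeout) is needed to stop queries for keys whose owner has failed.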
https://github.com/aarora80/SocketProject.git
Paper: https://drive.google.com/file/d/1AiUv7d1dYNGCI8McVV1I8_w3Zl7tlpVd/view?usp=sharing
Machine Learning 📉📈
Kaggle Competitions
House Prices - Advanced Regression Techniques (Kaggle Competition)
Participating in the "House Prices: Advanced Regression Techniques" competition on Kaggle has been a thrilling and educational experience for me. The challenge was to predict the final sale prices of homes using a dataset that includes a wide array of features, from the size and condition of the house to the neighborhood it’s located in.
I delved into advanced regression techniques to build models that could make accurate predictions. This competition allowed me to apply and sharpen my skills in feature engineering, regression analysis, and model evaluation. I experimented with various preprocessing techniques to handle both numeric and categorical data, and I explored different machine learning methods to find the best-performing models.
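A common way to preprocess mixed numeric and categorical columns is a scikit-learn ColumnTransformer. This is a minimal sketch, not the submitted model: the column names come from the competition's train.csv, but the rows are made-up values, and the imputation strategies and Ridge regressor are hypothetical choices. The target is fit on log1p(SalePrice) because the competition scores RMSE between the logarithms of predicted and observed prices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["LotArea", "GrLivArea", "OverallQual"]
CATEGORICAL = ["Neighborhood"]

preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardize.
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), NUMERIC),
    # Categorical: fill gaps with a placeholder, then one-hot encode (unseen levels are ignored).
    ("cat", make_pipeline(SimpleImputer(strategy="constant", fill_value="missing"),
                          OneHotEncoder(handle_unknown="ignore")), CATEGORICAL),
])

# Fit on log1p(price) and map predictions back to dollars with expm1.
model = TransformedTargetRegressor(
    regressor=make_pipeline(preprocess, Ridge(alpha=1.0)),
    func=np.log1p,
    inverse_func=np.expm1,
)

# Made-up rows using the competition's column names.
train = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, np.nan, 14000, 7000],
    "GrLivArea": [1500, 1300, 1800, 1700, 2200, 1100],
    "OverallQual": [7, 6, 7, 6, 8, 5],
    "Neighborhood": ["CollgCr", "OldTown", "CollgCr", np.nan, "NoRidge", "OldTown"],
    "SalePrice": [200000, 150000, 230000, 180000, 300000, 120000],
})
test = pd.DataFrame({
    "LotArea": [10000], "GrLivArea": [1600], "OverallQual": [7], "Neighborhood": ["Somerst"],
})

model.fit(train[NUMERIC + CATEGORICAL], train["SalePrice"])
print(model.predict(test))
```

Keeping imputation and encoding inside the pipeline means the same transformations learned on the training data are applied to the test data, including neighborhoods never seen during training.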
Overall, this project has been a significant learning experience, pushing me to improve my data science and machine learning capabilities while working on a real-world problem.
https://github.com/aarora80/AdvancedRegression.git
Digit Recognizer
Participating in the "Digit Recognizer" competition on Kaggle has been an exciting journey into the fundamentals of computer vision. The challenge involved recognizing handwritten digits using the famous MNIST dataset, which consists of 70,000 grayscale images of digits from 0 to 9.
This project provided me with an opportunity to dive deep into image processing and classification techniques. I explored various methods for preprocessing the images, such as normalization and augmentation, to enhance the performance of my models. I also experimented with different machine learning algorithms, from traditional methods like k-nearest neighbors and support vector machines to more advanced techniques involving neural networks.
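A traditional k-nearest-neighbors baseline and a support vector machine can be compared in a few lines. To stay self-contained, this sketch uses scikit-learn's bundled load_digits set (1,797 images of 8x8 pixels) as a stand-in for the competition's 28x28 MNIST images; the model settings are hypothetical choices, not the competition submission. Normalization divides by 16 because load_digits pixels range from 0 to 16; for MNIST the divisor would be 255.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

digits = load_digits()
X = digits.data / 16.0  # load_digits pixels range 0-16; scale to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.25, random_state=0, stratify=digits.target
)

models = {
    "k-NN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "SVM (RBF kernel)": SVC(kernel="rbf", gamma="scale"),
}
scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out images
    print(f"{name}: {scores[name]:.3f}")
```

Stratifying the split keeps all ten digit classes in the same proportions in the training and test sets, so the two accuracy figures are directly comparable.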
Working on this competition allowed me to improve my skills in building and evaluating models for image recognition tasks. It was a rewarding experience that strengthened my understanding of computer vision and provided valuable insights into the challenges and solutions involved in recognizing handwritten digits.
https://github.com/aarora80/computerVision.git
Bayesian Network Probability Modeling
Disease Diagnosis Using Bayesian Networks
In this project, I developed a Bayesian Network model to predict the likelihood of a disease based on various symptoms and demographic factors. The goal was to create a probabilistic model that could assist in medical diagnosis by considering multiple factors and their interdependencies.
Key Components:
Model Structure:
The Bayesian Network includes nodes for various symptoms (Fever, Cough, Fatigue, Difficulty Breathing), demographic factors (Age, Gender), and health indicators (Blood Pressure, Cholesterol Level). The central node, Disease, is influenced by these factors, which collectively determine the probability of a particular outcome.
Conditional Probability Distributions (CPDs):
Each node is parameterized with CPDs, which define the probability of each state given its parents. For simplicity, the initial CPDs were set to equal probabilities, but these can be refined with real data. For instance, the CPD for the Disease node considers the combined influence of symptoms and demographic factors.
Inference:
Using the Variable Elimination algorithm, the model performs exact inference to compute the posterior probabilities. This allows us to predict the likelihood of an outcome (e.g., presence or absence of a disease) given observed evidence. In the example, the posterior probability of the outcome is computed given the presence of fever and cough.
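The inference step can be reproduced on a cut-down network without any library. This is a minimal sketch with hypothetical numbers: it keeps only Age, Fever, Cough and Disease, uses made-up CPDs instead of the project's uniform ones, and computes the posterior by full enumeration of the joint distribution rather than Variable Elimination. On a network this small both methods return the same exact posterior; Variable Elimination matters for efficiency as the network grows.

```python
from itertools import product

# Hypothetical, cut-down network: Age, Fever and Cough are root nodes and
# Disease has all three as parents. Every number below is made up.
P_AGE = {"young": 0.6, "old": 0.4}
P_FEVER = {True: 0.2, False: 0.8}
P_COUGH = {True: 0.3, False: 0.7}

def p_disease(age, fever, cough):
    """Hypothetical CPD: P(Disease=True | Age, Fever, Cough)."""
    base = 0.05 if age == "young" else 0.10
    return base + 0.30 * fever + 0.20 * cough

def joint(age, fever, cough, disease):
    """Chain-rule factorization of the joint probability for one full assignment."""
    pd_true = p_disease(age, fever, cough)
    return P_AGE[age] * P_FEVER[fever] * P_COUGH[cough] * (pd_true if disease else 1.0 - pd_true)

def posterior_disease(evidence):
    """Exact P(Disease | evidence) by summing the joint over all unobserved variables."""
    totals = {True: 0.0, False: 0.0}
    for age, fever, cough, disease in product(P_AGE, P_FEVER, P_COUGH, (True, False)):
        assignment = {"Age": age, "Fever": fever, "Cough": cough, "Disease": disease}
        if all(assignment[var] == val for var, val in evidence.items()):
            totals[disease] += joint(age, fever, cough, disease)
    z = totals[True] + totals[False]  # P(evidence)
    return {state: mass / z for state, mass in totals.items()}

print(posterior_disease({"Fever": True, "Cough": True}))  # P(Disease=True) ≈ 0.57
```

Here the unobserved Age node is summed out, giving 0.6 × 0.55 + 0.4 × 0.60 = 0.57; the Fever and Cough priors cancel during normalization because they are root nodes fixed by the evidence.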
https://github.com/aarora80/Bayesian.git
Text Classification with SVM Pipeline
In this project, I developed a text classification model using a Support Vector Machine (SVM) within a Scikit-learn pipeline. The model is designed to categorize text data into predefined categories, demonstrating the use of machine learning for natural language processing (NLP) tasks.
Key Components:
Pipeline Construction:
The pipeline includes three main steps:
CountVectorizer: Converts the raw text data into a matrix of token counts.
TfidfTransformer: Transforms the token counts into Term Frequency-Inverse Document Frequency (TF-IDF) features, which reflect the importance of terms in the text.
SGDClassifier: Uses Stochastic Gradient Descent with a hinge loss function to train a linear SVM, which performs well on high-dimensional, sparse text features.
Model Training:
The pipeline is trained on the twenty_train dataset, which includes text samples and their corresponding target categories.
Prediction and Evaluation:
The trained model is used to predict the categories of the twenty_test dataset.
The performance of the model is evaluated by calculating the mean accuracy, which is the proportion of correctly classified samples.
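The three-step pipeline can be sketched with scikit-learn as follows. Because twenty_train and twenty_test are not shown here, a tiny made-up two-topic corpus stands in for them so the example runs offline; the SGDClassifier hyperparameters (L2 penalty, alpha=1e-3, random_state=42) are hypothetical choices, with loss="hinge" giving the linear SVM objective.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

text_clf = Pipeline([
    ("vect", CountVectorizer()),    # raw text -> token counts
    ("tfidf", TfidfTransformer()),  # token counts -> TF-IDF weights
    ("clf", SGDClassifier(loss="hinge", penalty="l2", alpha=1e-3, random_state=42)),  # linear SVM via SGD
])

# Made-up stand-ins for twenty_train / twenty_test.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "fans cheered as the team won the cup",
    "the rocket launched into orbit",
    "astronauts aboard the space station",
    "nasa launched a new space probe",
]
train_labels = ["sports", "sports", "sports", "space", "space", "space"]
test_docs = ["the team scored a goal", "a probe launched into space"]
test_labels = ["sports", "space"]

text_clf.fit(train_docs, train_labels)
predicted = text_clf.predict(test_docs)
accuracy = np.mean(predicted == np.array(test_labels))  # proportion classified correctly
print(predicted, accuracy)
```

Wrapping the steps in a Pipeline means the vocabulary and IDF weights are learned only from the training documents and then reused unchanged when transforming the test documents.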
https://github.com/aarora80/NLP.git