Projects

A collection of all my personal and academic projects

Automated Meta-analysis with LLMs

Jan 2024 – Aug 2024

Annotated and released a dataset of clinical trial reports and assessed seven LLMs in zero-shot settings. The findings indicate that while LLMs show promise, especially for binary outcomes, significant challenges remain for complex extractions that require inference. The paper was accepted to MLHC 2024.

Python PyTorch HuggingFace

Adversarial Robustness of Neural Nets

Feb 2024 – May 2024

Trained and tested models on the MNIST dataset under attack methods such as PGD, BIM, and FGSM to evaluate the robustness of different network architectures. The findings highlight the importance of network structure in resilience against adversarial perturbations: deeper models such as ResNet and GoogLeNet became significantly more robust with adversarial training.
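
As a rough illustration of FGSM, one of the attacks named in this project, here is a minimal sketch on a toy logistic-regression model rather than the deep networks used in the study. The weights, input, label, and epsilon are made-up values for illustration only; the project itself lists the Adversarial Robustness Toolbox among its tools, which this sketch does not use.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, y, epsilon):
    """One FGSM step: move each input feature by epsilon in the
    direction that increases the cross-entropy loss."""
    p = predict(weights, bias, x)
    # For logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - y) * w for w in weights]
    # Clip to [0, 1], the usual range for normalized pixel values.
    return [min(1.0, max(0.0, xi + epsilon * sign(g)))
            for xi, g in zip(x, grad)]


# Hypothetical model and input (not taken from the project).
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.8, 0.1, 0.5]
y = 1

x_adv = fgsm(weights, bias, x, y, epsilon=0.3)
clean_prob = predict(weights, bias, x)
adv_prob = predict(weights, bias, x_adv)
print(f"clean: {clean_prob:.3f}, adversarial: {adv_prob:.3f}")
```

BIM and PGD can be viewed as iterated versions of this step with a smaller step size and a projection back into the epsilon-ball around the original input.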

Python PyTorch Adversarial Robustness Toolbox

Authorship Attribution: NLP

Mar 2024 – May 2024

This study evaluates various machine learning models for identifying the author of an article. Using Doc2Vec and BERT for feature extraction, the research tests models including Logistic Regression, Random Forest, SVM, Naïve Bayes, KNN, and Feed-Forward Neural Networks. Results indicate that the Feed-Forward Neural Network performs best, particularly with BERT embeddings.

Python PyTorch HuggingFace Scikit-learn

FinViz - Balance Sheet Visualizer

Jan 2023 – Apr 2023

Engineered an interactive tool that visualizes 10-K financial data using CSS, JavaScript, and the D3 library, aimed at students in introductory finance and accounting classes. Implemented time-series and cross-sectional analysis with filters and tooltips to allow comparisons across firms and years.

JavaScript D3.js HTML CSS

Business Recommendation Engine

Feb 2023 – Apr 2023

Created a business recommendation engine using Yelp data. The engine suggests similar businesses based on reviews, distance, and user preferences. Utilized MongoDB to build the database and PyMongo to write queries and conduct thorough analysis of reviews and credibility scores.

MongoDB Python pandas Plotly

Supervised Learning for Credit Defaults

Feb 2023 – Apr 2023

Implemented KNN, decision tree, and logistic regression algorithms on Taiwanese credit card data from 2005, reaching an accuracy of around 68% in predicting default. Used the sklearn and keras libraries to train and evaluate Random Forest and neural network models, which reached accuracies above 70%.

Python Keras Scikit-learn

DaveML - Custom ML Library in C++

Nov 2022 – Dec 2022

This project involved the implementation of an ETL module and regression techniques (linear, ridge, and logistic) using C++ and principles of Linear Algebra and Calculus. The functionality was encapsulated into a Python module using PyBind11, allowing for easy experimentation with the models.
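
To give a sense of the math behind one of the listed techniques, here is a minimal sketch of ridge regression in its simplest form: a single feature with no intercept, where the closed-form solution is w = Σxy / (Σx² + λ). It is written in Python purely for illustration; the library itself is in C++, and the function name and data below are hypothetical rather than part of DaveML.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for y ≈ w * x (one feature, no intercept).

    Minimizes sum((y - w*x)^2) + lam * w^2, which gives
    w = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Made-up data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 1.0, 14.0):
    print(lam, ridge_slope(xs, ys, lam))
```

With lam set to 0 this reduces to ordinary least squares; larger values of lam shrink the slope toward zero, which is the regularizing effect that distinguishes ridge from plain linear regression.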

C++ Linear Algebra Python PyBind11

ReceipTrack - Receipt Image Processor

Oct 2022 – Dec 2022

Developed a dashboard web tool that uses OpenCV and PyTesseract to parse images of receipts and produce spending habit reports and categorical breakdowns. Incorporated object-oriented design and Plotly Dash to visualize spending trends over time and across food groups.

Python Plotly OpenCV PyTesseract

Brain Hemorrhage Classification with Neural Networks

Oct 2022 – Dec 2022

To improve the current diagnostic process, we developed and tested machine learning algorithms for identifying the occurrence and type of hemorrhage from CT scans. Utilizing a convolutional neural network, we achieved an accuracy of 70% over 1463 test images. Our work has the potential to make a significant impact in the field of healthcare.

Python TensorFlow pandas Scikit-learn

Foreign Exchange Analysis

Nov 2022 – Dec 2022

Attempted to analyze seemingly random and unpredictable movements in foreign exchange rates. Gathered daily exchange rates through a Forex API. Implemented linear models to help explain the variance in currency movements and to assess feature importance. Future work includes investment simulation and time-series forecasting methods.

Python pandas Scikit-learn

Programming Language Predictor

Mar 2022 – Apr 2022

Tokenized a large set of code snippets from GitHub repositories with regular expressions and TF-IDF to train a Multinomial Naive Bayes classifier. Deployed the model on Heroku with Flask, a Python web framework, to predict the programming language of a user-inputted code snippet.
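
The general idea of regex tokenization followed by a Multinomial Naive Bayes classifier can be sketched in plain Python as below. This is a simplified stand-in, not the deployed model: it uses raw token counts with Laplace smoothing instead of TF-IDF weights, the token pattern is an assumption, and the four training snippets are toy examples rather than the GitHub data.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed token pattern: identifiers/keywords, or single punctuation symbols.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def tokenize(snippet):
    return TOKEN_PATTERN.findall(snippet)


def train(samples):
    """Count tokens per label from (snippet, label) pairs."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for snippet, label in samples:
        tokens = tokenize(snippet)
        token_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return token_counts, label_counts, vocab


def predict_language(model, snippet):
    """Return the label with the highest log-posterior score."""
    token_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_tokens = sum(token_counts[label].values())
        for token in tokenize(snippet):
            if token not in vocab:
                continue  # ignore tokens never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((token_counts[label][token] + 1)
                              / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical, for illustration only).
samples = [
    ("def add(a, b):\n    return a + b", "Python"),
    ("import os\nprint(os.getcwd())", "Python"),
    ("function add(a, b) { return a + b; }", "JavaScript"),
    ("const x = 5; console.log(x);", "JavaScript"),
]
model = train(samples)
print(predict_language(model, "def greet(name):\n    print(name)"))
```

In scikit-learn, the same pipeline would typically combine `TfidfVectorizer` with `MultinomialNB`, which is closer to what the description above mentions.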

Python Scikit-learn Flask pandas

EZ Itinerary - Vacation Planner

Mar 2022 – Apr 2022

Implemented a 12-table relational database in MySQL to assist in vacation planning, built on Yelp's business dataset. The application lets users create travel groups and plan their hotels, restaurants, and excursions. It uses stored procedures to interact with the user interface, which is deployed with Python and Flask.

MySQL Python Flask

Impact of Executive Pay on Firm Performance

Dec 2021 – Feb 2022

Used Python to evaluate the correlation between improvements in a company's performance and the compensation of its top executives. Wrangled stock data for over 500 companies from the Compustat and CRSP databases using the pandas library. Found minimal evidence that executive pay is correlated with firm performance.

Python pandas NumPy Matplotlib

Financial Data and Stock Price Correlation

Nov 2021 – Dec 2021

Investigated the correlation between financial ratios and stock prices of Apple and Google. Utilized pandas to calculate ratios, build visualizations, and evaluate correlation between the stock price and the associated ratio. The analysis reveals that certain metrics and relationships are hard to predict, even across companies in the same industry.
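
For reference, the kind of correlation measured here can be sketched with a plain-Python Pearson coefficient. The ratio and price values below are invented placeholders, not Apple or Google data, and the project itself used pandas (where `Series.corr` computes the same quantity by default).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for a financial ratio and a stock price over five periods.
current_ratio = [1.1, 1.3, 1.2, 1.5, 1.4]
stock_price = [100.0, 110.0, 105.0, 125.0, 120.0]

print(f"correlation: {pearson(current_ratio, stock_price):.3f}")
```

A coefficient near 1 or -1 indicates a strong linear relationship, while values near 0 indicate little linear association.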

Python pandas Matplotlib