Automated Meta-analysis with LLMs
Jan 2024 – Aug 2024
Annotated and released a dataset of clinical trial reports and assessed seven LLMs in zero-shot settings. The findings indicate that while LLMs show promise, especially for binary outcomes, significant challenges remain for complex extractions and those that require inference. The paper was accepted to MLHC 2024.
Adversarial Robustness of Neural Nets
Feb 2024 – May 2024
Trained and tested models on the MNIST dataset under several attack methods (PGD, BIM, and FGSM) to evaluate the robustness of different network architectures. The findings highlight the importance of network structure for resilience against adversarial perturbations: deeper models such as ResNet and GoogLeNet showed significant robustness improvements after adversarial training.
Authorship Attribution: NLP
Mar 2024 – May 2024
This study evaluates various machine learning models for identifying the author of an article. Using Doc2Vec and BERT for feature extraction, the research tests models including Logistic Regression, Random Forest, SVM, Naïve Bayes, KNN, and Feed-Forward Neural Networks. Results indicate that the Feed-Forward Neural Network performs best, particularly with BERT embeddings.
FinViz - Balance Sheet Visualizer
Jan 2023 – Apr 2023
Engineered an interactive tool that visualizes 10-K financial data using CSS, JavaScript, and the D3 library, aimed at students in introductory finance and accounting classes. Implemented time-series and cross-sectional analysis with filters and tooltips to allow comparisons across firms and years.
Business Recommendation Engine
Feb 2023 – Apr 2023
Created a business recommendation engine using Yelp data. The engine suggests similar businesses based on reviews, distance, and user preferences. Utilized MongoDB to build the database and PyMongo to write queries and conduct thorough analysis of reviews and credibility scores.
Supervised Learning for Credit Defaults
Feb 2023 – Apr 2023
Implemented KNN, decision tree, and logistic regression algorithms on Taiwanese credit card data from 2005, reaching an accuracy of around 68% in predicting default. Used the scikit-learn and Keras libraries to train and evaluate Random Forest and neural network models, which reached accuracies above 70%.
DaveML - Custom ML Library in C++
Nov 2022 – Dec 2022
Implemented an ETL module and regression techniques (linear, ridge, and logistic) in C++ using principles of linear algebra and calculus. Exposed the functionality as a Python module using PyBind11, allowing for easy experimentation with the models.
ReceipTrack - Receipt Image Processor
Oct 2022 – Dec 2022
Developed a web dashboard that uses OpenCV and PyTesseract to parse images of receipts and generate spending-habit reports and category breakdowns. Incorporated object-oriented design and Plotly Dash to visualize spending trends over time and across food groups.
Brain Hemorrhage Classification with Neural Networks
Oct 2022 – Dec 2022
To improve the current diagnostic process, we developed and tested machine learning algorithms for identifying the occurrence and type of hemorrhage from CT scans. Using a convolutional neural network, we achieved an accuracy of 70% on 1,463 test images. Our work has the potential to make a significant impact in the field of healthcare.
Foreign Exchange Analysis
Nov 2022 – Dec 2022
Attempted to analyze seemingly random and unpredictable movements in foreign exchange rates. Gathered daily exchange rates through a Forex API. Implemented linear models to help explain the variance in currency movements and to assess feature importance. Future work includes investment simulation and time-series forecasting methods.
Programming Language Predictor
Mar 2022 – Apr 2022
Tokenized a large set of code snippets from GitHub repositories with regular expressions and vectorized them with TF-IDF to train a Multinomial Naive Bayes classifier. Deployed the model on Heroku with the Python framework Flask to predict the programming language of a user-submitted code snippet.
EZ Itinerary - Vacation Planner
Mar 2022 – Apr 2022
Implemented a 12-table relational database in MySQL to assist in vacation planning, drawing on Yelp's business dataset. The application lets users create travel groups and plan their hotels, restaurants, and excursions. Stored procedures connect the database to the user interface, which is built with Python and Flask.
Impact of Executive Pay on Firm Performance
Dec 2021 – Feb 2022
Used Python to evaluate the correlation between growth in company success and compensation for top executives. Wrangled stock data for over 500 companies from the Compustat and CRSP databases using the pandas library. Found minimal evidence that executive pay is correlated with firm performance.
Financial Data and Stock Price Correlation
Nov 2021 – Dec 2021
Investigated the correlation between financial ratios and the stock prices of Apple and Google. Used pandas to calculate ratios, build visualizations, and evaluate the correlation between each ratio and the stock price. The analysis showed that certain metrics and relationships are hard to predict, even across companies in the same industry.