I'm a Data Science graduate student at the University at Buffalo, focused on turning complex data into clear, actionable insights. My work combines machine learning, analytics, and cloud tools to solve real-world problems, whether that means detecting anomalies, predicting outcomes, or building data products that support smarter decisions.
Developed a Streamlit app that lets users enter health-related parameters and predicts the likelihood of diabetes using a pre-trained machine learning model.
Developed an automated Reddit ETL pipeline using Apache Airflow, AWS Glue, Redshift, and Python to extract and analyze vendor-related discussions from Reddit. The pipeline pulls data via the Reddit API, performs sentiment analysis and topic tagging, and stores the results in Redshift for reporting. I built dashboards in Tableau to help compliance and procurement teams monitor vendor perception in real time, reducing manual review by 80% and enabling faster, data-driven decisions. This project demonstrates my ability to integrate cloud services, orchestrate complex workflows, and turn unstructured data into actionable business insights.
A football database management system (DBMS) built to store, manage, and retrieve football data such as player details, team information, matches, and statistics. The system uses MySQL for database creation and management, along with a Python-based command-line interface for interaction.
This project focuses on detecting phishing websites using multiple machine learning models. Various algorithms, including Logistic Regression, K-Nearest Neighbors (KNN), Decision Trees, Random Forest, Support Vector Machines (SVM), and more, are used to classify websites as phishing or legitimate based on several features.
This project applies machine learning techniques to classify landmines using environmental and sensor data from the UCI Machine Learning Repository. Several classification models, including Logistic Regression, Decision Trees, Random Forest, K-Nearest Neighbors, Naive Bayes, and Support Vector Machines (SVM), were tested. The best-performing model, SVM with an RBF kernel, achieved an accuracy of 61.62%.
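For context, here is a minimal sketch of the RBF kernel that an SVM of this kind relies on. The function name, the sample vectors, and the `gamma` value are illustrative assumptions, not the settings used in this project.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF similarity: exp(-gamma * squared Euclidean distance).

    gamma is an illustrative default, not a tuned value from the project.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Identical points have similarity 1.0; similarity decays as points move apart.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0])
```

A larger `gamma` makes the similarity fall off faster with distance, which tends to produce a more flexible decision boundary.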
This project is part of the Healthcare Analytics course, where we explore hospital inpatient data during the COVID-19 pandemic. The analysis includes preprocessing datasets, building predictive models to estimate hospital length of stay (HLOS), and conducting trials using both MLP and RNN models.
Developed an interactive Tableau dashboard for British Airways data.
Developed an interactive Tableau dashboard for customer sales data.
I'm currently exploring the world of data science as I pursue my Master's in Engineering Science (Data Science) at the University at Buffalo (UB). With a background in software engineering, I’ve always been captivated by the intersection of technology and real-world problem-solving. My journey is fueled by a passion for using data to create meaningful change, whether it's driving equitable resource use or tackling societal challenges as an ELISS Fellow.
Beyond crunching numbers and building models, I’m a lifelong learner with a love for discovery, always on the lookout for new ways to expand my horizons. When I’m not immersed in code or analytics, you’ll probably find me on a fitness adventure or exploring new destinations. This portfolio is a glimpse into my world—where curiosity meets innovation!
University at Buffalo, Buffalo, NY, USA (January 2024 – May 2025)
GPA: 3.667/4.0
Jawaharlal Nehru Technological University, Anantapur, Andhra Pradesh, India (2018 – 2022)
GPA: 8.0/10
Hyderabad, Telangana (Oct 2022 – Dec 2023)
Contributed to building data pipelines in Databricks to process healthcare data from AWS S3 and on-premise sources, supporting unified access for analytics teams.
Cleaned and transformed large datasets using PySpark and SQL, and implemented SCD Type 2 logic under guidance to track historical changes accurately.
Assisted in automating daily ingestion workflows using AWS Glue and Airflow, which helped reduce manual data refresh time by ~40%.
Developed Python and Excel-based scripts to validate key metrics and flag anomalies across staging and curated layers.
Applied row- and column-level filters to safeguard PII data in compliance with HIPAA guidelines, working closely with senior engineers and QA teams.
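The SCD Type 2 idea mentioned above can be sketched in a few lines. This is a plain-Python illustration under stated assumptions, not the PySpark/SQL code used on the job: `DimRow`, `apply_scd2`, and the sample keys and attributes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str                  # business key (hypothetical, e.g. a provider ID)
    attrs: tuple              # tracked attributes as sorted (name, value) pairs
    valid_from: date
    valid_to: Optional[date]  # None while this row is the current version
    is_current: bool


def apply_scd2(dim, incoming, as_of):
    """Apply one batch of incoming records and return the new dimension rows.

    dim: list of DimRow; incoming: {key: {attribute: value}}; as_of: load date.
    """
    result = []
    seen = set()
    for row in dim:
        new_attrs = incoming.get(row.key)
        if not row.is_current or new_attrs is None:
            result.append(row)  # history rows and untouched keys pass through
            continue
        seen.add(row.key)
        packed = tuple(sorted(new_attrs.items()))
        if packed == row.attrs:
            result.append(row)  # no change, keep the current version
        else:
            # Close out the old version, then open a new current version.
            result.append(replace(row, valid_to=as_of, is_current=False))
            result.append(DimRow(row.key, packed, as_of, None, True))
    for key, attrs in incoming.items():
        if key not in seen:
            # Key with no current version: insert it as a new current row.
            result.append(DimRow(key, tuple(sorted(attrs.items())), as_of, None, True))
    return result


day1 = apply_scd2([], {"p1": {"city": "Hyderabad"}}, date(2023, 1, 1))
day2 = apply_scd2(day1, {"p1": {"city": "Buffalo"}}, date(2023, 2, 1))
```

After the second load, `day2` holds two rows for `p1`: the original version closed out on the load date, and a new current version. That pairing is what lets historical changes be queried later.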
Hyderabad, India (Mar 2022 – July 2022)
Completed extensive training in Big Data and Data Engineering, covering key concepts like data wrangling, pipeline design, SQL querying, and cloud fundamentals.
Built and delivered a real-time team project involving integration and cleaning of large datasets, implementation of data quality checks, and creation of summary reports for analysis.
Gained hands-on experience in SQL and Python by working on data preprocessing tasks and performing exploratory analysis to identify patterns and inconsistencies.
Collaborated closely with teammates to evaluate basic machine learning models and contribute to project documentation, ensuring clear communication and structured delivery.
Demonstrated commitment to continuous learning by actively participating in use case-based learning modules and completing all assigned project milestones on time.
716-275-5673
Buffalo, NY 14226