I completed my Master's in Computer Science at the University at Buffalo, after earning my Bachelor's in Computer Engineering from the University of Pune, India. Computer Science has intrigued me since an early age: our world revolves around technology, and building skills in computer science is a way to help shape the future of this world. I took an instant liking to the field and went on to pursue the undergraduate program in Computer Engineering.
During my Bachelor's in Computer Engineering I was introduced to many exciting courses: Computer Networks (the exact functioning and interaction of the 7 layers of communication), Computer Forensics (various encryption algorithms), Computer Graphics and Gaming (clipping and windowing techniques and geometric transformations) and Embedded Operating Systems (programming the BeagleBone Black, which uses an ARM Cortex processor). Our final-year project was a system that automates the process of medical diagnosis using association rules and pattern recognition. Working on this project strengthened my technical and team skills and gave me valuable experience as Project Head.
In my Master's program at the University at Buffalo, I completed many interesting projects in Distributed Systems, Information Retrieval, Data Science and Machine Learning. Implementing these projects gave me in-depth knowledge of Java and Android application development through the Distributed Systems course; of Python and libraries like NumPy and scikit-learn in the Machine Learning and Data Mining courses; and of R and the Jupyter working environment in the Data Intensive Computing course.
Apart from academics, I represented my college in soccer during my undergraduate program and won several accolades. I am an avid reader and like to play the guitar; my musical influences range from Pink Floyd to Coldplay. I am also a huge fan of Manchester United.
I have worked remotely as a Software Engineering Intern for Endorsify, a company based in Los Angeles, CA.
Software Engineering Intern (June 2017 - August 2017 • 3 months)
Endorsify, based in Los Angeles, CA, helps marketers manage user-generated content and build relationships with content creators. Working remotely, I integrated the Clarifai API into a Python tool that generates tags from Instagram images, and built a classifier that uses those tags to help select appropriate influencers. I also built a data-visualization dashboard, integrated Google Analytics and Heap Analytics with the company website, and wrote web scrapers to collect required information.
Master's in Computer Science (GPA 3.62/4)
I completed my Master's in Computer Science at University at Buffalo, The State University of New York, specializing in the focus area of Computer Software and Information Systems.
Fall 2016: Analysis of Algorithms, Information Retrieval, Software Engineering Concepts, Computer Security.
Spring 2017: Distributed Systems, Machine Learning, Data Intensive Computing, Seminar on Software Verification.
Fall 2017: Data Mining and Bioinformatics, Seminar on Hardware Fingerprinting using Acoustic Sensors.
Bachelor's in Computer Engineering (First Class with Distinction)
Completed four years of undergraduate education at the University of Pune, India, earning a Bachelor of Engineering degree in Computer Engineering.
The courses I took across my undergraduate and graduate education gave me an insight into object-oriented programming, basic proficiency in languages like Java and Python, and familiarity with many other software tools.
PROGRAMMING LANGUAGES: Java, Python, R
TOOLS AND FRAMEWORKS: Django Rest Framework, Tableau, Solr, MapReduce
DATABASES: MySQL, SQLite, DynamoDB
OPERATING SYSTEMS: Linux, Windows
The project required implementation of a content provider for each Android emulator instance. The content provider was implemented using internal storage and consisted of key-value pairs. The second part of the project required us to implement multicast between 5 AVDs, with the multicast messages stored in each emulator's content provider.
The objectives of this lab were to collect data by querying the Twitter REST API and to process it with the twitteR library. Information had to be summarized for specific queries, such as finding the trends of a particular place. Geospatial information extracted from the tweets was then used to plot tweets related to a specific hashtag on a map for visualization.
Developed a QA system for answering what/who/where-type questions on Twitter data indexed in Solr. The project focused on determining answer types and extracting facts from the tweets using Natural Language Processing (NLP); the main aim was to answer questions based on these facts. It required the OpenNLP library for POS (part-of-speech) tagging, along with entity detection and entity extraction using Google's Cloud Natural Language API.
This project focused on developing a system that automates the process of medical diagnosis. The system learns from the user's previous input to improve the accuracy of its diagnosis and asks further questions depending on the input provided. The project drew on concepts such as pattern recognition, association rules and decision making.
This project required constructing an inverted index from a given Lucene index generated from the Reuters RCV2 multilingual corpus. It involved getting familiar with the Lucene index and interacting with the Lucene API. The second part of the project required implementing two strategies for returning Boolean query results, namely term-at-a-time and document-at-a-time.
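The term-at-a-time strategy can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: it assumes a small in-memory dictionary of postings lists (the real project read them from a Lucene index) and evaluates a conjunctive (AND) query one term at a time, shortest postings list first.

```python
# Sketch of term-at-a-time AND query evaluation over an in-memory
# inverted index; the dictionary below is a made-up stand-in for
# postings read from a Lucene index.

def and_query_term_at_a_time(index, terms):
    """Intersect postings one term at a time, shortest list first."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = set(postings[0])
    for plist in postings[1:]:
        result &= set(plist)  # keep only docs containing every term so far
    return sorted(result)

index = {
    "buffalo": [1, 3, 5, 8],
    "snow":    [2, 3, 5, 9],
    "winter":  [3, 5, 7],
}
print(and_query_term_at_a_time(index, ["buffalo", "snow", "winter"]))  # [3, 5]
```

Processing the shortest list first keeps the intermediate result small; document-at-a-time instead walks all postings lists in parallel by document ID.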
This project involved interacting with the Twitter API to crawl tweets and index them properly in Solr. Tweets had to be indexed with the necessary fields, and appropriate queries were executed to test the indexing. The project exercised various concepts covered in Information Retrieval, including tokenization and stemming.
This project involved extracting and repurposing the Kaggle SQLite European Soccer database to create CSV files for question answering. The next part involved using data from the Pew Research Center to develop questions and hypotheses. Plots were created to justify the cleaning and the construction of new data frames.
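The SQLite-to-CSV extraction step can be sketched with only the Python standard library. The table and column names below are invented for illustration (the real project queried the Kaggle European Soccer database), and an in-memory buffer stands in for a CSV file on disk.

```python
# Sketch of pulling rows out of an SQLite database and writing them as
# CSV; table, columns, and data are hypothetical examples.
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE match (home TEXT, away TEXT, home_goals INT)")
conn.executemany("INSERT INTO match VALUES (?, ?, ?)",
                 [("FCB", "RMA", 3), ("MUN", "LIV", 1)])

rows = conn.execute("SELECT home, away, home_goals FROM match").fetchall()
buf = io.StringIO()                    # stands in for a .csv file on disk
writer = csv.writer(buf)
writer.writerow(["home", "away", "home_goals"])  # header row
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])  # home,away,home_goals
```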
Implemented a multilayer perceptron neural network and evaluated its performance in classifying handwritten digits. The same network was then used to analyze a more challenging face dataset, and its performance was compared against a deep neural network built with the TensorFlow library.
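The core of such a network can be sketched in NumPy. This is a toy illustration rather than the project's implementation: one hidden layer with sigmoid activations trained by backpropagation, with XOR standing in for the digit data and the layer sizes and learning rate chosen arbitrarily.

```python
# Tiny multilayer perceptron trained with backpropagation on XOR
# (squared-error loss, batch gradient descent). Sizes, seed, and
# learning rate are illustrative choices only.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.standard_normal((2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.standard_normal((8, 1)); b2 = np.zeros(1)   # hidden -> output

losses = []
for _ in range(3000):
    h = sigmoid(X @ W1 + b1)             # forward pass
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backpropagate the squared-error loss through both layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(round(losses[0], 3), round(losses[-1], 3))  # loss before vs. after training
```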
The project required implementation of a content provider for each Android emulator instance for storing key-value pairs. Each message was multicast to all active AVDs, and we had to implement an algorithm that maintains Total and FIFO ordering guarantees under a randomized failure of any one AVD at any point in time.
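The FIFO half of the ordering guarantee can be sketched as follows: each receiver buffers messages that arrive ahead of the next expected per-sender sequence number and delivers them once the gap is filled. This is a simplified illustration only; the actual project also enforced total ordering and handled an AVD failure.

```python
# Minimal per-sender FIFO delivery: hold back out-of-order messages and
# release them in sequence-number order. Sender names and payloads below
# are made up for illustration.

class FifoReceiver:
    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number expected
        self.buffer = {}     # sender -> {seq: payload} held back
        self.delivered = []  # messages delivered in FIFO order

    def receive(self, sender, seq, payload):
        self.buffer.setdefault(sender, {})[seq] = payload
        expected = self.next_seq.setdefault(sender, 0)
        # Deliver every buffered message that is now in order.
        while expected in self.buffer[sender]:
            self.delivered.append((sender, self.buffer[sender].pop(expected)))
            expected += 1
        self.next_seq[sender] = expected

r = FifoReceiver()
r.receive("avd0", 1, "b")   # ahead of expected seq 0 -> buffered
r.receive("avd0", 0, "a")   # fills the gap -> both delivered in order
print(r.delivered)          # [('avd0', 'a'), ('avd0', 'b')]
```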
Designed a simple distributed hash table based on Chord, with at most 5 emulator instances active at any point in time. The project required us to handle ID-space partitioning and re-partitioning using the SHA-1 hash function, ring-based routing, and node joins. Insert, query and delete operations also had to be implemented for all active nodes in the Chord ring.
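Chord's key-placement rule can be sketched briefly: a key lives on its successor, the first node clockwise on the ring whose SHA-1 ID is at least the key's hash. The port numbers and the linear scan over a sorted node list below are illustrative simplifications of the real ring-based routing.

```python
# Sketch of Chord-style key placement with SHA-1 node and key IDs.
# Ports are example values; a real implementation routes hop by hop
# rather than scanning a global sorted node list.
import hashlib

def chord_id(s):
    """160-bit SHA-1 identifier for a node name or key."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

def successor(node_ids, key):
    """First node clockwise from hash(key) on the sorted ID ring."""
    h = chord_id(key)
    for nid in sorted(node_ids):
        if nid >= h:
            return nid
    return min(node_ids)  # wrap around the ring

nodes = [chord_id(p) for p in ("5554", "5556", "5558", "5560", "5562")]
owner = successor(nodes, "some-key")
assert owner in nodes  # every key maps to exactly one active node
```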
This assignment involved implementing a simplified version of Amazon Dynamo, including replication, partitioning and failure handling. The main goal was to provide linearizability and availability at the same time: the implementation had to perform read and write operations successfully even under a failure.
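The availability-under-failure idea can be illustrated with a toy quorum sketch. The replica count and quorum sizes below (N=3, W=2, R=2) are invented for illustration: because W + R > N, every read quorum overlaps every write quorum, so a read sees the latest acknowledged write even with one replica down.

```python
# Toy Dynamo-style quorum read/write over in-memory replica dicts.
# N, W, R are illustrative; None models a failed replica.
N, W, R = 3, 2, 2

def write(replicas, key, value, version):
    acks = 0
    for rep in replicas:
        if rep is not None:            # skip the failed replica
            rep[key] = (version, value)
            acks += 1
    return acks >= W                   # success only with a write quorum

def read(replicas, key):
    answers = [rep[key] for rep in replicas if rep is not None and key in rep]
    if len(answers) < R:
        raise RuntimeError("no read quorum")
    return max(answers)[1]             # highest version wins

replicas = [{}, {}, None]              # one of three replicas has failed
assert write(replicas, "k", "v1", 1)   # still reaches a write quorum
assert read(replicas, "k") == "v1"     # and the read quorum sees it
```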
This project required implementing word count and word co-occurrence on classic Latin texts using the Hadoop MapReduce framework. The word co-occurrence job had to be scaled from 2 to n documents for 2-grams and 3-grams, and we needed to examine scalability and perform a performance evaluation.
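The map-shuffle-reduce flow behind word count can be simulated in pure Python. This is only a single-process illustration of the same dataflow the Hadoop jobs ran; the two Latin snippets are example inputs.

```python
# Pure-Python simulation of the MapReduce word-count pipeline:
# map emits (word, 1) pairs, shuffle groups them by key, reduce sums.
from collections import defaultdict

def map_phase(doc):
    return [(word, 1) for word in doc.lower().split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["gallia est omnis divisa", "omnis gallia"]
pairs = [kv for d in docs for kv in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
print(counts["gallia"], counts["omnis"])  # 2 2
```

Co-occurrence follows the same shape, with the mapper emitting word pairs (2-grams or 3-grams) instead of single words.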
This project required implementing word co-occurrence on classic Latin texts using the Apache Spark framework. Each word had to be lemmatized, and co-occurrence was computed over all lemmas of every pair of words. The performance evaluation compared the Hadoop MapReduce framework against Apache Spark.
This project required performing exploratory data analysis on a popular dataset (World University Rankings), visualizing the results and interacting with them in Tableau. It involved building an interactive dashboard to understand the impact of funding, research and international diversity on the rankings of US universities.
Performed k-means clustering on a Pew Research Center dataset, k-NN classification on the German credit data, and linear regression on National Hockey League data.
This assignment involved implementing LDA and QDA to determine which performs better and why. It also involved implementing linear regression with basis function expansion to fit nonlinear curves, and studying the effect of regularization on the fit. The results had to be plotted and analyzed.
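The regularized basis-expansion regression can be sketched in NumPy. The polynomial degree, regularization strength, and synthetic sine data below are illustrative choices, not the assignment's actual settings; the closed-form ridge solution is w = (Φ^T Φ + λI)^-1 Φ^T y.

```python
# Linear regression with a polynomial basis expansion and L2 (ridge)
# regularization, solved in closed form. Data and hyperparameters are
# made up for illustration.
import numpy as np

def design_matrix(x, degree):
    return np.vander(x, degree + 1, increasing=True)  # [1, x, x^2, ...]

def fit_ridge(x, y, degree, lam):
    Phi = design_matrix(x, degree)
    # Closed form: w = (Phi^T Phi + lam * I)^-1 Phi^T y
    A = Phi.T @ Phi + lam * np.eye(degree + 1)
    return np.linalg.solve(A, Phi.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(50)  # noisy nonlinear target

w = fit_ridge(x, y, degree=5, lam=1e-3)
pred = design_matrix(x, 5) @ w
print(w.shape, round(float(np.mean((pred - y) ** 2)), 4))
```

Raising λ shrinks the weights and smooths the fitted curve, which is the regularization effect the assignment analyzed.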
Implemented three supervised classifiers: k-Nearest Neighbor, Decision Tree and Naive Bayes. Further implemented Random Forests and Boosting (AdaBoost) on top of my own Decision Tree implementation, then analyzed the performance of all 5 classifiers on different types of data and studied their pros and cons.
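Of the three, k-Nearest Neighbor is the simplest to sketch: Euclidean distance plus a majority vote over the k closest training points. The toy 2-D points below are made up for illustration.

```python
# Minimal k-nearest-neighbor classifier from scratch: Euclidean
# distance, majority vote among the k closest training examples.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((features...), label) pairs."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    top = [label for _, label in dists[:k]]        # k nearest labels
    return Counter(top).most_common(1)[0][0]       # majority vote

train = [((0, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (5.5, 5.5)))  # b
```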
Implemented Principal Component Analysis to obtain a reduced set of dimensions in which to represent the given data. Also implemented clustering algorithms, including HAC (Hierarchical Agglomerative Clustering) with MIN linkage and k-means clustering on the MapReduce framework.
GitHub link (PCA)
GitHub link (HAC)
GitHub link (K-Means)
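The PCA step described above can be sketched via an eigendecomposition of the covariance matrix: center the data, take the top-k eigenvectors, and project. The random 3-D data below is illustrative only.

```python
# PCA via eigendecomposition of the covariance matrix: center, find the
# directions of maximum variance, project onto the top k of them.
import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    components = eigvecs[:, ::-1][:, :k]    # top-k directions by variance
    return Xc @ components                  # project onto them

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```

By construction the first projected coordinate carries at least as much variance as the second, which is the dimensionality-reduction property the project relied on.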