GADGET SVM -- A Gossip-bAseD sub-GradiEnT SVM solver
Overview
Distributed environments such as federated databases, wireless adhoc networks, sensor networks, Peer-to-Peer (P2P) systems have become popular in recent years; they are well-suited for designing scalable machine learning algorithms. The distributed setting is complex because network topologies are often dynamic and data available to algorithms changes frequently. In addition, nodes may have limited resources. Distributed Data Mining and Machine Learning algorithms created for these settings must have high utility, use little communication cost, work on dynamic networks and be computationally efficient.

In this project, our goal is to create consensus-based Support Vector Machine (SVM) solvers which achieve the above goals. The main features of GADGET SVM are the following:
  • A primal SVM solver for distributed environments
  • Works with dynamic network topologies
  • Uses an asynchronous gossip protocol for communication amongst nodes
  • Uses a distributed sub-gradient optimization algorithm
  • Can handle millions of training examples
  • Solves a binary classification problem in version 1.0 of the software
Description
In this algorithm, each node has a partition of the whole dataset. It trains on its data locally and gets the weight vector. Next, the node talks to (or gossips with) others and updates it's local weight vector. The process continues, until the information is dissipated in the network. Convergence of the algorithm is guaranteed and an approximate solution can be easily obtained.

In the PushSum protocol used by the GADGET algorithm, B is a doubly stochastic matrix which determines how the nodes are connected to each other and hence how the gossip spreads. For version 1.0, the B matrix has not been used; instead a predefined topology available from peersim has been used. Any future enhancement should be creating a new protocol based on the B matrix. A new protocol essentially will have to implement its own version of nextCycle method, which will decide which neighbor node a node will talk to.

Source Code and Binaries
  • Access Github Repo Here
Installation
Check the git repo above for instructions.
Getting Started: Some Example Problems
Publications
  • Chase Hensel and Haimonti Dutta, "GADGET SVM: a Gossip-bAseD sub-GradiEnT SVM Solver", International Conference on Machine Learning (ICML), Numerical Mathematics in Machine Learning Workshop, Montreal, Quebec, 2009.
Disclaimer
This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software.
Please send me an email at {haimonti}@buffalo.edu