Large Scale Machine Learning for Inductive Logic Programming (ILP) Applications
This project will be a joint initiative between the Center for Computational Learning Systems (CCLS), Columbia University; the Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi); and the Indian Institute of Technology, Mumbai (IIT Mumbai).

Its goal is to develop an interdisciplinary research team to study problems in data-driven computation for industrial and scientific applications. The emphasis is on three things: analyzing large amounts of data arising in biology (millions of nucleotide sequences), health (electronic health records, and administrative and social information about patients), and telecommunications (call data records from telecom companies); building large-scale mathematical and statistical models; and using high-performance computing. Interest in this type of data arises primarily because it no longer consists of simple values of known (predefined) features but occurs as observations of complex, interrelated variables, and because it is now available in very large quantities thanks to advances in automation and the low cost of memory and storage devices.

This project will: (1) develop pilot research projects involving students and faculty at all three institutions; (2) enable short-term exchanges (1 to 2 months) of student interns from IIT Mumbai and IIIT-Delhi to Columbia and vice versa; (3) support the development of an online course in large-scale machine learning that will first be tried out at the three institutes and then re-evaluated and redesigned as a massive open online course (MOOC); (4) host guest lecturers in machine learning and statistics at the global center; and (5) enable interested students and faculty to participate in the winter school on large-scale learning to be hosted at IIIT-Delhi in the later part of 2013.
Non-Intrusive Load Monitoring
With the rise of the Internet of Things (IoT), sensors and actuators are becoming part of our daily lives. Continuously uploading the detailed information gathered by such IoT devices to the cloud for inference is not feasible, given their constrained bandwidth and limited energy. Approximate inference algorithms therefore need to be designed for local computation on these devices, taking their constraints into account. Such models will enable learning at the device level without storing all the historical data on the device. Furthermore, these approximate models (and possibly some synopses of the data) can be communicated to the cloud for compute-intensive global inference. The challenge, however, is to ensure that the local models developed on the devices are a good estimate of the model that would be obtained if all the information were communicated to the cloud.

This project aims to develop distributed machine learning algorithms for incremental sensing and inference that optimize local on-device and global in-cloud computation while accounting for the device-cloud communication cost. The algorithms will be validated on the NP-complete load-disaggregation problem, which is critical for smart grid applications. We envision that these distributed algorithms can be extended to distributed inference in other application domains, such as healthcare, possibly using the local computation of a low-end device (or a mobile phone) connected to the health sensors.
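
As a rough illustration of the device-to-cloud idea described above, the following Python sketch shows a device keeping a fixed-size synopsis of its power readings (count, mean, and sum of squared deviations) without storing the history, and a cloud step that merges synopses from several devices. This is not the project's algorithm: the `Synopsis` class, the `summarize` helper, and the sample wattage readings are all hypothetical, and the statistic is chosen because its merge happens to be exact. For richer models the merged result would generally only approximate the centralized one, which is the challenge the project targets.

```python
from dataclasses import dataclass
import math


@dataclass
class Synopsis:
    """Fixed-size summary of a reading stream (hypothetical helper class).

    n    : number of readings seen
    mean : running mean
    m2   : running sum of squared deviations from the mean
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        """Fold one reading into the summary on the device (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Synopsis") -> "Synopsis":
        """Combine two device synopses in the cloud (pairwise merge formula)."""
        n = self.n + other.n
        if n == 0:
            return Synopsis()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Synopsis(n, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of all readings summarized so far."""
        return self.m2 / self.n if self.n else 0.0


def summarize(readings):
    """Build a synopsis one reading at a time, as a device would."""
    s = Synopsis()
    for x in readings:
        s.update(x)
    return s


# Hypothetical power readings in watts from two devices (made-up values).
device_a = [120.0, 118.5, 121.2, 1500.3, 1498.7]
device_b = [60.1, 59.8, 61.0]

# Each device sends only its three-number synopsis; the cloud merges them.
global_synopsis = summarize(device_a).merge(summarize(device_b))

# Reference: what the cloud would compute if it had received every reading.
centralized = summarize(device_a + device_b)

print(math.isclose(global_synopsis.mean, centralized.mean))
print(math.isclose(global_synopsis.variance, centralized.variance))
```

In this sketch each device uploads three numbers regardless of how many readings it has seen, so the communication cost stays constant while the cloud still recovers the same mean and variance it would have computed from the raw data.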

[Images are obtained from http://homes.cs.washington.edu/~sidhant/slides/ElectriSense_PDF.pdf and www.wattseeker.com]