THE PUTNAM WAY JOURNAL
Links to Software

The AToM Framework

I have started working on the AToM (Another Topic Model) framework, a C++ framework for statistical topic modeling code. [Gibbs-LDA download]

Note: Due to lack of time and other commitments, this version implements only a basic Gibbs sampler for LDA (Latent Dirichlet Allocation). The software here is intended as quick and dirty prototypes for students. Some additional comments:
- Prototypes: these require polishing.
Tag-Topic Model Codes

Some C++ prototype code from our 2011 CIKM paper, "Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives":
- Download TagLDA in C++
- Download MMLDA in C++
- Download CorrMMLDA in C++
- Download METag2LDA in C++
- Download CorrMETag2LDA in C++

The model names in the code may differ from those in the paper. I will do some refactoring if time permits.

Some Topic Model Codes from Earlier Times
- Download Variational Bayesian LDA in C++
- Download Gibbs sampling based LDA in C++

Notes on Topic Model Codes

NOTE: All ids for documents and observed data (terms, document-level tags and word-level tags) start from 0 and are sequential with an increment of 1.

The comments in the source files can be misleading due to copy-pasting. These will be fixed as time permits.
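The id convention in the note above (ids start at 0 and increase by 1 with no gaps) can be met by remapping raw terms or tags before writing input files. The following is a minimal sketch under that assumption only; the class name IdMap and its methods are hypothetical and are not part of the downloadable code, and the input file layout expected by the prototypes is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the downloadable code): assigns dense ids
// 0, 1, 2, ... to distinct strings in order of first appearance.
class IdMap {
public:
    // Returns the existing id for `key`, or assigns the next unused id.
    std::size_t idOf(const std::string& key) {
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;
        const std::size_t id = keys_.size();
        ids_.emplace(key, id);
        keys_.push_back(key);
        return id;
    }

    // Reverse lookup from id to the original string.
    const std::string& keyOf(std::size_t id) const {
        assert(id < keys_.size());  // ids are dense, so valid ids are 0..size()-1
        return keys_[id];
    }

    // Number of distinct keys seen so far (also the next id to be assigned).
    std::size_t size() const { return keys_.size(); }

private:
    std::unordered_map<std::string, std::size_t> ids_;
    std::vector<std::string> keys_;
};
```

One such map per id space (documents, terms, document-level tags, word-level tags) keeps each space sequential on its own.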
TA Corner

TA1. Ad hoc Data Structures
[This Eclipse CDT project] serves as a repository
for standard algorithms that are not found in the C++
STL (except heaps). Source code has been borrowed from
several sources. Written entirely in C++, the package
currently includes implementations of multi-way merge
(k-way merge), a B+ tree, a minimal trie, heaps built on vectors,
and a minimal on-disk binary search (often used to search
an inverted index by query word).
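As an illustration of the k-way merge idea mentioned above, here is a minimal in-memory sketch that uses a min-heap from the standard library. It is not the code from the tarball: the function name kWayMerge and the HeapEntry type are assumptions made for this example, and a real external merge would read the sorted runs from disk rather than from vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// One heap entry: a value plus where it came from.
struct HeapEntry {
    int value;
    std::size_t run;  // index of the sorted run
    std::size_t pos;  // position inside that run
};

// Orders the priority queue so that the smallest value is on top.
struct GreaterByValue {
    bool operator()(const HeapEntry& a, const HeapEntry& b) const {
        return a.value > b.value;
    }
};

// Merges k individually sorted runs into one sorted vector.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    std::priority_queue<HeapEntry, std::vector<HeapEntry>, GreaterByValue> heap;

    std::size_t total = 0;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        assert(std::is_sorted(runs[r].begin(), runs[r].end()));  // precondition
        total += runs[r].size();
        if (!runs[r].empty()) heap.push(HeapEntry{runs[r][0], r, 0});
    }

    std::vector<int> merged;
    merged.reserve(total);
    while (!heap.empty()) {
        const HeapEntry top = heap.top();
        heap.pop();
        merged.push_back(top.value);
        // Refill the heap with the next element of the run we just consumed.
        const std::size_t next = top.pos + 1;
        if (next < runs[top.run].size()) {
            heap.push(HeapEntry{runs[top.run][next], top.run, next});
        }
    }
    return merged;
}
```

The heap never holds more than one entry per run, so merging n elements from k runs costs O(n log k) comparisons.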
WishList: include code from
Google's sparse hash,
SGI STL hash_map and
TPIE
See the readme file inside the tarball for compilation
instructions. The SGI STL hash_map ships as an extension with the
standard C++ library on most *nix systems and can be used in
code as:
#include <ext/hash_map>
using namespace __gnu_cxx;

TA2. CSE 4/535 IR course - Fall 2010
[Warmup code] This is an ad hoc implementation for
the first project. The code is expected to
extract internal wiki links from wiki markup. For more information on
what the markup files look like, please see the files under the
"Wiki" subdirectory of the data directory.
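To give a feel for the task, here is a minimal sketch of link extraction. It assumes the common wiki convention that internal links are written as [[Target]] or [[Target|label]]; the function name extractInternalLinks is hypothetical, this is not the warmup code itself, and nested constructs (for example a link inside an image caption) are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects the targets of [[...]] links found in `markup`.
// For [[Target|label]] only "Target" is kept; a "#section" suffix is dropped.
std::vector<std::string> extractInternalLinks(const std::string& markup) {
    std::vector<std::string> targets;
    std::size_t pos = 0;
    while ((pos = markup.find("[[", pos)) != std::string::npos) {
        const std::size_t start = pos + 2;
        const std::size_t end = markup.find("]]", start);
        if (end == std::string::npos) break;  // unterminated link: stop scanning

        std::string target = markup.substr(start, end - start);
        const std::size_t cut = target.find_first_of("|#");
        if (cut != std::string::npos) target.erase(cut);
        if (!target.empty()) targets.push_back(target);

        pos = end + 2;  // continue after the closing brackets
    }
    return targets;
}
```

Whether links into other namespaces (such as files or categories) count as internal links is a project decision that this sketch leaves open.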
(+) First project. (+) Second project. (+) Third project.

TA3. CSE 4/535 IR course - Fall 2009
[Warmup notes on C++ STL] The content of this PDF was
geared towards use in the
first project. The focus of the project was
document language identification using character bigrams. The STL
bitset was used for Unicode extraction. The encoding format is
well described in
the UTF-8 article on Wikipedia.
(*) Find the UTF-8 code chart here
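The two steps mentioned above, decoding UTF-8 and counting character bigrams, can be sketched as follows. This is an illustrative example, not the course material: it uses plain bit masks rather than std::bitset, the function names decodeUtf8 and countBigrams are assumptions, and validation is deliberately light (overlong encodings and out-of-range code points are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Decodes UTF-8 bytes into code points. An invalid lead byte or a bad
// continuation byte is skipped one byte at a time; a sequence truncated at
// the end of the input stops decoding.
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<std::uint32_t> codePoints;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t length = 0;
        std::uint32_t cp = 0;
        if ((lead & 0x80) == 0x00)      { length = 1; cp = lead; }         // 0xxxxxxx
        else if ((lead & 0xE0) == 0xC0) { length = 2; cp = lead & 0x1F; }  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) { length = 3; cp = lead & 0x0F; }  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) { length = 4; cp = lead & 0x07; }  // 11110xxx
        else { ++i; continue; }  // stray continuation byte or invalid lead

        if (i + length > bytes.size()) break;  // truncated final sequence

        bool ok = true;
        for (std::size_t k = 1; k < length; ++k) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + k]);
            if ((next & 0xC0) != 0x80) { ok = false; break; }  // must be 10xxxxxx
            cp = (cp << 6) | (next & 0x3F);
        }
        if (ok) { codePoints.push_back(cp); i += length; }
        else    { ++i; }
    }
    return codePoints;
}

// Counts adjacent code-point pairs (character bigrams).
std::map<std::pair<std::uint32_t, std::uint32_t>, int>
countBigrams(const std::vector<std::uint32_t>& codePoints) {
    std::map<std::pair<std::uint32_t, std::uint32_t>, int> counts;
    for (std::size_t i = 0; i + 1 < codePoints.size(); ++i) {
        ++counts[std::make_pair(codePoints[i], codePoints[i + 1])];
    }
    return counts;
}
```

Bigram counts gathered per language from training text can then be compared against the counts of an unseen document; the comparison method is left open here.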
Please report bugs to me. My email can be found at the bottom right corner of this page with "university domain" being substituted with "buffalo . edu". Thanks!