Xu SUN

CRF-ADF Sequential Tagging Toolkit v1.0

This is a general-purpose toolkit for sequential tagging (also called sequence labelling or linear-chain structured classification). The CRF (Conditional Random Fields) model is described in (Lafferty et al., 2001), and the ADF (Adaptive stochastic gradient Descent based on Feature-frequency information) fast training algorithm is described in (Sun et al., ACL 2012).
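To make "linear-chain" concrete, here is a minimal Python sketch of Viterbi decoding, the standard way to find the best tag sequence once per-position (node) scores and tag-to-tag (edge) scores are available. It is an illustration only and is not taken from the toolkit's C# source; the function name `viterbi` and the score-table layout are assumptions made for this example.

```python
from typing import List, Sequence


def viterbi(node_scores: Sequence[Sequence[float]],
            edge_scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence of a linear-chain model.

    node_scores[t][y] : score of tag y at position t (hypothetical layout)
    edge_scores[a][b] : score of moving from tag a to tag b
    """
    if not node_scores:
        return []
    n_tags = len(node_scores[0])
    best = [list(node_scores[0])]   # best[t][y]: best score of a path ending in y at t
    back = []                       # back[t-1][y]: previous tag on that best path
    for t in range(1, len(node_scores)):
        row, ptr = [], []
        for y in range(n_tags):
            # Pick the best previous tag for current tag y.
            prev = max(range(n_tags),
                       key=lambda a: best[-1][a] + edge_scores[a][y])
            row.append(best[-1][prev] + edge_scores[prev][y] + node_scores[t][y])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Trace the best path backwards from the best final tag.
    y = max(range(n_tags), key=lambda a: best[-1][a])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    path.reverse()
    return path
```

With all edge scores at zero the result is simply the best tag at each position; strong edge scores can override a locally preferred tag, which is what distinguishes a chain model from independent per-token classification.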

Main features:

Developed in C#

High accuracy (72.3% on the Bio-Entity Recognition Task at BioNLP/NLPBA 2004, and 97.5% on the MSR Chinese Word Segmentation Task)

Fast training (converges faster than traditional batch and online training methods, including L-BFGS & SGD)

General purpose (it is task-independent & trainable using your own tagged corpus)

Supports rich edge features (Sun et al., ACL 2012)

Supports various training methods, including ADF training, SGD training, & Limited-memory BFGS training

Supports automatic n-fold cross-validation for tuning hyper-parameters

Supports various evaluation metrics, including token-accuracy, string-accuracy, & F-score
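The n-fold cross-validation mentioned in the feature list can be sketched roughly as follows. This is a generic Python illustration, not the toolkit's implementation: the names `n_fold_splits`, `tune`, and the `train_and_score` callback are hypothetical, and the toolkit may split or score differently.

```python
import random
from typing import Callable, Dict, Iterator, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # one tagged sequence
H = TypeVar("H")  # one hyper-parameter candidate (must be hashable)


def n_fold_splits(data: Sequence[S], n_folds: int,
                  seed: int = 0) -> Iterator[Tuple[List[S], List[S]]]:
    """Yield (train, held_out) pairs; every sample is held out exactly once."""
    if not 2 <= n_folds <= len(data):
        raise ValueError("n_folds must be between 2 and len(data)")
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)       # fixed seed keeps folds reproducible
    for k in range(n_folds):
        held_idx = order[k::n_folds]         # every n-th sample goes to fold k
        held = set(held_idx)
        train = [data[i] for i in order if i not in held]
        yield train, [data[i] for i in held_idx]


def tune(data: Sequence[S], candidates: Sequence[H], n_folds: int,
         train_and_score: Callable[[List[S], List[S], H], float]
         ) -> Tuple[H, Dict[H, float]]:
    """Return the candidate with the best mean held-out score, plus all means."""
    means: Dict[H, float] = {}
    for cand in candidates:
        scores = [train_and_score(train, held, cand)
                  for train, held in n_fold_splits(data, n_folds)]
        means[cand] = sum(scores) / len(scores)
    return max(means, key=means.get), means
```

In practice `train_and_score` would train a model on `train` with the candidate value (for example a regularization strength) and return a metric such as F-score on `held`.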

[Tutorial] [Download the source code]

Latent Structured Perceptron Toolkit v1.0

This is a general-purpose toolkit for sequential tagging with an emphasis on fast training. It includes the Latent Structured Perceptron (LSP) model (Sun et al., IJCAI 2009, TKDE 2013), as well as the traditional Structured Perceptron (SP) model and its averaged version (Collins, 2002).
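For readers unfamiliar with the averaged structured perceptron, the following Python sketch shows the basic idea: decode with the current weights, add the gold features and subtract the predicted ones on a mistake, and return the average of the weights over all steps. It covers only the plain SP with averaging, not the latent-variable LSP model, and it is not the toolkit's code. The feature templates (token/tag node features and tag-bigram edge features) and all function names are assumptions chosen for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[str], List[str]]  # (tokens, gold tags)


def features(tokens: Sequence[str], tags: Sequence[str]) -> Dict[tuple, int]:
    """Count node (token, tag) and edge (prev_tag, tag) features."""
    feats: Dict[tuple, int] = defaultdict(int)
    prev = "<s>"
    for tok, tag in zip(tokens, tags):
        feats[("node", tok, tag)] += 1
        feats[("edge", prev, tag)] += 1
        prev = tag
    return feats


def decode(tokens: Sequence[str], tagset: Sequence[str],
           w: Dict[tuple, float]) -> List[str]:
    """Viterbi search for the best tag sequence under weights w."""
    if not tokens:
        return []
    best = {y: w.get(("node", tokens[0], y), 0.0) + w.get(("edge", "<s>", y), 0.0)
            for y in tagset}
    back = []
    for tok in tokens[1:]:
        new, ptr = {}, {}
        for y in tagset:
            p = max(tagset, key=lambda a: best[a] + w.get(("edge", a, y), 0.0))
            new[y] = best[p] + w.get(("edge", p, y), 0.0) + w.get(("node", tok, y), 0.0)
            ptr[y] = p
        best = new
        back.append(ptr)
    y = max(tagset, key=lambda a: best[a])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


def train_averaged_perceptron(data: Sequence[Example], tagset: Sequence[str],
                              epochs: int = 5) -> Dict[tuple, float]:
    """Return weights averaged over every training step."""
    w: Dict[tuple, float] = defaultdict(float)      # current weights
    total: Dict[tuple, float] = defaultdict(float)  # sum of weights over steps
    steps = 0
    for _ in range(epochs):
        for tokens, gold in data:
            pred = decode(tokens, tagset, w)
            if pred != gold:
                for f, c in features(tokens, gold).items():
                    w[f] += c   # reward features of the gold sequence
                for f, c in features(tokens, pred).items():
                    w[f] -= c   # penalize features of the wrong prediction
            for f, v in w.items():
                total[f] += v   # naive averaging: simple but O(|w|) per step
            steps += 1
    return {f: v / steps for f, v in total.items()}
```

The per-step summation is the simplest way to show averaging; practical implementations usually track timestamps per feature so that the average can be computed lazily.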

Main features:

Developed in C#

Automatic modeling of hidden information (latent structures) in the data (Sun et al., IJCAI 2009, TKDE 2013)

Fast training (much faster than probabilistic models like CRFs)

General purpose (it is task-independent & trainable using your own tagged corpus)

Supports rich edge features (Sun et al., ACL 2012)

Supports various evaluation metrics, including token-accuracy, string-accuracy, & F-score
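The three evaluation metrics named in the feature list can be illustrated with a short Python sketch. The page does not define them, so the definitions here are assumptions: token-accuracy is the fraction of correctly tagged tokens, string-accuracy is the fraction of sequences tagged entirely correctly, and F-score is computed over labelled chunks in a BIO tagging scheme. The toolkit's own definitions may differ, and all function names are hypothetical.

```python
from typing import Sequence, Set, Tuple

Tags = Sequence[str]


def token_accuracy(gold: Sequence[Tags], pred: Sequence[Tags]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for g, p in zip(gold, pred):
        correct += sum(a == b for a, b in zip(g, p))
        total += len(g)
    return correct / total if total else 0.0


def string_accuracy(gold: Sequence[Tags], pred: Sequence[Tags]) -> float:
    """Fraction of sequences in which every tag is correct."""
    if not gold:
        return 0.0
    return sum(list(g) == list(p) for g, p in zip(gold, pred)) / len(gold)


def bio_chunks(tags: Tags) -> Set[Tuple[int, int, str]]:
    """Extract (start, end, label) chunks from BIO tags such as B-X, I-X, O.

    Assumption: an I- tag that does not continue an open chunk is ignored.
    """
    chunks: Set[Tuple[int, int, str]] = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel closes the last chunk
        if start is not None and tag != "I-" + label:
            chunks.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return chunks


def f_score(gold: Sequence[Tags], pred: Sequence[Tags]) -> float:
    """Micro-averaged F1 over exactly matching chunks."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gc, pc = bio_chunks(g), bio_chunks(p)
        tp += len(gc & pc)
        n_gold += len(gc)
        n_pred += len(pc)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

The metrics can disagree sharply: a tagger that misplaces a single chunk boundary may keep a high token-accuracy while its string-accuracy and F-score drop, which is why reporting all three is useful.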

[Tutorial] [Download the source code]

Online Multi-Task Learning Toolkit (OMT) v1.0

This is a general-purpose toolkit for online multi-task learning. The approach is based mainly on the Conditional Random Fields (CRF) model and Stochastic Gradient Descent (SGD) training, and is described in (Sun et al., TKDE 2013).

Main features:

Developed in C#

High accuracy on human activity recognition tasks (Sun et al., TKDE 2013)

General purpose (it is task-independent & trainable using your own tagged corpus)

Supports SGD training & Limited-memory BFGS training

Supports various evaluation metrics, including token-accuracy, string-accuracy, & F-score

[Tutorial] [Download the source code]