
JSAT has numerous algorithms available, some of which overlap, are related, or serve as building blocks for others. I'll attempt to provide a somewhat ordered list of the algorithms available in JSAT. I'll also provide the main reference for each method, but please see the JavaDocs for more details.

Some of these algorithms are more practical than others, and many are implemented mostly for comparison purposes. This makes it easy to compare against many standard algorithms from the literature within one framework, where no implementation has an unfair advantage over the others simply by being implemented in a different language.

Other algorithms simply make good benchmarks.

## Data Transforms

| Algorithm | Relation / INFO | References |
| --- | --- | --- |
| AutoDeskew | Attempts to remove any skewness from each column independently | |
| Linear Transform | A simple linear rescaling of the features to a specific range | |
| ZeroMeanTransform | A simple transform that removes the mean from all elements | |
| PNormNormalization | Re-scales the numeric features in data points by their p-norm | |
| PolynomialTransform | Generates new numeric features for polynomial interaction terms | |
| MutualInfoFS | Feature selection using mutual information | |
| Nominal To Numeric | Converts nominal features to numeric ones using a one-hot encoding | |
| NumericalToHistogram | Converts numeric features to nominal ones using a simple histogram | |
| Bidirectional Search (BDS) | A greedy method of feature selection for prediction | |
| plus-L minus-R Selection (LRS) | A greedy method of feature selection for prediction | |
| Sequential Backward Selection (SBS) | A greedy method of feature selection for prediction | |
| Sequential Forward Selection (SFS) | A greedy method of feature selection for prediction | |
| Principal Component Analysis (PCA) | A common dimensionality reduction technique | |
| FastICA | | Hyvärinen, A., & Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4-5), 411–430. doi:10.1016/S0893-6080(00)00026-5 |
| Johnson-Lindenstrauss (JL) | A dimensionality reduction technique (see the sketch after this table) | Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687. doi:10.1016/S0022-0000(03)00025-4 |
| WhitenedPCA | An extension of PCA that performs whitening, which decorrelates all columns | |
| WhitenedZCA | An extension of Whitened PCA that rotates the feature space after whitening | |
| Kernel PCA | A kernelized version of PCA | Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5), 1299–1319. doi:10.1162/089976698300017467 |
| Nystrom | A generic method of creating an approximate feature space for any kernel | Williams, C., & Seeger, M. (2001). Using the Nyström Method to Speed Up Kernel Machines. Advances in Neural Information Processing Systems 13 (pp. 682–688). MIT Press. |
| RFF RBF | Random Kitchen Sinks for the RBF kernel | Rahimi, A., & Recht, B. (2007). Random Features for Large-Scale Kernel Machines. Neural Information Processing Systems. |
| ReliefF | A method of determining feature importance and selection for numeric features | Kononenko, I., Simec, E., & Robnik-Sikonja, M. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, 7, 39–55. |
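As a concrete example of a transform above, the Johnson-Lindenstrauss entry cites Achlioptas's "binary coins" construction, where the projection matrix is filled with random ±1 entries and scaled so that pairwise distances are preserved in expectation. Below is a minimal standalone Java sketch of that idea; it is illustrative only and does not reflect JSAT's own class or method names.

```java
import java.util.Random;

/** Minimal sketch of an Achlioptas-style random projection ("binary coins").
 *  Illustrative standalone code, not JSAT's JL implementation. */
public class JLProjectionSketch {
    private final double[][] R;   // k x d projection matrix of +/-1 entries
    private final double scale;   // 1/sqrt(k) preserves distances in expectation

    public JLProjectionSketch(int d, int k, long seed) {
        Random rand = new Random(seed);
        R = new double[k][d];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < d; j++)
                R[i][j] = rand.nextBoolean() ? 1.0 : -1.0; // flip a binary coin
        scale = 1.0 / Math.sqrt(k);
    }

    /** Project a d-dimensional point down to k dimensions. */
    public double[] transform(double[] x) {
        double[] y = new double[R.length];
        for (int i = 0; i < R.length; i++) {
            double dot = 0;
            for (int j = 0; j < x.length; j++)
                dot += R[i][j] * x[j];
            y[i] = scale * dot;
        }
        return y;
    }
}
```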

## Predictive Algorithms

### Linear algorithms

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| Logistic Regression | Classification | A simple implementation of Logistic Regression | |
| Logistic Regression DCD | Classification | A fast implementation of Logistic Regression based on the DCD solver | Yu, H.-F., Huang, F.-L., & Lin, C.-J. (2010). Dual Coordinate Descent Methods for Logistic Regression and Maximum Entropy Models. Machine Learning, 85(1-2), 41–75. doi:10.1007/s10994-010-5221-8 |
| StochasticMultinomialLogisticRegression | Classification | A direct implementation of Multinomial LR allowing for various priors | Carpenter, B. (2008). Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression. |
| DCDs | Classification, Regression | A linear SVM trained by DCD | Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S. S., & Sundararajan, S. (2008). A Dual Coordinate Descent Method for Large-scale Linear SVM. Proceedings of the 25th international conference on Machine learning - ICML '08 (pp. 408–415). New York, New York, USA: ACM Press. doi:10.1145/1390156.1390208 |
| DCD | Classification, Regression | A linear SVM trained by DCD, without the shrinkage optimization | |
| Linear Batch | Classification, Regression | A generic loss-based solver that uses batch optimization methods and supports L2 regularization | |
| Linear SGD | Classification, Regression | A generic loss-based solver that uses stochastic online methods, supporting elastic net regularization | |
| NewGLMNET | Classification | An implementation of Elastic Net regularized Logistic Regression | Yuan, G., Ho, C.-H., & Lin, C. (2012). An improved GLMNET for L1-regularized logistic regression. Journal of Machine Learning Research, 13, 1999–2030. doi:10.1145/2020408.2020421 |
| Adaptive Regularization of Weight Vectors (AROW) | Classification, Online Learning | | Crammer, K., Kulesza, A., & Dredze, M. (2013). Adaptive regularization of weight vectors. Machine Learning, 91(2), 155–187. doi:10.1007/s10994-013-5327-x |
| ALMA2 | Classification, Online Learning | | Gentile, C. (2002). A New Approximate Maximal Margin Classification Algorithm. The Journal of Machine Learning Research, 2, 213–242. |
| ROMMA | Classification, Online Learning | | Li, Y., & Long, P. M. (2002). The Relaxed Online Maximum Margin Algorithm. Machine Learning, 46(1-3), 361–387. doi:10.1023/A:1012435301888 |
| Soft Confidence-Weighted (SCW) | Classification, Online Learning | | Crammer, K., Dredze, M., & Pereira, F. (2008). Exact Convex Confidence-Weighted Learning. In Advances in Neural Information Processing Systems 22 (pp. 345–352). |
| Passive Aggressive | Classification, Regression, Online Learning | See the sketch after this table | Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7, 551–585. |
| Support Passive Aggressive | Classification, Online Learning | This is a multi-class generalization of Passive Aggressive | Matsushima, S., Shimizu, N., Yoshida, K., Ninomiya, T., & Nakagawa, H. (2010). Exact Passive-Aggressive Algorithm for Multiclass Classification Using Support Class. SIAM International Conference on Data Mining - SDM (pp. 303–314). |
| Pegasos | Classification, Online Learning | An online linear SVM | Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. 24th international conference on Machine learning (pp. 807–814). New York, NY: ACM. doi:10.1145/1273496.1273598 |
| Normal Herd (NHERD) | Classification, Online Learning | It is related to AROW and Passive Aggressive | Crammer, K., & Lee, D. D. (2010). Learning via Gaussian Herding. Pre-proceedings of NIPS (pp. 451–459). |
| Sparse Truncated Gradient Descent (STGD) | Classification, Regression, Online Learning | An online algorithm for L1 regularized models | Langford, J., Li, L., & Zhang, T. (2009). Sparse online learning via truncated gradient. The Journal of Machine Learning Research, 10, 777–801. |
| RidgeRegression | Regression | A simple batch implementation of Ridge Regression | |
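Many of the online learners above share the same flavor of mistake-driven update. As an illustration, the PA-I variant of Passive Aggressive from Crammer et al. (2006) fits in a few lines of standalone Java. This is a minimal sketch of the published update rule, not JSAT's own PassiveAggressive class.

```java
/** Sketch of the PA-I binary classification update (Crammer et al., 2006).
 *  Labels are +1/-1; C caps the step size. Illustrative, not JSAT's code. */
public class PassiveAggressiveSketch {
    private final double[] w; // the linear model's weight vector
    private final double C;   // aggressiveness parameter

    public PassiveAggressiveSketch(int dims, double aggressiveness) {
        w = new double[dims];
        C = aggressiveness;
    }

    public double score(double[] x) {
        double dot = 0;
        for (int i = 0; i < x.length; i++)
            dot += w[i] * x[i];
        return dot;
    }

    public void update(double[] x, int y) {
        double normSqrd = 0;
        for (double v : x)
            normSqrd += v * v;
        double loss = Math.max(0, 1 - y * score(x)); // hinge loss
        if (loss == 0 || normSqrd == 0)
            return; // passive: the margin is already satisfied
        // Aggressive: smallest step that would fix the loss, capped by C
        double tau = Math.min(C, loss / normSqrd);
        for (int i = 0; i < x.length; i++)
            w[i] += tau * y * x[i];
    }
}
```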

### Kernel Based

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| Platt's SMO | Classification, Regression | | Platt, J. C. (1998). Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Advances in kernel methods (pp. 185–208). |
| Kernel SGD | Classification, Regression, Online Learning | See the KernelPoint building block. This provides a way to directly train a kernel-based model via SGD (see the sketch after this table) | |
| Double Update Online Learning (DUOL) | Classification, Online Learning | This is a kernelized extension of the Passive Aggressive algorithm | Zhao, P., Hoi, S. C. H., & Jin, R. (2011). Double Updating Online Learning. Journal of Machine Learning Research, 12, 1587–1615. |
| Online Sparse Kernel Learning by Sampling and Smooth Losses (OSKL) | Classification, Online Learning | This is a bounded method for kernel learning meant for smooth loss functions like the Logistic Loss | Zhang, L., Yi, J., Jin, R., Lin, M., & He, X. (2013). Online Kernel Learning with a Near Optimal Sparsity Bound. In S. Dasgupta & D. Mcallester (Eds.), Proceedings of the 30th International Conference on Machine Learning (ICML-13) (Vol. 28, pp. 621–629). JMLR Workshop and Conference Proceedings. |
| PegasosK | Classification, Online Learning | A kernelized version of Pegasos | Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. 24th international conference on Machine learning (pp. 807–814). New York, NY: ACM. doi:10.1145/1273496.1273598 |
| Projectron | Classification, Online Learning | This is a kernelized version of the Perceptron that uses projection as its method of bounding the number of support vectors | Orabona, F., Keshet, J., & Caputo, B. (2009). Bounded Kernel-Based Online Learning. The Journal of Machine Learning Research, 10, 2643–2666. |
| Bounded Online Gradient Descent (BOGD) | Classification, Online Learning | | Zhao, P., Wang, J., Wu, P., Jin, R., & Hoi, S. C. H. (2012). Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning. In Proceedings of the 29th International Conference on Machine Learning (pp. 169–176). |
| Least Squares Support Vector Machine (LS-SVM) | Classification, Regression | This is related to the kernel SVM, but simpler to implement. It is equivalent to Kernel Ridge Regression, but trained differently, allowing for potentially larger data sets | Suykens, J., & Vandewalle, J. (1999). Least Squares Support Vector Machine Classifiers. Neural Processing Letters, 9(3), 293–298. doi:10.1023/A:1018628609742 |
| Kernel Ridge Regression | Regression | This is a direct kernelization of the Ridge Regression algorithm | |
| Kernel Recursive Least Squares (KernelRLS) | Regression, Online Learning | This is a kernelization of the RLS algorithm, and the first paper to use projection for bounded learning | Engel, Y., Mannor, S., & Meir, R. (2004). The Kernel Recursive Least-Squares Algorithm. IEEE Transactions on Signal Processing, 52(8), 2275–2285. doi:10.1109/TSP.2004.830985 |
| Conservative Stochastic Kernel Logistic Regression (CSKLR) | Classification, Online Learning | | Zhang, L., Jin, R., Chen, C., Bu, J., & He, X. (2012). Efficient Online Learning for Large-Scale Sparse Kernel Logistic Regression. Twenty-Sixth AAAI Conference on Artificial Intelligence (pp. 1219–1225). |
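All of the online methods in this table represent their model as a weighted sum of kernel evaluations against stored support vectors; they differ mainly in how the coefficients are set and how the support set is bounded. A minimal kernelized Perceptron with an RBF kernel makes that shared structure visible. This is an illustrative standalone sketch, not JSAT's KernelPoint, and unlike Projectron, OSKL, or BOGD it does nothing to bound the support set.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of an online kernel learner: a kernelized Perceptron with an RBF
 *  kernel. Every mistake adds a support vector, so the model grows without
 *  bound. Illustrative only. */
public class KernelPerceptronSketch {
    private final List<double[]> supportVecs = new ArrayList<>();
    private final List<Double> alphas = new ArrayList<>();
    private final double gamma; // RBF bandwidth parameter

    public KernelPerceptronSketch(double gamma) { this.gamma = gamma; }

    private double rbf(double[] a, double[] b) {
        double distSqrd = 0;
        for (int i = 0; i < a.length; i++)
            distSqrd += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.exp(-gamma * distSqrd);
    }

    /** f(x) = sum_i alpha_i * K(x_i, x) over the stored support vectors. */
    public double score(double[] x) {
        double sum = 0;
        for (int i = 0; i < supportVecs.size(); i++)
            sum += alphas.get(i) * rbf(supportVecs.get(i), x);
        return sum;
    }

    public void update(double[] x, int y) { // y is +1 or -1
        if (y * score(x) <= 0) { // mistake-driven: only update on errors
            supportVecs.add(x.clone());
            alphas.add((double) y);
        }
    }
}
```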

### Tree Based

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| ID3 | Classification | A simple decision tree for only nominal features | |
| Decision Tree | Classification, Regression | This is a generic implementation, with the ability to mimic the behavior of many tree algorithms like C4.5 and CART (see the sketch after this table) | |
| Random Tree | Classification, Regression | This is a decision tree that chooses a random subset of features to consider at each iteration | |
| Random Forest | Classification, Regression | This is a collection of random trees | Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. |
| Extra Tree | Classification, Regression | This algorithm is an extremely randomized tree builder | Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42. doi:10.1007/s10994-006-6226-1 |
| Extra Random Trees (ERTrees) | Classification, Regression | A collection of Extra Trees | |
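The trees above differ mostly in how candidate splits are generated; the common core is scoring a split with an impurity measure such as Gini (CART-style) or information gain (ID3/C4.5-style). Below is a standalone sketch of Gini-based scoring of one binary split on a numeric feature; it is illustrative only and not how JSAT's DecisionTree is structured internally.

```java
import java.util.stream.IntStream;

/** Sketch of Gini-impurity split scoring, the CART-style core of decision
 *  tree induction. Labels are class indices 0..numClasses-1. Illustrative
 *  standalone code, not JSAT's implementation. */
public class GiniSplitSketch {
    /** Gini impurity: 0 for a pure node, larger for more mixed nodes. */
    static double gini(int[] labels, int numClasses) {
        if (labels.length == 0) return 0;
        int[] counts = new int[numClasses];
        for (int y : labels) counts[y]++;
        double sum = 0;
        for (int c : counts) {
            double p = c / (double) labels.length;
            sum += p * p;
        }
        return 1 - sum;
    }

    /** Weighted impurity of splitting on feature[i] <= threshold; the tree
     *  builder would pick the threshold minimizing this score. */
    static double splitScore(double[] feature, int[] labels, double threshold, int numClasses) {
        int[] left = IntStream.range(0, labels.length)
                .filter(i -> feature[i] <= threshold).map(i -> labels[i]).toArray();
        int[] right = IntStream.range(0, labels.length)
                .filter(i -> feature[i] > threshold).map(i -> labels[i]).toArray();
        double n = labels.length;
        return (left.length / n) * gini(left, numClasses)
             + (right.length / n) * gini(right, numClasses);
    }
}
```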

### Nearest Neighbor / Vector Quantization based

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| Nearest Neighbor | Classification, Regression | See the sketch after this table | |
| Discriminant Adaptive Nearest Neighbor (DANN) | Classification, Regression | An extension of Nearest Neighbor that uses a region-adaptive distance metric | Hastie, T., & Tibshirani, R. (1996). Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6), 607–616. doi:10.1109/34.506411 |
| Learning Vector Quantization (LVQ) | Classification | | |
| LVQ with Locally Learned Classifier (LVQ-LLC) | Classification | This is an original extension of LVQ | |
| Self Organizing Map (SOM) | Classification, Clustering | | |
| Locally Weighted Learner (LWL) | Classification, Regression | This algorithm builds a local model for every query, and uses that local model to make predictions | Atkeson, C., Moore, A., & Schaal, S. (1997). Locally Weighted Learning. Artificial Intelligence Review, 11–73. |
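At query time, all of the methods above reduce to some variant of the same brute-force pattern: measure the distance from the query to stored points, then predict from the closest ones. The standalone sketch below shows k-NN classification with Euclidean distance and a majority vote; it is illustrative only, and a full scan like this is exactly what acceleration structures are meant to avoid on large data sets.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch of brute-force k-nearest-neighbor classification with Euclidean
 *  distance and a majority vote. Illustrative standalone code. */
public class KnnSketch {
    static int classify(double[][] train, int[] labels, double[] query, int k, int numClasses) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Order training points by distance to the query (the expensive part)
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(train[i], query)));
        int[] votes = new int[numClasses];
        for (int i = 0; i < Math.min(k, idx.length); i++)
            votes[labels[idx[i]]]++; // each of the k closest points votes
        int best = 0;
        for (int c = 1; c < numClasses; c++)
            if (votes[c] > votes[best]) best = c;
        return best;
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++)
            s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}
```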

## Meta Algorithms

### Boosting Based

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| AdaBoostM1 | Classification | The original boosting algorithm (see the sketch after this table) | |
| AdaBoostM1PL | Classification | A parallel extension of AdaBoost | Palit, I., & Reddy, C. K. Scalable and Parallel Boosting with MapReduce. IEEE Transactions on Knowledge and Data Engineering. |
| SAMME | Classification | An extension of AdaBoostM1 that works better on multi-class problems | Zhu, J., Rosset, S., Zou, H., & Hastie, T. Multi-class AdaBoost. |
| Logit Boost | Classification | A boosting algorithm for classification that uses regressors as the weak learners | Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive Logistic Regression: A Statistical View of Boosting. The Annals of Statistics, 28(2), 337–407. |
| LogitBoostPL | Classification | A parallel extension of Logit Boost | Palit, I., & Reddy, C. K. Scalable and Parallel Boosting with MapReduce. IEEE Transactions on Knowledge and Data Engineering. |
| Modest AdaBoost | Classification | An extension of AdaBoost that reduces overfitting | Vezhnevets, A., & Vezhnevets, V. (2005). "Modest AdaBoost" – Teaching AdaBoost to Generalize Better. GraphiCon. Novosibirsk Akademgorodok, Russia. |
| EmphasisBoost | Classification | A generalization of AdaBoost, with RealAdaBoost as a special case | Gómez-Verdejo, V., Ortega-Moral, M., Arenas-García, J., & Figueiras-Vidal, A. R. (2006). Boosting by weighting critical and erroneous samples. Neurocomputing, 69(7-9), 679–685. doi:10.1016/j.neucom.2005.12.011 |
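The variants in this table mostly tweak AdaBoost's weight update. For reference, one round of the classic binary AdaBoost that AdaBoostM1 generalizes looks like the standalone sketch below: compute the weak learner's weighted error, derive its vote weight, then re-weight the examples so the next learner focuses on the mistakes. Illustrative only, not JSAT's implementation.

```java
/** Sketch of one round of binary AdaBoost (labels and predictions are +1/-1).
 *  Assumes weights sum to 1 and the weak learner beats random guessing
 *  (0 < err < 0.5). Illustrative standalone code. */
public class AdaBoostRoundSketch {
    /** Re-weights the examples in place and returns alpha, the weak
     *  learner's weight in the final weighted-majority vote. */
    static double boostRound(double[] weights, int[] y, int[] predictions) {
        double err = 0;
        for (int i = 0; i < y.length; i++)
            if (predictions[i] != y[i])
                err += weights[i]; // weighted training error of this learner
        double alpha = 0.5 * Math.log((1 - err) / err);
        double Z = 0;
        for (int i = 0; i < y.length; i++) {
            // Up-weight mistakes, down-weight correct predictions
            weights[i] *= Math.exp(-alpha * y[i] * predictions[i]);
            Z += weights[i];
        }
        for (int i = 0; i < weights.length; i++)
            weights[i] /= Z; // renormalize back to a distribution
        return alpha;
    }
}
```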

### Other

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| Bagging | Classification, Regression | An ensembling technique for reducing variance | Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. doi:10.1007/BF00058655 |
| Wagging | Classification, Regression | An ensembling technique for models that support weighted data instances | Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 38(1998), 1–38. |
| Arc-X4 | Classification | A simple ensemble technique for models that support weighted data instances | Breiman, L. (1998). Arcing Classifiers. The Annals of Statistics, 26(3), 801–824. |
| Stacking | Classification, Regression | | Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5, 241–259. |
| UpdatableStacking | Classification, Regression, Online Learning | A generalization of Stacking for online models | Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5, 241–259. |
| OneVsAll | Classification | Extends binary classifiers to multi-class ones (see the sketch after this table) | |
| OneVsOne | Classification | Extends binary classifiers to multi-class ones | |
| Decision Directed Acyclic Graph (DDAG) | Classification | Extends binary classifiers to multi-class ones, and is related to OneVsOne | Platt, J. C., Cristianini, N., & Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems (pp. 547–553). MIT Press. Retrieved from http://www.weizmann.ac.il/mathusers/bagon/CVspring07/files/DAGSVM.pdf |
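The One-vs-All reduction above is simple enough to sketch directly: train one binary model per class (that class against everything else) and predict the class whose model is most confident. In the standalone sketch below, BinaryScorer is a made-up interface standing in for any binary learner with a real-valued confidence; it is illustrative only and not JSAT's Classifier interface.

```java
/** Sketch of the One-vs-All reduction from binary to multi-class
 *  classification. BinaryScorer is a hypothetical interface for
 *  illustration, not JSAT's API. */
public class OneVsAllSketch {
    interface BinaryScorer {
        void train(double[][] X, int[] y); // y values are +1 or -1
        double score(double[] x);          // larger = more confident positive
    }

    private final BinaryScorer[] scorers; // one per class

    public OneVsAllSketch(BinaryScorer[] scorers) { this.scorers = scorers; }

    /** Train each scorer on a relabeled problem: class c vs the rest. */
    public void train(double[][] X, int[] labels) {
        for (int c = 0; c < scorers.length; c++) {
            int[] y = new int[labels.length];
            for (int i = 0; i < labels.length; i++)
                y[i] = labels[i] == c ? 1 : -1;
            scorers[c].train(X, y);
        }
    }

    /** Predict the class whose binary scorer is most confident. */
    public int predict(double[] x) {
        int best = 0;
        for (int c = 1; c < scorers.length; c++)
            if (scorers[c].score(x) > scorers[best].score(x))
                best = c;
        return best;
    }
}
```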