
JSAT has numerous algorithms available, some of which overlap, are related, or serve as building blocks for others. I'll attempt to provide a somewhat ordered list of the algorithms available in JSAT. I'll also provide the main reference for each method, but please see the JavaDocs for more details.

Some of these algorithms are more practical than others, and many are implemented mostly for comparison purposes. This makes it easy to compare against many standard algorithms from the literature within one framework, where no implementation has an unfair advantage over the others simply by being implemented in a different language.

Other algorithms simply make good benchmarks.

## Data Transforms

| Algorithm | Relation / INFO | References |
| --- | --- | --- |
| AutoDeskew | Attempts to remove any skewness from each column independently | |
| Linear Transform | A simple linear rescaling of the features to a specific range | |
| ZeroMeanTransform | A simple transform that removes the mean from all elements | |
| PNormNormalization | Re-scales the numeric features in data points by their p-norm | |
| PolynomialTransform | Generates new numeric features for polynomial interaction terms | |
| MutualInfoFS | Feature selection using mutual information | |
| Nominal To Numeric | Converts nominal features to numeric ones using a one-hot encoding | |
| NumericalToHistogram | Converts numeric features to nominal ones using a simple histogram | |
| Bidirectional Search (BDS) | A greedy method of feature selection for prediction | |
| plus-L minus-R Selection (LRS) | A greedy method of feature selection for prediction | |
| Sequential Backward Selection (SBS) | A greedy method of feature selection for prediction | |
| Sequential Forward Selection (SFS) | A greedy method of feature selection for prediction | |
| Principal Component Analysis (PCA) | A common dimensionality reduction technique | |
| FastICA | | Hyvärinen, A., & Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4-5), 411–430. doi:10.1016/S0893-6080(00)00026-5 |
| Johnson-Lindenstrauss (JL) | A dimensionality reduction technique (see the sketch after this table) | Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687. doi:10.1016/S0022-0000(03)00025-4 |
| WhitenedPCA | An extension of PCA that performs whitening, which decorrelates all columns | |
| WhitenedZCA | An extension of Whitened PCA that rotates the feature space after whitening | |
| Kernel PCA | A kernelized version of PCA | Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5), 1299–1319. doi:10.1162/089976698300017467 |
| Nystrom | A generic method of creating an approximate feature space for any kernel | Williams, C., & Seeger, M. (2001). Using the Nyström Method to Speed Up Kernel Machines. Advances in Neural Information Processing Systems 13 (pp. 682–688). MIT Press. |
| RFF RBF | Random Kitchen Sinks for the RBF kernel | Rahimi, A., & Recht, B. (2007). Random Features for Large-Scale Kernel Machines. Neural Information Processing Systems. |
| ReliefF | A method of determining feature importance and selection for numeric features | Kononenko, I., Simec, E., & Robnik-Sikonja, M. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, 7, 39–55. |
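As a concrete example of a transform above, the Johnson-Lindenstrauss entry cites Achlioptas's "binary coins" construction, where the projection matrix is filled with random ±1 entries and scaled so that pairwise distances are preserved in expectation. Below is a minimal standalone Java sketch of that idea; it is illustrative only and does not reflect JSAT's own class or method names.

```java
import java.util.Random;

/** Minimal sketch of an Achlioptas-style random projection ("binary coins").
 *  Illustrative standalone code, not JSAT's JL implementation. */
public class JLProjectionSketch {
    private final double[][] R;   // k x d projection matrix of +/-1 entries
    private final double scale;   // 1/sqrt(k) preserves distances in expectation

    public JLProjectionSketch(int d, int k, long seed) {
        Random rand = new Random(seed);
        R = new double[k][d];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < d; j++)
                R[i][j] = rand.nextBoolean() ? 1.0 : -1.0; // flip a binary coin
        scale = 1.0 / Math.sqrt(k);
    }

    /** Project a d-dimensional point down to k dimensions. */
    public double[] transform(double[] x) {
        double[] y = new double[R.length];
        for (int i = 0; i < R.length; i++) {
            double dot = 0;
            for (int j = 0; j < x.length; j++)
                dot += R[i][j] * x[j];
            y[i] = scale * dot;
        }
        return y;
    }
}
```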

## Predictive Algorithms

### Linear algorithms

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| Logistic Regression | Classification | A simple implementation of Logistic Regression | |
| Logistic Regression DCD | Classification | A fast implementation of Logistic Regression based on the DCD solver | Yu, H.-F., Huang, F.-L., & Lin, C.-J. (2010). Dual Coordinate Descent Methods for Logistic Regression and Maximum Entropy Models. Machine Learning, 85(1-2), 41–75. doi:10.1007/s10994-010-5221-8 |
| StochasticMultinomialLogisticRegression | Classification | A direct implementation of Multinomial LR allowing for various priors | Carpenter, B. (2008). Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression. |
| DCDs | Classification, Regression | A linear SVM trained by DCD | Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S. S., & Sundararajan, S. (2008). A Dual Coordinate Descent Method for Large-scale Linear SVM. Proceedings of the 25th international conference on Machine learning - ICML '08 (pp. 408–415). New York, New York, USA: ACM Press. doi:10.1145/1390156.1390208 |
| DCD | Classification, Regression | A linear SVM trained by DCD, without the shrinkage optimization | |
| Linear Batch | Classification, Regression | A generic loss-based solver that uses batch optimization methods and supports L2 regularization | |
| Linear SGD | Classification, Regression | A generic loss-based solver that uses stochastic online methods, supporting elastic net regularization | |
| NewGLMNET | Classification | An implementation of Elastic Net regularized Logistic Regression | Yuan, G., Ho, C.-H., & Lin, C. (2012). An improved GLMNET for L1-regularized logistic regression. Journal of Machine Learning Research, 13, 1999–2030. doi:10.1145/2020408.2020421 |
| Adaptive Regularization of Weight Vectors (AROW) | Classification, Online Learning | | Crammer, K., Kulesza, A., & Dredze, M. (2013). Adaptive regularization of weight vectors. Machine Learning, 91(2), 155–187. doi:10.1007/s10994-013-5327-x |
| ALMA2 | Classification, Online Learning | | Gentile, C. (2002). A New Approximate Maximal Margin Classification Algorithm. The Journal of Machine Learning Research, 2, 213–242. |
| ROMMA | Classification, Online Learning | | Li, Y., & Long, P. M. (2002). The Relaxed Online Maximum Margin Algorithm. Machine Learning, 46(1-3), 361–387. doi:10.1023/A:1012435301888 |
| Soft Confidence-Weighted (SCW) | Classification, Online Learning | | Crammer, K., Dredze, M., & Pereira, F. (2008). Exact Convex Confidence-Weighted Learning. In Advances in Neural Information Processing Systems 22 (pp. 345–352). |
| Passive Aggressive | Classification, Regression, Online Learning | See the sketch after this table | Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7, 551–585. |
| Support Passive Aggressive | Classification, Online Learning | This is a multi-class generalization of Passive Aggressive | Matsushima, S., Shimizu, N., Yoshida, K., Ninomiya, T., & Nakagawa, H. (2010). Exact Passive-Aggressive Algorithm for Multiclass Classification Using Support Class. SIAM International Conference on Data Mining - SDM (pp. 303–314). |
| Pegasos | Classification, Online Learning | An online linear SVM | Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. 24th international conference on Machine learning (pp. 807–814). New York, NY: ACM. doi:10.1145/1273496.1273598 |
| Normal Herd (NHERD) | Classification, Online Learning | It is related to AROW and Passive Aggressive | Crammer, K., & Lee, D. D. (2010). Learning via Gaussian Herding. Pre-proceedings of NIPS (pp. 451–459). |
| Sparse Truncated Gradient Descent (STGD) | Classification, Regression, Online Learning | An online algorithm for L1 regularized models | Langford, J., Li, L., & Zhang, T. (2009). Sparse online learning via truncated gradient. The Journal of Machine Learning Research, 10, 777–801. |
| RidgeRegression | Regression | A simple batch implementation of Ridge Regression | |
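Many of the online learners above share the same flavor of mistake-driven update. As an illustration, the PA-I variant of Passive Aggressive from Crammer et al. (2006) fits in a few lines of standalone Java. This is a minimal sketch of the published update rule, not JSAT's own PassiveAggressive class.

```java
/** Sketch of the PA-I binary classification update (Crammer et al., 2006).
 *  Labels are +1/-1; C caps the step size. Illustrative, not JSAT's code. */
public class PassiveAggressiveSketch {
    private final double[] w; // the linear model's weight vector
    private final double C;   // aggressiveness parameter

    public PassiveAggressiveSketch(int dims, double aggressiveness) {
        w = new double[dims];
        C = aggressiveness;
    }

    public double score(double[] x) {
        double dot = 0;
        for (int i = 0; i < x.length; i++)
            dot += w[i] * x[i];
        return dot;
    }

    public void update(double[] x, int y) {
        double normSqrd = 0;
        for (double v : x)
            normSqrd += v * v;
        double loss = Math.max(0, 1 - y * score(x)); // hinge loss
        if (loss == 0 || normSqrd == 0)
            return; // passive: the margin is already satisfied
        // Aggressive: smallest step that would fix the loss, capped by C
        double tau = Math.min(C, loss / normSqrd);
        for (int i = 0; i < x.length; i++)
            w[i] += tau * y * x[i];
    }
}
```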

### Kernel Based

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| Platt's SMO | Classification, Regression | | Platt, J. C. (1998). Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Advances in kernel methods (pp. 185–208). |
| Kernel SGD | Classification, Regression, Online Learning | See the KernelPoint building block. This provides a way to directly train a kernel-based model via SGD (see the sketch after this table) | |
| Double Update Online Learning (DUOL) | Classification, Online Learning | This is a kernelized extension of the Passive Aggressive algorithm | Zhao, P., Hoi, S. C. H., & Jin, R. (2011). Double Updating Online Learning. Journal of Machine Learning Research, 12, 1587–1615. |
| Online Sparse Kernel Learning by Sampling and Smooth Losses (OSKL) | Classification, Online Learning | This is a bounded method for kernel learning meant for smooth loss functions like the Logistic Loss | Zhang, L., Yi, J., Jin, R., Lin, M., & He, X. (2013). Online Kernel Learning with a Near Optimal Sparsity Bound. In S. Dasgupta & D. Mcallester (Eds.), Proceedings of the 30th International Conference on Machine Learning (ICML-13) (Vol. 28, pp. 621–629). JMLR Workshop and Conference Proceedings. |
| PegasosK | Classification, Online Learning | A kernelized version of Pegasos | Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. 24th international conference on Machine learning (pp. 807–814). New York, NY: ACM. doi:10.1145/1273496.1273598 |
| Projectron | Classification, Online Learning | This is a kernelized version of the Perceptron that uses projection as its method of bounding the number of support vectors | Orabona, F., Keshet, J., & Caputo, B. (2009). Bounded Kernel-Based Online Learning. The Journal of Machine Learning Research, 10, 2643–2666. |
| Bounded Online Gradient Descent (BOGD) | Classification, Online Learning | | Zhao, P., Wang, J., Wu, P., Jin, R., & Hoi, S. C. H. (2012). Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning. In Proceedings of the 29th International Conference on Machine Learning (pp. 169–176). |
| Least Squares Support Vector Machine (LS-SVM) | Classification, Regression | This is related to the kernel SVM, but simpler to implement. It is equivalent to Kernel Ridge Regression, but trained differently, allowing for potentially larger data sets | Suykens, J., & Vandewalle, J. (1999). Least Squares Support Vector Machine Classifiers. Neural Processing Letters, 9(3), 293–298. doi:10.1023/A:1018628609742 |
| Kernel Ridge Regression | Regression | This is a direct kernelization of the Ridge Regression algorithm | |
| Kernel Recursive Least Squares (KernelRLS) | Regression, Online Learning | This is a kernelization of the RLS algorithm, and the first paper to use projection for bounded learning | Engel, Y., Mannor, S., & Meir, R. (2004). The Kernel Recursive Least-Squares Algorithm. IEEE Transactions on Signal Processing, 52(8), 2275–2285. doi:10.1109/TSP.2004.830985 |
| Conservative Stochastic Kernel Logistic Regression (CSKLR) | Classification, Online Learning | | Zhang, L., Jin, R., Chen, C., Bu, J., & He, X. (2012). Efficient Online Learning for Large-Scale Sparse Kernel Logistic Regression. Twenty-Sixth AAAI Conference on Artificial Intelligence (pp. 1219–1225). |
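All of the online methods in this table represent their model as a weighted sum of kernel evaluations against stored support vectors; they differ mainly in how the coefficients are set and how the support set is bounded. A minimal kernelized Perceptron with an RBF kernel makes that shared structure visible. This is an illustrative standalone sketch, not JSAT's KernelPoint, and unlike Projectron, OSKL, or BOGD it does nothing to bound the support set.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of an online kernel learner: a kernelized Perceptron with an RBF
 *  kernel. Every mistake adds a support vector, so the model grows without
 *  bound. Illustrative only. */
public class KernelPerceptronSketch {
    private final List<double[]> supportVecs = new ArrayList<>();
    private final List<Double> alphas = new ArrayList<>();
    private final double gamma; // RBF bandwidth parameter

    public KernelPerceptronSketch(double gamma) { this.gamma = gamma; }

    private double rbf(double[] a, double[] b) {
        double distSqrd = 0;
        for (int i = 0; i < a.length; i++)
            distSqrd += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.exp(-gamma * distSqrd);
    }

    /** f(x) = sum_i alpha_i * K(x_i, x) over the stored support vectors. */
    public double score(double[] x) {
        double sum = 0;
        for (int i = 0; i < supportVecs.size(); i++)
            sum += alphas.get(i) * rbf(supportVecs.get(i), x);
        return sum;
    }

    public void update(double[] x, int y) { // y is +1 or -1
        if (y * score(x) <= 0) { // mistake-driven: only update on errors
            supportVecs.add(x.clone());
            alphas.add((double) y);
        }
    }
}
```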

### Tree Based

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| ID3 | Classification | A simple decision tree for only nominal features | |
| Decision Tree | Classification, Regression | This is a generic implementation, with the ability to mimic the behavior of many tree algorithms like C4.5 and CART (see the sketch after this table) | |
| Random Tree | Classification, Regression | This is a decision tree that chooses a random subset of features to consider at each iteration | |
| Random Forest | Classification, Regression | This is a collection of random trees | Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. |
| Extra Tree | Classification, Regression | This algorithm is an extremely randomized tree builder | Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42. doi:10.1007/s10994-006-6226-1 |
| Extra Random Trees (ERTrees) | Classification, Regression | A collection of Extra Trees | |
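The trees above differ mostly in how candidate splits are generated; the common core is scoring a split with an impurity measure such as Gini (CART-style) or information gain (ID3/C4.5-style). Below is a standalone sketch of Gini-based scoring of one binary split on a numeric feature; it is illustrative only and not how JSAT's DecisionTree is structured internally.

```java
import java.util.stream.IntStream;

/** Sketch of Gini-impurity split scoring, the CART-style core of decision
 *  tree induction. Labels are class indices 0..numClasses-1. Illustrative
 *  standalone code, not JSAT's implementation. */
public class GiniSplitSketch {
    /** Gini impurity: 0 for a pure node, larger for more mixed nodes. */
    static double gini(int[] labels, int numClasses) {
        if (labels.length == 0) return 0;
        int[] counts = new int[numClasses];
        for (int y : labels) counts[y]++;
        double sum = 0;
        for (int c : counts) {
            double p = c / (double) labels.length;
            sum += p * p;
        }
        return 1 - sum;
    }

    /** Weighted impurity of splitting on feature[i] <= threshold; the tree
     *  builder would pick the threshold minimizing this score. */
    static double splitScore(double[] feature, int[] labels, double threshold, int numClasses) {
        int[] left = IntStream.range(0, labels.length)
                .filter(i -> feature[i] <= threshold).map(i -> labels[i]).toArray();
        int[] right = IntStream.range(0, labels.length)
                .filter(i -> feature[i] > threshold).map(i -> labels[i]).toArray();
        double n = labels.length;
        return (left.length / n) * gini(left, numClasses)
             + (right.length / n) * gini(right, numClasses);
    }
}
```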

### Nearest Neighbor / Vector Quantization based

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| Nearest Neighbor | Classification, Regression | See the sketch after this table | |
| Discriminant Adaptive Nearest Neighbor (DANN) | Classification, Regression | An extension of Nearest Neighbor that uses a region-adaptive distance metric | Hastie, T., & Tibshirani, R. (1996). Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6), 607–616. doi:10.1109/34.506411 |
| Learning Vector Quantization (LVQ) | Classification | | |
| LVQ with Locally Learned Classifier (LVQ-LLC) | Classification | This is an original extension of LVQ | |
| Self Organizing Map (SOM) | Classification, Clustering | | |
| Locally Weighted Learner (LWL) | Classification, Regression | This algorithm builds a local model for every query, and uses that local model to make predictions | Atkeson, C., Moore, A., & Schaal, S. (1997). Locally Weighted Learning. Artificial Intelligence Review, 11–73. |
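At query time, all of the methods above reduce to some variant of the same brute-force pattern: measure the distance from the query to stored points, then predict from the closest ones. The standalone sketch below shows k-NN classification with Euclidean distance and a majority vote; it is illustrative only, and a full scan like this is exactly what acceleration structures are meant to avoid on large data sets.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch of brute-force k-nearest-neighbor classification with Euclidean
 *  distance and a majority vote. Illustrative standalone code. */
public class KnnSketch {
    static int classify(double[][] train, int[] labels, double[] query, int k, int numClasses) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Order training points by distance to the query (the expensive part)
        Arrays.sort(idx, Comparator.comparingDouble(i -> dist(train[i], query)));
        int[] votes = new int[numClasses];
        for (int i = 0; i < Math.min(k, idx.length); i++)
            votes[labels[idx[i]]]++; // each of the k closest points votes
        int best = 0;
        for (int c = 1; c < numClasses; c++)
            if (votes[c] > votes[best]) best = c;
        return best;
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++)
            s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}
```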

## Meta Algorithms

### Boosting Based

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| AdaBoostM1 | Classification | The original boosting algorithm (see the sketch after this table) | |
| AdaBoostM1PL | Classification | A parallel extension of AdaBoost | Palit, I., & Reddy, C. K. Scalable and Parallel Boosting with MapReduce. IEEE Transactions on Knowledge and Data Engineering. |
| SAMME | Classification | An extension of AdaBoostM1 that works better on multi-class problems | Zhu, J., Rosset, S., Zou, H., & Hastie, T. Multi-class AdaBoost. |
| Logit Boost | Classification | A boosting algorithm for classification that uses regressors as the weak learners | Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive Logistic Regression: A Statistical View of Boosting. The Annals of Statistics, 28(2), 337–407. |
| LogitBoostPL | Classification | A parallel extension of Logit Boost | Palit, I., & Reddy, C. K. Scalable and Parallel Boosting with MapReduce. IEEE Transactions on Knowledge and Data Engineering. |
| Modest AdaBoost | Classification | An extension of AdaBoost that reduces overfitting | Vezhnevets, A., & Vezhnevets, V. (2005). "Modest AdaBoost" – Teaching AdaBoost to Generalize Better. GraphiCon. Novosibirsk Akademgorodok, Russia. |
| EmphasisBoost | Classification | A generalization of AdaBoost, with RealAdaBoost as a special case | Gómez-Verdejo, V., Ortega-Moral, M., Arenas-García, J., & Figueiras-Vidal, A. R. (2006). Boosting by weighting critical and erroneous samples. Neurocomputing, 69(7-9), 679–685. doi:10.1016/j.neucom.2005.12.011 |
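The variants in this table mostly tweak AdaBoost's weight update. For reference, one round of the classic binary AdaBoost that AdaBoostM1 generalizes looks like the standalone sketch below: compute the weak learner's weighted error, derive its vote weight, then re-weight the examples so the next learner focuses on the mistakes. Illustrative only, not JSAT's implementation.

```java
/** Sketch of one round of binary AdaBoost (labels and predictions are +1/-1).
 *  Assumes weights sum to 1 and the weak learner beats random guessing
 *  (0 < err < 0.5). Illustrative standalone code. */
public class AdaBoostRoundSketch {
    /** Re-weights the examples in place and returns alpha, the weak
     *  learner's weight in the final weighted-majority vote. */
    static double boostRound(double[] weights, int[] y, int[] predictions) {
        double err = 0;
        for (int i = 0; i < y.length; i++)
            if (predictions[i] != y[i])
                err += weights[i]; // weighted training error of this learner
        double alpha = 0.5 * Math.log((1 - err) / err);
        double Z = 0;
        for (int i = 0; i < y.length; i++) {
            // Up-weight mistakes, down-weight correct predictions
            weights[i] *= Math.exp(-alpha * y[i] * predictions[i]);
            Z += weights[i];
        }
        for (int i = 0; i < weights.length; i++)
            weights[i] /= Z; // renormalize back to a distribution
        return alpha;
    }
}
```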

### Other

| Algorithm | Purpose | Relation / INFO | References |
| --- | --- | --- | --- |
| Bagging | Classification, Regression | An ensembling technique for reducing variance | Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. doi:10.1007/BF00058655 |
| Wagging | Classification, Regression | An ensembling technique for models that support weighted data instances | Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 38(1998), 1–38. |
| Arc-X4 | Classification | A simple ensemble technique for models that support weighted data instances | Breiman, L. (1998). Arcing Classifiers. The Annals of Statistics, 26(3), 801–824. |
| Stacking | Classification, Regression | | Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5, 241–259. |
| UpdatableStacking | Classification, Regression, Online Learning | A generalization of Stacking for online models | Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5, 241–259. |
| OneVsAll | Classification | Extends binary classifiers to multi-class ones (see the sketch after this table) | |
| OneVsOne | Classification | Extends binary classifiers to multi-class ones | |
| Decision Directed Acyclic Graph (DDAG) | Classification | Extends binary classifiers to multi-class ones, and is related to OneVsOne | Platt, J. C., Cristianini, N., & Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems (pp. 547–553). MIT Press. Retrieved from http://www.weizmann.ac.il/mathusers/bagon/CVspring07/files/DAGSVM.pdf |
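The One-vs-All reduction above is simple enough to sketch directly: train one binary model per class (that class against everything else) and predict the class whose model is most confident. In the standalone sketch below, BinaryScorer is a made-up interface standing in for any binary learner with a real-valued confidence; it is illustrative only and not JSAT's Classifier interface.

```java
/** Sketch of the One-vs-All reduction from binary to multi-class
 *  classification. BinaryScorer is a hypothetical interface for
 *  illustration, not JSAT's API. */
public class OneVsAllSketch {
    interface BinaryScorer {
        void train(double[][] X, int[] y); // y values are +1 or -1
        double score(double[] x);          // larger = more confident positive
    }

    private final BinaryScorer[] scorers; // one per class

    public OneVsAllSketch(BinaryScorer[] scorers) { this.scorers = scorers; }

    /** Train each scorer on a relabeled problem: class c vs the rest. */
    public void train(double[][] X, int[] labels) {
        for (int c = 0; c < scorers.length; c++) {
            int[] y = new int[labels.length];
            for (int i = 0; i < labels.length; i++)
                y[i] = labels[i] == c ? 1 : -1;
            scorers[c].train(X, y);
        }
    }

    /** Predict the class whose binary scorer is most confident. */
    public int predict(double[] x) {
        int best = 0;
        for (int c = 1; c < scorers.length; c++)
            if (scorers[c].score(x) > scorers[best].score(x))
                best = c;
        return best;
    }
}
```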