Title: | Stream Suitable Online Support Vector Machines |
---|---|
Description: | Soft-margin support vector machines (SVMs) are a common class of classification models. Training an SVM usually requires that all of the data be available at once in a single batch; however, the stochastic majorization-minimization (SMM) algorithm framework allows SVMs to be trained on streamed data instead (Nguyen, Jones & McLachlan (2018) <doi:10.1007/s42081-018-0001-y>). This package uses the SMM framework to provide functions for training SVMs with hinge loss, squared-hinge loss, and logistic loss. |
Authors: | Andrew Thomas Jones, Hien Duy Nguyen, Geoffrey J. McLachlan |
Maintainer: | Andrew Thomas Jones <[email protected]> |
License: | GPL-3 |
Version: | 0.2.1 |
Built: | 2024-11-11 04:15:14 UTC |
Source: | https://github.com/andrewthomasjones/ssosvm |
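As a point of reference for the three losses named in the description, the sketch below writes them in their usual textbook form as functions of the margin m = y * f(x). The function names are hypothetical and are not part of SSOSVM; the package's own objective (for example its regularization and the EPSILON perturbation) may be parameterized differently.

```r
# Textbook loss functions of the margin m = y * f(x), with y in {-1, 1}.
# Hypothetical helper names; these are not SSOSVM functions.
hinge_loss        <- function(m) pmax(0, 1 - m)
square_hinge_loss <- function(m) pmax(0, 1 - m)^2
logistic_loss     <- function(m) log1p(exp(-m))

m <- c(-1, 0, 1, 2)
hinge_loss(m)         # 2 1 0 0
square_hinge_loss(m)  # 4 1 0 0
logistic_loss(m)      # strictly positive, decreasing in m
```

All three penalize small or negative margins; only the logistic loss stays positive for every margin.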
Generate simple simulations for testing of the algorithms.
generateSim(NN = 10^4, DELTA = 2, DIM = 2, seed = NULL)
NN |
Number of observations. Default is 10^4. |
DELTA |
Separation of three groups in standard errors. Default is 2. |
DIM |
Number of dimensions in data. Default is 2. |
seed |
Random seed if desired. |
A list containing:
XX |
Coordinates of the simulated points. |
YY |
Cluster membership of the simulated points. |
YMAT |
YY and XX combined as a single matrix. |
# 100 points of dimension 4.
generateSim(NN = 100, DELTA = 2, DIM = 4)
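To make the returned layout concrete, here is a minimal base-R sketch that builds a list with the same three element names. It is not the package's generator: the helper name, the use of two classes labelled -1 and 1, and the way DELTA is turned into a mean shift are all assumptions made for illustration.

```r
# Hypothetical helper, not part of SSOSVM: two Gaussian classes whose
# means differ by `delta` in every coordinate (an assumed design).
simulate_two_class <- function(n = 100, delta = 2, dim = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  yy <- sample(c(-1, 1), n, replace = TRUE)            # class labels
  xx <- matrix(rnorm(n * dim), nrow = n, ncol = dim)   # unit-variance noise
  xx <- xx + yy * delta / 2                            # shift row i by yy[i] * delta / 2
  list(XX = xx, YY = yy, YMAT = cbind(yy, xx))
}

sim <- simulate_two_class(n = 100, delta = 2, dim = 4, seed = 1)
dim(sim$YMAT)  # 100 rows; 1 label column plus 4 coordinate columns
```

The first column of YMAT holds the labels and the remaining columns hold the coordinates, which is the layout the fitting functions describe for their YMAT argument.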
Fit SVM with Hinge loss function.
Hinge(YMAT, DIM = 2L, EPSILON = 1e-05, returnAll = FALSE, rho = 1)
YMAT |
Data. First column is -1 or 1 indicating the class of each observation. The remaining columns are the coordinates of the data points. |
DIM |
Dimension of data. Default value is 2. |
EPSILON |
Small perturbation value needed in calculation. Default value is 0.00001. |
returnAll |
Return the THETA values from every iteration? Boolean with default value FALSE. |
rho |
Sensitivity factor that adjusts how much the SVM fit changes when a new observation is added. Default value is 1.0. |
A list containing:
THETA |
SVM fit parameters. |
NN |
Number of observation points in YMAT. |
DIM |
Dimension of data. |
THETA_list |
THETA at each iteration (new point observed) as YMAT is fed into the algorithm one data point at a time. |
OMEGA |
Intermediate value OMEGA at each iteration (new point observed). |
YMAT <- generateSim(10^4)
h1 <- Hinge(YMAT$YMAT, returnAll = TRUE)
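To illustrate what it means for YMAT to be fed into an algorithm one data point at a time, the sketch below runs a plain online subgradient update for an L2-penalized hinge loss. This is deliberately not the SMM update that Hinge implements; the function name, the penalty `lambda`, the step-size schedule, and the (intercept, coefficients) ordering of the parameter vector are all assumptions. The returned list only mimics the element names documented above.

```r
# Hypothetical illustration: online subgradient descent for a penalized
# hinge loss. This is NOT the SMM algorithm used by Hinge().
online_hinge_sgd <- function(YMAT, lambda = 0.01) {
  n <- nrow(YMAT)
  d <- ncol(YMAT) - 1
  theta <- numeric(d + 1)                      # assumed order: (intercept, coefficients)
  theta_list <- matrix(0, nrow = n, ncol = d + 1)
  for (i in seq_len(n)) {
    y <- YMAT[i, 1]                            # label of the i-th streamed point
    x <- c(1, YMAT[i, -1])                     # prepend 1 for the intercept
    eta <- 1 / sqrt(i)                         # decreasing step size
    grad <- lambda * c(0, theta[-1])           # penalty gradient, intercept unpenalized
    if (y * sum(theta * x) < 1) grad <- grad - y * x  # hinge subgradient on a margin violation
    theta <- theta - eta * grad
    theta_list[i, ] <- theta                   # parameter estimate after point i
  }
  list(THETA = theta, NN = n, DIM = d, THETA_list = theta_list)
}

# Small synthetic stream: two well-separated classes in two dimensions.
set.seed(42)
yy <- sample(c(-1, 1), 2000, replace = TRUE)
xx <- matrix(rnorm(2000 * 2), ncol = 2) + 2 * yy
fit <- online_hinge_sgd(cbind(yy, xx))
fit$THETA
```

Each row of `THETA_list` here is the estimate after one more observation, which is the kind of per-iteration trace that returnAll = TRUE is described as returning.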
Fit SVM with Logistic loss function.
Logistic(YMAT, DIM = 2L, EPSILON = 1e-05, returnAll = FALSE, rho = 1)
YMAT |
Data. First column is -1 or 1 indicating the class of each observation. The remaining columns are the coordinates of the data points. |
DIM |
Dimension of data. Default value is 2. |
EPSILON |
Small perturbation value needed in calculation. Default value is 0.00001. |
returnAll |
Return the THETA values from every iteration? Boolean with default value FALSE. |
rho |
Sensitivity factor that adjusts how much the SVM fit changes when a new observation is added. Default value is 1.0. |
A list containing:
THETA |
SVM fit parameters. |
NN |
Number of observation points in YMAT. |
DIM |
Dimension of data. |
THETA_list |
THETA at each iteration (new point observed) as YMAT is fed into the algorithm one data point at a time. |
CHI |
Intermediate value CHI at each iteration (new point observed). |
YMAT <- generateSim(10^4)
l1 <- Logistic(YMAT$YMAT, returnAll = TRUE)
Fit SVM with Square Hinge loss function.
SquareHinge(YMAT, DIM = 2L, EPSILON = 1e-05, returnAll = FALSE, rho = 1)
YMAT |
Data. First column is -1 or 1 indicating the class of each observation. The remaining columns are the coordinates of the data points. |
DIM |
Dimension of data. Default value is 2. |
EPSILON |
Small perturbation value needed in calculation. Default value is 0.00001. |
returnAll |
Return the THETA values from every iteration? Boolean with default value FALSE. |
rho |
Sensitivity factor that adjusts how much the SVM fit changes when a new observation is added. Default value is 1.0. |
A list containing:
THETA |
SVM fit parameters. |
NN |
Number of observation points in YMAT. |
DIM |
Dimension of data. |
THETA_list |
THETA at each iteration (new point observed) as YMAT is fed into the algorithm one data point at a time. |
PSI |
Intermediate value PSI at each iteration (new point observed). |
YMAT <- generateSim(10^3, DIM = 3)
sq1 <- SquareHinge(YMAT$YMAT, DIM = 3, returnAll = TRUE)
The SSOSVM package allows for the online training of soft-margin support vector machines (SVMs) using the stochastic majorization-minimization (SMM) algorithm. Fitting functions are provided for three loss functions: SquareHinge, Hinge and Logistic. The function generateSim can also be used to generate simple test sets.
Andrew T. Jones, Hien D. Nguyen, Geoffrey J. McLachlan
Hien D. Nguyen, Andrew T. Jones and Geoffrey J. McLachlan. (2018). Stream-suitable optimization algorithms for some soft-margin support vector machine variants, Japanese Journal of Statistics and Data Science, vol. 1, Issue 1, pp. 81-108.
This is the primary function for users to fit SVMs using this package.
SVMFit(YMAT, method = "logistic", EPSILON = 1e-05, returnAll = FALSE, rho = 1)
YMAT |
Data. First column is -1 or 1 indicating the class of each observation. The remaining columns are the coordinates of the data points. |
method |
Choice of loss function used in the SVM. Choices are 'logistic', 'hinge' and 'squareHinge'. Default value is 'logistic'. |
EPSILON |
Small perturbation value needed in calculation. Default value is 0.00001. |
returnAll |
Return the THETA values from every iteration? Boolean with default value FALSE. |
rho |
Sensitivity factor that adjusts how much the SVM fit changes when a new observation is added. Default value is 1.0. |
A list containing:
THETA |
SVM fit parameters. |
NN |
Number of observation points in YMAT. |
DIM |
Dimension of data. |
THETA_list |
THETA at each iteration (new point observed) as YMAT is fed into the algorithm one data point at a time. |
PSI, OMEGA, CHI |
Intermediate value for PSI, OMEGA, or CHI (depending on method choice) at each iteration (new point observed). |
Sim <- generateSim(10^4)
m1 <- SVMFit(Sim$YMAT)
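The documentation above does not describe how to classify new points from the fitted THETA, so the following is only a sketch under a stated assumption: that THETA is ordered as an intercept followed by one coefficient per dimension. The helper `predict_svm` is hypothetical and not part of SSOSVM, and the parameter values below are made up for illustration.

```r
# Hypothetical helper, not part of SSOSVM. Assumes THETA is ordered as
# (intercept, coefficient_1, ..., coefficient_DIM); check this against
# the package before relying on it.
predict_svm <- function(theta, XX) {
  scores <- cbind(1, XX) %*% as.numeric(theta)  # linear decision values
  ifelse(scores >= 0, 1, -1)[, 1]               # class labels in {-1, 1}
}

theta <- c(0.5, 1, -1)                          # made-up parameters
XX <- rbind(c(2, 0), c(0, 2), c(0, 0))
predict_svm(theta, XX)                          # 1 -1 1
```

If the ordering assumption holds, the same helper could be applied to a fitted model as predict_svm(m1$THETA, Sim$XX) and compared with Sim$YY.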