A Working Set Algorithm
for Support Vector Regression Problems
Abstract.
Support vector regression (SVR) is widely applied to nonlinear regression tasks but remains computationally demanding due to its dual quadratic formulation and structural constraints. In this paper, we propose a working set algorithm, termed WSA–SVR, that extends the primal simplex method recently developed for SVM classification to the regression framework. WSA–SVR iteratively generates a sequence of basic feasible solutions converging to the optimal solution. Its key feature lies in preserving primal feasibility while employing efficient rank-one updates, thereby avoiding null-space and reduced-Hessian projections. This guarantees both numerical stability and scalability. Extensive experiments on benchmark datasets demonstrate that WSA–SVR converges more rapidly than state-of-the-art solvers while maintaining competitive or superior predictive accuracy.
Key words and phrases:
Support vector regression, convex quadratic programming, simplex method, working basis, kernel methods.
2005 Mathematics Subject Classification:
90C20, 90C25, 65K05, 68T05, 90C90.
1. Introduction
Support Vector Regression (SVR) is a powerful machine learning algorithm widely used for regression tasks, offering a principled framework for estimating real-valued functions [26, 21]. Beyond its theoretical foundations, SVR has been successfully applied in a wide range of domains, such as financial forecasting [24], bioinformatics and medical imaging [22, 28], meteorology [6], and more recently, environmental and hydrological modeling [15]. This shows that the method is versatile and reliable, even when dealing with complex, noisy, or nonlinear data.
The classical SVR problem is formulated by introducing an $\varepsilon$-insensitive region, called the $\varepsilon$-tube, symmetrically surrounding the regression function. Points outside this tube are penalized, while those inside incur no penalty [1, 9, 21]. Training an SVR model consists of solving a convex optimization problem that balances model complexity with predictive accuracy [15, 28, 12, 23]. This objective seeks to minimize model complexity while controlling errors larger than $\varepsilon$, thus ensuring robust generalization even in the presence of noise or nonlinearities.
However, solving the primal problem directly is often computationally challenging. The dual formulation, expressed as a convex quadratic program (QP) with an equality constraint and box constraints on pairs of dual variables, is better suited to kernel methods but introduces specific algorithmic difficulties due to the $\varepsilon$-insensitive loss and coupling constraints [17, 20, 29].
Several approaches have been developed to tackle this optimization framework. Exact QP solvers ensure convergence but scale poorly to large datasets [16]. Decomposition methods, especially sequential minimal optimization (SMO) and its variants as implemented in LIBSVM [5], are widely adopted for large-scale problems [18, 5]. Active-set strategies, initially studied for SVM classification, have motivated extensions to SVR, offering efficient rank-one updates and improved numerical stability [27, 14]. For large-scale regression problems, primal-stochastic techniques such as coordinate descent and stochastic gradient descent, combined with kernel approximations (Nyström, Random Fourier Features), enable scalability [11, 19]. Furthermore, variants such as least squares SVR (LS-SVR), smooth $\varepsilon$-SVR, linear programming formulations, and online-based algorithms have been developed to extend SVR to noisy and streaming data [22, 12, 11, 13].
In this context, we propose a novel approach, the Working Set Algorithm for Support Vector Regression (WSA–SVR). While inspired by the active-set framework recently introduced for SVM classification [4] and support methods [10, 3] for convex quadratic programming, our method provides a specific extension to the regression setting. The method explicitly addresses the structural constraints of the dual formulation of an $\varepsilon$–SVR, and generates a sequence of feasible basic solutions converging to an optimal solution. Each iteration of WSA–SVR involves four steps: identifying the most violating variable, computing a descent direction, selecting a step size that decreases the objective while maintaining feasibility, and changing the working basis. A key feature of our approach is that it directly handles the dual variables in pairs as originally formulated, guarantees the non-singularity of the basis matrix throughout the iterations, and avoids the costly computation of the reduced Hessian. Consequently, the proposed WSA–SVR remains simple to implement and easy to understand.
The proposed method is empirically evaluated on several regression datasets drawn from the UCI repository [8], using both linear and RBF kernels. Numerical experiments are encouraging and indicate that WSA-SVR converges faster than conventional solvers such as SMO and LIBSVM, while achieving comparable or superior predictive accuracy. A theoretical study of convergence and complexity is also conducted to validate the robustness of the algorithm.
The remainder of this paper is organized as follows. Section 2 reviews the SVR framework, where the primal formulation and its associated dual problem are presented. Section 3 presents the suggested WSA–SVR algorithm, and Section 4 details the iterative update scheme. Convergence guarantees and complexity computation of WSA–SVR are discussed in Section 5. Experimental results and comparisons with standard solvers (SMO-MAT and LIBSVM) are reported in Section 6, and Section 7 concludes the paper.
2. Background
Let $\{(x_i, y_i)\}_{i=1}^{n}$ be a training set, where $x_i \in \mathbb{R}^{d}$ are input features and $y_i \in \mathbb{R}$ are scalar outputs. Here, $n$ and $d$ represent respectively the number of examples and features. The primary objective of an SVR model [26, 22] is to approximate a nonlinear function $f$ that best fits the training data within an $\varepsilon$-insensitive margin while remaining as flat as possible. Its form is as follows:
$$f(x) = w'\varphi(x) + b,$$
where $w$ is the weight vector, the scalar $b$ is called the bias, and the application $\varphi(\cdot)$ is a feature mapping to a reproducing kernel Hilbert space $\mathcal{H}$. In practice, the kernel function $k(x_i, x_j) = \varphi(x_i)'\varphi(x_j)$ allows the implicit computation of dot products in $\mathcal{H}$, assuming that $k$ satisfies Mercer's condition [22, 21]. The symbol (′) is used for the transposition of vectors or matrices.
To tolerate small deviations, the SVR uses an $\varepsilon$-insensitive loss function. Slack variables $\xi_i$ and $\xi_i^{*}$ are introduced to handle violations beyond the margin $\varepsilon$. The primal formulation of an SVR problem is the following convex optimization problem:
$$\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \qquad (1)$$
$$\text{s.t.} \quad y_i - w'\varphi(x_i) - b \le \varepsilon + \xi_i, \qquad w'\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \dots, n.$$
Here, $C > 0$ denotes the regularization parameter that governs the trade-off between model flatness and error tolerance. Let $\alpha = (\alpha_1, \dots, \alpha_n)'$ and $\alpha^{*} = (\alpha_1^{*}, \dots, \alpha_n^{*})'$ be the $n$-dimensional vectors of Lagrange multipliers associated with the pair of general constraints in the primal problem. By introducing these multipliers, the dual formulation can be expressed as:
$$\min_{\alpha,\,\alpha^{*}} \ \frac{1}{2}\sum_{i,j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,k(x_i, x_j) + \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) - \sum_{i=1}^{n} y_i\,(\alpha_i - \alpha_i^{*}) \qquad (2)$$
$$\text{s.t.} \quad \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C, \quad i = 1, \dots, n.$$
For numerical implementation, we define:
$$x = \begin{pmatrix} \alpha \\ \alpha^{*} \end{pmatrix}, \qquad D = \begin{pmatrix} K & -K \\ -K & K \end{pmatrix}, \qquad c = \begin{pmatrix} \varepsilon e - y \\ \varepsilon e + y \end{pmatrix}, \qquad a = \begin{pmatrix} e \\ -e \end{pmatrix},$$
where $e$ is a vector of ones of an appropriate dimension. Accordingly, the dual problem can be expressed in matrix form as:
$$\min_{x} \ \frac{1}{2}\, x' D x + c' x \qquad (3)$$
$$\text{s.t.} \quad a' x = 0, \qquad 0 \le x \le C e,$$
where $K = \bigl[k(x_i, x_j)\bigr]_{i,j=1}^{n}$ is the Gram matrix associated with the kernel function. Being a combination of positive semi-definite kernel matrices, $D$ is itself symmetric and positive semi-definite, which ensures the convexity of the dual problem (3).
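To make the matrix form (3) concrete, the following minimal MATLAB sketch assembles $D$, $c$, $a$ and the box bounds from a precomputed Gram matrix; the function name and interface are illustrative, not part of the reference implementation.

```matlab
% Assemble the dual QP (3) from a Gram matrix K (n x n), targets y (n x 1),
% and parameters epsilon and C. Returns the Hessian D, linear term c,
% equality-constraint vector a, and the box bounds.
function [D, c, a, lb, ub] = build_dual_svr(K, y, epsilon, C)
    n  = numel(y);
    e  = ones(n, 1);
    D  = [K, -K; -K, K];                  % symmetric positive semi-definite block matrix
    c  = [epsilon*e - y; epsilon*e + y];  % linear term of the dual objective
    a  = [e; -e];                         % equality constraint a'*x = 0
    lb = zeros(2*n, 1);                   % lower bounds: x >= 0
    ub = C * ones(2*n, 1);                % upper bounds: x <= C
end
```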
As in classical SVM classification, the optimal solution of an SVR model is typically sparse, meaning that the estimated regression function is determined by only a small subset of training examples, known as support vectors. This set is formally defined as
$$SV = \bigl\{\, i \in \{1, \dots, n\} : \alpha_i - \alpha_i^{*} \ne 0 \,\bigr\}.$$
The final regression function can be expressed as follows:
$$f(x) = \sum_{i \in SV} (\alpha_i - \alpha_i^{*})\, k(x_i, x) + b.$$
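As an illustration of this sparse expansion, prediction on new points uses only the support vectors. The sketch below assumes a helper kernel(A, B) returning the Gram matrix between the rows of A and B; the variable names are illustrative.

```matlab
% Predict on new points Xnew using only the support vectors of a trained model.
coef = alpha - alphaStar;              % dual coefficients, one per training point
sv   = find(abs(coef) > 1e-8);         % indices of the support vectors
Ksv  = kernel(X(sv, :), Xnew);         % |SV| x m Gram matrix (assumed helper)
yhat = Ksv' * coef(sv) + b;            % sparse kernel expansion
```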
Since the dual problem (3) is convex, the Karush–Kuhn–Tucker (KKT) conditions [16] are both necessary and sufficient for optimality. In particular, a feasible point $x$ is optimal if and only if it satisfies the following KKT conditions:
$$D x + c + b\, a - \mu + \nu = 0, \qquad \mu' x = 0, \qquad \nu'(Ce - x) = 0, \qquad \mu \ge 0, \ \nu \ge 0, \qquad (4)$$
where $b$ is the multiplier associated with the equality constraint, and $\mu$ and $\nu$ are the Lagrange multipliers associated with the lower and upper box constraints, respectively. This yields:
• If $x_i = 0$, then $(D x + c + b\, a)_i \ge 0$,
• If $x_i = C$, then $(D x + c + b\, a)_i \le 0$,
• If $0 < x_i < C$, then $(D x + c + b\, a)_i = 0$.
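As a sanity check, the conditions above can be verified numerically for any feasible point. The sketch below measures the largest violation of (4), given $D$, $c$, $a$, a candidate multiplier $b$, the bound $C$, and a tolerance; all variable names are illustrative.

```matlab
% Largest KKT violation of (4) at a feasible point x of problem (3).
g    = D*x + c + b*a;              % gradient of the objective plus equality term
atL  = x <= tol;                   % variables at the lower bound (x_i = 0)
atU  = x >= C - tol;               % variables at the upper bound (x_i = C)
free = ~atL & ~atU;                % variables strictly inside the box
kktViol = max([-g(atL); g(atU); abs(g(free)); 0]);   % 0 means (4) holds up to tol
```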
3. Principle of the new method
Unlike conventional strategies based on decomposition or interior point methods, we propose a direct Working Set Algorithm for solving the dual SVR quadratic program (3). The proposed method, termed WSA–SVR, is inspired by the primal simplex framework recently introduced in [4], and has been significantly adapted to address the specific structure and constraints of support vector regression. Our formulation avoids the use of null-space projections and reduced Hessians.
The WSA–SVR algorithm constructs a sequence of support feasible solutions (SFS), each defined by a working basis for which the associated Hessian submatrix is nonsingular. This sequence monotonically reduces the objective function value, with each update respecting both the box and equality constraints. The main process includes:
• Identifying the most violating non-basic variable according to its reduced cost.
• Computing a feasible descent direction by solving a reduced KKT system restricted to the working basis.
• Finding the smallest stepsize $\theta$ that guarantees descent and maintains feasibility.
• Updating the working set accordingly to keep the basis nonsingular.
Notations and Problem Structure
Let $I = \{1, 2, \dots, 2n\}$ denote the full index set for the decision-variable vector $x$. We decompose $I$ into two disjoint subsets: the basic index set $J$ and the non-basic index set $N = I \setminus J$, with $J \cup N = I$ and $J \cap N = \emptyset$. Accordingly, we partition the vector $x$ and other relevant vectors and matrices as follows:
$$x = \begin{pmatrix} x_J \\ x_N \end{pmatrix}, \qquad c = \begin{pmatrix} c_J \\ c_N \end{pmatrix}, \qquad a = \begin{pmatrix} a_J \\ a_N \end{pmatrix}, \qquad D = \begin{pmatrix} D_{JJ} & D_{JN} \\ D_{NJ} & D_{NN} \end{pmatrix},$$
where:
• $x_J$ and $x_N$ denote, respectively, the basic and non-basic variables;
• $y_J$ and $y_N$ are the basic and non-basic components of the responses, respectively;
• $c_J$ and $c_N$ denote the corresponding partitions of the linear term $c$ in the quadratic program;
• $a_J$ and $a_N$ denote the corresponding partitions of the constraint vector $a$;
• $D_{JJ}$ is the principal submatrix of $D$ associated with the basic variables;
• $D_{JN} = D_{NJ}'$ gathers the coupling blocks between basic and non-basic variables;
• $D_{NN}$ is the submatrix for the non-basic variables.
We begin by formalizing the structural components that underpin the WSA–SVR framework.
Definition 1 (Working basis [4]).
A nonempty subset of indices $J \subset I$ is called a working basis (support) for problem (3) if the associated basic matrix $D_{JJ}$ is nonsingular. Its complement is the non-basic set, denoted $N = I \setminus J$.
Definition 2 (Support feasible solution (SFS)).
A pair $\{x, J\}$ is called a support feasible solution (SFS) if the vector $x$ is feasible for the dual problem (3) and $J$ forms a working basis. Furthermore, it is said to be non-degenerate if all the basic variables lie strictly within the bounds: $0 < x_j < C$ for all $j \in J$.
Definition 3 (Reduced costs vector).
Given an SFS $\{x, J\}$ for the QP (3), the reduced-costs vector is defined as
$$d = D x + c + b\, a,$$
where the bias $b$ is chosen such that the basic component satisfies $d_J = 0$.
The optimality conditions (4) are reformulated with respect to the support set $J$ and can be established in a manner analogous to the arguments in [10, 3, 4].
Theorem 4 (Optimality conditions).
Let $\{x, J\}$ be an SFS for the QP (3). Then, the following relations:
$$d_j \ge 0 \ \text{ if } x_j = 0, \qquad d_j \le 0 \ \text{ if } x_j = C, \qquad \text{for all } j \in N, \qquad (5)$$
are sufficient for the optimality of the point $x$ and are also necessary if the SFS is nondegenerate.
Note that the key difference between the optimality conditions stated above and the standard KKT conditions (4) is that our conditions apply only to the non-basic variables, whereas the KKT conditions must hold for all variables. This constitutes a significant simplification of the optimality verification process.
4. Iteration Scheme
The aim of the proposed working set method, named WSA–SVR, is to solve the dual formulation of an $\varepsilon$-SVR problem. WSA–SVR generates a sequence of support feasible solutions that converges globally to an optimal solution of the QP (3). As with any optimization method, the WSA–SVR algorithm needs an initial feasible solution to start the resolution process. In our case, this solution can be obtained simply by choosing an initial working basis $J_0$ whose basic matrix $D_{J_0 J_0}$ is nonsingular, together with the feasible point $x^{0} = 0$. Thus, the corresponding bias and reduced-costs vector are, respectively, given by:
$$d^{0} = c + b^{0} a, \qquad \text{with } b^{0} \text{ chosen so that } d^{0}_{J_0} = 0. \qquad (6)$$
An iteration of the WSA-SVR algorithm consists in computing a new solution $\bar{x}$ from the current iterate $x$ using the update formula:
$$\bar{x} = x + \theta\, s, \qquad (7)$$
where $s$ is the search direction and $\theta$ is the stepsize. The associated bias and reduced-costs vector are updated as follows:
$$\bar{b} = b + \theta\,\beta, \qquad \bar{d} = d + \theta\, u, \qquad (8)$$
where $\beta$ is the scalar specified in (13) below, and the vector $u$ is defined by:
$$u = D s + \beta\, a. \qquad (9)$$
The first step of the WSA–SVR algorithm is to test the optimality of the current SFS $\{x, J\}$. If the optimality relations (5) of Theorem 4 are satisfied, then $x$ is an optimal solution of the problem and the algorithm terminates. Otherwise, the algorithm selects the non-basic index that exhibits the largest violation:
$$j_k \in \arg\max_{j \in N_h} |d_j|, \qquad (10)$$
where $N_h \subseteq N$ denotes the subset of non-basic indices for which the optimality conditions (5) are not satisfied.
Selecting $j_k$ thus fulfills two roles: it detects the most critical violation of the KKT conditions and specifies the entering variable used to construct the descent direction.
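A direct way to implement rule (10) is sketched below, with illustrative variable names ($x$, $d$, the non-basic index set $N$, bound $C$, tolerance tol); it scans the non-basic variables that violate (5) and keeps the one with the largest reduced cost in absolute value.

```matlab
% Selection of the entering index j_k according to rule (10).
violLow  = (x(N) <= tol)     & (d(N) < -tol);   % x_j = 0 but d_j < 0
violUp   = (x(N) >= C - tol) & (d(N) >  tol);   % x_j = C but d_j > 0
cand     = N(violLow | violUp);                 % non-basic indices violating (5)
[~, pos] = max(abs(d(cand)));
jk       = cand(pos);                           % most violating (entering) index
```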
4.1. Search direction
Let $j_k$ be the non-basic index determined by rule (10). The search direction $s$ is then constructed as follows:
$$s_j = 0 \ \text{ for all } j \in N \setminus \{j_k\}, \qquad s_{j_k} = -\operatorname{sign}(d_{j_k}). \qquad (11)$$
This defines the non-basic component $s_N$.
The basic component $s_J$ is then computed to satisfy the feasibility and stationarity conditions. Its explicit expression is given by:
$$s_J = -D_{JJ}^{-1}\left( D_{J j_k}\, s_{j_k} + \beta\, a_J \right), \qquad (12)$$
where the scalar $\beta$ is computed to enforce the equality constraint $a' s = 0$. Its value is:
$$\beta = \frac{\left( a_{j_k} - a_J' D_{JJ}^{-1} D_{J j_k} \right) s_{j_k}}{a_J' D_{JJ}^{-1} a_J}. \qquad (13)$$
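In practice, the two solves with $D_{JJ}$ in (12)–(13) can share a single Cholesky factor. The sketch below assumes the reconstruction above and an upper-triangular factor $R$ with $R'R = D_{JJ}$; all names are illustrative.

```matlab
% Direction computation for (12)-(13) reusing the Cholesky factor R of D(J,J).
r1   = R \ (R' \ D(J, jk));                     % D_JJ^{-1} * D_{J,jk}
r2   = R \ (R' \ a(J));                         % D_JJ^{-1} * a_J
sjk  = -sign(d(jk));                            % entering component, see (11)
beta = (a(jk) - a(J)'*r1) * sjk / (a(J)'*r2);   % scalar beta of (13)
sJ   = -(r1*sjk + r2*beta);                     % basic component s_J of (12)
```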
4.2. Stepsize selection
The stepsize $\theta$ is chosen to ensure that the updated solution $\bar{x} = x + \theta s$ remains feasible and that the value of the objective function decreases. Thus, it is defined as follows:
$$\theta = \min\{\theta_{j_k},\ \theta_{J},\ \theta^{*}\}, \qquad (14)$$
where each term corresponds respectively to:
- $\theta_{j_k}$: the maximum feasible step allowed by the non-basic variable $x_{j_k}$,
- $\theta_{J}$: the largest permissible step before any basic variable violates its bounds,
- $\theta^{*}$: the stepsize that minimizes the objective function along the non-basic component $s_{j_k}$.
To ensure feasibility, the new solution must satisfy the following box constraints:
$$0 \le x_j + \theta\, s_j \le C, \qquad j \in \{j_k\} \cup J,$$
where $j_k$ denotes the entering non-basic index and $J$ is the set of basic indices.
Calculation of $\theta_{j_k}$ and $\theta_{J}$
The maximum feasible step for the non-basic index $j_k$ is computed as:
$$\theta_{j_k} = \begin{cases} C - x_{j_k}, & \text{if } s_{j_k} = +1, \\ x_{j_k}, & \text{if } s_{j_k} = -1. \end{cases} \qquad (15)$$
For a basic index $j \in J$, the corresponding step is:
$$\theta_j = \begin{cases} \dfrac{C - x_j}{s_j}, & \text{if } s_j > 0, \\[1ex] \dfrac{-x_j}{s_j}, & \text{if } s_j < 0, \\[1ex] +\infty, & \text{if } s_j = 0, \end{cases} \qquad \text{and} \qquad \theta_J = \min_{j \in J} \theta_j. \qquad (16)$$
Calculation of $\theta^{*}$:
The step size $\theta^{*}$ is computed to achieve the maximal reduction in the objective function when moving along the direction $s$. Using the exact second-order Taylor expansion of $f$ at the point $x$, we obtain:
$$f(x + \theta s) = f(x) + \theta\, (D x + c)' s + \frac{\theta^{2}}{2}\, s' D s.$$
By substituting the expressions of $s_N$ and $s_J$, and using the feasibility condition $a' s = 0$, the expression simplifies to:
$$f(x + \theta s) - f(x) = \theta\, d_{j_k} s_{j_k} + \frac{\theta^{2}}{2}\, s' D s. \qquad (17)$$
The optimal value $\theta^{*}$ is obtained by solving:
$$\frac{d}{d\theta}\, f(x + \theta s) = d_{j_k} s_{j_k} + \theta\, s' D s = 0.$$
Assuming $s' D s > 0$ and noting that only the non-basic component of $u$ contributes ($s' D s = s_{j_k} u_{j_k}$), we obtain the final expression:
$$\theta^{*} = \frac{-\, d_{j_k}\, s_{j_k}}{s' D s} = \frac{-\, d_{j_k}}{u_{j_k}}. \qquad (18)$$
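Combining the three candidate steps, a compact implementation of (14)–(18) could look as follows; the variables $x$, $d$, $s_J$, $s_{j_k}$, $u$, the index sets and the tolerance are assumed to be available as in the previous sketches, and the formulas used are the reconstructions above.

```matlab
% Stepsize selection (14)-(18).
thJk = (sjk > 0)*(C - x(jk)) + (sjk < 0)*x(jk);   % (15)
xJ = x(J); xJ = xJ(:);                            % basic variables as a column
ratios = inf(numel(J), 1);                        % (16): per-basic-index steps
pos = sJ >  tol;  neg = sJ < -tol;
ratios(pos) = (C - xJ(pos)) ./ sJ(pos);
ratios(neg) = -xJ(neg)      ./ sJ(neg);
thJ = min([ratios; inf]);
if sjk*u(jk) > tol                                % s'*D*s = s_jk * u_jk > 0
    thStar = -d(jk) / u(jk);                      % (18)
else
    thStar = inf;
end
theta = min([thJk, thJ, thStar]);                 % (14)
```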
4.3. Update the working-basis
After determining the new feasible solution $\bar{x}$, the support set is updated according to which constraint has become active.
• Case $\theta = \theta_{j_k}$: The non-basic variable $x_{j_k}$ reaches a bound ($0$ or $C$).
No change is made to the basis: $J \leftarrow J$, $N \leftarrow N$.
• Case $\theta = \theta_{J}$: A basic variable $x_{l}$, with $l$ attaining the minimum in (16), reaches a bound.
Two subcases arise:
– If removing $l$ leaves a valid (nonempty and nonsingular) working basis, remove $l$ from the basis: $J \leftarrow J \setminus \{l\}$, $N \leftarrow N \cup \{l\}$;
– If not, replace it with $j_k$: $J \leftarrow (J \setminus \{l\}) \cup \{j_k\}$, $N \leftarrow (N \setminus \{j_k\}) \cup \{l\}$.
• Case $\theta = \theta^{*}$: The non-basic index $j_k$ becomes basic: $J \leftarrow J \cup \{j_k\}$, $N \leftarrow N \setminus \{j_k\}$.
After this step, the WSA-SVR algorithm proceeds to the next iteration until convergence to the optimal solution.
4.4. Proposed WSA-SVR algorithm
The main steps of the proposed WSA–SVR algorithm are summarized in the following pseudocode.
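A compact MATLAB-style sketch of one possible implementation of the main loop is given below; it follows relations (6)–(18) as reconstructed above, uses illustrative names, and simplifies the basis-exchange rules of Section 4.3 (the basis only grows when $\theta = \theta^{*}$).

```matlab
function [x, b] = wsa_svr_sketch(D, c, a, C, tol, maxIter)
% Sketch of the WSA-SVR main loop on problem (3):
% min 0.5*x'*D*x + c'*x  s.t.  a'*x = 0,  0 <= x <= C.
    m = numel(c);
    x = zeros(m, 1);                    % feasible starting point
    J = 1;                              % initial working basis (single index)
    b = -c(J) / a(J);                   % bias such that d(J) = 0
    d = c + b*a;                        % reduced costs at x = 0
    for it = 1:maxIter
        N = setdiff((1:m)', J);         % non-basic indices (column vector)
        viol = N((x(N) <= tol & d(N) < -tol) | (x(N) >= C - tol & d(N) > tol));
        if isempty(viol), break; end    % optimality conditions (5) hold
        [~, kk] = max(abs(d(viol)));
        jk  = viol(kk);                 % entering index, rule (10)
        sjk = -sign(d(jk));             % non-basic component, (11)
        r1 = D(J, J) \ D(J, jk);
        r2 = D(J, J) \ a(J);
        beta = (a(jk) - a(J)'*r1) * sjk / (a(J)'*r2);   % (13)
        sJ = -(r1*sjk + r2*beta);                        % (12)
        s = zeros(m, 1);  s(J) = sJ;  s(jk) = sjk;
        u = D*s + beta*a;                                % (9)
        thJk = (sjk > 0)*(C - x(jk)) + (sjk < 0)*x(jk);  % (15)
        ratios = inf(numel(J), 1);                       % (16)
        pos = sJ > tol;  neg = sJ < -tol;  xJ = x(J);
        ratios(pos) = (C - xJ(pos)) ./ sJ(pos);
        ratios(neg) = -xJ(neg) ./ sJ(neg);
        thJ = min([ratios; inf]);
        thStar = inf;
        if sjk*u(jk) > tol, thStar = -d(jk)/u(jk); end   % (18)
        theta = min([thJk, thJ, thStar]);                % (14)
        x = x + theta*s;  b = b + theta*beta;  d = d + theta*u;   % (7)-(8)
        if theta == thStar, J = [J; jk]; end             % j_k becomes basic
    end
end
```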
5. Convergence Analysis and Computational Complexity
Below, we present a proof establishing the finite convergence of WSA–SVR and analyze its computational complexity. The argument follows the same logic as the analysis given for primal simplex-type schemes (see, in particular, [4] and [10]).
We begin with the following proposition, which demonstrates that the direction $s$ is a feasible descent direction for the dual formulation (3).
Proposition 5.
At an iteration of the WSA–SVR method, let $s$ be the search direction computed from relations (11) and (12), and let $\theta$ be the stepsize computed by formula (14). Then, the following statements hold:
1) The $2n$-vector $s$ is a feasible descent direction at $x$.
2) The new iterate $\bar{x} = x + \theta s$ satisfies:
$$f(\bar{x}) \le f(x), \qquad (19)$$
with strict inequality whenever $\theta > 0$.
Proof.
1) For the first part, it is clear that $s$ is a feasible direction. Indeed, $s$ is constructed such that the equality and box constraints remain satisfied, i.e., $a'(x + \theta s) = 0$ and $0 \le x + \theta s \le Ce$ for sufficiently small $\theta > 0$. Moreover, by construction of the optimal step size $\theta$ via Eq. (14), $\bar{x} = x + \theta s$ satisfies the box constraints in (3). Hence, $s$ is a feasible direction. The descent property follows from (17): since $d_{j_k} s_{j_k} = -|d_{j_k}| < 0$, the objective function decreases for every sufficiently small $\theta > 0$.
Now, we establish the finite termination of the proposed WSA–SVR algorithm in the following theorem.
Theorem 6.
The proposed WSA–SVR algorithm generates a sequence of support feasible solutions to problem (3) and terminates in a finite number of iterations at an optimal solution.
Proof.
The dual SVR problem (3) is bounded below, so the sequence of feasible iterates generated by WSA–SVR remains bounded. Each update has the form
$$x^{k+1} = x^{k} + \theta_k\, s^{k},$$
where $s^{k}$ is a feasible search direction and $\theta_k$ is the maximal step length defined previously.
By Proposition 5, if the current basic feasible solution is non-degenerate, then $\theta_k > 0$ and the objective function decreases strictly: $f(x^{k+1}) < f(x^{k})$.
In the degenerate case, $\theta_k = 0$, so the objective value does not decrease, but the working basis is updated (see Section 4.3). To avoid cycling, one may invoke a standard anti-cycling rule such as Bland's rule (see [10, 16]) or rely on simplex-type convergence arguments as in [4], which guarantee progress in the sequence of working sets.
Since the number of possible working bases is finite, the algorithm cannot generate an infinite sequence of distinct supports without either decreasing the objective function or revisiting a previous working basis. Consequently, WSA–SVR produces a sequence of iterates that is strictly decreasing in the objective value, except possibly for finitely many degenerate steps. Termination therefore occurs in a finite number of iterations, precisely when no entering variable can be found, i.e., when the KKT conditions of (3) are satisfied and an optimal solution is reached. ∎
In this part, we analyze the computational complexity of our WSA–SVR algorithm. Let $n$ denote the number of samples and $|J_k|$ the size of the working basis at a typical iteration $k$. The dominant per-iteration cost arises from computing the search direction $s_J$ and the update vector $u$, defined in (12) and (9), respectively.
Computing $s_J$ requires solving a linear system with the basis matrix $D_{JJ}$. This is typically performed using the Cholesky factorization, which incurs a cost of $O(|J_k|^{3})$ if recomputed from scratch. However, when employing rank-one updates of the Cholesky factors, this cost can be reduced to $O(|J_k|^{2})$, as shown in [4]. Step-size computation and primal–dual updates involve comparatively negligible costs of $O(|J_k|)$ and $O(n)$, respectively. In contrast, evaluating the non-basic reduced-costs component $u_N$ requires matrix–vector multiplications with a complexity of $O(n\,|J_k|)$.
Hence, the overall per-iteration complexity is bounded by
$$O\!\left( n\,|J_k| + |J_k|^{2} \right).$$
This demonstrates that the WSA–SVR algorithm achieves polynomial-time complexity with respect to the size of the training set, in addition to ensuring finite convergence.
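As an illustration of the $O(|J|^{2})$ update cost mentioned above, one possible way to extend an upper-triangular Cholesky factor when an index joins the working basis is sketched below; this is a generic bordering update, not necessarily the exact scheme of [4].

```matlab
% Extend R (upper triangular, R'*R = D(J,J)) after appending index j to J.
r   = R' \ D(J, j);                    % triangular solve, O(|J|^2)
rho = sqrt(D(j, j) - r'*r);            % new diagonal entry (real if D(J,J) stays PD)
R   = [R, r; zeros(1, numel(J)), rho]; % factor of the enlarged basic matrix
J   = [J; j];                          % enlarged working basis
```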
6. Experimental Results
In this section, the proposed WSA-SVR method is empirically evaluated and compared against two widely used SVR solvers: LIBSVM and SMO-MAT. LIBSVM [5] is a standard library for support vector machines, extensively optimized for both classification and regression tasks; it is coded in C++ and widely regarded as the reference implementation for SVR. SMO-MAT refers to MATLAB's built-in SVR solver, implemented through the fitrsvm function, which relies on a variant of Platt's SMO algorithm. The fitrsvm interface offers efficient routines for both linear and nonlinear kernels and is fully integrated within MATLAB's machine learning toolbox. The experiments were conducted with both RBF and linear kernels on five benchmark datasets: Mpg, Housing, Mg, Space_ga, and Abalone.
All experiments were run on a personal computer, without GPU acceleration. The implementation of WSA-SVR, along with LIBSVM (via the MATLAB interface) and SMO-MAT, was carried out in MATLAB R2025a.
Prior to training, all datasets were standardized to zero mean and unit variance. This preprocessing step ensures that each feature contributes equally to the training process, thereby improving both convergence and numerical stability of the SVR algorithms.
To ensure a fair comparison, five-fold cross-validation was used to select the hyper-parameters for each dataset. For models using the RBF kernel, a grid search was conducted over the regularization parameter $C$ and the kernel width $\gamma$. For the linear kernel, only $C$ was selected from a grid of candidate values.
The RBF (Gaussian) kernel is defined as:
$$k(x_i, x_j) = \exp\!\left( -\gamma\, \|x_i - x_j\|^{2} \right), \qquad (20)$$
where $\gamma > 0$ is the kernel width parameter. The linear kernel corresponds to the inner product:
$$k(x_i, x_j) = x_i' x_j. \qquad (21)$$
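Both kernels can be evaluated for a whole dataset with a few vectorized operations; the sketch below is one standard way to do this in MATLAB for a data matrix X of size n-by-d (gamma assumed given).

```matlab
% Gram matrices for the kernels (20) and (21).
sq   = sum(X.^2, 2);                          % squared row norms, n x 1
Krbf = exp(-gamma * (sq + sq' - 2*(X*X')));   % RBF kernel (20)
Klin = X * X';                                % linear kernel (21)
```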
The optimal hyper-parameters obtained for all datasets through five-fold cross-validation are reported in Table 1 for both kernels. These configurations were subsequently used for training and evaluation across all SVR methods to guarantee a consistent and unbiased comparison between solvers.
Table 1: Optimal hyper-parameters selected by five-fold cross-validation.

| Dataset | Inputs | Samples | RBF kernel: $C$ | RBF kernel: $\gamma$ | Linear kernel: $C$ |
|---|---|---|---|---|---|
| Mpg | 7 | 392 | 64 | 0.125 | 16 |
| Housing | 13 | 506 | 64 | 0.0625 | 4 |
| Mg | 6 | 1385 | 1 | 0.5 | 4 |
| Space_ga | 6 | 3107 | 128 | 0.125 | 16 |
| Abalone | 8 | 4177 | 16 | 0.0625 | 64 |
The performance of the models was evaluated using several criteria: the number of iterations until convergence, training time (in seconds), the number of support vectors (NSV), and two regression accuracy metrics: the mean squared error (MSE) and the coefficient of determination $R^{2}$.
The MSE is defined as:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left( y_i - \hat{y}_i \right)^{2}, \qquad (22)$$
where $y_i$ is the true output, $\hat{y}_i$ the predicted output, and $m$ the number of samples.
The coefficient of determination is given by:
$$R^{2} = 1 - \frac{\sum_{i=1}^{m}\left( y_i - \hat{y}_i \right)^{2}}{\sum_{i=1}^{m}\left( y_i - \bar{y} \right)^{2}}, \qquad (23)$$
where $\bar{y}$ is the mean of the observed data:
$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i. \qquad (24)$$
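Both metrics are straightforward to compute from the vectors of true and predicted outputs, for example:

```matlab
% MSE (22) and coefficient of determination (23)-(24) for column vectors y, yhat.
res = y - yhat;
mse = mean(res.^2);
r2  = 1 - sum(res.^2) / sum((y - mean(y)).^2);
```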
Table 2 and Table 3 present numerical results that show the effectiveness and robustness of the proposed WSA–SVR algorithm across diverse benchmark datasets and kernel types. WSA–SVR shows good convergence behavior and attains predictive accuracy equal to, and sometimes greater than, that of state-of-the-art solvers including SMO-MAT and LIBSVM.
Table 2: Results obtained with the RBF kernel.

| Dataset | Method | Iterations | Time (s) | NSV | MSE | $R^{2}$ |
|---|---|---|---|---|---|---|
| Mpg | WSA-SVR | 586 | 0.0966 | 375 | 4.4902 | 0.9261 |
| | SMO-MAT | 3305 | 0.0229 | 377 | 4.4956 | 0.9260 |
| | LIBSVM | 6120 | 0.0378 | 375 | 4.4903 | 0.9263 |
| Housing | WSA-SVR | 887 | 0.2241 | 480 | 4.7924 | 0.9432 |
| | SMO-MAT | 4287 | 0.0384 | 486 | 4.7909 | 0.9432 |
| | LIBSVM | 7328 | 0.0658 | 480 | 4.7924 | 0.9445 |
| Mg | WSA-SVR | 841 | 0.3932 | 474 | 0.0118 | 0.7796 |
| | SMO-MAT | 1960 | 0.0445 | 493 | 0.0118 | 0.7795 |
| | LIBSVM | 2258 | 0.3581 | 479 | 0.0118 | 0.7806 |
| Space-ga | WSA-SVR | 1529 | 3.3463 | 831 | 0.0089 | 0.7877 |
| | SMO-MAT | 4897 | 0.2308 | 871 | 0.0089 | 0.7866 |
| | LIBSVM | 4388 | 1.7257 | 842 | 0.0089 | 0.7881 |
| Abalone | WSA-SVR | 3351 | 3.0348 | 3914 | 4.4024 | 0.5864 |
| | SMO-MAT | 2391 | 1.1571 | 3899 | 4.4024 | 0.5863 |
| | LIBSVM | 3319 | 3.2790 | 3917 | 4.4024 | 0.5965 |
Table 3: Results obtained with the linear kernel.

| Dataset | Method | Iterations | Time (s) | NSV | MSE | $R^{2}$ |
|---|---|---|---|---|---|---|
| Mpg | WSA-SVR | 1238 | 0.2642 | 389 | 11.5610 | 0.8097 |
| | SMO-MAT | 4146 | 0.3912 | 392 | 11.4183 | 0.8121 |
| | LIBSVM | 20920 | 0.0388 | 389 | 11.5606 | 0.8130 |
| Housing | WSA-SVR | 1136 | 0.1040 | 506 | 24.6955 | 0.7075 |
| | SMO-MAT | 2495 | 0.1718 | 506 | 24.6734 | 0.7077 |
| | LIBSVM | 10449 | 0.0224 | 506 | 24.6852 | 0.7174 |
| Mg | WSA-SVR | 9907 | 1.2648 | 1371 | 0.0219 | 0.5732 |
| | SMO-MAT | 54324 | 1.5300 | 1385 | 0.0218 | 0.5740 |
| | LIBSVM | 235970 | 0.4469 | 1374 | 0.0219 | 0.5767 |
| Space-ga | WSA-SVR | 15235 | 4.2888 | 3084 | 0.0164 | 0.5818 |
| | SMO-MAT | 719847 | 47.2932 | 3107 | 0.0558 | 0.5820 |
| | LIBSVM | 2604864 | 9.6216 | 3094 | 0.0164 | 0.5823 |
| Abalone | WSA-SVR | 34214 | 13.2915 | 4155 | 5.0608 | 0.5149 |
| | SMO-MAT | 276643 | 17.1626 | 4177 | 5.2831 | 0.4917 |
| | LIBSVM | 2785316 | 7.8118 | 4158 | 5.0485 | 0.5320 |
In terms of convergence, WSA–SVR generally required fewer iterations than its competitors to compute the optimal solution. For example, on the Space-ga dataset with an RBF kernel ($C = 128$, $\gamma = 0.125$), WSA–SVR converged in 1529 iterations (see Table 2), whereas SMO-MAT and LIBSVM needed 4897 and 4388 iterations, respectively. Similar trends were observed with the linear kernel, where WSA–SVR consistently and stably converged. For instance, on the Mg dataset with the linear kernel ($C = 4$), WSA–SVR achieved similar prediction accuracy to the other methods while requiring fewer iterations (see Table 3).
The principal reason why WSA-SVR typically converges in fewer iterations than SMO-type methods lies in the nature of the subproblems solved at each step. For an SVR problem, SMO updates the dual variables of two selected examples per iteration, which amounts to solving analytically a four-dimensional QP, without the need for an external solver. In contrast, our approach solves at each iteration a QP whose dimension is equal to the size of the working basis $|J|$.
Concerning execution time, WSA–SVR demonstrates clear efficiency gains on most datasets when using the linear kernel, consistently outperforming or matching SMO-MAT and LIBSVM. This advantage is due to its strong convergence properties combined with the relatively low computational cost of kernel operations in the linear case. In contrast, for the RBF kernel, WSA–SVR’s runtime is sometimes higher despite requiring fewer iterations. For example, on the Space-ga dataset, WSA–SVR’s runtime reached 3.3463 s, compared to 0.2308 s for SMO-MAT and 1.7257 s for LIBSVM. This can be explained not only by the higher per–iteration cost when the size of the working-basis is large, but also by the way the methods are implemented: WSA–SVR is currently coded in Matlab (a high-level language), while SMO-MAT and LIBSVM are implemented in optimized C++ (a low-level language).
Concerning predictive performance, WSA–SVR achieved results comparable to those of the two SMO-based solvers in all experiments. This can be explained by the fact that the compared algorithms are equally accurate and yield a comparable number of support vectors. For instance, on the Mg dataset with an RBF kernel ($C = 1$), WSA–SVR obtained an $R^{2}$ score of 0.7796 and an MSE of 0.0118 (see Table 2), which are nearly identical to LIBSVM's values of $R^{2} = 0.7806$ and MSE = 0.0118. These findings indicate that WSA–SVR provides a favorable balance between accuracy and computational efficiency for both RBF and linear kernels.
Another significant benefit is the compactness of the models that WSA–SVR generates. In many of the regression tasks considered, the algorithm selected a comparable or slightly smaller number of support vectors than LIBSVM and SMO-MAT, resulting in models that are simpler to interpret and more practical to deploy.
7. Conclusion
In this paper, we presented WSA–SVR, a new algorithm for solving the dual form of the $\varepsilon$-SVR problem. Our method extends the primal simplex approach [4] to SVR by taking into account the specific structure of the problem, especially the paired variables and constraints. The WSA–SVR algorithm proceeds iteratively by generating a sequence of feasible solutions that converges to an optimal solution. Unlike existing methods, it does not depend on reduced Hessian matrices and can directly handle positive semidefinite kernels. We also proved that the algorithm converges in a finite number of steps and analyzed its computational complexity.
Numerical experiments on several benchmark datasets demonstrated both the effectiveness and robustness of the proposed algorithm. From a predictive perspective, WSA–SVR achieves accuracy comparable to state-of-the-art solvers, such as SMO-based methods [18, 9] and LIBSVM [5]. From an optimization viewpoint, WSA–SVR typically converges in fewer iterations than SMO-type algorithms. However, as highlighted in Section 5, the per-iteration computational cost depends strongly on the kernel structure: for dense RBF kernels, solving linear systems with a large basis matrix increases runtime, whereas for linear kernels, where matrix–vector operations are inexpensive and the basis size is bounded by the feature dimension, WSA–SVR exhibits superior performance in both convergence speed and CPU time.
Overall, WSA–SVR achieves an effective trade-off between predictive accuracy, convergence guarantees, and computational efficiency, offering a principled alternative to existing SMO-type algorithms. Future work will focus on developing optimized implementations (e.g., leveraging sparse linear algebra and low-rank kernel approximations) and extending the approach to large-scale learning tasks and other kernel-based models.
Acknowledgements.
We would like to thank the editor and reviewers for their valuable and constructive comments, which helped us to improve this paper.
References
- [1] C. C. Aggarwal, Machine Learning for Text, Springer International Publishing, (2018).
- [2] M. Awad and R. Khanna, Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers, Apress, Berkeley, CA, (2015). https://doi.org/10.1007/978-1-4302-5990-9
- [3] B. Brahmi and M. O. Bibi, Dual Support Method for Solving Convex Quadratic Programs, Optimization, 59(6) (2010), pp. 851–872.
- [4] B. Brahmi, An Efficient Primal Simplex Method for Solving Large-Scale Support Vector Machines, Neurocomputing, 599 (2024), Art. 128109. https://doi.org/10.1016/j.neucom.2024.128109
- [5] C. C. Chang and C. J. Lin, LIBSVM: A Library for Support Vector Machines, ACM Transactions on Intelligent Systems and Technology, 2(3) (2011), pp. 1–27. https://doi.org/10.1145/1961189.1961199
- [6] R. F. Chevalier, G. Hoogenboom, R. W. McClendon and J. A. Paz, Support vector regression with reduced training sets for air temperature prediction: a comparison with artificial neural networks, Neural Comput. & Applic., 20 (2011), pp. 151–159. https://doi.org/10.1007/s00521-010-0363-y
- [7] K. L. Du and M. N. S. Swamy, Support Vector Machines, in: Neural Networks and Statistical Learning, Springer, London, (2014), pp. 469–524. https://doi.org/10.1007/978-1-4471-5571-3_16
- [8] D. Dua and C. Graff, UCI Machine Learning Repository, University of California, Irvine, (2017). Available at: http://archive.ics.uci.edu/ml.
- [9] G. W. Flake and S. Lawrence, Efficient SVM Regression Training with SMO, Machine Learning, 46(1–3) (2002), pp. 271–290. https://doi.org/10.1023/A:1012474916001
- [10] R. Gabasov, F. M. Kirillova, V. M. Raketsky and O. Kostyukova, Constructive Methods of Optimization, Volume 4: Convex Problems, University Press, Minsk, (1987) (in Russian).
- [11] C. H. Ho and C. J. Lin, Large-scale Linear Support Vector Regression, Journal of Machine Learning Research, 13(1) (2012), pp. 3323–3348.
- [12] Y. J. Lee, W. F. Hsieh and C. M. Huang, ε-SSVR: a smooth support vector machine for ε-insensitive regression, IEEE Transactions on Knowledge and Data Engineering, 17(5) (2005), pp. 678–685. https://doi.org/10.1109/tkde.2005.77
- [13] J. Ma, J. Theiler and S. Perkins, Accurate On-line Support Vector Regression, Neural Computation, 15(11) (2003), pp. 2683–2703. https://doi.org/10.1162/089976603322385117
- [14] D. R. Musicant and A. Feinberg, Active Set Support Vector Regression, IEEE Transactions on Neural Networks, 15(2) (2004), pp. 268–275. https://doi.org/10.1109/tnn.2004.824259
- [15] M. Najafzadeh and S. Niazmardi, A Novel Multiple-Kernel Support Vector Regression Algorithm for Estimation of Water Quality Parameters, Natural Resources Research, 30 (2021), pp. 3761–3775. https://doi.org/10.1007/s11053-021-09895-5
- [16] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, New York, NY, (2006). https://doi.org/10.1007/978-0-387-40065-5
- [17] X. Peng and D. Xu, Projection support vector regression algorithms for data regression, Knowledge-Based Systems, 112 (2016), pp. 54–66. https://doi.org/10.1016/j.knosys.2016.08.030
- [18] J. C. Platt, Fast Training of Support Vector Machines Using Sequential Minimal Optimization, in: B. Schölkopf, C. J. Burges and A. J. Smola (eds.), Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, (1998), pp. 185–208. https://doi.org/10.7551/mitpress/1130.003.0016
- [19] A. Rahimi and B. Recht, Random Features for Large-Scale Kernel Machines, in: Advances in Neural Information Processing Systems (NIPS), (2007), pp. 1177–1184.
- [20] P. R. Perea and J. C. Ruiz, An Algorithm for Training a Large Scale Support Vector Machine for Regression Based on Linear Programming and Decomposition Methods, Pattern Recognition Letters, 34(4) (2013), pp. 439–451. https://doi.org/10.1016/j.patrec.2012.10.026
- [21] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, (2002).
- [22] A. J. Smola and B. Schölkopf, A Tutorial on Support Vector Regression, Statistics and Computing, 14(3) (2004), pp. 199–222. https://doi.org/10.1023/b:stco.0000035301.49549.88
- [23] G. Steidl, Supervised Learning by Support Vector Machines, in: O. Scherzer (ed.), Handbook of Mathematical Methods in Imaging, Springer, New York, (2015), pp. 1393–1453. https://doi.org/10.1007/978-1-4939-0790-8_22
- [24] F. E. H. Tay and L. Cao, Application of Support Vector Machines in Financial Time Series Forecasting, Omega, 29(4) (2001), pp. 309–317. https://doi.org/10.1016/S0305-0483(01)00026-3
- [25] Y. Torii and S. Abe, Decomposition Techniques for Training Linear Programming Support Vector Machines, Neurocomputing, 72(4–6) (2009), pp. 973–984. https://doi.org/10.1016/j.neucom.2008.04.008
- [26] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, (1995).
- [27] M. Vogt and V. Kecman, Active-Set Methods for Support Vector Machines, in: L. Wang (ed.), Support Vector Machines: Theory and Applications, Springer, (2005), pp. 133–158. https://doi.org/10.1007/10984697_6
- [28] F. Zhang and L. J. O'Donnell, Support Vector Regression, in: A. Mechelli and S. Vieira (eds.), Machine Learning, Academic Press, (2020), pp. 123–140. http://dx.doi.org/10.1016/B978-0-12-815739-8.00007-9
- [29] Y. Zhao and J. Sun, A fast method to approximately train hard support vector regression, Neural Networks, 23(10) (2010), pp. 1276–1285. https://doi.org/10.1016/j.neunet.2010.08.001