An Exact Penalty Approach for Equality Constrained
Optimization over a Convex Set
Nachuan Xiao
MOS 2025
May 18, 2025
School of Data Science
The Chinese University of Hong Kong, Shenzhen
China
Joint work with Tianyun Tang (NUS), Shiwei Wang (NUS), and Kim-Chuan Toh (NUS)
Basic Formulation
   General Constrained Optimization
min_{x ∈ Rn}  f(x)   s. t.   c(x) = 0,   x ∈ X.        (NCP)
     • X : closed convex subset of Rn , E := aff(X ) − X .
  • Locally Lipschitz smooth f : Rn → R and c : Rn → Rp.
  • For any given x ∈ K, there exist r > 0 (independent of x) and τx > 0 such that
      • dim({∇c(y)ᵀd : d ∈ E}) = r holds for any y ∈ {y ∈ Y ∩ aff(X) : ∥y − x∥ ≤ τx};
      • dim({∇c(y)ᵀd : d ∈ TX(y) ∩ −TX(y)}) = r holds for any y ∈ {y ∈ X : ∥y − x∥ ≤ τx}.
     • Generalization of constraint nondegeneracy.
     • Generality: u( x ) ∈ Z ⇔ u( x ) − s = 0, s ∈ Z .
     • Nonconvex feasible region K := { x ∈ X : c( x ) = 0} & No Riemannian structure available.
On Constraint Qualifications for (NCP)
   CQs
                                                         
  • Nondegeneracy: dim({∇c(y)ᵀd : d ∈ TX(y) ∩ −TX(y)}) = p.
  • Robinson CQ: dim({∇c(y)ᵀd : d ∈ TX(y)}) = p.
  • Relaxed constant rank CQ (rCRCQ): dim({∇c(y)ᵀd : d ∈ TX(y) ∩ −TX(y)}) = r.
     • Nondegeneracy implies Robinson CQ and rCRCQ.
   Example on the relationship between Robinson CQ and rCRCQ
     • Consider the constraints x1 + x2 = 0, 2x1 + 2x2 = 0, x1 ≥ 0.
  • rCRCQ holds: dim({(∇c(y)ᵀd1, d1 + d2) : d1 ∈ Rn, d2 ∈ TX(y) ∩ −TX(y)}) = 3.
     • Robinson CQ fails.
 ⇒ rCRCQ is neither weaker nor stronger than the Robinson CQ.
  • rCRCQ holds for c(x) = u, x ∈ X, for almost every u ∈ Rp (Drusvyatskiy et al., 2016).
Optimization Approaches for Constrained Optimization
   Augmented Lagrangian method
  • Augmented Lagrangian penalty function: L(x, λ) := f(x) − ⟨λ, c(x)⟩ + (β/2)∥c(x)∥².
     • Augmented Lagrangian method:
                        xk+1 ≈ arg min_{x∈X} L(x, λk),     λk+1 = λk + ρk c(xk+1).
     • Complex schemes for dual descent & stopping criteria for subproblems.
   Interior-point method
     • B( x ): Barrier function for X
     • Interior-point method:
                        xk+1 ≈ arg min_{x∈X} f(x) + εk B(x),     εk → 0.
     • Challenges to design schemes for adjusting ε k & stopping criteria.
   Optimization over stratifiable sets
   Olikier et al. 2023; Tang-Toh 2024; Yang-Gao-Yuan 2025; ...
  • Require detailed characterization of K ⇒ challenging to implement for general constraints.
   Approaches for optimization over X  --(embedded as subroutines)-->  Optimization approaches for (NCP)
   Convergence guarantees on optimization over X  --(existing works: case-by-case analysis of multi-level schemes)-->  Convergence guarantees on (NCP)
Could approaches for optimization over X have a straightforward implementation for
the general constrained optimization problem (NCP)?
Penalty Function Approaches
   Replaces constrained optimization problems by problems over X
     • Quadratic penalty function: f ( x ) + β ∥c( x )∥2
       Smooth but inexact ⇒ β → +∞, ill-conditioned.
  • ℓ1 penalty function: f(x) + β ∑_{i=1}^p |ci(x)|
       Exact but nonsmooth ⇒ Complicated algorithms.
  • Augmented Lagrangian penalty function: L(x, λ) := f(x) − ⟨λ, c(x)⟩ + (β/2)∥c(x)∥²
       Involves multipliers ⇒ Alternating descent-ascent update scheme.
     • No direct implementation through existing penalty function approaches.
Constraint Dissolving Function
   Constraint dissolving function
                             min_{x∈X}  h(x) := f(A(x)) + (β/2)∥c(x)∥².     (CDP)
   Constraint dissolving mapping A:
     • A : Rn → Rn is locally Lipschitz smooth in Rn , independent of f .
  • A(x) = x holds for any x ∈ K.
  • ∇A(x)∇c(x) = 0 holds for any x ∈ K, where ∇A(x) ∈ Rn×n refers to the Jacobian of
    A at x.
     • There exists a locally Lipschitz continuous mapping RA : Y → Rn×n such that
         • RA ( x ) = 0 holds for all x ∈ K;
         • for any x ∈ K, we have
                            ⋃_{∥y−x∥≤ωx} (∇A(y) − RA(y) − In) NX(y) = {0}
         holds for some ωx > 0.
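As a concrete sanity check (a minimal NumPy sketch, not the paper's code), take c(x) = ∥x∥² − 1 with X = {x ≥ 0}; Table 2 later in these slides gives the mapping A(x) = x − ½x(∥x∥² − 1) for this constraint, and the defining properties A(x) = x and ∇A(x)∇c(x) = 0 on K can be verified numerically:

```python
import numpy as np

def A(x):
    # Constraint dissolving mapping for c(x) = ||x||^2 - 1 (Table 2 with H = I)
    return x - 0.5 * x * (x @ x - 1.0)

def jac_A(x):
    # Analytic Jacobian of A: (3 - ||x||^2)/2 * I - x x^T
    return 0.5 * (3.0 - x @ x) * np.eye(x.size) - np.outer(x, x)

def grad_c(x):
    return 2.0 * x

# A feasible point x in K = {x in X : c(x) = 0} with X = {x >= 0}
x = np.array([0.6, 0.8, 0.0])
assert np.allclose(A(x), x)                    # A(x) = x on K
assert np.allclose(jac_A(x) @ grad_c(x), 0.0)  # grad A(x) grad c(x) = 0 on K

# Finite-difference check of the analytic Jacobian at an arbitrary point
y, eps = np.array([0.3, 0.5, 0.4]), 1e-6
J_fd = np.column_stack([
    (A(y + eps * e) - A(y - eps * e)) / (2 * eps) for e in np.eye(3)
])
assert np.abs(J_fd - jac_A(y)).max() < 1e-6
```

The same pattern (analytic mapping plus a finite-difference cross-check) applies to any of the mappings listed later in the talk.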
First-order Optimality Condition
     • x is a first-order stationary point of (NCP) if there exists λ̃ ∈ Rp such that
             0 ∈ ∇f(x) + ∇c(x)λ̃ + NX(x),   c(x) = 0.
     • x is an ε-first-order stationary point of (NCP) if
             dist(0, ∇f(x) + Range(∇c(x)) + NX(x)) ≤ ε,   ∥c(x)∥ ≤ ε.
     • x is a first-order stationary point of (CDP), if 0 ∈ ∇h( x ) + NX ( x ).
     • x is an ε-first-order stationary point of (CDP), if dist (0, ∇h( x ) + NX ( x )) ≤ ε.
Basic Properties of A
   For any given x ∈ K:
     • ∇A(x)² = ∇A(x).
     • The inclusion ∇A( x )⊤ d ∈ Null(∇c( x )⊤ ) holds for all d ∈ Rn . Moreover, for any
       d ∈ Null(∇c( x )⊤ ), it holds that ∇A( x )⊤ d = d.
     • The inclusion 0 ∈ ∇A(x)d + NX(x) holds if and only if 0 ∈ d + Range(∇c(x)) + NX(x).
          0 ∈ ∇A( x )∇ f ( x ) + NX ( x )   ⇐⇒    0 ∈ ∇ f ( x ) + Range(∇c( x )) + NX ( x ).
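These properties can be checked numerically for the mapping A(x) = x − ½x(∥x∥² − 1) from Table 2 (a minimal sketch, illustrative only):

```python
import numpy as np

def jac_A(x):
    # Jacobian of A(x) = x - x(||x||^2 - 1)/2, i.e. (3 - ||x||^2)/2 I - x x^T
    return 0.5 * (3.0 - x @ x) * np.eye(x.size) - np.outer(x, x)

x = np.array([0.6, 0.8, 0.0])       # feasible: ||x|| = 1
J = jac_A(x)
gc = 2.0 * x                        # grad c(x); Null(grad c^T) = {d : x^T d = 0}

# Idempotency: grad A(x)^2 = grad A(x)
assert np.allclose(J @ J, J)

# grad A(x)^T d lies in Null(grad c(x)^T) for every d
d = np.array([1.0, -2.0, 3.0])
assert abs(gc @ (J.T @ d)) < 1e-12

# and grad A(x)^T d = d for d already in Null(grad c(x)^T)
t = np.array([-0.8, 0.6, 0.5])      # x^T t = 0
assert np.allclose(J.T @ t, t)
```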
Equivalence on First-order Stationary Points
   Constraint dissolving approaches
    min_{x∈Rn} f(x)  s. t.  c(x) = 0,  x ∈ X   (NCP)   ⇐⇒   min_{x∈X} h(x) := f(A(x)) + (β/2)∥c(x)∥²   (CDP).
   Equivalence between (NCP) and (CDP)
     • Same first-order stationary points over K for any β ∈ R.
     • For any x ∈ K, any β ≥ β x , over a neighborhood Ω x ⊂ Rn :
           • Same first-order stationary points.
           • Same strict complementarity points.
        • ε-first-order stationary point for (CDP) ⇒ 2ε-first-order stationary point for (NCP).
     • Finding first-order stationary points of (NCP) ⇒ Finding first-order stationary points of
       (CDP).
     • Solving (CDP) ⇒ Minimizing locally Lipschitz smooth h over X .
           • Direct implementations of various existing solvers.
           • Wider range of possible choices for optimization methods.
     • Equivalence on second-order stationary points?
Second-order Optimality Condition
    • M1 ( x ) := {Γ ∈ Rn : ∇h( x ) + Γ = 0, x ∈ X , Γ ∈ NX ( x )},
      C1 ( x ) := {d ∈ Rn : d ∈ TX ( x ), ⟨d, ∇h( x )⟩ = 0}.
    • x is a second-order stationary point of (CDP) if for any d ∈ C1 ( x ), we have
                             sup_{Γ∈M1(x)}  ⟨d, ∇²h(x)d⟩ − ζ(Γ, T²X(x, d)) ≥ 0.
    • ζ (d, S) := supy∈S ⟨d, y⟩
    • M2 ( x ) := {(λ, Γ) ∈ R p × Rn : ∇ f ( x ) + ∇c( x )λ + Γ = 0, c( x ) = 0, x ∈ X , Γ ∈
      NX ( x )}, C2 ( x ) := {d ∈ Rn : d ∈ TX ( x ), ⟨∇ f ( x ), d⟩ = 0, d⊤ ∇c( x ) = 0}.
    • x is a second-order stationary point of (NCP) if for any d ∈ C2 ( x ), we have
                     sup_{(λ,Γ)∈M2(x)}  ⟨d, (∇²f(x) − ∇²c(x)[λ])d⟩ − ζ(Γ, T²X(x, d)) ≥ 0.
Second-order Optimality Condition (Cont’d)
    • M1 ( x ) := {Γ ∈ Rn : ∇h( x ) + Γ = 0, x ∈ X , Γ ∈ NX ( x )},
      C1 ( x ) := {d ∈ Rn : d ∈ TX ( x ), ⟨d, ∇h( x )⟩ = 0}.
     • x is a second-order sufficient condition (SOSC) point of (CDP) if for any d ∈ C1(x), we
       have
                             sup_{Γ∈M1(x)}  ⟨d, ∇²h(x)d⟩ − ζ(Γ, T²X(x, d)) > 0.
    • M2 ( x ) := {(λ, Γ) ∈ R p × Rn : ∇ f ( x ) + ∇c( x )λ + Γ = 0, c( x ) = 0, x ∈ X , Γ ∈
      NX ( x )}, C2 ( x ) := {d ∈ Rn : d ∈ TX ( x ), ⟨∇ f ( x ), d⟩ = 0, d⊤ ∇c( x ) = 0}.
     • x is a SOSC point of (NCP) if for any d ∈ C2(x), we have
                     sup_{(λ,Γ)∈M2(x)}  ⟨d, (∇²f(x) − ∇²c(x)[λ])d⟩ − ζ(Γ, T²X(x, d)) > 0.
Important Properties of A
    For any given x ∈ K, and any d ∈ TK(x):
      • For any i ∈ [p], 0 = ⟨d, ∇A(x)∇²ci(x)∇A(x)ᵀd⟩ + ⟨d, ∇²A(x)[∇ci(x)]d⟩.
      • For any w ∈ TK(x), ∇²A(x)[w − PE w]d = 0 and PE ∇²A(x)[∇A(x)w]d = 0.
     • A( x ) ∈ X satisfies constraint nondegeneracy.
    ⇒ For any w ∈ range(TX ( x )), and Γ ∈ NX ( x ), let w = w1 + w2 for
      w1 ∈ null(∇c( x )⊤ ) ∩ E and w2 ∈ range(∇c( x )) ∩ E , it holds that
                 ⟨Γ, T²X(x, ∇A(x)ᵀw)⟩ = ⟨Γ, T²X(x, w)⟩ − ⟨w2, ∇²A(x)[Γ]w2⟩.
Equivalence on Second-order Stationary Points
   Equivalence between (NCP) and (CDP)
     • Twice-differentiability of f , A, & c.
     • The set X is second-order directionally differentiable.
     • For sufficiently large but finite β > 0:
          • Same second-order stationary points over K.
          • Same SOSC points over K.
          • Same strong SOSC points over K.
     • Preserve desirable theoretical properties.
Theoretical Analysis
    min_{x∈Rn} f(x)  s. t.  c(x) = 0,  x ∈ X   (NCP)   ⇐⇒   min_{x∈X} h(x) := f(A(x)) + (β/2)∥c(x)∥²   (CDP).
   Theoretical properties
     • Optimization over X ⇒ No subproblems required.
     • Same smoothness as f ⇒ Simple gradient & Hessian calculation.
     • Equivalence on stationary points.
    ⇒ Optimization approaches over X
            • Easy implementation for (NCP)
           • Guaranteed convergence properties:
             global convergence, iteration complexity, local convergence rate, ...
Example
  Proximal gradient descent for optimization over X
    • Basic scheme: xk+1 = ΠX ( xk − ηk ∇h( xk )).
     • Established O(ε⁻²) worst-case complexity (Lan et al., 2025).
    Theorem 1
    Suppose X is compact, the constants Lh,u ≥ 0 and Lh,l ≥ 0 are chosen such that
                −(Lh,l/2)∥x − y∥² ≤ h(x) − h(y) − ⟨∇h(y), x − y⟩ ≤ (Lh,u/2)∥x − y∥²,
    and β is sufficiently large. Let the sequence {xk} be generated by the proximal
    gradient method with ηk = 1/Lh,u. Then for any K > 1, there exists k ≤ K such that xk is a
    ιK-first-order stationary point of (NCP), where
                ιK = 4 ( max{Lh,u, Lh,l}/Lh,u + 1 ) √( 2 Lh,u² LX² / (K(K−1)) + 2 Lh,u Lh,l LX² / (K−1) ).
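As a minimal illustration (a sketch, not the experiment code), the basic scheme can be run on a toy instance with f(x) = −½∥Bᵀx∥², c(x) = ∥x∥² − 1, X = {x ≥ 0} (the nonnegative-PCA setting with ρ = 0), using the mapping A(x) = x − ½x(∥x∥² − 1) from Table 2; the sizes, penalty parameter β, and stepsize below are illustrative choices:

```python
import numpy as np

# Toy nonnegative-PCA-type instance: f(x) = -0.5||B^T x||^2,
# c(x) = ||x||^2 - 1, X = {x >= 0}; sizes and parameters are illustrative.
rng = np.random.default_rng(0)
n, q = 5, 3
B = rng.standard_normal((n, q))
M = B @ B.T
beta = 10.0 * np.linalg.norm(M, 2)      # penalty parameter, chosen generously

def grad_h(x):
    s = x @ x
    a = 0.5 * x * (3.0 - s)                            # A(x)
    J = 0.5 * (3.0 - s) * np.eye(n) - np.outer(x, x)   # Jacobian of A
    return J @ (-M @ a) + 2.0 * beta * (s - 1.0) * x   # grad(f∘A) + beta*c*grad c

x = np.full(n, 1.0 / n)                 # start inside X with ||x|| < 1
eta = 0.1 / beta
for _ in range(5000):
    x = np.maximum(x - eta * grad_h(x), 0.0)   # projection onto X = {x >= 0}

print(abs(x @ x - 1.0))                 # feasibility residual |c(x)|
```

Note how the iteration only ever projects onto X; the equality constraint is handled entirely by the penalty term, and the iterates become feasible without β → ∞.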
Designing Constraint Dissolving Mappings
   Projective mapping Q( x )
     • Q is continuously differentiable over X ;
     • Q( x ) is positive semi-definite for any x ∈ X ;
     • Null( Q( x )) = span(NX ( x )) for all x ∈ X .
   General schemes for A
               AQ(x) := x − Q(x)∇c(x) (∇c(x)ᵀQ(x)∇c(x) + σ∥c(x)∥² Ip)† c(x).
      • Explicit formulation once Q is fixed.
     • How to construct Q?
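A sketch of the general scheme (under assumptions chosen only for the example: Q is the norm-ball mapping Q(x) = I − xxᵀ with u = 1, c is a single linear constraint, and σ = 1):

```python
import numpy as np

def A_Q(x, cval, gradc, Q, sigma=1.0):
    # A_Q(x) = x - Q grad_c (grad_c^T Q grad_c + sigma ||c||^2 I_p)^† c(x)
    G = gradc.T @ Q @ gradc + sigma * (cval @ cval) * np.eye(cval.size)
    return x - Q @ gradc @ np.linalg.pinv(G) @ cval

# Example data: X = unit ball (u = 1) with Q(x) = I - x x^T from Table 1,
# and a single linear constraint c(x) = a^T x - b chosen only for illustration.
a, b = np.array([1.0, 1.0, 0.0]), 0.5
c = lambda x: np.array([a @ x - b])
gradc = a.reshape(-1, 1)
Q_ball = lambda x: np.eye(x.size) - np.outer(x, x)

# A_Q(x) = x on the feasible set K
x_feas = np.array([0.25, 0.25, 0.1])    # a^T x = b and ||x|| < 1
assert np.allclose(A_Q(x_feas, c(x_feas), gradc, Q_ball(x_feas)), x_feas)

# Near-feasible points are pulled sharply toward {c = 0} (Newton-like step)
x0 = np.array([0.3, 0.3, 0.1])          # c(x0) = 0.1
x1 = A_Q(x0, c(x0), gradc, Q_ball(x0))
assert abs(c(x1)[0]) < 0.01 * abs(c(x0)[0])
```

The σ∥c(x)∥² term regularizes the pseudoinverse away from the feasible set, so the mapping stays locally Lipschitz even where ∇c(x)ᵀQ(x)∇c(x) degenerates.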
Choices of Q for Popular X
     • Box constraints, X = {x ∈ Rn : l ≤ x ≤ u}:
       Q(x) = Diag(q1(x), ..., qn(x)), where each qi is Lipschitz smooth with qi(li) = qi(ui) = 0 and qi(s) > 0 for s ∈ (li, ui).
     • Norm ball, X = {x ∈ Rn : ∥x∥ ≤ u}:
       Q(x) = In − xxᵀ/u².
     • Norm ball (ℓq norm with q > 1), X = {x ∈ Rn : ∥x∥q ≤ u}:
       Q(x) = In − (2/u^q) Φ(sign(x) ⊙ |x|^{q−1} xᵀ) + (∑_{i=1}^n |xi|^{2q−2} / u^{2q}) xxᵀ.
     • Probability simplex, X = {x ∈ Rn : x ≥ 0, ∥x∥1 = 1}:
       Q(x) = Diag(x) − xxᵀ.
     • General linear constraints, X = {x ∈ Rn : Aᵀx ≤ b}:
       Q(x) = In + (A†)ᵀ Diag((max{b − Aᵀx, 0})⊙2 − 1) A†.
     • Second-order cone, X = {(x, y) ∈ Rn × R : ∥x∥ ≤ y}:
       Q(x, y) = (∥x∥² + y²) In+1 − exp(∥x∥² − y²) [xxᵀ, −xy; −yxᵀ, y²].
     • Spectral constraints, X = {X ∈ Rm×s : ∥X∥2 ≤ 1}:
       Q(X) : Y ↦ Y − XΦ(XᵀY).
     • PSD cone, X = {X ∈ Rs×s : X ⪰ 0}:
       Q(X) : Y ↦ XYX.
     • PSD matrices with spectral constraints, X = {X ∈ Rs×s : X ⪰ 0, ∥X∥2 ≤ 1}:
       Q(X) : Y ↦ XYX − X²YX².
    Table 1: Some constraints and their corresponding projective mappings. Here Φ denotes the
    symmetrization of a square matrix, defined as Φ(M) := (M + Mᵀ)/2.
Possible Choices of A for Special Constraints
     Formulation of X         Formulation of c                    Constraint dissolving mapping
     {x ∈ Rn : x ≥ 0}         c(x) = xᵀHx − 1 with H ∈ Sn         A(x) = x − (1/2) x (xᵀHx − 1)
     {x ∈ Rn : x ≥ 0}         c(x) = ∥x∥q^q − 1 with q > 1        A(x) = x / (1 + (1/q)(∥x∥q^q − 1))
     {X ∈ Rs×s : X ⪰ 0}       c(X) = Diag(X) − Is                 A(X) = Φ(X(2Is − Diag(X)))
     {X ∈ Rm×s : X ≥ 0}       c(X) = Diag(XᵀX − Is)               A(X) = X − (1/2) X Diag(XᵀX − Is)
           Table 2: The possible constraint dissolving mappings for some specific constraints.
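For instance, the last mapping in Table 2 rescales each column of X toward unit norm; a minimal numerical check (illustrative, not the paper's code):

```python
import numpy as np

def A_map(X):
    # Table 2: c(X) = Diag(X^T X - I), A(X) = X - 0.5 X Diag(X^T X - I)
    d = np.diag(X.T @ X) - 1.0
    return X - 0.5 * X * d        # broadcasting: column j is scaled by d[j]

# Feasible point: nonnegative matrix whose columns have unit norm
X = np.array([[0.6, 0.0],
              [0.8, 1.0],
              [0.0, 0.0]])
assert np.allclose(A_map(X), X)               # A(X) = X on K

# An infeasible point: each column is pulled back toward unit norm
Y = 1.2 * X
col_norms = np.linalg.norm(A_map(Y), axis=0)
assert np.abs(col_norms - 1.0).max() < 0.2    # closer to 1 than the original 1.2
```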
Numerical Experiment
   Nonnegative PCA
                                     min_{x∈Rn}  −(1/2)∥Bᵀx∥² + ρ∥x∥1
                                     s. t.   ∥x∥ = 1,   x ≥ 0.
    • B ∈ Rn×q : data matrix.
    • ρ > 0: penalty parameter for sparsity.
   Compared solvers
    • Solvers for (CDP):
         • Self-implemented proximal gradient descent with Barzilai-Borwein stepsizes.
         • Limited-memory BFGS method (L-BFGS-B).
    • Solvers for (NCP):
         • Sequential Least Squares Programming (SLSQP).
         • Byrd-Omojokun Trust-Region SQP method (TRCON).
Numerical Results
    Test instances    Solvers    Function value   Feasibility   Stationarity   CPU time (s)
       n = 100        PG-BB     -4.6454325e+01     3.08e-08      1.92e-07          0.03
        q = 50       L-BFGS-B   -4.6454324e+01     1.45e-09      5.55e-08         0.02
      ρ = 0.00        SLSQP     -4.6454324e+01     6.57e-10      6.08e-06         0.12
                      TRCON     -4.6454265e+01     1.45e-13      1.67e-05         4.50
      n = 200         PG-BB     -3.5536922e+01     1.42e-08      8.86e-08          0.04
       q = 50        L-BFGS-B   -3.5536921e+01     5.45e-09      5.88e-08         0.03
      ρ = 0.00        SLSQP     -3.5536981e+01     1.68e-06      7.75e-06         1.27
                      TRCON     -3.5536801e+01     1.30e-12      9.74e-06         11.50
      n = 500         PG-BB     -2.6016263e+01     1.67e-09      9.82e-07          0.07
       q = 50        L-BFGS-B   -2.6016263e+01     3.61e-09      1.18e-07         0.05
      ρ = 0.00        SLSQP     -2.6016282e+01     7.52e-07      5.02e-05         23.40
                      TRCON     -2.6014736e+01     1.42e-11      2.48e-04         56.98
      n = 100         PG-BB     -4.5837277e+01     3.58e-11      7.85e-09          0.04
       q = 50        L-BFGS-B   -4.5837277e+01     4.47e-09      4.74e-08         0.02
      ρ = 0.10        SLSQP     -4.5837277e+01     6.97e-10      8.50e-06         0.12
                      TRCON     -4.5837217e+01     2.99e-13      1.55e-05         5.07
      n = 200         PG-BB     -3.4705156e+01     9.66e-11      1.17e-07          0.04
       q = 50        L-BFGS-B   -3.4705156e+01     8.36e-09      6.16e-08         0.03
      ρ = 0.10        SLSQP     -3.4705156e+01     1.37e-11      1.89e-06         1.27
                      TRCON     -3.4705031e+01     1.74e-12      1.43e-05         10.31
      n = 500         PG-BB     -2.4741473e+01     6.29e-10      8.04e-08          0.07
       q = 50        L-BFGS-B   -2.4741473e+01     7.30e-09      9.07e-08         0.05
      ρ = 0.10        SLSQP     -2.4741474e+01     3.43e-08      9.92e-05         23.23
                      TRCON     -2.4739895e+01     1.35e-11      2.29e-04         57.23
Nonconvex Quadratic Programming with Ball Constraints
   Nonconvex QP
                                    min_{x∈Rn}  (1/2)⟨x, Qx⟩ + ⟨c, x⟩
                                    s. t.   ∥x − d∥ = 1,   ∥x∥ ≤ 1.
    • Q ∈ Rn×n : symmetric and indefinite.
    • c ∈ Rn , d = [1/2, 0, . . . , 0]⊤ ∈ Rn .
    • Fit into (NCP) with X = { x ∈ Rn : ∥ x ∥ ≤ 1}
         • L-BFGS-B cannot be applied.
         • Treat ∥ x ∥ ≤ 1 as general nonlinear constraints in SLSQP and TRCON.
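A sketch of the constraint dissolving reformulation for this problem (sizes, matrix H standing in for the quadratic term, and parameters are illustrative; the ball mapping Q(x) = I − xxᵀ is taken from Table 1, and gradients are taken by finite differences purely for simplicity):

```python
import numpy as np

# Toy instance: f(x) = 0.5 x^T H x + <cvec, x>, c(x) = ||x - d||^2 - 1,
# X = {||x|| <= 1}. H, cvec, beta, sigma, eta are illustrative choices.
n = 4
H = np.diag([1.0, -1.0, 0.5, -0.5])          # symmetric, indefinite
cvec = np.array([0.1, -0.2, 0.3, 0.0])
d = np.array([0.5, 0.0, 0.0, 0.0])
beta, sigma, eta = 10.0, 1.0, 0.005

def A_Q(x):
    # A_Q(x) = x - Q grad_c (grad_c^T Q grad_c + sigma c^2)^† c, Q(x) = I - x x^T
    cval = (x - d) @ (x - d) - 1.0
    g = 2.0 * (x - d)
    Qg = g - x * (x @ g)
    denom = g @ Qg + sigma * cval**2
    return x if denom == 0.0 else x - Qg * (cval / denom)

def h(x):
    a = A_Q(x)
    cval = (x - d) @ (x - d) - 1.0
    return 0.5 * a @ H @ a + cvec @ a + 0.5 * beta * cval**2

def grad_h(x, eps=1e-6):                      # finite-difference gradient
    return np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

x = np.zeros(n)
for _ in range(2000):
    y = x - eta * grad_h(x)
    x = y / max(1.0, np.linalg.norm(y))       # projection onto the unit ball

print(abs((x - d) @ (x - d) - 1.0))           # feasibility residual |c(x)|
```

The only constraint handled explicitly is the ball X, via its cheap projection; the sphere constraint ∥x − d∥ = 1 is dissolved into h.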
Numerical Results
    Test instances   Solvers   Function value   Feasibility   Stationarity   CPU time (s)
                     PG-BB     -9.5779118e-01    6.98e-10      8.21e-09          0.03
                     SLSQP     -9.5779118e-01    8.01e-10      7.73e-07          0.01
       n = 100
                     TRCON     -9.5779118e-01    1.10e-06      4.82e-06          0.03
                     PG-BB     -9.6301560e-01    4.00e-10      5.97e-09          0.02
                     SLSQP     -9.6301560e-01    1.60e-10      9.90e-08          0.03
       n = 200
                     TRCON     -9.6301560e-01    5.11e-06      2.12e-05          0.05
                     PG-BB     -9.6802779e-01    4.43e-10      3.63e-09          0.03
                     SLSQP     -9.6802779e-01    4.16e-10      4.15e-08          0.45
       n = 500
                     TRCON     -9.6802779e-01    4.54e-06      1.75e-05          0.12
                     PG-BB     -9.6541928e-01    2.74e-13      1.14e-12          0.01
                     SLSQP     -9.6541928e-01    9.02e-10      5.20e-08          3.48
      n = 1000
                     TRCON     -9.6541928e-01    2.34e-05      9.19e-05          0.46
                     PG-BB     -9.6991471e-01    6.32e-11      1.33e-09          0.04
                     SLSQP     -9.6991471e-01    8.09e-10      4.60e-08         28.03
      n = 2000
                     TRCON     -9.6991471e-01    2.17e-05      8.11e-05          1.04
                     PG-BB     -9.6950753e-01    5.62e-11      2.90e-10          0.28
                     SLSQP     -9.6950753e-01    4.67e-10      3.26e-08        451.33
      n = 5000
                     TRCON     -9.6950753e-01    2.16e-05      8.08e-05         13.71
Fair Principal Component Analysis
   Basic formulation
                        min_{P∈Rn×d}   f_fpca(P) := max_{1≤i≤k} ( (−⟨AiᵀAi, PPᵀ⟩ + cAi) / mi )²
                         s. t.    PᵀP = Id.
      • Complicated objective function.
      • Introduce auxiliary variables y and z to simplify the optimization problem:
                      min_{P∈Rn×d, y∈Rk, z∈R}   z
                      s. t.   ∀i ∈ [k],   ( (−⟨AiᵀAi, PPᵀ⟩ + cAi) / mi )² + yi = z,
                              ∥P∥F² = d,   ∥P∥2 ≤ 1,   y ≥ 0.
Numerical Results
     Test instances    Solvers   Function value   Feasibility   Stationarity   CPU time (s)
     n = 100, k = 2    PG-BB     2.9399515e+00     7.21e-05      5.84e-06          0.41
     n = 100, k = 5    PG-BB     4.9882187e+00     2.31e-05      3.72e-05          1.51
    n = 100, k = 10    PG-BB     5.9220659e+00     7.91e-05      5.37e-05          4.52
     n = 500, k = 2    PG-BB     3.0785754e+00     2.14e-05      8.38e-05          2.23
     n = 500, k = 5    PG-BB     5.4557047e+00     9.13e-06      8.78e-05          9.95
    n = 500, k = 10    PG-BB     6.4795547e+00     2.83e-05      7.98e-05         20.18
    n = 1000, k = 2    PG-BB     3.1810773e+00     1.39e-06      8.06e-05          6.43
    n = 1000, k = 5    PG-BB     5.5793546e+00     1.62e-05      4.76e-05          4.34
    n = 1000, k = 10   PG-BB     6.6260901e+00     6.64e-05      2.62e-05         25.73
    n = 5000, k = 2    PG-BB     3.2227600e+00     4.88e-05      8.56e-05         66.61
    n = 5000, k = 5    PG-BB     5.6587667e+00     2.13e-05      8.60e-05        128.42
    n = 5000, k = 10   PG-BB     6.7409254e+00     1.52e-05      9.62e-05        292.78
Conclusion
       Approaches for optimization over X  --(constraint dissolving approach)-->  Optimization approaches for (NCP)
       Convergence guarantees on optimization over X  --(no prior result; now inherited directly from existing works)-->  Convergence guarantees on (NCP)
    • Propose constraint dissolving approach:
        min_{x∈X} f(x)  s. t.  c(x) = 0   (NCP)   =⇒   min_{x∈X} h(x) := f(A(x)) + (β/2)∥c(x)∥²   (CDP)
    • Relationship with (NCP)
         • Same first-order & second-order stationary points.
         • Same SOSC & strong SOSC points.
          • Same strict complementarity points.
    • Approaches for optimization over X
         • Easy & direct implementation.
         • Guaranteed convergence properties directly from existing works.
References
    1. Nachuan Xiao, Tianyun Tang, Shiwei Wang, Kim-Chuan Toh, An exact penalty approach for equality
       constrained optimization over a convex set, arXiv preprint arXiv:2505.02495.
    2. Nachuan Xiao, Xin Liu, Kim-Chuan Toh, Dissolving constraints for Riemannian optimization, Mathematics
       of Operations Research 49(1):366-397. (link)
    3. Xiaoyin Hu, Nachuan Xiao, Xin Liu, Kim-Chuan Toh, An improved unconstrained approach for bilevel
       optimization, SIAM Journal on Optimization, 2023, 33(4): 2801-2829. (link)
    4. Xiaoyin Hu, Nachuan Xiao, Xin Liu, Kim-Chuan Toh, A constraint dissolving approach for nonsmooth
       optimization over the Stiefel manifold, IMA Journal of Numerical Analysis, 2023, drad098. (link)
Thanks for your attention!
   Email: xnc@lsec.cc.ac.cn