In this repository, we present a series of experiments to evaluate the performance of different variants of Stochastic Gradient Descent (SGD). Each subsection details a specific variant or aspect of SGD, providing insights into its behavior and effectiveness in various settings. Below we share some comparison results.
- SGD with a constant step size descends quickly at first but then starts to fluctuate around similar values, whereas SGD with a shrinking step size is slower initially but keeps converging towards the optimal solution.
What happens when you use sampling without replacement instead:
- SGD with constant step sizes is still faster at the beginning, while SGD with shrinking step sizes reaches the best solution.
- SGD with shrinking step sizes is faster and performs better than SGD with averaging.
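As a rough illustration of these comparisons, the sketch below runs SGD on a least-squares problem with either a constant or a $1/k$ shrinking step size, and can sample indices without replacement by reshuffling once per epoch. The problem setup, function name, and hyperparameters are illustrative assumptions, not the notebook's exact code.

```python
import numpy as np

def sgd_least_squares(A, b, n_iters=5000, step="constant", gamma0=0.1,
                      without_replacement=False, seed=0):
    """Illustrative SGD on f(w) = 1/(2n) * ||Aw - b||^2 (hypothetical helper, not from the notebook)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    order = rng.permutation(n)
    for k in range(1, n_iters + 1):
        if without_replacement:
            # reshuffle the indices once the current epoch is exhausted
            if (k - 1) % n == 0:
                order = rng.permutation(n)
            i = order[(k - 1) % n]
        else:
            i = rng.integers(n)  # sampling with replacement
        grad_i = (A[i] @ w - b[i]) * A[i]                     # stochastic gradient of the i-th term
        gamma = gamma0 if step == "constant" else gamma0 / k  # constant vs shrinking 1/k step size
        w -= gamma * grad_i
    return w

# toy data, purely for demonstration
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = A @ rng.standard_normal(10) + 0.01 * rng.standard_normal(200)
w_const = sgd_least_squares(A, b, step="constant")
w_shrink = sgd_least_squares(A, b, step="shrinking", without_replacement=True)
```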
- Final_Notebook_SGD_Final_Lab.ipynb: All implementations
- assets: Contains session images
- readme
Language: Python
- Weight averaging: $$ w_{\text{average}}^{(k)} = \frac{(k - \text{start\_averaging}) \cdot w_{\text{average}}^{(k-1)} + w^{(k)}}{k - \text{start\_averaging} + 1} $$ where:
- $ w_{\text{average}}^{(k)} $ is the averaged weight vector at iteration $k$,
- $ w^{(k)} $ is the current weight vector at iteration $k$,
- $ \text{start\_averaging} $ is the iteration at which averaging begins,
- $ k $ is the index of the current iteration.
Using this formula presents the following advantages:
- Memory: the recursive formula only requires storing two weight vectors at a time (the running average and the current iterate), which significantly reduces memory usage.
- Computation: the average is updated with a single vector operation per iteration, instead of recomputing a sum over all past iterates, which keeps the per-iteration cost low.
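A minimal sketch of how this recursive averaging can be implemented with only two stored weight vectors is shown below; the names mirror the formula above but the function itself is an illustrative assumption, not the notebook's code.

```python
import numpy as np

def update_average(w_avg, w, k, start_averaging):
    """Recursive weight averaging: only w_avg and the current iterate w are kept in memory."""
    if k < start_averaging:
        return w.copy()  # before averaging starts, the average is just the current iterate
    m = k - start_averaging  # number of iterates already folded into w_avg
    return (m * w_avg + w) / (m + 1)
```

Inside the SGD loop, this update would be applied once per iteration, right after the weight update.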
Clone the project:

git clone https://github.com/Omer-alt/-Stochastic-Gradient-Descent.git

Then run the notebook with Google Colab or locally with Jupyter.