In this repository, we present a series of experiments to evaluate the performance of different variants of Stochastic Gradient Descent (SGD). Each subsection details a specific variant or aspect of SGD, providing insights into its behavior and effectiveness in various settings. Below we share some comparison results.
- SGD with a constant step size descends quickly at first but then starts to fluctuate around similar values, whereas SGD with a shrinking step size is slower initially but keeps converging towards the optimal solution.
What happens when you use sampling without replacement instead:
- SGD with constant step sizes is still faster at the beginning, while SGD with shrinking step sizes reaches the best solution.
- SGD with shrinking step sizes is faster and performs better than SGD with averaging.
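As a rough illustration of these comparisons, the sketch below runs SGD on a least-squares problem with either a constant or a $1/k$ shrinking step size, and can sample indices without replacement by reshuffling once per epoch. The problem setup, function name, and hyperparameters are illustrative assumptions, not the notebook's exact code.

```python
import numpy as np

def sgd_least_squares(A, b, n_iters=5000, step="constant", gamma0=0.1,
                      without_replacement=False, seed=0):
    """Illustrative SGD on f(w) = 1/(2n) * ||Aw - b||^2 (hypothetical helper, not from the notebook)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    order = rng.permutation(n)
    for k in range(1, n_iters + 1):
        if without_replacement:
            # reshuffle the indices once the current epoch is exhausted
            if (k - 1) % n == 0:
                order = rng.permutation(n)
            i = order[(k - 1) % n]
        else:
            i = rng.integers(n)  # sampling with replacement
        grad_i = (A[i] @ w - b[i]) * A[i]                     # stochastic gradient of the i-th term
        gamma = gamma0 if step == "constant" else gamma0 / k  # constant vs shrinking 1/k step size
        w -= gamma * grad_i
    return w

# toy data, purely for demonstration
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = A @ rng.standard_normal(10) + 0.01 * rng.standard_normal(200)
w_const = sgd_least_squares(A, b, step="constant")
w_shrink = sgd_least_squares(A, b, step="shrinking", without_replacement=True)
```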
- Final_Notebook_SGD_Final_Lab.ipynb: All implementations
- assets: Contains session images
- readme
Language: Python
- Weight averaging: $$ w_{\text{average}}^{(k)} = \frac{(k - \text{start\_averaging}) \cdot w_{\text{average}}^{(k-1)} + w^{(k)}}{k - \text{start\_averaging} + 1} $$ where:
- $ w_{\text{average}}^{(k)} $ is the averaged weight vector at iteration $k$,
- $ w^{(k)} $ is the current weight vector at iteration $k$,
- $ \text{start\_averaging} $ is the iteration at which averaging begins,
- $ k $ is the index of the current iteration.
Using this formula presents the following advantages:
- Memory: the recursive formula only requires storing two weight vectors at a time (the running average and the current iterate), which significantly reduces memory usage.
- Computation: the average is updated with a single vector operation per iteration, instead of recomputing a sum over all past iterates, which keeps the per-iteration cost low.
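A minimal sketch of how this recursive averaging can be implemented with only two stored weight vectors is shown below; the names mirror the formula above but the function itself is an illustrative assumption, not the notebook's code.

```python
import numpy as np

def update_average(w_avg, w, k, start_averaging):
    """Recursive weight averaging: only w_avg and the current iterate w are kept in memory."""
    if k < start_averaging:
        return w.copy()  # before averaging starts, the average is just the current iterate
    m = k - start_averaging  # number of iterates already folded into w_avg
    return (m * w_avg + w) / (m + 1)
```

Inside the SGD loop, this update would be applied once per iteration, right after the weight update.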
Clone the project:

git clone https://github.com/Omer-alt/-Stochastic-Gradient-Descent.git

Then run the notebook with Google Colab or locally with Jupyter.