https://www.lesswrong.com/posts/tnWRXkcDi5Tw9rzXw/the-design-space-of-minds-in-general
Artificial Neural Networks are just a bunch of matrix multiplications? So is the standard model of particle physics. Iterate either one of them long enough, and you might get intelligence:
https://en.wikipedia.org/wiki/Operator_(physics)#Operators_in_matrix_mechanics
Also, matrix multiplications are linear, but Neural Nets can represent nonlinear functions. How does that work? Through the activation function applied at each neuron, which must be nonlinear (see the sketch after these links):
https://towardsdatascience.com/how-to-choose-the-right-activation-function-for-neural-networks-3941ff0e6f9c
https://www.pinecone.io/learn/train-sentence-transformers-softmax/
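To see why the nonlinearity matters, stack two weight matrices: without an activation function they collapse into one linear map, and with a ReLU between them they don't. A minimal numpy sketch (the random matrices and the choice of ReLU are illustrative, not anything from the links above):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 4))
W2 = rng.normal(size=(4, 4))
x = rng.normal(size=4)

# Two stacked linear layers collapse into a single matrix multiplication...
linear_twice = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(linear_twice, collapsed))  # True: still just one linear map

# ...but a nonlinear activation (here ReLU) between them breaks the collapse,
# which is what lets the network represent nonlinear functions.
def relu(z):
    return np.maximum(z, 0.0)

nonlinear = W2 @ relu(W1 @ x)
print(np.allclose(nonlinear, collapsed))  # False (almost surely, for random weights)
```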
It took humanity 80 years but we have finally invented computer security that can be bypassed with polite but firm insistence:
https://twitter.com/ESYudkowsky/status/1598663598490136576
https://octoml.ai/blog/from-gans-to-stable-diffusion-the-history-hype-and-promise-of-generative-ai/
https://medium.com/intuitionmachine/the-strange-loop-in-alphago-zeros-self-play-6e3274fcdd9f
When you talk to ChatGPT, you're talking to RLHF (Reinforcement Learning from Human Feedback, which is what makes it ChatGPT), on top of Supervised Fine-Tuning (text-davinci-003), on top of the actual unsupervised Transformer model (GPT-3), which is the giant inscrutable mass of neural weights. You can visualize it like this:
https://twitter.com/anthrupad/status/1626113680340566018
https://platform.openai.com/playground
Training a neural net means using something like Stochastic Gradient Descent to minimize the loss, as measured by the cost function. You can do a small version yourself (see the sketch after these links):
https://realpython.com/gradient-descent-algorithm-python/
https://machinelearningmastery.com/difference-between-backpropagation-and-stochastic-gradient-descent/
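Here's a minimal sketch of plain (non-stochastic) gradient descent on a one-dimensional cost function, with the gradient written out by hand so no autodiff is needed; the quadratic cost and the learning rate are illustrative choices:

```python
# Minimize cost(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def cost(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0                 # arbitrary starting weight
learning_rate = 0.1     # step size: too large diverges, too small crawls
for step in range(50):
    w -= learning_rate * grad(w)   # step downhill along the gradient

print(w, cost(w))       # w has converged to roughly 3.0, cost near 0
```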
If you want to do real deep learning yourself, don't worry about GPU scalpers. Just use Google Colab, for free:
https://colab.research.google.com/
The deep learning breakthrough that specifically led to BERT and GPT is called the Transformer model. The paper that introduced it:
https://arxiv.org/abs/1706.03762
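The core of that paper is scaled dot-product attention. A minimal numpy sketch (the toy shapes and random Q/K/V are illustrative, and this omits the multi-head projections and masking from the full paper):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Every query attends to every key; scores are scaled by sqrt(d_k)
    # so the softmax doesn't saturate as the dimension grows.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

# Toy example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (4, 8): one output vector per token
```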
https://huggingface.co/blog/annotated-diffusion
https://homes.cs.washington.edu/~marcotcr/blog/lime/
SHAP (based on the Shapley value, from economics) answers the question: how much did each feature contribute to the classification?
https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/
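A minimal sketch of computing SHAP values in Python, assuming the `shap` and `scikit-learn` packages are installed; the diabetes dataset and random-forest model are illustrative choices, not anything from the guide above:

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Train a small model on a standard dataset.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Row i, column j answers: how much did feature j push prediction i
# away from the model's average prediction?
shap.summary_plot(shap_values, X)
```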
https://twitter.com/neelnanda5
See Kosmos-1, Microsoft's cutting-edge (until at least next week) LLM with visual perception:
https://arxiv.org/abs/2302.14045
https://www.alignmentforum.org/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
https://www.researchgate.net/publication/368685319_AI_Risk_Skepticism_-A_Comprehensive_Survey