- Global Attention
- Creating a simple RAG
- Interpretable Features from Neural Networks
- CNNs
- Gradient Checkpointing
- Case Study on a Transformer-based Architecture
- Key-Value Caching
- I challenged myself to visualize attention (nothing special)
- Transformers
- Parameter Updates
- Understanding loss.backward()
- Batch Normalization
- Dead Neurons
- Gated Recurrent Unit (GRU)
- Long Short-Term Memory (LSTM)
- Stateful Recurrent Neural Network
- Tokenize & Numericalize
- Self-Supervised & Transfer Learning in Language Models
- Embeddings in Sequential Neural Networks
- Embeddings in Recommendation Systems
- Cross-Entropy in Classification
- Logarithms in Deep Learning
- The Algorithm behind the Universal Function Approximator
- Traffic on Blocked Ports
- I relearned entropy