- Gradient Checkpointing, a hack for training deep NNs
- Case Study: Transformer-Based architecture development
- Transformer Key-Value caching for fast inference
- I challenged myself to visualize attention (nothing special)
- Transformers
- Parameter updates
- Understanding loss.backward()
- Batch Normalization
- Dead Neurons
- Gated Recurrent Unit (GRU)
- Long Short-Term Memory (LSTM)
- Stateful Recurrent Neural Networks
- Tokenize & Numericalize
- Self-Supervised & Transfer Learning in Language Models
- Embeddings in sequential Neural Networks
- Embeddings in Recommendation Systems
- Cross Entropy in Classification
- Logarithms in Deep Learning
- Algorithm behind the universal function approximator
- Traffic on blocked ports
- I relearned entropy