Transformer Architecture Overview
This simplified diagram illustrates how encoder and decoder blocks process tokens within a transformer model. Each block combines one or more attention sublayers with a feed-forward network.
```mermaid
flowchart LR
    A[Input Tokens] --> B[Token Embeddings]
    B --> C[Positional Encoding]
    C --> ENC
    ENC --> DEC
    DEC --> F[Output Tokens]
    subgraph ENC[Encoder Blocks]
        direction TB
        SA[Self-Attention] --> FF[Feed Forward]
    end
    subgraph DEC[Decoder Blocks]
        direction TB
        MS[Masked Self-Attention] --> CA[Cross-Attention]
        CA --> FF2[Feed Forward]
    end
```
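To make the contents of these boxes concrete, here is a minimal sketch of one encoder block and one decoder block, assuming PyTorch. The layer sizes, post-norm layout, and class names are illustrative assumptions for this example, not a reference implementation.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Self-attention followed by a feed-forward network (illustrative sizes)."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention sublayer with a residual connection
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward sublayer with a residual connection
        return self.norm2(x + self.ff(x))

class DecoderBlock(nn.Module):
    """Masked self-attention, cross-attention over encoder output, then feed-forward."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.masked_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory):
        # Causal mask: each position may only attend to itself and earlier positions
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.masked_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + attn_out)
        # Cross-attention: queries come from the decoder, keys/values from the encoder output
        cross_out, _ = self.cross_attn(x, memory, memory)
        x = self.norm2(x + cross_out)
        return self.norm3(x + self.ff(x))
```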
Multiple encoder and decoder blocks are typically stacked for greater capacity.
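Continuing the sketch above, stacking might look like the following; the depth of 6 and the tensor shapes are arbitrary choices for illustration.

```python
# Stack several blocks; the encoder output ("memory") feeds every decoder block's cross-attention.
encoder = nn.ModuleList(EncoderBlock() for _ in range(6))
decoder = nn.ModuleList(DecoderBlock() for _ in range(6))

src = torch.randn(2, 10, 512)  # (batch, source length, d_model), already embedded
tgt = torch.randn(2, 7, 512)   # (batch, target length, d_model), already embedded

memory = src
for block in encoder:
    memory = block(memory)

out = tgt
for block in decoder:
    out = block(out, memory)

print(out.shape)  # torch.Size([2, 7, 512])
```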