Transformer Architecture Overview

This simplified diagram illustrates how encoder and decoder blocks process tokens within a transformer model. Each block combines attention sublayers with a position-wise feed-forward network.

flowchart LR
    A[Input Tokens] --> B[Token Embeddings]
    B --> C[Positional Encoding]
    C --> D[Encoder Blocks]
    D --> E[Decoder Blocks]
    E --> F[Output Tokens]

    subgraph ENC["Encoder Blocks"]
        direction TB
        SA[Self-Attention] --> FF[Feed Forward]
    end

    subgraph DEC["Decoder Blocks"]
        direction TB
        MS[Masked Self-Attention] --> CA[Cross-Attention]
        CA --> FF2[Feed Forward]
    end
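
To make the two block types concrete, here is a minimal PyTorch sketch of the encoder and decoder blocks from the diagram. The layer sizes (`d_model`, `n_heads`, `d_ff`) are illustrative assumptions, and the residual connections and layer normalization that standard transformer blocks include, but the diagram omits for simplicity, are written out explicitly.

```python
# Minimal sketch of the block types in the diagram; sizes are illustrative.
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention sublayer with residual connection and layer norm.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feed-forward sublayer with residual connection.
        return self.norm2(x + self.ff(x))


class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.masked_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory):
        # Masked self-attention: each position may only attend to earlier positions.
        t = x.size(1)
        causal_mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.masked_attn(x, x, x, attn_mask=causal_mask)
        x = self.norm1(x + attn_out)
        # Cross-attention: queries from the decoder, keys/values from the encoder output.
        cross_out, _ = self.cross_attn(x, memory, memory)
        x = self.norm2(x + cross_out)
        # Feed-forward sublayer.
        return self.norm3(x + self.ff(x))
```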

Multiple encoder and decoder blocks are typically stacked for greater capacity.
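Continuing the hypothetical `EncoderBlock` and `DecoderBlock` classes sketched above, stacking could look like the following. The depth of six blocks mirrors the configuration in the original "Attention Is All You Need" paper, but here it is only an example value.

```python
# Continues the EncoderBlock / DecoderBlock sketch above; depth of 6 is an example.
encoder = nn.ModuleList([EncoderBlock() for _ in range(6)])
decoder = nn.ModuleList([DecoderBlock() for _ in range(6)])

# Dummy inputs standing in for embedded + positionally encoded tokens.
src = torch.randn(1, 10, 512)  # (batch, source length, d_model)
tgt = torch.randn(1, 7, 512)   # (batch, target length, d_model)

memory = src
for block in encoder:
    memory = block(memory)

out = tgt
for block in decoder:
    out = block(out, memory)  # decoder attends to encoder output via cross-attention

print(out.shape)  # torch.Size([1, 7, 512])
```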