Alignment In Transformers, Simply Explained

AI, But Simple Issue #66

Hello from the AI, but simple team! If you enjoy our content, consider supporting us so we can keep doing what we do.

Our newsletter is no longer sustainable to run at no cost, so we’re relying on different measures to cover operational expenses. Thanks again for reading!

Transformer models are the most widely used architecture in AI at the moment. Some of their most popular uses are Natural Language Processing (NLP) tasks and multimodal tasks such as image captioning.

Transformers are one of the key architectures that tackle the problem of alignment.

Well, what exactly is alignment? Consider language translation in NLP: word order and context differ between languages, yet both shape a sentence’s meaning, so we must figure out which input words correspond to which target words.

  • This process is essentially what alignment is: figuring out which part(s) of the input correspond to which part(s) of the output.

In traditional models, alignment had to be solved as a separate task, but with the arrival of transformers and their “self-attention” mechanism, alignment became integrated into the model itself: the attention weights computed at every layer act as a soft alignment between positions.
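
To make this concrete, here is a minimal sketch of scaled dot-product attention in NumPy. The weight matrix it produces is exactly that soft alignment: row i tells you how strongly position i attends to every other position. The three-token sequence and the tiny 4-dimensional embeddings are made-up toy values, and the learned query/key/value projections a real transformer would apply are skipped for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # Each score measures how strongly one position attends to another.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # rows sum to 1: a soft alignment
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens = ["the", "cat", "sat"]   # hypothetical 3-token sequence
X = rng.normal(size=(3, 4))      # toy 4-dimensional embeddings

# In self-attention, queries, keys, and values all come from the same
# sequence (learned projections omitted in this sketch).
output, weights = scaled_dot_product_attention(X, X, X)
for tok, row in zip(tokens, weights):
    print(f"{tok:>4}: {np.round(row, 2)}")
```

Each printed row is one token’s alignment distribution over the whole sequence, and because these weights fall out of the same forward pass that produces the output, no separate alignment step is needed.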
