Attention (attention mechanism)

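A score expressing which other words in a sentence the model "attends" to when interpreting a given word. More generally, the mechanism dynamically weights the relevance between elements within a sequence (or across sequences) and builds context-dependent representations.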

Main variants

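Scaled Dot-Product Attention: the basic form. Query-key dot products are scaled and passed through softmax to obtain the weights.
Multi-Head Attention: computes several attentions in parallel so that different kinds of relationships are captured at the same time.
Self-Attention: attention among the elements of a single sequence. All positions can be computed in parallel, which removes the step-by-step dependency along the sequence length.
Masked Self-Attention: hides future tokens on the decoder side (lower-triangular mask, so each position sees only itself and earlier positions). Used for autoregressive generation.
Source-Target Attention: attention between the encoder output (source) and the decoder (target).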
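The variants above share one core computation, $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$, and differ mainly in where the queries, keys, and values come from and in which positions are masked. The following is a minimal NumPy sketch of that idea, not a reference implementation. The function names (`scaled_dot_product_attention`, `causal_mask`, `multi_head_attention`), the random weights, and the toy shapes are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """q: (..., n_q, d_k), k: (..., n_k, d_k), v: (..., n_k, d_v).
    mask: boolean, broadcastable to (..., n_q, n_k); True means "may attend"."""
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Masked positions get a very negative score, so softmax gives them ~0 weight.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def causal_mask(n):
    # Lower-triangular: position i may attend to positions 0..i only.
    return np.tril(np.ones((n, n), dtype=bool))


def multi_head_attention(x_q, x_kv, w_q, w_k, w_v, w_o, num_heads, mask=None):
    """x_q: (n_q, d_model), x_kv: (n_k, d_model); every weight: (d_model, d_model)."""
    n_q, d_model = x_q.shape
    n_k = x_kv.shape[0]
    d_head = d_model // num_heads

    def split(x, n):
        # (n, d_model) -> (num_heads, n, d_head)
        return x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    q = split(x_q @ w_q, n_q)
    k = split(x_kv @ w_k, n_k)
    v = split(x_kv @ w_v, n_k)
    heads, _ = scaled_dot_product_attention(q, k, v, mask)  # (num_heads, n_q, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n_q, d_model)
    return concat @ w_o


rng = np.random.default_rng(0)
n_src, n_tgt, d_model, num_heads = 6, 4, 8, 2
src = rng.normal(size=(n_src, d_model))  # stands in for an encoder-side sequence
tgt = rng.normal(size=(n_tgt, d_model))  # stands in for a decoder-side sequence
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

# Self-attention: queries, keys, and values all come from the same sequence.
self_out, self_w = scaled_dot_product_attention(src, src, src)
# Masked self-attention: each target position sees only itself and earlier positions.
masked_out, masked_w = scaled_dot_product_attention(tgt, tgt, tgt, mask=causal_mask(n_tgt))
# Source-target attention: queries from the target, keys and values from the source.
cross_out, cross_w = scaled_dot_product_attention(tgt, src, src)
# Multi-head self-attention over the source sequence.
mha_out = multi_head_attention(src, src, w_q, w_k, w_v, w_o, num_heads)
```

In this sketch the same function serves Self-Attention, Masked Self-Attention, and Source-Target Attention. Only the inputs and the mask change, and Multi-Head Attention is the same function applied to several lower-dimensional projections at once.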

Significance

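The Transformer ("Attention Is All You Need", 2017), which relies on attention instead of recurrence or convolution, overcame the $O(N)$ sequential computation of recurrent neural networks (RNN/LSTM) and made parallelization possible. The original Transformer combines Multi-Head, Scaled Dot-Product, and Source-Target Attention. BERT, an encoder-only model, uses Multi-Head Self-Attention built on Scaled Dot-Product Attention and has no Source-Target Attention.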
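The contrast between sequential and parallel computation can be made concrete with a toy comparison. The sketch below is only an illustration under assumed shapes and random weights (`w_x`, `w_h`, and the helper `attend` are hypothetical names): the recurrent loop needs the previous hidden state at every step, whereas each self-attention output depends only on the inputs and can be computed for all positions at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4
x = rng.normal(size=(n, d))

# Toy recurrent layer (hypothetical weights): step t needs the hidden state
# from step t-1, so the N steps must run one after another.
w_x = rng.normal(size=(d, d)) * 0.1
w_h = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
rnn_states = []
for t in range(n):
    h = np.tanh(x[t] @ w_x + h @ w_h)
    rnn_states.append(h)
rnn_states = np.stack(rnn_states)


def attend(q, k, v):
    # Plain scaled dot-product attention without learned projections.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Self-attention: every position is computed in a single matrix product ...
all_at_once = attend(x, x, x)
# ... and any single position can be computed without the outputs of the others.
one_by_one = np.stack([attend(x[i:i + 1], x, x)[0] for i in range(n)])
```

The two attention results agree, which is what allows the positions to be processed in parallel. The price is that the score matrix grows quadratically with the sequence length.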

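Attention is also central to VLA (vision-language-action) architectures, where it is reused as non-causal attention that lets vision, language, and tactile tokens freely reference one another (Tactile-VLA), or as interleaved Cross-Attention and Causal Self-Attention layers (SmolVLA).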
