tags	ml, nn, nlp, architecture

Jun 28, 2026

Transformer

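A deep learning model introduced in 2017 by researchers at Google in the paper "Attention Is All You Need". Compared with the then-dominant recurrent-neural-network (RNN/LSTM) models, it reached better translation quality while being more parallelizable and taking much less time to train, and it set off a breakthrough in natural language processing.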

Features

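An RNN/LSTM has to compute its outputs one after another, so a length-$n$ sequence needs $O(n)$ sequential steps. The Transformer removes this bottleneck: it dispenses with recurrence and relies on the attention-mechanism, which relates all positions to each other at once and is therefore easy to parallelize. As a transduction (sequence-to-sequence) model it uses an encoder-decoder structure: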

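Encoder: maps the input sentence $(x_{1}, \dots, x_{n})$ to a representation $z = (z_{1}, \dots, z_{n})$.
Decoder: generates the output words $(y_{1}, \dots, y_{m})$ from $z$, one word per time step, feeding the decoder output of the previous step back in as input to the current step (autoregressive).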
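The parallelism and the autoregressive constraint described above can be illustrated with a minimal sketch of scaled dot-product attention, $\mathrm{softmax}(QK^{\top}/\sqrt{d_k})\,V$. This is a toy, standard-library-only version written for this note, not the paper's implementation: the function names, the `causal` flag, and the tiny input `X` are assumptions made for illustration, and learned projections, multiple heads, and feed-forward layers are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V, causal=False):
    """Toy attention over lists of vectors (hypothetical helper).

    Q, K: lists of d_k-dimensional vectors; V: list of d_v-dimensional vectors.
    Each output row depends only on Q, K and V, never on another output row,
    so the rows could all be computed in parallel (here: a plain loop).
    """
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        # With causal=True, position i may only attend to positions 0..i,
        # which is what keeps a decoder autoregressive.
        visible = range(i + 1) if causal else range(len(K))
        scores = [
            sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
            for j in visible
        ]
        weights = softmax(scores)
        out.append([
            sum(w * V[j][c] for w, j in zip(weights, visible))
            for c in range(len(V[0]))
        ])
    return out


# Self-attention on a toy sequence of three 2-dimensional vectors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = scaled_dot_product_attention(X, X, X)                 # encoder-style
masked = scaled_dot_product_attention(X, X, X, causal=True)  # decoder-style
print(full)
print(masked)  # the first row only sees position 0
```

In the unmasked call every position attends to the whole sequence, as in the encoder. With `causal=True` each position sees only itself and earlier positions, which is the condition a decoder needs in order to generate one word at a time.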

Variants and applications

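bert: encoder family. Built from the Transformer's building blocks: Scaled Dot-Product and Multi-Head Attention plus position information. Source-Target Attention belongs to the encoder-decoder form of the Transformer, not to an encoder-only model.
GPT family: decoder family, autoregressive generation.
VLM / VLA: Transformer backbones such as PaliGemma, Gemma, and SmolLM2 form the core of pi0 and smolvla. A Vision Transformer (ViT) splits an image into patches so that it can go through the same sequence processing.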
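As a rough illustration of the patching step mentioned for ViT, the sketch below cuts a small single-channel image into non-overlapping square patches and flattens each one into a vector, giving a sequence a Transformer could consume. The helper name `patchify`, the 4x4 toy image, and the patch size of 2 are assumptions for illustration. A real ViT also applies a learned linear projection to each patch and adds position embeddings, which this sketch omits.

```python
def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch vectors.

    Returns (H // patch) * (W // patch) vectors in row-major patch order,
    each of length patch * patch. Hypothetical helper for illustration;
    assumes a single channel and H, W divisible by `patch`.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten one patch row by row into a single vector.
            tokens.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


# 4x4 toy image whose pixel values are 0..15 in reading order.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)
print(len(tokens))  # number of patches in the sequence
print(tokens[0])    # top-left patch, flattened
```

Each flattened patch plays the role that a word embedding plays for text, so the rest of the model can stay the same sequence-processing stack.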

Related

attention-mechanism / recurrent-neural-network
bert / vla
_moc-ml-robotics (the atomic notes of the ml-robotics cluster)


