Quest 5

A tiny transformer: attention over bar sequences

Karpathy's makemore-style transformer, adapted to 'what comes next after these 20 bars?'

Lesson

Why attention for price data?

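An MLP applies the same fixed weights to every input position, no matter what the bars contain. But when predicting the next bar, which of the last 20 bars matters most?

A pump 15 bars ago followed by a pullback 3 bars ago might signal "bull continuation" more strongly than a flat last bar does. Attention lets the model learn, from the bars themselves, how much weight each earlier bar should get.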

Follow Karpathy's makemore walkthrough. Conceptually:

# d = dimension of each key vector; softmax is applied row by row
Q, K, V = compute_queries_keys_values(bar_sequence)
attention_weights = softmax(Q @ K.T / sqrt(d))
output = attention_weights @ V
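The three lines above can be made runnable in plain Python. This is a minimal sketch, not the makemore implementation: the helper names (`matvec`, `attention`), the 2-feature toy sequence, and the identity projection matrices are all assumptions chosen for illustration. In a trained model, `Wq`, `Wk`, and `Wv` would be learned.

```python
import math

def matvec(x, W):
    # Row vector x (length n) times matrix W (n x m) -> list of length m
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(seq, Wq, Wk, Wv):
    # Project every element of the sequence into queries, keys, and values
    Q = [matvec(x, Wq) for x in seq]
    K = [matvec(x, Wk) for x in seq]
    V = [matvec(x, Wv) for x in seq]
    d = len(K[0])  # key dimension used for scaling
    weights = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights.append(softmax(scores))  # one row of weights per query
    # Each output is the weighted sum of the value vectors
    out = [
        [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        for w in weights
    ]
    return weights, out

# Hypothetical toy input: 3 "bars" with 2 features each
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical untrained projections (identity), standing in for learned weights
identity = [[1.0, 0.0], [0.0, 1.0]]

weights, out = attention(seq, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

With identity projections this reduces to Q = K = V = the input sequence, which is the simplification the exercise below uses.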

Reality check

A full transformer for price data is typically overkill. You need THOUSANDS of examples per regime to train, and crypto barely gives you a few clean regime chunks per year.

This quest is conceptual: understand the mechanism, but don't deploy one on $20.

Your task

Implement a tiny scaled dot-product attention over 4 "bar embeddings" and report the output.


import math

# 4 bars, each embedded as a 3-dim vector (in reality: RSI, ROC, vol features)
bars = [
  [1.0, 0.02, 0.1],    # bar -3 (oldest)
  [0.9, -0.01, 0.15],
  [0.85, 0.0, 0.2],
  [0.95, 0.03, 0.12],  # bar 0 (most recent)
]

# Simplified: Q = K = V = bars (no learned weights — pure mechanism demo)
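def dot(a, b):
    # Dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]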

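# Attention: for each bar, how much does it attend to every bar (itself included)?
# There is no causal mask in this demo, so older bars also see newer ones.
d = len(bars[0])             # embedding dimension
scale = 1.0 / math.sqrt(d)   # keeps dot products from growing with d
attn_weights = []
for q in bars:
    scores = [dot(q, k) * scale for k in bars]
    attn_weights.append(softmax(scores))  # each row sums to 1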

# Output: for each bar, the attention-weighted sum of V (= bars here)
outputs = []
for weights in attn_weights:
    out = [sum(w * bar[k] for w, bar in zip(weights, bars)) for k in range(d)]
    outputs.append(out)

print("attention weights for bar 0 (most recent):")
print([round(w, 3) for w in attn_weights[-1]])
print("output for bar 0:", [round(o, 3) for o in outputs[-1]])
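One way to sanity-check the result: each row of attention weights comes out of a softmax, so it should sum to 1, and each output is therefore a weighted average that stays between the per-feature minimum and maximum of the input bars. The sketch below recomputes only the most recent bar's row under the same simplification (Q = K = V = bars). The helper name `attend_row` is hypothetical.

```python
import math

bars = [
    [1.0, 0.02, 0.1],    # oldest
    [0.9, -0.01, 0.15],
    [0.85, 0.0, 0.2],
    [0.95, 0.03, 0.12],  # most recent
]

def attend_row(q, keys, values):
    # Scaled dot-product attention for a single query vector
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    es = [math.exp(s - m) for s in scores]
    total = sum(es)
    weights = [e / total for e in es]
    out = [
        sum(w * v[c] for w, v in zip(weights, values))
        for c in range(len(values[0]))
    ]
    return weights, out

weights, out = attend_row(bars[-1], bars, bars)

# Check 1: softmax rows sum to 1
row_sum = sum(weights)

# Check 2: each output feature lies within the range of that feature across bars
within_range = all(
    min(b[c] for b in bars) <= out[c] <= max(b[c] for b in bars)
    for c in range(len(bars[0]))
)

print("row sum:", round(row_sum, 6))
print("output within input range:", within_range)
```

Because these four embeddings are very similar to one another, the weights come out close to uniform, roughly a quarter each. Without learned projections, attention here does little more than average the bars.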
