Loading...
Karpathy's makemore-style transformer, adapted to 'what comes next after these 20 bars?'
MLPs see all features equally. But when predicting the next bar, which of the last 20 bars matters most?
A pump 15 bars ago + a pullback 3 bars ago might signal "bull continuation" more than a flat last bar. Attention lets the model learn these patterns.
Follow Karpathy's makemore walkthrough — conceptually:
Q, K, V = compute_queries_keys_values(bar_sequence)
attention_weights = softmax(Q @ K.T / sqrt(d))
output = attention_weights @ VA full transformer for price data is typically overkill. You need THOUSANDS of examples per regime to train, and crypto barely gives you a few clean regime chunks per year.
This quest is conceptual — understand the mechanism, but don't deploy one on $20.
Implement a tiny scaled dot-product attention over 4 "bar embeddings" and report the output.