TRIBE2
Predict where your video loses the viewer. Powered by Meta's brain-encoding model, mapping predicted cortical response across ~20,000 vertices in real time.
The model
TRIBE v2 is Meta's brain-encoding foundation model. It predicts fMRI-level cortical activation from video, audio, and text using three extractors:
V-JEPA 2 (vision) + Wav2Vec2-BERT 2.0 (audio) + Llama 3.2 (language)
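A minimal structural sketch of how the three streams of extracted features could be fused into per-vertex predictions. The feature dimensions, the concatenate-then-project fusion, the GRU temporal model, and the name TrimodalEncoderSketch are illustrative assumptions, not the actual TRIBE v2 architecture; only the trimodal input and the ~20,000-vertex fsaverage5 output come from the description above.

```python
# Structural sketch only (not the actual TRIBE v2 code). Assumes the three
# pretrained extractors have already produced one feature vector per fMRI
# frame; all dimensions and the fusion scheme are illustrative.
import torch
import torch.nn as nn

N_VERTICES = 20484  # fsaverage5: 10,242 vertices per hemisphere


class TrimodalEncoderSketch(nn.Module):
    def __init__(self, d_video=1024, d_audio=1024, d_text=3072, d_model=512):
        super().__init__()
        d_in = d_video + d_audio + d_text
        self.fuse = nn.Sequential(
            nn.LayerNorm(d_in), nn.Linear(d_in, d_model), nn.GELU()
        )
        self.temporal = nn.GRU(d_model, d_model, batch_first=True)  # placeholder temporal model
        self.head = nn.Linear(d_model, N_VERTICES)  # per-vertex readout

    def forward(self, video_feats, audio_feats, text_feats):
        # Each input: (batch, time, feature_dim), aligned to the same time grid.
        x = torch.cat([video_feats, audio_feats, text_feats], dim=-1)
        x = self.fuse(x)
        x, _ = self.temporal(x)
        return self.head(x)  # (batch, time, N_VERTICES) predicted activation


if __name__ == "__main__":
    model = TrimodalEncoderSketch()
    pred = model(torch.randn(2, 40, 1024), torch.randn(2, 40, 1024), torch.randn(2, 40, 3072))
    print(pred.shape)  # torch.Size([2, 40, 20484])
```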
The output is a prediction across ~20,000 cortical vertices on the fsaverage5 surface. We aggregate those into five zones and derive a composite engagement signal.
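A small sketch, under stated assumptions, of turning the per-vertex prediction into zone signals and a composite. The zone names, the vertex-to-zone label array, the per-zone mean, and the equal-weight composite are hypothetical choices for illustration; only "five zones over ~20,000 fsaverage5 vertices" is taken from the text above.

```python
# Illustrative aggregation: zone names, the label array, and the unweighted
# composite are assumptions; only "five zones over ~20k fsaverage5 vertices"
# comes from the description above.
import numpy as np

ZONES = ["visual", "auditory", "language", "attention", "default"]  # hypothetical labels


def aggregate_zones(vertex_pred, zone_labels):
    """vertex_pred: (time, n_vertices); zone_labels: (n_vertices,) ints in [0, 5)."""
    zone_means = np.stack(
        [vertex_pred[:, zone_labels == z].mean(axis=1) for z in range(len(ZONES))],
        axis=1,
    )  # (time, 5) mean predicted activation per zone
    engagement = zone_means.mean(axis=1)  # (time,) simple unweighted composite
    return zone_means, engagement


rng = np.random.default_rng(0)
pred = rng.standard_normal((40, 20484))   # stand-in for the model output
labels = rng.integers(0, 5, size=20484)   # stand-in for a vertex-to-zone map
zone_means, engagement = aggregate_zones(pred, labels)
print(zone_means.shape, engagement.shape)  # (40, 5) (40,)
```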
Predictions carry a ~5 s hemodynamic offset inherent to fMRI: the BOLD response the model emulates peaks several seconds after the stimulus, so each predicted timestamp reflects content shown slightly earlier in the video. Treat each timestamp as an approximate editing window, not an exact cut point.
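A tiny helper sketch for reading that offset back onto the video timeline. The 5 s delay restates the figure above, but the function name and the ±2 s slack around the estimate are assumptions, not calibrated outputs of the model.

```python
# Reading a dip in the predicted signal back onto the video timeline.
# The 5 s delay restates the figure above; the +/- 2 s slack is an assumption.
HEMODYNAMIC_DELAY_S = 5.0
WINDOW_HALF_WIDTH_S = 2.0


def editing_window(prediction_time_s: float) -> tuple[float, float]:
    """Approximate stimulus-time window that likely drove the response at prediction_time_s."""
    center = max(prediction_time_s - HEMODYNAMIC_DELAY_S, 0.0)
    return max(center - WINDOW_HALF_WIDTH_S, 0.0), center + WINDOW_HALF_WIDTH_S


print(editing_window(92.0))  # a drop predicted at 92 s points to roughly (85.0, 89.0) in the video
```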