Multimodal Event Representation Learning Over Time (MERLOT)