: Sequential models, such as Long Short-Term Memory (LSTM) or 3D Convolutional Networks , capture motion and how objects move over time.
: Assigning categories to video segments (e.g., identifying satellite scenes or action types).
: Using "context-aware" deep features to identify and follow targets across video frames.