If the "text" is visual (like burnt-in captions or signs), you can use Optical Character Recognition (OCR) tools:
: Provides an MP4-to-text converter that generates editable transcripts in multiple languages.
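For burnt-in text, one possible do-it-yourself route is to sample frames from the video and run OCR on each frame. The sketch below assumes the `ffmpeg` and `tesseract` command-line tools are installed and on the PATH. The helper names (`frame_sampling_cmd`, `ocr_cmd`, `dedupe_consecutive`, `ocr_video`) and the one-frame-per-second sampling rate are illustrative choices, not something the tools above prescribe.

```python
import subprocess
from pathlib import Path


def frame_sampling_cmd(video, out_dir, fps=1):
    """Build an ffmpeg command that saves `fps` frames per second as PNG files."""
    pattern = str(Path(out_dir) / "frame_%05d.png")
    return ["ffmpeg", "-y", "-i", str(video), "-vf", f"fps={fps}", pattern]


def ocr_cmd(image, lang="eng"):
    """Build a tesseract command that prints recognized text to stdout."""
    return ["tesseract", str(image), "stdout", "-l", lang]


def dedupe_consecutive(texts):
    """Drop empty results and collapse runs of identical text.

    A caption usually stays on screen for several sampled frames,
    so consecutive frames tend to produce the same OCR output.
    """
    out = []
    for text in texts:
        text = text.strip()
        if text and (not out or out[-1] != text):
            out.append(text)
    return out


def ocr_video(video, out_dir, fps=1, lang="eng"):
    """Sample frames with ffmpeg, OCR each one, and return the distinct texts in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(frame_sampling_cmd(video, out_dir, fps), check=True)
    texts = []
    for frame in sorted(Path(out_dir).glob("frame_*.png")):
        result = subprocess.run(
            ocr_cmd(frame, lang), capture_output=True, text=True, check=True
        )
        texts.append(result.stdout)
    return dedupe_consecutive(texts)
```

Calling `ocr_video("input.mp4", "frames")` (with `input.mp4` as a placeholder file name) would return the recognized strings in the order they appear. Accuracy depends heavily on resolution, font, and background contrast, so cropping to the caption area before OCR may help.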
MP4 is a container format that can hold separate text tracks, such as subtitles or metadata.
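When the text lives in an embedded track, it can often be read out directly without any recognition step. The following is a minimal sketch, assuming `ffprobe` and `ffmpeg` are installed. The function names and the `input.mp4` and `out.srt` file names are hypothetical, and the `"und"` and `"unknown"` fallbacks are illustrative defaults.

```python
import json
import subprocess


def ffprobe_subtitle_cmd(path):
    """Build an ffprobe command that lists only subtitle streams as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language",
        "-of", "json",
        path,
    ]


def parse_subtitle_streams(ffprobe_json):
    """Convert ffprobe's JSON output into (index, codec, language) tuples."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        (
            s["index"],
            s.get("codec_name", "unknown"),
            s.get("tags", {}).get("language", "und"),
        )
        for s in streams
    ]


def ffmpeg_extract_cmd(path, track, out_path):
    """Build an ffmpeg command that writes subtitle track `track` (0-based) to out_path."""
    return ["ffmpeg", "-y", "-i", path, "-map", f"0:s:{track}", out_path]


def list_subtitle_streams(path):
    """Run ffprobe on the file and return its subtitle streams."""
    result = subprocess.run(
        ffprobe_subtitle_cmd(path), capture_output=True, text=True, check=True
    )
    return parse_subtitle_streams(result.stdout)


if __name__ == "__main__":
    for index, codec, language in list_subtitle_streams("input.mp4"):
        print(index, codec, language)
    # Export the first subtitle track; ffmpeg picks the SRT format from the extension.
    subprocess.run(ffmpeg_extract_cmd("input.mp4", 0, "out.srt"), check=True)
```

If `list_subtitle_streams` returns an empty list, the file has no embedded subtitle track, and any visible text is burnt into the picture.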