Multimodal LLMs Basics: How LLMs Process Text, Images, Audio & Videos