Multimodal-Outpost is a curated collection of Colab notebooks for image inference and experimentation with state-of-the-art multimodal vision-language models (VLMs). The notebooks cover OCR, image captioning, and video understanding, and can generate DOCX or PDF documents that contain both the images and the extracted text.
Notebook Name | Link ↗ |
---|---|
Aya-Vision-8B-VideoUnderstanding | Link |
Behemoth-3B-070225-post.1 | Link |
Camel-Doc-OCR-080125 | Link |
Florence-2-Models-Image-Caption | Link |
Gemma3-VL-VideoUnderstanding | Link |
Imgscope-OCR-2B-0527-VideoUnderstanding | Link |
Inkscope-Captions-2B-0526-VideoUnderstanding | Link |
LFM2-VL-1.6B-LiquidAI | Link |
LFM2-VL-450M-LiquidAI | Link |
Lumian-VLR-7B-Thinking-Demo-Notebook | Link |
Lumian2-VLR-7B-Thinking-Demo-Notebook | Link |
Megalodon-OCR-Sync-0713-ColabNotebook | Link |
MiMo-VL-7B-RL-VideoUnderstanding | Link |
MiMo-VL-7B-SFT-VideoUnderstanding | Link |
MonkeyOCR-0709 | Link |
OCRFlux3B | Link |
Qwen2-VL-MessyOCR-VideoUnderstanding | Link |
Qwen2-VL-OCR-2B-Instruct | Link |
Qwen2-VL-VideoUnderstanding | Link |
Qwen2.5-VL-3B-Abliterated-Caption-it(caption) | Link |
Qwen2.5-VL-3B-Instruct | Link |
Qwen2.5-VL-7B-Abliterated-Caption-it | Link |
Qwen2.5-VL-VideoUnderstanding | Link |
RolmOCR-Qwen2.5-VL-VideoUnderstanding | Link |
SmolDocling-256M-preview | Link |
monkey-OCR | Link |
moondream2-2025-06-21 | Link |
nanonets-OCR | Link |
olmOCR-Qwen2-VL-VideoUnderstanding | Link |
typhoon-OCR | Link |
typhoon-ocr-7b-Qwen2.5VL-VideoUnderstanding | Link |
- Extracts text from images using various OCR models
- Supports image captioning and multimodal inference
- Embeds images and extracted text into DOCX or PDF formats
- Designed for quick deployment via Google Colab
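
The image-plus-text export can be illustrated with ReportLab, which appears in the tech stack. The sketch below is not taken from the notebooks: `ocr_text_to_paragraphs` and `build_pdf` are hypothetical helper names, and the size limits are arbitrary defaults chosen so the image fits on an A4 page. It assumes the OCR model has already returned plain text.

```python
from xml.sax.saxutils import escape


def ocr_text_to_paragraphs(text):
    """Split OCR output on blank lines and escape it for ReportLab markup."""
    blocks = [b.strip() for b in text.replace("\r\n", "\n").split("\n\n")]
    # Paragraph() parses an XML-like mini-language, so &, < and > are escaped
    # first; single newlines then become explicit line breaks.
    return [escape(b).replace("\n", "<br/>") for b in blocks if b]


def build_pdf(image_path, text, out_path, max_width=400, max_height=500):
    """Write a PDF with the source image followed by its extracted text.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    # Imported lazily so the text helper above works without ReportLab.
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.lib.utils import ImageReader
    from reportlab.platypus import Image, Paragraph, SimpleDocTemplate, Spacer

    # Shrink the image (never enlarge it) while keeping its aspect ratio.
    width, height = ImageReader(image_path).getSize()
    scale = min(1.0, max_width / width, max_height / height)
    story = [
        Image(image_path, width=width * scale, height=height * scale),
        Spacer(1, 12),
    ]

    body_style = getSampleStyleSheet()["BodyText"]
    for block in ocr_text_to_paragraphs(text):
        story.append(Paragraph(block, body_style))
        story.append(Spacer(1, 6))

    SimpleDocTemplate(out_path, pagesize=A4).build(story)
```

A call such as `build_pdf("page.png", extracted_text, "report.pdf")` (file names are placeholders) would produce a one-image report. Escaping matters because OCR output often contains characters like `&` or `<` that ReportLab would otherwise try to parse as markup.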

- Python
- PyTorch
- Hugging Face Transformers
- ReportLab
- Gradio (for UI)
- Models: Qwen2.5-VL based, and others
All dependencies are automatically installed in the Colab environment.
Created and maintained by PRITHIVSAKTHIUR