Combine text, images, and documents for richer AI interactions
Modern LLMs can process images, PDFs, screenshots, and other visual inputs alongside text. Multi-modal prompting unlocks new use cases: analyzing charts, extracting data from documents, answering questions about diagrams, and more.
Just like text prompts, image analysis benefits from specific, clear instructions.
❌ Vague:
[Uploads chart image]
"What do you see?"
✓ Specific:
[Uploads chart image]
"Analyze this sales chart. Identify: 1) Overall trend, 2) Largest month-over-month change, 3) Any anomalies that need investigation."
Guide the LLM's attention to specific areas when analyzing complex images.
[Uploads UI mockup screenshot]
Review this dashboard mockup for UX issues:
Some LLMs can process multiple images in a single prompt for comparison or context.
Use Cases:
LLMs can read PDFs, invoices, receipts, contracts, and extract specific information.
[Uploads invoice PDF]
Extract the following fields from this invoice and return as JSON:
{
"invoice_number": "string",
"invoice_date": "YYYY-MM-DD",
"vendor_name": "string",
"vendor_address": "string",
"total_amount": "number",
"currency": "string",
"line_items": [
{
"description": "string",
"quantity": "number",
"unit_price": "number",
"total": "number"
}
],
"payment_terms": "string",
"due_date": "YYYY-MM-DD or null"
}
If any field is not found, use null.
Upload screenshots for bug reports, UX feedback, or automated testing insights.
Example Prompts:
Bug Report Analysis
"This is a screenshot of an error. Describe the issue, identify likely causes, and suggest debugging steps."
Accessibility Audit
"Review this page screenshot for WCAG 2.1 AA compliance. Check color contrast, text size, touch target sizes, and semantic structure."
Competitive Analysis
"Analyze this competitor's pricing page. What persuasion tactics are they using? How does their information hierarchy work?"
For long PDFs, specify which sections or pages to focus on, or break into chunks.
Strategies:
Frame questions precisely to get accurate answers about charts, diagrams, or photos.
[Uploads architecture diagram]
Questions:
LLMs can read values from charts, though OCR accuracy varies. Verify critical numbers.
Example Prompts:
⚠️ Note: Always verify extracted numbers for financial or critical decisions
Ask the LLM to identify, locate, or count specific objects in images.
Use Cases:
Clear, high-resolution images produce better results than blurry or low-res images
Provide written context about what the image shows and what you need from it
Request JSON, tables, or structured formats when extracting information from documents
Different models have varying vision capabilities — test with your specific use case
Vision models can misread text in images. Verify numbers and dates for important use cases
Apply the same privacy considerations to images as you would to text data
We can help you build AI workflows that process images, documents, and text together