Error Handling & Fallbacks
Build resilient AI workflows that gracefully handle failures and edge cases
LLM-powered workflows introduce unique challenges: API rate limits, unpredictable output formats, and variable response times. Proper error handling is essential for production reliability.
Common Failure Points in AI Workflows
API Rate Limits
LLM providers enforce rate limits on requests and tokens. Exceeding them returns errors (typically HTTP 429) that fail the workflow unless they are handled.
Timeout Issues
Complex prompts or large inputs can cause requests to time out.
Format Inconsistency
LLMs may return unexpected formats despite clear instructions.
Service Outages
API providers experience downtime. Your workflow needs to handle this.
Content Moderation
Some inputs trigger content filters, causing request rejections.
Context Limits
Inputs that exceed the model's context window cause truncation or errors.
Strategy 1: Implement Retry Logic
Transient errors (rate limits, temporary outages) often resolve themselves. Implement exponential backoff to retry failed requests.
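As a rough illustration of exponential backoff outside a no-code tool, here is a minimal Python sketch. The names TransientError, backoff_delays, and call_with_retry are hypothetical helpers, not part of any provider SDK; in practice you would map your provider's rate-limit and outage errors onto the retryable exception. Random jitter is added on the assumption that many workflows may retry at once and should not hit the API in lockstep.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (rate limits, brief outages)."""


def backoff_delays(max_retries, base_delay=1.0, factor=2.0, max_delay=60.0):
    """Return the backoff ceiling before each retry: base, base*factor, ..."""
    return [min(max_delay, base_delay * factor ** i) for i in range(max_retries)]


def call_with_retry(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on TransientError, wait with exponential backoff and retry."""
    delays = backoff_delays(max_retries, base_delay)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller's fallback take over
            # Jitter: wait a random time between 0 and the backoff ceiling
            sleep(random.uniform(0, delays[attempt]))
```

With base_delay=5.0 and four retries, the backoff ceilings are 5, 10, 20, and 40 seconds. Non-transient errors (for example an invalid API key) are deliberately not caught, since retrying them only wastes time.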
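n8n Implementation
- 1 Open the LLM node → Settings tab
- 2 Enable Continue On Fail so downstream nodes can handle the error once retries are exhausted
- 3 Enable Retry On Fail and set Max Tries: 3
- 4 Set Wait Between Tries: 5000ms (a fixed delay between attempts; for exponential backoff, build a loop with a Wait node)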
Zapier Implementation
On paid plans, Zapier's Autoreplay feature (when enabled) automatically retries failed steps several times with increasing delays between attempts. For custom retry logic:
- • Add a Delay action after the LLM step
- • Use Paths to create retry branches
- • Add a Filter to detect errors and route to retry path
Make Implementation
- 1 Right-click the LLM module → Add error handler
- 2 Select the Break error handler
- 3 Enable automatic completion so Make retries the failed execution
- 4 Set the interval between attempts
- 5 Set the number of attempts to limit retries
Strategy 2: Use Fallback LLM Providers
Don't rely on a single LLM provider. If OpenAI fails, automatically try Anthropic or Google AI as a backup.
Implementation Pattern
Try primary provider (e.g., OpenAI GPT-4)
If error → Try secondary provider (e.g., Anthropic Claude)
If error → Try tertiary provider (e.g., Google Gemini)
If all fail → Execute fallback action
Pro Tip:
Standardize your prompts across providers. Test that each fallback provider produces acceptable output for your use case.
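The pattern above can be sketched in Python as an ordered list of providers. This is only an illustration: run_with_fallbacks is a hypothetical helper, and each provider is assumed to be wrapped in a plain function that takes a prompt and either returns text or raises an exception.

```python
def run_with_fallbacks(prompt, providers, fallback_action):
    """Try each (name, call) pair in order; return the first successful result."""
    errors = []
    for name, call in providers:
        try:
            return {"provider": name, "output": call(prompt), "errors": errors}
        except Exception as exc:  # in practice, catch the provider SDK's error types
            errors.append(f"{name}: {exc}")
    # Every provider failed: run the fallback action instead
    return {"provider": None, "output": fallback_action(prompt), "errors": errors}


# Stub providers standing in for real API calls (assumptions for the example)
def primary(prompt):
    raise RuntimeError("503 service unavailable")


def secondary(prompt):
    return f"summary of: {prompt}"


result = run_with_fallbacks(
    "quarterly report",
    [("primary", primary), ("secondary", secondary)],
    fallback_action=lambda p: "queued for manual review",
)
```

Returning the collected errors alongside the output makes it possible to log which provider failed even when a later one succeeds.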
Strategy 3: Validate LLM Output
Never assume the LLM's output will match your expected format. Always validate before using the response.
JSON Validation
If you request JSON output, verify that it parses:
- • n8n: Use a Code node that calls JSON.parse() inside a try/catch block
- • Zapier: Use a Code by Zapier step that calls JSON.parse()
- • Make: Use the JSON → Parse JSON module
If parsing fails, trigger an error handler or fallback action.
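Outside the no-code tools, the same check might look like the following Python sketch. parse_llm_json is a hypothetical helper; it assumes you expect a JSON object, and that the model may wrap its answer in a Markdown code fence, which is a common deviation but not a guaranteed one.

```python
import json

TICKS = "`" * 3  # Markdown code-fence marker


def strip_code_fence(text):
    """Remove a surrounding Markdown fence (optionally tagged json), if present."""
    text = text.strip()
    if text.startswith(TICKS) and text.endswith(TICKS):
        text = text[3:-3]
        if text.startswith("json"):
            text = text[4:]
    return text.strip()


def parse_llm_json(raw):
    """Return the parsed JSON object, or None if the reply is not a JSON object."""
    try:
        data = json.loads(strip_code_fence(raw))
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```

A None result is the signal to trigger the error handler or fallback action rather than passing the raw text downstream.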
Content Validation
Check that required fields are present and non-empty:
Example Filter Logic:
IF response.category exists AND
response.category is not empty AND
response.category is in [valid options]
THEN continue
ELSE trigger error handler
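The filter logic above translates fairly directly into code. In this Python sketch, VALID_CATEGORIES and is_valid_category are hypothetical names, and the category values are placeholders for whatever options your prompt allows.

```python
VALID_CATEGORIES = {"billing", "technical", "sales"}  # hypothetical options


def is_valid_category(response):
    """True only if category exists, is non-empty, and is an allowed value."""
    category = response.get("category")
    return (
        isinstance(category, str)
        and category.strip() != ""
        and category.strip().lower() in VALID_CATEGORIES
    )
```

The comparison is case-insensitive on the assumption that "Billing" and "billing" should be treated as the same category; drop the lower() call if your downstream steps need an exact match.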
Length Validation
Ensure output meets minimum/maximum length requirements:
- • Check word count or character count
- • Verify output isn't truncated (doesn't end mid-sentence)
- • If too short, retry with modified prompt
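A minimal Python sketch of these checks follows. check_length is a hypothetical helper, the word limits are arbitrary defaults, and the truncation test is only a heuristic based on closing punctuation.

```python
def check_length(text, min_words=20, max_words=300):
    """Return a list of problems; an empty list means the output passed."""
    problems = []
    words = len(text.split())
    if words < min_words:
        problems.append("too_short")
    if words > max_words:
        problems.append("too_long")
    # Heuristic only: text that does not end in closing punctuation may be cut off
    if not text.rstrip().endswith((".", "!", "?", '"', ")")):
        problems.append("possibly_truncated")
    return problems
```

Where the provider's API reports why generation stopped (for example a finish or stop reason indicating the token limit was reached), that field is a more reliable truncation signal than punctuation.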
Strategy 4: Human-in-the-Loop Fallback
For critical workflows, add a human review step that takes over when automated processing fails or confidence is low.
Notification Approach
- • Send Slack/email alert to team member
- • Include error details and original input
- • Provide link to manually complete the task
- • Track how often manual intervention is needed
Queue Approach
- • Add failed items to a review queue (Airtable, Notion)
- • Team reviews queue daily
- • Once an item is resolved, the workflow resumes from that point
- • Analyze patterns to improve automation
Strategy 5: Graceful Degradation
Define what happens when the LLM fails entirely. The workflow should still complete with reduced functionality.
Example: Email Triage
Ideal: LLM categorizes email and routes to correct team
Fallback: Route all failed emails to general support queue
Result: No emails are lost; they just need manual triage
Example: Content Generation
Ideal: LLM generates personalized email copy
Fallback: Use pre-written template with merge fields
Result: Email still sends; less personalized but functional
Example: Data Extraction
Ideal: LLM extracts structured data from document
Fallback: Save raw document to review folder
Result: Data can be manually extracted later
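To make the email triage example concrete, here is a small Python sketch of the same idea. The queue names and the route_email helper are hypothetical, and classify stands in for whatever LLM call your workflow makes.

```python
GENERAL_QUEUE = "general-support"  # hypothetical fallback queue
TEAM_QUEUES = {"billing": "billing-team", "technical": "tech-support"}


def route_email(email_text, classify):
    """Return (queue, degraded). classify is any callable returning a category."""
    try:
        category = classify(email_text)
    except Exception:
        # LLM call failed: degrade to the general queue instead of dropping the email
        return GENERAL_QUEUE, True
    queue = TEAM_QUEUES.get(category) if isinstance(category, str) else None
    if queue is None:
        # Unknown or malformed category: also degrade rather than guess
        return GENERAL_QUEUE, True
    return queue, False
```

The degraded flag lets the workflow count how often the fallback path is taken, which feeds directly into error tracking.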
Monitoring & Alerting
Set Up Failure Alerts
Configure notifications for workflow failures:
- • n8n: Add an Error Trigger workflow that sends Slack alerts
- • Zapier: Enable "Zap errors" notifications in settings
- • Make: Add error handler routes that notify your team
Track Error Metrics
Log errors to a spreadsheet or database to track:
- • Error frequency and type
- • Which LLM provider failed
- • Time of day patterns
- • Success rate of retries
- • Impact on business operations
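As one possible shape for such a log, the Python sketch below records each error as a structured event and summarizes the metrics listed above. ErrorEvent and summarize are hypothetical names; in a real workflow the events would be rows in your spreadsheet or database rather than an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ErrorEvent:
    timestamp: datetime
    provider: str
    error_type: str
    retry_succeeded: bool


def summarize(events):
    """Aggregate error events by type, provider, and hour, plus retry success rate."""
    rate = sum(e.retry_succeeded for e in events) / len(events) if events else None
    return {
        "by_type": Counter(e.error_type for e in events),
        "by_provider": Counter(e.provider for e in events),
        "by_hour": Counter(e.timestamp.hour for e in events),
        "retry_success_rate": rate,
    }


# Made-up sample events, for illustration only
events = [
    ErrorEvent(datetime(2024, 1, 5, 9, 15, tzinfo=timezone.utc), "openai", "rate_limit", True),
    ErrorEvent(datetime(2024, 1, 5, 9, 40, tzinfo=timezone.utc), "openai", "rate_limit", False),
    ErrorEvent(datetime(2024, 1, 5, 14, 2, tzinfo=timezone.utc), "anthropic", "timeout", True),
]
summary = summarize(events)
```

Business impact is not captured here because it depends on the workflow; a free-text or severity field on each event is one way to add it.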
Testing Error Scenarios
Before Going to Production
- ✅ Test with invalid API keys to verify error handling works
- ✅ Send malformed inputs to see how the LLM responds
- ✅ Test edge cases like empty inputs or very long inputs
- ✅ Simulate rate limits by running many requests quickly
- ✅ Verify notifications are sent when errors occur
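If part of your workflow lives in code, failures can also be simulated with a test double instead of a real API. In this Python sketch, FakeLLM, AuthError, and handle are hypothetical names; handle stands in for whatever error-handling step you want to verify.

```python
class AuthError(Exception):
    """Hypothetical error standing in for an invalid-API-key response."""


class FakeLLM:
    """Test double that raises the scripted errors before returning a reply."""

    def __init__(self, failures, reply="ok"):
        self.failures = list(failures)
        self.reply = reply
        self.calls = 0

    def complete(self, prompt):
        self.calls += 1
        if self.failures:
            raise self.failures.pop(0)
        return self.reply


def handle(prompt, llm, notify):
    """Call the LLM; on any error, send a notification and return None."""
    try:
        return llm.complete(prompt)
    except Exception as exc:
        notify(f"LLM step failed: {exc}")
        return None


alerts = []
outcome = handle("classify this", FakeLLM([AuthError("invalid API key")]), alerts.append)
```

After this runs, outcome is None and alerts holds one message, which confirms both that the error was caught and that the notification path fired.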
Error Handling Best Practices
1. Fail Fast: Don't wait until the end of a workflow to check for errors. Validate at each step.
2. Log Everything: Capture error details, timestamps, and context for troubleshooting.
3. Set Timeouts: Don't let workflows hang indefinitely. Set reasonable timeout limits.
4. Plan for Scale: The number of errors grows with volume, and rate limits are hit more often. Design for the worst-case scenario.
5. Iterate: Review error logs regularly and refine your error handling based on real-world failures.