Build resilient AI workflows that gracefully handle failures and edge cases
LLM-powered workflows introduce unique challenges: API rate limits, unpredictable output formats, and variable response times. Proper error handling is essential for production reliability.
LLM providers enforce rate limits. Exceeding them causes workflow failures.
Complex prompts or large inputs can cause requests to time out.
LLMs may return unexpected formats despite clear instructions.
API providers experience downtime. Your workflow needs to handle this.
Some inputs trigger content filters, causing request rejections.
Inputs that exceed the model's context window cause truncation or errors.
Transient errors (rate limits, temporary outages) often resolve themselves. Implement exponential backoff to retry failed requests.
Zapier automatically retries failed steps up to 3 times over approximately 2 hours. For anything beyond that, implement custom retry logic yourself.
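As a sketch, custom exponential backoff might look like the following; `call_llm` is a placeholder for your actual provider call, and the retry counts and delays are example values, not Zapier defaults:

```python
import random
import time

def call_with_backoff(call_llm, max_retries=5, base_delay=1.0):
    """Retry a flaky LLM call, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return call_llm()
        except Exception:  # in practice, catch provider-specific errors
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the workflow
            # Exponential backoff with jitter: wait 1x, 2x, 4x, ... the base
            # delay, plus random noise so retries don't stampede the API
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```

Catching only transient error types (rate limits, timeouts) and letting permanent errors fail immediately is usually better than the blanket `except` shown here.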
Don't rely on a single LLM provider. If OpenAI fails, automatically try Anthropic or Google AI as a backup.
Try primary provider (e.g., OpenAI GPT-4)
If error → Try secondary provider (e.g., Anthropic Claude)
If error → Try tertiary provider (e.g., Google Gemini)
If all fail → Execute fallback action
Pro Tip:
Standardize your prompts across providers. Test that each fallback provider produces acceptable output for your use case.
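The fallback chain above can be sketched as a simple loop; the `(name, callable)` pairs are placeholders for real API clients, ordered from primary to tertiary:

```python
def generate_with_fallback(prompt, providers):
    """Try each provider in order; return the first successful response.

    `providers` is a list of (name, callable) pairs, e.g. OpenAI first,
    then Anthropic, then Google. Each callable takes the prompt and
    returns the model's response text.
    """
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    # All providers failed: hand control to the workflow's fallback action
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

Collecting every provider's error message before raising makes the eventual failure log far easier to troubleshoot.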
Never assume the LLM's output will match your expected format. Always validate before using the response.
If you request JSON output, validate that the response actually parses before using it.
If parsing fails, trigger an error handler or fallback action.
Check that required fields are present and non-empty:
Example Filter Logic:
IF response.category exists AND
response.category is not empty AND
response.category is in [valid options]
THEN continue
ELSE trigger error handler
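The filter logic above translates directly to code; the category names here are hypothetical stand-ins for your valid options:

```python
VALID_CATEGORIES = {"billing", "technical", "sales"}  # example options

def is_valid(response):
    """Mirror the filter: field exists, is non-empty, and is an allowed value."""
    category = response.get("category")
    return bool(category) and category in VALID_CATEGORIES
```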
Ensure the output meets minimum and maximum length requirements before passing it downstream.
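A length check is one line; the limits below are illustrative and should come from your use case:

```python
def within_length(text, min_chars=20, max_chars=2000):
    """Reject suspiciously short or runaway-long outputs (example limits)."""
    return min_chars <= len(text.strip()) <= max_chars
```

Very short outputs often indicate a refusal or an error message rather than a real answer, so a minimum bound catches more failures than you might expect.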
For critical workflows, have a human review step when automated processing fails or confidence is low.
Define what happens when the LLM fails entirely. The workflow should still complete with reduced functionality.
Ideal: LLM categorizes email and routes to correct team
Fallback: Route all failed emails to general support queue
Result: No emails are lost; they just need manual triage
Ideal: LLM generates personalized email copy
Fallback: Use pre-written template with merge fields
Result: Email still sends; less personalized but functional
Ideal: LLM extracts structured data from document
Fallback: Save raw document to review folder
Result: Data can be manually extracted later
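The email-routing example above can be sketched as follows; `llm_categorize` and the queue names are placeholders for your own classifier step and destinations:

```python
def route_email(email, llm_categorize, team_queues, general_queue):
    """Route to the LLM's chosen team queue, or degrade to general support."""
    try:
        category = llm_categorize(email)
        if category in team_queues:
            return team_queues[category]
    except Exception:
        pass  # LLM failed entirely; fall through to graceful degradation
    # Fallback: nothing is lost, the email just needs manual triage
    return general_queue
```

Note that the fallback also fires when the LLM returns an unrecognized category, covering both total failure and bad output with one path.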
Configure notifications for workflow failures so you learn about them promptly.
Log errors to a spreadsheet or database so you can track failure patterns over time.
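A minimal log appends one timestamped row per failure; the file path and column choices are examples:

```python
import csv
from datetime import datetime, timezone

def log_error(step, error, context, path="workflow_errors.csv"):
    """Append a timestamped error row (step, message, context) for triage."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            step,
            str(error),
            context,
        ])
```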
1. Fail Fast: Don't wait until the end of a workflow to check for errors. Validate at each step.
2. Log Everything: Capture error details, timestamps, and context for troubleshooting.
3. Set Timeouts: Don't let workflows hang indefinitely. Set reasonable timeout limits.
4. Plan for Scale: Error rates increase with volume. Design for the worst-case scenario.
5. Iterate: Review error logs regularly and refine your error handling based on real-world failures.
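Best practice 3 (set timeouts) can be enforced around any blocking call; one sketch using Python's standard library, where `fn` stands in for your LLM request:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def call_with_timeout(fn, timeout_s=30.0):
    """Run fn, but give up after timeout_s seconds instead of hanging."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"LLM call exceeded {timeout_s}s")
    finally:
        # Don't block waiting for the (possibly hung) worker thread
        pool.shutdown(wait=False)
```

One caveat: the abandoned thread keeps running in the background until its request finishes, so pair this with a timeout on the HTTP request itself where the client library supports one.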