Error Handling & Fallbacks

Build resilient AI workflows that gracefully handle failures and edge cases

LLM-powered workflows introduce unique challenges: API rate limits, unpredictable output formats, and variable response times. Proper error handling is essential for production reliability.

Common Failure Points in AI Workflows

🚫 API Rate Limits: LLM providers enforce rate limits. Exceeding them causes workflow failures.

⏱️ Timeout Issues: Complex prompts or large inputs can cause requests to time out.

📋 Format Inconsistency: LLMs may return unexpected formats despite clear instructions.

🔌 Service Outages: API providers experience downtime. Your workflow needs to handle this.

💭 Content Moderation: Some inputs trigger content filters, causing request rejections.

🎯 Context Limits: Input that exceeds the model's context window causes truncation or errors.

Strategy 1: Implement Retry Logic

Transient errors (rate limits, temporary outages) often resolve themselves. Implement exponential backoff to retry failed requests.

N8N Implementation

  1. Right-click the LLM node → Settings
  2. Enable Continue On Fail
  3. Set Retry On Fail: 3 attempts
  4. Set Wait Between Tries: 5000ms (increases exponentially)

Zapier Implementation

Zapier automatically retries failed steps up to 3 times over approximately 2 hours. For custom retry logic:

  • Add a Delay action after the LLM step
  • Use Paths to create retry branches
  • Add a Filter to detect errors and route to the retry path

Make Implementation

  1. Right-click the LLM module → Add error handler
  2. Select the Resume error handler
  3. Add a Sleep module (5-10 seconds)
  4. Connect back to the LLM module to retry
  5. Add a counter to limit retries (use a Data Store)
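Outside these platforms (for example, in a custom code step), the same retry pattern can be sketched in plain Python. Here `TransientError` is a hypothetical stand-in for whatever rate-limit or outage exception your LLM client raises:

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for rate-limit or temporary-outage errors."""


def retry_with_backoff(fn, max_attempts=3, base_delay=5.0):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of retries; let the error propagate
            # 5s, 10s, 20s... plus jitter so parallel runs don't retry in lockstep
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

Only transient errors are retried here; permanent failures (bad API key, invalid request) should fail immediately rather than burn retry attempts.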

Strategy 2: Use Fallback LLM Providers

Don't rely on a single LLM provider. If OpenAI fails, automatically try Anthropic or Google AI as a backup.

Implementation Pattern

  1. Try primary provider (e.g., OpenAI GPT-4)
  2. If error → Try secondary provider (e.g., Anthropic Claude)
  3. If error → Try tertiary provider (e.g., Google Gemini)
  4. If all fail → Execute fallback action

💡 Pro Tip: Standardize your prompts across providers. Test that each fallback provider produces acceptable output for your use case.
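In a code step, this cascade reduces to a loop over providers. This is a sketch, not any particular SDK: the `(name, call_fn)` pairs are placeholders for your actual client calls.

```python
def call_with_fallback(providers, prompt):
    """Try each provider in order; return the first successful response.

    `providers` is a list of (name, call_fn) pairs, where call_fn takes a
    prompt string and either returns text or raises an exception.
    """
    errors = []
    for name, call_fn in providers:
        try:
            return call_fn(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    # All providers failed: surface the collected errors for logging/alerting
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

Collecting per-provider errors before raising makes the final alert actionable: you can see whether one provider or all of them went down.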

Strategy 3: Validate LLM Output

Never assume the LLM's output will match your expected format. Always validate before using the response.

JSON Validation

If you request JSON output, validate it's parseable:

  • N8N: Use the JSON node with "Keep Only Set" option
  • Zapier: Use Formatter → Utilities → Parse JSON
  • Make: Use the JSON → Parse JSON module

If parsing fails, trigger an error handler or fallback action.
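In a custom code step, a tolerant JSON check might look like the sketch below. LLMs often wrap JSON in markdown code fences even when told not to, so this version strips those first; returning `None` is the signal to route to your error handler.

```python
import json


def parse_llm_json(raw: str):
    """Parse an LLM response as JSON; return None if it is not valid JSON."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (and an optional "json" language tag)
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence
        text = text.rsplit("```", 1)[0]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```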

Content Validation

Check that required fields are present and non-empty:

Example Filter Logic:

IF response.category exists AND
   response.category is not empty AND
   response.category is in [valid options]
THEN continue
ELSE trigger error handler
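The same filter logic, sketched in Python. The category set is hypothetical; substitute your own valid options.

```python
# Hypothetical set of valid options for this workflow
VALID_CATEGORIES = {"billing", "technical", "sales"}


def validate_category(response: dict) -> bool:
    """Field exists, is non-empty, and is one of the valid options."""
    category = response.get("category")
    return bool(category) and category in VALID_CATEGORIES
```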

Length Validation

Ensure output meets minimum/maximum length requirements:

  • Check word count or character count
  • Verify output isn't truncated (doesn't end mid-sentence)
  • If too short, retry with a modified prompt
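A minimal length check along these lines; the word-count bounds and the end-punctuation heuristic for truncation are illustrative, not definitive, so tune them to your content.

```python
def validate_length(text: str, min_words: int = 10, max_words: int = 500) -> bool:
    """Check word-count bounds and that the text doesn't end mid-sentence."""
    words = text.split()
    if not (min_words <= len(words) <= max_words):
        return False
    # A response without terminal punctuation was likely cut off mid-sentence
    return text.rstrip().endswith((".", "!", "?", '"', ")"))
```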

Strategy 4: Human-in-the-Loop Fallback

For critical workflows, have a human review step when automated processing fails or confidence is low.

Notification Approach

  • Send a Slack/email alert to a team member
  • Include error details and the original input
  • Provide a link to manually complete the task
  • Track how often manual intervention is needed

Queue Approach

  • Add failed items to a review queue (Airtable, Notion)
  • Team reviews the queue daily
  • When resolved, the workflow continues from that point
  • Analyze patterns to improve automation

Strategy 5: Graceful Degradation

Define what happens when the LLM fails entirely. The workflow should still complete with reduced functionality.

Example: Email Triage

Ideal: LLM categorizes email and routes to correct team

Fallback: Route all failed emails to general support queue

Result: No emails are lost; they just need manual triage

Example: Content Generation

Ideal: LLM generates personalized email copy

Fallback: Use pre-written template with merge fields

Result: Email still sends; less personalized but functional

Example: Data Extraction

Ideal: LLM extracts structured data from document

Fallback: Save raw document to review folder

Result: Data can be manually extracted later
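The content-generation example can be sketched as a try/except around the LLM call. The template text and the `llm_generate` callable are placeholders for your own copy and client:

```python
# Placeholder template; less personalized, but the email still sends
FALLBACK_TEMPLATE = "Hi {name}, thanks for reaching out. We'll be in touch soon."


def generate_email(name: str, llm_generate) -> str:
    """Prefer LLM-personalized copy; degrade to a template if the LLM fails."""
    try:
        return llm_generate(name)
    except Exception:
        return FALLBACK_TEMPLATE.format(name=name)
```

The same shape covers the other two examples: the `except` branch routes the email to the general queue, or saves the raw document to a review folder.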

Monitoring & Alerting

Set Up Failure Alerts

Configure notifications for workflow failures:

  • N8N: Add Error Trigger workflow that sends Slack alerts
  • Zapier: Enable "Zap errors" notifications in settings
  • Make: Add error handler routes that notify your team
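For a custom alert, a Slack incoming webhook accepts a small JSON payload with a `text` field. A sketch, with the payload builder separated out so it can be tested without a network call (the webhook URL would be your own):

```python
import json
import urllib.request


def build_slack_alert(workflow: str, error: str) -> bytes:
    """Build the JSON body for a Slack incoming-webhook failure alert."""
    return json.dumps({"text": f":warning: {workflow} failed: {error}"}).encode()


def send_slack_alert(webhook_url: str, workflow: str, error: str) -> None:
    """POST the alert to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=build_slack_alert(workflow, error),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
```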

Track Error Metrics

Log errors to a spreadsheet or database to track:

  • Error frequency and type
  • Which LLM provider failed
  • Time-of-day patterns
  • Success rate of retries
  • Impact on business operations
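A minimal error log can be a CSV file appended to on each failure; the column set below mirrors the metrics listed above.

```python
import csv
from datetime import datetime, timezone


def log_error(path, provider, error_type, message, retried_ok):
    """Append one error record to a CSV file for later analysis."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),  # for time-of-day patterns
            provider,      # which LLM provider failed
            error_type,    # e.g. "rate_limit", "timeout", "bad_format"
            message,       # raw error detail for troubleshooting
            retried_ok,    # whether a retry eventually succeeded
        ])
```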

Testing Error Scenarios

Before Going to Production

  • Test with invalid API keys to verify error handling works
  • Send malformed inputs to see how the LLM responds
  • Test edge cases like empty inputs or very long inputs
  • Simulate rate limits by running many requests quickly
  • Verify notifications are sent when errors occur

Error Handling Best Practices

1. Fail Fast: Don't wait until the end of a workflow to check for errors. Validate at each step.

2. Log Everything: Capture error details, timestamps, and context for troubleshooting.

3. Set Timeouts: Don't let workflows hang indefinitely. Set reasonable timeout limits.

4. Plan for Scale: Error rates increase with volume. Design for the worst-case scenario.

5. Iterate: Review error logs regularly and refine your error handling based on real-world failures.
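Best practice 3 can be enforced even when a client offers no timeout option, by running the call in a worker thread and abandoning it after a deadline. A standard-library sketch:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(fn, timeout_seconds):
    """Run fn() in a worker thread; give up after timeout_seconds.

    Note: the worker thread is abandoned, not killed. This bounds how long
    the workflow waits, not the underlying request itself.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn)
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        raise TimeoutError(f"call exceeded {timeout_seconds}s")
    finally:
        pool.shutdown(wait=False)  # don't block on the abandoned worker
```

Prefer your HTTP client's native timeout parameter when it has one; this wrapper is a fallback for calls you can't otherwise bound.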