You set it up yourself. It worked. Then it stopped. Data stopped flowing, things fell through the cracks, and someone went back to doing it manually.
The tools are not bad. The problem is how they were set up. Before trying to fix it, take the Automation Readiness Checklist to see where you stand.
The good news is that most broken automations fail for a small number of predictable reasons. Once you know what to look for, you can diagnose the problem in minutes and either fix it or make a clear decision to rebuild.
Why Do DIY Automations Break?
1. No Error Handling
Built for the happy path only. A blank field, an API timeout, or a wrong date format, and the chain stops silently.
How to troubleshoot this: Go to your automation platform's task history and look for failed runs. In Zapier, this is under "Zap History." In Make, it is "Scenario History." Filter by errors. You will usually see a pattern: the same step failing repeatedly because of a data format the automation was not designed to handle. The fix is to add a filter or formatter step before the failure point that catches the unexpected input and either corrects it, skips it, or sends an alert.
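If your platform lets you run a code step, the "catch the unexpected input" idea can be sketched in Python roughly like this. The field names (`email`, `order_date`) and the list of date formats are assumptions for illustration, not something your automation necessarily uses.

```python
from datetime import datetime

# Date formats this hypothetical automation has seen in practice (an assumption).
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def guard_record(record):
    """Return (action, record) where action is 'pass', 'skip', or 'alert'."""
    email = (record.get("email") or "").strip()
    if not email:
        # Blank required field: skip instead of letting a later step fail.
        return "skip", record

    raw_date = (record.get("order_date") or "").strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw_date, fmt)
        except ValueError:
            continue
        # Correct the input: normalize every known format to ISO 8601.
        cleaned = dict(record, email=email, order_date=parsed.date().isoformat())
        return "pass", cleaned

    # Unknown date format: do not guess, hand it to a human.
    return "alert", record
```

A record with `order_date` set to "03/15/2024" comes back as "2024-03-15" with the action "pass". A record with no email comes back as "skip". Anything with a date the function does not recognize comes back as "alert", so it never reaches the step that used to fail.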
2. Brittle Triggers
- Spreadsheet columns. Someone inserts a column. Automation watches the wrong one forever.
- Email subjects. Platform changes the format. Automation stops matching.
- Webhook URLs. App updates endpoints. Old URL returns 404.
How to troubleshoot this: If your automation stopped triggering entirely (no runs at all, not even failed ones), the trigger itself is broken. For spreadsheet-based triggers, open the spreadsheet and verify the column structure has not changed. For email triggers, check if the subject line format changed. For webhooks, send a test payload from the source app and check whether the automation platform received it. Most platforms have a "test trigger" button that shows you exactly what data is coming through.
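For spreadsheet triggers, the column check can be turned into a small script. This is a sketch that assumes you can export the header row as a list. The column names in `EXPECTED_HEADERS` are made up for the example.

```python
# The header row the automation was built against (hypothetical names).
EXPECTED_HEADERS = ["Order ID", "Customer Email", "Order Date", "Total"]


def find_header_drift(actual_headers, expected_headers=EXPECTED_HEADERS):
    """Return a list of human-readable differences; an empty list means no drift."""
    problems = []
    for position, expected in enumerate(expected_headers):
        actual = actual_headers[position] if position < len(actual_headers) else None
        if actual != expected:
            problems.append(
                f"column {position + 1}: expected {expected!r}, found {actual!r}"
            )
    # Anything past the expected width is a column nobody planned for.
    extra = actual_headers[len(expected_headers):]
    if extra:
        problems.append(f"unexpected extra columns: {extra}")
    return problems
```

If someone inserts a "Notes" column in position 2, every column after it is reported as shifted. That is exactly the situation where a position-based trigger keeps running but reads the wrong data.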
3. Linear Chains With No Recovery
Step 1 feeds Step 2 feeds Step 3. One failure kills everything downstream. No retry, no fallback.
How to troubleshoot this: Look at your failed run and identify the exact step that failed. Then check: does the data from the previous step look correct? If yes, the problem is in the current step. If no, trace backward until you find the step that produced bad output. The structural fix is to add error paths at critical junctions. In Zapier, this means adding a "Paths" step that routes to an error handling branch if a condition is not met. In Make, attach an error handler to the module so you define what happens when that step fails.
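Outside of any specific platform, an error path with retries looks roughly like the sketch below. `run_step` and its parameters are names invented for this example. The `step` and `on_failure` arguments stand in for whatever your real steps do.

```python
import time


def run_step(step, payload, retries=3, base_delay=1.0, on_failure=None, sleep=time.sleep):
    """Run step(payload); retry with exponential backoff, then take the error path."""
    attempts = max(1, retries)
    last_error = None
    for attempt in range(attempts):
        try:
            return step(payload)
        except Exception as error:  # broad on purpose: every failure takes the same path
            last_error = error
            if attempt < attempts - 1:
                # Wait 1s, 2s, 4s, ... before trying again.
                sleep(base_delay * 2 ** attempt)
    if on_failure is None:
        # No fallback defined: fail loudly instead of silently.
        raise last_error
    return on_failure(payload, last_error)
```

The retry covers temporary problems such as an API timeout. The `on_failure` branch is where you log the record and alert someone, so one bad step no longer takes everything downstream with it.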
4. No Monitoring
Processing 90% correctly, silently skipping 10%. Nobody watching until a customer complains.
How to troubleshoot this: The tricky part about silent failures is discovering they exist. Pull a sample of recent records and manually verify they made it through the entire automation chain. For example, if your automation moves new orders from your website to your fulfillment system, pick 20 recent orders and confirm each one appears in both places. If 2 out of 20 are missing, you have a failure rate of roughly 10% that you did not know about.
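If both systems can export a list of record IDs, the spot check is a few lines of Python. A minimal sketch, assuming the IDs match exactly on both sides. The function name and the ID format are illustrative.

```python
def audit_sample(source_ids, destination_ids):
    """Return (missing_ids, failure_rate) for a sample of source records."""
    destination = set(destination_ids)
    missing = [record_id for record_id in source_ids if record_id not in destination]
    rate = len(missing) / len(source_ids) if source_ids else 0.0
    return missing, rate


# 20 sampled orders from the website, 2 of which never reached fulfillment.
website_orders = [f"ORD-{n}" for n in range(1, 21)]
fulfillment_orders = [o for o in website_orders if o not in ("ORD-7", "ORD-13")]
missing, rate = audit_sample(website_orders, fulfillment_orders)
print(missing, f"{rate:.0%}")  # ['ORD-7', 'ORD-13'] 10%
```

The list of missing IDs matters more than the percentage. Look those records up in the task history and you usually find what they have in common.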
5. Authentication Decay
OAuth tokens expire. API keys rotate. If your automation stopped after months of working fine, this is usually why.
How to troubleshoot this: Check the connections page in your automation platform. Zapier calls these "Connected Accounts." Make calls them "Connections." If you see a warning icon or an "expired" label, reconnect the account. Some platforms require you to re-authorize every 60 to 90 days, especially for Google and Microsoft accounts. Set a calendar reminder to check your connections monthly.
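If a calendar reminder feels too easy to ignore, one option is a small inventory you maintain yourself, with a script that flags connections that are due. This sketch assumes you record the date you last authorized each connection by hand. It does not read anything from Zapier or Make, and the 60-day default is only a conservative starting point.

```python
from datetime import date, timedelta


def connections_due(connections, today, max_age_days=60):
    """Return names of connections last authorized max_age_days or more ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [name for name, authorized_on in connections.items() if authorized_on <= cutoff]


# Hypothetical inventory: connection name -> date it was last authorized.
inventory = {
    "Google Sheets": date(2024, 3, 1),
    "Slack": date(2024, 5, 20),
}
print(connections_due(inventory, today=date(2024, 6, 1)))  # ['Google Sheets']
```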
How Do You Fix It?
- Check error logs. Every platform has task history. Look at the last 50 to 100 runs and filter for errors. The pattern will tell you where the problem is.
- Test each step alone. Find the exact failure point. Most platforms let you re-run a failed task or send test data through individual steps. Use this to isolate the problem.
- Add error handling. For each step that can fail, decide: should it retry (good for temporary API errors), skip and continue (good for non-critical steps), or alert someone (good for critical steps where data loss matters)?
- Replace brittle triggers. Webhooks over polling. Field IDs over column positions. A properly designed workflow automation avoids these pitfalls from the start.
- Add monitoring. Alert if it has not run in X hours. Set up a simple check that verifies the automation is still producing output at the expected frequency.
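The "alert if it has not run in X hours" check from the last item can be as small as this. A sketch, assuming you can get the timestamp of the last successful run (for example from the task history) as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_success, max_silence_hours, now=None):
    """True if the last successful run is older than the allowed silence window."""
    now = now or datetime.now(timezone.utc)
    return now - last_success > timedelta(hours=max_silence_hours)


# A daily automation that last succeeded 30 hours ago, with a 26-hour limit.
now = datetime(2024, 6, 2, 14, 0, tzinfo=timezone.utc)
last_success = datetime(2024, 6, 1, 8, 0, tzinfo=timezone.utc)
print(is_stale(last_success, max_silence_hours=26, now=now))  # True
```

Set the limit slightly above the expected interval (26 hours for a daily job in this example) so a run that starts a little late does not trigger a false alarm.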
A working automation you cannot monitor is a time bomb.
What Should You Check Before Things Break?
Prevention is cheaper than repair. Here is a monthly checklist that catches most problems before they become emergencies.
- Verify all connections are active. Go to your platform's connections page and confirm every linked account shows a healthy status. Re-authorize any that are expiring soon.
- Review error rates. Check the task history for the past 30 days. If your error rate is above 2%, investigate. Anything above 5% is a problem that needs immediate attention.
- Confirm triggers are firing. Check that each automation has run the expected number of times. If your daily automation only ran 20 times last month instead of 30, something interrupted it.
- Test with edge cases. Send a blank form submission, an unusually large order, or a record with special characters through each automation. These are the inputs that break things.
- Check downstream data. Pick 5 to 10 random records and manually verify they made it through the entire chain correctly. This catches silent data corruption.
- Review platform changelogs. If any of your connected apps recently updated, check whether the update changed any API endpoints, field names, or authentication methods your automation depends on.
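Two of the checklist items above are simple arithmetic, so they are easy to script once you have the run counts from your task history. The thresholds below are the 2% and 5% figures from the checklist. The function names are illustrative.

```python
def classify_error_rate(total_runs, failed_runs):
    """Map a 30-day error rate onto the checklist's thresholds."""
    if total_runs == 0:
        return "no runs"
    rate = failed_runs / total_runs
    if rate > 0.05:
        return "urgent"
    if rate > 0.02:
        return "investigate"
    return "ok"


def missed_runs(expected_runs, actual_runs):
    """How many scheduled runs did not happen (never negative)."""
    return max(0, expected_runs - actual_runs)


print(classify_error_rate(1000, 30))  # investigate
print(missed_runs(30, 20))            # 10
```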
What Monitoring Tools Should You Use?
You do not need expensive monitoring software. Start simple and add complexity only if your automation stack demands it.
- Built-in platform alerts. Both Zapier and Make offer native error notifications via email. Turn these on for every automation. This is free and takes 30 seconds per workflow.
- Slack or email digest. Set up a weekly summary of all automation runs, including success rates and error counts. Most platforms support this natively or through a simple integration.
- Heartbeat monitoring. For critical automations, create a simple "heartbeat" check. Have the automation send a ping (an email, a Slack message, or a webhook) at the end of each successful run. If the ping stops, you know something is wrong. Services like Cronitor or Healthchecks.io are built for exactly this.
- Data reconciliation checks. For automations that move data between systems, build a weekly reconciliation that compares record counts on both sides. If your CRM has 150 new contacts this week but your email platform only received 140, you have a gap to investigate.
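For the heartbeat item above, the sending side is one HTTP request at the end of a successful run. This is a sketch using only the Python standard library. It assumes your monitoring service gives you a ping URL to request, and the `opener` parameter exists only so the function can be tested without a network call.

```python
import urllib.request


def send_heartbeat(ping_url, opener=urllib.request.urlopen, timeout=10):
    """Ping the monitor after a successful run; return True if it answered 2xx."""
    try:
        with opener(ping_url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and socket timeouts are OSError subclasses. A failed ping
        # must never crash the automation it is reporting on.
        return False
```

Call it as the very last step, after the real work has succeeded. If the automation fails earlier or never starts, the ping never goes out, and the monitoring service raises the alarm for you.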
When Should You Rebuild?
- 15+ steps nobody can follow
- More patches than original design
- Business process has changed significantly
- Maintenance costs more than doing it manually
- The person who built it can no longer explain what it does
- You are afraid to change anything because you do not know what will break
If two or more of those items apply to you, a rebuild will save you more time and money than continued patching. A clean rebuild with proper error handling and monitoring takes less effort than you think, and it gives you a system you can actually trust.
If you are not sure whether a task is worth automating at all, use our 5-Minute Test framework to find out.
We Fix Broken Automations
Send us your Frankenstein workflow. We will diagnose it and either fix or rebuild it properly.
Book a Free Assessment