Every developer should take this advice to heart. If a customer asks for 'a daily email report', you should strongly advise against it.
- Most reports the customer asks for will be meaningless vanity metrics anyway
- If you only do error reporting (not sending an email when everything goes smoothly), you won't notice when the emails stop being delivered.
- If you report everything (not just the errors), people will stop reading them
- A mailbox history is not a log
- You'll get a request nearly every week to change the report's recipient list.
- Who reads the reports on weekends?
I could go on with arguments for a while, but the short story is this: email is not suitable for logging and reporting. In fact, email is unsuitable for a lot of things. But customers will ask for it anyway, because it's the tool they know (if all you have is a hammer, every problem looks like a nail).
My advice:
- Log everything to disk when possible (syslog, pipe)
- Use a logging aggregator to filter and archive (Fluentd, ELK, Graylog)
- Use an exception tracker for exceptions (Sentry, Raygun)
- Make incidents actionable by creating tickets automatically (Jira, Zendesk, Slack)
- If needed, use an incident response service (PagerDuty, Opsgenie)
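As an illustration of the first point, Python's standard `logging` module can append every record to a size-rotated file on disk, which an aggregator can then tail. This is a minimal sketch: the `build_disk_logger` helper, the log format, and the 10 MB / 5 backups rotation policy are hypothetical choices, and `logging.handlers.SysLogHandler` could be swapped in to send records to syslog instead.

```python
import logging
import logging.handlers


def build_disk_logger(path: str, name: str = "app") -> logging.Logger:
    """Return a logger that appends every record to a size-rotated file.

    Hypothetical helper: path, rotation size and backup count are illustrative.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # keep everything; filtering is the aggregator's job
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=10 * 1024 * 1024, backupCount=5, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Because the handler is only attached once per logger name, a second call with a different `path` reuses the first file; use distinct names if you need separate files.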
>make incidents actionable by creating tickets automatically
Not really a serious concern, but what happens when the ticket automation goes down? If all problems are auto-reported as tickets or tasks and that pipeline somehow breaks, it might take a while before anyone notices, especially if reported exceptions are rare. If tens of tickets a day is normal, a failure in ticket creation will be noticed almost instantly; if tickets are very rare, not so much.
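One common guard against this is a dead man's switch: the pipeline reports in on every run, even when there is nothing to file, and a separate check raises an alarm when those reports stop. The same idea covers error-only email reports that silently stop arriving. A minimal sketch, assuming a hypothetical `Heartbeat` class and a one-hour threshold; in practice the checker should run on different infrastructure, and alert through a different channel, than the pipeline it watches.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Heartbeat:
    """Dead man's switch for a hypothetical ticket-creation pipeline.

    The pipeline calls beat() on every successful run, even when it found
    nothing to report. A separate checker calls is_stale() on a schedule
    and raises the alarm when the beats stop.
    """

    max_age: timedelta
    last_beat: Optional[datetime] = None

    def beat(self, now: Optional[datetime] = None) -> None:
        self.last_beat = now or datetime.now(timezone.utc)

    def is_stale(self, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        # Never having reported counts as stale: silence is the failure we fear.
        if self.last_beat is None:
            return True
        return now - self.last_beat > self.max_age
```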
The second issue with automated exception tracking is that you lose the "huh, this is weird" mechanism that kicks in when actual humans go through logs or reports. Any tool will of course be orders of magnitude faster and probably more accurate, but by relying solely on automation you might miss the chance to notice "weird" or atypical entries, or rare and unexpected sequences of them. Then again, in most cases, I guess, simple statistical analysis might be a good substitute. And that can be automated.
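The "simple statistical analysis" could be as crude as counting how often each kind of log line appears and surfacing the rare ones. A minimal sketch, assuming that replacing digit runs with a placeholder is good enough to group similar lines (real log-template mining is more involved); `template`, `rare_templates`, and the 1% cutoff are illustrative choices. Note that it only flags rare individual lines, not unusual sequences of otherwise common ones.

```python
import re
from collections import Counter
from typing import Iterable, List

_NUMBER = re.compile(r"\d+")


def template(line: str) -> str:
    """Collapse variable parts (here: only digit runs) so similar lines group together."""
    return _NUMBER.sub("<N>", line.strip())


def rare_templates(lines: Iterable[str], max_share: float = 0.01) -> List[str]:
    """Return log templates that make up at most max_share of all non-empty lines.

    A crude stand-in for the human "huh, this is weird" reaction: frequent
    messages are routine, very infrequent ones deserve a look.
    """
    counts = Counter(template(line) for line in lines if line.strip())
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(t for t, c in counts.items() if c / total <= max_share)
```

For example, 200 lines like `GET /orders/17 200` plus a single `disk quota at 91%` would flag only `disk quota at <N>%`.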
Don't let the perfect be the enemy of the good. 99.9XXXX% reliability is good enough. Once you have enough nines, your remaining risks are things like "nuclear war", "dinosaur-killer-sized asteroid hitting the Earth", et cetera.
Agreed, there is no point in going further once you reach a certain reliability level. However, eliminating risks is one thing; limiting their consequences is another. I strongly prefer 99.9% reliability where that 0.1% means some insignificant problem over 99.99% reliability where the remaining 0.01% means total disaster.
My point is that doing "too much" automation gives diminishing returns (which is not bad in itself), but it might also disproportionately increase the consequences of that remaining 0.xxxx1%.
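To put numbers on the nines: at 99.9% availability a system may be down about 8.76 hours a year, at 99.99% about 52.6 minutes. A minimal Python sketch of that arithmetic, plus a deliberately simplistic expected-loss figure (downtime multiplied by a hypothetical cost per hour of downtime) to show why the extra nine can still lose if each failure is catastrophic. Real incident impact is rarely linear in hours, so treat `expected_loss` as an illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a system may be unavailable at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def expected_loss(availability_percent: float, cost_per_hour_down: float) -> float:
    """Hypothetical yearly loss, assuming impact scales linearly with downtime."""
    return downtime_hours_per_year(availability_percent) * cost_per_hour_down
```

With made-up costs of 100 per hour for "insignificant problem" and 1,000,000 per hour for "total disaster", `expected_loss(99.9, 100)` is about 876 while `expected_loss(99.99, 1_000_000)` is about 876,000.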
<cynical-response>
So you're telling me that by adding a simple email tool I can make the customer happy, because I'm giving them what they want, and I don't have to set up five tools and keep paying for them every month?
</cynical-response>
It's your job to advise them to use the proper solution.
But the customer ultimately pays you, so if they really want an email monitoring solution, you should build it for them.
Also, the cost savings from automating the task should outweigh the monthly cost of the tools required to run it. If not, the task is probably not worth automating.