mayank-kansal15/logging-best-practices.md

Last active May 25, 2022 05:10


Logging best practices


What is logging?

Logging is the process of recording application actions and state to a secondary interface.

Why logging?

It tells you what happened in the application and when.
It is needed to debug issues. Any on-call engineer who has faced a burning production issue with the relevant logs missing can tell you how much this matters.
It supports many kinds of audit and analysis, e.g. investigating a security issue, finding errors that nobody reported, counting concurrent requests, or exploring usage patterns for market research.
Logs can drive alerting: when a security event or other warning appears in the logs, an alert can be triggered to tell the admin or on-call person that it is time to act.

What to log and What not to?

To log:

At a minimum, log what comes into the system and what goes out of it. E.g. log API request details such as path, method, sender IP, request body, params, and headers, and response details such as body and headers. Log complete or partial details as needed.
All requests to and responses from external services must be logged.
For batch jobs, log at least the start and end time, with details.
Every log line must carry its context; log context is crucial. To avoid repeating the full context on every line, use a request correlation ID.
Errors should be logged. Remember that not all exceptions are errors, e.g. invalid credentials should not be logged as an error; warn or info is enough.
The warning level should be used only when some action is set up to follow a warning; otherwise a warning is useless.
Server events such as start, stop, DB connection, security events, and resource thresholds must be logged.
A timestamp must be logged with each log line.
In a microservices setup, there should be a correlation/trace ID that every service logs, so you can identify what happened to a request in each service.
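
As a rough illustration of the points above, here is a minimal sketch of a structured logger that stamps every line with a timestamp and a correlation ID. It uses no logging library; `createLogger`, `handleRequest`, the `sink` parameter, and the `x-correlation-id` header name are all hypothetical names chosen for this example, and the ID generation is deliberately simplistic.

```javascript
// Minimal structured logger: every line carries a timestamp and its context.
// `sink` is a hypothetical parameter so output can be redirected (e.g. in tests).
function createLogger(context = {}, sink = (line) => console.log(line)) {
  const write = (level, message, fields = {}) => {
    const entry = {
      timestamp: new Date().toISOString(),
      level,
      message,
      ...context, // e.g. correlationId, shared by every line of one request
      ...fields,
    };
    sink(JSON.stringify(entry));
    return entry;
  };
  return {
    info: (message, fields) => write("info", message, fields),
    warn: (message, fields) => write("warn", message, fields),
    error: (message, fields) => write("error", message, fields),
    // Derive a logger that adds context once instead of on every call.
    child: (extra) => createLogger({ ...context, ...extra }, sink),
  };
}

// Hypothetical request handler: log what comes in and what goes out.
function handleRequest(request, rootLogger) {
  // Reuse the caller's correlation ID if one was sent, otherwise create one.
  const correlationId =
    request.headers["x-correlation-id"] ??
    `req-${Date.now()}-${Math.random().toString(16).slice(2)}`;
  const log = rootLogger.child({ correlationId });

  log.info("request received", {
    method: request.method,
    path: request.path,
    ip: request.ip,
  });
  const response = { status: 200, body: { ok: true } };
  log.info("response sent", { status: response.status });
  return { correlationId, response };
}
```

If the correlation ID is also forwarded on outgoing calls, every service can log the same ID and the lines for one request can be found with a single search.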

Not to log:

Never, ever log credentials, passwords or any sensitive information.
Never log PII. If it must be logged, log a masked value.
Beware of laws and regulations that prohibit you from recording certain pieces of information. The best known of these regulations is probably GDPR.
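
One possible way to apply these rules is to sanitize objects before they reach the logger. The sketch below is only an illustration: the key lists, the `mask` and `sanitize` helpers, and the `[REDACTED]` marker are assumptions made for this example, and a real list of sensitive fields depends on your data and the regulations that apply to you.

```javascript
// Field names treated as sensitive; these lists are illustrative, not exhaustive.
const SECRET_KEYS = new Set(["password", "token", "authorization", "secret"]);
const PII_KEYS = new Set(["email", "phone"]);

// Keep only the last `visible` characters; very short values are fully masked.
function mask(value, visible = 4) {
  const text = String(value);
  if (text.length <= visible) return "*".repeat(text.length);
  return "*".repeat(text.length - visible) + text.slice(-visible);
}

// Return a copy of `data` that is safer to log; the input is not mutated.
function sanitize(data) {
  if (Array.isArray(data)) return data.map(sanitize);
  if (data === null || typeof data !== "object") return data;
  const safe = {};
  for (const [key, value] of Object.entries(data)) {
    const name = key.toLowerCase();
    if (SECRET_KEYS.has(name)) safe[key] = "[REDACTED]"; // never log secrets
    else if (PII_KEYS.has(name)) safe[key] = mask(value); // masked PII only
    else safe[key] = sanitize(value); // recurse into nested objects/arrays
  }
  return safe;
}
```

Calling `sanitize(request.body)` before logging keeps the shape of the payload visible while removing the values that should never end up in a log file.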

How to log?

Avoid lock-in to a single logging library by writing a wrapper around it.
Keep field (key) names the same across all applications; this helps when searching.
Logs should be structured so that both humans and machines can read them. JSON is a good choice.
The logging framework should write to a single destination, and that destination should always be standard output and standard error. Shipping logs to other destinations (e.g. ELK, CloudWatch) should happen outside the server.
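
The wrapper idea can be sketched as follows. This is an assumption-laden illustration, not a reference design: `createAppLogger`, `stdBackend`, and the `backend.write(level, entry)` contract are hypothetical names invented here. Application code calls only the wrapper, so replacing the underlying library means writing a new backend rather than touching every call site.

```javascript
// Default backend: JSON lines on the standard streams, nothing else.
// Shipping to ELK, CloudWatch, etc. is left to a process outside the server.
const stdBackend = {
  write(level, entry) {
    const line = JSON.stringify(entry) + "\n";
    // Errors go to standard error, everything else to standard output.
    (level === "error" ? process.stderr : process.stdout).write(line);
  },
};

// Application code depends on this wrapper, never on a concrete library.
// `backend` is any object with a write(level, entry) method (hypothetical contract).
function createAppLogger(service, backend = stdBackend) {
  const log = (level, message, fields = {}) =>
    backend.write(level, {
      // Same key names in every application so searches work across services.
      timestamp: new Date().toISOString(),
      level,
      service,
      message,
      ...fields,
    });
  return {
    debug: (message, fields) => log("debug", message, fields),
    info: (message, fields) => log("info", message, fields),
    warn: (message, fields) => log("warn", message, fields),
    error: (message, fields) => log("error", message, fields),
  };
}
```

A backend that delegates to Winston, log4js, or Pino would implement the same `write` method; its details are omitted here because they depend on the chosen library.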

Where to log exception?

An exception can be logged at three levels of the call stack: top, middle, or bottom.

Top: Top of the call stack where execution was started.
Bottom: Bottom of the call stack where exception actually occurred.
Middle: Somewhere between the top and the bottom of the call stack.

Generally avoid logging exceptions at the bottom and middle levels. The API context is missing there, and without it we may not be able to write a clear log message. If the component is reused in different applications, each application may also need a different log format. Logging at these levels can also produce a lot of log output.

It is better to log at the top level, where the context is available: we can write a better log message there and decide whether it belongs at the error or info level.
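
A small sketch of this pattern, under stated assumptions: the function names, the `NOT_FOUND` code, and the `log(level, message, fields)` callback are hypothetical, and the `cause` option of `Error` requires a reasonably recent runtime (Node.js 16.9 or later). The lower layers only throw; the handler at the top logs once, with request context, and picks the level.

```javascript
// Bottom of the stack: no logging here, just throw with useful details.
function findUser(db, id) {
  const user = db[id];
  if (!user) {
    const err = new Error(`user ${id} not found`);
    err.code = "NOT_FOUND";
    throw err;
  }
  return user;
}

// Middle: add context by wrapping the error, still without logging.
function loadProfile(db, id) {
  try {
    return { profile: findUser(db, id) };
  } catch (cause) {
    throw new Error("could not load profile", { cause });
  }
}

// Top: the API handler knows the request, so it logs once and picks the level.
function getProfileHandler(request, db, log) {
  try {
    return { status: 200, body: loadProfile(db, request.params.id) };
  } catch (err) {
    // A missing user is an expected outcome, not an application error.
    const notFound = err.cause?.code === "NOT_FOUND";
    log(notFound ? "info" : "error", "GET profile failed", {
      path: request.path,
      userId: request.params.id,
      reason: err.cause?.message ?? err.message,
    });
    return { status: notFound ? 404 : 500 };
  }
}
```

Because `findUser` and `loadProfile` never log, they can be reused in another application without bringing a log format with them.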

Log Levels:

By default the server should start at the info level, but there should be admin APIs to get and set the log level at runtime. The default log level can vary by environment, e.g. dev, test, and prod. The logging framework should also support changing log levels dynamically.

INFO: To use this log level effectively, think about what general information would help diagnose an application error when the primary interface is not accessible, e.g. server start and stop or API handler start and stop. INFO should be used when the application is executing normally but you want to communicate something to the future log reader.

WARN: WARN level messages should be used to indicate that the application faced a potential problem; however, the user experience has not been affected in any way. Examples might be a cache miss on an expensive object that really should have been in the cache, a piece of code that completed but took longer than expected, a failed login attempt, or an access control violation. The key though is that the warning should be actionable. You shouldn’t clutter the logs with warnings that you don’t intend to do anything about.

ERROR: ERROR messages should be used to indicate that the application faced a significant problem and that, as a result, the user experience was affected in some way. For example, a database connection could have failed, resulting in parts of the application being rendered unusable.

DEBUG: This log level indicates that the message is meant for debugging; in other words, these messages are aimed squarely at the developer. Debug messages support info messages and help the developer dig into an issue and find the root cause.
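
The runtime level switch described in this section can be sketched like this. It is a minimal illustration without a logging library: `createLeveledLogger`, the numeric level values, and the `APP_ENV` variable with its dev/test/prod mapping are assumptions made for the example. An admin API would simply call `getLevel` and `setLevel`.

```javascript
// Levels ordered from most to least verbose; the numbers are arbitrary.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

function createLeveledLogger(initialLevel = "info", sink = (line) => console.log(line)) {
  let threshold;
  const setLevel = (level) => {
    if (!Object.hasOwn(LEVELS, level)) throw new Error(`unknown log level: ${level}`);
    threshold = LEVELS[level];
  };
  setLevel(initialLevel);

  // Returns true if the message was written, false if it was filtered out.
  const emit = (level, message) => {
    if (LEVELS[level] < threshold) return false;
    sink(JSON.stringify({ timestamp: new Date().toISOString(), level, message }));
    return true;
  };

  return {
    debug: (message) => emit("debug", message),
    info: (message) => emit("info", message),
    warn: (message) => emit("warn", message),
    error: (message) => emit("error", message),
    // Intended to be called from an admin API while the server is running.
    getLevel: () => Object.keys(LEVELS).find((name) => LEVELS[name] === threshold),
    setLevel,
  };
}

// Default level per environment; APP_ENV and this mapping are assumptions.
const defaultLevel =
  { dev: "debug", test: "info", prod: "info" }[process.env.APP_ENV] ?? "info";
```

Starting at info and switching to debug only while investigating an issue keeps normal log volume low without requiring a restart.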

Choosing the right logging framework:

Don’t let logging block your application. Write logs asynchronously with a buffer or queue so the application can keep running.
Check whether the library supports batching to reduce time spent logging. Batching means not writing each log line immediately, but writing lines in batches to reduce I/O time.
Use a standard and easily configurable logging framework.
It should support different log levels.
It should support different transports.
It should support changing log levels dynamically.
It should have community support.
Support for remote, central logging is an added advantage.
Check benchmarks of logging libraries.
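
To make the buffering and batching points concrete, here is a minimal sketch of what such a feature does internally. It is not how any particular library implements it: `createBatchingLogger`, `writeBatch`, and the `maxBatch` and `flushMs` options are hypothetical names for this example. The `log` call only appends to an in-memory buffer; the actual write happens later, once per batch.

```javascript
// Buffers log lines in memory and hands them to `writeBatch` in groups.
// `writeBatch` is a hypothetical function that performs the actual I/O.
function createBatchingLogger(
  writeBatch = (batch) => process.stdout.write(batch.join("\n") + "\n"),
  { maxBatch = 100, flushMs = 1000 } = {}
) {
  let buffer = [];
  let timer = null;

  const flush = () => {
    if (timer) {
      clearTimeout(timer);
      timer = null;
    }
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = []; // start a fresh buffer before doing I/O
    writeBatch(batch);
  };

  const log = (level, message) => {
    buffer.push(JSON.stringify({ timestamp: new Date().toISOString(), level, message }));
    if (buffer.length >= maxBatch) {
      flush(); // size-based flush
    } else if (!timer) {
      timer = setTimeout(flush, flushMs); // time-based flush
      timer.unref?.(); // do not keep the process alive just for logging
    }
  };

  return { log, flush };
}
```

The trade-off is that buffered lines are lost if the process crashes before a flush, so it is worth calling `flush` during shutdown and keeping `flushMs` short.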

Some JavaScript logging libraries: Winston, Log4JS, Pino.

Impact/Cost of logging:

Lines of logging code are like any other code: they take time to write and maintain. Excessive logging is costly. It also clutters the logs and draws attention away from the important entries.
If the application writes thousands of lines to the log each second, performance will be affected.
Highly detailed log messages regarding each and every method invoked in turn are almost always unnecessary. With the right level of context information available, it is an easy task to re-trace the program execution - fortunately, most computer programs are deterministic!

Resources:
