At clients, we work to make sure the best information is available to:
Often, developers or security folks think of these as overlapping. We hear:
“We’re using Log4J wrapped with SLF4J, it seems really redundant to do anything else.”
In practice, we believe the information is different and needs to be stored and reviewed in different ways by different people. That’s why we build libraries to help integrate these things into applications easily – each as a first-class piece of information. As we examine each in further detail, we’ll call out the technology, the audience involved and the typical content.
Let’s start with logging because every developer knows about logging, right? We work with some companies that log every request that they process. That seems like a lot and should start to trigger alarm bells about what information lives in the logs – but let’s not be mad at logging. For the most part, there are established solutions for doing it. The logs need to get aggregated or centralized somewhere and then we can try to see what happened.
We would be remiss here not to point out that it is really important to keep sensitive data out of logs. We’ve seen everything from card numbers to passwords to reset tokens to session IDs …
But the point is, there isn’t anything wrong with a little log.debug("XYZ"); or log.warn("Data is stale");. From a maintenance and debugging perspective, this information is valuable – generally to operations.
Technology: Typically file based, then aggregated. Need text search. High volume. Retained for relatively short periods of time (weeks).
Audience: Developers, Operations
Content: Freeform – anything a developer might think is useful.
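The earlier point about keeping sensitive data out of logs can be sketched as a small scrubbing step applied before a message reaches the logger. This is only a minimal illustration: the class name LogRedactor, both patterns and the replacement markers are our own assumptions, not part of Log4J or SLF4J, and the patterns are neither exhaustive nor free of false positives (any long run of digits will look like a card number).

```java
import java.util.regex.Pattern;

// Hypothetical helper: masks a few obviously sensitive values in a log message.
final class LogRedactor {
    // Rough card-number shape: 13 to 19 digits, optionally separated by a space or dash.
    private static final Pattern CARD =
            Pattern.compile("\\b\\d(?:[ -]?\\d){12,18}\\b");

    // key=value pairs whose key ends in a secret-looking name (case-insensitive).
    private static final Pattern SECRET =
            Pattern.compile("(?i)(password|token|sessionid)=\\S+");

    private LogRedactor() {}

    static String redact(String message) {
        // Mask card-like digit runs first, then secret-looking key=value pairs.
        String masked = CARD.matcher(message).replaceAll("[REDACTED-CARD]");
        return SECRET.matcher(masked).replaceAll("$1=[REDACTED]");
    }
}
```

A call site would then look like log.warn(LogRedactor.redact(message)). Scrubbing at the call site is the simplest place to start; doing it inside the logging pipeline covers more code but depends on the logging framework in use.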
Some applications explicitly need to be able to produce an audit record for the objects they manage. This might be who created an object, when it changed and how, and at whose direction. It might even be who accessed the data. Consider the Stripe interface where they let you access your secret. The secret is obscured and you have to take an extra action to see it. Pretty sure they audit that so they know who saw it when.
Technically, you could write audit messages to logs. This results in tedious work getting the detail back out of the logs, and in any system where logs are not smoothly aggregated or can’t be managed at scale, this approach falls down. Furthermore, someone looking for the messages needs to sift through lots of unrelated data.
A deeper issue is that if you want to produce a true audit record, like for a partner or a customer or an auditor, you can’t just give them all your dev logs! We want to be able to produce a report tailored for the question they are asking and containing only data for the users they should be able to ask about. Also, audit records need to be stored for a lot longer than log messages.
Technology: Queryable interface, centralized long term storage, retained “for a long time”
Audience: Compliance, Partners, Auditors
Content: Specific object reads and changes (ideally with before / after visibility) associated to users & time.
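To make the audit content described above concrete, here is a minimal sketch of an audit record and a query over it. Everything here is an assumption made for illustration: the AuditEvent and AuditTrail names, the field names, and the in-memory list, which stands in for whatever centralized long-term store a real system would use.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical audit record: who did what to which object, when, and for whom.
final class AuditEvent {
    final Instant at;
    final String actor;       // user who performed the action
    final String onBehalfOf;  // at whose direction; may equal actor
    final String action;      // e.g. "READ", "UPDATE"
    final String objectId;
    final String before;      // null when there is no prior value (reads, creates)
    final String after;       // null for reads

    AuditEvent(Instant at, String actor, String onBehalfOf, String action,
               String objectId, String before, String after) {
        this.at = at;
        this.actor = actor;
        this.onBehalfOf = onBehalfOf;
        this.action = action;
        this.objectId = objectId;
        this.before = before;
        this.after = after;
    }
}

// In-memory stand-in for a queryable, long-lived audit store.
final class AuditTrail {
    private final List<AuditEvent> events = new ArrayList<>();

    void record(AuditEvent event) {
        events.add(event);
    }

    // Answers "what happened to this object?" without touching dev logs.
    List<AuditEvent> forObject(String objectId) {
        List<AuditEvent> matches = new ArrayList<>();
        for (AuditEvent e : events) {
            if (e.objectId.equals(objectId)) {
                matches.add(e);
            }
        }
        return matches;
    }
}
```

Because each record is structured, a report for a partner or auditor becomes a filter over fields rather than a text search through unrelated log lines.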
Two deeper notes here:
When I say signal, I really mean security signal. As in, the opposite of noise. Let’s face it, most of what is in logs is noise. Even cutting edge technology built to collect and analyze logs produces a ton of noise. When we get into the application and signal specific events there, we can break out of the noise and give security monitoring teams lots of rich data.
For example, we may want to know about failed logins. A stream of failed logins looks bad. A similar stream followed by a successful login looks worse. (Exercise for reader) Either way, this is information the security team probably doesn’t see right now. Go a step deeper – what if input validation fails? What if someone tries to do something they shouldn’t have permission to do? Should security get notified? Obviously the goal would be to take some defensive actions autonomously, and in systems we’ve worked on, this works best when you can capture the actual events you care the most about. Where can you do that? In the application.
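One possible reading of the failed-login example, sketched under our own assumptions: count consecutive failures per username and raise a stronger signal when a success follows a run of failures. The LoginSignalDetector name, the Signal values and the threshold are hypothetical, and the sketch ignores time windows, source addresses and thread safety, all of which a real detector would likely need.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-application detector for suspicious login patterns.
final class LoginSignalDetector {
    enum Signal { NONE, REPEATED_FAILURES, SUCCESS_AFTER_FAILURES }

    private final int threshold;
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    LoginSignalDetector(int threshold) {
        this.threshold = threshold;
    }

    Signal onLoginAttempt(String username, boolean success) {
        int failures = consecutiveFailures.getOrDefault(username, 0);
        if (success) {
            // A success resets the streak; it is suspicious if the streak was long.
            consecutiveFailures.remove(username);
            return failures >= threshold ? Signal.SUCCESS_AFTER_FAILURES : Signal.NONE;
        }
        failures++;
        consecutiveFailures.put(username, failures);
        return failures >= threshold ? Signal.REPEATED_FAILURES : Signal.NONE;
    }
}
```

The authentication code would call onLoginAttempt after each attempt and forward anything other than NONE to the security team's tooling, rather than writing it to the developer log.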
Another key thing with Signal is that it needs to go to the right place. Often security operations teams are using their own SIEM that is different from the log collector used by developers. That is smart. They are optimized for different things. But we need to help developers get the security events to the SIEM.
Technology: Push signal to syslog + SIEM, ideally not retained for more than weeks but aggregated and processed for future context.
Audience: Security Operations Team (The Watchers), Automated Security Response
Content: Specific security events only.
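Getting security events to a different destination than developer logs can be sketched as a thin facade with one method per kind of information, each backed by its own sink. This is an illustrative shape only, not a description of any particular library: EventSink, ChannelRouter and InMemorySink are hypothetical names, and the in-memory sink stands in for real destinations such as a log file, an audit store or a SIEM forwarder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical destination for one kind of application event.
interface EventSink {
    void accept(String message);
}

// Gives logging, audit and signal their own entry points and destinations.
final class ChannelRouter {
    private final EventSink devLog;
    private final EventSink auditStore;
    private final EventSink siem;

    ChannelRouter(EventSink devLog, EventSink auditStore, EventSink siem) {
        this.devLog = devLog;
        this.auditStore = auditStore;
        this.siem = siem;
    }

    void log(String message)    { devLog.accept(message); }
    void audit(String message)  { auditStore.accept(message); }
    void signal(String message) { siem.accept(message); }
}

// Test double that simply remembers what it was sent.
final class InMemorySink implements EventSink {
    final List<String> received = new ArrayList<>();

    @Override
    public void accept(String message) {
        received.add(message);
    }
}
```

The design choice is that the developer decides at the call site which kind of information an event is, so the security team's tooling receives only security events and never has to filter them out of general logs.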
At companies that have the capability and resources (say, dedicated compliance and security monitoring teams), separating these application-generated message streams has value because they are used by different people for different things in different tools.
We may circle back in the future with another post about our libraries for doing these things and some of the more extended benefits or specific examples of data to track. Let us know if you are interested!