Vim: Converting timestamp formats

Every now and then (with increasing frequency, it seems) I find myself needing to analyze millions of lines of CAN logs that look like

(1630536594.856841) can1 15F8051C#002FE7B7490FF3FD
(1630536594.857974) can1 15F8051C#01300049A616359F
(1630536594.859108) can1 15F8051C#02100680E22138F9
(1630536594.859666) can1 15F8051C#035194F22D88011A
(1630536594.860264) can1 15F8051C#0400000000400013
(1630536594.860834) can1 15F8051C#053C006E00B0F5FF
(1630536594.861406) can1 15F8051C#06FF01E000C800FF
(1630536594.876000) can1 09F80281#2CFC03010000FFFF
(1630536594.906718) can1 09F8021C#E7FC14010000FFFF
(1630536594.907256) can1 09F8031C#E70A000000000000
(1630536594.907821) can1 09F8041C#E70AC41401000000
(1630536594.915459) can1 09F80381#2CB2FEFFFF000000
(1630536594.916033) can1 09F80481#2CB2C40201000000

For CAN logs, it makes sense to use Unix timestamps (fractional seconds since epoch) because you're often only concerned with relative times between messages, not the absolute time a message occurred. However, I often find myself needing to correlate these CAN logs with messages from other log files with a different format, such as

[2021-09-01 22:49:54.906718] [CATEGORY] [INFO]: Some log message.

My normal workflow for this, is to filter the CAN log to the messages I care about (because I was serious about the logs containing millions of lines) to a new Vim buffer and then painstakingly interleave messages from different sources. After (or during, if I don't have a clear idea of what messages are important) I have the log message interleaved, I start annotating the messages. This is how I troubleshoot (because no one looks at CAN logs for fun) the events portrayed in the logs. This works, but it's time-consuming, and involves a lot of manual work.

date

Given a Unix timestamp like 1630536594.906718, you can convert it to an ISO-8601 timestamp with date like so:

$ date --utc --iso=8601=ns -d @1630536594.906718
2021-09-01T22:49:54,906718000+00:00

Due to the limitations of strftime(3), this is how I will convert between different timestamp formats.

sed

When sed's your hammer, everything becomes a nail. I'm more comfortable in the shell than I am in Vim, so sed was the first tool I reached for.

$ head -n 1 candump.log |
    sed 's/(\(.*\))\(.*\)/printf "("; date --utc --iso-8601=ns -d @\1 | tr -d "\n"; printf ")%s" "\2"/e'
(2021-09-01T22:49:54,856841000+00:00) can1 15F8051C#002FE7B7490FF3FD

This is overly verbose, and has several things I don't like:

awk

It feels like this is something awk could do better, but I don't really know awk, so it's not a tool I reach for. I'll eventually rectify this.

Vim

I've identified three different Vim solutions, which meets my needs a little better, because these transformed logs aren't really a standalone log file – they're in some notes.txt file containing log snippets, timestamps of interest, hypotheses, shell snippets, CAN message definitions, etc., so the more interactive this process is, the better.

vis

You might think "oh, I'll just select the timestamp, and use

:'<,'> !xargs -I{} date --utc --iso-8601=ns -d @{}

!". That's nice, but remember that no matter what you do, selections in Ex mode are line-wise, so this little snippet passes the whole line to date -d.

To resolve this, that is, allow passing just the selections to the shell, you can use the vis plugin with

:'<,'>B !xargs -I{} date --utc --iso-8601=ns -d @{}

This is almost identical to the first attempt, except that we use :'<,'>B instead of :'<,'>.

This is nice, but

system() in a :substitution

Using \= in a :substitute or substitute() turns the rest of the substitution string into an expression. This, combined with submatch() to access a capture group from within the expression. So we could do something like

:%s/(\(.*\))/\=system('date --utc --iso-8601=ns -d @' . submatch(1))/

but this has the same problem as the very first sed incantation – the line ending from the date output is included in the replacement. This can be remedied with

:%s/(\(.*\))/\=system('date --utc --iso-8601=ns -d @' . submatch(1) . ' | tr -d "\n"')/

but it still leaves us with another problem – we lose the parentheses wrapping the timestamp. Depending on the use case, this may or may not be important. I want to be able to use :sort on the interleaved log messages from different sources, so the timestamp prefixes must all be in the same format.

To accomplish this, we can wrap the system() call in a substitute() like so:

:%s/(\(.*\))/\=substitute(system('date --utc --iso-8601=ns -d @' . submatch(1)), '\(.*\)\n', '(\1)', '')/

Depending on the timestamp being reformatted, you may need to further wrap the submatch(1) in a shellescape() to properly escape anything that needs it. Regardless, this method is far too unwieldy to keep handy in my back pocket.

Macro

This is my favorite solution. It's nicely composable, and once you record it, you can play it back over and over as needed. Further, with .viminfo files, the macro may be restored when closing and reopening the file, as long as you don't overwrite the macro in an attempt to make it even better ;)

qd
0
"tcib
<C-r>=strftime("%FT%T %Z", @t)<CR>
<Esc>
q

That's nice and simple (broken onto multiple lines for clarity) but strftime(3) has a fatal flaw that makes it unusable for CAN log analysis – it can only provide second-level precision (no fractional seconds). So we should replace strftime() with a system() call to date.

Additionally, using strftime() will convert timestamps to your current locale – which may or may not be helpful depending on the other log messages you're trying to correlate with the CAN logs. In my case, I want UTC timestamps for everything.

qd
0
"tcib
<C-r>=system('date --utc --iso-8601=ns -d @' . @t . ' | tr -d "\n"')<CR>
<Esc>
q

This macro can then be invoked ad hoc by @d, or over a range with :{range}norm @d. This solution is most flexible and repeatable, and is best for my needs, where I may be adding and removing log statements as I learn about whatever problem I'm troubleshooting.