Using Lua Filters with pandoc


While testing the Markdown files generated by pandoc in Hugo, I realized I had some duplicate content that was making the individual posts very difficult to read. After a careful reading of the pandoc manual, I discovered to my sadness that there was no built-in way to remove the content I did not want 😉. The only option then was to write pandoc filters. Now, pandoc has libraries that allow you to write filters in pretty much any modern language, but out of the box it supports filters written in Lua, a language I had (and still have) zero knowledge about. But with some intensive trawling of the pandoc mailing list and various GitHub repos, I was able to stitch together two filters.

Filter 1 - Remove all Level 1 Headers

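function Header (elem)
    -- Only Header elements have a level, so match on Header instead of Block.
    if elem.level == 1 then
        return {} -- "empty list" == remove
    end
    -- Every other header is returned unchanged.
    return elem
end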

Filter 2 - Remove the Level 3 “Tags” Header and the content below it

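-- True while we are inside the section that follows the "tags" header.
local looking_at_tags = false

function Header (elem)
  if elem.level == 3 and elem.identifier == 'tags' then
    -- Found the level 3 "tags" header: drop it and start skipping.
    looking_at_tags = true
    return {}
  else
    -- Any other header ends the tags section and is kept unchanged.
    looking_at_tags = false
  end
end

-- Fallback for every block type that has no function of its own.
-- Relies on pandoc visiting the blocks in document order.
function Block (elem)
  if looking_at_tags then
    return {}
  end
end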
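The filter above depends on a flag shared between two functions and on the order in which pandoc visits the blocks. As a rough alternative, here is a minimal sketch that walks the top-level blocks in a single loop instead. The helper name strip_tags_section is made up for this example and is not part of pandoc. It also assumes the "tags" header sits at the top level of the document rather than inside a block quote or list.

```lua
-- Sketch: remove the level 3 "tags" header and every block after it,
-- up to (but not including) the next header.
-- `strip_tags_section` is a hypothetical helper, not a pandoc function.
local function strip_tags_section (blocks)
  local kept = {}
  local skipping = false
  for _, block in ipairs(blocks) do
    if block.t == 'Header' then
      -- Any header ends a skipped section; the tags header starts one.
      skipping = block.level == 3 and block.identifier == 'tags'
    end
    if not skipping then
      kept[#kept + 1] = block
    end
  end
  return kept
end

-- pandoc calls this once with the whole document.
function Pandoc (doc)
  return pandoc.Pandoc(strip_tags_section(doc.blocks), doc.meta)
end
```

Because the loop only looks at top-level blocks, a tags header nested inside another element would be left alone.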

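In order to run these filters, save them as header-strip.lua and tags-strip.lua and put them in the filters subdirectory of the pandoc user data directory, which is where pandoc looks when it cannot find a filter at the given path. The data directory can be found by running pandoc --version and looking at the path mentioned in the output. Once you’ve placed the filters there, you can invoke them as follows: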

pandoc X:\path\to\source.md -f markdown --wrap=preserve --atx-headers --lua-filter=header-strip.lua --lua-filter=tags-strip.lua -t markdown_mmd+yaml_metadata_block -o X:\path\to\content\posts\target.md

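It’s important to note that pandoc applies filters in the order they appear on the command line, each one working on the output of the previous one, so to avoid unexpected side effects, make sure to specify them in the right order. Also, as far as I can tell, you cannot define two functions named Block (or Header) at the top level of a single Lua filter, because in Lua the second definition simply replaces the first. But like I said, I don’t know much about Lua.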
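If you would rather keep both filters in one file, a Lua filter file may return a list of filter tables, and pandoc then runs them one after another, each on the result of the previous one. The sketch below is one way to do that. The table names strip_h1 and strip_tags are just labels chosen for this example.

```lua
-- Sketch: two filters in one file, returned as a list.
-- pandoc applies the tables in order, each as a separate pass.
local looking_at_tags = false

-- Pass 1: drop every level 1 header.
local strip_h1 = {
  Header = function (elem)
    if elem.level == 1 then
      return {}
    end
  end
}

-- Pass 2: drop the level 3 "tags" header and the blocks below it.
local strip_tags = {
  Header = function (elem)
    looking_at_tags = elem.level == 3 and elem.identifier == 'tags'
    if looking_at_tags then
      return {}
    end
  end,
  Block = function (elem)
    if looking_at_tags then
      return {}
    end
  end
}

return { strip_h1, strip_tags }
```

With a single file like this, one --lua-filter option is enough, and the order of the passes is fixed by the order of the tables in the returned list.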