While testing the Markdown files generated by pandoc in Hugo, I realized I had some duplicate content that was making the individual posts very difficult to read. After a careful reading of the pandoc manual, I discovered to my sadness that there was no built-in way to remove the content I did not want 😉. The only option, then, was to write pandoc filters. Pandoc has libraries that let you write filters in pretty much any modern language, but out of the box it supports filters written in Lua, a language I had (and still have) zero knowledge of. With some intensive trawling of the pandoc mailing list and various GitHub repos, though, I was able to stitch together two filters.
Filter 1 - Remove all Level 1 Headers
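function Block (elem)
  -- Block is called for every block element, so check the type first
  if elem.t == 'Header' and elem.level == 1 then
    return {} -- "empty list" == remove
  else
    return elem -- keep everything else unchanged
  end
end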
Filter 2 - Remove Level 3 Headers titled “Tags” and the content under them
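local looking_at_tags = false -- true while we are inside the "Tags" section
local remove = {}             -- collects the dropped blocks (not used afterwards)

function Header (elem)
  if elem.level == 3 and elem.identifier == 'tags' then
    looking_at_tags = true
    return {} -- drop the "Tags" header itself
  else
    looking_at_tags = false -- any other header ends the section
  end
end

-- Fallback for block types without their own function (here: all but Header)
function Block (elem)
  if looking_at_tags then
    remove[#remove + 1] = elem
    return {} -- drop blocks that follow the "Tags" header
  end
end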
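To see what the looking_at_tags flag does without running pandoc, here is a minimal dry run in plain Lua. It is only a sketch: the mock tables, the `t` field values and the `run` driver are hypothetical stand-ins for pandoc's real elements and traversal, and it assumes the blocks are visited in document order.

```lua
local looking_at_tags = false

local function Header (elem)
  if elem.level == 3 and elem.identifier == 'tags' then
    looking_at_tags = true
    return {}
  else
    looking_at_tags = false
  end
end

local function Block (elem)
  if looking_at_tags then
    return {}
  end
end

-- Hypothetical driver: Header elements go to Header, everything else to Block.
-- A nil result means "keep unchanged"; an empty table means "drop".
local function run (blocks)
  local kept = {}
  for _, elem in ipairs(blocks) do
    local handler = (elem.t == 'Header') and Header or Block
    if handler(elem) == nil then
      kept[#kept + 1] = elem
    end
  end
  return kept
end

-- Mock document: the "tags" header and the list under it should disappear.
local doc = {
  { t = 'Header', level = 2, identifier = 'intro' },
  { t = 'Para' },
  { t = 'Header', level = 3, identifier = 'tags' },
  { t = 'BulletList' },
  { t = 'Header', level = 2, identifier = 'next' },
  { t = 'Para' },
}

local kept = run(doc)
print(#kept)  -- prints 4
```

The flag is switched off again by the next header of any level, so only the blocks directly under “Tags” are dropped.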
In order to run these filters, you can put them in the filters subdirectory of the pandoc user data directory. The data directory can be found by running pandoc --version and looking at the paths mentioned in the output. Once you’ve placed the filters there, you can invoke them as follows (recent pandoc releases replace --atx-headers with --markdown-headings=atx):
pandoc X:\path\to\source.md -f markdown --wrap=preserve --atx-headers --lua-filter=header-strip.lua --lua-filter=tags-strip.lua -t markdown_mmd+yaml_metadata_block -o X:\path\to\content\posts\target.md
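It’s important to note that pandoc processes filters in sequence, in the order they appear on the command line, so to avoid unexpected side effects make sure to specify them in the right order. Also, I think you cannot have multiple function Block definitions in a single Lua filter (a later global definition with the same name simply replaces the earlier one), which is why the two filters live in separate files. But like I said, I don’t know anything about Lua.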
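The point about duplicate function Block definitions can be checked in plain Lua, without pandoc. A minimal sketch; the string return values are placeholders invented for this illustration, not something a real filter would return:

```lua
-- First definition (placeholder body).
function Block (elem)
  return "first"
end

-- Second definition with the same global name: Lua silently rebinds
-- Block, so the first function is no longer reachable by that name.
function Block (elem)
  return "second"
end

print(Block({}))  -- prints "second"
```

Pandoc's Lua filter documentation also describes returning a list of filter tables from one file, along the lines of return { { Block = f1 }, { Block = f2 } } (f1 and f2 being hypothetical names), with the filters applied one after another. That may be a way to keep both filters in a single file.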