Advanced Guide
Create plugins
Mistune ships with many built-in plugins; their source code in mistune/plugins
shows how a plugin is written. In this documentation, I’ll guide you through
one example, the math plugin (located at mistune/plugins/math.py):
def math(md):
    md.block.register('block_math', BLOCK_MATH_PATTERN, parse_block_math, before='list')
    md.inline.register('inline_math', INLINE_MATH_PATTERN, parse_inline_math, before='link')
    if md.renderer and md.renderer.NAME == 'html':
        md.renderer.register('block_math', render_block_math)
        md.renderer.register('inline_math', render_inline_math)
The parameter md is an instance of Markdown. In our example, we have registered
a block-level math plugin and an inline-level math plugin.
Block level plugin
The md.block.register function registers a block-level plugin. In the math example:
$$
\operatorname{ker} f=\{g\in G:f(g)=e_{H}\}{\mbox{.}}
$$
This is what block-level math syntax looks like. Our BLOCK_MATH_PATTERN is:
# a block-level pattern MUST start with ^
BLOCK_MATH_PATTERN = r'^ {0,3}\$\$[ \t]*\n(?P<math_text>.+?)\n\$\$[ \t]*$'
# piece by piece, the regex means:
BLOCK_MATH_PATTERN = (
    r'^ {0,3}'  # the line may start with 0 to 3 spaces, like other block elements defined in CommonMark
    r'\$\$'  # followed by $$
    r'[ \t]*\n'  # the rest of this line may contain only spaces and tabs
    r'(?P<math_text>.+?)'  # this is the math content; it MUST use a named group
    r'\n\$\$[ \t]*$'  # ends with $$ plus optional spaces and tabs
)
# a stricter version of the math pattern could look like:
BLOCK_MATH_PATTERN = r'^\$\$\n(?P<math_text>.+?)\n\$\$$'
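The pattern can be tried out with the standard `re` module before it is handed to Mistune. The sketch below is only an approximation: it assumes the pattern is applied in multiline mode, and `MULTILINE_VARIANT` is a hypothetical alternative that does not come from the text above.

```python
import re

BLOCK_MATH_PATTERN = r'^ {0,3}\$\$[ \t]*\n(?P<math_text>.+?)\n\$\$[ \t]*$'

# Assumption: multiline mode, so ^ and $ match at line boundaries.
block_re = re.compile(BLOCK_MATH_PATTERN, re.M)

single_line = 'intro\n$$\na^2 + b^2 = c^2\n$$\noutro'
m = block_re.search(single_line)
math_text = m.group('math_text')
print(math_text)

# '.' does not match a newline unless re.DOTALL is set, so math content
# spanning several lines is not matched by this exact pattern.
multi_line = '$$\na = b\nc = d\n$$'
no_match = block_re.search(multi_line)
print(no_match)

# Hypothetical variant (not from the text above): [\s\S] also matches newlines.
MULTILINE_VARIANT = r'^ {0,3}\$\$[ \t]*\n(?P<math_text>[\s\S]+?)\n\$\$[ \t]*$'
m2 = re.compile(MULTILINE_VARIANT, re.M).search(multi_line)
variant_text = m2.group('math_text')
print(repr(variant_text))
```

The named group matters because the parsing function looks the content up by name rather than by position.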
Then the block parsing function:
def parse_block_math(block, m, state):
    text = m.group('math_text')
    # use ``state.append_token`` to save the parsed block math token
    state.append_token({'type': 'block_math', 'raw': text})
    # return the end position of the parsed text
    # the trailing ``$`` anchor consumes no characters, so m.end() stops
    # just before the newline; add 1 to step past it
    # if the pattern does not end with ``$``, do not add 1
    return m.end() + 1
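The effect of the `+ 1` is easiest to see by running the parsing function outside Mistune. In this sketch `FakeState` is a hypothetical stand-in for the real parser state (it only collects tokens), and the pattern is compiled with `re.M` as an assumption.

```python
import re

BLOCK_MATH_PATTERN = r'^ {0,3}\$\$[ \t]*\n(?P<math_text>.+?)\n\$\$[ \t]*$'


class FakeState:
    """Hypothetical stand-in for the parser state; it only collects tokens."""

    def __init__(self):
        self.tokens = []

    def append_token(self, token):
        self.tokens.append(token)


def parse_block_math(block, m, state):
    text = m.group('math_text')
    state.append_token({'type': 'block_math', 'raw': text})
    return m.end() + 1


src = '$$\na^b\n$$\nnext paragraph'
m = re.compile(BLOCK_MATH_PATTERN, re.M).match(src)
state = FakeState()
end_pos = parse_block_math(None, m, state)

print(repr(src[m.end()]))   # the character the match stopped in front of
print(repr(src[end_pos:]))  # what is left for the parser to continue with
print(state.tokens)
```

Without the `+ 1`, the remaining source would start with the newline that closes the math block.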
The token MUST contain type; the other keys are optional. Here are some examples:
{'type': 'thematic_break'} # <hr>
{'type': 'paragraph', 'text': text}
{'type': 'block_code', 'raw': code}
{'type': 'heading', 'text': text, 'attrs': {'level': level}}
text: inline parser will parse text
raw: inline parser WILL NOT parse the content
attrs: extra information saved here, renderer will use attrs
Inline level plugin
The md.inline.register function registers an inline-level plugin. In the math example:
function $f$
This is what inline-level math syntax looks like. Our INLINE_MATH_PATTERN is:
INLINE_MATH_PATTERN = r'\$(?!\s)(?P<math_text>.+?)(?!\s)\$'
# piece by piece, the regex means:
INLINE_MATH_PATTERN = (
    r'\$'  # starts with $
    r'(?!\s)'  # the next character is not whitespace
    r'(?P<math_text>.+?)'  # content between the `$` signs; it MUST use a named group
    r'(?!\s)'  # the next character is not whitespace
    r'\$'  # ends with $
)
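The inline pattern can be checked the same way with plain `re`. This is a sketch run outside Mistune's inline parser, which may combine patterns differently, so treat the results as properties of the regex itself rather than of the plugin.

```python
import re

INLINE_MATH_PATTERN = r'\$(?!\s)(?P<math_text>.+?)(?!\s)\$'
inline_re = re.compile(INLINE_MATH_PATTERN)

m = inline_re.search('function $f$')
matched = m.group('math_text')
print(matched)

# A space right after the opening $ is rejected by the first (?!\s).
rejected = inline_re.search('function $ f$')
print(rejected)

# The second (?!\s) is a lookahead placed directly before the closing \$,
# so it inspects the $ itself; a space before the closing $ still matches.
trailing = inline_re.search('function $f $')
trailing_text = trailing.group('math_text')
print(repr(trailing_text))
```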
Then the inline parsing function:
def parse_inline_math(inline, m, state):
    text = m.group('math_text')
    # use ``state.append_token`` to save parsed inline math token
    state.append_token({'type': 'inline_math', 'raw': text})
    # return the end position of parsed text
    return m.end()
An inline token has the same shape as a block token. Available keys:
type, raw, text, attrs.
Plugin renderers
It is suggested to add default HTML renderers for your plugin. A renderer function looks like:
def render_hr(renderer):
    # token with only type, like:
    # {'type': 'hr'}
    return '<hr>'
def render_math(renderer, text):
    # token with type and (text or raw), e.g.:
    # {'type': 'block_math', 'raw': 'a^b'}
    return '<div class="math">$$' + text + '$$</div>'
def render_link(renderer, text, **attrs):
    # token with type, text or raw, and attrs
    href = attrs['href']
    return f'<a href="{href}">{text}</a>'
If the current Markdown instance uses the HTML renderer, the plugin can register these renderer functions so that its tokens are converted to HTML.
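The three signatures above correspond to three token shapes. The sketch below makes that mapping explicit with a hypothetical `RENDERERS` registry and `render_token` dispatcher; Mistune's renderer has its own dispatch logic, so this is only a simplified illustration of how the arguments line up.

```python
def render_hr(renderer):
    return '<hr>'


def render_math(renderer, text):
    return '<div class="math">$$' + text + '$$</div>'


def render_link(renderer, text, **attrs):
    href = attrs['href']
    return f'<a href="{href}">{text}</a>'


# Hypothetical registry and dispatcher, not Mistune's own implementation.
RENDERERS = {'hr': render_hr, 'block_math': render_math, 'link': render_link}


def render_token(token, renderer=None):
    func = RENDERERS[token['type']]
    args = []
    if 'raw' in token:
        args.append(token['raw'])
    elif 'text' in token:
        args.append(token['text'])
    # 'attrs', when present, are passed as keyword arguments.
    return func(renderer, *args, **token.get('attrs', {}))


hr_html = render_token({'type': 'hr'})
math_html = render_token({'type': 'block_math', 'raw': 'a^b'})
link_html = render_token(
    {'type': 'link', 'text': 'docs', 'attrs': {'href': 'https://example.com'}}
)
print(hr_html, math_html, link_html, sep='\n')
```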
Write directives
Mistune has some built-in directives, which are presented in
the directives part of the documentation. They are defined in
mistune/directives; you can learn how to write a new directive
by reading the source code there.
Parsing AST tokens
Mistune provides direct access to AST tokens by creating a markdown object
via mistune.create_markdown(renderer='ast') (see Abstract syntax tree).
By walking down the AST returned from the markdown object, you can integrate
Mistune’s parser into other systems.
import mistune
markdown = mistune.create_markdown(renderer='ast')
tokens = markdown(
    '''# Title
Subtitle
--------
Hello World!'''
)
stk = list(reversed(tokens))
while stk:
    token = stk.pop()
    print({k: v for k, v in token.items() if k != 'children'})
    if 'children' in token:
        for child in reversed(token['children']):
            stk.append(child)
Below is the documentation for the list of tokens that can occur in
renderer='ast' mode.
Token structure
An AST token is a dict containing an item whose key is 'type' and value
is a string representing the token type (such as 'text', 'emphasis',
'strong'). If the token has children, they are represented as a
list under the 'children' key.
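Because every token follows this shape, a generic traversal needs very little code. The sketch below uses a hypothetical `walk` helper and a hand-written token list (not real parser output) so that it runs without Mistune installed.

```python
def walk(tokens):
    """Yield every token in document order, descending into 'children'."""
    for token in tokens:
        yield token
        yield from walk(token.get('children', []))


# Hand-written sample shaped like the structure described above.
sample = [
    {
        'type': 'paragraph',
        'children': [
            {'type': 'text', 'raw': 'Hello '},
            {'type': 'strong', 'children': [{'type': 'text', 'raw': 'World'}]},
        ],
    },
]

types = [token['type'] for token in walk(sample)]
text = ''.join(t['raw'] for t in walk(sample) if t['type'] == 'text')
print(types)
print(text)
```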
Inline elements
{ 'type': 'linebreak' }
{ 'type': 'softbreak' }
{ 'type': 'text', 'raw': str }
{ 'type': 'emphasis', 'children': list[dict] }
{ 'type': 'strong', 'children': list[dict] }
{ 'type': 'codespan', 'raw': str }
{ 'type': 'inline_html', 'raw': str }
# links and images
#
# 'children' contains elements in the link text section. If you
# write something like [**text**](url), **text** goes to 'children'.
# This behavior is identical for both images and links, but the HTML
# renderer extracts only the text part of children when actually
# putting it into the 'alt' attribute (e.g., ![**text**](url) returns
# <img src="url" alt="text">, not <img src="url" alt="**text**">)
#
# for reference links and images (like [text][label], [label], etc.),
# 'ref' and 'label' are also given. Both contain the same content,
# but 'ref' is an uppercase version, while 'label' is case-sensitive.
#
{
    'type': 'image',
    'children': list[dict],  # link text
    'attrs': {
        'url': str,
        'title': str | None  # is None if not given
    },
    'ref': str,  # omitted if not reference links and images
    'label': str  # omitted if not reference links and images
}
{
    'type': 'link',
    'children': list[dict],  # link text
    'attrs': {
        'url': str,
        'title': str | None  # is None if not given
    },
    'ref': str,  # omitted if not reference links and images
    'label': str  # omitted if not reference links and images
}
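The note about the 'alt' attribute can be approximated when consuming the AST yourself. In the sketch below, `plain_text` is a hypothetical helper that flattens an image's children into a string; it is a simplification and not the HTML renderer's actual code, and the token is written by hand.

```python
def plain_text(children):
    """Concatenate the raw text found anywhere below the given children."""
    parts = []
    for child in children:
        if 'raw' in child:
            parts.append(child['raw'])
        else:
            parts.append(plain_text(child.get('children', [])))
    return ''.join(parts)


# Hand-written token for ![**text**](url), following the layout above.
image = {
    'type': 'image',
    'children': [
        {'type': 'strong', 'children': [{'type': 'text', 'raw': 'text'}]},
    ],
    'attrs': {'url': 'url', 'title': None},
}

alt = plain_text(image['children'])
print(alt)
```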
Block elements
{ 'type': 'blank_line' }
{ 'type': 'thematic_break' }
{ 'type': 'paragraph', 'children': list[dict] }
# 'block_text' is a special text block that occurs in 'tight' lists.
#
# when a list is tight (i.e., there is no blank line between any list
# items or their children), and if a leaf list item contains only a
# paragraph, that paragraph's 'type' is changed to 'block_text'
# ('children' remains the same).
#
# block_texts are immediately put between <li>...</li>, where paragraphs
# (occurring in 'loose' lists) are rendered like <li><p>...</p></li>.
#
{ 'type': 'block_text', 'children': list[dict] }
# 'style' can be 'atx' or 'setext'
{
    'type': 'heading',
    'children': list[dict],
    'attrs': {'level': int},
    'style': str
}
{ 'type': 'block_quote', 'children': list[dict] }
{ 'type': 'block_html', 'raw': str }
{ 'type': 'block_code', 'raw': str, 'style': 'indent' }
# fenced block code
{
    'type': 'block_code',
    'raw': str,
    'style': 'fenced',
    'marker': str,
    'attrs': {'info': str}  # appears if info string is given
}
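Optional keys such as 'attrs' on block_code need defensive access. The following sketch collects code blocks from a hand-written token list; `code_blocks` is a hypothetical helper, and the sample values (including the `~~~` marker) are made up for illustration.

```python
def code_blocks(tokens):
    """Yield (style, info, raw) for each block_code token, at any depth."""
    for token in tokens:
        if token['type'] == 'block_code':
            # 'attrs' is only present when an info string was given.
            info = token.get('attrs', {}).get('info')
            yield token['style'], info, token['raw']
        else:
            yield from code_blocks(token.get('children', []))


# Hand-written sample that follows the block token layout above.
sample = [
    {'type': 'block_code', 'raw': 'x = 1\n', 'style': 'indent'},
    {
        'type': 'block_quote',
        'children': [
            {
                'type': 'block_code',
                'raw': 'print(x)\n',
                'style': 'fenced',
                'marker': '~~~',
                'attrs': {'info': 'python'},
            },
        ],
    },
]

found = list(code_blocks(sample))
print(found)
```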
List elements
{
    'type': 'list',
    'children': [{'type': 'list_item', 'children': list[dict]}, ...],
    'tight': bool,  # whether the list is 'tight' or 'loose'
    'bullet': str,  # list marker character
    'attrs': {
        'depth': int,
        'ordered': bool,  # whether the list is ordered or unordered
        'start': int  # appears if the list is ordered and start != 1
    }
}
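The difference between 'block_text' and 'paragraph' becomes visible once a list is rendered. The sketch below is a deliberately minimal, hypothetical renderer (plain text children only, no escaping) applied to hand-written tokens; it is not how Mistune's HTML renderer is implemented.

```python
def inline_text(children):
    # Simplified: only plain 'text' children are handled in this sketch.
    return ''.join(c['raw'] for c in children if c['type'] == 'text')


def render_list(token):
    tag = 'ol' if token['attrs']['ordered'] else 'ul'
    items = []
    for item in token['children']:
        parts = []
        for block in item['children']:
            body = inline_text(block.get('children', []))
            if block['type'] == 'block_text':
                parts.append(body)  # tight list: text sits directly in <li>
            elif block['type'] == 'paragraph':
                parts.append('<p>' + body + '</p>')  # loose list: wrapped in <p>
        items.append('<li>' + ''.join(parts) + '</li>')
    return '<' + tag + '>' + ''.join(items) + '</' + tag + '>'


def sample_list(block_type, tight):
    # Hand-written token following the list layout above.
    item = {
        'type': 'list_item',
        'children': [{'type': block_type, 'children': [{'type': 'text', 'raw': 'a'}]}],
    }
    return {
        'type': 'list',
        'children': [item],
        'tight': tight,
        'bullet': '-',
        'attrs': {'depth': 0, 'ordered': False},
    }


tight_html = render_list(sample_list('block_text', True))
loose_html = render_list(sample_list('paragraph', False))
print(tight_html)
print(loose_html)
```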
Plugin elements
# strikethrough, mark, insert, superscript, and subscript plugin
{ 'type': 'strikethrough', 'children': list[dict] }
{ 'type': 'mark', 'children': list[dict] }
{ 'type': 'insert', 'children': list[dict] }
{ 'type': 'superscript', 'children': list[dict] }
{ 'type': 'subscript', 'children': list[dict] }
# footnotes plugin
{ 'type': 'footnote_ref', 'raw': str, 'attrs': {'index': int} }
{
    'type': 'footnotes',
    'children': [
        {
            'type': 'footnote_item',
            'children': [{'type': 'paragraph', 'children': list[dict]}],
            'attrs': {'key': str, 'index': int}
        },
        ...
    ]
}
# table plugin
{
    'type': 'table',
    'children': [
        {
            'type': 'table_head',
            'children': [
                {
                    'type': 'table_cell',
                    'children': list[dict],
                    'attrs': {
                        # 'align' is 'center', 'left', 'right', or None
                        'align': str | None,
                        'head': True
                    }
                },
                ...
            ]
        },
        {
            'type': 'table_body',
            'children': [
                {
                    'type': 'table_row',
                    'children': [
                        {
                            'type': 'table_cell',
                            'children': list[dict],
                            'attrs': {
                                # 'align' is 'center', 'left', 'right', or None
                                'align': str | None,
                                'head': False
                            }
                        },
                        ...
                    ]
                },
                ...
            ]
        }
    ]
}
# url plugin does not add new elements
# (it uses 'link' element just like normal links)
# task_lists plugin
#
# task_list_item appears in the same contexts as list_item.
#
{
    'type': 'task_list_item',
    'children': list[dict],
    'attrs': {'checked': bool}
}
# def_list plugin
#
# similar to regular lists, sole paragraphs in def_list_items are
# converted to 'block_texts' if the definition list is tight.
#
{
    'type': 'def_list',
    'children': [
        { 'type': 'def_list_head', 'children': list[dict] },
        { 'type': 'def_list_item', 'children': list[dict] },
        ...
    ]
}
# abbr plugin
{
    'type': 'abbr',
    'children': [{'type': 'text', 'raw': str}],
    'attrs': {'title': str}
}
# math plugin
{ 'type': 'block_math', 'raw': str }
{ 'type': 'inline_math', 'raw': str }
# ruby plugin
{ 'type': 'ruby', 'raw': str, 'attrs': {'rt': str} }
# spoiler plugin
{ 'type': 'block_spoiler', 'children': list[dict] }
{ 'type': 'inline_spoiler', 'children': list[dict] }
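As an example of consuming plugin tokens, the sketch below turns a table token into plain Python lists. It assumes that 'table_body' holds a list of 'table_row' tokens, handles only plain text cells, and runs on a hand-written token; `cell_text`, `table_rows`, and `cell` are hypothetical helpers.

```python
def cell_text(cell):
    return ''.join(c['raw'] for c in cell['children'] if c['type'] == 'text')


def table_rows(table):
    """Return (header, rows) as lists of cell strings."""
    head, body = table['children']
    header = [cell_text(c) for c in head['children']]
    rows = [[cell_text(c) for c in row['children']] for row in body['children']]
    return header, rows


def cell(text, head):
    # Helper for the hand-written sample below.
    return {
        'type': 'table_cell',
        'children': [{'type': 'text', 'raw': text}],
        'attrs': {'align': None, 'head': head},
    }


sample = {
    'type': 'table',
    'children': [
        {'type': 'table_head', 'children': [cell('Name', True), cell('Qty', True)]},
        {
            'type': 'table_body',
            'children': [
                {'type': 'table_row', 'children': [cell('apple', False), cell('3', False)]},
                {'type': 'table_row', 'children': [cell('pear', False), cell('5', False)]},
            ],
        },
    ],
}

header, rows = table_rows(sample)
print(header)
print(rows)
```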