How to Fully Support MathJax in the mistune Parser

Python Guide Markdown

Tutorial

Last updated: the morning of July 20, 2020

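Markdown is a genuinely handy markup language, and MathJax is a good engine for rendering math formulas. But ever since Markdown appeared, parsers have multiplied, each with slightly different rules. Nearly all of them share two behaviors, though: \ is treated as an escape character, and _ is converted into <em></em> tags. So once you start using MathJax, some formulas break badly...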
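To make the failure concrete, here is a minimal sketch of those two behaviors. The function naive_markdown is a hypothetical, deliberately naive stand-in written for illustration; it is not how mistune is implemented, only an approximation of the two rules mentioned above.

```python
import re


def naive_markdown(text):
    """Hypothetical stand-in for two common Markdown rules (not mistune's code)."""
    # Rule 1: a backslash escapes the next punctuation character.
    text = re.sub(r"\\([\\_{}])", r"\1", text)
    # Rule 2: _..._ becomes <em>...</em>.
    text = re.sub(r"_(.+?)_", r"<em>\1</em>", text)
    return text


# Subscripts are mistaken for emphasis markers.
broken_subscripts = naive_markdown(r"$C_{ik} = \sum_{j=1}^n A_{ij} B_{jk}$")

# The LaTeX row separator \\ is collapsed to a single backslash.
broken_newline = naive_markdown(r"a \\ b")

print(broken_subscripts)
print(broken_newline)
```

Either result reaches MathJax in a form it can no longer typeset as intended, which is why the math spans have to be recognized before the ordinary Markdown rules run.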

The mistune Part (Markdown)

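I use the mistune parser (it really is one of the fastest Markdown parsers for Python). Its author did write a math extension, but the example is incomplete, so I had to hunt for clues on my own...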

The official mistune extension library

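However, when I followed the usage shown in some online tutorials, the author's math extension raised an error at runtime while parsing its custom block-level elements. After digging through many sites, I finally found a solution on a foreign website.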

Here is the code, with my minor changes:

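#!/usr/bin/python3
# Modified from https://blog.depado.eu/post/mistune-parser-syntax-mathjax-centered-images
# Written against the mistune 0.x API (BlockLexer / InlineLexer / Renderer);
# these classes were removed in mistune 2.0.
import re

import mistune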

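class MathBlockGrammar(mistune.BlockGrammar):
    # $$ ... $$ display math; DOTALL lets the body span several lines
    block_math = re.compile(r"^\$\$(.*?)\$\$", re.DOTALL)
    # \begin{name} ... \end{name}; \1 requires the same environment name
    latex_environment = re.compile(r"^\\begin\{([a-z]*\*?)\}(.*?)\\end\{\1\}", re.DOTALL)

class MathBlockLexer(mistune.BlockLexer):
    # Try the math rules before mistune's built-in block rules
    default_rules = ['block_math', 'latex_environment'] + mistune.BlockLexer.default_rules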

    def __init__(self, rules=None, **kwargs):
        if rules is None:
            rules = MathBlockGrammar()
        super(MathBlockLexer, self).__init__(rules, **kwargs)

    def parse_block_math(self, m):
        """Parse a $$math$$ block"""
        self.tokens.append({
            'type': 'block_math',
            'text': m.group(1)
        })

    def parse_latex_environment(self, m):
        self.tokens.append({
            'type': 'latex_environment',
            'name': m.group(1),
            'text': m.group(2)
        })

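class MathInlineGrammar(mistune.InlineGrammar):
    math = re.compile(r"^\$(.+?)\$", re.DOTALL)
    block_math = re.compile(r"^\$\$(.+?)\$\$", re.DOTALL)
    # mistune's default text rule with \$ added to the stop set,
    # so plain text ends right before a math span
    text = re.compile(r'^[\s\S]+?(?=[\\<!\[_*`~\$]|https?://| {2,}\n|$)')

class MathInlineLexer(mistune.InlineLexer):
    # block_math must come before math, otherwise $$x$$ is read as inline math
    default_rules = ['block_math', 'math'] + mistune.InlineLexer.default_rules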

    def __init__(self, renderer, rules=None, **kwargs):
        if rules is None:
            rules = MathInlineGrammar()
        super(MathInlineLexer, self).__init__(renderer, rules, **kwargs)

    def output_math(self, m):
        return self.renderer.inline_math(m.group(1))

    def output_block_math(self, m):
        return self.renderer.block_math(m.group(1))

class MathRendererMixin(mistune.Renderer):
    def block_code(self, code, lang=None):
        # Emit the <pre>/<code class="language-..."> markup that Prism.js expects
        code = code.rstrip('\n')
        if not lang:
            lang = 'text'
        code = mistune.escape(code, quote=True, smart_amp=False)
        return '<pre class="language-%s"><code class="language-%s">%s\n</code></pre>\n' % (lang, lang, code)

    def block_math(self, text):
        return '$$%s$$' % text

    def latex_environment(self, name, text):
        return r'\begin{%s}%s\end{%s}' % (name, text, name)

    def inline_math(self, text):
        return '$%s$' % text

class MarkdownWithMath(mistune.Markdown):
    def __init__(self, renderer, **kwargs):
        if 'inline' not in kwargs:
            kwargs['inline'] = MathInlineLexer
        if 'block' not in kwargs:
            kwargs['block'] = MathBlockLexer
        super(MarkdownWithMath, self).__init__(renderer, **kwargs)

    def output_block_math(self):
        return self.renderer.block_math(self.token['text'])

    def output_latex_environment(self):
        return self.renderer.latex_environment(self.token['name'], self.token['text'])
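As a quick check of what the two block-level rules capture, the sketch below applies the same regular expressions with the standard re module only, so it runs without mistune installed. The helper match_block is hypothetical and exists only for this illustration; it mirrors the token dictionaries that parse_block_math and parse_latex_environment append.

```python
import re

# Same patterns as MathBlockGrammar above.
block_math = re.compile(r"^\$\$(.*?)\$\$", re.DOTALL)
latex_environment = re.compile(r"^\\begin\{([a-z]*\*?)\}(.*?)\\end\{\1\}", re.DOTALL)


def match_block(src):
    """Hypothetical helper: return the token a math block rule would emit, or None."""
    m = block_math.match(src)
    if m:
        return {"type": "block_math", "text": m.group(1)}
    m = latex_environment.match(src)
    if m:
        return {"type": "latex_environment", "name": m.group(1), "text": m.group(2)}
    return None


# A $$ block spanning several lines: the body keeps its newlines.
print(match_block("$$\nx^2\n$$\nrest"))

# A starred environment: the backreference forces \end to use the same name.
print(match_block(r"\begin{align*}a &= b\end{align*} tail"))

# Both rules are anchored with ^, so they only fire at the start of the remaining source.
print(match_block("text $$x$$"))
```

Because the captured text is stored in the token untouched and later written back between the same delimiters, nothing inside the formula is ever seen by the ordinary Markdown rules.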

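Use it as follows (the returned content is the final rendered HTML):

mk = MarkdownWithMath(renderer=MathRendererMixin())
content = mk(content)

Remember to make sure the backslashes in content have not already been consumed by Python's own string escaping before it is passed in (~~this alone cost me quite a while~~). If the Markdown source is written as a literal in Python code, write it as a raw string (r"..."). Wrapping an existing string in r"{}".format(content) changes nothing, because escapes are processed when a literal is parsed, not afterwards.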
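The escaping pitfall is easy to reproduce with one of the sample formulas. This sketch uses only built-in string behavior; has_intact_backslash is a hypothetical helper added for the demonstration.

```python
# "\a" is a valid Python escape (BEL, 0x07), so a normal literal
# silently swallows the backslash of \approx.
cooked = "$\approx 2mnp$"
# A raw string keeps the backslash exactly as typed.
raw = r"$\approx 2mnp$"


def has_intact_backslash(s):
    """Hypothetical helper: True if the string still contains a literal backslash."""
    return "\\" in s


print(has_intact_backslash(cooked))
print(has_intact_backslash(raw))

# Formatting an already-damaged string into a raw template does not repair it.
print(r"{}".format(cooked) == cooked)
```

The same thing happens with \b (backspace), \f (form feed), \n, \r and \t, so commands such as \begin, \frac and \nu are all at risk when written in ordinary string literals.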

Some examples that are prone to breaking:

The entries of $C$ are given by the exact formula:

$C_{ik} = \sum_{j=1}^n A_{ij} B_{jk}$

but there are many ways to _implement_ this computation. $\approx 2mnp$ flops

$m$

$C = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0 & -c_0 \\ 0 & 0 & 0 & \cdots & 0 & 1 & -c_{m-1} \end{pmatrix}$

$x^m$

$r=\overline{1,n}$

${\bf b}_{i}^{r}(t)=(1-t)\,{\bf b}_{i}^{r-1}(t)+t\,{\bf b}_{i+1}^{r-1}(t),\: i=\overline{0,n-r},$

i.e. the $i^{th}$

Source for the formulas above (these samples come from mistune's extension library):

The entries of $C$ are given by the exact formula:
$$
C_{ik} = \sum_{j=1}^n A_{ij} B_{jk}
$$
but there are many ways to _implement_ this computation.   $\approx 2mnp$ flops

$m$
$$
C = \begin{pmatrix}
          0 & 0 & 0 & \cdots & 0 & 0 & -c_0 \\
          0 & 0 & 0 & \cdots & 0 & 1 & -c_{m-1}
    \end{pmatrix}
$$
$x^m$

$r=\overline{1,n}$
$$ {\bf
b}_{i}^{r}(t)=(1-t)\,{\bf b}_{i}^{r-1}(t)+t\,{\bf b}_{i+1}^{r-1}(t),\:
 i=\overline{0,n-r}, $$
i.e. the $i^{th}$
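To see why these samples survive, here is a simplified sketch of the first-match dispatch among the inline rules. It assumes only the three patterns from MathInlineGrammar and tries them in the order block_math, math, text; the real lexer tries many more rules after the two math ones, and first_token is a hypothetical helper written for this illustration.

```python
import re

# Same patterns as MathInlineGrammar above.
math = re.compile(r"^\$(.+?)\$", re.DOTALL)
block_math = re.compile(r"^\$\$(.+?)\$\$", re.DOTALL)
text = re.compile(r'^[\s\S]+?(?=[\\<!\[_*`~\$]|https?://| {2,}\n|$)')

# Order matters: the math rules are tried before plain text.
RULES = (("block_math", block_math), ("math", math), ("text", text))


def first_token(src):
    """Hypothetical helper: name and matched span of the first rule that matches."""
    for name, rule in RULES:
        m = rule.match(src)
        if m:
            return name, m.group(0)
    return None


# Plain text stops right before the dollar sign, so the next token starts at "$".
print(first_token("flops $m$"))

# The whole formula is consumed as one token, backslash and all.
print(first_token(r"$\approx 2mnp$ flops"))

# With block_math tried first, $$x$$ is recognized as display math.
print(first_token("$$x$$"))

# If math were tried first, it would capture "$x" instead of "x".
print(math.match("$$x$$").group(1))
```

Since a formula is swallowed whole by a math rule, the emphasis and escape rules never get a chance to touch the underscores or backslashes inside it.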

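As for block_code: I use CodeBlock.js to draw the frame around code blocks and Prism.js for syntax highlighting, and they expect code blocks in the form <pre class="language-%s"><code class="language-%s">%s\n</code></pre>\n, so I simply reimplemented the Renderer's block_code method.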
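For reference, the same markup can be produced by a standalone function, which makes the output easy to inspect without mistune. This is a sketch under one assumption: html.escape from the standard library stands in for mistune.escape (it encodes a single quote as &#x27; where mistune uses &#39;, which browsers treat the same). The name render_block_code is hypothetical.

```python
import html


def render_block_code(code, lang=None):
    """Hypothetical standalone version of the block_code override above."""
    code = code.rstrip("\n")
    # Fall back to a neutral language class when the fence names no language.
    lang = lang or "text"
    # Stand-in for mistune.escape(code, quote=True, smart_amp=False).
    escaped = html.escape(code, quote=True)
    return '<pre class="language-%s"><code class="language-%s">%s\n</code></pre>\n' % (
        lang, lang, escaped)


print(render_block_code("print(1 < 2)\n", "python"))
print(render_block_code("a & b"))
```

Putting the language-xxx class on both the pre and the code element matches the form described above.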

The MathJax Part

There are plenty of tutorials online about configuring MathJax on the front end, so I won't repeat them here.

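First, you can add the following dns-prefetch to the head block. It lets the browser resolve the CDN's hostname ahead of time, which helps the CDN load faster (~~although the effect doesn't seem very noticeable~~)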

<!-- Mathjax JS dns prefetch-->
<link rel="dns-prefetch" href="//cdn.bootcss.com" />

Most importantly, load MathJax.js after the HTML that needs math rendering. If you need to configure it, do so before MathJax.js is included.

<script type="text/x-mathjax-config">
	var articlemathId = document.getElementById("articleContent");
	var commentmathId = document.getElementById("commentlist-container");
	MathJax.Hub.Config({
		tex2jax: {
			inlineMath: [ ['$','$'] ], // inline math delimiters
			displayMath: [ ['$$','$$'] ], // display math delimiters
			skipTags: ['script', 'noscript', 'style', 'textarea', 'pre','code','a'], // HTML tags skipped during rendering
			ignoreClass: "summary", // class to ignore
		}
	});
	MathJax.Hub.Queue(["Typeset", MathJax.Hub, [articlemathId, commentmathId]]); // HTML elements to typeset; pass several as one array, since Typeset treats a second argument as a callback
</script>
<script src="//cdn.bootcss.com/mathjax/2.7.7/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>

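As for why the script tag's src starts with //: this is a protocol-relative URL (a "network-path reference" in RFC 3986 terms). See RFC 3986 Section 4.2 for the standard (~~I doubt many people will read it all the way through~~).

Simply put, for such a URL the browser takes the scheme of the current page and prepends it to the //. For example, this article is served over https, so //cdn.bootcss.com/ becomes https://cdn.bootcss.com/. Other schemes work the same way.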
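The resolution rule can be checked from Python, since urllib.parse.urljoin resolves references against a base URL in the same way. In this sketch the resolve wrapper is a hypothetical helper, and http://example.com/index.html is a made-up page used only to show the other scheme.

```python
from urllib.parse import urljoin

# The src value from the script tag above.
SRC = "//cdn.bootcss.com/mathjax/2.7.7/MathJax.js?config=TeX-AMS-MML_HTMLorMML"


def resolve(page_url, ref):
    """Hypothetical helper: resolve a reference against the page it appears on."""
    return urljoin(page_url, ref)


# On an https page the reference inherits https.
print(resolve("https://spaceskynet.top/posts/2004966611.html", SRC))

# On an http page (made-up example) the same reference inherits http.
print(resolve("http://example.com/index.html", SRC))
```

Only the scheme is taken from the page; the host, path and query string all come from the reference itself.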

With all of the above in place, you can happily use a MathJax-aware mistune parser in your Python web framework.

SpaceSkyNet

https://spaceskynet.top/posts/2004966611.html