Generating apiDoc Docs from Line Comments in Ruby

Posted by Echo on August 10, 2017

Over the years I have tried many ways of generating API documentation.

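At first I maintained a separate document by hand in Markdown, but whenever the document and the code drifted out of sync, I often forgot to update it.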

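Later I tried tools such as RDoc and Sphinx, which parse code comments and generate API docs from them directly. Sphinx is a documentation generator built on the reStructuredText (rst) format.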

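Recently I discovered apiDoc. Compared with Sphinx, it doesn't require learning rst syntax, so the learning cost is lower, and I decided to try it for generating API docs in a new project.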

However, the Ruby examples on the apiDoc website write the documentation in multi-line comments, like this:

=begin
@api {get} /user/:id Request User information
@apiName GetUser
@apiGroup User

@apiParam {Number} id Users unique ID.

@apiSuccess {String} firstname Firstname of the User.
@apiSuccess {String} lastname  Lastname of the User.
=end

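But multi-line comments have several drawbacks: =begin and =end must start at the very beginning of a line, so the block cannot be indented, and it is hard to pick out in code. The Ruby style guide discourages them, and they don't fit our team's coding conventions either.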

So I dug into apiDoc's source to see whether I could modify it. To my delight, I found that it matches comment blocks like this:

/**
 * apidoc-core/lib/languages/rb.js
 */
module.exports = {
    // find document blocks between '=begin' and '=end'
    docBlocksRegExp: /#\*\*\uffff?(.+?)\uffff?(?:\s*)?#\*|=begin\uffff?(.+?)\uffff?(?:\s*)?=end/g,
    // remove not needed ' # ' and tabs at the beginning
    inlineRegExp: /^(\s*)?(#)[ ]?/gm
};

This regex looks a bit messy, so let's break it down.
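One way to make the pattern readable is to rebuild it from two named pieces, one per alternative. This is only an illustrative sketch: the names NL, lineCommentBlock and beginEndBlock are my own and do not exist in apiDoc. The final check confirms that the pieces reassemble into exactly the pattern quoted from rb.js above.

```javascript
// Illustrative decomposition of apiDoc's Ruby docBlocksRegExp.
// NL, lineCommentBlock and beginEndBlock are hypothetical names, not apiDoc's.
// NL is the U+FFFF placeholder that stands in for '\n' during matching.
const NL = '\\uffff';

// Alternative 1: a block that opens with "#**" and closes with "#*".
const lineCommentBlock = `#\\*\\*${NL}?(.+?)${NL}?(?:\\s*)?#\\*`;

// Alternative 2: a Ruby multi-line comment between "=begin" and "=end".
const beginEndBlock = `=begin${NL}?(.+?)${NL}?(?:\\s*)?=end`;

const docBlocksRegExp = new RegExp(`${lineCommentBlock}|${beginEndBlock}`, 'g');

// The pattern as quoted from apidoc-core/lib/languages/rb.js.
const original = /#\*\*\uffff?(.+?)\uffff?(?:\s*)?#\*|=begin\uffff?(.+?)\uffff?(?:\s*)?=end/g;

console.log(docBlocksRegExp.source === original.source); // true
```

Because the two alternatives use separate capture groups, a match from a #** block lands in group 1 and a match from an =begin block lands in group 2.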

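As you can see, the pattern has two alternatives separated by |. The second matches a block between =begin and =end; the first matches a block that opens with #** and closes with #*, after which inlineRegExp strips the leading whitespace, the # and one optional space from each line. In other words, alongside multi-line comments it also supports a line-comment style: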

#**
# @api {get} /user/:id Request User information
# @apiName GetUser
# @apiGroup User

# @apiParam {Number} id Users unique ID.

# @apiSuccess {String} firstname Firstname of the User.
# @apiSuccess {String} lastname  Lastname of the User.
#*
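Putting both regexes together, here is a minimal sketch of pulling the doc text out of an indented line-comment block. The helper name extractDocBlocks is hypothetical, and this is not apiDoc's actual parser. It assumes that newlines are swapped for \uffff before matching (as apiDoc does) and turned back into real line breaks before inlineRegExp runs, since that regex's ^ with the m flag only anchors at real line starts.

```javascript
// The two patterns as quoted from apidoc-core/lib/languages/rb.js.
const docBlocksRegExp = /#\*\*\uffff?(.+?)\uffff?(?:\s*)?#\*|=begin\uffff?(.+?)\uffff?(?:\s*)?=end/g;
const inlineRegExp = /^(\s*)?(#)[ ]?/gm;

// Hypothetical helper: returns the cleaned text of every doc block in `src`.
function extractDocBlocks(src) {
  // Collapse the source to one line so (.+?) can span several comment lines.
  const flat = src.replace(/\n/g, '\uffff');
  const blocks = [];
  for (const m of flat.matchAll(docBlocksRegExp)) {
    // Group 1 holds a "#** ... #*" block, group 2 an "=begin ... =end" block.
    const body = (m[1] ?? m[2]).replace(/\uffff/g, '\n');
    // Strip leading whitespace, "#" and one optional space from each line.
    blocks.push(body.replace(inlineRegExp, ''));
  }
  return blocks;
}

const rubySource = [
  'class UsersController',
  '  #**',
  '  # @api {get} /user/:id Request User information',
  '  # @apiName GetUser',
  '  #*',
  '  def show; end',
  'end',
].join('\n');

console.log(extractDocBlocks(rubySource));
```

Note that the comment block here is indented inside a class, which an =begin/=end comment would not allow.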

Problem solved.

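PS: While reading this regex, the character \uFFFF caught my attention. I couldn't find it in the Unicode code charts, so I assumed it had some special meaning in JavaScript, but searching for that turned up nothing either.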

Then I came across this passage from the Unicode Standard, quoted in an article:

In effect, noncharacters can be thought of as application-internal private-use code points. Unlike the private-use characters discussed in Section 16.5, Private-Use Characters, which are assigned characters and which are intended for use in open interchange, subject to interpretation by private agreement, noncharacters are permanently reserved (unassigned) and have no interpretation whatsoever outside of their possible application-internal private uses.

U+FFFF and U+10FFFF. These two noncharacter code points have the attribute of being associated with the largest code unit values for particular Unicode encoding forms. In UTF-16, U+FFFF is associated with the largest 16-bit code unit value, FFFF₁₆. U+10FFFF is associated with the largest legal UTF-32 32-bit code unit value, 10FFFF₁₆. This attribute renders these two noncharacter code points useful for internal purposes as sentinels. For example, they might be used to indicate the end of a list, to represent a value in an index guaranteed to be higher than any valid character value, and so on.

Then I looked at the code again and found this:

// Replace Linebreak with Unicode
src = src.replace(/\n/g, '\uffff');

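And the mystery was solved. Before matching, apiDoc replaces every line break in the source with \uffff. Since "." in a JavaScript regex does not match "\n", this is what lets (.+?) span a whole multi-line comment block, and each optional \uffff? in the pattern simply stands for a line break. Because U+FFFF is a noncharacter that is not meant to appear in ordinary text, it is a safe sentinel for this job.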
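A quick experiment shows the effect of the substitution. The sketch below uses only the #** alternative of the pattern and a tiny hand-written block; the variable names are mine, not apiDoc's.

```javascript
// Only the "#** ... #*" alternative of apiDoc's Ruby pattern.
// In JavaScript, '.' matches any code unit except line terminators
// (\n, \r, \u2028, \u2029), so (.+?) cannot cross a real newline.
const block = /#\*\*\uffff?(.+?)\uffff?(?:\s*)?#\*/;

const src = '#**\n# @apiName GetUser\n#*';

// Without the substitution the block never matches.
console.log(block.test(src)); // false

// U+FFFF is a noncharacter, not meant for interchanged text, so it is very
// unlikely to occur in a source file and can safely stand in for '\n'.
const flat = src.replace(/\n/g, '\uffff');
console.log(block.test(flat)); // true
console.log(block.exec(flat)[1]); // prints: # @apiName GetUser
```

On engines that support ES2018, the s (dotAll) flag is another way to let "." match newlines; the \uffff placeholder achieves the same result without relying on that flag.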