SoulBook/docs/规则定义.md

## 规则定义说明：

对于各类小说网站的解析，只需按照这个要求进行定义，解析便会尽量自动进行，总规则如下：

```python
# 在项目下config/config.py文件中可以看到
Rules = namedtuple('Rules', 'content_url chapter_selector content_selector')
# 在自定义字典RULES下面可以看到各个网站的规则，仅限class和id
# demo  'name': Rules('content_url', {chapter_selector}, {content_selector})
```

对于各个`namedtuple`里面的`content_url，chapter_selector，content_selector`，下面通过不同类型的网站一一进行解释：

- http://www.bxwx9.org/b/57/57181/

  打开这个网站，审查元素后你会发现：

  - url输入栏显示内容为`http://www.bxwx9.org/b/57/57181/`
  - 主要内容包含在`.TabCss`内
  - 第一章的href值为`8845907.html`，由此可知该章的链接为`http://www.bxwx9.org/b/57/57181/8845907.html`

  最后写出规则：

  ```python
  RULES={
    'www.bxwx9.org': Rules('0', {'class': 'TabCss'}, {})
  }
  ```

  `key`值起判断该网站是否被解析，注意此时`content_url`为0，这是表示在解析小说目录的时候，每章节对应的url在拼接过程中需要的是当前的url。对于`content_url`，请看下表：

  |   值    |          说明          |
  | :----: | :------------------: |
  |   -1   |     不解析，跳转到原本网页      |
  |   0    |  表示章节网页需要当前页面url拼接   |
  |   1    | 表示章节链接使用本身自带的链接，不用拼接 |
  | netloc |       用域名进行拼接        |

  `chapter_selector`解析章节目录的规则

  `content_selector`解析目录内容的规则


- http://www.23us.la/html/0/42/

  ```python
  RULES={
    'www.23us.la': Rules('http://www.23us.la', {'class': 'inner'}, {})
  }
  ```

- http://zetianjiba.net

  ```python
  RULES={
    'zetianjiba.net': Rules('1', {'class': 'bg'}, {})
  }
  ```


`content_selector`值的获取方式同上