SoulBook/docs/规则定义.md
2024-08-01 19:38:07 +08:00

64 lines
1.9 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

## 规则定义说明:
对于各类小说网站的解析,只需按照这个要求进行定义,解析便会尽量自动进行,总规则如下:
```python
# 在项目下config/config.py文件中可以看到
Rules = namedtuple('Rules', 'content_url chapter_selector content_selector')
# 在自定义字典RULES下面可以看到各个网站的规则仅限class和id
# demo 'name': Rules('content_url', {chapter_selector}, {content_selector})
```
对于各个`namedtuple`里面的`content_urlchapter_selectorcontent_selector`,下面通过不同类型的网站一一进行解释:
- http://www.bxwx9.org/b/57/57181/
打开这个网站,审查元素后你会发现:
- url输入栏显示内容为`http://www.bxwx9.org/b/57/57181/`
- 主要内容包含在`.TabCss`内
- 第一章的href值为`8845907.html`,由此可知该章的链接为`http://www.bxwx9.org/b/57/57181/8845907.html`
最后写出规则:
```python
RULES={
'www.bxwx9.org': Rules('0', {'class': 'TabCss'}, {})
}
```
`key`值起判断该网站是否被解析,注意此时`content_url`为0这是表示在解析小说目录的时候每章节对应的url在拼接过程中需要的是当前的url。对于`content_url`,请看下表:
| 值 | 说明 |
| :----: | :------------------: |
| -1 | 不解析,跳转到原本网页 |
| 0 | 表示章节网页需要当前页面url拼接 |
| 1 | 表示章节链接使用本身自带的链接,不用拼接 |
| netloc | 用域名进行拼接 |
`chapter_selector`解析章节目录的规则
`content_selector`解析目录内容的规则
- http://www.23us.la/html/0/42/
```python
RULES={
'www.23us.la': Rules('http://www.23us.la', {'class': 'inner'}, {})
}
```
- http://zetianjiba.net
```python
RULES={
'zetianjiba.net': Rules('1', {'class': 'bg'}, {})
}
```
`content_selector`值的获取方式同上