This is a tool for scraping files from imageboards' threads.

It extracts the files from a JSON representation of a thread and then downloads
them into the specified output directory. If no output directory is specified,
it creates the following directory hierarchy in the working directory:

    <imageboard name>
      |-<board name>
        |-<thread>
          |-[!op.txt]
          |-...
          |-...
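
The two steps described above (pulling file links out of a thread's JSON and choosing where to save them) can be sketched roughly as follows. This is only an illustration, not the program's actual code: the function names `default_output_dir` and `extract_files` are hypothetical, and the JSON keys (`posts`, `tim`, `ext`, `filename`) and the `i.4cdn.org` file host are assumptions based on the 4chan-style thread format. Other imageboards may lay their JSON out differently.

```python
from pathlib import Path


def default_output_dir(imageboard: str, board: str, thread: str,
                       working_dir: Path = Path(".")) -> Path:
    """Build <working_dir>/<imageboard name>/<board name>/<thread>."""
    return working_dir / imageboard / board / thread


def extract_files(thread_json: dict, board: str) -> list[tuple[str, str]]:
    """Return (file name, download URL) pairs for posts that carry a file.

    Assumes a 4chan-style layout: a top-level "posts" list in which a post
    with an attachment has "tim" (server-side name), "ext" and "filename".
    """
    files = []
    for post in thread_json.get("posts", []):
        if "tim" not in post:  # text-only post, nothing to download
            continue
        name = f"{post['filename']}{post['ext']}"
        url = f"https://i.4cdn.org/{board}/{post['tim']}{post['ext']}"
        files.append((name, url))
    return files


# Made-up thread data, for illustration only.
thread = {"posts": [
    {"no": 1100500, "com": "OP text", "tim": 1594234419000,
     "ext": ".png", "filename": "cat"},
    {"no": 1100501, "com": "a reply without a file"},
]}

out = default_output_dir("4chan", "b", "1100500")
for name, url in extract_files(thread, "b"):
    print(out / name, "<-", url)
```

Keeping the path logic separate from the JSON parsing means an explicit `--output-dir` could simply replace the result of `default_output_dir` without touching the extraction step.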
# Usage
```bash
scrapthechan [OPTIONS] (<url> | <imageboard> <board> <thread>)
```
`<url>` -- URL of a thread.

`<imageboard> <board> <thread>` -- imageboard name, board name and thread ID,
given separately. E.g. `4chan b 1100500`.

`-o`, `--output-dir` -- output directory where all files will be dumped to.

`-N`, `--no-op` -- by default, the OP's post is saved in a `!op.txt` file. This
flag disables that behaviour. The exclamation mark `!` in the name keeps the
file at the top of a directory listing.

`-S <num>`, `--skip-posts <num>` -- skip the given number of posts.

`-v`, `--version` -- print the version of the program.

`-h`, `--help` -- print help for the program.
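
The two ways of naming a thread, `<url>` or `<imageboard> <board> <thread>`, both come down to the same triple. Below is a minimal sketch of how that normalisation might look. It is not taken from the program itself: `parse_target` and `KNOWN_HOSTS` are hypothetical names, and the sketch assumes 4chan-style thread URLs of the form `https://boards.4chan.org/<board>/thread/<id>`. Other imageboards would need their own URL patterns.

```python
from urllib.parse import urlparse

# Hypothetical host-to-name table; only a 4chan entry is assumed here.
KNOWN_HOSTS = {"boards.4chan.org": "4chan"}


def parse_target(args: list[str]) -> tuple[str, str, str]:
    """Turn positional arguments into (imageboard, board, thread)."""
    if len(args) == 3:  # <imageboard> <board> <thread> given separately
        return args[0], args[1], args[2]
    if len(args) != 1:
        raise ValueError("expected <url> or <imageboard> <board> <thread>")

    url = urlparse(args[0])
    imageboard = KNOWN_HOSTS.get(url.hostname or "")
    if imageboard is None:
        raise ValueError(f"unsupported imageboard: {args[0]}")

    # Assumed path shape: /<board>/thread/<id>[/<slug>]
    parts = [p for p in url.path.split("/") if p]
    if len(parts) < 3 or parts[1] != "thread":
        raise ValueError(f"not a thread URL: {args[0]}")
    return imageboard, parts[0], parts[2]


print(parse_target(["https://boards.4chan.org/b/thread/1100500"]))
print(parse_target(["4chan", "b", "1100500"]))
```

Both calls yield the same triple, so the rest of a scraper would only ever need to deal with one form of input.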
# Supported imageboards
- [4chan.org](https://4chan.org) since 0.1.0
- [lainchan.org](https://lainchan.org) since 0.1.0
- [2ch.hk](https://2ch.hk) since 0.1.0
- [8kun.top](https://8kun.top) since 0.2.2
# TODO
- Sane rewrite of the program;
- Thread watcher.