# Changelog
## 0.5.0 - 2021-05-03
### Added
- The program now makes use of the `skip_posts` argument. Use the CLI option
`-S <number>` or `--skip-posts <number>` to set how many posts to skip.
### Changed
- Better, minified messages;
- Fixed the inheritance of `Scraper`'s subclasses and rewrote `Scraper` sanely,
so it is now easy to extend with far less repetition;
- Added a general class `TinyboardLikeParser` that implements a post parser for
all imageboards based on Tinyboard or sharing an identical JSON API. From now
on, the names of all such generalised classes will end with `*LikeParser`;
- Changed `file_base_url` for 8kun.top.
### Removed
- Support for Lolifox, since it's gone.
## 0.4.1 - 2020-12-08
### Fixed
- `HTTPException` from `http.client` and `URLError` from `urllib.request` are
now handled;
- Handling of 2ch.hk's stickers.
## 0.4.0 - 2020-11-18
### Added
- Added a check for whether a file on 2ch.hk is a sticker;
- The encoding of the `!op.txt` file is now explicitly set to `utf-8`;
- Added handling of connection errors, so the program no longer crashes if a
file doesn't exist or is inaccessible for any other reason, and any damaged
files that were created are removed;
- Added 3 retries for a file that was damaged during downloading;
- The scraper now compares the hashes of two files that happen to share the
same name and size. However, when the hash reported by an imageboard differs
from the file's actual hash, this results in excessive downloading and hash
calculation. Hopefully, that is only the case for 2ch.hk.
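The retry behaviour above could look roughly like the following minimal Python sketch. It is not the project's code: `download_with_retries`, `download` and `is_intact` are hypothetical names, and it assumes "3 retries" means one initial attempt plus up to three more. In practice `is_intact` might compare a computed hash with the one the imageboard reports.

```python
from typing import Callable, Optional


def download_with_retries(
    download: Callable[[], bytes],
    is_intact: Callable[[bytes], bool],
    retries: int = 3,
) -> Optional[bytes]:
    """Download once, then retry up to `retries` more times while data is damaged.

    Hypothetical helper: assumes one initial attempt plus `retries` extra ones.
    """
    for _ in range(1 + retries):
        data = download()
        if is_intact(data):
            return data
    # Every attempt produced damaged data; the caller should discard the file.
    return None
```

Returning `None` instead of raising leaves it to the caller to remove the damaged file and move on, which matches the "won't crash" behaviour described in this release.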
### Changed
- The `FileInfo` class is now a frozen dataclass for memory efficiency.
### Fixed
- Fixed the arguments of the `match` function that checks for the `image.ext`
pattern, which were mixed up in places all over the parsers;
- For 2ch.hk, the checks for `sub` and `com` were changed to `subject` and
`comment`.
## 0.3.0 - 2020-09-09
### Added
- Parser for lolifox.cc.
### Removed
- `BasicScraper`. No longer needed, since there is a faster threaded version.
### Fixed
- The User-Agent is now correctly applied everywhere.
## 0.2.2 - 2020-07-20
### Added
- Parser for 8kun.top.
### Changed
- Checking whether a site is supported now simply looks for a substring.
- Edited the regex that checks whether a filename is just "image.ext", so it
only matches when "image." is followed by 1 to 4 characters.
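A minimal Python sketch of such a check, assuming the `re` module is used; the pattern and the `is_generic_image_name` helper are hypothetical, not taken from the project. Note that `re.fullmatch` (like `re.match`) takes the pattern first and the string second.

```python
import re

# Hypothetical pattern: exactly "image." followed by 1 to 4 word characters.
GENERIC_IMAGE_NAME = r"image\.\w{1,4}"


def is_generic_image_name(filename: str) -> bool:
    """Return True if filename is just "image.<ext>" with a 1-4 character extension."""
    # Pattern first, string second; fullmatch anchors both ends of the string.
    return re.fullmatch(GENERIC_IMAGE_NAME, filename) is not None
```

With this pattern, `"image.png"` and `"image.webm"` match, while `"image.jpeg2"` (5-character extension) and `"myimage.png"` do not.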
### Notes
- About the size issue on 2ch.hk: usually it really does report the size in
kB. The problem is that sometimes it is just wrong.
## 0.2.1 - 2020-07-18
### Changed
- The program now tells you which thread doesn't exist or is about to be
scraped. This is useful for batch processing with scripts.
## 0.2.0 - 2020-07-18
### Added
- Threaded version of the scraper, so now it is fast as heck!
### Fixed
- Handled the situation when the OP's post has no comment and/or subject.
## 0.1.0 - 2020-07-08
### Added
- JSON parsers for 4chan.org, lainchan.org and 2ch.hk.
- Basic straightforward scraper that downloads files one by one.
### Issues
- 2ch.hk: I can't figure out what exactly it reports as a file's size and hash.
For example, a file may have a size of 127798 bytes (125K), but 2ch reports 150,
and the reported hash doesn't match the computed one.