# Changelog

## 0.5.0 - 2021-05-03
### Added
- The program now honors the `skip_posts` argument. Use the CLI option
  `-S <number>` or `--skip-posts <number>` to set how many posts to skip.

### Changed
- Better, more compact messages.
- Fixed the inheritance of `Scraper`'s subclasses and rewrote them sanely,
  which makes future extension easy with far less repetition.
- Added a general class, `TinyboardLikeParser`, that implements a post parser
  for all imageboards based on Tinyboard or those with an identical JSON API.
  From now on, the names of all such generalisation classes will end with
  `LikeParser`.
- Changed `file_base_url` for 8kun.top.

### Removed
- Support for Lolifox, since it's gone.

## 0.4.1 - 2020-12-08
### Fixed
- `HTTPException` from `http.client` and `URLError` from `urllib.request` are
  now handled.
- Handling of 2ch.hk stickers.
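
For illustration only, catching the two exception types named in this release could look roughly like the sketch below. The `fetch` helper and its `opener` parameter are hypothetical and are not taken from the project's code.

```python
from http.client import HTTPException
from urllib.error import URLError  # also importable from urllib.request
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Return the response body, or None if the request failed.

    `opener` is a hypothetical hook that makes the function easy to test.
    """
    try:
        with opener(url) as response:
            return response.read()
    except (HTTPException, URLError) as err:
        # Report the failure instead of letting the program crash.
        print(f"Cannot fetch {url}: {err}")
        return None
```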

## 0.4.0 - 2020-11-18
### Added
- For 2ch.hk, added a check for whether a file is a sticker.
- The encoding of the `!op.txt` file is now explicitly set to `utf-8`.
- Added handling of connection errors, so the program no longer crashes if a
  file doesn't exist or is inaccessible for any other reason. Any damaged files
  that were created are removed.
- Added 3 retries if a file was damaged during download.
- The scraper now compares the hashes of two files that share the same name and
  size when the hash reported by the imageboard differs from the hash of the
  file. This results in excessive downloading and hash calculation. Hopefully,
  that is only the case for 2ch.hk.
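
A rough sketch of such a comparison between an existing file and a freshly downloaded one is shown below. The function names are hypothetical, and SHA-256 is an arbitrary choice here, since this changelog does not say which hash algorithm the scraper or the imageboards use.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(existing, downloaded):
    """Return True if both files have the same size and the same digest."""
    existing, downloaded = Path(existing), Path(downloaded)
    if existing.stat().st_size != downloaded.stat().st_size:
        return False  # different sizes, no need to hash
    return file_digest(existing) == file_digest(downloaded)
```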

### Changed
- The `FileInfo` class is now a frozen dataclass for memory efficiency.
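
As a minimal sketch of what a frozen dataclass looks like, assuming hypothetical field names (`name`, `size`, `digest`) rather than the project's actual definition: freezing makes instances read-only and hashable.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FileInfo:
    # Field names are assumptions made for this example.
    name: str
    size: int
    digest: str


info = FileInfo(name="image.png", size=127798, digest="abc123")
try:
    info.size = 0  # assignment to a frozen instance is rejected
except FrozenInstanceError:
    print("FileInfo instances are read-only")
```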

### Fixed
- The arguments of the match function that checks for the `image.ext` pattern
  were mixed up throughout the parsers.
- For 2ch.hk, the check for `sub` and `com` was changed to `subject` and
  `comment`.

## 0.3.0 - 2020-09-09
### Added
- Parser for lolifox.cc.

### Removed
- `BasicScraper`. It is no longer needed because there is a faster threaded
  version.

### Fixed
- The User-Agent is now applied correctly everywhere.

## 0.2.2 - 2020-07-20
### Added
- Parser for 8kun.top.

### Changed
- Whether a site is supported is now determined by simply looking for a
  substring.
- Edited the regex that checks whether a filename is just "image.ext" so that
  it only checks that 1 to 4 characters follow "image.".
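
The filename check described above might be expressed as in the sketch below. The exact pattern used by the project is not given in this changelog, so `IMAGE_EXT` and `is_generic_name` are hypothetical names and the pattern is an assumption.

```python
import re

# Assumed pattern: the literal "image." followed by 1 to 4 characters.
IMAGE_EXT = re.compile(r"image\..{1,4}")


def is_generic_name(filename):
    """Return True if the whole filename looks like "image.ext"."""
    return IMAGE_EXT.fullmatch(filename) is not None
```

With this pattern, `image.png` and `image.jpeg` count as generic names, while `photo.png` and `image.torrent` do not.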

### Notes
- Consider the issue with file sizes on 2ch.hk. Usually it really does report
  the size in kB; the problem is that sometimes the value is simply wrong.

## 0.2.1 - 2020-07-18
### Changed
- The program now tells you which thread doesn't exist or is about to be
  scraped. That is useful for batch processing with scripts.

## 0.2.0 - 2020-07-18
### Added
- Threaded version of the scraper, so now it is fast as heck!

### Fixed
- Handled the situation where the OP's post has no comment and/or subject.

## 0.1.0 - 2020-07-08
### Added
- JSON parsers for 4chan.org, lainchan.org and 2ch.hk.
- Basic straightforward scraper that downloads files one by one.

### Issues
- 2ch.hk: I can't figure out what exactly it reports as the size and hash of a
  file. Example: a file may have a size of 127798 bytes (125K), but 2ch reports
  150, and the reported hash doesn't equal the computed one.