# Changelog
## 0.4.0 - 2020-11-18
### Added
- Added a check for 2ch.hk that detects whether a file is a sticker;
- The encoding of the `!op.txt` file is now explicitly set to `utf-8`;
- Added handling of HTTP errors and connection reset errors, so the program no
longer crashes if a file doesn't exist or is inaccessible for any other reason;
- The scraper now matches the hashes of two files that happen to share the same
name and size when the hash reported by the imageboard differs from the hash
of the file itself. This results in extra downloading and hash calculations.
Hopefully, that is only the case for 2ch.hk.
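
As a rough illustration of the hash matching described in the last item, here is a minimal sketch. The function names `file_digest` and `is_same_file` are hypothetical, and MD5 is an assumption; the scraper's actual helpers and hash algorithm may differ.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()  # assumption: the algorithm used by the scraper may differ
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_file(existing: Path, downloaded: Path) -> bool:
    """Compare two files by size first, then by content hash."""
    if existing.stat().st_size != downloaded.stat().st_size:
        return False
    return file_digest(existing) == file_digest(downloaded)
```

Comparing sizes first avoids hashing when the files obviously differ; the hash is computed only for files that already match in size.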
### Changed
- The `FileInfo` class is now a frozen dataclass for memory efficiency.
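
For readers unfamiliar with frozen dataclasses, the sketch below shows what freezing does: instances become immutable and hashable. The field names here are hypothetical and are not taken from the project's actual `FileInfo` definition.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FileInfo:
    # Hypothetical fields for illustration only; the real class may differ.
    name: str
    size: int
    download_url: str
    hash_value: str


info = FileInfo("image.png", 127798, "https://example.org/image.png", "abc123")

try:
    info.size = 0  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("FileInfo is immutable")
```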
### Fixed
- The arguments to the match function that checks for the `image.ext` pattern
were swapped in places throughout the parsers;
- For 2ch.hk, the check for `sub` and `com` was changed to `subject` and
`comment`.
## 0.3.0 - 2020-09-09
### Added
- Parser for lolifox.cc.
### Removed
- `BasicScraper`. It is no longer needed now that there is a faster threaded
version.
### Fixed
- The User-Agent is now applied correctly everywhere.
## 0.2.2 - 2020-07-20
### Added
- Parser for 8kun.top.
### Changed
- Whether a site is supported is now determined by simply looking for a
substring.
- Edited the regex that checks whether a filename is just "image.ext" so that
it only checks that "image." is followed by 1 to 4 characters.
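
A check like the one in the last item could look roughly like this. The pattern and the name `is_generic_name` are assumptions made for illustration; the project's actual regex may be written differently.

```python
import re

# Assumed pattern: the literal "image." followed by 1 to 4 word characters.
IMAGE_EXT = re.compile(r"image\.\w{1,4}")


def is_generic_name(filename: str) -> bool:
    """Return True if the whole filename is just "image.<ext>"."""
    # Note the argument order of re.fullmatch: pattern first, then the string.
    return IMAGE_EXT.fullmatch(filename) is not None
```

With this pattern, `is_generic_name("image.png")` is true, while `is_generic_name("myimage.png")` is false because `fullmatch` requires the entire string to match.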
### Notes
- Consider the issue with file sizes on 2ch.hk. Usually it really does report
the size in kB. The problem is that sometimes it is just wrong.
## 0.2.1 - 2020-07-18
### Changed
- The program now tells you which thread doesn't exist or is about to be
scraped. This is useful for batch processing with scripts.
## 0.2.0 - 2020-07-18
### Added
- Threaded version of the scraper, so now it is fast as heck!
### Fixed
- Handled the situation when the OP's post has no comment and/or subject.
## 0.1.0 - 2020-07-08
### Added
- JSON parsers for 4chan.org, lainchan.org and 2ch.hk.
- Basic straightforward scraper that downloads files one by one.
### Issues
- 2ch.hk: I can't figure out what exactly it reports as the size and hash of a
file. Example: a file may have a size of 127798 bytes (125K), but 2ch reports
150, and the reported hash doesn't equal the computed one.