# Changelog

## 0.5.0 - 2021-05-03
### Added
- The program now makes use of the `skip_posts` argument. Use the CLI option
  `-S <number>` or `--skip-posts <number>` to set how many posts you want to skip.
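
A minimal sketch of how such an option could be wired up with `argparse`. Only the flags `-S` and `--skip-posts` come from this changelog; the program name, the default of `0`, the `skip` helper and the choice to skip from the start of a thread are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with a skip-posts option (prog name is assumed)."""
    parser = argparse.ArgumentParser(prog="scrapthechan")
    parser.add_argument(
        "-S", "--skip-posts",
        type=int,
        default=0,  # assumption: skip nothing unless asked to
        metavar="<number>",
        help="how many posts to skip",
    )
    return parser


def skip(posts: list, skip_posts: int) -> list:
    """Hypothetical helper: drop the first `skip_posts` posts of a thread."""
    return posts[skip_posts:]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(skip(["post 1", "post 2", "post 3"], args.skip_posts))
```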

### Changed
- Better, more compact messages;
- Fixed the inheritance of `Scraper`'s subclasses and rewrote it sanely, which
  makes future extension easy with far less repetition;
- Added a general class `TinyboardLikeParser` that implements a post parser for
  all imageboards based on Tinyboard or the ones that have an identical JSON
  API. From now on the names of all such generalisation classes will end with
  `*LikeParser`;
- Changed `file_base_url` for 8kun.top.


### Removed
- Support for Lolifox, since it's gone.

## 0.4.1 - 2020-12-08
### Fixed
- `HTTPException` from `http.client` and `URLError` from `urllib.request` are
  now handled;
- Handling of 2ch.hk's stickers.
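
A rough sketch of what handling these two exceptions might look like. The `fetch` function, its parameters and the default User-Agent string are hypothetical and are not taken from the project's code.

```python
from http.client import HTTPException
from urllib.error import URLError  # also importable from urllib.request
from urllib.request import Request, urlopen


def fetch(url: str, user_agent: str = "example-agent", timeout: float = 10.0):
    """Return the response body as bytes, or None if the request fails."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read()
    except (HTTPException, URLError) as error:
        # HTTPError is a subclass of URLError, so it is caught here as well.
        print(f"Failed to fetch {url}: {error}")
        return None
```

Returning `None` instead of re-raising is just one possible design; it lets a caller skip a broken file and carry on with the rest of the thread.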

## 0.4.0 - 2020-11-18
### Added
- Added a check for whether a file is a sticker on 2ch.hk;
- The encoding of the `!op.txt` file is now explicitly set to `utf-8`;
- Added handling of connection errors, so the program no longer crashes if a
  file doesn't exist or is inaccessible for any other reason. Any damaged files
  that were created are removed;
- Added 3 retries if a file was damaged during downloading;
- The scraper now matches the hashes of two files that happen to share the
  same name and size when the hash reported by an imageboard is not the same as
  that of the file. This results in excessive downloading and hash
  calculations. Hopefully, that is only the case for 2ch.hk.
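
The retry and clean-up behaviour described above could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, treating a size mismatch as "damaged", and MD5 as the default hash algorithm.

```python
import hashlib
import os
from typing import Callable


def file_hash(path: str, algorithm: str = "md5") -> str:
    """Hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_with_retries(
    download: Callable[[str], None],
    path: str,
    expected_size: int,
    retries: int = 3,
) -> bool:
    """Call `download(path)` until the file has the expected size.

    Makes one initial attempt plus up to `retries` retries. A file with
    the wrong size is treated as damaged and removed after each attempt.
    """
    for _ in range(1 + retries):
        try:
            download(path)
        except OSError:
            pass  # a failed attempt; the checks below decide what to do
        if os.path.exists(path) and os.path.getsize(path) == expected_size:
            return True
        if os.path.exists(path):
            os.remove(path)  # do not leave a damaged file behind
    return False
```

Two files with the same name and size can then be compared with `file_hash(first) == file_hash(second)` when the hash reported by the imageboard cannot be trusted.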

### Changed
- The `FileInfo` class is now a frozen dataclass for memory efficiency.
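
For reference, a frozen dataclass might look like the sketch below. The field names are assumed for illustration and are not taken from the project. Note that `frozen=True` by itself makes instances immutable and hashable; any memory saving would depend on details not stated here.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FileInfo:
    """Immutable description of one attached file (fields are assumed)."""
    name: str
    size: int
    download_url: str
    hash_value: str
    hash_algorithm: str


if __name__ == "__main__":
    info = FileInfo("a.png", 10, "https://example.org/a.png", "abc", "md5")
    try:
        info.size = 20  # assignment to a frozen instance is rejected
    except FrozenInstanceError as error:
        print(f"cannot modify: {error}")
```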

### Fixed
- Found that the arguments of the match function that checks for the
  `image.ext` pattern were swapped all over the parsers;
- Also, for 2ch.hk, the check for `sub` and `com` was changed to `subject` and
  `comment`.

## 0.3.0 - 2020-09-09
### Added
- Parser for lolifox.cc.

### Removed
- `BasicScraper`. It is no longer needed, since there is a faster threaded
  version.

### Fixed
- User-Agent is now applied correctly everywhere.


## 0.2.2 - 2020-07-20
### Added
- Parser for 8kun.top.

### Changed
- The check for whether a site is supported now just looks for a substring.
- Edited the regex that checks if a filename is just an "image.ext", so it only
  checks that "image." is followed by 1 to 4 characters.
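
Both changes can be illustrated with a short sketch. The exact regex and function names used by the project are not shown in this changelog, so the ones below are assumptions; the site list is the set of parsers available as of this version.

```python
import re

# "image." followed by 1 to 4 characters and nothing else (assumed pattern).
IMAGE_EXT = re.compile(r"image\..{1,4}")

# Sites that have a parser as of 0.2.2.
SUPPORTED_SITES = ("4chan.org", "lainchan.org", "2ch.hk", "8kun.top")


def is_generic_name(filename: str) -> bool:
    """True if the whole filename is just "image." plus a short extension."""
    return IMAGE_EXT.fullmatch(filename) is not None


def is_supported(url: str) -> bool:
    """True if any supported site name occurs in the URL as a substring."""
    return any(site in url for site in SUPPORTED_SITES)
```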

### Notes
- Regarding the issue with file sizes on 2ch.hk: usually it really does report
  the size in kB. The problem is that sometimes it is just wrong.


## 0.2.1 - 2020-07-18
### Changed
- The program now tells you which thread doesn't exist or is about to be
  scraped. That is useful in batch processing with scripts.


## 0.2.0 - 2020-07-18
### Added
- Threaded version of the scraper, so now it is fast as heck!

### Fixed
- Handled the situation when the OP's post has no comment and/or subject.


## 0.1.0 - 2020-07-08
### Added
- JSON parsers for 4chan.org, lainchan.org and 2ch.hk.
- Basic straightforward scraper that downloads files one by one.

### Issues
- 2ch.hk: I can't figure out what exactly it reports as the size and hash of a
  file. Example: a file may have a size of 127798 bytes (125K) but 2ch reports
  150, and the reported hash doesn't equal the computed one.