Changelog

0.4.0 - 2020-11-18

Added

  • For 2ch.hk, a check for whether a file is a sticker;
  • The encoding of the !op.txt file is now explicitly set to UTF-8;
  • Handling of HTTP errors, so the program no longer crashes if a file doesn't exist or is inaccessible for any other reason;
  • Hash matching in the scraper for two files that happen to share the same name and size when the hash reported by the imageboard differs from the actual hash of the file. This results in extra downloads and hash calculations. Hopefully, this is only the case for 2ch.hk.
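
A minimal sketch of the hash fallback described in the last item, not the project's actual code. It assumes MD5 digests (the changelog doesn't name the algorithm); `is_duplicate` and the `download` callable are hypothetical names.

```python
import hashlib
from typing import Callable


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of raw bytes (MD5 is an assumption)."""
    return hashlib.md5(data).hexdigest()


def is_duplicate(local_data: bytes, reported_hash: str,
                 download: Callable[[], bytes]) -> bool:
    """Decide whether a remote file with the same name and size is already saved.

    local_data    -- contents of the file already on disk
    reported_hash -- hash the imageboard reports for the remote file
    download      -- hypothetical callable that fetches the remote file
    """
    local_hash = md5_hex(local_data)
    if local_hash == reported_hash:
        # The board's hash can be trusted here, so nothing is downloaded.
        return True
    # The reported hash doesn't match: download the file and hash it ourselves.
    # This is the extra download and hash calculation the entry mentions.
    return md5_hex(download()) == local_hash
```

The remote file is only fetched when the reported hash can't settle the question, which keeps the extra work limited to boards that report unreliable hashes.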

Changed

  • FileInfo class is now a frozen dataclass for memory efficiency.
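
For illustration, a frozen dataclass along these lines might look as follows. The field names are assumptions, not the real FileInfo definition. Note that `frozen=True` mainly makes instances immutable and hashable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Sketch of an immutable file record; all field names are hypothetical."""
    name: str
    size: int
    download_url: str
    hash_value: str
```

Assigning to a field of such an instance raises `dataclasses.FrozenInstanceError`, and equal instances can be used as set members or dictionary keys.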

Fixed

  • The arguments of the match function that checks for the image.ext pattern were mixed up in several places across the parsers;
  • Also, for 2ch.hk, the check for sub and com was changed to subject and comment.

0.3.0 - 2020-09-09

Added

  • Parser for lolifox.cc.

Removed

  • BasicScraper. It is no longer needed because there is a faster threaded version.

Fixed

  • The User-Agent is now applied correctly everywhere.
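
One way to apply the header consistently is to build every request through a single helper. This is only a sketch using the standard library's `urllib.request`; the changelog doesn't say which HTTP client the project uses, and `USER_AGENT` and `build_request` are hypothetical names.

```python
import urllib.request

# Hypothetical value; the real User-Agent string isn't given in the changelog.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def build_request(url: str) -> urllib.request.Request:
    """Create a request that always carries the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

The returned object can then be passed to `urllib.request.urlopen` for both JSON and file downloads.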

0.2.2 - 2020-07-20

Added

  • Parser for 8kun.top.

Changed

  • The check for whether a site is supported now just looks for a substring.
  • Edited the regex that checks whether a filename is just "image.ext" so that it only checks that "image." is followed by 1 to 4 characters.
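
Both changes can be sketched roughly as below. The site list is taken from the parsers named in this changelog up to this version; the exact pattern, the use of `\w` for "characters", and the function names are assumptions.

```python
import re

# Sites with parsers as of 0.2.2, according to this changelog.
SUPPORTED_SITES = ("4chan.org", "lainchan.org", "2ch.hk", "8kun.top")

# "image." followed by 1 to 4 word characters (the pattern is an assumption).
GENERIC_NAME = re.compile(r"image\.\w{1,4}")


def is_supported(url: str) -> bool:
    """A site counts as supported if its name appears anywhere in the URL."""
    return any(site in url for site in SUPPORTED_SITES)


def is_generic_name(filename: str) -> bool:
    """True when the whole filename is just "image.ext"."""
    return GENERIC_NAME.fullmatch(filename) is not None
```

Compiling the pattern once also avoids the argument-order pitfall of `re.match(pattern, string)`, since the pattern object only takes the string.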

Notes

  • Regarding the issue with file sizes on 2ch.hk: it usually does report the size in kB. The problem is that sometimes it is just wrong.

0.2.1 - 2020-07-18

Changed

  • The program now tells you which thread doesn't exist or is about to be scraped. This is useful for batch processing with scripts.

0.2.0 - 2020-07-18

Added

  • Threaded version of the scraper, so now it is fast as heck!
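
A threaded scraper of this kind could be sketched with a thread pool, as below. This is an illustration under assumptions, not the project's implementation: `scrape_all`, the `download` callable, and the default worker count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def scrape_all(urls: Iterable[str], download: Callable[[str], T],
               max_workers: int = 8) -> List[T]:
    """Run download(url) for every URL on a pool of worker threads.

    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download, urls))
```

Threads suit this workload because downloads spend most of their time waiting on the network rather than on the CPU.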

Fixed

  • Handled the situation when the OP's post has no comment and/or subject.

0.1.0 - 2020-07-08

Added

  • JSON parsers for 4chan.org, lainchan.org and 2ch.hk.
  • Basic straightforward scraper that downloads files one by one.

Issues

  • 2ch.hk: I can't figure out what exactly it reports as the size and hash of a file. Example: a file may have a size of 127798 bytes (125K), but 2ch reports 150, and the reported hash doesn't equal the computed one.
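
The arithmetic behind the example can be checked with a short sketch. It assumes the board means units of 1024 bytes; the function names and the tolerance are hypothetical.

```python
def size_in_kib(size_bytes: int) -> int:
    """Convert a byte count to whole units of 1024 bytes, rounded."""
    return round(size_bytes / 1024)


def size_looks_wrong(size_bytes: int, reported_kb: int,
                     tolerance_kb: int = 1) -> bool:
    """True when the reported size is further from the real one than the tolerance."""
    return abs(size_in_kib(size_bytes) - reported_kb) > tolerance_kb
```

For the file above, 127798 / 1024 is about 124.8, which rounds to 125, so a reported value of 150 is well outside any rounding error.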