Changelog

0.3 - 2020-09-09

Added

  • Parser for lolifox.cc.

Removed

  • BasicScraper. It is no longer needed now that there is a faster threaded version.

Fixed

  • The User-Agent is now applied correctly everywhere.
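
One way to keep the header consistent is to build every request through a single helper. This is a minimal sketch, not the project's actual code: the use of `urllib.request`, the helper name `build_request`, and the User-Agent string are all assumptions made for illustration.

```python
import urllib.request

# Hypothetical User-Agent string, used only for illustration.
USER_AGENT = "Mozilla/5.0 (compatible; ExampleScraper/0.3)"


def build_request(url: str) -> urllib.request.Request:
    """Create a request that always carries the shared User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Routing both JSON and file downloads through one such helper means the header cannot be forgotten in one code path.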

0.2.2 - 2020-07-20

Added

  • Parser for 8kun.top.

Changed

  • Site support is now checked by simply looking for a substring.
  • Edited the regex that checks whether a filename is just "image.ext": it now only checks that 1 to 4 characters follow "image.".
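
The two changes above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the names `SUPPORTED_SITES`, `is_supported`, and `is_generic_name`, and the exact pattern, are hypothetical.

```python
import re

# Hypothetical list of supported sites; the project may store this differently.
SUPPORTED_SITES = ("4chan.org", "lainchan.org", "2ch.hk", "8kun.top")

# Assumed pattern: "image." followed by 1 to 4 characters and nothing else.
GENERIC_NAME = re.compile(r"image\..{1,4}")


def is_supported(url: str) -> bool:
    """Return True if any supported site name occurs as a substring of url."""
    return any(site in url for site in SUPPORTED_SITES)


def is_generic_name(filename: str) -> bool:
    """Return True if filename is just "image.<ext>" with a 1-4 character ext."""
    return GENERIC_NAME.fullmatch(filename) is not None
```

A substring check accepts any URL form (with or without a scheme, subdomain, or path), at the cost of also matching URLs that merely contain a supported name.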

Notes

  • Regarding the file size issue on 2ch.hk: it usually does report the size in kB. The problem is that the reported value is sometimes simply wrong.

0.2.1 - 2020-07-18

Changed

  • The program now reports which thread doesn't exist or is about to be scraped. This is useful for batch processing with scripts.

0.2.0 - 2020-07-18

Added

  • Threaded version of the scraper, so now it is fast as heck!
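
A threaded downloader of this kind could look like the sketch below. The changelog does not say how the threading is implemented, so the use of `ThreadPoolExecutor`, the function name `scrape_threaded`, and the injected `fetch` callable are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def scrape_threaded(urls: Iterable[str],
                    fetch: Callable[[str], bytes],
                    workers: int = 4) -> List[bytes]:
    """Fetch every URL with a pool of worker threads.

    `fetch` is a hypothetical callable that downloads one URL.
    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Downloads are I/O-bound, so several threads can wait on the network at once instead of fetching files one by one.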

Fixed

  • Handled the case where the OP's post has no comment and/or subject.

0.1.0 - 2020-07-08

Added

  • JSON parsers for 4chan.org, lainchan.org and 2ch.hk.
  • Basic straightforward scraper that downloads files one by one.

Issues

  • 2ch.hk: I can't figure out what exactly it reports as the size and hash of a file. Example: a file may have a size of 127798 bytes (125K), but 2ch reports 150, and the reported hash doesn't match the computed one.
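
The size mismatch in the example can be checked with a small helper. This sketch assumes "K" means 1024 bytes, which is consistent with 127798 bytes being shown as 125K. The function names are hypothetical, and the hash check is left out because the changelog does not name the hash algorithm.

```python
def size_in_kib(size_bytes: int) -> int:
    """Convert a byte count to whole KiB, rounded to the nearest integer."""
    return round(size_bytes / 1024)


def size_matches(reported_kib: int, size_bytes: int) -> bool:
    """Return True if the reported size equals the actual size in KiB."""
    return reported_kib == size_in_kib(size_bytes)
```

For the file above, `size_in_kib(127798)` gives 125, so the reported value of 150 does not match.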