Compare commits
38 Commits
43909c2b29
acbfaefa9c
86ef44aa07
419fb2b673
0287d3a132
245e33f40d
e092c905b2
90338073ed
cdcc184de8
b335891097
1213cef776
78d4a62c17
f3ef07af68
6373518dc3
caf18a1bf0
751549f575
38b5740d73
2f9d26427c
e7cf2e7c4b
4f6f56ae7b
503eb9959b
cb2e0d77f7
93e442939a
6022c9929a
f79abcc310
9cdb510325
986fdbe7a7
2e6352cb13
7b2fcf0899
21837c5335
b970973018
6dab626084
86b6278657
7754a90313
bb47b50c5f
8403fcf0f2
647a787974
6a54b88498
57  CHANGELOG.md
@@ -1,6 +1,61 @@
 # Changelog
 
+## 0.5.1 - 2021-05-04
+
+### Added
+
+- Message when a file cannot be retrieved.
+
+### Fixed
+
+- Removed excessive hash comparison when files have the same name;
+- A string was not made an f-string, so now it displays the reason why a
+  thread wasn't found.
+
+## 0.5.0 - 2021-05-03
+
+### Added
+
+- Now the program makes use of the skip_posts argument. Use CLI option
+  `-S <number>` or `--skip-posts <number>` to set how many posts you want
+  to skip.
+
+### Changed
+
+- Better, minified messages;
+- Fixed inheritance of `Scraper`'s subclasses and sanely rewrote them, which
+  allows easy future extension with far less repetition;
+- Added a general class `TinyboardLikeParser` that implements the post parser
+  for all imageboards based on Tinyboard or ones with an identical JSON API.
+  From now on all such generalisation classes will end with `*LikeParser`;
+- Changed `file_base_url` for 8kun.top.
+
+### Removed
+
+- Support for Lolifox, since it's gone.
+
+## 0.4.1 - 2020-12-08
+
+### Fixed
+
+- Now HTTPException from http.client and URLError from urllib.request are
+  handled;
+- 2ch.hk's stickers handling.
+
+## 0.4.0 - 2020-11-18
+
+### Added
+
+- For 2ch.hk a check for whether a file is a sticker was added;
+- Encoding for the `!op.txt` file was explicitly set to `utf-8`;
+- Handling of connection errors was added, so now the program won't crash if
+  a file doesn't exist or isn't accessible for any other reason, and any
+  damaged files that were created will be removed;
+- Added 3 retries if a file was damaged during downloading;
+- Added to the scraper matching of hashes of two files that happen to share
+  the same name and size but whose hash reported by an imageboard differs
+  from the file's. It results in excessive downloading and hash calculations.
+  Hopefully that is only the case for 2ch.hk.
+
+### Changed
+
+- FileInfo class is now a frozen dataclass for memory efficiency.
+
+### Fixed
+
+- Arguments for the match function that matches the `image.ext` pattern were
+  mixed up in places all over the parsers;
+- For 2ch.hk the checked fields were changed from `sub` and `com` to
+  `subject` and `comment`.
+
-## 0.3 - 2020-09-09
+## 0.3.0 - 2020-09-09
 
 ### Added
 
 - Parser for lolifox.cc.
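The 0.4.0 fix about mixed-up `match` arguments refers to Python's `re.match`, which takes the pattern first and the string second. A quick illustration with the `image.ext` pattern used in the parsers:

```python
from re import match

# re.match(pattern, string): the pattern comes first. With the arguments
# swapped, the filename gets compiled as a regex and the check silently
# misbehaves instead of detecting generic "image.<ext>" upload names.
assert match(r"^image\.\w+$", "image.png") is not None
assert match(r"^image\.\w+$", "holiday_photo.jpg") is None
```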
2  Makefile
@@ -1,7 +1,7 @@
 build: scrapthechan README.md setup.cfg
 	python setup.py sdist bdist_wheel
 install:
-	python -m pip install --upgrade dist/scrapthechan-0.3.0-py3-none-any.whl --user
+	python -m pip install --upgrade dist/scrapthechan-0.5.1-py3-none-any.whl --user
 uninstall:
 	# We change directory so pip uninstall will run, it'll fail otherwise.
 	@cd ~/
25  README.md
@@ -1,8 +1,8 @@
 This is a tool for scraping files from imageboards' threads.
 
-It extracts the files from a JSON version of a thread. And then downloads 'em
-in a specified output directory or if it isn't specified then creates following
-directory hierarchy in a working directory:
+It extracts the files from a JSON representation of a thread, and then
+downloads them into a specified output directory; if none is specified, it
+creates the following directory hierarchy in the working directory:
 
     <imageboard name>
     |-<board name>
@@ -24,12 +24,15 @@ separately. E.g. `4chan b 1100500`.
 
 `-o`, `--output-dir` -- output directory where all files will be dumped to.
 
-`--no-op` -- by default OP's post will be saved in a `!op.txt` file. This flag
-disables this behaviour. I desided to put an `!` in a name so this file will be
-on the top in a directory listing.
+`-N`, `--no-op` -- by default OP's post will be saved in a `!op.txt` file. This
+flag disables this behaviour. The exclamation mark `!` in the name puts the
+file at the top of a directory listing.
 
-`-v`, `--version` prints the version of the program, and `-h`, `--help` prints
-help for a program.
+`-S <num>`, `--skip-posts <num>` -- skip the given number of posts.
+
+`-v`, `--version` prints the version of the program.
+
+`-h`, `--help` prints help for the program.
 
 # Supported imageboards
 
@@ -37,4 +40,8 @@ help for a program.
 - [lainchan.org](https://lainchan.org) since 0.1.0
 - [2ch.hk](https://2ch.hk) since 0.1.0
 - [8kun.top](https://8kun.top) since 0.2.2
-- [lolifox.cc](https://lolifox.cc) since 0.3
+
+# TODO
+
+- Sane rewrite of the program;
+- Thread watcher.
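The default layout described in the README can be sketched with `os.path.join`; the board and thread names below are example values, not program defaults:

```python
from os.path import join

# <imageboard name>/<board name>/<thread> under the working directory.
save_dir = join("4chan.org", "b", "1100500")
print(save_dir)
```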
@@ -1,8 +1,8 @@
-__date__ = "9 September 2020"
-__version__ = "0.3.0"
+__date__ = "4 May 2021"
+__version__ = "0.5.1"
 __author__ = "Alexander \"Arav\" Andreev"
 __email__ = "me@arav.top"
-__copyright__ = f"Copyright (c) 2020 {__author__} <{__email__}>"
+__copyright__ = f"Copyright (c) 2020,2021 {__author__} <{__email__}>"
 __license__ = \
 """This program is licensed under the terms of the MIT license.
 For a copy see COPYING file in a directory of the program, or
@@ -3,7 +3,7 @@ from os import makedirs
 from os.path import join, exists
 from re import search
 from sys import argv
-from typing import List
+from typing import List, Optional
 
 from scrapthechan import VERSION
 from scrapthechan.parser import Parser, ThreadNotFoundError
@@ -15,17 +15,18 @@ from scrapthechan.scrapers.threadedscraper import ThreadedScraper
 __all__ = ["main"]
 
 
-USAGE = \
+USAGE: str = \
 f"""Usage: scrapthechan [OPTIONS] (URL | IMAGEBOARD BOARD THREAD)
 
 Options:
 \t-h,--help -- print this help and exit;
 \t-v,--version -- print program's version and exit;
 \t-o,--output-dir -- directory where to place scraped files. By default
 \t    following structure will be created in current directory:
 \t    <imageboard>/<board>/<thread>;
 \t-N,--no-op -- by default OP's post will be written in !op.txt file. This
 \t    option disables this behaviour;
+\t-S,--skip-posts <num> -- skip given number of posts.
 
 Arguments:
 \tURL -- URL of a thread;
@@ -37,15 +38,15 @@ Supported imageboards: {', '.join(SUPPORTED_IMAGEBOARDS)}.
 """
 
 
-def parse_common_arguments(args: str) -> dict:
+def parse_common_arguments(args: str) -> Optional[dict]:
     r = r"(?P<help>-h|--help)|(?P<version>-v|--version)"
     args = search(r, args)
     if not args is None:
         args = args.groupdict()
         return {
             "help": not args["help"] is None,
             "version": not args["version"] is None }
     return None
 
 def parse_arguments(args: str) -> dict:
     rlink = r"^(https?:\/\/)?(?P<site>[\w.-]+)[ \/](?P<board>\w+)(\S+)?[ \/](?P<thread>\w+)"
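The named groups in `parse_common_arguments` make the matched flag recoverable from `groupdict()`; a minimal sketch using the same regex:

```python
from re import search

# Each flag lives in its own named group; the alternative that did not
# match reports None in groupdict().
r = r"(?P<help>-h|--help)|(?P<version>-v|--version)"
m = search(r, "--version")
print(m.groupdict())  # {'help': None, 'version': '--version'}
```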
@@ -53,15 +54,21 @@ def parse_arguments(args: str) -> dict:
     if not link is None:
         link = link.groupdict()
     out_dir = search(r"(?=(-o|--output-dir) (?P<outdir>\S+))", args)
+    skip_posts = search(r"(?=(-S|--skip-posts) (?P<skip>\d+))", args)
     return {
         "site": None if link is None else link["site"],
         "board": None if link is None else link["board"],
         "thread": None if link is None else link["thread"],
+        "skip-posts": None if skip_posts is None else int(skip_posts.group('skip')),
         "no-op": not search(r"-N|--no-op", args) is None,
         "output-dir": None if out_dir is None \
             else out_dir.groupdict()["outdir"] }
 
 def main() -> None:
     if len(argv) == 1:
         print(USAGE)
         exit()
 
     cargs = parse_common_arguments(' '.join(argv[1:]))
     if not cargs is None:
         if cargs["help"]:
@@ -78,17 +85,21 @@ def main() -> None:
         exit()
 
     try:
-        parser = get_parser_by_site(args["site"], args["board"], args["thread"])
+        if not args["skip-posts"] is None:
+            parser = get_parser_by_site(args["site"], args["board"],
+                args["thread"], args["skip-posts"])
+        else:
+            parser = get_parser_by_site(args["site"], args["board"],
+                args["thread"])
     except NotImplementedError as ex:
         print(f"{str(ex)}.")
         print(f"Supported image boards are {', '.join(SUPPORTED_IMAGEBOARDS)}")
         exit()
-    except ThreadNotFoundError:
+    except ThreadNotFoundError as e:
         print(f"Thread {args['site']}/{args['board']}/{args['thread']} " \
-            "is no longer exist.")
+            f"not found. Reason: {e.reason}")
         exit()
 
 
     files_count = len(parser.files)
 
     if not args["output-dir"] is None:
@@ -97,23 +108,22 @@ def main() -> None:
         save_dir = join(parser.imageboard, parser.board,
             parser.thread)
 
-    print(f"There are {files_count} files in " \
-        f"{args['site']}/{args['board']}/{args['thread']}." \
-        f"They will be saved in {save_dir}.")
+    print(f"{files_count} files in " \
+        f"{args['site']}/{args['board']}/{args['thread']}. " \
+        f"They're going to {save_dir}. ", end="")
 
     makedirs(save_dir, exist_ok=True)
 
     if not args["no-op"]:
         print("Writing OP... ", end='')
         if parser.op is None:
-            print("No text's there.")
+            print("OP's empty.")
         elif not exists(join(save_dir, "!op.txt")):
-            with open(join(save_dir, "!op.txt"), 'w') as opf:
+            with open(join(save_dir, "!op.txt"), 'w', encoding='utf-8') as opf:
                 opf.write(f"{parser.op}\n")
-            print("Done.")
+            print("OP's written.")
         else:
-            print("Exists.")
+            print("OP exists.")
 
 
     scraper = ThreadedScraper(save_dir, parser.files, \
@@ -1,23 +1,23 @@
-"""FileInfo object stores all needed information about a file."""
+"""FileInfo object stores information about a file."""
 
 from dataclasses import dataclass
 
 __all__ = ["FileInfo"]
 
 
+@dataclass(frozen=True, order=True)
 class FileInfo:
-    """Stores all needed information about a file.
+    """Stores information about a file.
 
-    Arguments:
+    Fields:
     - `name` -- name of a file;
     - `size` -- size of a file;
-    - `dlurl` -- full download URL for a file;
+    - `download_url` -- full download URL for a file;
     - `hash_value` -- hash sum of a file;
-    - `hash_algo` -- hash algorithm used (e.g. md5).
+    - `hash_algorithm` -- hash algorithm used (e.g. md5).
     """
-    def __init__(self, name: str, size: int, dlurl: str,
-            hash_value: str, hash_algo: str) -> None:
-        self.name = name
-        self.size = size
-        self.dlurl = dlurl
-        self.hash_value = hash_value
-        self.hash_algo = hash_algo
+    name: str
+    size: int
+    download_url: str
+    hash_value: str
+    hash_algorithm: str
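The hunk above replaces a hand-written `__init__` with a generated one. A minimal sketch of what `@dataclass(frozen=True, order=True)` provides (`Record` is a stand-in, not the project's class):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True, order=True)
class Record:
    name: str
    size: int

a = Record("a.png", 1024)
# __eq__ and ordering are generated from the fields, in declaration order.
assert a == Record("a.png", 1024)
assert Record("a.png", 1) < Record("b.png", 1)
# frozen=True makes instances immutable after construction.
try:
    a.size = 0
except FrozenInstanceError:
    print("immutable")
```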
@@ -4,7 +4,7 @@ from itertools import chain
 from json import loads
 from re import findall, match
 from typing import List, Optional
-from urllib.request import urlopen, Request
+from urllib.request import urlopen, Request, HTTPError
 
 from scrapthechan import USER_AGENT
 from scrapthechan.fileinfo import FileInfo
@@ -14,7 +14,12 @@ __all__ = ["Parser", "ThreadNotFoundError"]
 
 
 class ThreadNotFoundError(Exception):
-    pass
+    def __init__(self, reason: str = ""):
+        self._reason = reason
+
+    @property
+    def reason(self) -> str:
+        return self._reason
 
 
 class Parser:
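`ThreadNotFoundError` now carries a human-readable reason exposed as a read-only property. A self-contained sketch of the pattern (`NotFound` is a hypothetical stand-in):

```python
class NotFound(Exception):
    """Raised when a resource is missing; keeps the reason for display."""
    def __init__(self, reason: str = ""):
        self._reason = reason

    @property
    def reason(self) -> str:
        return self._reason

try:
    raise NotFound("HTTP Error 404: Not Found")
except NotFound as e:
    print(e.reason)  # HTTP Error 404: Not Found
```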
@@ -25,28 +30,42 @@ class Parser:
 
     Arguments:
         board -- is a name of a board on an image board;
-        thread -- is a name of a thread inside a board;
-        posts -- is a list of posts in form of dictionaries exported from a JSON;
+        thread -- is an id of a thread inside a board;
         skip_posts -- number of posts to skip.
 
     All the extracted files will be stored as the `FileInfo` objects."""
-    __url_thread_json: str = "https://example.org/{board}/{thread}.json"
-    __url_file_link: str = None
-
-    def __init__(self, board: str, thread: str, posts: List[dict],
+    def __init__(self, board: str, thread: str,
             skip_posts: Optional[int] = None) -> None:
-        self._board = board
-        self._thread = thread
-        self._op_post = posts[0]
-        if not skip_posts is None:
-            posts = posts[skip_posts:]
+        self._board: str = board
+        self._thread: str = thread
+        self._posts = self._extract_posts_list(self._get_json())
+        self._op_post: dict = self._posts[0]
+        self._posts = self._posts[skip_posts:] if not skip_posts is None else self._posts
         self._files = list(chain.from_iterable(filter(None, \
-            map(self._parse_post, posts))))
+            map(self._parse_post, self._posts))))
 
+    @property
+    def json_thread_url(self) -> str:
+        raise NotImplementedError
+
+    @property
+    def file_base_url(self) -> str:
+        raise NotImplementedError
+
+    @property
+    def subject_field(self) -> str:
+        return "sub"
+
+    @property
+    def comment_field(self) -> str:
+        return "com"
+
     @property
     def imageboard(self) -> str:
         """Returns image board's name."""
-        return NotImplementedError
+        raise NotImplementedError
 
     @property
     def board(self) -> str:
@@ -62,22 +81,40 @@ class Parser:
     def op(self) -> str:
         """Returns OP's post as combination of subject and comment separated
         by a new line."""
-        raise NotImplementedError
+        op = ""
+        if self.subject_field in self._op_post:
+            op = f"{self._op_post[self.subject_field]}\n"
+        if self.comment_field in self._op_post:
+            op += self._op_post[self.comment_field]
+        return op if not op == "" else None
 
     @property
     def files(self) -> List[FileInfo]:
         """Returns a list of retrieved files as `FileInfo` objects."""
         return self._files
 
-    def _get_json(self, thread_url: str) -> dict:
-        """Gets JSON version of a thread and converts it in a dictionary."""
+    def _extract_posts_list(self, lst: List) -> List[dict]:
+        """This method must be overridden in child classes where you specify
+        a path in a JSON document where posts are stored. E.g., on 4chan this is
+        ['posts'], and on 2ch.hk it's ['threads'][0]['posts']."""
+        return lst
+
+    def _get_json(self) -> dict:
+        """Retrieves a JSON representation of a thread and converts it in
+        a dictionary."""
         try:
+            thread_url = self.json_thread_url.format(board=self._board, \
+                thread=self._thread)
             req = Request(thread_url, headers={'User-Agent': USER_AGENT})
             with urlopen(req) as url:
                 return loads(url.read().decode('utf-8'))
-        except:
-            raise ThreadNotFoundError
+        except HTTPError as e:
+            raise ThreadNotFoundError(str(e))
+        except Exception as e:
+            raise e
 
-    def _parse_post(self, post: dict) -> List[FileInfo]:
-        """Parses a single post and extracts files into `FileInfo` object."""
+    def _parse_post(self, post: dict) -> Optional[List[FileInfo]]:
+        """Parses a single post and extracts files into `FileInfo` object.
+        Single object is wrapped in a list for convenient insertion into
+        a list."""
         raise NotImplementedError
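The `_extract_posts_list` hook above is a template-method split: the base class drives fetching and parsing, while each subclass only names where the posts live in the JSON document. A toy sketch with hypothetical classes:

```python
from typing import List


class BaseExtractor:
    def _extract_posts_list(self, data) -> List[dict]:
        # Default: the document root already is the posts list.
        return data


class FourChanLike(BaseExtractor):
    def _extract_posts_list(self, data) -> List[dict]:
        return data['posts']


class DvachLike(BaseExtractor):
    def _extract_posts_list(self, data) -> List[dict]:
        return data['threads'][0]['posts']


doc = {'threads': [{'posts': [{'num': 1}]}]}
print(DvachLike()._extract_posts_list(doc))  # [{'num': 1}]
```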
@@ -1,6 +1,6 @@
 """Here are defined the JSON parsers for imageboards."""
 from re import search
-from typing import List
+from typing import List, Optional
 
 from scrapthechan.parser import Parser
 
@@ -8,33 +8,31 @@ from scrapthechan.parser import Parser
 __all__ = ["SUPPORTED_IMAGEBOARDS", "get_parser_by_url", "get_parser_by_site"]
 
 
-URLRX = r"https?:\/\/(?P<s>[\w\.]+)\/(?P<b>\w+)\/(?:\w+)?\/(?P<t>\w+)"
 SUPPORTED_IMAGEBOARDS: List[str] = ["4chan.org", "lainchan.org", "2ch.hk", \
-    "8kun.top", "lolifox.cc"]
+    "8kun.top"]
 
 
-def get_parser_by_url(url: str) -> Parser:
+def get_parser_by_url(url: str, skip_posts: Optional[int] = None) -> Parser:
     """Parses URL and extracts from it site name, board and thread.
     And then returns initialised Parser object for detected imageboard."""
+    URLRX = r"https?:\/\/(?P<s>[\w\.]+)\/(?P<b>\w+)\/(?:\w+)?\/(?P<t>\w+)"
     site, board, thread = search(URLRX, url).groups()
-    return get_parser_by_site(site, board, thread)
+    return get_parser_by_site(site, board, thread, skip_posts)
 
-def get_parser_by_site(site: str, board: str, thread: str) -> Parser:
+def get_parser_by_site(site: str, board: str, thread: str,
+        skip_posts: Optional[int] = None) -> Parser:
     """Returns an initialised parser for `site` with `board` and `thread`."""
     if '4chan' in site:
         from .fourchan import FourChanParser
-        return FourChanParser(board, thread)
+        return FourChanParser(board, thread, skip_posts)
     elif 'lainchan' in site:
         from .lainchan import LainchanParser
-        return LainchanParser(board, thread)
+        return LainchanParser(board, thread, skip_posts)
     elif '2ch' in site:
         from .dvach import DvachParser
-        return DvachParser(board, thread)
+        return DvachParser(board, thread, skip_posts)
     elif '8kun' in site:
         from .eightkun import EightKunParser
-        return EightKunParser(board, thread)
-    elif 'lolifox' in site:
-        from .lolifox import LolifoxParser
-        return LolifoxParser(board, thread)
+        return EightKunParser(board, thread, skip_posts)
     else:
         raise NotImplementedError(f"Parser for {site} is not implemented")
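`get_parser_by_site` dispatches on substrings of the site name and defers each import until it is needed. The shape of that dispatch, with hypothetical stub classes in place of the real parsers:

```python
class FourChanParserStub:
    pass


class DvachParserStub:
    pass


def pick_parser(site: str):
    # Substring checks cover both bare and subdomain forms of a site name.
    if '4chan' in site:
        return FourChanParserStub
    elif '2ch' in site:
        return DvachParserStub
    raise NotImplementedError(f"Parser for {site} is not implemented")


print(pick_parser("boards.4chan.org").__name__)  # FourChanParserStub
```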
@@ -10,39 +10,54 @@ __all__ = ["DvachParser"]
 class DvachParser(Parser):
     """JSON parser for 2ch.hk image board."""
 
-    __url_thread_json = "https://2ch.hk/{board}/res/{thread}.json"
-    __url_file_link = "https://2ch.hk"
-
     def __init__(self, board: str, thread: str,
             skip_posts: Optional[int] = None) -> None:
-        posts = self._get_json(self.__url_thread_json.format(board=board, \
-            thread=thread))['threads'][0]['posts']
-        super(DvachParser, self).__init__(board, thread, posts, skip_posts)
+        super().__init__(board, thread, skip_posts)
+
+    @property
+    def json_thread_url(self) -> str:
+        return "https://2ch.hk/{board}/res/{thread}.json"
+
+    @property
+    def file_base_url(self) -> str:
+        return "https://2ch.hk"
+
+    @property
+    def subject_field(self) -> str:
+        return "subject"
+
+    @property
+    def comment_field(self) -> str:
+        return "comment"
 
     @property
     def imageboard(self) -> str:
         return "2ch.hk"
 
-    @property
-    def op(self) -> Optional[str]:
-        op = ""
-        if 'sub' in self._op_post:
-            op = f"{self._op_post['subject']}\n"
-        if 'com' in self._op_post:
-            op += self._op_post['comment']
-        return op if not op == "" else None
+    def _extract_posts_list(self, lst: List) -> List[dict]:
+        return lst['threads'][0]['posts']
 
     def _parse_post(self, post) -> Optional[List[FileInfo]]:
         if not 'files' in post: return None
 
         files = []
 
         for f in post['files']:
-            if match(f['fullname'], r"^image\.\w{1,4}$") is None:
-                fullname = f['fullname']
+            if not 'sticker' in f:
+                if match(r"^image\.\w+$", f['fullname']) is None:
+                    fullname = f['fullname']
+                else:
+                    fullname = f['name']
             else:
                 fullname = f['name']
-            # Here's same thing as 4chan. 2ch.hk also has md5 field, so it is
-            # completely fine to hardcode `hash_algo`.
-            files.append(FileInfo(fullname, f['size'],
-                f"{self.__url_file_link}{f['path']}",
-                f['md5'], 'md5'))
+            if 'md5' in f:
+                files.append(FileInfo(fullname, f['size'],
+                    f"{self.file_base_url}{f['path']}",
+                    f['md5'], 'md5'))
+            else:
+                files.append(FileInfo(fullname, f['size'],
+                    f"{self.file_base_url}{f['path']}",
+                    None, None))
         return files
@@ -1,63 +1,25 @@
 from re import match
-from typing import List, Optional
+from typing import Optional
 
-from scrapthechan.fileinfo import FileInfo
-from scrapthechan.parser import Parser
+from scrapthechan.parsers.tinyboardlike import TinyboardLikeParser
 
 __all__ = ["EightKunParser"]
 
 
-class EightKunParser(Parser):
+class EightKunParser(TinyboardLikeParser):
     """JSON parser for 8kun.top image board."""
 
-    __url_thread_json = "https://8kun.top/{board}/res/{thread}.json"
-    __url_file_link = "https://media.8kun.top/file_store/(unknown)"
-
     def __init__(self, board: str, thread: str,
             skip_posts: Optional[int] = None) -> None:
-        posts = self._get_json(self.__url_thread_json.format(board=board, \
-            thread=thread))['posts']
-        super(EightKunParser, self).__init__(board, thread, posts, skip_posts)
+        super().__init__(board, thread, skip_posts)
 
     @property
     def imageboard(self) -> str:
         return "8kun.top"
 
     @property
-    def op(self) -> Optional[str]:
-        op = ""
-        if 'sub' in self._op_post:
-            op = f"{self._op_post['sub']}\n"
-        if 'com' in self._op_post:
-            op += self._op_post['com']
-        return op if not op == "" else None
+    def json_thread_url(self) -> str:
+        return "https://8kun.top/{board}/res/{thread}.json"
 
-    def _parse_post(self, post: dict) -> List[FileInfo]:
-        if not 'tim' in post: return None
-
-        dlfname = f"{post['tim']}{post['ext']}"
-
-        if "filename" in post:
-            if match(post['filename'], r"^image\.\w{1,4}$") is None:
-                filename = dlfname
-            else:
-                filename = f"{post['filename']}{post['ext']}"
-
-        files = []
-        files.append(FileInfo(filename, post['fsize'],
-            self.__url_file_link.format(board=self.board, filename=dlfname),
-            post['md5'], 'md5'))
-
-        if "extra_files" in post:
-            for f in post["extra_files"]:
-                dlfname = f"{f['tim']}{f['ext']}"
-                if "filename" in post:
-                    if match(post['filename'], r"^image\.\w+$") is None:
-                        filename = dlfname
-                    else:
-                        filename = f"{post['filename']}{post['ext']}"
-                dlurl = self.__url_file_link.format(board=self.board, \
-                    filename=dlfname)
-                files.append(FileInfo(filename, f['fsize'], \
-                    dlurl, f['md5'], 'md5'))
-        return files
+    @property
+    def file_base_url(self) -> str:
+        return "https://media.8kun.top/file_dl/(unknown)"
@@ -1,51 +1,25 @@
 from re import match
-from typing import List, Optional
+from typing import Optional
 
-from scrapthechan.fileinfo import FileInfo
-from scrapthechan.parser import Parser
+from scrapthechan.parsers.tinyboardlike import TinyboardLikeParser
 
 __all__ = ["FourChanParser"]
 
 
-class FourChanParser(Parser):
+class FourChanParser(TinyboardLikeParser):
     """JSON parser for 4chan.org image board."""
 
-    __url_thread_json = "https://a.4cdn.org/{board}/thread/{thread}.json"
-    __url_file_link = "https://i.4cdn.org/{board}/(unknown)"
-
     def __init__(self, board: str, thread: str,
             skip_posts: Optional[int] = None) -> None:
-        posts = self._get_json(self.__url_thread_json.format(board=board, \
-            thread=thread))['posts']
-        super(FourChanParser, self).__init__(board, thread, posts, skip_posts)
+        super().__init__(board, thread, skip_posts)
 
     @property
     def imageboard(self) -> str:
         return "4chan.org"
 
     @property
-    def op(self) -> Optional[str]:
-        op = ""
-        if 'sub' in self._op_post:
-            op = f"{self._op_post['sub']}\n"
-        if 'com' in self._op_post:
-            op += self._op_post['com']
-        return op if not op == "" else None
+    def json_thread_url(self) -> str:
+        return "https://a.4cdn.org/{board}/thread/{thread}.json"
 
-    def _parse_post(self, post: dict) -> List[FileInfo]:
-        if not 'tim' in post: return None
-
-        dlfname = f"{post['tim']}{post['ext']}"
-
-        if "filename" in post:
-            if match(post['filename'], r"^image\.\w{1,4}$") is None:
-                filename = dlfname
-            else:
-                filename = f"{post['filename']}{post['ext']}"
-
-        # Hash algorithm is hardcoded since it is highly unlikely that it will
-        # be changed in foreseeable future. And if it'll change then this line
-        # will be necessarily updated anyway.
-        return [FileInfo(filename, post['fsize'],
-            self.__url_file_link.format(board=self.board, filename=dlfname),
-            post['md5'], 'md5')]
+    @property
+    def file_base_url(self) -> str:
+        return "https://i.4cdn.org/{board}/(unknown)"
@@ -1,66 +1,25 @@
 from re import match
-from typing import List, Optional
+from typing import Optional
 
-from scrapthechan.parser import Parser
-from scrapthechan.fileinfo import FileInfo
+from scrapthechan.parsers.tinyboardlike import TinyboardLikeParser
 
 __all__ = ["LainchanParser"]
 
 
-class LainchanParser(Parser):
-    """JSON parser for lainchan.org image board.
-    JSON structure is identical to 4chan.org's, so this parser is just inherited
-    from 4chan.org's parser and only needed things are redefined.
-    """
-
-    __url_thread_json = "https://lainchan.org/{board}/res/{thread}.json"
-    __url_file_link = "https://lainchan.org/{board}/src/(unknown)"
+class LainchanParser(TinyboardLikeParser):
+    """JSON parser for lainchan.org image board."""
 
     def __init__(self, board: str, thread: str,
             skip_posts: Optional[int] = None) -> None:
-        posts = self._get_json(self.__url_thread_json.format(board=board, \
-            thread=thread))['posts']
-        super(LainchanParser, self).__init__(board, thread, posts, skip_posts)
+        super().__init__(board, thread, skip_posts)
 
     @property
     def imageboard(self) -> str:
         return "lainchan.org"
 
     @property
-    def op(self) -> Optional[str]:
-        op = ""
-        if 'sub' in self._op_post:
-            op = f"{self._op_post['sub']}\n"
-        if 'com' in self._op_post:
-            op += self._op_post['com']
-        return op if not op == "" else None
+    def json_thread_url(self) -> str:
+        return "https://lainchan.org/{board}/res/{thread}.json"
 
-    def _parse_post(self, post) -> List[FileInfo]:
-        if not 'tim' in post: return None
-
-        dlfname = f"{post['tim']}{post['ext']}"
-
-        if "filename" in post:
-            if match(post['filename'], r"^image\.\w{1,4}$") is None:
-                filename = dlfname
-            else:
-                filename = f"{post['filename']}{post['ext']}"
-
-        files = []
-        files.append(FileInfo(filename, post['fsize'],
-            self.__url_file_link.format(board=self.board, filename=dlfname),
-            post['md5'], 'md5'))
-
-        if "extra_files" in post:
-            for f in post["extra_files"]:
-                dlfname = f"{f['tim']}{f['ext']}"
-                if "filename" in post:
-                    if match(post['filename'], r"^image\.\w+$") is None:
-                        filename = dlfname
-                    else:
-                        filename = f"{post['filename']}{post['ext']}"
-                dlurl = self.__url_file_link.format(board=self.board, \
-                    filename=dlfname)
-                files.append(FileInfo(filename, f['fsize'], \
-                    dlurl, f['md5'], 'md5'))
-        return files
+    @property
+    def file_base_url(self) -> str:
+        return "https://lainchan.org/{board}/src/(unknown)"
@@ -1,65 +0,0 @@
-from re import match
-from typing import List, Optional
-
-from scrapthechan.parser import Parser
-from scrapthechan.fileinfo import FileInfo
-
-__all__ = ["LolifoxParser"]
-
-
-class LolifoxParser(Parser):
-    """JSON parser for lolifox.cc image board.
-    JSON structure is identical to lainchan.org.
-    """
-
-    __url_thread_json = "https://lolifox.cc/{board}/res/{thread}.json"
-    __url_file_link = "https://lolifox.cc/{board}/src/(unknown)"
-
-    def __init__(self, board: str, thread: str,
-            skip_posts: Optional[int] = None) -> None:
-        posts = self._get_json(self.__url_thread_json.format(board=board, \
-            thread=thread))['posts']
-        super(LolifoxParser, self).__init__(board, thread, posts, skip_posts)
-
-    @property
-    def imageboard(self) -> str:
-        return "lolifox.cc"
-
-    @property
-    def op(self) -> Optional[str]:
-        op = ""
-        if 'sub' in self._op_post:
-            op = f"{self._op_post['sub']}\n"
-        if 'com' in self._op_post:
-            op += self._op_post['com']
-        return op if not op == "" else None
-
-    def _parse_post(self, post) -> List[FileInfo]:
-        if not 'tim' in post: return None
-
-        dlfname = f"{post['tim']}{post['ext']}"
-
-        if "filename" in post:
-            if match(post['filename'], r"^image\.\w{1,4}$") is None:
-                filename = dlfname
-            else:
-                filename = f"{post['filename']}{post['ext']}"
-
-        files = []
-        files.append(FileInfo(filename, post['fsize'],
-            self.__url_file_link.format(board=self.board, filename=dlfname),
-            post['md5'], 'md5'))
-
-        if "extra_files" in post:
-            for f in post["extra_files"]:
-                dlfname = f"{f['tim']}{f['ext']}"
-                if "filename" in post:
-                    if match(post['filename'], r"^image\.\w+$") is None:
-                        filename = dlfname
-                    else:
-                        filename = f"{post['filename']}{post['ext']}"
-                dlurl = self.__url_file_link.format(board=self.board, \
-                    filename=dlfname)
-                files.append(FileInfo(filename, f['fsize'], \
-                    dlurl, f['md5'], 'md5'))
-        return files
51 scrapthechan/parsers/tinyboardlike.py (new file)
@@ -0,0 +1,51 @@
from re import match
from typing import List, Optional

from scrapthechan.parser import Parser
from scrapthechan.fileinfo import FileInfo


__all__ = ["TinyboardLikeParser"]


class TinyboardLikeParser(Parser):
    """Base parser for imageboards that are based on Tinyboard, or have similar
    JSON API."""
    def __init__(self, board: str, thread: str,
            skip_posts: Optional[int] = None) -> None:
        super().__init__(board, thread, skip_posts)

    def _extract_posts_list(self, lst: List) -> List[dict]:
        return lst['posts']

    def _parse_post(self, post: dict) -> Optional[List[FileInfo]]:
        if not 'tim' in post: return None

        dlfname = f"{post['tim']}{post['ext']}"

        if "filename" in post:
            if match(r"^image\.\w+$", post['filename']) is None:
                filename = dlfname
            else:
                filename = f"{post['filename']}{post['ext']}"

        files = []

        files.append(FileInfo(filename, post['fsize'],
            self.file_base_url.format(board=self.board, filename=dlfname),
            post['md5'], 'md5'))

        if "extra_files" in post:
            for f in post["extra_files"]:
                dlfname = f"{f['tim']}{f['ext']}"
                if "filename" in post:
                    if match(r"^image\.\w+$", post['filename']) is None:
                        filename = dlfname
                    else:
                        filename = f"{post['filename']}{post['ext']}"
                dlurl = self.file_base_url.format(board=self.board, \
                    filename=dlfname)
                files.append(FileInfo(filename, f['fsize'], \
                    dlurl, f['md5'], 'md5'))

        return files
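The point of `TinyboardLikeParser` is that a concrete site parser only supplies the site-specific URL templates while the post-parsing logic lives in one place. A minimal sketch of that pattern, with the package's base `Parser` stubbed out so it runs standalone, and an entirely hypothetical site (`example-chan.invalid`) standing in for a real board:

```python
from typing import List, Optional


class Parser:
    """Stub of scrapthechan.parser.Parser: just stores thread coordinates."""
    def __init__(self, board: str, thread: str,
            skip_posts: Optional[int] = None) -> None:
        self.board = board
        self.thread = thread
        self.skip_posts = skip_posts


class TinyboardLikeParser(Parser):
    """Shared logic for Tinyboard-style JSON APIs, as in the diff above."""
    def _extract_posts_list(self, lst: dict) -> List[dict]:
        return lst['posts']


class ExampleChanParser(TinyboardLikeParser):
    """Hypothetical site parser: only URL templates are site-specific."""
    __url_thread_json = "https://example-chan.invalid/{board}/res/{thread}.json"
    file_base_url = "https://example-chan.invalid/{board}/src/{filename}"

    @property
    def imageboard(self) -> str:
        return "example-chan.invalid"


p = ExampleChanParser("b", "12345")
print(p.imageboard)                                   # example-chan.invalid
print(p._extract_posts_list({'posts': [{'no': 1}]}))  # [{'no': 1}]
```

This mirrors the changelog's stated design: every such generalisation class ends with `*LikeParser`, and per-board subclasses stay a few lines long.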
@@ -5,8 +5,9 @@ from os import remove, stat
 from os.path import exists, join, getsize
 import re
 from typing import List, Callable
-from urllib.request import urlretrieve, URLopener
+from urllib.request import urlretrieve, URLopener, HTTPError, URLError
 import hashlib
+from http.client import HTTPException

 from scrapthechan import USER_AGENT
 from scrapthechan.fileinfo import FileInfo
@@ -15,83 +16,131 @@ __all__ = ["Scraper"]


class Scraper:
    """Base class for all scrapers that will actually do the job.

    Arguments:
        save_directory -- a path to a directory where files will be saved;
        files -- a list of FileInfo objects;
        download_progress_callback -- a callback invoked each time a file
            download starts.
    """
    def __init__(self, save_directory: str, files: List[FileInfo],
            download_progress_callback: Callable[[int], None] = None) -> None:
        self._save_directory = save_directory
        self._files = files
        self._url_opener = URLopener()
        self._url_opener.addheaders = [('User-Agent', USER_AGENT)]
        self._url_opener.version = USER_AGENT
        self._progress_callback = download_progress_callback

    def run(self):
        raise NotImplementedError
    def _same_filename(self, filename: str, path: str) -> str:
        """Check if there is a file with the same name. If so, append an
        incremental number enclosed in brackets to the name of the new one."""
        newname = filename
        while exists(join(path, newname)):
            has_extension = newname.rfind(".") != -1
            if has_extension:
                l, r = newname.rsplit(".", 1)
                lbracket = l.rfind("(")
                if lbracket == -1:
                    newname = f"{l}(1).{r}"
                else:
                    num = l[lbracket+1:-1]
                    if num.isnumeric():
                        newname = f"{l[:lbracket]}({int(num)+1}).{r}"
                    else:
                        newname = f"{l}(1).{r}"
            else:
                lbracket = newname.rfind("(")
                if lbracket == -1:
                    newname = f"{newname}(1)"
                else:
                    num = newname[lbracket+1:-1]
                    if num.isnumeric():
                        newname = f"{newname[:lbracket]}({int(num)+1})"
        return newname
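The bracket-increment rule above can be exercised without touching the filesystem. This sketch reimplements the same renaming logic, slightly condensed, against an in-memory set of existing names; `next_free_name` is a name invented for the demo:

```python
def next_free_name(filename: str, existing: set) -> str:
    """Mirror of _same_filename's rule: append "(n)" before the extension,
    incrementing n until the name is unused."""
    newname = filename
    while newname in existing:
        if "." in newname:
            l, r = newname.rsplit(".", 1)
        else:
            l, r = newname, None
        lbracket = l.rfind("(")
        # Extract the number only from a trailing "(n)" suffix.
        num = l[lbracket + 1:-1] if lbracket != -1 and l.endswith(")") else ""
        if num.isnumeric():
            l = f"{l[:lbracket]}({int(num) + 1})"
        else:
            l = f"{l}(1)"
        newname = l if r is None else f"{l}.{r}"
    return newname

print(next_free_name("cat.png", {"cat.png"}))                # cat(1).png
print(next_free_name("cat.png", {"cat.png", "cat(1).png"}))  # cat(2).png
print(next_free_name("notes", {"notes"}))                    # notes(1)
```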
-    def _hash_file(self, filename: str, hash_algo: str = "md5",
-            blocksize: int = 1048576) -> (str, str):
-        """Compute hash of a file."""
-        hash_func = hashlib.new(hash_algo)
-        with open(filename, 'rb') as f:
-            buf = f.read(blocksize)
-            while len(buf) > 0:
-                hash_func.update(buf)
-                buf = f.read(blocksize)
-        return hash_func.hexdigest(), hash_func.digest()
+    def _hash_file(self, filepath: str, hash_algorithm: str = "md5",
+            blocksize: int = 1048576) -> (str, str):
+        """Compute hash of a file."""
+        if hash_algorithm is None:
+            return None
+        hash_func = hashlib.new(hash_algorithm)
+        with open(filepath, 'rb') as f:
+            buf = f.read(blocksize)
+            while len(buf) > 0:
+                hash_func.update(buf)
+                buf = f.read(blocksize)
+        return hash_func.hexdigest(), b64encode(hash_func.digest()).decode()
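The new `_hash_file` returns the digest twice, as a hex string and as Base64 text, presumably because some board APIs report a file's MD5 in Base64 form. A self-contained sketch of that return shape over an in-memory buffer instead of a file:

```python
import hashlib
from base64 import b64encode


def hash_bytes(data: bytes, hash_algorithm: str = "md5") -> (str, str):
    """Same return shape as the new _hash_file: (hex digest,
    base64-encoded raw digest), computed over an in-memory buffer."""
    h = hashlib.new(hash_algorithm)
    h.update(data)
    return h.hexdigest(), b64encode(h.digest()).decode()


hexdig, b64dig = hash_bytes(b"hello")
print(hexdig)  # 5d41402abc4b2a76b9719d911017c592
print(b64dig)  # XUFAKrxLKna5cZ2REBfFkg==
```

Both strings encode the same 16 raw MD5 bytes, which is why `_check_file` can compare a reported hash against either form.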
-    def _is_file_ok(self, f: FileInfo, filepath: str) -> bool:
-        """Check if a file exist and isn't broken."""
-        if not exists(filepath):
-            return False
-        computed_size = getsize(filepath)
-        is_size_match = f.size == computed_size \
-            or f.size == round(computed_size / 1024)
-        hexdig, dig = self._hash_file(filepath, f.hash_algo)
-        is_hash_match = f.hash_value == hexdig \
-            or f.hash_value == b64encode(dig).decode()
-        return is_size_match and is_hash_match
+    def _check_file(self, f: FileInfo, filepath: str) -> bool:
+        """Check if a file exists and isn't broken."""
+        if not exists(filepath):
+            return False
+        computed_size = getsize(filepath)
+        if not (f.size == computed_size \
+                or f.size == round(computed_size / 1024)):
+            return False
+        if not f.hash_algorithm is None:
+            hexdig, dig = self._hash_file(filepath, f.hash_algorithm)
+            return f.hash_value == hexdig or f.hash_value == dig
+        return True
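`_check_file` accepts two size readings: the exact byte count, or the byte count rounded to kibibytes (the assumption here is that some board APIs report `fsize` in KB rather than bytes). Isolated from the class, the rule looks like this; `size_matches` is a name invented for the demo:

```python
def size_matches(reported: int, actual_bytes: int) -> bool:
    """The check from _check_file: accept an exact byte count,
    or a size that matches after rounding to kibibytes."""
    return reported == actual_bytes or reported == round(actual_bytes / 1024)


print(size_matches(2048, 2048))  # True  (exact byte count)
print(size_matches(2, 2048))     # True  (KB-rounded report)
print(size_matches(3, 2048))     # False
```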
-    def _download_file(self, f: FileInfo):
-        """Download a single file."""
-        filepath = join(self._save_directory, f.name)
-        if self._is_file_ok(f, filepath):
-            return True
-        elif exists(filepath):
-            filepath = join(self._save_directory, \
-                self._same_filename(f.name, self._save_directory))
-        self._url_opener.retrieve(f.dlurl, filepath)
+    def _download_file(self, f: FileInfo):
+        """Download a single file."""
+        is_same_filename = False
+        filepath = join(self._save_directory, f.name)
+        orig_filepath = filepath
+        if self._check_file(f, filepath):
+            return
+        elif exists(filepath):
+            is_same_filename = True
+            filepath = join(self._save_directory, \
+                self._same_filename(f.name, self._save_directory))
+        try:
+            retries = 3
+            while retries > 0:
+                self._url_opener.retrieve(f.download_url, filepath)
+                if not self._check_file(f, filepath):
+                    remove(filepath)
+                    retries -= 1
+                else:
+                    break
+            if retries == 0:
+                print(f"Cannot retrieve {f.download_url}, {filepath}.")
+                return
+            if is_same_filename:
+                _, f1_dig = self._hash_file(orig_filepath, f.hash_algorithm)
+                _, f2_dig = self._hash_file(filepath, f.hash_algorithm)
+                if f1_dig == f2_dig:
+                    remove(filepath)
+        except FileNotFoundError as e:
+            print("File Not Found", filepath)
+        except HTTPError as e:
+            print("HTTP Error", e.code, e.reason, f.download_url)
+            if exists(filepath):
+                remove(filepath)
+        except HTTPException:
+            print("HTTP Exception for", f.download_url)
+            if exists(filepath):
+                remove(filepath)
+        except URLError as e:
+            print("URL Error for", f.download_url)
+            if exists(filepath):
+                remove(filepath)
+        except ConnectionResetError:
+            print("Connection reset for", f.download_url)
+            if exists(filepath):
+                remove(filepath)
+        except ConnectionRefusedError:
+            print("Connection refused for", f.download_url)
+            if exists(filepath):
+                remove(filepath)
+        except ConnectionAbortedError:
+            print("Connection aborted for", f.download_url)
+            if exists(filepath):
+                remove(filepath)
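The retry loop in the new `_download_file` reduces to a small, network-free shape: fetch, verify, delete-and-retry, give up after three attempts. In the sketch below, `fetch` and `verify` are stand-ins for `URLopener.retrieve` and `_check_file`, and `retrieve_with_retries` is a name invented for the demo:

```python
def retrieve_with_retries(fetch, verify, retries: int = 3) -> bool:
    """Shape of the new _download_file loop: fetch, verify, and retry
    up to `retries` times; report failure instead of raising."""
    while retries > 0:
        fetch()
        if verify():
            return True
        retries -= 1
    return False


# Simulate a download that only verifies successfully on the third attempt.
attempts = []
ok = retrieve_with_retries(
    fetch=lambda: attempts.append(1),
    verify=lambda: len(attempts) >= 3)
print(ok, len(attempts))  # True 3
```

Deleting the corrupt file before retrying (as the diff does with `remove(filepath)`) matters because `_check_file` would otherwise see a stale partial download on the next pass.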
@@ -7,25 +7,26 @@ from multiprocessing.pool import ThreadPool
 from scrapthechan.scraper import Scraper
 from scrapthechan.fileinfo import FileInfo


 __all__ = ["ThreadedScraper"]


 class ThreadedScraper(Scraper):
-    def __init__(self, save_directory: str, files: List[FileInfo],
-            download_progress_callback: Callable[[int], None] = None) -> None:
-        super(ThreadedScraper, self).__init__(save_directory, files,
-            download_progress_callback)
-        self._files_downloaded = 0
-        self._files_downloaded_mutex = Lock()
+    def __init__(self, save_directory: str, files: List[FileInfo],
+            download_progress_callback: Callable[[int], None] = None) -> None:
+        super().__init__(save_directory, files, download_progress_callback)
+        self._files_downloaded = 0
+        self._files_downloaded_mutex = Lock()

     def run(self):
         pool = ThreadPool(cpu_count() * 2)
         pool.map(self._thread_run, self._files)
         pool.close()
         pool.join()

-    def _thread_run(self, f: FileInfo):
-        with self._files_downloaded_mutex:
-            self._files_downloaded += 1
-        if not self._progress_callback is None:
-            self._progress_callback(self._files_downloaded)
-        self._download_file(f)
+    def _thread_run(self, f: FileInfo):
+        if not self._progress_callback is None:
+            with self._files_downloaded_mutex:
+                self._files_downloaded += 1
+                self._progress_callback(self._files_downloaded)
+        self._download_file(f)
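`ThreadedScraper` guards its progress counter with a `Lock` because `+=` on a shared integer is not atomic across threads. A runnable sketch of the same pool-plus-mutex shape, with the actual download replaced by a no-op:

```python
from multiprocessing import cpu_count
from multiprocessing.pool import ThreadPool
from threading import Lock

# Mirrors ThreadedScraper's accounting: a pool of 2x CPU count workers
# and a Lock-guarded counter; the per-item "download" is a no-op here.
counter = 0
counter_mutex = Lock()


def worker(item):
    global counter
    with counter_mutex:  # += on a shared int is read-modify-write
        counter += 1


pool = ThreadPool(cpu_count() * 2)
pool.map(worker, range(100))
pool.close()
pool.join()
print(counter)  # 100
```

`ThreadPool` (threads, not processes) fits here because the work is I/O-bound downloading, where the GIL is released while waiting on the network.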
12 setup.cfg
@@ -1,7 +1,7 @@
 [metadata]
 name = scrapthechan
 version = attr: scrapthechan.__version__
-description = Scrap the files posted in a thread on an imageboard.
+description = Scrap the files from the imageboards.
 long_description = file: README.md
 long_description_content_type = text/markdown
 author = Alexander "Arav" Andreev
@@ -14,18 +14,20 @@ keywords =
     2ch.hk
     lainchan.org
     8kun.top
-    lolifox.cc
 license = MIT
 license_file = COPYING
 classifiers =
-    Development Status :: 2 - Pre-Alpha
+    Development Status :: 3 - Alpha
     Environment :: Console
     Intended Audience :: End Users/Desktop
-    License :: Other/Proprietary License
+    License :: OSI Approved :: MIT License
     Natural Language :: English
     Operating System :: OS Independent
     Programming Language :: Python :: 3.7
     Programming Language :: Python :: 3.8
     Topic :: Communications :: BBS
     Topic :: Internet :: WWW/HTTP
     Topic :: Internet :: WWW/HTTP :: Dynamic Content :: Message Boards
     Topic :: Text Processing
     Topic :: Utilities

 [options]