Past Paper Crawler
(Python 3.7 Ver.)
Downloads run in multiple threads, which makes them significantly faster.
Keeps you constantly updated on the progress.
Lets you cancel a download when the internet connection is poor.
Connection Broken? No worries.
Retry in one click.
The Swift Version.
Optimized for macOS.
Better GUI. Caches local files.
Made by Raymond Wu, SCIE.pro co-founder.
Responsible for the GUI section and paper filter.
Responsible for the multitasking download section.
Responsible for the crawler section.
Responsible for the GUI design.
The main (level 1) thread.
A friendly graphical user interface that lets you filter papers by your requirements and download all of them in one click!
There are more functions for you to explore.
- Imports DownloadModule, Crawler Module, and Filter Module
- os — to join file paths
- wx — the GUI framework
Implemented as a level 2 thread. The level 2 thread does the monitoring. For example, it clears overtime tasks, and starts new tasks when available. It also communicates with the main thread through flags.
Tasks are generated as level 3 threads.
- threading — multithreaded downloading
- ssl — to override certificate settings
- time — to access system time for clearing overtime tasks
- urllib.request — to download the paper
- urllib.error — to catch errors
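The three-level thread scheme above can be sketched as follows. This is a hypothetical illustration, not the project's actual code: the `DownloadTask` and `monitor` names, the timeout value, and the parallelism limit are all assumptions, and the `fetch` parameter is added so the logic can be exercised without a network connection.

```python
import ssl
import threading
import time
import urllib.request
import urllib.error

# Override certificate settings, as the ssl import suggests (assumed intent).
ssl._create_default_https_context = ssl._create_unverified_context

TASK_TIMEOUT = 30   # seconds before a task counts as overtime (assumed value)
MAX_PARALLEL = 4    # level-3 threads allowed at once (assumed value)

class DownloadTask(threading.Thread):
    """Level-3 thread: downloads one file and records when it started."""

    def __init__(self, url, path, fetch=urllib.request.urlretrieve):
        super().__init__(daemon=True)
        self.url = url
        self.path = path
        self.fetch = fetch        # injectable for testing
        self.started_at = None
        self.succeeded = False

    def run(self):
        self.started_at = time.time()
        try:
            self.fetch(self.url, self.path)
            self.succeeded = True
        except urllib.error.URLError:
            pass                  # failure stays visible to the monitor

def monitor(pending, cancel_flag, progress):
    """Level-2 thread body: starts tasks, clears overtime ones, reports progress.

    `cancel_flag` is the flag the main (level 1) thread sets to cancel;
    finished tasks are appended to `progress` for the main thread to read.
    """
    running = []
    while (pending or running) and not cancel_flag.is_set():
        # Start new tasks while there is spare capacity.
        while pending and len(running) < MAX_PARALLEL:
            task = pending.pop()
            task.start()
            running.append(task)
        # Clear finished and overtime tasks.
        still_running = []
        for task in running:
            if not task.is_alive():
                progress.append(task)   # report completion to the main thread
            elif time.time() - task.started_at > TASK_TIMEOUT:
                pass                    # overtime: drop it (daemon thread dies with the app)
            else:
                still_running.append(task)
        running = still_running
        time.sleep(0.05)
```

Running `monitor` in its own level-2 thread keeps the main thread free to update the GUI while it polls the shared `progress` list and sets `cancel_flag` on demand.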
The website used is GCEguide.com.
Two functions are implemented: one to access subjects and the other to access papers.
- requests — to receive responses from URLs
- BeautifulSoup (bs4) — to extract information from the response
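The two functions might look like the sketch below. It assumes GCE Guide serves directory-style pages whose subjects and papers are plain `<a>` links; the `BASE_URL`, function names, and page layout are assumptions, and the real site structure may differ.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://papers.gceguide.com"  # assumed entry point

def extract_links(html):
    """Return (text, href) for every link on a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]

def get_subjects():
    """Fetch the subject index page and return its links."""
    response = requests.get(BASE_URL)
    response.raise_for_status()
    return extract_links(response.text)

def get_papers(subject_url):
    """Fetch one subject's page and return links to its papers."""
    response = requests.get(subject_url)
    response.raise_for_status()
    return extract_links(response.text)
```

Keeping the HTML parsing in a separate `extract_links` helper means only the two thin fetch functions touch the network, so the parsing logic can be tested against saved pages.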
Used to extract the year, season, paper number, and region number of each file for filtering, by matching the file name against the general past-paper naming rule.
- re — to match the naming format
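A sketch of that matching, assuming the common CIE-style file name such as `9702_s18_qp_42.pdf` (subject code, season letter plus two-digit year, document type, then paper and region digits). The exact naming rule and the field names below are assumptions, not the project's actual pattern.

```python
import re

# Assumed past-paper naming rule, e.g. "9702_s18_qp_42.pdf".
PAPER_RE = re.compile(
    r"(?P<subject>\d{4})_"            # 4-digit subject code
    r"(?P<season>[smw])(?P<year>\d{2})_"  # season letter + 2-digit year
    r"(?P<type>[a-z]+)_"              # document type, e.g. qp / ms
    r"(?P<paper>\d)(?P<region>\d)"    # paper number + region number
    r"\.pdf$"
)

def parse_paper_name(filename):
    """Return the fields used for filtering, or None if the name doesn't match."""
    match = PAPER_RE.match(filename)
    if not match:
        return None
    return {
        "subject": match.group("subject"),
        "season": match.group("season"),
        "year": 2000 + int(match.group("year")),
        "type": match.group("type"),
        "paper": int(match.group("paper")),
        "region": int(match.group("region")),
    }
```

Files whose names don't match the rule return `None` and can simply be skipped by the filter.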