Past Paper Crawler
(Python 3.7 Ver.)
Features

Bulk Download
Downloads run on multiple threads at once, which makes bulk downloads much faster.

Progress Bar
Keeps you updated on the progress constantly.
Lets you cancel the download when the internet connection is poor.

Failed Retry
Connection broken? No worries.
Retry failed downloads in one click.
Download
Current build: 1.2
For Mac
For Windows
See Also
The Swift Version.
(macOS only)
Optimized for macOS.
Better GUI. Caches local files.
Made by Raymond Wu, SCIE.pro co-founder.
Development Team
Teresa
Team Leader.
Responsible for the GUI section and paper filter.
John
Responsible for the multitasking download section.
Ethan
Responsible for the crawler section.
Peter
Responsible for the GUI design.
Technical Details
GUI
The main (level 1) thread.
A friendly graphical user interface that lets you filter papers by your requirements and download them all in one click!
There are more functions for you to explore (a sketch of the main window follows the module list below).
Modules Used:
- DownloadModule, Crawler Module, and Filter Module (the project's other modules)
- os — to join file paths
- wx — the GUI framework
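A minimal sketch of how the level 1 GUI thread could be wired up with wx, assuming the download work is handed off to a worker thread; the widget layout and the start_download name are hypothetical, not the project's actual code.

```python
import threading
import wx

class MainFrame(wx.Frame):
    """Level 1 (main) thread: the window that filters papers and starts downloads."""

    def __init__(self):
        super().__init__(None, title="Past Paper Crawler")
        panel = wx.Panel(self)
        self.subject_box = wx.TextCtrl(panel)                      # e.g. "Physics (0625)"
        self.paper_list = wx.ListBox(panel, style=wx.LB_MULTIPLE)  # filtered paper links
        download_btn = wx.Button(panel, label="Download All")
        download_btn.Bind(wx.EVT_BUTTON, self.on_download)

        sizer = wx.BoxSizer(wx.VERTICAL)
        for widget in (self.subject_box, self.paper_list, download_btn):
            sizer.Add(widget, 0, wx.EXPAND | wx.ALL, 5)
        panel.SetSizer(sizer)

    def on_download(self, event):
        # Hand the selected papers to the download module on a worker thread
        # so the window stays responsive; start_download is a placeholder name.
        urls = [self.paper_list.GetString(i) for i in self.paper_list.GetSelections()]
        threading.Thread(target=start_download, args=(urls,), daemon=True).start()

def start_download(urls):
    """Placeholder for the Download Module entry point."""
    pass

if __name__ == "__main__":
    app = wx.App()
    MainFrame().Show()
    app.MainLoop()
```

Keeping the download off the main thread is what lets the progress bar keep updating and the cancel button keep working while papers are being fetched.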
Download Module
Implemented as a level 2 thread. The level 2 thread does the monitoring. For example, it clears overtime tasks, and starts new tasks when available. It also communicates with the main thread through flags.
Tasks are generated as level 3 threads
Modules Used
- threading — multitasking download
- ssl — overriding the certificate settings
- time — access system time, to clear overtime tasks
- urllib.request — to download the papers
- urllib.error — to catch download errors
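A minimal sketch of the thread hierarchy described above, assuming a fixed 30-second timeout and a threading.Event as the cancel flag; the names DownloadTask, monitor, and TIMEOUT are illustrative, not the module's real API.

```python
import os
import ssl
import threading
import time
import urllib.error
import urllib.request

TIMEOUT = 30  # seconds before a task counts as "overtime" (assumed value)

class DownloadTask(threading.Thread):
    """Level 3 thread: downloads one paper to disk."""

    def __init__(self, url, dest):
        super().__init__(daemon=True)
        self.url = url
        self.dest = dest
        self.started_at = time.time()
        self.done = False
        self.error = None

    def run(self):
        # Override certificate checks, as the module does through ssl.
        context = ssl._create_unverified_context()
        try:
            with urllib.request.urlopen(self.url, context=context) as resp:
                with open(self.dest, "wb") as f:
                    f.write(resp.read())
            self.done = True
        except urllib.error.URLError as exc:
            self.error = exc

def monitor(urls, folder, max_tasks=5, cancel_flag=None):
    """Level 2 thread body: start level 3 tasks, drop overtime ones, report back."""
    pending = list(urls)
    running, failed, finished = [], [], 0
    while pending or running:
        if cancel_flag is not None and cancel_flag.is_set():
            break  # the main thread asked us to stop
        # Start new tasks while there is spare capacity.
        while pending and len(running) < max_tasks:
            url = pending.pop()
            task = DownloadTask(url, os.path.join(folder, url.rsplit("/", 1)[-1]))
            task.start()
            running.append(task)
        # Collect finished tasks and clear overtime or failed ones.
        for task in running[:]:
            if task.done:
                running.remove(task)
                finished += 1
            elif task.error or time.time() - task.started_at > TIMEOUT:
                running.remove(task)
                failed.append(task.url)  # kept so "Failed Retry" can rerun them
        time.sleep(0.5)
    return finished, failed
```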
Crawler Module
The website used is GCEguide.com.
Two functions are implemented: one to access the subjects and the other to access the papers of a subject (a sketch follows the module list below).
Modules Used
- requests – receive responses from URLs
- BeautifulSoup (bs4) – extract information from the response
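A minimal sketch of the two crawler functions, assuming the GCE Guide pages expose plain anchor links; BASE_URL and the page structure here are assumptions and may not match the live site.

```python
import requests
from bs4 import BeautifulSoup

# Assumed entry point; the real site layout may differ.
BASE_URL = "https://papers.gceguide.com"

def get_subjects(level="IGCSE"):
    """Return {subject name: subject page URL} scraped from one level's index page."""
    resp = requests.get(f"{BASE_URL}/{level}/", timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return {
        link.text.strip(): f"{BASE_URL}/{level}/{link['href']}"
        for link in soup.find_all("a")
        if link.get("href")
    }

def get_papers(subject_url):
    """Return the PDF links listed on one subject page."""
    resp = requests.get(subject_url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return [
        f"{subject_url.rstrip('/')}/{link['href']}"
        for link in soup.find_all("a")
        if link.get("href", "").endswith(".pdf")
    ]
```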
Filter Module
Checks the year, season, paper number, and region number of each file by matching the file name against the general past-paper naming rule, so files can be filtered by those fields (a sketch follows the module list below).
Modules Used:
- re – match the correct format
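A minimal sketch of the filter, assuming the common Cambridge file-naming rule (e.g. 0620_s18_qp_42.pdf: subject code, season and year, paper type, then paper and region digits); the exact pattern the project matches may differ.

```python
import re

# Hypothetical naming rule: <subject code>_<season><year>_<type>_<paper><region>.pdf
PAPER_RE = re.compile(
    r"^(?P<code>\d{4})_"
    r"(?P<season>[smw])(?P<year>\d{2})_"
    r"(?P<type>qp|ms|in)_"
    r"(?P<paper>\d)(?P<region>\d)?\.pdf$"
)

def match_paper(filename, year=None, season=None, paper=None, region=None):
    """Return True if the file name fits the naming rule and every given filter."""
    m = PAPER_RE.match(filename)
    if not m:
        return False
    filters = {"year": year, "season": season, "paper": paper, "region": region}
    return all(
        value is None or m.group(field) == str(value)
        for field, value in filters.items()
    )

# Example: keep only summer 2018 paper 4 question papers.
# match_paper("0620_s18_qp_42.pdf", year=18, season="s", paper=4)  -> True
```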