Past Paper Crawler

(Python 3.7 Version)


Bulk Download

Papers are downloaded in parallel using multiple threads, which makes bulk downloads much faster.

Progress Bar

Keeps you updated on the download progress.
Lets you cancel the download when the internet connection is poor.

Failed Retry

Connection broken? No worries.
Retry the failed downloads in one click.


Current build: 1.2

For Mac

For Windows

See Also

The Swift Version.
(macOS only)

Optimized for macOS.

Better GUI. Caches files locally.

Made by Raymond Wu, co-founder.

Learn More

Development Team


Team Leader.

Responsible for the GUI section and paper filter.


Responsible for the multitasking download section.


Responsible for the crawler section.


Responsible for the GUI design.

Technical Details


The main (level 1) thread.

A friendly graphical user interface that lets you filter papers by your requirements and download all of them in one click!

There are more functions for you to explore.

Modules Used:

  • Download Module, Crawler Module, and Filter Module (the project's own modules)
  • os — to join the file path
  • wx — the GUI

Download Module

Implemented as a level 2 thread, which does the monitoring. For example, it clears tasks that run overtime and starts new tasks when available. It also communicates with the main thread through flags.

Tasks are generated as level 3 threads.

Modules Used:

  • threading — multitasking download
  • ssl — overriding the certificate settings
  • time — access system times, to clear overtime tasks
  • urllib.request — to download the paper
  • urllib.error — to catch errors
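The level 2 / level 3 layout described above could look roughly like the sketch below. It is not the project's actual code: the class name DownloadMonitor, the helper fetch_url, the parameters max_workers and task_timeout, and the flag names cancel_flag and done_flag are all hypothetical. Python threads cannot be killed from outside, so "clearing" an overtime task here just means the monitor stops tracking it and records it as failed.

```python
import threading
import time
import urllib.error
import urllib.request


def fetch_url(url, timeout=10):
    """Download one URL and return its bytes (hypothetical default fetcher)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


class DownloadMonitor(threading.Thread):
    """Level 2 thread: starts level 3 worker threads and watches them."""

    def __init__(self, urls, fetch=fetch_url, max_workers=4, task_timeout=30.0):
        super().__init__(daemon=True)
        self.pending = list(urls)
        self.total = len(self.pending)
        self.fetch = fetch
        self.max_workers = max_workers
        self.task_timeout = task_timeout
        self.running = {}    # url -> (worker thread, start time)
        self.results = {}    # url -> downloaded bytes
        self.failed = []     # urls that raised an error or ran overtime
        self.lock = threading.Lock()
        self.cancel_flag = threading.Event()  # set by the main thread to cancel
        self.done_flag = threading.Event()    # set by this thread when it stops

    def _worker(self, url):
        """Level 3 thread body: download one file and report the outcome."""
        try:
            data, ok = self.fetch(url), True
        except (urllib.error.URLError, OSError):
            data, ok = None, False
        with self.lock:
            # Ignore the outcome if the monitor already cleared this task.
            if self.running.pop(url, None) is None:
                return
            if ok:
                self.results[url] = data
            else:
                self.failed.append(url)

    def run(self):
        while not self.cancel_flag.is_set():
            now = time.monotonic()
            with self.lock:
                # Clear overtime tasks; the abandoned thread finishes on its own.
                for url, (_, started) in list(self.running.items()):
                    if now - started > self.task_timeout:
                        del self.running[url]
                        self.failed.append(url)
                # Start new tasks while worker slots are free.
                while self.pending and len(self.running) < self.max_workers:
                    url = self.pending.pop(0)
                    worker = threading.Thread(
                        target=self._worker, args=(url,), daemon=True
                    )
                    self.running[url] = (worker, now)
                    worker.start()
                finished = not self.pending and not self.running
            if finished:
                break
            time.sleep(0.05)
        self.done_flag.set()

    def progress(self):
        """Return (tasks finished or failed, total tasks) for a progress bar."""
        with self.lock:
            return len(self.results) + len(self.failed), self.total
```

In this sketch the main thread would call start(), poll progress() to update the progress bar, call cancel_flag.set() to cancel, and pass the contents of failed to a new monitor to retry.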

Crawler Module

The website used is

Two functions are implemented: one to retrieve the subjects and the other to retrieve the papers.

Modules Used:

  • requests – receive the response from a URL
  • BeautifulSoup (bs4) – extract information from the response
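The two crawler functions could be sketched along these lines. The project itself uses Requests and BeautifulSoup. To stay dependency-free, this sketch swaps in urllib.request and html.parser from the standard library. The function names (list_subjects, list_papers, fetch_html), the example.com URLs in the usage note, and the page layout are assumptions: subject pages are assumed to be linked as directories ending in "/" and papers as ".pdf" links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return (absolute URL, link text) pairs found in the HTML."""
    parser = LinkCollector()
    parser.feed(html)
    return [(urljoin(base_url, href), text) for href, text in parser.links]


def list_subjects(html, base_url):
    """Return {subject name: URL} for links that point to a directory."""
    return {
        text: url
        for url, text in extract_links(html, base_url)
        if url.endswith("/")
    }


def list_papers(html, base_url):
    """Return the URLs of the PDF files linked from a subject page."""
    return [
        url
        for url, _ in extract_links(html, base_url)
        if url.lower().endswith(".pdf")
    ]


def fetch_html(url, timeout=10):
    """Fetch a page and decode it as text (stand-in for requests.get)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, list_papers('<a href="9701_s18_qp_12.pdf">QP</a>', "https://example.com/papers/chemistry/") returns a list with one absolute PDF URL. With BeautifulSoup, the LinkCollector class would typically be replaced by a call such as soup.find_all("a").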

Filter Module

Checks the year, season, paper number, and region number of each file by matching the file name against the general naming rule for past papers. The results are used for filtering.

Modules Used:

  • re – match the correct format
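A name-matching filter of this kind could be sketched as follows. The naming rule is an assumption, since the exact rule is not spelled out here: file names are assumed to look like 9701_s18_qp_12.pdf, that is, a four-digit subject code, a season letter (m, s, or w) with a two-digit year, a two-letter paper type, then the paper number followed by an optional region digit. The names PAPER_NAME, parse_name, and filter_papers are hypothetical, and two-digit years are assumed to mean 20xx.

```python
import re

# Assumed naming rule: <subject>_<season><yy>_<type>_<paper><region>.pdf
PAPER_NAME = re.compile(
    r"^(?P<subject>\d{4})_"
    r"(?P<season>[msw])(?P<year>\d{2})_"
    r"(?P<kind>[a-z]{2})_"
    r"(?P<paper>\d)(?P<region>\d)?"
    r"\.pdf$"
)


def parse_name(file_name):
    """Split a file name into its fields, or return None if it does not match."""
    match = PAPER_NAME.match(file_name.lower())
    if match is None:
        return None
    info = match.groupdict()
    info["year"] = 2000 + int(info["year"])  # assumption: years are 20xx
    return info


def filter_papers(file_names, year=None, season=None, paper=None, region=None):
    """Keep the names whose fields equal every criterion that is not None."""
    wanted = {"year": year, "season": season, "paper": paper, "region": region}
    selected = []
    for name in file_names:
        info = parse_name(name)
        if info is None:
            continue  # skip files that do not follow the naming rule
        if all(v is None or info[k] == v for k, v in wanted.items()):
            selected.append(name)
    return selected
```

For example, filter_papers(names, year=2018, season="s") keeps only the summer 2018 files in names and drops any file whose name does not follow the assumed rule.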
