Past Paper Crawler

(Python 3.7 Ver.)

Features

Bulk Download

Papers are downloaded concurrently on multiple threads, which makes bulk downloads much faster.

Progress Bar

Keeps you updated on the download progress.
Lets you cancel the download when the internet connection is poor.

Failed Retry

Connection broken? No worries.
Retry in one click.

Download

Current build: 1.2


For Mac


For Windows

See Also

The Swift Version.
(macOS only)

Optimized for macOS.

Better GUI. Cache local files.

Made by Raymond Wu, SCIE.pro co-founder.


Learn More

Development Team

Teresa

Team Leader.

Responsible for the GUI section and paper filter.

John

Responsible for the multitasking download section.

Ethan

Responsible for the crawler section.

Peter

Responsible for the GUI design.

Technical Details

GUI

The main (level 1) thread.

A friendly graphical user interface that lets you filter papers by your requirements and download all of them in one click!

There are more functions for you to explore.

Modules Used:

  • Download Module, Crawler Module, and Filter Module — the project's own modules, described below
  • os — to join the file path
  • wx — the GUI

Download Module

Implemented as a level 2 thread that does the monitoring: it clears timed-out tasks and starts new tasks when there is room for them. It also communicates with the main thread through flags.

Tasks are generated as level 3 threads.

Modules Used:

  • threading — multitasking download
  • ssl — overriding the certificate settings
  • time — reading the system time, to clear timed-out tasks
  • urllib.request — to download the paper
  • urllib.error — to catch errors
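
The thread layout described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `fetch`, `DownloadMonitor`, `cancel_flag` and `done_flag` are hypothetical, the "flags" are assumed to be `threading.Event` objects, URLs are assumed to be unique, and "overriding the certificate settings" is assumed to mean turning certificate verification off.

```python
import ssl
import threading
import time
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Download one URL and return its bytes, or None on failure."""
    # Assumption: certificate checks are disabled. Only do this for hosts you trust.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        return None


class DownloadMonitor(threading.Thread):
    """Level 2 thread: starts level 3 worker threads and drops timed-out ones."""

    def __init__(self, urls, fetch_func=fetch, max_workers=4, task_timeout=30.0):
        super().__init__(daemon=True)
        self.pending = list(urls)
        self.fetch_func = fetch_func
        self.max_workers = max_workers
        self.task_timeout = task_timeout
        self.running = {}   # url -> start time of its worker thread
        self.results = {}   # url -> downloaded bytes
        self.failed = []    # urls that failed or timed out (candidates for retry)
        self.lock = threading.Lock()
        self.cancel_flag = threading.Event()  # set by the main (GUI) thread
        self.done_flag = threading.Event()    # set by this thread when it stops

    def _work(self, url):
        """Level 3 thread body: download one file and record the outcome."""
        data = self.fetch_func(url)
        with self.lock:
            if url not in self.running:
                return  # the monitor already gave up on this task
            del self.running[url]
            if data is None:
                self.failed.append(url)
            else:
                self.results[url] = data

    def run(self):
        while not self.cancel_flag.is_set():
            now = time.monotonic()
            with self.lock:
                # Clear tasks that have been running for too long.
                for url, started in list(self.running.items()):
                    if now - started > self.task_timeout:
                        del self.running[url]
                        self.failed.append(url)
                # Start new tasks while there is room.
                while self.pending and len(self.running) < self.max_workers:
                    url = self.pending.pop(0)
                    self.running[url] = now
                    threading.Thread(target=self._work, args=(url,), daemon=True).start()
                finished = not self.pending and not self.running
            if finished:
                break
            time.sleep(0.05)
        self.done_flag.set()
```

In this sketch the main thread would call `start()`, poll `len(monitor.results) + len(monitor.failed)` to drive a progress bar, set `cancel_flag` to stop early, and pass `monitor.failed` to a new monitor to retry.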

Crawler Module

The website crawled is GCEguide.com.

Two functions are implemented: one to access subjects and the other to access papers.

Modules Used:

  • requests – receive the response from a URL
  • BeautifulSoup (bs4) – extract information from the response
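
As a rough illustration of the "access papers" step, the sketch below pulls PDF links out of a page's HTML. It is not the project's code: to stay dependency-free it swaps in the standard library's `html.parser` where the project uses BeautifulSoup, it works on an HTML string rather than a live `requests` response, and the names `LinkCollector` and `extract_paper_links`, the sample markup, and the example.com URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_paper_links(html, base_url):
    """Return absolute URLs for every link on the page that ends in .pdf."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, h) for h in parser.hrefs if h.lower().endswith(".pdf")]


# Hypothetical listing page; the real site's markup may differ.
sample = (
    '<ul>'
    '<li><a href="9709_s19_qp_12.pdf">Question paper</a></li>'
    '<li><a href="../">Parent directory</a></li>'
    '</ul>'
)
links = extract_paper_links(sample, "https://example.com/papers/9709/")
```

A function for listing subjects could follow the same pattern with a different link test.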

Filter Module

Checks the year, season, paper number, and region number of each file by matching the file name against the general naming convention of past papers, so that papers can be filtered.

Modules Used:

  • re – match the correct format
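
The matching step might look something like the sketch below. It assumes file names of the form `9709_s19_qp_12.pdf` (subject code, season letter plus two-digit year, paper type, then paper number and an optional region digit) and assumes all years fall in the 2000s. The pattern and the names `PAPER_NAME`, `parse_paper_name` and `filter_papers` are illustrative, not taken from the project.

```python
import re

# Assumed naming rule: <subject>_<season><yy>_<type>_<paper>[<region>].pdf
PAPER_NAME = re.compile(
    r"^(?P<subject>\d{4})_(?P<season>[msw])(?P<year>\d{2})"
    r"_(?P<type>[a-z]{2})_(?P<paper>\d)(?P<region>\d)?\.pdf$"
)


def parse_paper_name(filename):
    """Split a file name into its fields, or return None if it does not match."""
    match = PAPER_NAME.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["year"] = 2000 + int(info["year"])  # assumption: two-digit years are 20xx
    return info


def filter_papers(filenames, year=None, season=None, paper=None, region=None):
    """Keep the file names whose fields equal every criterion that is not None."""
    wanted = {"year": year, "season": season, "paper": paper, "region": region}
    selected = []
    for name in filenames:
        info = parse_paper_name(name)
        if info is None:
            continue  # skip files that do not follow the naming rule
        if all(value is None or info[key] == value for key, value in wanted.items()):
            selected.append(name)
    return selected


names = ["9709_s19_qp_12.pdf", "9709_w19_ms_32.pdf", "9709_s18_qp_1.pdf", "notes.txt"]
summer_paper_1 = filter_papers(names, season="s", paper="1")
```

Here `year` is compared as an integer, while `season`, `paper` and `region` are compared as the strings captured from the file name.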