Python Automation Engineer Needed – Web Scraping + Data Extraction + Spreadsheet + Basecamp Calendar Integration (MVP 2–


TYPE OF WORK: Gig

WAGE / SALARY: 150-300

HOURS PER WEEK: TBD

DATE UPDATED: May 31, 2026

JOB OVERVIEW

---

# Job Title

**Python Automation Engineer Needed – Web Scraping + Data Extraction + Spreadsheet + Basecamp Calendar Integration (MVP 2–4 Days)**

---

# Project Overview

Looking for a Python developer to build an automated system that:

1. Scrapes structured data from multiple websites (HTML, JS-rendered, and PDF sources)
2. Extracts key information from legal notices / listings
3. Stores the data in a structured spreadsheet
4. Automatically creates scheduled events in Basecamp calendar

This is a **fast MVP build (2–4 days)** focused on working automation, not enterprise-level architecture.

---

# Core Objective

Build an end-to-end automation pipeline that:

* Collects data from multiple web sources
* Converts unstructured legal notice content into structured records
* Writes cleaned data into a spreadsheet (Google Sheets or CSV)
* Pushes relevant entries into Basecamp as calendar events or to-dos
* Runs automatically on a daily schedule

---

# Required Features

## 1. Web Scraping Engine

* Python-based implementation
* Must use **Playwright** for JavaScript-heavy sites
* Support:
  * Login/session handling where required
  * Pagination and infinite scroll
  * Modular per-source scraper design
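
One way to read "modular per-source scraper design" is a small registry in which each source is its own class. The sketch below is only an illustration under that assumption: the names `SCRAPERS`, `register`, `BaseScraper`, `ExampleCountyScraper`, and the `example.com` URL are hypothetical and not part of the posting. Playwright is imported lazily inside `fetch_rendered_html` so the module loads even where Playwright is not installed.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type

# Registry mapping a source name to its scraper class (hypothetical design).
SCRAPERS: Dict[str, Type["BaseScraper"]] = {}


def register(name: str):
    """Class decorator that adds a scraper class to the registry under `name`."""
    def wrap(cls):
        SCRAPERS[name] = cls
        return cls
    return wrap


def fetch_rendered_html(url: str) -> str:
    """Return the HTML of a JS-rendered page using Playwright's sync API."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
    return html


class BaseScraper(ABC):
    """One subclass per source; each returns a list of raw record dicts."""

    @abstractmethod
    def scrape(self) -> List[dict]:
        ...


@register("example_county")
class ExampleCountyScraper(BaseScraper):
    # Placeholder URL; a real source would go here.
    url = "https://example.com/notices"

    def scrape(self) -> List[dict]:
        # A real implementation would call fetch_rendered_html(self.url)
        # and parse the result; static data keeps this sketch offline.
        return [{"title": "Notice of Sale", "source_url": self.url}]


def run_all() -> List[dict]:
    """Run every registered scraper and tag each record with its source name."""
    records = []
    for name, cls in SCRAPERS.items():
        for rec in cls().scrape():
            records.append({**rec, "source": name})
    return records
```

Adding a new source then means adding one decorated class, without touching `run_all`.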

---

## 2. Data Extraction Layer

From each notice/listing, extract where available:

* Title / notice type
* Property or listing name
* Address or location
* Auction or event date/time
* Case or reference number
* Parties involved (if present)
* Source URL
* Raw text backup
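
The field list above could map onto a single record type with every field optional, since the posting says "where available". This is a minimal sketch: the `NoticeRecord` name and the two regular expressions are illustrative assumptions, and real notices would need source-specific patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoticeRecord:
    """Unified record; fields mirror the list above and default to None."""
    title: Optional[str] = None
    property_name: Optional[str] = None
    address: Optional[str] = None
    event_datetime: Optional[str] = None
    case_number: Optional[str] = None
    parties: Optional[str] = None
    source_url: Optional[str] = None
    raw_text: str = ""


# Illustrative patterns only; actual legal notices vary widely by source.
CASE_RE = re.compile(r"Case\s+No\.?\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")


def extract(raw_text: str, source_url: str) -> NoticeRecord:
    """Pull whichever fields the patterns can find; leave the rest as None."""
    stripped = raw_text.strip()
    case = CASE_RE.search(raw_text)
    date = DATE_RE.search(raw_text)
    return NoticeRecord(
        title=stripped.splitlines()[0] if stripped else None,
        event_datetime=date.group(1) if date else None,
        case_number=case.group(1) if case else None,
        source_url=source_url,
        raw_text=raw_text,  # raw text backup, kept verbatim
    )
```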

---

## 3. Data Normalization + Spreadsheet Output

* Convert all scraped data into a unified schema
* Write to:
  * Google Sheets (preferred), or
  * CSV file

Spreadsheet must include structured rows per record.
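
For the CSV option, the unified schema can be enforced at write time. The sketch below uses only the standard library; the column names in `FIELDS` are an assumed schema derived from the extraction list, not one specified in the posting.

```python
import csv
import io
from typing import Iterable, TextIO

# Assumed unified schema: one column per extracted field.
FIELDS = [
    "title", "property_name", "address", "event_datetime",
    "case_number", "parties", "source_url", "raw_text",
]


def normalize(record: dict) -> dict:
    """Map a raw record onto the schema; missing fields become ''."""
    return {f: str(record.get(f) or "").strip() for f in FIELDS}


def write_csv(records: Iterable[dict], out: TextIO) -> int:
    """Write a header plus one row per record; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for rec in records:
        writer.writerow(normalize(rec))
        count += 1
    return count


# Usage with an in-memory buffer; a file opened with newline="" works the same.
buffer = io.StringIO()
write_csv([{"title": " Notice of Sale ", "case_number": "24-CV-0123"}], buffer)
```

Because `normalize` drops keys outside `FIELDS`, source-specific extras never break the writer.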

---

## 4. Basecamp Integration (IMPORTANT)

For each valid record:

* Create a **Basecamp calendar event or to-do item**
* Include:
  * Title (cleaned notice title or property name)
  * Date/time (auction or event date)
  * Description (key extracted details)
  * Link back to source

Must use **Basecamp API** (authentication via token)
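
As a rough illustration of the calendar-event path, the sketch below builds (but does not send) an HTTP request for a schedule entry. It assumes the Basecamp 3 style endpoint `/buckets/{bucket_id}/schedules/{schedule_id}/entries.json` with `summary`, `starts_at`, `ends_at`, and `description` fields and a bearer token; these details should be checked against the official Basecamp API documentation. The function name, the record keys, and the User-Agent string are hypothetical.

```python
import html
import json
import urllib.request


def build_schedule_entry_request(account_id: str, bucket_id: str,
                                 schedule_id: str, token: str,
                                 record: dict) -> urllib.request.Request:
    """Build a POST request for one calendar entry (assumed endpoint shape)."""
    url = (f"https://3.basecampapi.com/{account_id}/buckets/{bucket_id}"
           f"/schedules/{schedule_id}/entries.json")
    # Escape extracted text so stray markup in a notice cannot break the HTML.
    details = html.escape(record.get("details", ""))
    link = html.escape(record["source_url"], quote=True)
    payload = {
        "summary": record["title"],
        "starts_at": record["starts_at"],  # ISO 8601 timestamps
        "ends_at": record["ends_at"],
        "description": f'<div>{details}<br><a href="{link}">Source</a></div>',
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Placeholder identity; replace with real app name and contact.
            "User-Agent": "NoticePipeline (ops@example.com)",
        },
        method="POST",
    )
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and handling non-2xx responses with a retry.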

---

## 5. PDF Parsing Support

* Extract data from PDF notices
* Convert unstructured text into structured rows
* Tools: pdfplumber or PyMuPDF
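
A possible shape for the PDF step, using pdfplumber from the tool list above: extract the text of every page, then split it into individual notices. The splitting rule (each notice begins with a line starting `NOTICE OF`) is purely an assumption for illustration; real PDFs would need their own delimiter.

```python
import re
from typing import List


def pdf_to_text(path: str) -> str:
    """Concatenate the text of every page in the PDF at `path`."""
    import pdfplumber  # lazy import so the splitter below works without it

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages with no text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def split_notices(text: str) -> List[str]:
    """Split text into notices, assuming each begins with 'NOTICE OF'."""
    parts = re.split(r"(?m)^(?=NOTICE OF\b)", text)
    return [p.strip() for p in parts if p.strip()]
```

Each chunk returned by `split_notices` can then be passed to the same field extraction used for HTML sources.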

---

## 6. Deduplication

* Prevent duplicate entries across all sources
* Use hash/composite key (address + date + case number or similar)
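
The composite-key idea can be sketched as a SHA-256 hash over normalized address, date, and case number. The dictionary keys and the normalization (lowercasing, collapsing whitespace) are assumptions; a production version would likely persist `seen` between daily runs.

```python
import hashlib
from typing import Iterable, List, Optional, Set


def record_key(record: dict) -> str:
    """Hash address + date + case number after light normalization."""
    parts = [
        record.get("address"),
        record.get("event_datetime"),
        record.get("case_number"),
    ]
    # Lowercase and collapse whitespace so trivial differences do not matter.
    canonical = "|".join(" ".join(str(p or "").lower().split()) for p in parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records: Iterable[dict],
                seen: Optional[Set[str]] = None) -> List[dict]:
    """Return records whose key is not already in `seen`, updating `seen`."""
    seen = set() if seen is None else seen
    unique = []
    for rec in records:
        key = record_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```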

---

## 7. Automation / Scheduling

* System must run daily automatically
* Can use:
  * cron job, or
  * Python scheduler script
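
For the "Python scheduler script" option, a minimal standard-library loop could look like the sketch below. The 06:00 run time, the function names, and the script path in the cron comment are hypothetical.

```python
from datetime import datetime, timedelta, time as dtime
from typing import Callable

# Equivalent cron entry (hypothetical path), running daily at 06:00:
#   0 6 * * * /usr/bin/python3 /path/to/pipeline.py


def seconds_until_next_run(now: datetime, run_at: dtime) -> float:
    """Seconds from `now` until the next occurrence of `run_at`."""
    target = datetime.combine(now.date(), run_at)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow
    return (target - now).total_seconds()


def run_daily(job: Callable[[], None], run_at: dtime = dtime(6, 0)) -> None:
    """Sleep until `run_at` each day, then call `job`. Runs forever."""
    import time

    while True:
        time.sleep(seconds_until_next_run(datetime.now(), run_at))
        job()
```

Cron is usually the simpler choice on a server, since it survives process crashes; the loop is convenient when the pipeline runs inside a single long-lived container.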

---

# Optional Enhancements (if time allows)

* LLM-based extraction to clean legal text into structured fields
* Logging system for failures/retries
* Docker containerization
* Simple admin config file for adding new sources

---

# Tech Stack

Required:

* Python
* Playwright
* BeautifulSoup
* pdfplumber / PyMuPDF
* Pandas

Integration:

* Google Sheets API (or CSV fallback)
* Basecamp API (mandatory)

Optional:

* Cron / scheduling tools

---

# Timeline

This is a **fast MVP build (2–4 days)**:

* Day 1: Core scraping framework + 1–2 sources
* Day 2: Expand sources + spreadsheet output
* Day 3: Basecamp integration + PDF parsing
* Day 4: Testing, cleanup, automation

---

# Key Challenges

* Mixed data formats (HTML, JS, PDFs)
* Some sites may require authentication
* Extracting consistent structured data from legal text
* Reliable Basecamp API event creation
* Avoiding duplicates across multiple sources

---

# Deliverables

* Working Python project
* Modular scraping system
* Spreadsheet integration (Google Sheets or CSV)
* Basecamp automation (calendar/tasks creation)
* PDF parsing module
* Setup instructions

---

# Ideal Candidate

* Strong Python automation experience
* Expert in Playwright scraping
* Experience with APIs (especially Basecamp or similar project tools)
* Comfortable handling messy/unstructured data
* Able to deliver quickly with minimal supervision

---

# Budget Guidance (global contractors)

* MVP range: **$ ---------- total**
* Higher end only if:
  * Basecamp integration is fully working
  * Multiple dynamic sources are stable
  * PDF extraction is reliable

---

# Application Requirements

Applicants must include:

* Relevant scraping + automation experience
* API integration examples (especially task/calendar systems)
* Confirmation of 2–4 day delivery capability
* Brief approach to handling scraping + scheduling pipeline

SKILL REQUIREMENT

HTML, Python, JSON
