Immediate Hire: Multipage Web Scraper / Web Crawler

TYPE OF WORK

Gig

SALARY

TBD

HOURS PER WEEK

TBD

DATE UPDATED

May 7, 2025

JOB OVERVIEW

A U.S.-based company is seeking an experienced programmer to develop a multipage web scraper that efficiently extracts data from a complex U.S. government website. The scraper should also be able to use the scraped data to interact with a related government site.

The job requires scraping four to five interconnected websites, two of which are U.S. government websites. When the scraper starts, we need to be able to manually solve a captcha, pick a document type, and enter a date range.
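As a rough illustration of the start-up step described above, the sketch below collects the operator's inputs before any scraping begins. It is only a sketch under assumptions: the names `ScrapeRequest`, `build_request`, `wait_for_captcha`, and `collect_request` are hypothetical, the ISO date format is an assumed convention, and the actual sites, document types, and browser automation are not specified in this post.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class ScrapeRequest:
    """Inputs the operator supplies before a run (hypothetical structure)."""
    document_type: str
    start_date: date
    end_date: date


def build_request(document_type: str, start: str, end: str) -> ScrapeRequest:
    """Validate operator input; dates are assumed to be ISO strings (YYYY-MM-DD)."""
    doc = document_type.strip()
    if not doc:
        raise ValueError("document type must not be empty")
    start_date = date.fromisoformat(start.strip())
    end_date = date.fromisoformat(end.strip())
    if start_date > end_date:
        raise ValueError("start date must not be after end date")
    return ScrapeRequest(doc, start_date, end_date)


def wait_for_captcha(prompt: Callable[[str], str] = input) -> None:
    """Pause until the operator confirms the captcha was solved by hand."""
    prompt("Solve the captcha in the browser window, then press Enter: ")


def collect_request(prompt: Callable[[str], str] = input) -> ScrapeRequest:
    """Run the manual start-up steps in order: captcha, document type, dates."""
    wait_for_captcha(prompt)
    document_type = prompt("Document type: ")
    start = prompt("Start date (YYYY-MM-DD): ")
    end = prompt("End date (YYYY-MM-DD): ")
    return build_request(document_type, start, end)
```

Calling `collect_request()` with no arguments reads from the terminal. Passing a different `prompt` callable keeps the manual step replaceable, for example by a small GUI dialog.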

The ideal candidate will have a strong understanding of web scraping techniques and experience in handling challenging data structures. This project requires attention to detail and the ability to work with dynamic content.

Before hiring, we will require you to demonstrate that you can scrape the government website at a basic level.
