The latest version of WebHarvy Web Scraper supports two new pagination styles for scraping data from multiple pages of a website.
Pages where pagination links are shown in sets
On these pages, the pagination links are provided in sets. For example, the bottom of the page has direct links to each of the first 5 pages. To load pages 6 to 10, an additional link must be clicked. Each of pages 6 to 10 then has direct links to the other pages in that set at the bottom of the page, along with a link to load the next set of 5 pages.
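To make the set structure concrete, here is a minimal sketch of the arithmetic behind it. It assumes sets of exactly 5 pages, as in the example above, and the function names are hypothetical. It only illustrates how the links are grouped, not how WebHarvy handles them internally.

```python
def pages_in_set(page, set_size=5):
    """Return the page numbers that have direct links while `page` is shown."""
    # First page of the set that contains `page` (sets are 1-5, 6-10, ...).
    first = ((page - 1) // set_size) * set_size + 1
    return list(range(first, first + set_size))


def next_set_clicks(target_page, set_size=5):
    """Count the 'next set' clicks needed to expose `target_page` from page 1."""
    return (target_page - 1) // set_size


print(pages_in_set(3))      # [1, 2, 3, 4, 5]
print(pages_in_set(7))      # [6, 7, 8, 9, 10]
print(next_set_clicks(12))  # 2
```

Page 12, for instance, becomes directly reachable only after the link to the next set has been followed twice.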
WebHarvy Online Help : Scraping pages where pagination links are displayed in sets
The following video demonstrates how these types of pages can be configured and mined using WebHarvy.
When each page URL contains the page number
Suppose the pages you need to scrape listings from have URLs in the following format.
http://www.example.com/search/listing?keywords&pageNumber=1
http://www.example.com/search/listing?keywords&pageNumber=2
http://www.example.com/search/listing?keywords&pageNumber=3
http://www.example.com/search/listing?keywords&pageNumber=4
etc.
Pagination in this case can be handled easily by following the steps below:
1. Open WebHarvy and load http://www.example.com/search/listing?keywords&pageNumber=1.
2. Start Config
3. Select the required data from the page. Follow links and select data from the linked pages if required.
4. Select Edit menu > Edit Options > Add/Remove URLs from Configuration
5. Paste the following URL and click Apply.
http://www.example.com/search/listing?keywords&pageNumber=%%pagenumber%%
Note that the actual page number is replaced by %%pagenumber%% in the above string.
6. Stop Config
7. Start Mine. Specify the number of pages to mine, since the ‘Mine all pages’ option is disabled in this mode. WebHarvy will automatically load each subsequent page and extract data.
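The effect of the %%pagenumber%% placeholder can be pictured with a short sketch. It is only an illustration of the substitution, not WebHarvy's actual implementation. The function name expand_page_urls and the choice to start counting at page 1 are assumptions made for this example.

```python
PLACEHOLDER = "%%pagenumber%%"
TEMPLATE = "http://www.example.com/search/listing?keywords&pageNumber=%%pagenumber%%"


def expand_page_urls(template, page_count, start=1):
    """Build one URL per page by substituting the page number for the placeholder."""
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain " + PLACEHOLDER)
    # Assumption: pages are numbered consecutively, beginning at `start`.
    return [
        template.replace(PLACEHOLDER, str(number))
        for number in range(start, start + page_count)
    ]


for url in expand_page_urls(TEMPLATE, 4):
    print(url)
```

With a page count of 4, this prints the four example URLs listed earlier. It also shows why the number of pages has to be given explicitly: the template alone says nothing about where the listing ends.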
WebHarvy Online Help : URL page-number based auto pagination
The latest version of WebHarvy Visual Web Scraper can be downloaded from https://www.webharvy.com/download.html. Try it out, and if you need any assistance, please do not hesitate to contact our support team.
