# An opinionated web-scraping library for PHP
by Peter Thaleikis
Accessing the web from PHP can be done more easily. This is an opinionated wrapper around some great libraries.
The examples tell the story much better. Have a look!
# The Idea 💡️
Accessing websites and collecting basic information from the web is too complex. This wrapper around Goutte makes it easier. It saves you from XPath and co., giving you direct access to everything you need.
# Supporters 💪️
This project is sponsored by:
Want to sponsor this project? Contact me.
# Examples
Here are some examples of what the library can do at this point:
# Scrape Meta Information:
$web = new \spekulatius\phpscraper();
/**
 * Navigate to the test page. It contains:
 *
 * <meta name="author" content="Lorem ipsum" />
 * <meta name="keywords" content="Lorem,ipsum,dolor" />
 * <meta name="description" content="Lorem ipsum dolor etc." />
 * <meta name="image" content="https://test-pages.phpscraper.de/assets/cat.jpg" />
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');
// Get the information:
echo $web->author; // "Lorem ipsum"
echo $web->description; // "Lorem ipsum dolor etc."
echo $web->image; // "https://test-pages.phpscraper.de/assets/cat.jpg"
Most other information can be accessed directly, either as a string or as an array.
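To illustrate what the wrapper saves you from, here is a rough sketch of collecting the same meta tags by hand with PHP's bundled DOM extension. It is not how the library works internally: `readMetaTags()` is a made-up helper for this illustration, and the HTML is an inline copy of the tags from the test page, so no network access is needed.

```php
<?php
// Illustration only: collecting <meta> tags by hand with DOMDocument and DOMXPath.
// readMetaTags() is a hypothetical helper, not part of the library.

function readMetaTags(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $meta = [];

    // Every <meta> element that carries both a name and a content attribute.
    foreach ($xpath->query('//meta[@name][@content]') as $node) {
        $meta[$node->getAttribute('name')] = $node->getAttribute('content');
    }

    return $meta;
}

$html = <<<'HTML'
<html><head>
<meta name="author" content="Lorem ipsum" />
<meta name="keywords" content="Lorem,ipsum,dolor" />
<meta name="description" content="Lorem ipsum dolor etc." />
</head><body></body></html>
HTML;

$meta = readMetaTags($html);

// A single value as a string ...
$author = $meta['author'] ?? null;

// ... and a comma-separated value split into an array.
$keywords = array_map('trim', explode(',', $meta['keywords'] ?? ''));

echo $author, PHP_EOL;                   // Lorem ipsum
echo implode(' | ', $keywords), PHP_EOL; // Lorem | ipsum | dolor
```

With the wrapper, all of this collapses into property access such as `$web->author`.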
# Scrape Content, such as Images:
$web = new \spekulatius\phpscraper();
/**
 * Navigate to the test page. This page contains two images:
 *
 * <img src="https://test-pages.phpscraper.de/assets/cat.jpg" alt="absolute path">
 * <img src="/assets/cat.jpg" alt="relative path">
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');
var_dump($web->imagesWithDetails);
/**
 * [
 *     'url' => 'https://test-pages.phpscraper.de/assets/cat.jpg',
 *     'alt' => 'absolute path',
 *     'width' => null,
 *     'height' => null,
 * ],
 * [
 *     'url' => 'https://test-pages.phpscraper.de/assets/cat.jpg',
 *     'alt' => 'relative path',
 *     'width' => null,
 *     'height' => null,
 * ]
 */
Some information can optionally be returned as an array with details. In this example, a simple list of the images is also available via $web->images.
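In the output above, the image with the relative path comes back as an absolute URL. A minimal sketch of that kind of resolution could look like the following. `toAbsoluteUrl()` is a hypothetical helper written for this illustration, not the library's implementation, and it only handles absolute URLs and root-relative paths (no `../` segments, ports, or query strings).

```php
<?php
// Illustration only: turning an image path into an absolute URL.
// toAbsoluteUrl() is a hypothetical helper, not the library's implementation.
// It handles absolute URLs and root-relative paths ("/assets/cat.jpg") only.

function toAbsoluteUrl(string $path, string $pageUrl): string
{
    // Already absolute: the path carries its own scheme.
    if (parse_url($path, PHP_URL_SCHEME) !== null) {
        return $path;
    }

    $scheme = parse_url($pageUrl, PHP_URL_SCHEME);
    $host = parse_url($pageUrl, PHP_URL_HOST);

    // Root-relative: keep the scheme and host of the page, replace the path.
    return $scheme . '://' . $host . '/' . ltrim($path, '/');
}

$page = 'https://test-pages.phpscraper.de/meta/lorem-ipsum.html';

echo toAbsoluteUrl('https://test-pages.phpscraper.de/assets/cat.jpg', $page), PHP_EOL;
echo toAbsoluteUrl('/assets/cat.jpg', $page), PHP_EOL;
// Both lines print https://test-pages.phpscraper.de/assets/cat.jpg
```

The wrapper does this for you, so both entries can be used directly without checking which form the page author chose.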
More example code can be found in the sidebar or the tests.
# Installation
As usual, installation is done via Composer:
composer require spekulatius/phpscraper
Composer's autoloader then takes care of loading the package; in a plain PHP script, include vendor/autoload.php first. You can now use any of the examples above.
# Contributing
Awesome! If you would like to contribute, please check the guidelines before getting started.
# Tests
The code is roughly covered with end-to-end tests. For this, simple web pages are hosted under https://test-pages.phpscraper.de/, then loaded and parsed in tests run with PHPUnit. These tests also serve as examples - see the tests/ directory.
That said, there are probably edge cases that don't work yet and may cause trouble. If you find one, please raise a bug on GitHub.