# An opinionated web-access library for PHP

by Peter Thaleikis

Accessing the web from PHP can be done more easily. This is an opinionated wrapper around some great libraries.

The examples tell the story much better. Have a look!

# Sponsors

This project is sponsored by:

Want to sponsor this project? Contact me.

# Idea

Accessing websites and collecting basic information from the web is too complex. This wrapper around Goutte makes it easier. It saves you from XPath and co., giving you direct access to everything you need.

# Examples

Here are some examples of what the library can do at this point:

# Scrape Meta Information:

```php
$web = new \spekulatius\phpscraper();

/**
 * Navigate to the test page. It contains:
 *
 * <meta name="author" content="Lorem ipsum" />
 * <meta name="keywords" content="Lorem,ipsum,dolor" />
 * <meta name="description" content="Lorem ipsum dolor etc." />
 * <meta name="image" content="https://test-pages.phpscraper.de/assets/cat.jpg" />
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

// Get the information:
echo $web->author;          // "Lorem ipsum"
echo $web->description;     // "Lorem ipsum dolor etc."
echo $web->image;           // "https://test-pages.phpscraper.de/assets/cat.jpg"
```

Most other information can be accessed the same way, either as a string or as an array.
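For comparison, here is roughly what reading those meta tags looks like in plain PHP, using the built-in DOM extension (`DOMDocument`, `DOMXPath`) instead of this library. It is an illustrative sketch of the XPath work the wrapper saves you from, not PHPScraper's implementation; the helper name `extractMetaTags` is hypothetical.

```php
<?php

// Hypothetical helper, NOT PHPScraper's implementation: collects every
// <meta name="..." content="..."> tag into an array keyed by name.
function extractMetaTags(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings about non-standard markup found on real-world pages.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $meta = [];
    foreach ($xpath->query('//meta[@name][@content]') as $node) {
        // Key by the lower-cased name attribute, e.g. "author".
        $meta[strtolower($node->getAttribute('name'))] = $node->getAttribute('content');
    }
    return $meta;
}

$html = <<<HTML
<html><head>
<meta name="author" content="Lorem ipsum" />
<meta name="keywords" content="Lorem,ipsum,dolor" />
<meta name="description" content="Lorem ipsum dolor etc." />
</head><body></body></html>
HTML;

$meta = extractMetaTags($html);
echo $meta['author'], PHP_EOL;            // Lorem ipsum
print_r(explode(',', $meta['keywords'])); // keywords split into an array
```

Even this small case needs a parser, error suppression and an XPath query; the library reduces it to a property access.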

# Scrape Content, such as Images:

```php
$web = new \spekulatius\phpscraper();

/**
 * Navigate to the test page. This page contains two images:
 *
 * <img src="https://test-pages.phpscraper.de/assets/cat.jpg" alt="absolute path">
 * <img src="/assets/cat.jpg" alt="relative path">
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

var_dump($web->imagesWithDetails);
/**
 * [
 *     [
 *         'url' => 'https://test-pages.phpscraper.de/assets/cat.jpg',
 *         'alt' => 'absolute path',
 *         'width' => null,
 *         'height' => null,
 *     ],
 *     [
 *         'url' => 'https://test-pages.phpscraper.de/assets/cat.jpg',
 *         'alt' => 'relative path',
 *         'width' => null,
 *         'height' => null,
 *     ],
 * ]
 */
```

Some information is optionally returned as an array with details. For this example, a simple list of images is also available via `$web->images`.
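Note that the relative path `/assets/cat.jpg` comes back as an absolute URL. A minimal sketch of that idea in plain PHP, using only the DOM extension and not PHPScraper's actual code, could look like this. `imagesWithDetails()` and `resolveUrl()` are hypothetical helpers, and only root-relative paths are resolved; other relative forms are returned unchanged.

```php
<?php

// Hypothetical helper: turns a root-relative src such as "/assets/cat.jpg"
// into an absolute URL using the scheme, host and port of $baseUrl.
// Absolute URLs and other relative forms ("../x.jpg", "//cdn/x.jpg") are returned unchanged.
function resolveUrl(string $src, string $baseUrl): string
{
    if (substr($src, 0, 1) === '/' && substr($src, 0, 2) !== '//') {
        $parts = parse_url($baseUrl);
        $port = isset($parts['port']) ? ':' . $parts['port'] : '';
        return $parts['scheme'] . '://' . $parts['host'] . $port . $src;
    }
    return $src;
}

// Hypothetical helper: collects url, alt, width and height for every <img>.
function imagesWithDetails(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate messy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = [
            'url' => resolveUrl($img->getAttribute('src'), $baseUrl),
            'alt' => $img->getAttribute('alt'),
            // Missing attributes become null instead of an empty string.
            'width' => $img->hasAttribute('width') ? $img->getAttribute('width') : null,
            'height' => $img->hasAttribute('height') ? $img->getAttribute('height') : null,
        ];
    }
    return $images;
}

$html = '<img src="https://test-pages.phpscraper.de/assets/cat.jpg" alt="absolute path">'
    . '<img src="/assets/cat.jpg" alt="relative path">';
$baseUrl = 'https://test-pages.phpscraper.de/meta/lorem-ipsum.html';

foreach (imagesWithDetails($html, $baseUrl) as $image) {
    // Both URLs come out as https://test-pages.phpscraper.de/assets/cat.jpg
    echo $image['url'], ' (', $image['alt'], ')', PHP_EOL;
}
```

A complete resolver would also handle path-relative (`../cat.jpg`) and protocol-relative (`//cdn.example.com/cat.jpg`) sources, plus any `<base href>` tag on the page.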

More example code can be found in the sidebar or the tests.

# Installation

As usual, installation is done via Composer:

```bash
composer require spekulatius/phpscraper
```

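Composer's autoloader then makes the package available. In a plain PHP script, include `vendor/autoload.php` first; most frameworks already do this for you. You can now use any of the examples above.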

# Contributing

Awesome! If you would like to contribute, please check the guidelines before getting started.

# Tests

The code is roughly covered by end-to-end tests. For this, simple web pages are hosted under https://test-pages.phpscraper.de/, then loaded and parsed in PHPUnit tests. These tests also work as examples; see `tests/`.

That being said, there are probably edge cases that don't work yet and may cause trouble. If you find one, please open an issue on GitHub.