# Scrape Links

The scraping of links works very similar to image scraping. You can retrieve a list of URL without any additional information as well as a detailed list containing rel, target as well as other attributes.

The following example parses a web-page for any links and returns an array of absolute URLs:

$web = new \Spekulatius\PHPScraper\PHPScraper;

/**
 * Navigate to the test page. It contains 6 links to placekitten.com with different attributes:
 *
 * <h2>Different ways to wrap the attributes</h2>
 * <p><a href="https://placekitten.com/408/287" target=_blank>external kitten</a></p>
 * <p><a href="https://placekitten.com/444/333" target="_blank">external kitten</a></p>
 * <p><a href="https://placekitten.com/444/321" target='_blank'>external kitten</a></p>
 *
 * <h2>Named frame/window/tab</h2>
 * <p><a href="https://placekitten.com/408/287" target=kitten>external kitten</a></p>
 * <p><a href="https://placekitten.com/444/333" target="kitten">external kitten</a></p>
 * <p><a href="https://placekitten.com/444/321" target='kitten'>external kitten</a></p>
 */
$web->go('https://test-pages.phpscraper.de/links/target.html');

// Print the number of links.
echo "This page contains " . count($web->links) . " links.\n\n";

// Loop through the links
foreach ($web->links as $link) {
    echo " - " . $link . "\n";
}

/**
 * Combined this will print out:
 *
 * This page contains 6 links.
 *
 * - https://placekitten.com/408/287
 * - https://placekitten.com/444/333
 * - https://placekitten.com/444/321
 * - https://placekitten.com/408/287
 * - https://placekitten.com/444/333
 * - https://placekitten.com/444/321
 */

If the page shouldn't contain any links, an empty array is returned.

If you are in need of more details you can access these in a similar way as on the images. Below is an example to access the detailed data of the first link on the page:

$web = new \Spekulatius\PHPScraper\PHPScraper;

/**
 * Navigate to the test page. This page contains several links with different rel attributes. To save space only the first one:
 *
 * <a href="https://placekitten.com/432/287" rel="nofollow">external kitten</a>
 */
$web->go('https://test-pages.phpscraper.de/links/rel.html');

// Get the first link on the page.
$firstLink = $web->linksWithDetails[0];

/**
 * $firstLink contains now:
 *
 * [
 *     'url' => 'https://placekitten.com/432/287',
 *     'protocol' => 'https',
 *     'text' => 'external kitten',
 *     'title' => null,
 *     'target' => null,
 *     'rel' => 'nofollow',
 *     'isNofollow' => true,
 *     'isUGC' => false,
 *     'isNoopener' => false,
 *     'isNoreferrer' => false,
 * ]
 */

If you require more data, you will either need to extend the library or submit an issue for consideration.

PHPScraper allows to return only internal or external links. The following demonstrates both:

$web = new \Spekulatius\PHPScraper\PHPScraper;

// Navigate to the test page.
$web->go('https://test-pages.phpscraper.de/links/base-href.html');

// Get the list of internal links (in the example an image is linked)
var_dump($web->internalLinks);
/**
 * [
 *     'https://test-pages.phpscraper.de/assets/cat.jpg'
 * ]
 */

// Get the list of external links
var_dump($web->externalLinks);
/**
 * [
 *     'https://placekitten.com/408/287'
 * ]
 */