# Scrape Header Tags

Tags in the <head> of a page often carry useful information about the page and how it fits into the overall structure of its website. The following examples show how to access individual pieces of this information and how to retrieve them together as one collection.

# Charset

To access the defined charset, you can use the following method:

$web = new \Spekulatius\PHPScraper\PHPScraper;

/**
 * Navigate to the test page. It contains:
 *
 * <meta charset="utf-8" />
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

// Print the charset
echo $web->charset;     // "utf-8"

# Viewport

In some cases, such as the viewport and the meta keywords, the string represents a list of values and is returned as an array:

$web = new \Spekulatius\PHPScraper\PHPScraper;

/**
 * Navigate to the test page. It contains:
 *
 * <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no, maximum-scale=1, user-scalable=no" />
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

/**
 * Returns the viewport as an array. For the example it contains:
 *
 * [
 *     'width=device-width',
 *     'initial-scale=1',
 *     'shrink-to-fit=no',
 *     'maximum-scale=1',
 *     'user-scalable=no'
 * ],
 */
var_dump($web->viewport);

If you need the original "viewport" string, use viewportString:

$web = new \Spekulatius\PHPScraper\PHPScraper;
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

/**
 * Returns the viewport as a string. It prints:
 *
 * "width=device-width, initial-scale=1, shrink-to-fit=no, maximum-scale=1, user-scalable=no"
 */
echo $web->viewportString;
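
Each entry of the viewport array is still a "key=value" string. If you need to look up a single setting, a minimal sketch such as the following can turn the array into an associative array. The helper `viewportToMap` is hypothetical and not part of PHPScraper; the input below is copied from the array shown above rather than fetched from the test page.

```php
<?php

// Hypothetical helper, not part of PHPScraper: turns entries such as
// 'width=device-width' into ['width' => 'device-width'].
function viewportToMap(array $parts): array
{
    $map = [];
    foreach ($parts as $part) {
        // Split on the first '=' only; entries without '=' get an empty value.
        [$key, $value] = array_pad(explode('=', $part, 2), 2, '');
        $map[trim($key)] = trim($value);
    }

    return $map;
}

// The array documented above for the test page.
$viewport = [
    'width=device-width',
    'initial-scale=1',
    'shrink-to-fit=no',
    'maximum-scale=1',
    'user-scalable=no',
];

$map = viewportToMap($viewport);

echo $map['initial-scale'];   // "1"
```

With the scraper, you would pass `$web->viewport` to the helper instead of the hard-coded array.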

# Canonical URL

The canonical URL, if given, can be accessed as shown in the example below:

$web = new \Spekulatius\PHPScraper\PHPScraper;

/**
 * Navigate to the test page. It contains:
 *
 * <link rel="canonical" href="https://test-pages.phpscraper.de/navigation/2.html" />
 */
$web->go('https://test-pages.phpscraper.de/navigation/1.html');

// Print the canonical URL:
echo $web->canonical;       // "https://test-pages.phpscraper.de/navigation/2.html"

TIP

If no canonical link is set, the method returns null.
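
Because the value can be null, it is worth handling the missing case before using it. The following is a minimal sketch, assuming you want to fall back to the URL you requested; the helper `canonicalOr` is hypothetical and not part of PHPScraper.

```php
<?php

// Hypothetical helper, not part of PHPScraper: returns the canonical URL
// if one was found, otherwise the URL that was requested.
function canonicalOr(?string $canonical, string $requestedUrl): string
{
    return $canonical ?? $requestedUrl;
}

// A page without a canonical link: the requested URL is used.
echo canonicalOr(null, 'https://test-pages.phpscraper.de/meta/lorem-ipsum.html'), PHP_EOL;

// A page with a canonical link: the canonical URL wins.
echo canonicalOr(
    'https://test-pages.phpscraper.de/navigation/2.html',
    'https://test-pages.phpscraper.de/navigation/1.html'
), PHP_EOL;
```

With the scraper, the first argument would be `$web->canonical` and the second the URL passed to `go()`.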

# Content-Type

To access the content type, use contentType:

$web = new \Spekulatius\PHPScraper\PHPScraper;

/**
 * Navigate to the test page. It contains:
 *
 * <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

// Print the contentType:
echo $web->contentType;     // "text/html; charset=utf-8"

# CSRF Token

The CSRF token method assumes that the token is stored in a meta tag named "csrf-token". This is the default for Laravel. You can access it using the following code:

$web = new \Spekulatius\PHPScraper\PHPScraper;

/**
 * Navigate to the test page. It contains:
 *
 * <meta name="csrf-token" content="token" />
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

// Print the CSRF token:
echo $web->csrfToken;     // "token"

# Combined Header Tags

You can use the headers method if you want the results of all of the methods above in a single array. It is defined as:

/**
 * @return array
 */
public function headers()
{
    return [
        'charset' => $this->charset(),
        'contentType' => $this->contentType(),
        'viewport' => $this->viewport(),
        'canonical' => $this->canonical(),
        'csrfToken' => $this->csrfToken(),
    ];
}
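
The returned array mixes strings, an array (the viewport) and possibly null values, so code that consumes it needs to handle all three. The following is a minimal sketch of one way to do so; the helper `formatHeaders` is hypothetical, and the sample data is assembled from the values shown in the sections above instead of being fetched. The null canonical value is an assumption made for illustration.

```php
<?php

// Hypothetical helper, not part of PHPScraper: formats the array shape
// returned by headers() as one "name: value" line per entry.
function formatHeaders(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        // The viewport is an array; join it back into one string.
        if (is_array($value)) {
            $value = implode(', ', $value);
        }
        // Missing tags are null; show a placeholder instead.
        $lines[] = $name . ': ' . ($value ?? '(not set)');
    }

    return $lines;
}

// Sample data in the shape of headers(); values taken from the examples above.
$sample = [
    'charset' => 'utf-8',
    'contentType' => 'text/html; charset=utf-8',
    'viewport' => ['width=device-width', 'initial-scale=1'],
    'canonical' => null,   // assumption for illustration
    'csrfToken' => 'token',
];

echo implode(PHP_EOL, formatHeaders($sample)), PHP_EOL;
```

With the scraper, you would pass `$web->headers` instead of `$sample`, assuming headers() can be accessed as a property in the same way as the other methods on this page.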

More information on accessing the meta tags.