# Scrape Header Tags
The header tags often contain useful information about a web page and how it fits into the overall structure of the website it belongs to. The following examples show how to access particular pieces of information from the <head>, as well as a combined collection of them.
# Charset
To access the defined charset, you can use the following method:
$web = new \Spekulatius\PHPScraper\PHPScraper;
/**
* Navigate to the test page. It contains:
*
* <meta charset="utf-8" />
*/
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');
// Print the charset
echo $web->charset; // "utf-8"
# Viewport
In some cases, such as the viewport and the meta keywords, the string represents a list of values and is returned as an array:
$web = new \Spekulatius\PHPScraper\PHPScraper;
/**
* Navigate to the test page. It contains:
*
* <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no, maximum-scale=1, user-scalable=no" />
*/
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');
/**
* Returns the viewport as an array. For the example it contains:
*
* [
* 'width=device-width',
* 'initial-scale=1',
* 'shrink-to-fit=no',
* 'maximum-scale=1',
* 'user-scalable=no'
* ],
*/
var_dump($web->viewport);
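Each entry of the viewport array is a `key=value` pair, so it can be turned into a lookup table with plain PHP. The following is a minimal sketch: the helper name `viewportToMap` is hypothetical and not part of PHPScraper, and the input array is copied from the example above instead of being fetched.

```php
<?php

/**
 * Hypothetical helper (not part of PHPScraper): converts a list of
 * "key=value" strings into an associative array.
 */
function viewportToMap(array $viewport): array
{
    $map = [];

    foreach ($viewport as $entry) {
        // Split on the first "=" only; entries without "=" get an empty value.
        [$key, $value] = array_pad(explode('=', $entry, 2), 2, '');
        $map[trim($key)] = trim($value);
    }

    return $map;
}

// Sample data matching the array shown above.
$viewport = [
    'width=device-width',
    'initial-scale=1',
    'shrink-to-fit=no',
    'maximum-scale=1',
    'user-scalable=no',
];

$map = viewportToMap($viewport);

echo $map['width']; // "device-width"
```

With the library loaded, `$web->viewport` could be passed in place of the sample array.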
If you need to access the original "viewport"-string, you can use viewportString:
$web = new \Spekulatius\PHPScraper\PHPScraper;
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');
/**
* Returns the viewport as a string. It prints:
*
* "width=device-width, initial-scale=1, shrink-to-fit=no, maximum-scale=1, user-scalable=no"
*/
echo $web->viewportString;
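To illustrate how the string and array forms relate, the sketch below splits the sample string on commas and trims each part. This is an assumption about the relationship between the two values, not necessarily how PHPScraper derives the array internally.

```php
<?php

// Sample string matching the output shown above.
$viewportString = 'width=device-width, initial-scale=1, shrink-to-fit=no, maximum-scale=1, user-scalable=no';

// Split on commas and strip the whitespace around each part.
$parts = array_map('trim', explode(',', $viewportString));

// The result matches the array form from the earlier example.
var_dump($parts === [
    'width=device-width',
    'initial-scale=1',
    'shrink-to-fit=no',
    'maximum-scale=1',
    'user-scalable=no',
]); // bool(true)
```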
# Canonical URL
The canonical URL, if given, can be accessed as shown in the example below:
$web = new \Spekulatius\PHPScraper\PHPScraper;
/**
* Navigate to the test page. It contains:
*
* <link rel="canonical" href="https://test-pages.phpscraper.de/navigation/2.html" />
*/
$web->go('https://test-pages.phpscraper.de/navigation/1.html');
// Print the canonical URL:
echo $web->canonical; // "https://test-pages.phpscraper.de/navigation/2.html"
TIP
If no canonical link is set, the method returns null.
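Because the canonical URL can be null, calling code usually needs a fallback. A minimal sketch, assuming you want to fall back to the URL you requested; `preferredUrl` is a hypothetical helper and the URLs are placeholder values:

```php
<?php

/**
 * Hypothetical helper (not part of PHPScraper): picks the canonical URL
 * when one is given and falls back to the requested URL otherwise.
 */
function preferredUrl(?string $canonical, string $requestedUrl): string
{
    return $canonical ?? $requestedUrl;
}

// No canonical link found: the requested URL is used.
echo preferredUrl(null, 'https://example.com/page') . PHP_EOL;
// "https://example.com/page"

// Canonical link found: it takes precedence.
echo preferredUrl('https://example.com/canonical', 'https://example.com/page') . PHP_EOL;
// "https://example.com/canonical"
```

In practice, `$web->canonical` would be passed as the first argument.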
# Content-Type
To access the content type, you can use the following property:
$web = new \Spekulatius\PHPScraper\PHPScraper;
/**
* Navigate to the test page. It contains:
*
* <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
*/
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');
// Print the contentType:
echo $web->contentType; // "text/html; charset=utf-8"
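The content type value combines a media type with optional parameters such as the charset. If you need them separately, a simple split is often enough. The sketch below is a naive parser under that assumption (it does not handle quoted values containing semicolons), and `parseContentType` is a hypothetical helper, not part of PHPScraper.

```php
<?php

/**
 * Hypothetical helper (not part of PHPScraper): splits a content type
 * value into its media type and its parameters.
 */
function parseContentType(string $contentType): array
{
    $segments = array_map('trim', explode(';', $contentType));

    // The first segment is the media type, e.g. "text/html".
    $result = [
        'mediaType' => strtolower(array_shift($segments)),
        'parameters' => [],
    ];

    foreach ($segments as $segment) {
        if ($segment === '') {
            continue; // Skip empty segments, e.g. after a trailing ";".
        }

        [$name, $value] = array_pad(explode('=', $segment, 2), 2, '');
        // Strip whitespace and surrounding double quotes from the value.
        $result['parameters'][strtolower(trim($name))] = trim($value, " \t\"");
    }

    return $result;
}

$parsed = parseContentType('text/html; charset=utf-8');

echo $parsed['mediaType'] . PHP_EOL;             // "text/html"
echo $parsed['parameters']['charset'] . PHP_EOL; // "utf-8"
```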
# CSRF Token
The CSRF token method assumes that the token is stored in a meta tag with the name "csrf-token". This is the default for Laravel. You can access it using the following code:
$web = new \Spekulatius\PHPScraper\PHPScraper;
/**
* Navigate to the test page. It contains:
*
* <meta name="csrf-token" content="token" />
*/
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');
// Returns the csrfToken:
echo $web->csrfToken; // "token"
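To make the meta tag assumption concrete, the following sketch looks up the same tag with PHP's DOM extension. It is an illustration only and not PHPScraper's implementation; `findCsrfToken` is a hypothetical helper, and the example assumes the DOM extension is enabled.

```php
<?php

/**
 * Hypothetical helper (not part of PHPScraper): reads the content of
 * <meta name="csrf-token"> from an HTML string, or returns null.
 */
function findCsrfToken(string $html): ?string
{
    $document = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select the content attribute of the csrf-token meta tag.
    $xpath = new DOMXPath($document);
    $nodes = $xpath->query('//meta[@name="csrf-token"]/@content');

    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

$html = '<html><head><meta name="csrf-token" content="token" /></head><body></body></html>';

var_dump(findCsrfToken($html));                        // string(5) "token"
var_dump(findCsrfToken('<html><head></head></html>')); // NULL
```

A page that stores its token elsewhere, for example in a hidden form field, would not be matched by this lookup.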
# Combined Header Tags
You can use the headers method if you want the results of all of the above-mentioned methods at once. It is defined as:
/**
* @return array
*/
public function headers()
{
    return [
        'charset' => $this->charset(),
        'contentType' => $this->contentType(),
        'viewport' => $this->viewport(),
        'canonical' => $this->canonical(),
        'csrfToken' => $this->csrfToken(),
    ];
}
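The returned array can be processed like any other PHP array. The sketch below works on sample data in the shape defined above, using values from the examples on this page; the null canonical entry is an assumption added to show how a missing tag could be filtered out.

```php
<?php

// Sample data in the shape returned by headers(). The canonical entry
// is set to null here for illustration.
$headers = [
    'charset' => 'utf-8',
    'contentType' => 'text/html; charset=utf-8',
    'viewport' => ['width=device-width', 'initial-scale=1'],
    'canonical' => null,
    'csrfToken' => 'token',
];

// Keep only the entries that are not null.
$present = array_filter($headers, static fn ($value) => $value !== null);

foreach ($present as $name => $value) {
    // Arrays such as the viewport are joined for display.
    $display = is_array($value) ? implode(', ', $value) : $value;
    echo $name . ': ' . $display . PHP_EOL;
}
```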
For more information, see the documentation on accessing the meta tags.