first commit

2025-02-24 22:33:42 +01:00
commit 737c037e85
18358 changed files with 5392983 additions and 0 deletions

@@ -0,0 +1,19 @@
<?php
$finder = PhpCsFixer\Finder::create()
->exclude('tests')
->in(__DIR__)
;
$config = PhpCsFixer\Config::create();
$config
->setRules([
'@PSR2' => true,
'array_syntax' => [
'syntax' => 'short',
],
])
->setFinder($finder)
;
return $config;

@@ -0,0 +1,182 @@
# dom-util-for-webp
[![Latest Stable Version](https://img.shields.io/packagist/v/rosell-dk/dom-util-for-webp.svg?style=flat-square)](https://packagist.org/packages/rosell-dk/dom-util-for-webp)
[![Minimum PHP Version](https://img.shields.io/badge/php-%3E%3D%205.6-8892BF.svg?style=flat-square)](https://php.net)
[![Build Status](https://img.shields.io/github/actions/workflow/status/rosell-dk/dom-util-for-webp/ci.yml?branch=master&logo=GitHub&style=flat-square&label=build)](https://github.com/rosell-dk/dom-util-for-webp/actions/workflows/ci.yml)
[![Coverage](https://img.shields.io/endpoint?url=https://little-b.it/dom-util-for-webp/code-coverage/coverage-badge.json)](http://little-b.it/dom-util-for-webp/code-coverage/coverage/index.html)
[![Software License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square)](https://github.com/rosell-dk/dom-util-for-webp/blob/master/LICENSE)
*Replace image URLs found in HTML*
This library can do two things:
1) Replace image URLs in HTML
2) Replace *&lt;img&gt;* tags with *&lt;picture&gt;* tags, adding webp versions to sources
To set up with Composer, run ```composer require rosell-dk/dom-util-for-webp```.
## 1. Replacing image URLs in HTML
The *ImageUrlReplacer::replace($html)* method accepts a piece of HTML and returns HTML where all image URLs have been replaced, even those in inline styles.
*Usage:*
```php
$modifiedHtml = ImageUrlReplacer::replace($html);
```
### Example replacements:
*input:*
```html
<img src="image.jpg">
<img src="1.jpg" srcset="2.jpg 1000w">
<picture>
<source srcset="1.jpg" type="image/webp">
<source srcset="2.png" type="image/webp">
<source src="3.gif"> <!-- gifs are skipped in default behaviour -->
<source src="4.jpg?width=200"> <!-- urls with query string are skipped in default behaviour -->
</picture>
<div style="background-image: url('image.jpeg')"></div>
<style>
#hero {
background: lightblue url("image.png") no-repeat fixed center;;
}
</style>
<input type="button" src="1.jpg">
<img data-src="image.jpg"> <!-- any attribute starting with "data-" is replaced (if its value ends with "jpg", "jpeg" or "png"). Useful for lazy-loading -->
```
*output:*
```html
<img src="image.jpg.webp">
<img src="1.jpg.webp" srcset="2.jpg.webp 1000w">
<picture>
<source srcset="1.jpg.webp" type="image/webp">
<source srcset="2.png.webp" type="image/webp">
<source src="3.gif"> <!-- gifs are skipped in default behaviour -->
<source src="4.jpg?width=200"> <!-- urls with query string are skipped in default behaviour -->
</picture>
<div style="background-image: url('image.jpeg.webp')"></div>
<style>
#hero {
background: lightblue url("image.png.webp") no-repeat fixed center;;
}
</style>
<input type="button" src="1.jpg.webp">
<img data-src="image.jpg.webp"> <!-- any attribute starting with "data-" is replaced (if its value ends with "jpg", "jpeg" or "png"). Useful for lazy-loading -->
```
Default behaviour of *ImageUrlReplacer::replace*:
- The modified URL is the same as the original, with ".webp" appended (to change, override the `replaceUrl` function)
- Only replaces URLs that end with "png", "jpg" or "jpeg", so URLs with a query string are skipped too (to change, override the `replaceUrl` function)
- Attribute search/replace is limited to these tags: *&lt;img&gt;*, *&lt;source&gt;*, *&lt;input&gt;*, *&lt;iframe&gt;*, *&lt;div&gt;*, *&lt;li&gt;*, *&lt;link&gt;*, *&lt;a&gt;*, *&lt;section&gt;* and *&lt;video&gt;* (to change, override the `$searchInTags` property)
- Attribute search/replace is limited to these attributes: "src", "srcset" and any attribute starting with "data-" (to change, override the `attributeFilter` function)
- URLs inside styles are replaced too (*background-image* and *background* properties)
The behaviour can be modified by extending *ImageUrlReplacer* and overriding public methods such as *replaceUrl*.
ImageUrlReplacer uses the `KubAT\PhpSimple\HtmlDomParser` library (Composer package `kub-at/php-simple-html-dom-parser`) for parsing and modifying HTML. It wraps [simplehtmldom](http://simplehtmldom.sourceforge.net/). Simplehtmldom supports invalid HTML (it does not touch the invalid parts).
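The default URL rules listed above can be tried out without the library. The following is a minimal standalone sketch, not the library's own code: `sketchReplaceUrl` and `sketchReplaceSrcset` are hypothetical helper names that only mirror the documented defaults (append ".webp" to URLs ending in "png", "jpg" or "jpeg" and leave everything else alone). The srcset helper assumes that URLs contain no commas.

```php
<?php
// Hypothetical helpers mirroring the documented defaults; not part of the library.

/**
 * Returns the URL with ".webp" appended, or null if the URL should be left alone.
 */
function sketchReplaceUrl($url)
{
    // Only URLs ending in png, jpg or jpeg qualify. A query string fails the "$" anchor.
    if (!preg_match('#(png|jpe?g)$#', $url)) {
        return null;
    }
    return $url . '.webp';
}

/**
 * Rewrites every qualifying URL in a srcset value, keeping width/density descriptors.
 */
function sketchReplaceSrcset($srcset)
{
    $entries = array_map('trim', explode(',', $srcset));
    foreach ($entries as $i => $entry) {
        // Split "image.jpg 520w" into URL and descriptor; the descriptor is optional.
        $parts = preg_split('/\s+/', $entry, 2);
        $webp = sketchReplaceUrl($parts[0]);
        if ($webp !== null) {
            $entries[$i] = $webp . (isset($parts[1]) ? ' ' . $parts[1] : '');
        }
    }
    return implode(', ', $entries);
}

echo sketchReplaceUrl('image.jpg'), "\n";                          // image.jpg.webp
echo var_export(sketchReplaceUrl('4.jpg?width=200'), true), "\n";  // NULL
echo sketchReplaceSrcset('1.jpg 1000w, 2.gif 2x'), "\n";           // 1.jpg.webp 1000w, 2.gif 2x
```

Entries that do not qualify (the gif above) are passed through unchanged, which matches the skipped lines in the example output.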
### Example: Customized behaviour
```php
class ImageUrlReplacerCustomReplacer extends ImageUrlReplacer
{
public function replaceUrl($url) {
// Only accept urls ending with "png", "jpg", "jpeg" and "gif"
if (!preg_match('#(png|jpe?g|gif)$#', $url)) {
return;
}
// Only accept full urls (beginning with http:// or https://)
if (!preg_match('#^https?://#', $url)) {
return;
}
// PS: You probably want to filter out external images too...
// Simply append ".webp" after current extension.
// This strategy ensures that "logo.jpg" and "logo.gif" get counterparts with unique names
return $url . '.webp';
}
public function attributeFilter($attrName) {
// Don't allow any "data-" attribute, but limit to attributes that smell like they are used for images
// The following rule matches all attributes used for lazy loading images that we know of
return preg_match('#^(src|srcset|(data-[^=]*(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*))$#i', $attrName);
// If you want to limit it further, only allowing attributes known to be used for lazy load,
// use the following regex instead:
//return preg_match('#^(src|srcset|data-(src|srcset|cvpsrc|cvpset|thumb|bg-url|large_image|lazyload|source-url|srcsmall|srclarge|srcfull|slide-img|lazy-original))$#i', $attrName);
}
}
$modifiedHtml = ImageUrlReplacerCustomReplacer::replace($html);
```
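To get a feel for what the broader attribute regex in the example accepts, it can be run on its own against a few attribute names. This is only an illustrative check: the regex is copied from the example above, while the sample attribute names are made up for the demonstration.

```php
<?php
// Standalone check of the attribute-name regex; the attribute names below are made-up samples.
$pattern = '#^(src|srcset|(data-[^=]*(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*))$#i';

$samples = ['src', 'srcset', 'data-lazy-src', 'data-bg-url', 'data-id', 'href'];

$accepted = [];
foreach ($samples as $attrName) {
    // preg_match returns 1 on a match, 0 otherwise
    if (preg_match($pattern, $attrName) === 1) {
        $accepted[] = $attrName;
    }
}

echo implode(', ', $accepted), "\n"; // src, srcset, data-lazy-src, data-bg-url
```

Generic attributes such as `data-id` are rejected, so their values are never run through `replaceUrl`.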
## 2. Replacing *&lt;img&gt;* tags with *&lt;picture&gt;* tags
The *PictureTags::replace($html)* method accepts a piece of HTML and returns HTML where all &lt;img&gt; tags have been replaced with &lt;picture&gt; tags, adding webp versions to sources.
Usage:
```php
$modifiedHtml = PictureTags::replace($html);
```
### Example replacements:
*Input:*
```html
<img src="1.png">
<img srcset="3.jpg 1000w" src="3.jpg">
<img data-lazy-src="9.jpg" style="border:2px solid red" class="something">
<figure class="wp-block-image">
<img src="12.jpg" alt="" class="wp-image-6" srcset="12.jpg 492w, 12-300x265.jpg 300w" sizes="(max-width: 492px) 100vw, 492px">
</figure>
```
*Output*:
```html
<picture><source srcset="1.png.webp" type="image/webp"><img src="1.png" class="webpexpress-processed"></picture>
<picture><source srcset="3.jpg.webp 1000w" type="image/webp"><img srcset="3.jpg 1000w" src="3.jpg" class="webpexpress-processed"></picture>
<picture><source data-lazy-src="9.jpg.webp" type="image/webp"><img data-lazy-src="9.jpg" style="border:2px solid red" class="something webpexpress-processed"></picture>
<figure class="wp-block-image">
<picture><source srcset="12.jpg.webp 492w, 12-300x265.jpg.webp 300w" sizes="(max-width: 492px) 100vw, 492px" type="image/webp"><img src="12.jpg" alt="" class="wp-image-6 webpexpress-processed" srcset="12.jpg 492w, 12-300x265.jpg 300w" sizes="(max-width: 492px) 100vw, 492px"></picture>
</figure>
```
Note that with the picture tags, it is still the img tag that shows the selected image. The picture tag is just a wrapper.
So it is correct behaviour not to copy the *style*, *width*, *class* or any other attributes to the picture tag. See [issue #9](https://github.com/rosell-dk/dom-util-for-webp/issues/9).
As with `ImageUrlReplacer`, you can override the *replaceUrl* function. There are, however, currently no other methods to override.
`PictureTags` currently uses regular expressions to do the replacing. There are plans to change the implementation to use `KubAT\PhpSimple\HtmlDomParser`, like our `ImageUrlReplacer` class does.
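The regex-based approach can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `sketchPictureTags` is a hypothetical function, it only handles a plain double-quoted `src` attribute (no `srcset`, `sizes` or lazy-loading attributes), and it does not add the `webpexpress-processed` class. It shows the three steps: set existing `<picture>` tags aside, wrap the remaining `<img>` tags, and put the originals back.

```php
<?php
// Hypothetical, simplified sketch; only plain src attributes ending in png/jpg/jpeg are handled.
function sketchPictureTags($html)
{
    $saved = [];

    // 1. Swap existing <picture> elements for tokens so their <img> tags are left alone.
    $html = preg_replace_callback(
        '/<picture[^>]*>.*?<\/picture>/is',
        function ($m) use (&$saved) {
            $saved[] = $m[0];
            return 'PICTURE_TAG_' . (count($saved) - 1) . '_';
        },
        $html
    );

    // 2. Wrap each remaining <img> whose src qualifies in a <picture> with a webp <source>.
    $html = preg_replace_callback(
        '/<img[^>]*>/i',
        function ($m) {
            if (!preg_match('#\ssrc="([^"]+\.(?:png|jpe?g))"#i', $m[0], $src)) {
                return $m[0]; // nothing to convert
            }
            return '<picture><source srcset="' . $src[1] . '.webp" type="image/webp">'
                . $m[0] . '</picture>';
        },
        $html
    );

    // 3. Put the original <picture> elements back in place of their tokens.
    return preg_replace_callback(
        '/PICTURE_TAG_(\d+)_/',
        function ($m) use ($saved) {
            return $saved[(int) $m[1]];
        },
        $html
    );
}

echo sketchPictureTags('<img src="1.png"><picture><img src="2.jpg"></picture>'), "\n";
// <picture><source srcset="1.png.webp" type="image/webp"><img src="1.png"></picture><picture><img src="2.jpg"></picture>
```

Because whole tags are matched and re-emitted as strings, the rest of the document never goes through a parser, which is why invalid HTML elsewhere on the page is left untouched.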
## Platforms
Works on (at least):
- OS: Ubuntu (22.04, 20.04, 18.04), Windows (2022, 2019), Mac OS (13, 12, 11, 10.15)
- PHP: 5.6 - 8.2 (also tested 8.3 and 8.4 development versions in October 2023)
Each new release will be tested on all combinations of OSs and PHP versions that are [supported](https://github.com/marketplace/actions/setup-php-action) by GitHub-hosted runners, except that we do not test below PHP 5.6.\
Status: [![Build Status](https://img.shields.io/github/actions/workflow/status/rosell-dk/dom-util-for-webp/release.yml?branch=master&logo=GitHub&style=flat-square&label=Giant%20test)](https://github.com/rosell-dk/dom-util-for-webp/actions/workflows/release.yml)
Testing consists of running the unit tests. The code in this library is almost completely covered by tests (~95% coverage).
We also test future versions of PHP monthly, in order to catch problems early.\
Status:
[![PHP 8.3](https://img.shields.io/github/actions/workflow/status/rosell-dk/dom-util-for-webp/php83.yml?branch=master&logo=GitHub&style=flat-square&label=PHP%208.3)](https://github.com/rosell-dk/dom-util-for-webp/actions/workflows/php83.yml)
[![PHP 8.4](https://img.shields.io/github/actions/workflow/status/rosell-dk/dom-util-for-webp/php84.yml?branch=master&logo=GitHub&style=flat-square&label=PHP%208.4)](https://github.com/rosell-dk/dom-util-for-webp/actions/workflows/php84.yml)
## Do you like what I do?
Perhaps you want to support my work, so I can continue doing it :)
- [Become a backer or sponsor on Patreon](https://www.patreon.com/rosell).
- [Buy me a Coffee](https://ko-fi.com/rosell)

@@ -0,0 +1,66 @@
{
"name": "rosell-dk/dom-util-for-webp",
"description": "Replace image URLs found in HTML",
"type": "library",
"license": "MIT",
"minimum-stability": "stable",
"keywords": ["webp", "replace", "images", "html"],
"scripts": {
"ci": [
"@build",
"@test-cov-console",
"@phpcs-all",
"@composer validate --no-check-all --strict",
"@phpstan"
],
"cs-fix-all": [
"php-cs-fixer fix src"
],
"cs-fix": "php-cs-fixer fix",
"cs-dry": "php-cs-fixer fix --dry-run --diff",
"test": "phpunit --coverage-text=build/coverage.txt --coverage-clover=build/coverage.clover --coverage-html=build/coverage --whitelist=src tests",
"test-cov-console": "phpunit --coverage-text --whitelist=src tests",
"test-41": "phpunit --coverage-text --configuration 'phpunit-41.xml.dist'",
"test-no-cov": "phpunit --no-coverage tests",
"phpunit": "phpunit --no-coverage",
"phpcs": "phpcs --standard=phpcs-ruleset.xml",
"phpcs-all": "phpcs --standard=phpcs-ruleset.xml src",
"phpcbf": "phpcbf --standard=phpcs-ruleset.xml",
"phpstan": "vendor/bin/phpstan analyse src --level=4"
},
"extra": {
"scripts-descriptions": {
"ci": "Run tests before CI",
"phpcs": "Checks coding styles (PSR2) of file/dir, which you must supply. To check all, supply 'src'",
"phpcbf": "Fix coding styles (PSR2) of file/dir, which you must supply. To fix all, supply 'src'",
"cs-fix-all": "Fix the coding style of all the source files, to comply with the PSR-2 coding standard",
"cs-fix": "Fix the coding style of a PHP file or directory, which you must specify.",
"test": "Launches the preconfigured PHPUnit"
}
},
"autoload": {
"psr-4": { "DOMUtilForWebP\\": "src/" }
},
"autoload-dev": {
"psr-4": { "DOMUtilForWebPTests\\": "tests/" }
},
"authors": [
{
"name": "Bjørn Rosell",
"homepage": "https://www.bitwise-it.dk/contact",
"role": "Project Author"
}
],
"require-dev": {
"friendsofphp/php-cs-fixer": "^2.11",
"phpstan/phpstan": "^1.5",
"phpunit/phpunit": "^9.3",
"squizlabs/php_codesniffer": "3.*"
},
"config": {
"sort-packages": true
},
"require": {
"kub-at/php-simple-html-dom-parser": "^1.9"
}
}

@@ -0,0 +1,43 @@
# Development
## Setting up the environment
First, clone the repository:
```
cd whatever/folder/you/want
git clone git@github.com:rosell-dk/dom-util-for-webp.git
```
Then install the dev tools with composer:
```
composer install
```
If you don't have composer yet:
- Get it ([download phar](https://getcomposer.org/composer.phar) and move it to /usr/local/bin/composer)
- PS: PHPUnit requires php-xml, php-mbstring and php-curl. To install: `sudo apt install php-xml php-mbstring curl php-curl`
Make sure you have [xdebug](https://xdebug.org/docs/install) installed if you want phpunit to generate a code coverage report.
## Unit Testing
To run all the unit tests do this:
```
composer test
```
This also runs tests on the builds.
If you do not need the coverage report:
```
composer phpunit
```
Individual test files can be executed like this:
```
composer phpunit tests/ImageUrlReplacerTest.php
composer phpunit tests/PictureTagsTest.php
```
Note:
The code coverage requires [xdebug](https://xdebug.org/docs/install)

@@ -0,0 +1,8 @@
<?xml version="1.0"?>
<ruleset name="Custom Standard">
<description>PSR2 without the line ending rule - let git manage the EOL across platforms</description>
<rule ref="PSR2" />
<rule ref="Generic.Files.LineEndings">
<exclude name="Generic.Files.LineEndings.InvalidEOLChar"/>
</rule>
</ruleset>

@@ -0,0 +1,38 @@
<?xml version="1.0" encoding="UTF-8"?>
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://schema.phpunit.de/4.1/phpunit.xsd"
backupGlobals="false"
backupStaticAttributes="false"
colors="true"
convertErrorsToExceptions="true"
convertNoticesToExceptions="true"
convertWarningsToExceptions="false"
processIsolation="false"
stopOnFailure="false"
bootstrap="vendor/autoload.php"
>
<testsuites>
<testsuite name="Dom util for WebP Test Suite">
<directory>./tests/</directory>
</testsuite>
</testsuites>
<filter>
<whitelist>
<directory suffix=".php">src/</directory>
<exclude>
<directory>./vendor</directory>
<directory>./tests</directory>
</exclude>
</whitelist>
</filter>
<logging>
<log type="junit" target="build/report.junit.xml"/>
<log type="coverage-clover" target="build/logs/clover.xml"/>
<log type="coverage-text" target="build/coverage.txt"/>
<!--<log type="coverage-html" target="build/coverage"/>-->
</logging>
</phpunit>

@@ -0,0 +1,21 @@
<?xml version="1.0" encoding="UTF-8"?>
<phpunit
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/9.3/phpunit.xsd"
backupGlobals="false"
backupStaticAttributes="false"
colors="true"
convertErrorsToExceptions="true"
convertNoticesToExceptions="true"
convertWarningsToExceptions="true"
convertDeprecationsToExceptions="true"
processIsolation="true"
stopOnFailure="false"
bootstrap="vendor/autoload.php"
failOnWarning="true"
failOnRisky="false">
<testsuites>
<testsuite name="Dom util for WebP Test Suite">
<directory>./tests/</directory>
</testsuite>
</testsuites>
</phpunit>

@@ -0,0 +1,247 @@
<?php
namespace DOMUtilForWebP;
//use Sunra\PhpSimple\HtmlDomParser;
use KubAT\PhpSimple\HtmlDomParser;
/**
* Highly configurable class for replacing image URLs in HTML (both src and srcset syntax)
*
* Uses http://simplehtmldom.sourceforge.net/ - a library for easily manipulating HTML by means of a DOM.
* The great thing about this library is that it supports working on invalid HTML and it only applies the changes you
* make - very gently (however, not as gently as we do in PictureTags).
* PS: The library is a bit old, so perhaps we should look for another.
* ie https://packagist.org/packages/masterminds/html5 ??
*
* Behaviour can be customized by overriding the public methods (replaceUrl, $searchInTags, etc)
*
* Default behaviour:
* - The modified URL is the same as the original, with ".webp" appended (replaceUrl)
* - Limits to these tags: <img>, <source>, <input>, <iframe>, <div>, <li>, <link>, <a>, <section> and <video> ($searchInTags)
* - Limits to these attributes: "src", "srcset" and any attribute starting with "data-" (attributeFilter)
* - Only replaces URLs that end with "png", "jpg" or "jpeg" (no query strings either) (replaceUrl)
*
*
*/
class ImageUrlReplacer
{
// define tags to be searched.
// The div and li are on the list because these are often used with lazy loading
// should we add <meta> ?
// Probably not for open graph images or twitter
// so not these:
// - <meta property="og:image" content="[url]">
// - <meta property="og:image:secure_url" content="[url]">
// - <meta name="twitter:image" content="[url]">
// Meta can also be used in schema.org micro-formatting, ie:
// - <meta itemprop="image" content="[url]">
//
// How about preloaded images? - yes, suppose we should replace those
// - <link rel="prefetch" href="[url]">
// - <link rel="preload" as="image" href="[url]">
public static $searchInTags = ['img', 'source', 'input', 'iframe', 'div', 'li', 'link', 'a', 'section', 'video'];
/**
* Empty constructor for preventing child classes from creating constructors.
*
* We do this because otherwise the "new static()" call inside the ::replace() method
* would be unsafe. See #21
* @return void
*/
final public function __construct()
{
}
/**
*
* @return string|null webp url or, if URL should not be changed, return nothing
**/
public function replaceUrl($url)
{
if (!preg_match('#(png|jpe?g)$#', $url)) {
return null;
}
return $url . '.webp';
}
public function replaceUrlOr($url, $returnValueIfDenied)
{
$url = $this->replaceUrl($url);
return (isset($url) ? $url : $returnValueIfDenied);
}
/*
public function isValidUrl($url)
{
return preg_match('#(png|jpe?g)$#', $url);
}*/
public function handleSrc($attrValue)
{
return $this->replaceUrlOr($attrValue, $attrValue);
}
public function handleSrcSet($attrValue)
{
// $attrValue is ie: <img data-x="1.jpg 1000w, 2.jpg">
$srcsetArr = explode(',', $attrValue);
foreach ($srcsetArr as $i => $srcSetEntry) {
// $srcSetEntry is ie "image.jpg 520w", but can also lack width, ie just "image.jpg"
// it can also be ie "image.jpg 2x"
$srcSetEntry = trim($srcSetEntry);
$entryParts = preg_split('/\s+/', $srcSetEntry, 2);
if (count($entryParts) == 2) {
list($src, $descriptors) = $entryParts;
} else {
$src = $srcSetEntry;
$descriptors = null;
}
$webpUrl = $this->replaceUrlOr($src, false);
if ($webpUrl !== false) {
$srcsetArr[$i] = $webpUrl . (isset($descriptors) ? ' ' . $descriptors : '');
}
}
return implode(', ', $srcsetArr);
}
/**
* Test if attribute value looks like it has srcset syntax.
* "image.jpg 100w" does for example. And "image.jpg 1x". Also "image1.jpg, image2.jpg 1x"
* Mixing x and w is invalid (according to
* https://stackoverflow.com/questions/26928828/html5-srcset-mixing-x-and-w-syntax)
* But we accept it anyway
* It is not the job of this function to see if the first part is an image URL
* That will be done in handleSrcSet.
*
*/
public function looksLikeSrcSet($value)
{
if (preg_match('#\s\d*(w|x)#', $value)) {
return true;
}
return false;
}
public function handleAttribute($value)
{
// Call through $this so that overrides in child classes are honoured
if ($this->looksLikeSrcSet($value)) {
return $this->handleSrcSet($value);
}
return $this->handleSrc($value);
}
public function attributeFilter($attrName)
{
$attrName = strtolower($attrName);
if (($attrName == 'src') || ($attrName == 'srcset') || (strpos($attrName, 'data-') === 0)) {
return true;
}
return false;
}
public function processCSSRegExCallback($matches)
{
list($all, $pre, $quote, $url, $post) = $matches;
return $pre . $this->replaceUrlOr($url, $url) . $post;
}
public function processCSS($css)
{
$declarations = explode(';', $css);
foreach ($declarations as $i => &$declaration) {
if (preg_match('#(background(-image)?)\\s*:#', $declaration)) {
// https://regexr.com/46qdg
//$regex = '#(url\s*\(([\"\']?))([^\'\";\)]*)(\2\s*\))#';
$parts = explode(',', $declaration);
//print_r($parts);
foreach ($parts as &$part) {
//echo 'part:' . $part . "\n";
$regex = '#(url\\s*\\(([\\"\\\']?))([^\\\'\\";\\)]*)(\\2\\s*\\))#';
$part = preg_replace_callback(
$regex,
array($this, 'processCSSRegExCallback'),
$part
);
//echo 'result:' . $part . "\n";
}
$declarations[$i] = implode(',', $parts);
}
}
return implode(';', $declarations);
}
public function replaceHtml($html)
{
if ($html == '') {
return '';
}
// https://stackoverflow.com/questions/4812691/preserve-line-breaks-simple-html-dom-parser
// function str_get_html($str, $lowercase=true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET,
// $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
$dom = HtmlDomParser::str_get_html($html, false, true, 'UTF-8', false);
//$dom = str_get_html($html, false, false, 'UTF-8', false);
// MAX_FILE_SIZE is defined in simple_html_dom.
// For safety sake, we make sure it is defined before using
defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 600000);
if ($dom === false) {
if (strlen($html) > MAX_FILE_SIZE) {
return '<!-- Alter HTML was skipped because the HTML is too big to process! ' .
'(limit is set to ' . MAX_FILE_SIZE . ' bytes) -->' . "\n" . $html;
}
return '<!-- Alter HTML was skipped because the helper library refused to process the html -->' .
"\n" . $html;
}
// Replace attributes (src, srcset, data-src, etc)
// static:: (late static binding) lets a child class override $searchInTags
foreach (static::$searchInTags as $tagName) {
$elems = $dom->find($tagName);
foreach ($elems as $index => $elem) {
$attributes = $elem->getAllAttributes();
foreach ($attributes as $attrName => $attrValue) {
if ($this->attributeFilter($attrName)) {
$elem->setAttribute($attrName, $this->handleAttribute($attrValue));
}
}
}
}
// Replace <style> elements
$elems = $dom->find('style');
foreach ($elems as $index => $elem) {
$css = $this->processCSS($elem->innertext);
if ($css != $elem->innertext) {
$elem->innertext = $css;
}
}
// Replace style attributes
$elems = $dom->find('*[style]');
foreach ($elems as $index => $elem) {
$css = $this->processCSS($elem->style);
if ($css != $elem->style) {
$elem->style = $css;
}
}
return $dom->save();
}
/* Main replacer function */
public static function replace($html)
{
/*if (!function_exists('str_get_html')) {
require_once __DIR__ . '/../src-vendor/simple_html_dom/simple_html_dom.inc';
}*/
$iur = new static();
return $iur->replaceHtml($html);
}
}

@@ -0,0 +1,337 @@
<?php
namespace DOMUtilForWebP;
//use Sunra\PhpSimple\HtmlDomParser;
use KubAT\PhpSimple\HtmlDomParser;
/**
* Class PictureTags - convert an <img> tag to a <picture> tag and add the webp versions of the images
* Code is based on code from the ShortPixel plugin, which in turn used code from Responsify WP plugin
*
* It works like this:
*
* 1. Remove existing <picture> tags and their content - replace with tokens in order to reinsert later
* 2. Process <img> tags.
* - The tags are found with regex.
* - The attributes are parsed with DOMDocument if it exists, otherwise with the Simple Html Dom library,
* which is included inside this library
* 3. Re-insert the existing <picture> tags
*
* This procedure is very gentle and needle-like. No need for a complete parse - so invalid HTML is no big issue
*
* PS:
* https://packagist.org/packages/masterminds/html5
*/
class PictureTags
{
/**
* Empty constructor for preventing child classes from creating constructors.
*
* We do this because otherwise the "new static()" call inside the ::replace() method
* would be unsafe. See #21
* @return void
*/
final public function __construct()
{
$this->existingPictureTags = [];
}
private $existingPictureTags;
public function replaceUrl($url)
{
if (!preg_match('#(png|jpe?g)$#', $url)) {
return;
}
return $url . '.webp';
}
public function replaceUrlOr($url, $returnValueIfDenied)
{
$url = $this->replaceUrl($url);
return (isset($url) ? $url : $returnValueIfDenied);
}
/**
* Look for attributes such as "data-lazy-src" and "data-src" and prefer them over "src"
*
* @param array $attributes an array of attributes for the element
* @param string $attrName ie "src", "srcset" or "sizes"
*
* @return array an array with "value" key and "attrName" key. ("value" is the value of the attribute and
* "attrName" is the name of the attribute used)
*
*/
private static function lazyGet($attributes, $attrName)
{
return array(
'value' =>
(isset($attributes['data-lazy-' . $attrName]) && strlen($attributes['data-lazy-' . $attrName])) ?
trim($attributes['data-lazy-' . $attrName])
: (isset($attributes['data-' . $attrName]) && strlen($attributes['data-' . $attrName]) ?
trim($attributes['data-' . $attrName])
: (isset($attributes[$attrName]) && strlen($attributes[$attrName]) ?
trim($attributes[$attrName]) : false)),
'attrName' =>
(isset($attributes['data-lazy-' . $attrName]) && strlen($attributes['data-lazy-' . $attrName])) ?
'data-lazy-' . $attrName
: (isset($attributes['data-' . $attrName]) && strlen($attributes['data-' . $attrName]) ?
'data-' . $attrName
: (isset($attributes[$attrName]) && strlen($attributes[$attrName]) ? $attrName : false))
);
}
/**
* Look for attribute such as "src", but also with prefixes such as "data-lazy-src" and "data-src"
*
* @param array $attributes an array of all attributes for the element
* @param string $attrName ie "src", "srcset" or "sizes"
*
* @return array an array with "value" key and "attrName" key. ("value" is the value of the attribute and
* "attrName" is the name of the attribute used)
*
*/
private static function findAttributesWithNameOrPrefixed($attributes, $attrName)
{
$tryThesePrefixes = ['', 'data-lazy-', 'data-'];
$result = [];
foreach ($tryThesePrefixes as $prefix) {
$name = $prefix . $attrName;
if (isset($attributes[$name]) && strlen($attributes[$name])) {
/*$result[] = [
'value' => trim($attributes[$name]),
'attrName' => $name,
];*/
$result[$name] = trim($attributes[$name]);
}
}
return $result;
}
/**
* Convert to UTF-8 and encode chars outside of ascii-range
*
* Input: html that might be in any character encoding and might contain non-ascii characters
* Output: html in UTF-8 encoding, where non-ascii characters are encoded
*
*/
private static function textToUTF8WithNonAsciiEncoded($html)
{
if (function_exists("mb_convert_encoding")) {
$html = mb_convert_encoding($html, 'UTF-8');
$html = mb_encode_numericentity($html, array (0x7f, 0xffff, 0, 0xffff), 'UTF-8');
}
return $html;
}
private static function getAttributes($html)
{
if (class_exists('\\DOMDocument')) {
$dom = new \DOMDocument();
if (function_exists("mb_encode_numericentity")) {
// I'm in doubt if I should add the following line (see #41)
// $html = mb_convert_encoding($html, 'UTF-8');
$html = mb_encode_numericentity($html, array (0x7f, 0xffff, 0, 0xffff)); // #41
}
@$dom->loadHTML($html);
$image = $dom->getElementsByTagName('img')->item(0);
$attributes = [];
foreach ($image->attributes as $attr) {
$attributes[$attr->nodeName] = $attr->nodeValue;
}
return $attributes;
} else {
// Convert to UTF-8 because HtmlDomParser::str_get_html needs to be told the
// encoding. As UTF-8 might conflict with the charset set in the meta, we must
// encode all characters outside the ascii-range.
// It would perhaps have been better to try to guess the encoding rather than
// changing it (see #39), but I'm reluctant to introduce changes.
$html = self::textToUTF8WithNonAsciiEncoded($html);
$dom = HtmlDomParser::str_get_html($html, false, true, 'UTF-8', false);
if ($dom !== false) {
$elems = $dom->find('img,IMG');
foreach ($elems as $index => $elem) {
$attributes = [];
foreach ($elem->getAllAttributes() as $attrName => $attrValue) {
$attributes[strtolower($attrName)] = $attrValue;
}
return $attributes;
}
}
return [];
}
}
/**
* Makes a string with all attributes.
*
* @param array $attribute_array
* @return string
*/
private static function createAttributes($attribute_array)
{
$attributes = '';
foreach ($attribute_array as $attribute => $value) {
$attributes .= $attribute . '="' . $value . '" ';
}
if ($attributes == '') {
return '';
}
// Removes the extra space after the last attribute. Add space before
return ' ' . substr($attributes, 0, -1);
}
/**
* Replace <img> tag with <picture> tag.
*/
private function replaceCallback($match)
{
$imgTag = $match[0];
// Do nothing with images that have the 'webpexpress-processed' class.
if (strpos($imgTag, 'webpexpress-processed') !== false) {
return $imgTag;
}
$imgAttributes = self::getAttributes($imgTag);
$srcInfo = self::lazyGet($imgAttributes, 'src');
$srcsetInfo = self::lazyGet($imgAttributes, 'srcset');
$sizesInfo = self::lazyGet($imgAttributes, 'sizes');
$srcSetAttributes = self::findAttributesWithNameOrPrefixed($imgAttributes, 'srcset');
$srcAttributes = self::findAttributesWithNameOrPrefixed($imgAttributes, 'src');
if ((!isset($srcSetAttributes['srcset'])) && (!isset($srcAttributes['src']))) {
// better not mess with this html...
return $imgTag;
}
// add the exclude class so if this content is processed again in other filter,
// the img is not converted again in picture
$imgAttributes['class'] = (isset($imgAttributes['class']) ? $imgAttributes['class'] . " " : "") .
"webpexpress-processed";
// Process srcset (also data-srcset etc)
$atLeastOneWebp = false;
$sourceTagAttributes = [];
foreach ($srcSetAttributes as $attrName => $attrValue) {
$srcsetArr = explode(', ', $attrValue);
$srcsetArrWebP = [];
foreach ($srcsetArr as $i => $srcSetEntry) {
// $srcSetEntry is ie "http://example.com/image.jpg 520w"
$result = preg_split('/\s+/', trim($srcSetEntry));
$src = trim($srcSetEntry);
$width = null;
if ($result && count($result) >= 2) {
list($src, $width) = $result;
}
$webpUrl = $this->replaceUrlOr($src, false);
if ($webpUrl == false) {
// We want ALL of the sizes as webp.
// If we cannot have that, it is better to abort! - See #42
return $imgTag;
} else {
if (substr($src, 0, 5) != 'data:') {
$atLeastOneWebp = true;
$srcsetArrWebP[] = $webpUrl . (isset($width) ? ' ' . $width : '');
}
}
}
$sourceTagAttributes[$attrName] = implode(', ', $srcsetArrWebP);
}
foreach ($srcAttributes as $attrName => $attrValue) {
if (substr($attrValue, 0, 5) == 'data:') {
// ignore tags with data urls, such as <img src="data:...
return $imgTag;
}
// Make sure not to override existing srcset with src
if (!isset($sourceTagAttributes[$attrName . 'set'])) {
$srcWebP = $this->replaceUrlOr($attrValue, false);
if ($srcWebP !== false) {
$atLeastOneWebp = true;
}
$sourceTagAttributes[$attrName . 'set'] = $srcWebP;
}
}
if ($sizesInfo['value']) {
$sourceTagAttributes[$sizesInfo['attrName']] = $sizesInfo['value'];
}
if (!$atLeastOneWebp) {
// We have no webps for you, so no reason to create <picture> tag
return $imgTag;
}
return '<picture>'
. '<source' . self::createAttributes($sourceTagAttributes) . ' type="image/webp">'
. '<img' . self::createAttributes($imgAttributes) . '>'
. '</picture>';
}
/*
*
*/
public function removePictureTagsTemporarily($content)
{
//print_r($content);
$this->existingPictureTags[] = $content[0];
return 'PICTURE_TAG_' . (count($this->existingPictureTags) - 1) . '_';
}
/*
*
*/
public function insertPictureTagsBack($content)
{
$numberString = $content[1];
$numberInt = intval($numberString);
return $this->existingPictureTags[$numberInt];
}
/**
*
*/
public function replaceHtml($content)
{
if (!class_exists('\\DOMDocument') && function_exists('mb_detect_encoding')) {
// PS: Correctly identifying Windows-1251 encoding only works on some systems
// But at least I'm not aware of any false positives
if (mb_detect_encoding($content, ["ASCII", "UTF8", "Windows-1251"]) == 'Windows-1251') {
$content = mb_convert_encoding($content, 'UTF-8', 'Windows-1251');
}
}
$this->existingPictureTags = [];
// Temporarily remove existing <picture> tags
$content = preg_replace_callback(
'/<picture[^>]*>.*?<\/picture>/is',
array($this, 'removePictureTagsTemporarily'),
$content
);
// Replace "<img>" tags
$content = preg_replace_callback('/<img[^>]*>/i', array($this, 'replaceCallback'), $content);
// Re-insert <picture> tags that were removed
$content = preg_replace_callback('/PICTURE_TAG_(\d+)_/', array($this, 'insertPictureTagsBack'), $content);
return $content;
}
/* Main replacer function */
public static function replace($html)
{
$pt = new static();
return $pt->replaceHtml($html);
}
}