2025-02-24 22:33:42 +01:00

Patterns used for image URLs:

in attributes: https://regexr.com/46jat
in css: https://regexr.com/46jcg

In case regexr.com is down, here is the content:

in attributes

pattern: (?<=(?:<(img|source|input|iframe)[^>]*\s+(src|srcset|data-[^=]*)\s*=\s*["']?))(?:[^"'>]+)(\.png|\.jp[e]?g)(\s\d+w)?(?=/?["'\s>])

text: Notice: The pattern is meant for PHP and contains syntax which only works in some browsers. It works in Chrome. Not in Firefox.

The following should produce matches: hello

In srcset, the whole attribute must be matched

Common lazy load attributes are matched:

The following should NOT produce matches:

Ignore URLs with query string:

nice-jpg src="http://example.com/header.jpeg"
<script src="http://example.com/script.js?preload=image.jpg">
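
A minimal sketch of how the attribute pattern behaves, in JavaScript, since the notice above says it runs in Chrome (V8 supports the variable-length lookbehind it relies on). The pattern is written here with `*` after `[^>]` and `[^=]` and with escaped dots before the extensions, which is an assumption about the intended form. `findImageUrls` and the sample HTML strings are hypothetical and only serve as illustration.

```javascript
// Attribute pattern in JavaScript regex-literal form (the "/" is escaped).
// Assumption: "*" after [^>] and [^=], and "\." before the extensions.
const attrPattern =
  /(?<=(?:<(img|source|input|iframe)[^>]*\s+(src|srcset|data-[^=]*)\s*=\s*["']?))(?:[^"'>]+)(\.png|\.jp[e]?g)(\s\d+w)?(?=\/?["'\s>])/g;

// Hypothetical helper: return every matched URL (or whole srcset value).
function findImageUrls(html) {
  return Array.from(html.matchAll(attrPattern), (m) => m[0]);
}

console.log(findImageUrls('<img src="http://example.com/header.jpeg">'));
// → [ 'http://example.com/header.jpeg' ]
console.log(findImageUrls('<img srcset="a.jpg 480w, b.jpg 800w">'));
// → [ 'a.jpg 480w, b.jpg 800w' ]  (the whole srcset value)
console.log(findImageUrls('<img data-lazy-original="photo.png">'));
// → [ 'photo.png' ]
console.log(findImageUrls('<img src="header.jpg?width=200">'));
// → []  (query strings are ignored)
console.log(findImageUrls('<script src="http://example.com/script.js?preload=image.jpg">'));
// → []  (script is not one of the allowed tags)
```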

I use another pattern for matching image urls in styles: https://regexr.com/46jcg

It matches stuff like this:

<style>#myphoto {background: url("http://example.com/image2.jpg")}</style>

I have another pattern where query strings are allowed here: https://regexr.com/46ivi

PS: The rules are used for the WebP Express plugin for WordPress

PPS: This regex is used in WPFastestCache (not just images) // $content = preg_replace_callback("/(srcset|src|href|data-cvpsrc|data-cvpset|data-thumb|data-bg-url|data-large_image|data-lazyload|data-source-url|data-srcsmall|data-srclarge|data-srcfull|data-slide-img|data-lazy-original)\s{0,2}\=[\'\"]([^\'\"]+)[\'\"]/i", array($this, 'cdn_replace_urls'), $content);
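
To illustrate the callback-based replacement idea in the PPS, here is a rough JavaScript sketch using `String.prototype.replace` with a function. It is not WPFastestCache's code: the attribute list is shortened, the quoted-value part of the pattern (`=['"]...['"]`) is an assumption about the intended form, and `rewriteUrls` and the CDN host are hypothetical.

```javascript
// Shortened, assumed form of the attribute/value pattern (case-insensitive, global).
const urlAttrPattern =
  /(srcset|src|href|data-thumb|data-lazyload)\s{0,2}=(['"])([^'"]+)['"]/gi;

// Hypothetical helper: pass every matched attribute value through `rewrite`.
function rewriteUrls(html, rewrite) {
  return html.replace(
    urlAttrPattern,
    (whole, attr, quote, url) => `${attr}=${quote}${rewrite(url)}${quote}`
  );
}

// Hypothetical callback that swaps the origin for a CDN host.
const toCdn = (url) => url.replace('http://example.com', 'https://cdn.example.com');

console.log(rewriteUrls('<img src="http://example.com/a.jpg">', toCdn));
// → <img src="https://cdn.example.com/a.jpg">
```

Note that this pattern is not limited to particular tags or file extensions, which is why it catches more than images.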

PPPS: As we are limiting to a few tags (img, source, input, etc.), and only match image URLs ending with (png|jpe?g), I deem it OK to match in ANY "data-" attribute. But if you want to limit it to attributes that smell like they are used for images, you can do this: (src|srcset|data-[^=]*(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*) That will catch the following known attributes and more: data-cvpsrc|data-cvpset|data-thumb|data-bg-url|data-large_image|data-lazyload|data-source-url|data-srcsmall|data-srclarge|data-srcfull|data-slide-img|data-lazy-original
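
A small sketch of the restricted variant from the PPPS, again in JavaScript. It assumes the keyword group is wrapped in `[^=]*` on both sides (so a keyword may appear anywhere in the attribute name) and that the rest of the pattern is the same as the attribute pattern above. `findImageUrlsRestricted` and the sample tags are hypothetical.

```javascript
// Attribute pattern with "data-" limited to names containing an image-like keyword.
// Assumption: [^=]* on both sides of the keyword group, and escaped dots.
const restrictedPattern =
  /(?<=(?:<(img|source|input|iframe)[^>]*\s+(src|srcset|data-[^=]*(lazy|small|slide|img|large|src|thumb|source|set|bg-url)[^=]*)\s*=\s*["']?))(?:[^"'>]+)(\.png|\.jp[e]?g)(\s\d+w)?(?=\/?["'\s>])/g;

// Hypothetical helper: list the matches in an HTML string.
function findImageUrlsRestricted(html) {
  return Array.from(html.matchAll(restrictedPattern), (m) => m[0]);
}

console.log(findImageUrlsRestricted('<img data-thumb="t.png">'));
// → [ 't.png' ]
console.log(findImageUrlsRestricted('<img data-title="logo.png">'));
// → []  ("title" contains none of the keywords)
console.log(findImageUrlsRestricted('<img src="a.jpg" data-title="b.jpg">'));
// → [ 'a.jpg' ]
```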

in style

pattern: ((?<=(?:((style\s*=)|(<\s*style)).*background(-image)?\s*:\s*url\s*\(["']?)|(((style\s*=)|(<\s*style)).*url.*,\s*url\(["']?))[^"']*\.(jpe?g|png))(?=["'\s>)])

text: Notice: The pattern is meant for PHP and contains syntax which only works in some browsers. It works in Chrome. Not in Firefox.

The following should produce matches:

<style>#myphoto {background: url("http://example.com/image2.jpg")}</style>
<style>#myphoto {background: url("http://example.com/image2.jpg"), url("image2.jpeg")}</style>

Not these:

GIFs are disallowed:

Querystrings are disallowed:

HTML attributes disallowed:

Go with style: background: url("http://example.com/image2.jpg")

And none of this either:

hello nice-jpg src="http://example.com/header.jpeg"
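
A minimal JavaScript sketch of the style pattern in action. The pattern is written with `*` quantifiers, an escaped `\(` after `url`, and an escaped dot before the extension, all of which are assumptions about the intended form. `findStyleImageUrls` and the GIF, query-string, and inline-style samples are hypothetical.

```javascript
// Style pattern in JavaScript regex-literal form.
// Assumption: "\s*" / ".*" quantifiers, "\(" after url, and "\." before the extension.
const stylePattern =
  /((?<=(?:((style\s*=)|(<\s*style)).*background(-image)?\s*:\s*url\s*\(["']?)|(((style\s*=)|(<\s*style)).*url.*,\s*url\(["']?))[^"']*\.(jpe?g|png))(?=["'\s>)])/g;

// Hypothetical helper: list image URLs found inside style blocks or style attributes.
function findStyleImageUrls(html) {
  return Array.from(html.matchAll(stylePattern), (m) => m[0]);
}

console.log(findStyleImageUrls(
  '<style>#myphoto {background: url("http://example.com/image2.jpg"), url("image2.jpeg")}</style>'
));
// → [ 'http://example.com/image2.jpg', 'image2.jpeg' ]
console.log(findStyleImageUrls('<div style="background-image: url(\'a.png\')">'));
// → [ 'a.png' ]
console.log(findStyleImageUrls('<style>#a {background: url("image.gif")}</style>'));
// → []  (GIFs are disallowed)
console.log(findStyleImageUrls('<style>#a {background: url("image.jpg?x=1")}</style>'));
// → []  (query strings are disallowed)
console.log(findStyleImageUrls('<img src="header.jpeg">'));
// → []  (plain HTML attributes are not matched)
```

Because `.` does not match line breaks here, `<style` and the `background` declaration must be on the same line for this sketch to find a match.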