white_list and advanced tag filtering

Hi all,

I have a form that uses TinyMCE to let users enter text. The rules are:

1. Users should be able to use IMG SRC= tags to inline any graphic stored on the same webserver the site is hosted from.
2. Users must not be able to use IMG SRC tags to inline any other graphics.
3. TinyMCE emotion icons are allowed (really just a subset of #1, because TinyMCE emotes are just little graphic files stored in the TinyMCE code tree).
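The three rules above reduce to one question per image: does its src point at this site? Here is a minimal Ruby sketch using only the standard uri library. It assumes the site's host name is known in advance (www.example.com, the example host from this post). The method name same_server_src? is hypothetical and is not part of TinyMCE or any plugin.

```ruby
require "uri"

# Assumption: replace with your application's real host name.
SITE_HOST = "www.example.com"

# Hypothetical helper: true when an <img> src points at this site.
def same_server_src?(src, site_host = SITE_HOST)
  uri = URI.parse(src.strip)
  if uri.host.nil?
    # No host: accept only root-relative paths such as /images/grandma.
    uri.scheme.nil? && src.strip.start_with?("/")
  else
    # Absolute URL: accept only http(s) on our own host. A protocol-relative
    # "//host/x" parses with a host but no scheme, so it is rejected here.
    %w[http https].include?(uri.scheme) && uri.host.casecmp?(site_host)
  end
rescue URI::InvalidURIError
  false # unparseable values are rejected
end
```

Because TinyMCE's emotion icons live under the site's own tree, they pass the same check as any other local image.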

So, for example, given that a user is filling out a form at www.example.com/post/new:

The following form input:

Could you just drop the http://…/ portion of src on all images (so the first one becomes /nasty.jpg and the second becomes /images/grandma)?

Fred
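Fred's suggestion can be sketched as a single gsub that removes the scheme and host from every img src, so that http://host/images/grandma becomes /images/grandma. This is a rough regex-based sketch. It assumes src values are either unquoted or double-quoted. The method name strip_image_hosts is hypothetical.

```ruby
# Hypothetical helper: drop the "http://host" part of every <img src=...>,
# keeping only the path, e.g. http://evil.com/nasty.jpg -> /nasty.jpg.
# Assumes src values are unquoted or wrapped in double quotes.
def strip_image_hosts(html)
  # $1 = "<img ... src=", $2 = optional opening quote; the scheme and host
  # that follow are matched but not put back.
  html.gsub(/(<img\b[^>]*?\bsrc=)("?)https?:\/\/[^\/"\s>]+/i) { "#{$1}#{$2}" }
end

puts strip_image_hosts('<img src="http://evil.com/nasty.jpg">')
```

As the reply below points out, the rewritten paths may not exist locally, so this trades foreign images for broken ones.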

I suppose so, but then that will leave messy broken images on my pages and clog up my logs with requests for invalid graphics.

I'm wrangling a regex, but I can't figure out how to negate a string. What I want is something like this:

STARTS WITH "<img " AND THEN IS NOT "src=/" AND THEN IS "* >"

That would catch <img src=http...>, <img src="http...>, etc. in a single pattern for deletion with gsub.

/<img [^src=\/]*>/ obviously doesn't work, because a [^...] character class excludes single characters ('s' OR 'r' OR 'c'...), not the string "src=/". I tried enclosing "src=\/" in quotes, single quotes, backquotes, curly braces, etc., but can't figure this out.

I suck at regex and Google isn't helping this time.

Any ideas, group?
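The catch with /<img [^src=\/]*>/ is that [^...] only ever excludes single characters. To express "not followed by this string", Ruby regexps support a negative lookahead, (?!...). Below is a rough sketch of the pseudocode above. It assumes src is the first attribute after <img and that its value is unquoted or double-quoted. The names FOREIGN_IMG and remove_foreign_images are hypothetical.

```ruby
# Matches an <img ...> tag UNLESS it starts with src=/ or src="/.
# The inner (?!\/) requires a single slash, so protocol-relative
# //other.host/... URLs are still caught. Assumes src comes first.
FOREIGN_IMG = /<img\s+(?!src="?\/(?!\/))[^>]*>/i

def remove_foreign_images(html)
  html.gsub(FOREIGN_IMG, "")
end

puts remove_foreign_images('<img src="/a.gif"><img src="http://x.com/c.jpg">')
```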

Digging this thread up from the grave, as it is reaching critical mass.

Refresh of the problem: in user-submitted textareas, the <img src=...> tag should be stripped from user input unless it sources an image on the application's server. So <img src="/...> and <img src=/...> should be allowed, and any other <img src=...> should be stripped.

I'm using the whitelist plugin, natch.

I had a lightbulb go off over my head: I added 'img' back into WhiteListHelper.tags, took 'src' out of WhiteListHelper.attributes, and added 'src="/ src=/' to WhiteListHelper.attributes. I figured it would strip any src attribute that didn't start with "/ or /, and that would be good enough for my purposes.

Not so much. WhiteList always strips the src part of the tag, leaving the lone <img>, even if the tag is <img src=/photos...>
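That result is consistent with the whitelist comparing each entry in WhiteListHelper.attributes against attribute names only. An entry like src="/ can never equal the name src, so src is always dropped. The toy filter below is not the plugin's code, and its names are hypothetical. It only illustrates why a name-based list cannot express a condition on the attribute's value.

```ruby
require "set"

# Toy name-based attribute filter (NOT the white_list plugin's code):
# keeps an attribute only if its NAME is in the allowed set.
def filter_attributes(attrs, allowed)
  attrs.select { |name, _value| allowed.include?(name) }
end

attrs = { "src" => "/photos/me.jpg", "alt" => "me" }

# 'src="/' and 'src=/' are never attribute names, so src is dropped:
only_alt = filter_attributes(attrs, Set['src="/', "src=/", "alt"])

# Allowing the plain name keeps src, whatever its value:
both = filter_attributes(attrs, Set["src", "alt"])

p only_alt, both
```

Any "must start with /" rule therefore has to be a separate check on the value.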

So, any ideas?

Thanks!
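Here is a whitelist-style sketch of the refreshed rule: keep an <img> tag only when its src value starts with a single "/", and drop it otherwise. It is regex-based and assumes the other markup has already been sanitized. The names are hypothetical. One trap is worth handling. A value like //other.host/pic.jpg also starts with "/", but browsers treat it as a protocol-relative URL pointing at another server, so the check rejects a leading "//".

```ruby
# Hypothetical filter: keep <img> tags whose src is a root-relative path on
# this server (/..., but not //host/...); drop every other <img> tag.
IMG_TAG   = /<img\b[^>]*>/i
# src value: double-quoted, single-quoted, or unquoted.
SRC_VALUE = /\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i

def local_image?(tag)
  m = SRC_VALUE.match(tag)
  return false unless m # no src at all: drop the tag
  src = m[1] || m[2] || m[3]
  src.start_with?("/") && !src.start_with?("//")
end

def keep_local_images(html)
  html.gsub(IMG_TAG) { |tag| local_image?(tag) ? tag : "" }
end

puts keep_local_images('<img src="/a.gif"><img src="http://x.com/d.jpg">')
```

This could run after a whitelist that allows both img and src. A real HTML parser would be sturdier than regexes against deliberately malformed input.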

Andrew Ohnstad wrote:


On Apr 4, 11:47 am, Frederick Cheung <frederick.che...@gmail.com>

Might try something like:

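string = <<EOF
<img src=/...>
<img src="/...>
<img src=http://…>
EOF

# Blacklist: match an <img src=...> tag whose src does NOT begin with / or "/
# (a negative lookahead, so quoted foreign URLs such as src="http... are caught too)
img_reg = Regexp.new('<img src=(?!"?/)[^>]*>')
match = img_reg.match(string)
puts match.to_s # prints <img src=http://…>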

This is more of a blacklist solution, however. Hope it helps get you on the right track.