Censor Search Queries

No, I don't want to censor what people are querying; I want to censor the searches I display.

I display the last 5 queries entered in the search box on my website. I do not want to display offensive terms, such as the '7 words you can't say on TV'. I am already censoring the words themselves, but I also want to censor terms that contain those words.

In case you need a reason why: the webpage is on the website of a major university. I want to provide students with search topic tips, but do not want to display any offensive words.

This is what I am doing:

censor = get_censor # returns an array of offensive words
unless censor.include?(query)
  mod.update_search
end

How do I extend this to filter out queries that contain the censored terms? I am thinking of MySQL pattern matching with LIKE, but am not sure how to do it.

Thanks in advance, K

Come on - this is an interesting problem. Too easy? Too hard? Too controversial?

Kim wrote:

> Come on - this is an interesting problem. Too easy? Too hard? Too controversial?

Well, it certainly is interesting, and it can be a can of worms because it will require regular maintenance: there are many patterns that can be used to work around your censor.

I'd suggest you get to know regex pattern matching and see how it can be done using MySQL functions. I think performance may be an issue if you have a lot of content to filter.

If you can, try to filter as you store content in the DB, or run an independent background process that cleans up the DB content while you sleep.

-- Long
http://MeandmyCity.com/ - Free online business directory for local communities
http://edgesoft.ca/blog/read/2 - No-Cookie Session Support plugin for Rails

Yes, my idea is to filter as I store in the DB. It seems like I should be able to use wildcards around a value to match it against a list of words. For example, if a user enters ‘ass’ the query gets filtered out and is not displayed, but if they enter ‘assmonkey’ it does not get filtered. If wildcards were used, it should be filtered too.

I do not expect to be able to filter every offensive query, but if I can get some of them I would be happier.

I am familiar with regex, but how would I apply it? See my first post.

Any other takers?
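The wildcard idea described above amounts to a substring test, roughly the Ruby counterpart of SQL's LIKE '%word%'. A minimal sketch, assuming the censor list is a plain array of strings; `contains_censored?` and the sample list are hypothetical names used only for illustration:

```ruby
# Hypothetical helper: true if any censored word appears anywhere in the query.
# Both sides are downcased, so the comparison is case-insensitive.
def contains_censored?(query, censor)
  normalized = query.to_s.downcase
  censor.any? { |word| normalized.include?(word.downcase) }
end

censor = ['ass'] # stand-in list; the real one would come from get_censor

contains_censored?('assmonkey', censor)     # substring hit
contains_censored?('ruby on rails', censor) # no hit
```

Plain substring matching also flags harmless terms that happen to contain a censored word (for example "class" or "assignment"), so some false positives should be expected with this approach.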

Am I correct in assuming you just want to match any censored phrase in the query?

If so, it should be as simple as:

/#{censor.join('|')}/i.match(query)

-Shawn
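A slightly more defensive variant of the one-liner above, offered as a sketch rather than a definitive fix: Regexp.union escapes each word, so a list entry containing regex metacharacters (such as "." or "*") cannot change the meaning of the pattern. The contents of `censor`, the sample `query`, and the `puts` standing in for `mod.update_search` are assumptions for illustration:

```ruby
censor = ['ass', 'd*mn'] # stand-in for get_censor

# Regexp.union escapes each string; rebuild the pattern with IGNORECASE
# so matching is case-insensitive, as in the /.../i version.
pattern = Regexp.new(Regexp.union(censor).source, Regexp::IGNORECASE)

query = 'AssMonkey'
unless pattern.match(query)
  puts "recording #{query}" # stand-in for mod.update_search
end
```

Building the pattern once and reusing it for every incoming query avoids re-joining the word list on each search.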

Kim Griggs wrote:

> Yes, my idea is to filter as I store in the DB. It seems like I should be able to use wildcards around a value to match it against a list of words. For example, if a user enters 'ass' the query gets filtered out and is not displayed, but if they enter 'assmonkey' it does not get filtered. If wildcards were used, it should be filtered too.

> I do not expect to be able to filter every offensive query, but if I can get some of them I would be happier.

I agree, though you may want to consider whether variants like the following are acceptable: "a s s", "a # s # s", "a.s.s", etc.

I hope you get the idea.
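One way to catch spaced-out or punctuated variants like these is to strip everything except letters before matching. A rough sketch, assuming ASCII-only input; `squash` and `obfuscated_match?` are hypothetical names, and this does not catch letter substitutions such as "@" for "a":

```ruby
# Hypothetical helper: lowercase the query and drop every character
# that is not an ASCII letter (spaces, digits, punctuation).
def squash(query)
  query.to_s.downcase.gsub(/[^a-z]/, '')
end

# True if any censored word appears in the squashed query.
def obfuscated_match?(query, censor)
  squashed = squash(query)
  censor.any? { |word| squashed.include?(word.downcase) }
end

censor = ['ass'] # stand-in list
obfuscated_match?('a s s', censor)
obfuscated_match?('a # s # s', censor)
obfuscated_match?('a.s.s', censor)
```

The trade-off is more false positives: removing spaces joins neighbouring words, so an innocent query like "has sold" would also be flagged.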

> I am familiar with regex, but how would I apply it? See my first post.

> Any other takers?

I think Shawn may have alluded to the start of a possible solution...

-- Long

I did a little searching; it looks like there is a plugin that does much of what you need:

http://locusfoc.us/2007/2/13/name-nanny-plugin

-Shawn

Shawn Roske wrote:

> I did a little searching; it looks like there is a plugin that does much of what you need:

> http://locusfoc.us/2007/2/13/name-nanny-plugin

Good find Shawn!

-- Long

Looks good - I will try it out. Thanks to all.