Sunday, September 5, 2010

The Ruby Way to do URL Validation

One common way to validate a URL is with a regular expression such as:
my_url =~ /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix
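
In Ruby, ^ and $ match at the start and end of any line, not of the whole string. As a result, the pattern above also accepts a multi-line value in which only one line is a URL. Rails 4 and later go a step further: validates_format_of refuses ^ or $ unless multiline: true is passed.

Below is a minimal sketch of the same pattern with \A and \z in their place. It is written in extended mode so each part can carry a comment. The constant URL_PATTERN and the helper valid_url? are hypothetical names, not part of the original post.

```ruby
# Same pattern as above, but anchored with \A and \z so the WHOLE string
# must match. The x flag lets whitespace and comments sit inside the regexp.
URL_PATTERN = /
  \A
  (http|https):\/\/          # scheme: http or https only
  [a-z0-9]+                  # first host label
  ([\-\.]{1}[a-z0-9]+)*      # further labels, each joined by a hyphen or dot
  \.[a-z]{2,5}               # top-level domain of 2 to 5 letters
  (([0-9]{1,5})?\/.*)?       # optional path, kept exactly as in the original
  \z
/ix

# Hypothetical helper: true only if the entire string matches URL_PATTERN.
def valid_url?(str)
  URL_PATTERN.match?(str.to_s)
end

valid_url?("http://google.com")                      # => true
valid_url?("http://")                                # => false
valid_url?("http://google.com\njavascript:alert(1)") # => false (matches with ^ and $)
valid_url?("http://example.com:8080/")               # => false
```

The pattern still rejects an explicit port such as :8080, which is the point raised in the last comment below. It also rejects hosts without a dot, such as localhost.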

After reading a blog post by Michael Bleigh, I realized there is a more Ruby-like way to do URL validation: the regexp method of the URI module. URI::regexp builds a regular expression from the scheme (protocol) names you pass in. Matching a string against it with =~ returns the index where the match starts: 0 when the string begins with a URL of one of those schemes, and nil when no such URL is found.
require 'uri'
"http://google.com" =~ URI::regexp("ftp") # => nil
"http://google.com" =~ URI::regexp("http") # => 0
"google.com" =~ URI::regexp("ftp") # => nil
"google.com" =~ URI::regexp(%w(ftp http)) # => nil
"http://google.com" =~ URI::regexp(["ftp", "http", "https"]) # => 0
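
The expression URI::regexp builds is not anchored. As a result, =~ returns the position of the first URL it finds anywhere in the string, so a result of 0 only tells you the string starts with something URL-like.

The sketch below assumes a Ruby version that still ships URI.regexp; newer releases mark it obsolete and warn when warnings are enabled. It shows the index behaviour and one way to require a full-string match by wrapping the pattern in \A and \z. The variable names anywhere and whole are only for illustration. Anchoring fixes the index issue only; it does not make URI::regexp any stricter about what counts as a URL.

```ruby
require 'uri'

# URI.regexp returns a Regexp for the given schemes. It is NOT anchored,
# so =~ returns the index where the first match begins, or nil.
anywhere = URI.regexp(%w[http https])

# Interpolating that Regexp between \A and \z requires the whole string to match.
whole = /\A#{URI.regexp(%w[http https])}\z/

"http://google.com" =~ anywhere      # => 0
"see http://google.com" =~ anywhere  # => 4, still truthy
"see http://google.com" =~ whole     # => nil
"http://google.com" =~ whole         # => 0
```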

If you use Rails, URI::regexp can be plugged directly into your model validation.
class ExampleModel < ActiveRecord::Base
  validates_format_of :site, :with => URI::regexp(%w(http https))
end
Thank you, Michael Bleigh, for sharing this.

Update:
This approach turns out to be flawed: "http://" =~ URI::regexp("http") returns 0, which marks "http://" as a valid URL even though it has no host. So I recommend using the regular expression given at the beginning of this post instead.

"http://" =~ URI::regexp("http") # => 0
"http://" =~ /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix # => nil
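
For comparison, here is one standard-library alternative that is not from the original post. It parses the string with URI.parse, then checks that the result is an http or https URI with a non-empty host. This is a minimal sketch, and the helper name http_url? is hypothetical.

Unlike the regular expression above, it accepts explicit ports such as :8080, which the last comment below asks about. It also accepts single-label hosts such as http://localhost, which may or may not be what you want.

```ruby
require 'uri'

# Hypothetical helper: true for absolute http/https URLs that name a host.
def http_url?(str)
  uri = URI.parse(str.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so one check covers both schemes.
  # A non-empty host is required, so a bare "http://" does not pass.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::Error
  # Malformed input, e.g. a string containing spaces, raises a URI::Error subclass.
  false
end

http_url?("http://google.com")        # => true
http_url?("http://")                  # => false
http_url?("google.com")               # => false
http_url?("ftp://google.com")         # => false
http_url?("http://example.com:8080/") # => true
```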
Thanks to Losk for pointing this out in the comments below.

3 comments:

  1. The approach seems flawed. For example, if I open a console, require 'open-uri', and then evaluate "http://" =~ URI::regexp("http"), it returns 0, indicating the URL is valid. That may be true in terms of what open-uri is doing, but it doesn't seem like a URL I'd want to associate with any user who's entering information on my site.

    It's worth mentioning that the regular expression provided at the beginning of the post finds the example "http://" to be invalid.

  2. You're right. "http://" =~ URI::regexp("http") returns 0, which means "http://" is treated as valid. Thanks for the pointer, Losk.

  3. Your regexp doesn't support port numbers?
