Sunday, September 5, 2010

The Ruby Way to do URL Validation

As we know, to do URL validation we can use a regular expression such as:
my_url =~ /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix
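Two properties of this pattern are easy to miss: it has no colon before the port digits, so a URL like http://google.com:8080/ is rejected, and in Ruby ^ and $ match at line boundaries rather than at the ends of the string. Below is a minimal sketch of a variant that addresses both. It is not the pattern from this post; the constant names and the `(:[0-9]{1,5})?` port group are assumptions for illustration.

```ruby
# The pattern from the post, unchanged.
ORIGINAL_URL = /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix

# Hypothetical variant (not from the post): \A and \z anchor to the whole
# string, and the optional port group expects a leading colon.
STRICT_URL = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/i

samples = [
  "http://google.com",
  "http://google.com:8080/",
  "http://",
  "not a url\nhttp://google.com"   # second line looks like a URL
]

samples.each do |s|
  original = !(s =~ ORIGINAL_URL).nil?
  strict   = !(s =~ STRICT_URL).nil?
  puts "#{s.inspect}: original=#{original} strict=#{strict}"
end
```

The last sample is the interesting one: the original pattern accepts it because ^ matches after the newline, while the \A/\z variant rejects it.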

After reading a blog post by Michael Bleigh, I realized that there is a Ruby way to do URL validation. The secret is the regexp method of the URI module. It generates a regular expression based on the scheme name (or list of scheme names) that you pass in. Matching a string against that expression with =~ returns the index where the match starts (0 for a URL at the beginning of the string) and nil if nothing matches.
require 'uri' # URI::regexp is defined in the uri library; open-uri is not needed for it
"http://google.com" =~ URI::regexp("ftp") # => nil
"http://google.com" =~ URI::regexp("http") # => 0
"google.com" =~ URI::regexp("ftp") # => nil
"google.com" =~ URI::regexp(%w(ftp http)) # => nil
"http://google.com" =~ URI::regexp(["ftp", "http", "https"]) # => 0
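Since =~ returns the index of the first match, a result of 0 only says that a URL starts at the beginning of the string. The expression from URI::regexp is not anchored, so a URL embedded in other text still matches. Here is a minimal sketch of wrapping it in \A and \z so that the whole string has to be a URL; the constant name ANCHORED_HTTP is an assumption for illustration, and newer Ruby versions may warn that URI.regexp is obsolete.

```ruby
require 'uri'

# Unanchored: matches a URL anywhere in the string.
loose = URI::regexp(%w(http https))

# Hypothetical anchored wrapper: the whole string must be a single URL.
ANCHORED_HTTP = /\A#{URI::regexp(%w(http https))}\z/

p("see http://google.com" =~ loose)          # index of the embedded URL, not nil
p("see http://google.com" =~ ANCHORED_HTTP)  # nil, extra text before the URL
p("http://google.com" =~ ANCHORED_HTTP)      # 0, the string is exactly one URL
```

In a Rails model, the anchored pattern could be passed to :with in place of the bare URI::regexp call.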

If you use Rails, URI::regexp can be plugged directly into your model validation.
class ExampleModel < ActiveRecord::Base
  validates_format_of :site, :with => URI::regexp(%w(http https))
end
Thank you, Michael Bleigh, for sharing this.

Update:
This approach seems flawed. Passing "http://" =~ URI::regexp("http") returns 0, indicating that the URL is valid. So I recommend using the regular expression provided at the beginning of the post.

"http://" =~ URI::regexp("http") # => 0
"http://" =~ /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix # => nil
Thanks to Losk, who pointed this out in the comments below.
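If rejecting "http://" matters but a hand-written pattern feels too restrictive, another option is to parse the string and check its parts. This is a minimal sketch, not something from Michael Bleigh's post; the method name valid_http_url? is hypothetical, and requiring a non-empty host is an assumption about what counts as a usable URL.

```ruby
require 'uri'

# Hypothetical helper: true only for http/https URLs that have a host.
def valid_http_url?(str)
  uri = URI.parse(str)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  # Strings that cannot be parsed as a URI at all are treated as invalid.
  false
end

["http://google.com", "https://google.com/search?q=ruby",
 "http://", "google.com", "ftp://google.com"].each do |s|
  puts "#{s.inspect} => #{valid_http_url?(s)}"
end
```

Only the first two samples pass: "http://" has no host, "google.com" has no scheme, and "ftp://google.com" uses a different scheme.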

5 comments:

  1. The approach seems flawed. For example if I open console, require 'open-uri' and then pass "http://" =~ URI::regexp("http"), it returns 0 indicating the URL to be valid, and while it may be in terms of what open-uri's doing, it doesn't seem like a URL I'd want to associate to any user who's entering information on my site.

    It's worth mentioning that the regular expression provided at the beginning of the post finds the example "http://" to be invalid.

  2. You're right. Passing "http://" =~ URI::regexp("http") returns 0, which means "http://" is treated as valid. Thanks for the pointer, Losk.

  3. Your regexp doesn't support port numbers?

  4. The RFC in which URIs were originally introduced specifies that "http://" is a valid URI:

    The URI syntax does not require that the scheme-specific-part have
    any general structure or set of semantics which is common among all
    URI. However, a subset of URI do share a common syntax for
    representing hierarchical relationships within the namespace. This
    "generic URI" syntax consists of a sequence of four main components:

    [scheme]://[authority][path]?[query]

    each of which, except [scheme], may be absent from a particular URI.

    http://www.ietf.org/rfc/rfc2396.txt

    So, URI::regexp works as advertised.

  5. I've improved your initial regexp code to include ftp/ftps and also username/password that can be used on http/ftp situations...

    here it is:

    /^(http|https|ftp|ftps):\/\/(([a-z0-9]+\:)?[a-z0-9]+\@)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix

    thanks for sharing, it helped a lot :)


© Railsmine