Saturday, June 6, 2009

Checking Link Validity with ASP.NET

If you have an index of links, it can be important to check that a link really exists before showing it off on your site. Here's a way to do it.

This is really not a difficult problem. All you have to do is try to reach the page; if you can't reach it, chances are it's not there. The .NET Framework that ASP.NET runs on provides plenty of networking functionality (network computing is one of the things it was designed for, after all), so why not use it?

In this case, the class we need is WebRequest (in the System.Net namespace), an aptly named class that lets you request documents over HTTP and HTTPS, among other protocols. The .NET Framework documentation (which you should have handy if you don't want to receive everyone's wrath online) explains it well, but here's my example.

First, we set up a small form into which a user can enter a URL. Let's pretend this is part of the 'links' section on this site. The URL goes into the text box and the form is submitted. On submission, we simply request the page.

If the request succeeds, there are all manner of things you can do. You could use a regular expression to strip out the title and meta tags (this is explained in the ASP section), you could cache the page for later, or you could pull out all the text and analyze it, like a mini-Google ranking the page by its content. If the request isn't successful, you can reject the link outright and save yourself from the old dead-link syndrome.
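The paragraph above mentions pulling the title out with a regular expression once the request succeeds. Here is a minimal sketch of that step. It assumes the page's HTML has already been read into a string, for example with a StreamReader over the response's GetResponseStream(). The TitleExtractor class and ExtractTitle method are hypothetical names. A regular expression is a rough heuristic here, not a real HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the contents of the first <title> element.
public static class TitleExtractor
{
    // Case-insensitive; Singleline lets '.' match newlines inside the title.
    // The lazy (.*?) stops at the first closing </title>.
    private static readonly Regex TitlePattern = new Regex(
        @"<title[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the trimmed title text, or null if the input is null or has no title tag.
    public static string ExtractTitle(string html)
    {
        if (html == null)
        {
            return null;
        }
        Match m = TitlePattern.Match(html);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```

Returning null instead of throwing keeps the caller simple. A page without a title can just fall back to using the URL as its link text.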

Here's the simple checkURL function, which is called on submission.

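// Requires System.Net, e.g. <%@ Import Namespace="System.Net" %> at the top of the .aspx page.
void checkURL(Object o, EventArgs e)
{
    // Hide the input form and show the results panel.
    pnlDone.Visible = true;
    pnlStart.Visible = false;

    // Accept only absolute http/https URLs: WebRequest.Create would otherwise
    // throw on malformed input, or open file:// paths on the server itself.
    Uri uri;
    if (!Uri.TryCreate(strURL.Value, UriKind.Absolute, out uri) ||
        (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
    {
        lblExists.Text = "Please enter a valid http:// or https:// URL";
        return;
    }

    WebRequest objRequest = WebRequest.Create(uri);
    objRequest.Timeout = 10000; // milliseconds; an arbitrary cap so a slow host can't stall the page
    lblExists.Text = "Link unchecked";
    try
    {
        // GetResponse() throws a WebException if the host can't be reached
        // or the server answers with an error status such as 404.
        using (WebResponse objResponse = objRequest.GetResponse())
        {
            lblExists.Text = "Link exists";
        }
    }
    catch (WebException)
    {
        lblExists.Text = "Link doesn't exist";
    }
}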

A few things to note. First, my form is in one asp:Panel and the results are displayed in another, hence the show/hide shuffle at the start. Then I request the URL inside a try/catch block. If the page is not found, GetResponse() throws a WebException. In this context, its Status property will usually be WebExceptionStatus.ProtocolError (the server answered, but with an error code such as 404), though this simple code doesn't distinguish that from other failures. If the exception is thrown, we tell the user. If it isn't, we can go on to add the link to our database, carrying out whatever operations we want along the way. Very simple, eh?
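If you do want to tell a genuinely dead link apart from a temporary failure, the WebException already carries what you need. Its Status property tells you what went wrong. For protocol errors, its Response is an HttpWebResponse whose StatusCode holds the HTTP status. Here is a minimal sketch that assumes one particular policy: only 404 (Not Found) and 410 (Gone) count as dead, and everything else (timeouts, DNS failures, 5xx errors) is treated as uncertain. LinkFailure and LinkClassifier are hypothetical names, not part of the framework.

```csharp
using System;
using System.Net;

// Hypothetical outcome categories for a failed link check.
public enum LinkFailure { Dead, Uncertain }

public static class LinkClassifier
{
    // Assumed policy: only a definite "not here" answer from the server marks a link dead.
    public static LinkFailure Classify(WebExceptionStatus status, HttpStatusCode? httpCode)
    {
        if (status == WebExceptionStatus.ProtocolError && httpCode.HasValue &&
            (httpCode.Value == HttpStatusCode.NotFound || httpCode.Value == HttpStatusCode.Gone))
        {
            return LinkFailure.Dead;
        }
        // Timeouts, name-resolution failures, 5xx responses and so on may be temporary.
        return LinkFailure.Uncertain;
    }

    // Convenience overload for use inside a catch (WebException ex) block.
    public static LinkFailure Classify(WebException ex)
    {
        // Response is null when no response arrived at all (e.g. a timeout).
        HttpWebResponse response = ex.Response as HttpWebResponse;
        HttpStatusCode? code = response != null ? response.StatusCode : (HttpStatusCode?)null;
        return Classify(ex.Status, code);
    }
}
```

In checkURL, the catch block would become catch (WebException ex) and set the label from LinkClassifier.Classify(ex). An Uncertain result is probably worth re-checking later rather than rejecting the link outright.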

This article, like a number of upcoming ones, was inspired by a recent thread about ASP.NET. These articles won't be long on content, but you can be sure the answers will be relevant to real-world problems, so keep checking back for solutions inspired by real people.
