I'm writing some Python that fetches a web page's content with urllib2 and then navigates it with XPath, and that part is simple enough. The issue I am having is that the page I am fetching redirects to another page, which sometimes returns a 404, and all that is wrong is a small part of the redirected URL.
What I'm trying to do (but failing) is load the page and, if a 404 occurs after the redirect, get the URL that caused the 404 (it will not be the one I requested), modify that URL, and then retry.
If I understand you correctly, you might want to take a look here for information on urllib2 redirect handling.
Example 11.11 shows how to handle redirects with custom handlers. I would suggest parsing the Location header from the 301/302 responses to get the redirect URL(s). Once you hit the 404, the last redirect URL you recorded should be the one that sent you there.
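To make that concrete, here is a rough sketch of the idea, not a drop-in solution. It subclasses urllib2's HTTPRedirectHandler to remember every redirect target, then retries once if the final response is a 404. The names RecordingRedirectHandler, fetch_with_retry, fix_url and extra_handlers are made up for illustration, and the import fallback to urllib.request is my own addition so the same sketch also runs on Python 3.

```python
try:
    import urllib2 as urlrequest            # Python 2, as in the question
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as urlrequest     # Python 3 equivalent (assumption)
    from urllib.error import HTTPError


class RecordingRedirectHandler(urlrequest.HTTPRedirectHandler):
    """Hypothetical handler that remembers each URL we are redirected to."""

    def __init__(self):
        self.redirect_urls = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl is the redirect target taken from the Location header,
        # already resolved to an absolute URL by the base handler.
        self.redirect_urls.append(newurl)
        return urlrequest.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


def fetch_with_retry(url, fix_url, extra_handlers=()):
    """Fetch url; on a 404 after a redirect, repair the target and retry once.

    fix_url is a caller-supplied function (hypothetical) that takes the
    failing redirect URL and returns a corrected one.
    """
    recorder = RecordingRedirectHandler()
    opener = urlrequest.build_opener(recorder, *extra_handlers)
    try:
        return opener.open(url).read()
    except HTTPError as err:
        # Only handle the case from the question: a 404 reached via redirect.
        if err.code != 404 or not recorder.redirect_urls:
            raise
        bad_url = recorder.redirect_urls[-1]  # last redirect sent us to the 404
        return opener.open(fix_url(bad_url)).read()
```

You would call it as something like fetch_with_retry(start_url, my_fix), where my_fix does whatever string replacement repairs the broken part of the URL. It only retries once, so if the corrected URL also fails the error is raised as usual.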