404 - actual url

yoadster_gmail.com · February 27, 2014, 2:10pm

Hi, really hoping someone can help me out.

I'm writing some Python that fetches a web page's content with urllib2 and then navigates it with XPath, which is simple enough. The issue I'm having is that the page I fetch redirects to another page, which sometimes throws a 404, and all that is wrong is that a small part of the redirected URL is incorrect.

What I'm trying to do (but failing) is attempt to load the page and, if a 404 occurs on redirect, get the URL that caused the 404 (it will not be the one I requested), modify that URL, and then retry.

Can anyone help?

SHoTT · February 28, 2014, 5:20pm

@yoadster,

If I am understanding you correctly, for info regarding urllib2 redirect handling you might want to take a look here.

Example 11.11 shows you how to handle the redirects with custom handlers. I would suggest parsing the Location header from the 301/302 responses to get the redirect URL(s). Once you reach the 404, the last redirect URL should be the one that sent you there.
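
Here is a rough sketch of that idea, not a drop-in solution. It is written against Python 3's urllib.request, which is where urllib2's classes ended up; in Python 2 the equivalents are urllib2.HTTPRedirectHandler and urllib2.HTTPError. The names RecordingRedirectHandler, fetch_with_retry, fix_url and max_attempts are made up for illustration, and fix_url stands in for whatever URL correction you need.

```python
import urllib.error
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember every URL we were sent to."""

    def __init__(self):
        super().__init__()
        self.chain = []  # list of (status code, redirect target) tuples

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl is the Location header, already resolved to an absolute URL.
        self.chain.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch_with_retry(url, fix_url, opener=None, max_attempts=3):
    """Open url; on a 404, hand the URL that actually failed to fix_url and retry.

    fix_url is a hypothetical callback: it takes the failing URL and
    returns the corrected one.
    """
    if opener is None:
        opener = urllib.request.build_opener(RecordingRedirectHandler())
    for attempt in range(max_attempts):
        try:
            return opener.open(url)
        except urllib.error.HTTPError as err:
            # Give up on anything that is not a 404, or when out of attempts.
            if err.code != 404 or attempt == max_attempts - 1:
                raise
            # err.url is the URL of the request that failed, i.e. the
            # redirect target rather than the URL originally asked for.
            url = fix_url(err.url)
```

You would call it along the lines of fetch_with_retry(start_url, my_fix), where my_fix does the string replacement on the bad part of the URL. If you also want the full redirect history, build the opener yourself, keep a reference to the handler, and read handler.chain afterwards.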

Hope this helps!

