xpath coding

If this is your HTML and you'd like to select the li with id "123":

  • Test

  • w00t

  • Bla

  • Hihi

You can do it like this:

my_id = '123'
html = HTML.ElementFromURL(url)

list_item = html.xpath('//div/ul/li[@id="%s"]' % my_id) # This xpath query will become //div/ul/li[@id="123"]

http://docs.python.org/2/library/stdtypes.html#string-formatting
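The same %-formatting idea can be tried outside Plex. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of the Plex HTML helper, so the query starts with ".//" rather than "//". The sample markup, the ids other than "123", and the helper name find_item_by_id are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; which item carries id "123" is an assumption.
SAMPLE = """
<html><body><div><ul>
  <li id="121">Test</li>
  <li id="122">w00t</li>
  <li id="123">Bla</li>
  <li id="124">Hihi</li>
</ul></div></body></html>
""".strip()


def find_item_by_id(root, item_id):
    """Return the <li> elements whose id attribute equals item_id."""
    # %-formatting substitutes the variable into the query string.
    query = './/div/ul/li[@id="%s"]' % item_id
    return root.findall(query)


root = ET.fromstring(SAMPLE)
matches = find_item_by_id(root, '123')
print([li.text for li in matches])
```

As with html.xpath(), the result is a list, so a single match is reached with matches[0].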

Thanks so much. 

I had done a web search and couldn't find anything on adding a variable. I ended up just using an if statement to only return the video objects if the value of the id matched my variable.

But it is good to know that I can add a variable to an xpath command.

Is there any good trick for dealing with carousels? I cannot seem to figure out how to get all the data on all the pages.

I did notice that if you can pull up the last section of a carousel, then xpath will pick up everything in all sections of that carousel, but I do not know how you could hard code it to go to the last section.

For carousels, it's usually better to try to uncover the API that the webpage uses to load the carousel. Often, the HTTP traffic will reveal a request and a JSON response when new items are loaded into the carousel. If you can find the request, then you can manipulate it to serve your own purposes.
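As a rough sketch of that approach: once the request is known, the pages can be fetched one after another until the response comes back empty. Everything specific here is an assumption, since real sites differ: the "items" key, the page-number parameter, and the names collect_carousel_items and fake_fetch are hypothetical, and fake_fetch stands in for the real HTTP request.

```python
import json


def collect_carousel_items(fetch_page, max_pages=50):
    """Call fetch_page(page_number) until a page contains no items.

    fetch_page must return the JSON response body as a string.
    """
    items = []
    for page in range(1, max_pages + 1):
        payload = json.loads(fetch_page(page))
        batch = payload.get('items', [])  # hypothetical response key
        if not batch:
            break
        items.extend(batch)
    return items


def fake_fetch(page):
    """Stand-in for the real request; returns canned JSON per page."""
    pages = {1: ['video-a', 'video-b'], 2: ['video-c']}
    return json.dumps({'items': pages.get(page, [])})


all_items = collect_carousel_items(fake_fetch)
print(all_items)
```

In a real channel, fake_fetch would be replaced by a function that requests the URL found in the HTTP traffic with the page number filled in.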

Nice thread that suits my problem with xpath. I need to parse an HTML page and retrieve everything between two "h2" tags.

The HTML looks like this :

       

    Season 1

        
  •      ...
         
         
  •      ...
         
       

    Season 2

        
  •      ...
         
         
         ...
         
         ...
     
I tried to use the XPath axes "preceding-sibling" and "following-sibling", which work without any problem in an XPath tester but cause an error in Plex.
The exact expression I used is:
//*[preceding-sibling::h2[@class='replay-title-type' and text()='Season 1'] and following-sibling::h2[@class='replay-title-type' and text()='Season 2']]

Is there another way to do the same thing, or is there something wrong with my syntax?

Are you trying to parse the "Season 1" and "Season 2" text or the content in the following <li> tags?

The content in the following <li> tags ;-)

There are lots of ways to do it. It depends on whether you want to explicitly grab the content for a specific season or build a list of all the content, subdivided by season.

Where html_page is the content of the webpage, you can explicitly grab the contents of season one:

season_one = html_page.xpath('//h2[text()="Season 1"]/following-sibling::li')

That will create a list of <li> tags. Then you would either iterate through them with a for loop or refer to them explicitly with an index (season_one[0] is the first <li> tag).
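For the other option, a list of all the content subdivided by season, one possible sketch is to walk the siblings in plain Python and start a new group at each h2. This is not an XPath axis: it uses the standard library's xml.etree.ElementTree, which does not support following-sibling. The sample markup and the name group_by_heading are assumptions for illustration. Grouping this way also avoids a side effect of following-sibling, which matches every later sibling, including those after the next heading.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each h2 is followed by sibling elements for that season.
SAMPLE = """
<div>
  <h2 class="replay-title-type">Season 1</h2>
  <ul><li>S1 E1</li><li>S1 E2</li></ul>
  <h2 class="replay-title-type">Season 2</h2>
  <ul><li>S2 E1</li></ul>
</div>
""".strip()


def group_by_heading(parent, heading_tag='h2'):
    """Map each heading's text to the sibling elements that follow it,
    stopping at the next heading."""
    groups = {}
    current = None
    for child in parent:
        if child.tag == heading_tag:
            current = (child.text or '').strip()
            groups[current] = []
        elif current is not None:
            groups[current].append(child)
    return groups


seasons = group_by_heading(ET.fromstring(SAMPLE))
season_one_titles = [li.text
                     for element in seasons['Season 1']
                     for li in element.iter('li')]
print(season_one_titles)
```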
      Hi Mikedm139, thanks a lot for the reply. It's really clear, and I now understand the power of XPath much better.

      I tried your example but I still face the same problem: a critical error in the log. This is the error:

      2014-08-07 08:03:17,677 (2c1c) :  CRITICAL (core:572) - Exception when writing response for request '/video/TF1/mainmenu/S%C3%A9rie%20TF1' (most recent call last):
      Plex Media Server\Plug-ins\Framework.bundle\Contents\Resources\Versions\2\Python\Framework\interfaces\socketinterface.py", line 104, in _handle_request
          status, headers, body = type(self)._core.runtime.handle_request(self.request)
      Plex Media Server\Plug-ins\Framework.bundle\Contents\Resources\Versions\2\Python\Framework\components\runtime.py", line 918, in handle_request
          self._core.log.debug("Response: [%d] %s%d bytes", status, (original_type.__name__ + ", ") if original_type else '', len(body))
      UnboundLocalError: local variable 'body' referenced before assignment
       
      With simple XPath calls (that is, calls that don't use XPath axes), I don't get this error.
       
      Any idea how to solve this?

Okay, the cause of the error was the French accented characters (é, è, ...).

Using the unicode() function solved the problem, and now I can use my first XPath expression, which gives exactly the result I want ;-)
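The post does not say where unicode() was applied, so the following is only a guess at the general idea: decode any byte string to text before it is put into an XPath query. The helper name to_text and the sample query are hypothetical. On Python 2, value.decode('utf-8') is equivalent to unicode(value, 'utf-8'); the sketch is written so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-


def to_text(value, encoding='utf-8'):
    """Return value as a text (unicode) string, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# UTF-8 bytes for "Série TF1", the title seen in the request path above.
raw_title = b'S\xc3\xa9rie TF1'

# Hypothetical query; the point is that the title is text, not raw bytes.
query = u'//h2[text()="%s"]' % to_text(raw_title)
print(query == u'//h2[text()="S\xe9rie TF1"]')
```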
