ptath
1
noob questions
I finished [this](http://forums.plexapp.com/index.php?/topic/18002-plugin-for-non-imdb-site/), and now I'm trying to parse an HTML page for metadata.
This works well (using http://www.kinopoisk.ru/level/1/film/251733/ as an example):
```python
kinopoiskHtml = HTML.ElementFromURL(kinopoiskUrl)
metadata.summary = str(kinopoiskHtml.xpath("//span[@class='_reachbanner_']")[0].text)
metadata.tagline = str(kinopoiskHtml.xpath("//td[@style='color: #555']")[0].text)
```
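Both lines index the XPath result with `[0]`, which raises an `IndexError` as soon as the page layout changes and nothing matches. A minimal defensive sketch, assuming only that the result is a list of elements with a `.text` attribute (lxml and ElementTree elements both qualify); `first_text` is a hypothetical helper, not part of the Plex framework:

```python
def first_text(nodes, default=None):
    """Return the stripped text of the first node, or `default`.

    `nodes` is a list such as the result of an xpath() call. This is a
    hypothetical helper, not part of the Plex framework.
    """
    # Nothing matched, or the first match has no text of its own
    if not nodes or nodes[0].text is None:
        return default
    return nodes[0].text.strip()
```

Inside the plugin it would be called as `metadata.summary = first_text(kinopoiskHtml.xpath("//span[@class='_reachbanner_']"))`, which leaves the field as `None` instead of raising when the span is missing.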
But I'm a complete noob with other cases, especially when a regex is needed. I can't find good examples of how to parse something like this (год means "year", жанр means "genre"):
```html
<tr><td class="type">год</td><td class=""><a href="/level/10/m_act%5Byear%5D/2009/">2009</a></td></tr>
<tr><td class="type">жанр</td><td><a href="/level/10/m_act%5Bgenre%5D/2/">фантастика</a>, <a href="/level/10/m_act%5Bgenre%5D/3/">боевик</a>, <a href="/level/10/m_act%5Bgenre%5D/8/">драма</a>, <a href="/level/10/m_act%5Bgenre%5D/10/">приключения</a>, <a href="/level/92/film/251733/">...</a></td></tr>
```
The year is the text inside the `<a>` tag and is also part of its `href` attribute. Please help me with the XPath.
sander1
2
Hi! You probably don't need a regex in this case (phew ;)). Using the `contains()` function in your XPath can help you find the right `a` tag, like so:
```python
kinopoiskHtml = HTML.ElementFromURL(kinopoiskUrl)
# ...
# ...
year = int(kinopoiskHtml.xpath('//a[contains(@href, "year")]')[0].text)
```
This matches every `a` tag whose `href` attribute contains the string "year", takes the first match (`[0]`), and converts its text to an integer.
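Plex's `HTML` module only exists inside the plugin framework, so for experimenting offline here is a minimal sketch in plain Python 3 that uses the standard-library `html.parser` as a stand-in for XPath. `LinkCollector`, `find_year` and `find_year_from_href` are hypothetical names; the last function shows the regex route the question asked about, assuming the `href` keeps the `.../m_act%5Byear%5D/2009/` shape seen in the sample.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect an (href, text) pair for every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # [href, text] of the <a> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = [dict(attrs).get("href") or "", ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append((self._current[0], self._current[1].strip()))
            self._current = None


def find_year(html):
    """Like //a[contains(@href, "year")]: int text of the first match, or None."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    for href, text in collector.links:
        if "year" in href:
            return int(text)
    return None


def find_year_from_href(html):
    """Regex route: read the trailing 4-digit path segment of an href containing "year"."""
    match = re.search(r'href="[^"]*year[^"]*/(\d{4})/"', html)
    return int(match.group(1)) if match else None


sample = '<td class=""><a href="/level/10/m_act%5Byear%5D/2009/">2009</a></td>'
print(find_year(sample), find_year_from_href(sample))  # 2009 2009
```

Both give the same answer on this markup; the `contains()` version reads the visible text, while the regex version depends on the URL layout staying the same.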
ptath
3
Oh, thank you, it works =)
Any hints on how to deal with lists (genres, actors, etc., like in the HTML code above)?
sander1
4
I haven't worked with genres yet, but judging from the Cine-Passion agent, it should be something like this:
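```python
# Start from an empty genre list
metadata.genres.clear()
# Every link whose href contains "genre" (in the sample above, the trailing "..." link does not match)
genres = kinopoiskHtml.xpath('//a[contains(@href, "genre")]')

for genre in genres:
    metadata.genres.add(genre.text.strip())
```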
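To check which links that `contains(@href, "genre")` filter picks up on the sample row from the question, here is a minimal standalone sketch in plain Python 3, assuming the standard-library `html.parser` as a stand-in for Plex's XPath support (`LinksContaining` and `link_texts` are hypothetical names). It returns the stripped text of every matching link, so the trailing "..." link, whose `href` has no "genre" in it, is left out.

```python
from html.parser import HTMLParser


class LinksContaining(HTMLParser):
    """Collect the text of every <a> whose href contains `needle`,
    roughly what //a[contains(@href, needle)] selects in XPath."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.texts = []
        self._buffer = None  # text of the matching <a> being read, if any

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and self.needle in href:
            self._buffer = ""

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer += data

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.texts.append(self._buffer.strip())
            self._buffer = None


def link_texts(html, needle):
    """Return the stripped text of each link whose href contains `needle`."""
    parser = LinksContaining(needle)
    parser.feed(html)
    parser.close()
    return parser.texts


row = ('<td><a href="/level/10/m_act%5Bgenre%5D/2/">фантастика</a>, '
       '<a href="/level/10/m_act%5Bgenre%5D/3/">боевик</a>, '
       '<a href="/level/92/film/251733/">...</a></td>')
genres = link_texts(row, "genre")  # ['фантастика', 'боевик']
```

The `needle` argument plays the role of the string inside `contains()`. Whether actors can be picked out the same way depends on their links sharing a recognisable `href` fragment, which is worth checking in the page source first.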