XPath works really well for this kind of thing, as it lets you drill down by specifying tags (including id and class attributes) and get lists of items from the HTML that way.
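To make the "drill down" idea concrete, here is a minimal sketch. The page fragment, the id and the class names are all made up for illustration, and it uses Python's standard-library ElementTree, which only supports a subset of XPath and needs well-formed markup. The helpers your plugin framework provides for fetching and querying HTML will likely look different, so treat this as an illustration of the expression rather than the way existing plugins do it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment; real pages will differ.
PAGE = """
<html>
  <body>
    <div id="videos">
      <ul class="listing">
        <li class="item"><a href="/watch/1">First clip</a></li>
        <li class="item"><a href="/watch/2">Second clip</a></li>
        <li class="ad"><a href="/promo">Sponsored</a></li>
      </ul>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Drill down: find the div with id="videos", then every li with
# class="item" beneath it, then the link directly inside each one.
links = root.findall(".//div[@id='videos']//li[@class='item']/a")

# Collect (title, url) pairs from the matched links.
items = [(a.text, a.get("href")) for a in links]
for title, url in items:
    print(title, url)
```

The `li` with class "ad" is skipped because the class predicate does not match it, which is the sort of filtering that makes XPath handy for picking lists of items out of a page.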
If you have a look at some of the existing plugins you’ll find that a lot of them use it, which should give you a good idea of how it works. Personally, I found looking at actual code from other plugins a lot easier than googling for it.
I just started “learning” to make my own plug-in exactly the way you suggested … by looking at existing code. Along the way I have questions and try to find answers here.
I will start a thread where I just write down what I discover … this should hopefully help get more folks interested in creating new plug-ins.
I think you may find that using lxml.etree will be pretty tedious compared to just going with XPath, not that I want to talk you out of lxml if you have a good reason for using it. I would note that the added performance of lxml is largely irrelevant in this case, since you are only working with a single page (or perhaps a couple, depending on the context) on a desktop machine. Performance with this kind of thing only really becomes an issue when you’re handling thousands of documents, say for data mining or in a server context, so I wouldn’t make it too high a priority.