To extract data from web pages, you usually parse each HTML document into a DOM tree and then query that tree with a library such as BeautifulSoup or the ElementTree API. Some libraries also support XPath expressions, which can express more complex traversal and search patterns.
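As a rough illustration of the difference, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree`, which implements a small subset of XPath. The `PAGE` snippet, its class names, and its URLs are made up for the example, and it is well-formed markup, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a parsed web page.
PAGE = """
<html><body>
  <div class="post"><h2>First</h2><a href="/a">read</a></div>
  <div class="post"><h2>Second</h2><a href="/b">read</a></div>
  <div class="ad"><a href="/x">buy</a></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Manual traversal: walk the tree and filter in Python code.
manual = []
for div in root.iter("div"):
    if div.get("class") == "post":
        for a in div.iter("a"):
            manual.append(a.get("href"))

# The same lookup as a single path expression (ElementTree's XPath subset).
with_path = [a.get("href") for a in root.findall(".//div[@class='post']/a")]

print(manual)     # ['/a', '/b']
print(with_path)  # ['/a', '/b']
```

Both approaches return the same links. The path expression keeps the selection logic in one declarative string instead of nested loops.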
Everything about XPath 1.0 is defined in the W3C's lengthy specification, but it can be obscure on first reading. The basics are quite simple to grasp, though, and this talk will go over the most useful syntax patterns you need to get started.
What we'll cover:
- axes, and how to look for specific tags, parent elements, children, or sibling nodes
- predicates, and selecting nodes based on attribute or content values
- built-in string functions that you should know about
- EXSLT extensions supported by lxml, and how they can solve tricky lookups
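For a taste of each topic, here is a small sketch using lxml. It assumes lxml is installed. The `PAGE` markup, its ids, class names, and URLs are invented for illustration only.

```python
from lxml import html

# Hypothetical page fragment; every id, class and URL here is made up.
PAGE = """
<html><body>
  <ul id="menu">
    <li class="item"><a href="/docs/intro.html">Intro</a></li>
    <li class="item active"><a href="/docs/axes.html">  Axes </a></li>
    <li class="item"><a href="/blog/2013-07-01-news.html">News</a></li>
  </ul>
</body></html>
"""

doc = html.fromstring(PAGE)

# Axes: siblings that follow the active item, and an ancestor of a link.
after_active = doc.xpath(
    '//li[contains(@class, "active")]/following-sibling::li/a/text()')
menu_id = doc.xpath('//a[@href="/docs/axes.html"]/ancestor::ul/@id')

# Predicates: filter nodes on an attribute value.
docs_links = doc.xpath('//a[starts-with(@href, "/docs/")]/@href')

# String functions: normalize-space() trims and collapses whitespace.
label = doc.xpath('normalize-space(//li[contains(@class, "active")]/a)')

# EXSLT: regular expressions through the "re" namespace supported by lxml.
ns = {"re": "http://exslt.org/regular-expressions"}
dated = doc.xpath(
    r'//a[re:test(@href, "\d{4}-\d{2}-\d{2}")]/@href', namespaces=ns)

print(after_active)  # ['News']
print(menu_id)       # ['menu']
print(docs_links)    # ['/docs/intro.html', '/docs/axes.html']
print(label)         # Axes
print(dated)         # ['/blog/2013-07-01-news.html']
```

Note that `contains(@class, "active")` is a substring test, so it would also match a class such as `inactive`. It is good enough for a sketch, but stricter matching is one of the tricky lookups worth handling more carefully.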