Monday, January 31, 2011

XPath at HTML in Python

This technique allows to treat HTML code as XML (if even HTML is not totally valid) and use XPath expressions over it.
from lxml import etree

content = '... some html ...'
# use the HTML parser explicitly to provide encoding
parser = etree.HTMLParser(encoding='utf-8')
# load the content using the parser
tree = etree.fromstring(content, parser)
# we've got a XML tree from HTML
# now get all links in the doc
links = tree.xpath(".//*/a")
for link in links:
    href = link.get('href') # get tag's attribute
    name = link.text() # text between open and close tags

Some links:
API reference
Usage tutorial

Monday, January 17, 2011

How to add your own favicon to your blogger.com blog

To get your own favicon to your blog:

  1. Store a picture you want to be the favicon in PNG and ICO formats in any web-accessible place. I used Google Docs for this because it allows to store any kind of files in public access.
  2. Put in your template this code
    <link href="url/to/favicon.png" rel="icon" type="image/png"></link>
    <link href="url/to/favicon.ico" rel="shortcut icon"></link>
  3. Notice that you must place the code above after
    <b:include data='blog' name='all-head-content'/>
    
    only. Otherwise blogger.com writes its own favicon and you waste your efforts.