python - Adding spaces on word boundaries in text extraction with lxml

- August 15, 2014

Here is an example from the lxml.html documentation:

>>> from lxml import html
>>> root = html.fragment_fromstring('<p>hello<br>world!</p>')
>>> html.tostring(root, method='text')
'helloworld!'

My question: is there an easy (or "right") way of producing the string 'hello world!' instead?

You can try this approach, which prepends a space to the tail text of every <br> element before extracting the text:

from lxml import html

doc = html.document_fromstring('<p>hello<br>world!</p>')

# The text after a <br> is stored in its .tail, so prepend a space to it.
for br in doc.xpath("*//br"):
    br.tail = " " + br.tail if br.tail else " "

doc.text_content()

This returns:

'hello world!'
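The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: the name `text_with_spaces` and its `tags` parameter are hypothetical and not part of lxml, and it assumes the input is an HTML fragment rather than a full document.

```python
from lxml import html


def text_with_spaces(markup, tags=("br",)):
    """Return the text of an HTML fragment with a space after each listed tag.

    Hypothetical helper, not an lxml API.
    """
    # Wrap the fragment in a <div> so that several top-level elements are allowed.
    root = html.fragment_fromstring(markup, create_parent="div")
    for el in root.iter(*tags):
        # Text following an element lives in its .tail (None when absent).
        el.tail = " " + (el.tail or "")
    return root.text_content()


result = text_with_spaces("<p>hello<br>world!</p>")
print(result)
```

Passing other tag names in `tags` (for example `("br", "p")`) applies the same spacing after those elements as well, which may leave a trailing space that can be removed with `str.strip()`.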
