python - Adding spaces on word boundaries in text extraction with lxml


Here is an example from the lxml.html documentation:

>>> from lxml import html
>>> root = html.fragment_fromstring('<p>hello<br>world!</p>')
>>> html.tostring(root, method='text')
'helloworld!'

My question: is there an easy (or "right") way to produce the string 'hello world!' instead?

You can try this approach, which prepends a space to the text that follows each <br> element (its tail):

from lxml import html
doc = html.document_fromstring('<p>hello<br>world!</p>')
for br in doc.xpath("*//br"):
    br.tail = " " + br.tail if br.tail else " "
doc.text_content()

This returns:

'hello world!' 
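The same idea can be wrapped in a small reusable function. This is only a sketch: the name text_with_br_spaces is hypothetical, and wrapping the input in a div via create_parent is an assumption made so that fragments with several top-level nodes also parse.

```python
from lxml import html


def text_with_br_spaces(markup):
    """Return the text of an HTML fragment, with a space where each <br> was."""
    # create_parent="div" wraps the fragment so bare text and multiple
    # top-level elements are accepted (an assumption of this sketch).
    root = html.fragment_fromstring(markup, create_parent="div")
    for br in root.iter("br"):
        # Text that follows a <br> is stored in its .tail; prefix it with a space.
        br.tail = " " + (br.tail or "")
    return root.text_content()


print(text_with_br_spaces("<p>hello<br>world!</p>"))
```

Note that a <br> with nothing after it leaves a trailing space in the result, so you may want to call .strip() on the returned string.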
