Python - Customize BeautifulSoup's prettify by tag
I was wondering if it is possible to make prettify not create new lines on specific tags.
I would like to make it so that span and a tags do not get split up, for example:
    from bs4 import BeautifulSoup as bs

    doc = """<div><div><span>a</span><span>b</span> <a>link</a></div><a>link1</a><a>link2</a></div>"""

    # The parser is named explicitly so that bs4 does not have to guess one.
    soup = bs(doc, "html.parser")
    print(soup.prettify())
Below is what I want it to print:

    <div>
        <div>
            <span>a</span><span>b</span>
            <a>link</a>
        </div>
        <a>link1</a><a>link2</a>
    </div>

But this is what it prints:

    <div>
     <div>
      <span>
       a
      </span>
      <span>
       b
      </span>
      <a>
       link
      </a>
     </div>
     <a>
      link1
     </a>
     <a>
      link2
     </a>
    </div>
Placing inline-styled tags on new lines like that adds space between them, slightly altering how the actual page looks. Here are two jsfiddles displaying the difference:
anchor tags on new lines
anchor tags next to each other
If you're wondering why that matters for BeautifulSoup, it is because I am writing a web-page debugger, and the prettify function would be very useful (along with other things in bs4). But if I prettify the document, I risk altering some things.
So, is there any way to customize the prettify function so that I can set it to not break certain tags?
I'm posting a quick hack while I don't find a better solution.
I'm actually using it on my project to avoid breaking textareas and pre tags. Replace ['span', 'a'] with the tags on which you want to prevent indentation.
    from bs4 import BeautifulSoup

    markup = """<div><div><span>a</span><span>b</span> <a>link</a></div><a>link1</a><a>link2</a></div>"""

    # Double the curly brackets to avoid problems with .format()
    stripped_markup = markup.replace('{', '{{').replace('}', '}}')

    stripped_markup = BeautifulSoup(stripped_markup, 'html.parser')

    unformatted_tag_list = []

    # Swap each protected tag for a format placeholder and remember its markup
    for i, tag in enumerate(stripped_markup.find_all(['span', 'a'])):
        unformatted_tag_list.append(str(tag))
        tag.replace_with('{' + 'unformatted_tag_list[{0}]'.format(i) + '}')

    # Prettify the stripped tree, then put the untouched tags back
    pretty_markup = stripped_markup.prettify().format(unformatted_tag_list=unformatted_tag_list)

    print(pretty_markup)
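The same placeholder idea can be wrapped in a small reusable function. The following is only a sketch under a few assumptions: `prettify_keeping` is a hypothetical helper name (it is not part of bs4), the `@@KEEP<n>@@` token is an arbitrary marker assumed not to occur in the real markup, and `html.parser` is assumed as the parser. It uses plain string replacement instead of `.format()`, so curly brackets in the document do not need to be escaped.

```python
from bs4 import BeautifulSoup


def prettify_keeping(markup, tag_names, parser="html.parser"):
    """Prettify markup while leaving the given tags unbroken.

    Hypothetical helper, not a bs4 API. Assumes the "@@KEEP<n>@@"
    marker never appears in the markup itself.
    """
    soup = BeautifulSoup(markup, parser)
    saved = []
    for i, tag in enumerate(soup.find_all(tag_names)):
        # Remember the tag exactly as bs4 serializes it without prettify
        saved.append(str(tag))
        # Put a plain-text marker in its place
        tag.replace_with("@@KEEP{0}@@".format(i))
    pretty = soup.prettify()
    # Swap every marker back for the saved, unformatted tag
    for i, original in enumerate(saved):
        pretty = pretty.replace("@@KEEP{0}@@".format(i), original)
    return pretty


doc = "<div><div><span>a</span><span>b</span> <a>link</a></div><a>link1</a><a>link2</a></div>"
print(prettify_keeping(doc, ["span", "a"]))
```

Like the hack above, this only stops prettify from splitting the inside of each protected tag. prettify still decides where the placeholders themselves are placed, so adjacent inline tags may still end up on separate lines.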