BeautifulSoup in Python Not Parsing Right
I am running Python 2.7.5 and using the built-in html parser for what I am about to describe. The task I am trying to accomplish is to take a chunk of html that is essentially a re
Solution 1:
This is not a parsing problem; it is an encoding problem.
Whenever you work with text that might contain non-ASCII characters (or write Python source files that contain such characters, e.g. in comments or docstrings), you should put a coding cookie on the first line, or on the second line if the file starts with a shebang:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
... and make sure this matches the encoding the file is actually saved in (with vim: :set fenc=utf-8).
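As a minimal sketch, a complete file using the cookie might look like the following, assuming the file really is saved as UTF-8. The variable name text is only for illustration. Python 2.7 needs the cookie as soon as the source contains non-ASCII bytes; Python 3 already defaults to UTF-8, so there the cookie is optional but harmless.

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# A unicode literal containing a non-ASCII character. In Python 2.7 the
# cookie above tells the interpreter how to decode these source bytes.
text = u"café"

# Counts characters, not bytes.
print(len(text))  # prints 4
```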
Solution 2:
BeautifulSoup tries to guess the encoding and sometimes gets it wrong. You can specify the encoding explicitly by passing the from_encoding parameter, for example:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_text, "html.parser", from_encoding="UTF-8")
The encoding is usually declared in the Content-Type HTTP header of the response, or in a meta tag in the head of the page itself.
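To illustrate taking the encoding from a header instead of hard-coding it, here is a rough sketch. The helper names charset_from_content_type and make_soup are hypothetical, not part of BeautifulSoup, and the sketch assumes you already have the raw response bytes and the Content-Type header value from whatever HTTP client you use. Note that from_encoding only has an effect when the markup is a byte string; if you pass already-decoded text, BeautifulSoup ignores it.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset named in a Content-Type header value, or default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value it finds.
    return msg.get_content_charset(default)


def make_soup(raw_bytes, content_type):
    """Parse raw_bytes, using the header's charset if it names one."""
    # Third-party import kept local so the helper above works without bs4.
    from bs4 import BeautifulSoup

    charset = charset_from_content_type(content_type)
    # from_encoding=None lets BeautifulSoup fall back to its own detection.
    return BeautifulSoup(raw_bytes, "html.parser", from_encoding=charset)


print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))  # None
```

If the header names no charset, the sketch passes None and leaves the guessing to BeautifulSoup, which matches its default behaviour.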