Beautifulsoup In Python Not Parsing Right

I am running Python 2.7.5 and using the built-in html parser for what I am about to describe. The task I am trying to accomplish is to take a chunk of html that is essentially a re

Solution 1:

This is not a parsing problem but an encoding problem.

Whenever you work with text that might contain non-ASCII characters (or when the Python source file itself contains such characters, e.g. in comments or docstrings), put a coding cookie on the first line or, if there is a shebang line, on the second:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

... and make sure this matches your file encoding (with vim: :set fenc=utf-8).
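To illustrate, here is a minimal sketch of a source file that carries the cookie and converts between text and bytes explicitly. The variable names and the sample string are made up for illustration, and the file is assumed to be saved as UTF-8:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The cookie above tells Python that this source file is saved as UTF-8.

# A text literal containing a non-ASCII character
# (the u"" prefix works on Python 2.7 and on 3.3+).
text = u"café"

# Encode explicitly when bytes are needed (writing to a file, a socket, ...).
encoded = text.encode("utf-8")

# Decode explicitly when bytes come in from outside.
decoded = encoded.decode("utf-8")

# "é" takes two bytes in UTF-8, so `encoded` is one byte longer than `text`.
same_after_round_trip = (decoded == text)
```

Keeping the encode and decode steps explicit makes it easier to see where a mismatch between the declared and the actual encoding would creep in.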

Solution 2:

BeautifulSoup tries to guess the encoding and sometimes gets it wrong. You can specify the encoding explicitly with the from_encoding parameter, for example:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_text, from_encoding="UTF-8")

The encoding is usually declared in the page's headers: either the HTTP Content-Type response header or a meta tag in the HTML head.
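If the charset arrives in an HTTP Content-Type header, it can be extracted with the standard library before the bytes are handed to BeautifulSoup. This is a minimal sketch, assuming the header value is already available as a string. `charset_from_content_type` is a hypothetical helper name (not part of BeautifulSoup), and the sample header, the sample bytes and the "utf-8" fallback are assumptions made for illustration:

```python
# -*- coding: utf-8 -*-
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or `default`.

    Hypothetical helper for illustration; the fallback is an assumption.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset() or default


# Example header value (made up for this sketch).
header = "text/html; charset=ISO-8859-1"
encoding = charset_from_content_type(header)

# Bytes as they might arrive from the network: "café" encoded as Latin-1.
raw_html = b"<p>caf\xe9</p>"
html_text = raw_html.decode(encoding)
```

The resulting name could equally be passed as from_encoding together with the raw bytes. Note that from_encoding only matters for byte input; BeautifulSoup ignores it when the markup is already a decoded Unicode string.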
