◆ __init__()
def __init__(self, lang_filter=None)
◆ charset_name()
Reimplemented in HebrewProber, Latin1Prober, SingleByteCharSetProber, EscCharSetProber, MultiByteCharSetProber, CharSetGroupProber, UTF8Prober, EUCJPProber, SJISProber, CP949Prober, Big5Prober, EUCKRProber, EUCTWProber, and GB2312Prober.
◆ feed()
◆ filter_high_byte_only()
def filter_high_byte_only(buf)
static
◆ filter_international_words()
def filter_international_words(buf)
static
We define three types of bytes:
alphabet: English letters [a-zA-Z]
international: international (high-byte) characters [\x80-\xFF]
marker: everything else [^a-zA-Z\x80-\xFF]
The input buffer can be viewed as a series of words delimited by
markers. This function keeps only the words that contain at least one
international character and discards the rest. Each contiguous sequence
of markers is replaced by a single ASCII space character.
This filter applies to all scripts that do not use English letters.
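The word filtering described above can be illustrated with a short standalone sketch. It is not the code in charsetprober.py: the helper name `keep_international_words` is hypothetical, and the handling of leading and trailing spaces is an assumption made for simplicity.

```python
import re

# Hypothetical helper, written from the description above rather than
# copied from the library.
MARKER_RUN = re.compile(rb"[^a-zA-Z\x80-\xFF]+")   # one or more marker bytes
HIGH_BYTE = re.compile(rb"[\x80-\xFF]")            # any international byte


def keep_international_words(buf: bytes) -> bytes:
    # Split the buffer into words at every run of marker bytes.
    words = MARKER_RUN.split(buf)
    # Keep only the words that contain at least one high byte.
    kept = [word for word in words if HIGH_BYTE.search(word)]
    # Each run of markers collapses to a single ASCII space.
    return b" ".join(kept)


sample = b"abc \xe9t\xe9, hello!! na\xefve"
print(keep_international_words(sample))
```

In this sample, `abc` and `hello` are dropped because they contain only English letters, while the two words with high bytes are kept and separated by one space.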
◆ filter_with_english_letters()
def filter_with_english_letters(buf)
static
Returns a copy of ``buf`` that retains only the sequences of English
alphabet and high byte characters that are not between <> characters.
Also retains English alphabet and high byte characters immediately
before occurrences of >.
This filter can be applied to all scripts which contain both English
characters and extended ASCII characters, but is currently only used by
``Latin1Prober``.
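One way to read the description above is as a single byte-by-byte scan that tracks whether the current position is inside a `<...>` tag. The sketch below follows that reading. The function name `keep_text_outside_tags` is hypothetical, and details such as emitting a space after each kept run are assumptions, not a statement of what the library does.

```python
def keep_text_outside_tags(buf: bytes) -> bytes:
    filtered = bytearray()
    in_tag = False   # True between '<' and the next '>'
    prev = 0         # start of the current run of letters / high bytes

    for curr in range(len(buf)):
        ch = buf[curr:curr + 1]   # slice so we compare bytes, not ints
        if ch == b">":
            in_tag = False
        elif ch == b"<":
            in_tag = True

        # A delimiter is any ASCII byte that is not an English letter.
        if ch < b"\x80" and not ch.isalpha():
            if curr > prev and not in_tag:
                # Keep the run that just ended, followed by one space.
                filtered.extend(buf[prev:curr])
                filtered.extend(b" ")
            prev = curr + 1

    if not in_tag:
        # Keep whatever run is still open at the end of the buffer.
        filtered.extend(buf[prev:])
    return bytes(filtered)


print(keep_text_outside_tags(b"<b>na\xefve caf\xe9 </b>"))
```

With this input, the tag name `b` survives because it sits immediately before `>`, which matches the second sentence of the description.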
◆ get_confidence()
def get_confidence(self)
◆ reset()
◆ state()
◆ lang_filter
◆ logger
◆ SHORTCUT_THRESHOLD
float SHORTCUT_THRESHOLD = 0.95
static
The documentation for this class was generated from the following file:
- /home/passerat/Stage/flaskProject/venv/lib/python3.8/site-packages/chardet/charsetprober.py