OpenQuizz
An application for managing educational content
UniversalDetector Class Reference

Public Member Functions

def __init__ (self, lang_filter=LanguageFilter.ALL)
 
def reset (self)
 
def feed (self, byte_str)
 
def close (self)
 

Data Fields

 result
 
 done
 
 lang_filter
 
 logger
 

Static Public Attributes

float MINIMUM_THRESHOLD = 0.20
 
 HIGH_BYTE_DETECTOR = re.compile(b'[\x80-\xFF]')
 
 ESC_DETECTOR = re.compile(b'(\033|~{)')
 
 WIN_BYTE_DETECTOR = re.compile(b'[\x80-\x9F]')
 
dictionary ISO_WIN_MAP
 

Detailed Description

The ``UniversalDetector`` class underlies the ``chardet.detect`` function
and coordinates all of the different charset probers.

To get a ``dict`` containing an encoding and its confidence, you can simply
run:

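.. code::

        from chardet.universaldetector import UniversalDetector

        u = UniversalDetector()
        u.feed(some_bytes)   # some_bytes: the raw bytes to analyze
        u.close()            # finalize the prediction
        detected = u.result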

Constructor & Destructor Documentation

◆ __init__()

def __init__ (self, lang_filter=LanguageFilter.ALL)

Member Function Documentation

◆ close()

def close (self)
Stop analyzing the current document and come up with a final
prediction.

:returns:  The ``result`` attribute, a ``dict`` with the keys
   ``encoding``, ``confidence``, and ``language``.

◆ feed()

def feed (self, byte_str)
Takes a chunk of a document and feeds it through all of the relevant
charset probers.

After calling ``feed``, you can check the value of the ``done``
attribute to see if you need to continue feeding the
``UniversalDetector`` more data, or if it has made a prediction
(in the ``result`` attribute).

.. note::
   You should always call ``close`` when you're done feeding in your
   document if ``done`` is not already ``True``.
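The ``feed`` / ``done`` / ``close`` protocol described above can be sketched as a small helper. This is a minimal sketch rather than chardet's own code: ``detect_chunks`` is a hypothetical name, and it assumes only that the object passed in exposes ``feed``, ``close``, ``done``, and ``result`` as documented on this page.

```python
def detect_chunks(detector, chunks):
    """Feed chunks to a detector until it reports done, then finalize.

    `detect_chunks` is a hypothetical helper, not part of chardet.
    `detector` is assumed to behave like UniversalDetector: it has
    feed(), close(), and the `done` and `result` attributes.
    """
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            # A prediction is already available; stop reading input.
            break
    if not detector.done:
        # Input ran out first, so ask for a final prediction explicitly.
        detector.close()
    return detector.result
```

Assuming chardet is installed, passing ``UniversalDetector()`` and a file opened in binary mode (which yields one ``bytes`` line per iteration) should let the helper stop reading as soon as the detector is confident.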

◆ reset()

def reset (self)
Reset the UniversalDetector and all of its probers back to their
initial states.  This is called by ``__init__``, so you only need to
call this directly in between analyses of different documents.
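Reusing one detector across several documents, as described above, might look like the following. It is a sketch under assumptions: ``detect_each`` is a hypothetical helper, and the detector argument is assumed to follow the ``reset`` / ``feed`` / ``close`` interface documented here.

```python
def detect_each(detector, documents):
    """Return one result per document, reusing a single detector.

    `detect_each` is a hypothetical helper, not part of chardet.
    `documents` is an iterable of bytes objects.
    """
    results = []
    for data in documents:
        # Clear any state left over from the previous document.
        detector.reset()
        detector.feed(data)
        detector.close()
        # Copy the dict so later analyses cannot alter this entry.
        results.append(dict(detector.result))
    return results
```

Without the ``reset`` call, bytes from an earlier document would still influence the prediction for the next one.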

Field Documentation

◆ done

done

◆ ESC_DETECTOR

ESC_DETECTOR = re.compile(b'(\033|~{)')
static

◆ HIGH_BYTE_DETECTOR

HIGH_BYTE_DETECTOR = re.compile(b'[\x80-\xFF]')
static
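This page does not say how the detector uses ``ESC_DETECTOR`` and ``HIGH_BYTE_DETECTOR``. The sketch below assumes they pre-screen the input so that only relevant probers need to run; the ``classify`` function and its string labels are hypothetical, while the two patterns are copied from the attributes listed here.

```python
import re

# Patterns copied from the class attributes in this reference.
HIGH_BYTE_DETECTOR = re.compile(b'[\x80-\xFF]')
ESC_DETECTOR = re.compile(b'(\033|~{)')


def classify(data):
    """Roughly classify a bytes object (hypothetical helper).

    The order of the checks is an assumption: any byte >= 0x80 wins
    over an escape sequence.
    """
    if HIGH_BYTE_DETECTOR.search(data):
        return "high_byte"   # contains at least one non-ASCII byte
    if ESC_DETECTOR.search(data):
        return "escape"      # contains ESC (0x1B) or the two bytes "~{"
    return "pure_ascii"
```

ESC and ``~{`` are the markers that open escape-based encodings such as ISO-2022-JP and HZ-GB-2312, which is presumably why they are singled out.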

◆ ISO_WIN_MAP

dictionary ISO_WIN_MAP
static
Initial value:
= {'iso-8859-1': 'Windows-1252',
   'iso-8859-2': 'Windows-1250',
   'iso-8859-5': 'Windows-1251',
   'iso-8859-6': 'Windows-1256',
   'iso-8859-7': 'Windows-1253',
   'iso-8859-8': 'Windows-1255',
   'iso-8859-9': 'Windows-1254',
   'iso-8859-13': 'Windows-1257'}
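One plausible use of ``ISO_WIN_MAP`` together with ``WIN_BYTE_DETECTOR`` is sketched below. Bytes 0x80 to 0x9F are control codes in the ISO-8859 family but mostly printable characters in the Windows code pages, so their presence suggests the Windows variant. The function ``prefer_windows_name`` is hypothetical, and that the detector applies the map this way is an assumption not confirmed by this page.

```python
import re

# Copied from the class attributes in this reference.
WIN_BYTE_DETECTOR = re.compile(b'[\x80-\x9F]')
ISO_WIN_MAP = {'iso-8859-1': 'Windows-1252',
               'iso-8859-2': 'Windows-1250',
               'iso-8859-5': 'Windows-1251',
               'iso-8859-6': 'Windows-1256',
               'iso-8859-7': 'Windows-1253',
               'iso-8859-8': 'Windows-1255',
               'iso-8859-9': 'Windows-1254',
               'iso-8859-13': 'Windows-1257'}


def prefer_windows_name(encoding, data):
    """Swap an ISO-8859 name for its Windows counterpart (hypothetical).

    The swap happens only when `data` contains a byte in 0x80-0x9F and
    the lower-cased name has an entry in ISO_WIN_MAP.
    """
    if encoding is not None and WIN_BYTE_DETECTOR.search(data):
        return ISO_WIN_MAP.get(encoding.lower(), encoding)
    return encoding
```

For example, text containing the byte 0x93 (a curly quote in Windows-1252) would be relabeled from ``ISO-8859-1`` to ``Windows-1252``, while names without a map entry pass through unchanged.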

◆ lang_filter

lang_filter

◆ logger

logger

◆ MINIMUM_THRESHOLD

float MINIMUM_THRESHOLD = 0.20
static
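How ``MINIMUM_THRESHOLD`` is applied is not documented on this page. The sketch below assumes it acts as a confidence floor: the best candidate is reported only if its confidence is strictly greater than the threshold. The ``best_guess`` function and its ``(encoding, confidence)`` input format are hypothetical.

```python
MINIMUM_THRESHOLD = 0.20  # value copied from the class attribute


def best_guess(candidates):
    """Pick the most confident (encoding, confidence) pair (hypothetical).

    Returns a dict with an encoding of None when no candidate clears
    MINIMUM_THRESHOLD; the strict comparison is an assumption.
    """
    best = max(candidates, key=lambda pair: pair[1], default=None)
    if best is None or best[1] <= MINIMUM_THRESHOLD:
        return {"encoding": None, "confidence": 0.0}
    return {"encoding": best[0], "confidence": best[1]}
```

Under this assumption, a caller should be prepared for ``result['encoding']`` to be ``None`` on short or ambiguous input.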

◆ result

result

◆ WIN_BYTE_DETECTOR

WIN_BYTE_DETECTOR = re.compile(b'[\x80-\x9F]')
static

The documentation for this class was generated from the following file: