acres.util package
Package with general utility modules.
Submodules
acres.util.acronym module
Utility functions related to acronyms.
class acres.util.acronym.Acronym(acronym, left_context, right_context)
Bases: tuple
- property acronym: Alias for field number 0
- property left_context: Alias for field number 1
- property right_context: Alias for field number 2
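The "Alias for field number N" descriptions are the docstrings that `collections.namedtuple` generates, which suggests that `Acronym` behaves like a three-field named tuple. A minimal sketch under that assumption, using a local stand-in definition rather than importing from `acres` (the sample strings are made up):

```python
from collections import namedtuple

# Local stand-in mirroring the documented fields; not the acres class itself.
Acronym = namedtuple("Acronym", ["acronym", "left_context", "right_context"])

# Hypothetical sample: the acronym "EKG" with the text to its left and right.
item = Acronym(acronym="EKG", left_context="Patient mit", right_context="ohne Befund")

# Fields are reachable by name or by position, and the tuple unpacks as usual.
print(item.acronym)
print(item[1])
acronym, left, right = item
```

Because the base is `tuple`, instances are immutable and can be used as dictionary keys.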
acres.util.acronym.create_german_acronym(full)
Creates an acronym out of a given multi-word expression.
@todo Use is_stopword?
- Parameters: full (str) – A full form containing whitespace.
- Return type: str
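The exact rules used by `create_german_acronym` are not described here. As a rough illustration of the general idea only, the sketch below builds an acronym from the first letter of each whitespace-separated word. The name `sketch_acronym` and the first-letter rule are assumptions, not the library's behaviour:

```python
def sketch_acronym(full: str) -> str:
    """Illustrative only: join the upper-cased first letter of each word."""
    # str.split() with no argument never yields empty strings, so word[0] is safe.
    return "".join(word[0].upper() for word in full.split())


print(sketch_acronym("Computer Tomographie"))  # CT
```

A stopword filter, as hinted at by the @todo above, would slot in as a condition inside the generator expression.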
acres.util.functions module
Module with general functions.
acres.util.functions.create_ngram_statistics(input_string, n_min, n_max)
Creates a dictionary that counts each n-gram in an input string. Delimiters are spaces.
Example: for bigrams and trigrams, n_min = 2 and n_max = 3. PROBE: # print(WordNgramStat('a ab aa a a a ba ddd', 1, 4))
- Parameters: input_string (str), n_min (int), n_max (int)
- Return type: Dict[str, int]
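The counting described above can be sketched as follows. This is a stand-alone approximation, not the library's code: the name `sketch_ngram_statistics` is hypothetical, and it assumes that n-grams are built from whitespace-separated tokens, keyed by the tokens joined with a single space, and that `n_min` is at least 1.

```python
from collections import Counter
from typing import Dict


def sketch_ngram_statistics(input_string: str, n_min: int, n_max: int) -> Dict[str, int]:
    """Count every word n-gram with n_min <= n <= n_max (assumes n_min >= 1)."""
    tokens = input_string.split()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens over the input; empty when n > len(tokens).
        for start in range(len(tokens) - n + 1):
            counts[" ".join(tokens[start:start + n])] += 1
    return dict(counts)


stats = sketch_ngram_statistics("a ab aa a a a ba ddd", 1, 4)
print(stats["a"])      # 4
print(stats["a a"])    # 2
print(stats["a a a"])  # 1
```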
acres.util.functions.import_conf(key)
- Parameters: key (str)
- Return type: Optional[str]
acres.util.functions.is_stopword(str_in)
Tests whether a word is a stopword, according to a list.
For German, source: http://snowball.tartarus.org/algorithms/german/stop.txt
- Parameters: str_in (str)
- Return type: bool
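A list-based stopword test comes down to a set lookup. The sketch below uses a tiny hand-picked sample of common German function words instead of the full Snowball list. The name `sketch_is_stopword`, the sample set, and the case-insensitive comparison are all assumptions for illustration:

```python
# Tiny illustrative sample; a real stopword list is far longer.
SAMPLE_STOPWORDS = frozenset({"der", "die", "das", "und", "oder", "nicht"})


def sketch_is_stopword(str_in: str) -> bool:
    """Illustrative only: case-insensitive membership test against the sample set."""
    return str_in.lower() in SAMPLE_STOPWORDS


print(sketch_is_stopword("Und"))   # True
print(sketch_is_stopword("Herz"))  # False
```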
acres.util.functions.partition(word, partitions)
Find a bucket for a given word.
- Parameters: word (str), partitions (int)
- Return type: int
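How `partition` maps a word to a bucket is not stated here. One common way to do this kind of bucketing is to hash the word and take the result modulo the number of partitions. The sketch below uses `zlib.crc32` purely as an assumed stand-in hash, and `sketch_partition` is a hypothetical name:

```python
import zlib


def sketch_partition(word: str, partitions: int) -> int:
    """Illustrative only: map a word to a bucket index in range(partitions)."""
    if partitions < 1:
        raise ValueError("partitions must be a positive integer")
    # crc32 is deterministic across runs, unlike the built-in hash() for str.
    return zlib.crc32(word.encode("utf-8")) % partitions


bucket = sketch_partition("Elektrokardiogramm", 8)
print(0 <= bucket < 8)  # True
```

A deterministic hash matters here: the same word must land in the same bucket on every run, which Python's salted `hash()` does not guarantee.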
acres.util.text module
Utility functions related to text processing.
acres.util.text.clean(text, preserve_linebreaks=False)
Clean a given text to preserve only alphabetic characters, spaces, and, optionally, line breaks.
- Parameters: text (str), preserve_linebreaks (bool)
- Return type: str
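The filtering described above can be approximated with a character-level filter. This sketch assumes that disallowed characters are dropped (rather than replaced by a space) and that "alphabetic" follows `str.isalpha`, which includes umlauts. The name `sketch_clean` is hypothetical:

```python
def sketch_clean(text: str, preserve_linebreaks: bool = False) -> str:
    """Illustrative only: keep letters, spaces and, optionally, newlines."""
    def keep(ch: str) -> bool:
        return ch.isalpha() or ch == " " or (preserve_linebreaks and ch == "\n")

    return "".join(ch for ch in text if keep(ch))


print(sketch_clean("a1\nb2"))                            # ab
print(repr(sketch_clean("a1\nb2", preserve_linebreaks=True)))  # 'a\nb'
```

Dropping characters can leave doubled spaces behind (for example around removed digits), so a whitespace clean-up pass is a natural follow-up step.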
acres.util.text.clean_whitespaces(whitespaced)
Remove repeated and trailing whitespace from an input string.
- Parameters: whitespaced (str)
- Return type: str
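A whitespace clean-up of this kind is often written as a single regular-expression substitution. The sketch below assumes that any run of whitespace (including tabs and newlines) collapses to one space and that leading whitespace is stripped as well as trailing. The library may treat those cases differently, and `sketch_clean_whitespaces` is a hypothetical name:

```python
import re


def sketch_clean_whitespaces(whitespaced: str) -> str:
    """Illustrative only: collapse whitespace runs and strip both ends."""
    return re.sub(r"\s+", " ", whitespaced).strip()


print(sketch_clean_whitespaces("  Herz   und  Lunge \n"))  # Herz und Lunge
```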
acres.util.text.clear_digits(str_in, substitute_char)
Substitutes all digits with a character (or string).
Example: ClearDigits("Vitamin B12", "°")
TODO rewrite as regex
- Parameters: str_in (str), substitute_char (str)
- Return type: str
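The TODO above mentions a regex-based rewrite. One possible shape for it is sketched below, assuming that each ASCII digit is replaced individually (so a two-digit number yields two substitutes). The name `sketch_clear_digits` is hypothetical and this is not the library's implementation:

```python
import re


def sketch_clear_digits(str_in: str, substitute_char: str) -> str:
    """Illustrative only: replace every ASCII digit with substitute_char."""
    # A callable replacement avoids re.sub interpreting backslashes in substitute_char.
    return re.sub(r"[0-9]", lambda _match: substitute_char, str_in)


print(sketch_clear_digits("Vitamin B12", "°"))  # Vitamin B°°
```

The character class `[0-9]` is used instead of `\d` because, for `str` patterns in Python 3, `\d` also matches non-ASCII Unicode digits.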