acres.util package
Package with general utility modules.
Submodules
acres.util.acronym module
Utility functions related to acronyms.
class acres.util.acronym.Acronym(acronym, left_context, right_context)
Bases: tuple
- property acronym: Alias for field number 0
- property left_context: Alias for field number 1
- property right_context: Alias for field number 2
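Since Acronym is a tuple with three named fields, it can be read both by name and by position. The sketch below rebuilds an equivalent type with collections.namedtuple, relying only on the documented field order. It is a hypothetical stand-in, not the package's own definition, and the sample values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for acres.util.acronym.Acronym, built only from the
# documented field order (acronym, left_context, right_context).
Acronym = namedtuple("Acronym", ["acronym", "left_context", "right_context"])

# Illustrative values; they are not taken from the package.
example = Acronym(acronym="EKG", left_context="Im", right_context="zeigt sich")

# Fields are reachable by name or by index ("Alias for field number N").
by_name = example.left_context
by_index = example[1]

# Tuple unpacking follows the same field order.
acronym, left_context, right_context = example
```

Named access keeps call sites readable, while positional access and unpacking keep the type compatible with code that expects a plain tuple.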
acres.util.acronym.create_german_acronym(full)
Creates an acronym out of a given multi-word expression.
@todo Use is_stopword?
- Parameters: full (str) – A full form containing whitespaces.
- Return type: str
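The page does not state how create_german_acronym derives the acronym. As a rough illustration of the general idea only, the sketch below takes the first letter of each whitespace-separated word and upper-cases it. The function name first_letter_acronym and this rule are assumptions; the real function may treat stopwords or German compounds differently.

```python
def first_letter_acronym(full: str) -> str:
    """Hypothetical helper: join the upper-cased first letter of each word.

    This is NOT the acres implementation, only one simple way to turn a
    multi-word expression into an acronym.
    """
    # split() without arguments drops empty tokens caused by repeated spaces.
    return "".join(word[0].upper() for word in full.split())


short_form = first_letter_acronym("Europäische Union")
```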
acres.util.functions module
Module with general functions.
acres.util.functions.create_ngram_statistics(input_string, n_min, n_max)
Creates a dictionary that counts each n-gram in an input string. Tokens are delimited by spaces.
Example: for bigrams and trigrams, use n_min = 2 and n_max = 3.
Test call (commented out in the source): print(WordNgramStat('a ab aa a a a ba ddd', 1, 4))
- Parameters: input_string (str), n_min (int), n_max (int)
- Return type: Dict[str, int]
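Counting word n-grams over a range of lengths can be sketched as follows. The function name count_word_ngrams is hypothetical, and the choices to split on any whitespace and to join each n-gram with a single space as the dictionary key are assumptions, not confirmed behaviour of create_ngram_statistics.

```python
from collections import Counter
from typing import Dict


def count_word_ngrams(input_string: str, n_min: int, n_max: int) -> Dict[str, int]:
    """Hypothetical sketch: count every word n-gram with n_min <= n <= n_max."""
    tokens = input_string.split()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the token list.
        for start in range(len(tokens) - n + 1):
            counts[" ".join(tokens[start:start + n])] += 1
    return dict(counts)


# Bigrams and trigrams of the sample string from the docstring.
stats = count_word_ngrams("a ab aa a a a ba ddd", 2, 3)
```

With n_min = 2 and n_max = 3, single tokens are not counted; the bigram "a a" occurs twice in the sample string and every trigram occurs once.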
acres.util.functions.import_conf(key)
- Parameters: key (str)
- Return type: Optional[str]
acres.util.functions.is_stopword(str_in)
Tests whether a word is a stopword, according to a stopword list.
For German, the source is http://snowball.tartarus.org/algorithms/german/stop.txt
- Parameters: str_in (str)
- Return type: bool
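A list-based stopword test usually reduces to a set membership check. The sketch below uses a tiny hand-picked set of German function words instead of the full Snowball list, and it lower-cases the input before the lookup. The set contents, the name looks_like_stopword, and the case-insensitive comparison are all assumptions made for illustration.

```python
# Tiny illustrative subset of German stopwords; the real list is much longer.
GERMAN_STOPWORDS = frozenset({"der", "die", "das", "und"})


def looks_like_stopword(str_in: str) -> bool:
    """Hypothetical sketch: case-insensitive membership test against a set."""
    return str_in.lower() in GERMAN_STOPWORDS


flags = [looks_like_stopword(word) for word in ["Die", "Lunge", "und"]]
```

A frozenset gives constant-time lookups and cannot be modified by accident after loading.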
acres.util.functions.partition(word, partitions)
Find a bucket for a given word.
- Parameters: word (str), partitions (int)
- Return type: int
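The page does not say how partition picks a bucket. One common way to map a word to one of a fixed number of buckets is to hash it and take the remainder, as sketched below. The name bucket_for and the use of CRC32 are assumptions; the actual function may use a different rule entirely.

```python
import zlib


def bucket_for(word: str, partitions: int) -> int:
    """Hypothetical sketch: map a word to a bucket index in [0, partitions)."""
    # zlib.crc32 is stable across runs, unlike the built-in hash() for str,
    # which is randomised per process by default.
    return zlib.crc32(word.encode("utf-8")) % partitions


bucket = bucket_for("Herz", 4)
```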
acres.util.text module
Utility functions related to text processing.
acres.util.text.clean(text, preserve_linebreaks=False)
Clean a given text to preserve only alphabetic characters, spaces, and, optionally, line breaks.
- Parameters: text (str), preserve_linebreaks (bool)
- Return type: str
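The filtering described for clean can be sketched with a per-character test. The helper below, keep_alphabetic, is hypothetical: it deletes every character that is not alphabetic (per str.isalpha, so umlauts survive), a space, or, when requested, a newline. Whether the real function deletes such characters or replaces them with something else is not stated here, so that detail is an assumption.

```python
def keep_alphabetic(text: str, preserve_linebreaks: bool = False) -> str:
    """Hypothetical sketch: drop everything except letters, spaces and,
    optionally, newline characters."""
    allowed = {" ", "\n"} if preserve_linebreaks else {" "}
    return "".join(ch for ch in text if ch.isalpha() or ch in allowed)


cleaned = keep_alphabetic("Vitamin B12, täglich!")
with_breaks = keep_alphabetic("a1\nb2", preserve_linebreaks=True)
```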
acres.util.text.clean_whitespaces(whitespaced)
Removes repeated and trailing whitespace from an input string.
- Parameters: whitespaced (str)
- Return type: str
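Collapsing repeated whitespace is commonly done with a single regular expression. The hypothetical helper collapse_whitespace below replaces every run of whitespace with one space and then strips both ends. Stripping leading whitespace and treating tabs and newlines like spaces are assumptions that go beyond what the description states.

```python
import re


def collapse_whitespace(whitespaced: str) -> str:
    """Hypothetical sketch: squeeze whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", whitespaced).strip()


tidy = collapse_whitespace("akutes   Koronarsyndrom  ")
```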
acres.util.text.clear_digits(str_in, substitute_char)
Substitutes all digits with a character (or string).
Example: ClearDigits("Vitamin B12", "°")
TODO: rewrite as regex.
- Parameters: str_in (str), substitute_char (str)
- Return type: str
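The TODO above suggests a regex-based version. A minimal sketch of that idea follows, under the assumption that each ASCII digit is replaced individually by the substitute. The name replace_digits is hypothetical, and the output shown in the test is the output of this sketch, not a documented result of clear_digits.

```python
import re


def replace_digits(str_in: str, substitute_char: str) -> str:
    """Hypothetical sketch: replace each ASCII digit with substitute_char."""
    # A function is passed as the replacement so that backslashes in
    # substitute_char are inserted literally instead of being parsed as
    # regex escape sequences.
    return re.sub(r"[0-9]", lambda _match: substitute_char, str_in)


masked = replace_digits("Vitamin B12", "°")
```

Masking digits one by one keeps the length of the number visible, which can be useful when comparing tokens that differ only in their numeric part.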