acres.util package
Package with general utility modules.
Submodules
acres.util.acronym module
Utility functions related to acronyms.
- class acres.util.acronym.Acronym(acronym, left_context, right_context)
Bases: tuple
- property acronym
Alias for field number 0
- property left_context
Alias for field number 1
- property right_context
Alias for field number 2
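The "Bases: tuple" line and the "Alias for field number N" properties are what a named tuple looks like in generated documentation. The sketch below rebuilds an equivalent type to show how such an object is typically used. The stand-in definition and the sample values are assumptions made for illustration and are not taken from the package source.

```python
from collections import namedtuple

# Hypothetical stand-in for acres.util.acronym.Acronym, built from the
# documented field order: acronym (0), left_context (1), right_context (2).
Acronym = namedtuple("Acronym", ["acronym", "left_context", "right_context"])

# Sample values are invented for illustration.
example = Acronym(acronym="EKG", left_context="Das", right_context="zeigt")

# Fields are reachable by name or by tuple index.
by_name = example.acronym
by_index = example[0]

# Being a tuple, an instance can also be unpacked positionally.
acronym, left, right = example
```

Named access keeps calling code readable, while tuple behaviour keeps instances immutable and hashable.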
- acres.util.acronym.create_german_acronym(full)
Creates an acronym out of a given multi-word expression.
@todo Use is_stopword?
- Parameters
full (str) – A full form containing whitespaces.
- Return type
str
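The exact rule used by create_german_acronym is not documented here. As a rough illustration of turning a multi-word expression into an acronym, the sketch below takes the first letter of each word. The helper name first_letter_acronym is hypothetical, and the real function may handle German-specific cases differently.

```python
def first_letter_acronym(full: str) -> str:
    """Hypothetical helper: join the upper-cased first letter of each word."""
    # str.split() with no argument splits on any run of whitespace and
    # drops empty strings, so repeated spaces are harmless.
    return "".join(word[0].upper() for word in full.split())
```

Note that this sketch does not filter stopwords, so function words contribute a letter as well.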
acres.util.functions module
Module with general functions.
- acres.util.functions.create_ngram_statistics(input_string, n_min, n_max)
Creates a dictionary that counts each n-gram in an input string. Tokens are delimited by spaces.
Example: for bigrams and trigrams, set n_min = 2 and n_max = 3.
Test call: print(WordNgramStat('a ab aa a a a ba ddd', 1, 4))
- Parameters
input_string (str)
n_min (int)
n_max (int)
- Return type
Dict[str, int]
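The counting described above can be sketched with a sliding window over the tokens. This is a minimal illustration and not the package's implementation. The name count_word_ngrams is hypothetical, n_max is assumed to be inclusive (as the bigram and trigram example suggests), and n-grams are assumed to be keyed by their tokens joined with a single space.

```python
from collections import Counter
from typing import Dict


def count_word_ngrams(input_string: str, n_min: int, n_max: int) -> Dict[str, int]:
    """Hypothetical sketch: count word n-grams for n_min <= n <= n_max."""
    # Assumption: split() is used so that repeated spaces do not
    # produce empty tokens.
    tokens = input_string.split()
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens over the token list.
        for start in range(len(tokens) - n + 1):
            counts[" ".join(tokens[start:start + n])] += 1
    return dict(counts)
```

With this sketch, count_word_ngrams("a b a b", 1, 2) yields counts for the unigrams "a" and "b" and for the bigrams "a b" and "b a".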
- acres.util.functions.import_conf(key)
- Parameters
key (str)
- Return type
Optional[str]
- acres.util.functions.is_stopword(str_in)
Tests whether a word is a stopword, according to a stopword list.
For German, the source is http://snowball.tartarus.org/algorithms/german/stop.txt
- Parameters
str_in (str)
- Return type
bool
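A list-based stopword test usually comes down to a set membership check. The sketch below shows that idea only. The function name is_stopword_sketch, the handful of German function words, and the case-insensitive comparison are all assumptions made for illustration, and the list the package actually loads is not reproduced here.

```python
# A few common German function words, chosen for illustration only.
GERMAN_STOPWORDS = frozenset({"der", "die", "das", "und", "oder", "nicht"})


def is_stopword_sketch(str_in: str) -> bool:
    """Hypothetical sketch: case-insensitive membership test against a set."""
    return str_in.lower() in GERMAN_STOPWORDS
```

A frozenset gives constant-time lookups and cannot be modified by accident.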
- acres.util.functions.partition(word, partitions)
Find a bucket for a given word.
- Parameters
word (str)
partitions (int)
- Return type
int
acres.util.text module
Utility functions related to text processing.
- acres.util.text.clean(text, preserve_linebreaks=False)
Clean a given text to preserve only alphabetic characters, spaces, and, optionally, line breaks.
- Parameters
text (str)
preserve_linebreaks (bool)
- Return type
str
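The behaviour described for clean can be sketched as a character filter. This is an illustration under stated assumptions and not the package's code. The name clean_sketch is hypothetical, and the choice to replace each dropped character with a space (rather than delete it) is an assumption.

```python
def clean_sketch(text: str, preserve_linebreaks: bool = False) -> str:
    """Hypothetical sketch: keep letters, spaces and, optionally, newlines."""
    kept = []
    for char in text:
        if char.isalpha() or char == " ":
            kept.append(char)
        elif char == "\n" and preserve_linebreaks:
            kept.append(char)
        else:
            # Assumption: every other character becomes a space, so words
            # separated only by punctuation are not glued together.
            kept.append(" ")
    return "".join(kept)
```

Because str.isalpha() is Unicode-aware, German letters such as "ö" and "ß" are kept.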
- acres.util.text.clean_whitespaces(whitespaced)
Removes repeated and trailing whitespace from an input string.
- Parameters
whitespaced (str)
- Return type
str
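One common way to collapse repeated whitespace and trim the ends is to split and rejoin. The sketch below uses that idiom. The name clean_whitespaces_sketch is hypothetical. As assumptions, it also strips leading whitespace and treats tabs and newlines like spaces, which the real function may not do.

```python
def clean_whitespaces_sketch(whitespaced: str) -> str:
    """Hypothetical sketch: collapse whitespace runs and trim both ends."""
    # split() without arguments discards empty strings, so joining with a
    # single space leaves exactly one space between tokens.
    return " ".join(whitespaced.split())
```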
- acres.util.text.clear_digits(str_in, substitute_char)
Substitutes all digits by a character (or string).
Example: ClearDigits("Vitamin B12", "°")
TODO: rewrite as regex
- Parameters
str_in (str)
substitute_char (str)
- Return type
str
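The TODO above suggests a regular-expression version. One possible shape is sketched below, assuming each digit is replaced individually rather than each run of digits. The name clear_digits_sketch is hypothetical and this is not the package's implementation.

```python
import re


def clear_digits_sketch(str_in: str, substitute_char: str) -> str:
    """Hypothetical sketch: replace every single digit with substitute_char."""
    # A callable replacement is used so that backslashes in substitute_char
    # are inserted literally instead of being read as regex escapes.
    return re.sub(r"\d", lambda _match: substitute_char, str_in)
```

For the input from the example above, this sketch returns "Vitamin B°°".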