acres.model package
Package containing domain models (from the MVC design pattern).
Submodules
acres.model.detection_standard module
Model class that represents a detection standard. A detection standard works like an allow/block list that filters out inputs from the topic list that are not proper acronyms (e.g. BEFUND, III). Such inputs are then excluded from evaluation.
It is designed as an append-only list (i.e., entries do not need to be updated with variable inputs).
- acres.model.detection_standard.filter_valid(standard)
Filter out invalid entries from a gold standard. Invalid entries are either not proper acronyms or repeated types.
- Parameters
standard (Dict[str, bool]) –
- Return type
Set[str]
- Returns
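The signature above maps a `Dict[str, bool]` to a `Set[str]`. A minimal sketch of that kind of filtering follows, assuming the boolean flag marks an entry as a valid acronym. The name `filter_valid_sketch` and the sample data are hypothetical and are not taken from the acres source.

```python
from typing import Dict, Set


def filter_valid_sketch(standard: Dict[str, bool]) -> Set[str]:
    """Keep only entries flagged as valid (assumption: True means valid).

    Returning a set also collapses repeated types into a single entry.
    """
    return {acronym for acronym, is_valid in standard.items() if is_valid}


# Hypothetical detection standard: BEFUND and III are not proper acronyms.
example = {"EKG": True, "BEFUND": False, "III": False, "LVEF": True}
valid = filter_valid_sketch(example)
```

Here `valid` contains only the entries whose flag is `True`.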
- acres.model.detection_standard.parse(filename)
Parses a .tsv-formatted detection standard into a dictionary.
- Parameters
filename (str) –
- Return type
Dict[str, bool]
- Returns
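The column layout of the detection standard file is not described here. As an illustration only, the sketch below assumes two tab-separated columns, an acronym followed by a `TRUE`/`FALSE` flag, and reads from a text stream instead of a filename so that it runs without touching the file system. The name `parse_sketch` is hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def parse_sketch(stream: TextIO) -> Dict[str, bool]:
    """Read tab-separated rows into an acronym -> flag dictionary.

    Assumed layout (not confirmed by the documentation): acronym<TAB>TRUE|FALSE.
    """
    standard: Dict[str, bool] = {}
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        acronym, flag = row[0], row[1]
        standard[acronym] = flag.strip().upper() == "TRUE"
    return standard


# In-memory stand-in for a .tsv file.
sample = io.StringIO("EKG\tTRUE\nBEFUND\tFALSE\n")
parsed = parse_sketch(sample)
```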
- acres.model.detection_standard.parse_valid(filename)
Wrapper method for both parse and filter_valid.
- Parameters
filename (str) –
- Return type
Set[str]
- Returns
acres.model.expansion_standard module
Model class that represents an expansion standard. An expansion standard is the main reference standard containing acronym-expansion pairs and their evaluation following the TREC standard (2/1/0).
It is designed as an append-only list (i.e., entries do not need to be updated with variable inputs).
- acres.model.expansion_standard.parse(filename)
Parse a TSV-formatted (tab-separated) expansion standard into a dictionary.
- Parameters
filename (str) –
- Return type
Dict[str, Dict[str, int]]
- Returns
A dictionary with acronyms pointing to expansions and an assessment value.
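To make the nested return type concrete, here is a minimal sketch that builds a `Dict[str, Dict[str, int]]` from tab-separated rows. The three-column layout (acronym, expansion, assessment) and the name `parse_expansions_sketch` are assumptions for illustration; the actual file layout may differ.

```python
import csv
import io
from typing import Dict, TextIO


def parse_expansions_sketch(stream: TextIO) -> Dict[str, Dict[str, int]]:
    """Group expansions and their assessment values under each acronym.

    Assumed layout (hypothetical): acronym<TAB>expansion<TAB>assessment.
    """
    standard: Dict[str, Dict[str, int]] = {}
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        acronym, expansion, assessment = row[0], row[1], int(row[2])
        standard.setdefault(acronym, {})[expansion] = assessment
    return standard


# In-memory stand-in for a .tsv file with TREC-style 2/1/0 assessments.
sample_rows = io.StringIO(
    "EKG\tElektrokardiogramm\t2\n"
    "EKG\tEchokardiogramm\t0\n"
)
expansions = parse_expansions_sketch(sample_rows)
```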
- acres.model.expansion_standard.write(filename, previous, valid, topics)
Write results in the TREC format, one candidate expansion per line.
- Parameters
filename (str) –
previous (Dict[str, Dict[str, int]]) – A dictionary of acronyms mapped to their senses and assessments (if any).
valid (Set[str]) – A set of valid acronyms, normally parsed from a detection standard.
topics (List[Acronym]) – A topic list.
- Return type
None
- Returns
acres.model.ngrams module
Module to handle n-gram lists.
- class acres.model.ngrams.FilteredNGramStat(ngram_size)
Bases: object
Filtered NGramStat generator.
This generator yields n-grams of a given size from an ngramstat.txt file, while respecting each n-gram's frequency.
@todo ngramstat itself should be a generator
- PRINT_INTERVAL = 1000000
- TOKEN_SEPARATOR = ' '
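One possible reading of "respecting each n-gram's frequency" is that an n-gram is yielded as many times as it occurs. The sketch below illustrates that reading only; it assumes each input line has the form `frequency<TAB>ngram`, which is not confirmed by this page, and `filtered_ngrams_sketch` is a hypothetical stand-in, not the class itself.

```python
from typing import Iterable, Iterator, List

TOKEN_SEPARATOR = " "  # mirrors the class constant documented above


def filtered_ngrams_sketch(
    lines: Iterable[str], ngram_size: int
) -> Iterator[List[str]]:
    """Yield the tokens of every n-gram of the requested size.

    Assumptions (hypothetical): each line is "frequency<TAB>ngram", and an
    n-gram is yielded once per occurrence.
    """
    for line in lines:
        freq_text, _, ngram = line.rstrip("\n").partition("\t")
        if not ngram:
            continue  # no tab or empty n-gram: skip the line
        tokens = ngram.split(TOKEN_SEPARATOR)
        if len(tokens) != ngram_size:
            continue  # filter out n-grams of other sizes
        for _ in range(int(freq_text)):
            yield tokens


# In-memory stand-in for the contents of an ngramstat.txt file.
sample_lines = ["2\ta b c", "5\ta b", "1\tx y z"]
trigrams = list(filtered_ngrams_sketch(sample_lines, ngram_size=3))
```

Because it is a generator, the sketch never holds the full n-gram list in memory, which matters when the source file is large.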
acres.model.topic_list module
Model class that represents a topic list. A topic list is used as main input (a la TREC) and thus can control which acronyms (together with their contexts) are to be considered for evaluation. A topic list can be used, e.g., to quickly switch between different evaluation scenarios such as acronyms collected from either the training or test dataset.
- acres.model.topic_list.create(filename, chance, ngram_size=7)
Create a topic list out of random n-grams with a given chance and size.
- Parameters
filename (str) –
chance (float) –
ngram_size (int) –
- Returns
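Selecting random n-grams "with a given chance" can be read as keeping each candidate independently with probability `chance`. The sketch below shows that sampling step in isolation, under that assumption. The function name, the `seed` parameter, and the sample n-grams are hypothetical and do not reflect how `create` reads or writes its files.

```python
import random
from typing import Iterable, List, Optional


def sample_ngrams_sketch(
    ngrams: Iterable[str], chance: float, seed: Optional[int] = None
) -> List[str]:
    """Keep each n-gram independently with probability `chance`.

    `seed` is a hypothetical extra parameter that makes runs reproducible.
    """
    rng = random.Random(seed)
    # rng.random() returns a float in [0.0, 1.0), so chance=1.0 keeps
    # everything and chance=0.0 keeps nothing.
    return [ngram for ngram in ngrams if rng.random() < chance]


candidates = ["EKG vom 12.3. zeigt", "LVEF 55 % im Echo", "Pat. kommt zur Kontrolle"]
everything = sample_ngrams_sketch(candidates, chance=1.0)
nothing = sample_ngrams_sketch(candidates, chance=0.0)
```

A fixed seed makes a sampled topic list reproducible across evaluation runs.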