acres.model package

Package containing domain models (from the MVC design pattern).

Submodules

acres.model.detection_standard module

Model class that represents a detection standard. A detection standard works like an allow/block list: it filters out inputs from the topic list that are not proper acronyms (e.g. BEFUND, III). Such inputs are then excluded from evaluation.

It is designed as an append-only list (i.e., entries do not need to be updated with variable inputs).

acres.model.detection_standard.filter_valid(standard)[source]

Filter out invalid entries from a gold standard. An entry is invalid if it is not a proper acronym or if it repeats a type that is already present.

Parameters

standard (Dict[str, bool]) –

Return type

Set[str]

Returns
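As a rough illustration of the filtering step, the sketch below keeps only the entries whose flag is true. It assumes that the boolean in Dict[str, bool] marks an entry as a valid acronym; the name filter_valid_sketch and the sample entries are hypothetical and are not taken from the acres source.

```python
from typing import Dict, Set


def filter_valid_sketch(standard: Dict[str, bool]) -> Set[str]:
    """Keep only entries flagged as valid (assumed meaning of the bool)."""
    # Building a set also collapses repeated types into a single entry.
    return {entry for entry, is_valid in standard.items() if is_valid}


# Made-up detection standard: two accepted acronyms, two rejected inputs.
standard = {"EKG": True, "BEFUND": False, "III": False, "LVEF": True}
valid = filter_valid_sketch(standard)
```

Returning a set makes later membership checks cheap, which suits the use of the result as a filter over a topic list.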

acres.model.detection_standard.parse(filename)[source]

Parse a TSV-formatted detection standard into a dictionary.

Parameters

filename (str) –

Return type

Dict[str, bool]

Returns
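A minimal sketch of such a parser, assuming each line holds an entry and a true/false flag separated by a tab. The column layout, the flag spelling, and the name parse_sketch are assumptions made for illustration; the real file format may differ.

```python
import csv
from typing import Dict


def parse_sketch(filename: str) -> Dict[str, bool]:
    """Read 'entry<TAB>flag' lines into an insertion-ordered dictionary."""
    standard: Dict[str, bool] = {}
    with open(filename, encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            entry, flag = row[0], row[1]
            # Assumed flag spelling: any casing of "true" means valid.
            standard[entry] = flag.strip().lower() == "true"
    return standard
```

Because Python dictionaries keep insertion order, the parsed standard preserves the order of the file, which matters for an append-only list.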

acres.model.detection_standard.parse_valid(filename)[source]

Convenience wrapper that calls parse and then filter_valid on the result.

Parameters

filename (str) –

Return type

Set[str]

Returns

acres.model.detection_standard.update(previous, acronyms)[source]

Update a previous detection standard with new acronyms from a topic list, preserving order.

Parameters
  • previous (Dict[str, bool]) –

  • acronyms (List[Acronym]) –

Return type

Dict[str, bool]

Returns
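The append-only, order-preserving update can be sketched as follows. The Acronym tuple here is a hypothetical stand-in with assumed field names, and marking unseen acronyms as True by default is also an assumption rather than documented behaviour.

```python
from collections import namedtuple
from typing import Dict, List

# Hypothetical stand-in for the Acronym type; field names are assumed.
Acronym = namedtuple("Acronym", ["acronym", "left_context", "right_context"])


def update_sketch(previous: Dict[str, bool], acronyms: List[Acronym]) -> Dict[str, bool]:
    """Append unseen acronyms without touching existing judgements."""
    updated = dict(previous)  # copy; dicts keep insertion order
    for item in acronyms:
        # setdefault only adds the key when it is missing, so earlier
        # entries keep both their value and their position.
        updated.setdefault(item.acronym, True)  # assumed default value
    return updated


previous = {"EKG": True, "BEFUND": False}
topics = [Acronym("BEFUND", "", ""), Acronym("LVEF", "", "")]
updated = update_sketch(previous, topics)
```

Existing entries stay where they were and new ones are appended at the end, so a re-run over a larger topic list never reorders or overwrites earlier decisions.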

acres.model.detection_standard.write(filename, standard)[source]

Write a detection standard into a file.

Parameters
  • filename (str) –

  • standard (Dict[str, bool]) –

Return type

None

Returns

acres.model.expansion_standard module

Model class that represents an expansion standard. An expansion standard is the main reference standard: it contains acronym-expansion pairs together with their assessment on the TREC relevance scale (2/1/0).

It is designed as an append-only list (i.e., entries do not need to be updated with variable inputs).

acres.model.expansion_standard.parse(filename)[source]

Parse a TSV-formatted expansion standard into a dictionary.

Parameters

filename (str) –

Return type

Dict[str, Dict[str, int]]

Returns

A dictionary with acronyms pointing to expansions and an assessment value.
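To make the nested return type concrete, here is a small sketch that builds a Dict[str, Dict[str, int]] from tab-separated lines. It assumes exactly three columns (acronym, expansion, assessment) and reads from an iterable of lines instead of a filename to stay short; the real file may carry additional columns, and the sample rows are invented.

```python
from typing import Dict, Iterable


def parse_expansions_sketch(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Group 'acronym<TAB>expansion<TAB>assessment' lines by acronym."""
    standard: Dict[str, Dict[str, int]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        acronym, expansion, assessment = line.split("\t")
        # Each acronym maps to its own dictionary of assessed expansions.
        standard.setdefault(acronym, {})[expansion] = int(assessment)
    return standard


# Invented sample rows, for illustration only.
rows = [
    "EKG\tElektrokardiogramm\t2\n",
    "EKG\tEchokardiogramm\t0\n",
    "LVEF\tEjektionsfraktion\t1\n",
]
parsed = parse_expansions_sketch(rows)
```

With this shape, an assessment is looked up as parsed[acronym][expansion].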

acres.model.expansion_standard.write(filename, previous, valid, topics)[source]

Write results in the TREC format, one candidate expansion per line.

Parameters
  • filename (str) –

  • previous (Dict[str, Dict[str, int]]) – A dictionary of acronyms mapped to their senses and assessments (if any).

  • valid (Set[str]) – A set of valid acronyms, normally parsed from a detection standard.

  • topics (List[Acronym]) – A topic list.

Return type

None

Returns

acres.model.ngrams module

Module to handle n-gram lists.

class acres.model.ngrams.FilteredNGramStat(ngram_size)[source]

Bases: object

Filtered NGramStat generator

This generator yields n-grams of a given size from an ngramstat.txt file, while respecting the frequency of each n-gram.

@todo ngramstat itself should be a generator

PRINT_INTERVAL = 1000000

TOKEN_SEPARATOR = ' '

acres.model.ngrams.filter_acronym_contexts(ngrams)[source]

Keep only the n-grams (lists of tokens) that contain an acronym in the middle position, and convert them to Acronym tuples.

Parameters

ngrams (Iterator[List[str]]) –

Return type

Iterator[Acronym]

Returns
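The idea of keeping only n-grams with an acronym in the middle can be sketched like this. Both the Acronym field names and the looks_like_acronym heuristic are hypothetical; the package presumably uses its own acronym test.

```python
from collections import namedtuple
from typing import Iterator, List

# Hypothetical stand-in for the Acronym type; field names are assumed.
Acronym = namedtuple("Acronym", ["acronym", "left_context", "right_context"])


def looks_like_acronym(token: str) -> bool:
    # Hypothetical heuristic: a short, alphabetic, fully upper-case token.
    return token.isalpha() and token.isupper() and 2 <= len(token) <= 7


def filter_contexts_sketch(ngrams: Iterator[List[str]]) -> Iterator[Acronym]:
    """Yield an Acronym for each n-gram whose middle token passes the test."""
    for tokens in ngrams:
        middle = len(tokens) // 2
        if tokens and looks_like_acronym(tokens[middle]):
            yield Acronym(
                acronym=tokens[middle],
                left_context=" ".join(tokens[:middle]),
                right_context=" ".join(tokens[middle + 1:]),
            )


ngrams = iter([["im", "EKG", "zeigt"], ["der", "Patient", "ist"]])
contexts = list(filter_contexts_sketch(ngrams))
```

A surface heuristic like this one would also accept tokens such as BEFUND, which is the kind of false positive a detection standard is meant to remove.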

acres.model.topic_list module

Model class that represents a topic list. A topic list is used as the main input (à la TREC) and thus controls which acronyms, together with their contexts, are considered for evaluation. A topic list can be used, e.g., to switch quickly between evaluation scenarios, such as acronyms collected from either the training or the test dataset.

acres.model.topic_list.create(filename, chance, ngram_size=7)[source]

Create a topic list from randomly selected n-grams of a given size, where chance controls how likely an n-gram is to be selected.

Parameters
  • filename (str) –

  • chance (float) –

  • ngram_size (int) –

Returns
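One plausible reading of the chance parameter is a per-n-gram selection probability. The sketch below implements that reading only; it is an assumption, not the documented behaviour, and sample_ngrams_sketch and its rng argument are hypothetical names.

```python
import random
from typing import Iterable, Iterator, List, Optional


def sample_ngrams_sketch(
    ngrams: Iterable[List[str]],
    chance: float,
    rng: Optional[random.Random] = None,
) -> Iterator[List[str]]:
    """Yield each n-gram independently with probability `chance`."""
    rng = rng or random.Random()
    for tokens in ngrams:
        # random() returns a float in [0.0, 1.0), so chance=1.0 keeps
        # every n-gram and chance=0.0 keeps none.
        if rng.random() < chance:
            yield tokens


data = [["im", "EKG", "zeigt"], ["der", "Patient", "ist"]]
# Passing a seeded generator makes a sampled topic list reproducible.
sample = list(sample_ngrams_sketch(data, 0.5, rng=random.Random(42)))
```

Sampling lazily in this way avoids loading the whole n-gram list into memory before selecting from it.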

acres.model.topic_list.parse(filename)[source]

Parse a TSV-formatted topic list into a list of acronyms (with context).

Parameters

filename (str) –

Return type

List[Acronym]

Returns

acres.model.topic_list.unique_types(topics)[source]

Extract types from a topic list.

Parameters

topics (List[Acronym]) –

Return type

Set[str]

Returns
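Extracting types amounts to collapsing repeated acronyms that occur in different contexts. A minimal sketch, using a hypothetical stand-in for the Acronym type with assumed field names:

```python
from collections import namedtuple
from typing import List, Set

# Hypothetical stand-in for the Acronym type; field names are assumed.
Acronym = namedtuple("Acronym", ["acronym", "left_context", "right_context"])


def unique_types_sketch(topics: List[Acronym]) -> Set[str]:
    """Collect the distinct acronym strings, ignoring their contexts."""
    return {topic.acronym for topic in topics}


topics = [
    Acronym("EKG", "im", "zeigt"),
    Acronym("EKG", "das", "war"),
    Acronym("LVEF", "die", "ist"),
]
types = unique_types_sketch(topics)
```

Here the three topics (tokens in context) reduce to two types.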