
Configuring the API

The behaviour of the IOMED Medical Language API depends on a series of parameters. Some of these can be set via environment variables, so that you can adjust the behaviour to better suit your use case. This document lists the variables that can be set through the environment, describing what they are and how they affect the functioning of the API.

When finding terms in text, you can set up certain filters to tune performance, precision and sensitivity.

  • MAX_TERM_WORDS: maximum length, in number of words, of the terms to be found. The higher this value, the more computational power the API requires. Most terms are no longer than 3 or 4 words. The table below shows the distribution of term lengths across a medical corpus containing about 600000 terms. Defaults to 9.

    Length (words)    Number of terms
    1                 523435
    2                 58609
    3                 12396
    4                 3912
    5                 479
    6                 31
    7                 12
    8                 3
    9                 1

  • MIN_TERM_LENGTH: minimum length, in number of characters, of the terms to be found. Allowing very short terms can increase the number of false positives. Some validated terms can skip this filter. Defaults to 3.

  • PARSE_QUANTITIES: the API can parse quantities and units, as in "200 mg/dl". Finding these in text is currently a costly procedure; if you do not need them, skipping it can improve performance considerably. Set this parameter to "false" to deactivate the detection of units and quantities. Defaults to "true".

  • MAX_NUM_CHARACTERS: maximum length, in characters, of a text accepted by the API in a single request. Before increasing this parameter, consider splitting your text before sending it to the API. Our Python library automatically splits the text, sends it to the API and joins the results. Defaults to 2000.
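
As an illustration of the settings above, the following Python sketch reads the four variables from the environment, falling back to the documented defaults, and splits a long text into chunks that respect MAX_NUM_CHARACTERS. The names `DEFAULTS`, `read_setting` and `split_text` are hypothetical and are not part of the API or of the IOMED Python library; the splitting strategy (cut at the last space within the limit) is only an assumption about what a reasonable client might do.

```python
import os

# Documented defaults for the term-finding settings (kept as strings,
# since environment variables are always strings).
DEFAULTS = {
    "MAX_TERM_WORDS": "9",
    "MIN_TERM_LENGTH": "3",
    "PARSE_QUANTITIES": "true",
    "MAX_NUM_CHARACTERS": "2000",
}


def read_setting(name, environ=None):
    """Return the raw value of a setting, or its documented default if unset."""
    environ = os.environ if environ is None else environ
    return environ.get(name, DEFAULTS[name])


def split_text(text, max_chars):
    """Split text into chunks of at most max_chars characters.

    Cuts at the last space that keeps the chunk within the limit and
    falls back to a hard cut when no space is available.
    """
    chunks = []
    remaining = text
    while len(remaining) > max_chars:
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars  # no usable space: hard cut
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


if __name__ == "__main__":
    limit = int(read_setting("MAX_NUM_CHARACTERS"))
    for chunk in split_text("fever and cough " * 500, limit):
        print(len(chunk))
```

A simple splitter like this can cut a multi-word term in two at a chunk boundary, and any character offsets returned for a chunk are relative to that chunk rather than to the original text, so a real client would need to account for both.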

Fuzzy search is the process of finding terms which are close to, but not exactly like, a given term. It is a key feature in systems which perform term recognition on human-written text, since people often introduce typos or use word variations which would not be recognized by a strict match.

The API performs fuzzy search on some terms found in the text which seem to be candidates for valid medical terms. The following parameters alter the functioning of fuzzy search. Since it is a very expensive step, these parameters can greatly affect the performance of the API.

  • FUZZY_SEARCH__MAX_NUM_WORDS: maximum length of a term, in number of words, to be searched through fuzzy search. Terms with more words can still be identified via a perfect match, but won't be subject to fuzzy search. This parameter greatly affects performance. The table below shows the number of terms of each length found via fuzzy search, on a corpus where 600000 terms were found. Defaults to 4.

    Length (words)    Number of terms
    1                 10640
    2                 2431
    3                 952
    4                 526
    5                 14

  • FUZZY_SEARCH__MIN_LENGTH: minimum length, in number of characters, for a term to be fuzzy searched. Searching very short terms (2-3 characters) can lead to more false positives. Defaults to 4.

  • FUZZY_SEARCH__SCORE_CUTOFF: when a term is fuzzy searched and a match is identified, a score indicates the closeness between the term and its match. This score ranges between 0 and 100, with 100 being the closest possible (an exact match). This parameter sets the minimum score required to accept the match. A higher cutoff will produce fewer false positives but more false negatives. A lower cutoff will produce more false positives but fewer false negatives. Defaults to 90.
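
To make the interplay of the three fuzzy search parameters concrete, here is a minimal Python sketch. It is not the API's implementation: the scoring function is a stand-in built on the standard library's `difflib.SequenceMatcher`, scaled to the 0 to 100 range, and the API's actual similarity measure may differ. The names `similarity` and `fuzzy_match` and the sample vocabulary are hypothetical; only the default values (4, 4 and 90) come from the descriptions above.

```python
import difflib


def similarity(a, b):
    """Stand-in score in [0, 100]; 100 means identical after lowercasing."""
    return 100.0 * difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(candidate, vocabulary,
                max_num_words=4, min_length=4, score_cutoff=90.0):
    """Return (term, score) for the best match at or above the cutoff, else None."""
    # Mirrors FUZZY_SEARCH__MAX_NUM_WORDS: long candidates are not fuzzy searched.
    if len(candidate.split()) > max_num_words:
        return None
    # Mirrors FUZZY_SEARCH__MIN_LENGTH: very short candidates are skipped.
    if len(candidate) < min_length:
        return None
    scored = ((term, similarity(candidate, term)) for term in vocabulary)
    best = max(scored, key=lambda pair: pair[1], default=None)
    # Mirrors FUZZY_SEARCH__SCORE_CUTOFF: reject matches that are not close enough.
    if best is None or best[1] < score_cutoff:
        return None
    return best


if __name__ == "__main__":
    vocabulary = ["diabetes mellitus", "hypertension", "asthma"]
    print(fuzzy_match("hypertenson", vocabulary))                   # typo, default cutoff
    print(fuzzy_match("hypertenson", vocabulary, score_cutoff=99))  # stricter cutoff
    print(fuzzy_match("ast", vocabulary))                           # below the minimum length
```

With the default cutoff the misspelled "hypertenson" is accepted as a match for "hypertension", while raising the cutoff to 99 rejects it, which illustrates the trade-off between false positives and false negatives described above.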