Annotations

When you send a medical text to tagger/annotation, or tagger/annotation/<iomed-type>, you receive a JSON object as response. This object contains three entries:

{
  "annotations": [...],
  "relations": [...],
  "version": "v0.3.25"
}
  • version: the current version of the API.
  • annotations: a list of all the medically relevant concepts found in the text. Each item of the list is an object representing a single medical concept (Annotation) found in the text.
  • relations: a list of relationships between the concepts found in the text. You can find out more here.
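As a minimal sketch, the three entries can be unpacked in Python as follows. The HTTP request itself is left out; `raw` is a hypothetical stand-in for the response body, not real API output.

```python
import json

# Hypothetical response body (assumption); a real one comes from the API.
raw = '{"annotations": [], "relations": [], "version": "v0.9.0-rc1"}'

response = json.loads(raw)

# .get() with a default keeps the code working if an entry is missing.
annotations = response.get("annotations", [])
relations = response.get("relations", [])
version = response.get("version")

print(f"API {version}: {len(annotations)} annotations, {len(relations)} relations")
```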

relations

Throughout this section we explain the content of annotations. To understand the content of relations please refer here.

annotation

When we use the word "annotation" we are referring to a medical concept found in a text.

Below you can find two examples of responses. Each concept found in the text is represented by an entry in the array 'annotations', and contains the entries match, type, code and characteristics. Below we explain each of these entries.

Sample of response from the API
{
  "annotations": [
    {
      "code": {
        "umls": "C0026591",
        "snomed_ct": [
          "72705000"
        ],
        "loinc": [
          "LA10417-6",
          "LP6983-3"
        ]
      },
      "type": {
        "umls": "Family Group",
        "iomed": "Group"
      },
      "match": {
        "begin": 0,
        "end": 5,
        "text": "madre",
        "found_as": "madre",
        "preferred_term": "maternal"
      },
      "characteristics": {},
      "id": 0
    },
    {
      "code": {
        "umls": "C0002395",
        "snomed_ct": [
          "26929004"
        ],
        "icd9_cm": [
          "331.0"
        ],
        "icd10_cm": [
          "G30",
          "G30.9"
        ],
        "loinc": [
          "LA22313-3",
          "MTHU020798"
        ]
      },
      "type": {
        "umls": "Disease or Syndrome",
        "iomed": "Disease or Syndrome"
      },
      "match": {
        "begin": 10,
        "end": 19,
        "text": "alzheimer",
        "found_as": "alzheimer",
        "preferred_term": "enfermedad de Alzheimer (trastorno)"
      },
      "characteristics": {
        "family_member": {
          "match": {
            "begin": 0,
            "end": 5,
            "text": "madre",
            "found_as": "madre",
            "preferred_term": "maternal"
          },
          "code": {
            "umls": "C0026591",
            "snomed_ct": [
              "72705000"
            ],
            "loinc": [
              "LA10417-6",
              "LP6983-3"
            ]
          }
        }
      },
      "id": 1,
      "head": 0
    }
  ],
  "relations": [
    {
      "from": 1,
      "to": 0,
      "rel": "has_experiencer"
    }
  ],
  "version": "v0.9.0-rc1"
}
{
  "annotations": [
    {
      "code": {
        "umls": "C0004604",
        "snomed_ct": [
          "161891005"
        ],
        "icd9_cm": [
          "724.5"
        ],
        "icd10_cm": [
          "M54",
          "M54.9"
        ],
        "loinc": [
          "MTHU020857"
        ]
      },
      "type": {
        "umls": "Sign or Symptom",
        "iomed": "Finding"
      },
      "match": {
        "begin": 11,
        "end": 27,
        "text": "dolor de espalda",
        "found_as": "dolor de espalda",
        "preferred_term": "Dorsalgia"
      },
      "characteristics": {
        "negative": {
          "trigger": {
            "begin": 0,
            "end": 2,
            "text": "no",
            "found_as": "no",
            "preferred_term": ""
          }
        }
      },
      "id": 1,
      "head": 0
    }
  ],
  "relations": [],
  "version": "v0.9.0-rc1"
}
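The relations entry of the first sample can be read back against the annotations, as in the sketch below. It assumes that "from" and "to" hold the id of an annotation, which is consistent with the sample but not stated here; the data is a trimmed copy of that sample, and `resolve_relations` is a hypothetical helper name.

```python
# Trimmed copy of the first sample response.
response = {
    "annotations": [
        {"id": 0, "match": {"text": "madre"}},
        {"id": 1, "match": {"text": "alzheimer"}},
    ],
    "relations": [{"from": 1, "to": 0, "rel": "has_experiencer"}],
}

# Index annotations by id so relation endpoints can be looked up directly.
by_id = {a["id"]: a for a in response["annotations"]}


def resolve_relations(response, by_id):
    """Return (source text, relation name, target text) triples."""
    triples = []
    for rel in response.get("relations", []):
        source = by_id.get(rel["from"])
        target = by_id.get(rel["to"])
        if source is None or target is None:
            continue  # endpoint is not among the returned annotations
        triples.append(
            (source["match"]["text"], rel["rel"], target["match"]["text"])
        )
    return triples


print(resolve_relations(response, by_id))
```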

Annotation: match

Each concept has a match entry which specifies where in the text the concept was found. A match is made up of the following entries:

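  • begin (int): position in the text of the first character of the match.
  • end (int): position in the text just after the last character of the match; in the examples, end - begin equals the length of text.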
  • text (string): exact match of the text.
  • found_as (string): how the concept was found in our database.
  • preferred_term (string): canonical term (in Spanish and according to the UMLS) for the given concept.
  • fuzzy_score (int): an integer from 0 to 100. Only present if the concept has been found through fuzzy search.
Example
"match": {
    "begin": 5,
    "end": 10,
    "text": "cabza",
    "found_as": "cabeza",
    "fuzzy_score": 87
}
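The sketch below shows one way to use begin and end to recover a span from the input. The input string "madre con alzheimer" is an assumption, chosen only because it agrees with the offsets in the first sample response; `span` and `is_fuzzy` are hypothetical helper names.

```python
# Assumed input text (not given in this page), consistent with the offsets below.
text = "madre con alzheimer"

matches = [
    {"begin": 0, "end": 5, "text": "madre", "found_as": "madre"},
    {"begin": 10, "end": 19, "text": "alzheimer", "found_as": "alzheimer"},
]


def span(text, match):
    """Slice the input with begin/end, treating end as exclusive."""
    return text[match["begin"]:match["end"]]


def is_fuzzy(match):
    """fuzzy_score is only present when the concept came from fuzzy search."""
    return "fuzzy_score" in match


for match in matches:
    print(span(text, match), "(fuzzy)" if is_fuzzy(match) else "(exact)")
```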

Annotation: code

The entry code inside an annotation object is an object that might contain several entries. Each entry's key is a string indicating to which ontology the code belongs, while the value is the code (or list of codes) under the ontology.

  • umls: a string indicating the CUI (Concept Unique Identifier) under the UMLS ontology.
  • snomed_ct: a list of possible SNOMED CT codes.
  • icd9_cm: a list of possible ICD9 CM codes.
  • icd10_cm: a list of possible ICD10 CM codes.
  • icd10_pcs: a list of possible ICD10 PCS codes.
  • loinc: a list of possible LOINC codes.
Example
"code": {
    "umls": "C0018681",
    "snomed_ct": ["206948006", "25064002", "206946005", "162209005", "271329006", "158298001", "139490008", "158296002"],
    "icd10_cm": ["R51"]
}

Note

The presence of a code entry depends on the existence of the concept under each terminology. If no code is found that represents the concept under that terminology, there will be no entry for the terminology.
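Because a terminology can be missing, and because umls holds a single string while the others hold lists, a small accessor can normalize both cases. This is a sketch only; `codes_for` is a hypothetical helper, and the data is the example shown above.

```python
annotation = {
    "code": {
        "umls": "C0018681",
        "snomed_ct": ["206948006", "25064002"],
        "icd10_cm": ["R51"],
    }
}


def codes_for(annotation, terminology):
    """Return the codes for one terminology as a list (empty if absent)."""
    value = annotation.get("code", {}).get(terminology)
    if value is None:
        return []          # the concept has no code under this terminology
    if isinstance(value, str):
        return [value]     # umls is a single CUI string
    return list(value)


print(codes_for(annotation, "umls"))
print(codes_for(annotation, "loinc"))
```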

Currently supported ontologies are:

UMLS

Unified Medical Language System

https://www.nlm.nih.gov/research/umls/

Format of code.umls

  • key name: "code.umls"
  • value: UMLS CUI, e.g. "C0018681"

SNOMED CT

Systematized Nomenclature of Medicine - Clinical Terms

https://www.nlm.nih.gov/healthit/snomedct/us_edition.html

Format of code.snomed_ct

  • key name: "code.snomed_ct"
  • value: list of SNOMED CT codes, e.g. ["206948006", "25064002"]

ICD9 CM

International Classification of Diseases, Ninth Revision, Clinical Modification.

https://www.cdc.gov/nchs/icd/icd9cm.htm

Format of code.icd9_cm

  • key name: "code.icd9_cm"
  • value: list of ICD9 CM codes, e.g. ["331.0"]

ICD10 CM

International Classification of Diseases, Tenth Revision, Clinical Modification.

https://www.cms.gov/Medicare/Coding/ICD10/index.html

Format of code.icd10_cm

  • key name: "code.icd10_cm"
  • value: list of ICD10 CM codes, e.g. ["G30", "G30.9"]

ICD10 PCS

International Classification of Diseases, Tenth Revision, Procedure Coding System

https://www.cms.gov/Medicare/Coding/ICD10/index.html

Format of code.icd10_pcs

  • key name: "code.icd10_pcs"
  • value: list of ICD10 PCS codes, e.g. ["0R9D3Z"]

LOINC

Logical Observation Identifiers Names and Codes terminology

https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/LNC/

Format of code.loinc

  • key name: "code.loinc"
  • value: list of LOINC codes, e.g. ["47679-6"]

Annotation: type

The entry type inside an annotation object, like code, is an object that might contain several entries. Each entry's key is a string indicating to which ontology this type belongs, while the value is the type itself.

Example
"type": {
    "umls": "Sign or Symptom",
    "iomed": "Finding"
}

Currently supported types are:

UMLS Semantic Types

Unified Medical Language System

https://www.nlm.nih.gov/research/umls/

Format of type.umls

  • key name: type.umls
  • value: UMLS semantic type, e.g. "Sign or Symptom"

IOMED types

Table of IOMED Types.

Format of type.iomed

  • key name: type.iomed
  • value: IOMED type, e.g. "Finding"

The IOMED types are a set of categories that we consider big enough to capture the different types of medical concepts, but small enough to be useful in real applications. You can find them here.
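A common use of type is to bucket annotations by category. The sketch below groups match texts by type.iomed or type.umls; the data is trimmed from the examples in this page, and `group_by_type` is a hypothetical helper name.

```python
from collections import defaultdict

annotations = [
    {"type": {"umls": "Family Group", "iomed": "Group"},
     "match": {"text": "madre"}},
    {"type": {"umls": "Disease or Syndrome", "iomed": "Disease or Syndrome"},
     "match": {"text": "alzheimer"}},
    {"type": {"iomed": "Quantitative Concept"},
     "match": {"text": "200 mg/dl"}},
]


def group_by_type(annotations, ontology="iomed"):
    """Map each type name under `ontology` to the matched texts of that type."""
    groups = defaultdict(list)
    for annotation in annotations:
        type_name = annotation.get("type", {}).get(ontology)
        if type_name is not None:  # a type entry may lack this ontology
            groups[type_name].append(annotation["match"]["text"])
    return dict(groups)


print(group_by_type(annotations))
print(group_by_type(annotations, ontology="umls"))
```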

Annotation: characteristics

The entry characteristics is an object which contains information relevant to this concept. Each piece of information is represented as a key - value pair. Possible characteristics are:

Negativity

If present, indicates that the medical concept is negated. The property trigger has the same format as Annotation.match, and indicates which span of text has caused the negation.

Format of characteristics.negative

  • key: characteristics.negative
  • value: { "trigger": { "begin": ..., "end": ..., "text": ..., "found_as": ... } }
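In client code, negation usually decides whether a concept counts as present. The following sketch separates affirmed from negated concepts; the two annotations are trimmed from examples in this page and mixed here only for illustration, and `is_negated` is a hypothetical helper name.

```python
annotations = [
    {
        "match": {"text": "dolor"},
        "characteristics": {
            "negative": {"trigger": {"begin": 0, "end": 2, "text": "no"}}
        },
    },
    {"match": {"text": "alzheimer"}, "characteristics": {}},
]


def is_negated(annotation):
    """A concept is negated when characteristics has a 'negative' entry."""
    return "negative" in annotation.get("characteristics", {})


affirmed = [a["match"]["text"] for a in annotations if not is_negated(a)]
negated = [
    (a["match"]["text"], a["characteristics"]["negative"]["trigger"]["text"])
    for a in annotations
    if is_negated(a)
]

print("affirmed:", affirmed)
print("negated (concept, trigger):", negated)
```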
Example

Result of parsing "no dolor":

{
  "annotations": [
    {
      "code": {
        "snomed_ct": [
          "22253000"
        ],
        "icd10_cm": [
          "R52"
        ],
        "umls": "C0030193",
        "loinc": [
          "LA17107-6",
          "LA27491-2",
          "LA7460-4",
          "MTHU021175",
          "MTHU029813"
        ],
        "icd9_cm": [
          "338-338.99"
        ]
      },
      "id": 1,
      "characteristics": {
        "negative": {
          "trigger": {
            "preferred_term": "",
            "end": 2,
            "found_as": "no",
            "begin": 0,
            "text": "no"
          }
        }
      },
      "match": {
        "preferred_term": "dolor (hallazgo)",
        "end": 8,
        "found_as": "dolor",
        "begin": 3,
        "text": "dolor"
      },
      "type": {
        "iomed": "Finding",
        "umls": "Sign or Symptom"
      }
    }
  ],
  "version": "v0.8.1"
}

Family member

This characteristic can only appear in concepts whose type.iomed is one of "Disease or Syndrome", "Finding", "Therapeutic or Preventive Procedure", "Phenomenon" or "Activity". If present, it indicates that the concept refers to a relative of the patient rather than to the patient. The value is an object describing that family member: the property match has the same format as Annotation.match and points to the span of text in which the family member was mentioned, and the property code gives the UMLS and SNOMED CT codes for that family member.

Format of characteristics.family_member

  • key: characteristics.family_member
  • value: { "code": { "umls": "", "snomed_ct": [] }, "match": { "begin": ..., "end": ..., "text": ..., "found_as": ... } }
Example
"characteristics": {
    "family_member": {
        "match": {
            "end": 5,
            "text": "padre",
            "found_as": "padre",
            "begin": 0
        },
        "code": {
            "snomed_ct": [
                "66839005"
            ],
            "umls": "C0015671"
        }
    }
}
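One way to use this characteristic is to keep family history apart from the patient's own conditions. A minimal sketch, with data trimmed from the sample responses; `split_by_experiencer` is a hypothetical helper name.

```python
annotations = [
    {
        "match": {"text": "alzheimer"},
        "characteristics": {
            "family_member": {
                "match": {"begin": 0, "end": 5, "text": "madre"},
                "code": {"umls": "C0026591", "snomed_ct": ["72705000"]},
            }
        },
    },
    {"match": {"text": "dolor de espalda"}, "characteristics": {}},
]


def split_by_experiencer(annotations):
    """Return (patient concepts, [(concept, family member), ...])."""
    patient, family = [], []
    for annotation in annotations:
        member = annotation.get("characteristics", {}).get("family_member")
        text = annotation["match"]["text"]
        if member is None:
            patient.append(text)
        else:
            family.append((text, member["match"]["text"]))
    return patient, family


patient, family = split_by_experiencer(annotations)
print("patient:", patient)
print("family history:", family)
```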

Quantity

In the case of concepts which express some kind of quantity (e.g. "200 mg/mL", "600 mg/24h", etc.) with IOMED type "Quantitative Concept", sometimes the API is able to parse the concept to extract more information from it.

Format of characteristics.quantity

  • key: characteristics.quantity
  • value: { "magnitude": ..., "units": ..., "magnitude_base_units": ..., "base_units": ..., "code": ... }
    • units (str): the original units of the quantity, expressed with the full name. E.g. "milligram / deciliter" in "200 mg/dl".
    • magnitude (float): the numeric part of the quantity. E.g. 200.0 in "200 mg/dl".
    • base_units (str): the units, expressed in base units. E.g. "kilogram / meter ** 3" for "200 mg/dl".
    • magnitude_base_units (float): the numeric part of the quantity, converted to match base_units. E.g. 2 for "200 mg/dl", since "200 mg/dl" is equal to "2 kilogram / meter ** 3".
    • code.umls (str): the UMLS code of the base units.
    • code.snomed_ct (list of str): the SNOMED CT code (or codes) which identify the base units.
Example

Result of parsing "200 mg/dl":

{
  "version": "v0.8.1",
  "annotations": [
    {
      "id": 1,
      "code": {
        "snomed_ct": [
          "258798001",
          "396153000",
          "258794004"
        ],
        "umls": "C0439294"
      },
      "match": {
        "preferred_term": "",
        "found_as": "200 mg/dl",
        "begin": 0,
        "end": 9,
        "text": "200 mg/dl"
      },
      "characteristics": {
        "quantity": {
          "units": "milligram / deciliter",
          "magnitude": 200,
          "base_units": "kilogram / meter ** 3",
          "magnitude_base_units": 2,
          "code": {
            "snomed_ct": [
              "258798001",
              "396153000",
              "258794004"
            ],
            "umls": "C0439294"
          }
        }
      },
      "type": {
        "iomed": "Quantitative Concept"
      }
    }
  ]
}
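The relation between magnitude and magnitude_base_units is plain unit arithmetic: 1 mg is 10^-6 kg and 1 dl is 10^-4 m^3, so 200 mg/dl is 200 × 10^-6 / 10^-4 = 2 kg/m^3. The sketch below checks this with exact fractions; it illustrates the conversion only and is not how the API computes it.

```python
from fractions import Fraction

KG_PER_MG = Fraction(1, 10**6)   # 1 milligram = 1e-6 kilogram
M3_PER_DL = Fraction(1, 10**4)   # 1 deciliter = 0.1 liter = 1e-4 cubic meter


def mg_per_dl_to_kg_per_m3(magnitude):
    """Convert a magnitude in mg/dl to kg/m^3 using exact arithmetic."""
    return Fraction(magnitude) * KG_PER_MG / M3_PER_DL


print(mg_per_dl_to_kg_per_m3(200))
```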

Quantifier

A quantity very rarely appears alone in a text. Normally it is the result of some test or laboratory procedure, as in "HDL 200 mg/dl". In these cases it is useful not only to parse the laboratory procedure or substance (HDL) and the quantity (200 mg/dl), but also to show explicitly that the quantity belongs to that specific laboratory procedure. We express this by adding a property characteristics.quantifier to the laboratory procedure. Inside the characteristics.quantifier of HDL there is a match object which points to the text span "200 mg/dl". If the quantity has units and could be processed correctly, characteristics.quantifier also has the same entries as the characteristics.quantity of the quantity (units, magnitude, base_units, magnitude_base_units). characteristics.quantifier.code has the same format as all code entries: it is the code of the base units of the quantity or, if the quantity is just a number without units, UMLS "C0237753" ("Number").

Warning

When a quantity is assigned to another concept as a characteristics.quantifier, the quantity concept will be removed from the API output.

Only concepts with the following IOMED types can have a characteristics.quantifier:

  • Diagnostic Procedure
  • Finding
  • Laboratory or Test Result
  • Laboratory Procedure
  • Organism Attribute
  • Substance

Format of characteristics.quantifier

  • key: characteristics.quantifier
  • value: { "match": ..., "magnitude": ..., "units": ..., "magnitude_base_units": ..., "base_units": ..., "code": ... }
Example

Result of parsing "HDL 200 mg/dl":

{
  "version": "v0.8.1",
  "annotations": [
    {
      "id": 1,
      "code": {
        "snomed_ct": [
          "28036006",
          "17888004"
        ],
        "umls": "C0392885"
      },
      "match": {
        "preferred_term": "Lipoproteínas de alta densidad",
        "found_as": "HDL",
        "begin": 0,
        "end": 3,
        "text": "HDL"
      },
      "characteristics": {
        "quantifier": {
          "units": "milligram / deciliter",
          "magnitude": 200,
          "magnitude_base_units": 2,
          "match": {
            "preferred_term": "",
            "begin": 4,
            "end": 13,
            "found_as": "200 mg/dl",
            "text": "200 mg/dl"
          },
          "code": {
            "snomed_ct": [
              "258798001",
              "396153000",
              "258794004"
            ],
            "umls": "C0439294"
          },
          "base_units": "kilogram / meter ** 3"
        }
      },
      "type": {
        "iomed": "Laboratory Procedure",
        "umls": "Laboratory Procedure"
      }
    }
  ]
}
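Since the quantity concept itself is removed from the output, the value has to be read from the concept it was attached to. A sketch, with data trimmed from the example above; `lab_value` is a hypothetical helper, and it uses .get() because units and magnitude are only present when the quantity could be processed.

```python
annotation = {
    "match": {"text": "HDL"},
    "characteristics": {
        "quantifier": {
            "units": "milligram / deciliter",
            "magnitude": 200,
            "magnitude_base_units": 2,
            "base_units": "kilogram / meter ** 3",
            "match": {"begin": 4, "end": 13, "text": "200 mg/dl"},
        }
    },
}


def lab_value(annotation):
    """Return (concept, magnitude, units), or None if there is no quantifier."""
    quantifier = annotation.get("characteristics", {}).get("quantifier")
    if quantifier is None:
        return None
    return (
        annotation["match"]["text"],
        quantifier.get("magnitude"),  # may be missing for unparsed quantities
        quantifier.get("units"),
    )


print(lab_value(annotation))
```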

Qualifier

Qualifiers are concepts which qualify or complement another concept. For example, in "normal RMN", "normal" can be a qualifier of "RMN". In "2 weeks pain", "2 weeks" can be a qualifier of "pain". Qualifiers appear under characteristics.qualifier. Concepts with the following iomed types can have a qualifier:

  • Activity
  • Anatomy
  • Diagnostic Procedure
  • Finding
  • Laboratory or Test Result
  • Laboratory Procedure
  • Organism
  • Organism Attribute
  • Substance

Info

When a concept is assigned as qualifier to another concept, the qualifier concept is removed from the API output.

Example

Result of parsing "rx normal":

{
  "annotations": [
    {
      "code": {
        "snomed_ct": [
          "363680008"
        ],
        "umls": "C0043299",
        "icd9_cm": [
          "87"
        ]
      },
      "id": 1,
      "characteristics": {
        "qualifier": {
          "code": {
            "snomed_ct": [
              "17621005"
            ],
            "umls": "C0205307"
          },
          "match": {
            "begin": 3,
            "end": 9,
            "found_as": "normal",
            "preferred_term": "sin particularidades",
            "text": "normal"
          }
        }
      },
      "match": {
        "preferred_term": "Roentgenografía",
        "end": 2,
        "found_as": "rx",
        "begin": 0,
        "text": "rx"
      },
      "type": {
        "iomed": "Diagnostic Procedure",
        "umls": "Diagnostic Procedure"
      }
    }
  ],
  "version": "v0.8.1"
}
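A qualifier can be folded back into a readable label for the concept it complements. A minimal sketch with data trimmed from the example above; `describe` is a hypothetical helper name and the bracket format is arbitrary.

```python
annotation = {
    "match": {"begin": 0, "end": 2, "text": "rx"},
    "characteristics": {
        "qualifier": {
            "code": {"umls": "C0205307", "snomed_ct": ["17621005"]},
            "match": {"begin": 3, "end": 9, "text": "normal"},
        }
    },
}


def describe(annotation):
    """Return the concept text, followed by its qualifier in brackets if any."""
    text = annotation["match"]["text"]
    qualifier = annotation.get("characteristics", {}).get("qualifier")
    if qualifier is None:
        return text
    return f'{text} [{qualifier["match"]["text"]}]'


print(describe(annotation))
```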

Date

Dates in the text are found and parsed as concepts with type.iomed == "Temporal Concept". They are also transformed into a string with the format "2018-07-28T00:00:00", which can be found under characteristics.date.

Dates can be parsed even if they are incomplete, as in "27 de octubre" or "hace 3 días". In these cases, the API parses the date relative to the date on which the text was produced. By default it assumes the text was produced now (at the moment of parsing), but you can change this behaviour by sending a datetime in your request. Check how to do this here.

Format of characteristics.date

  • key: characteristics.date
  • value: a date string, e.g. "2017-08-23T15:43:22"
Example

Result of parsing "27 de octubre de 2015":

{
"version": "v0.8.1",
"annotations": [
  {
    "match": {
      "found_as": "27 de octubre de 2015",
      "text": "27 de octubre de 2015",
      "end": 21,
      "preferred_term": "2015-10-27T00:00:00",
      "begin": 0
    },
    "code": {
      "loinc": [
        "MTHU021546",
        "LP182451-7"
      ],
      "umls": "C0011008",
      "snomed_ct": [
        "410671006",
        "410672004"
      ]
    },
    "id": 1,
    "characteristics": {
      "date": "2015-10-27T00:00:00"
    },
    "type": {
      "iomed": "Temporal Concept",
      "umls": "Temporal Concept"
    }
  }
 ]
}
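The date string can be turned into a Python datetime with the standard library. This is a sketch; `annotation_date` is a hypothetical helper, and as a defensive assumption it also accepts the value wrapped in an object with a "date" key.

```python
from datetime import datetime

annotation = {
    "match": {"text": "27 de octubre de 2015"},
    "characteristics": {"date": "2015-10-27T00:00:00"},
    "type": {"iomed": "Temporal Concept", "umls": "Temporal Concept"},
}


def annotation_date(annotation):
    """Return characteristics.date as a datetime, or None if absent."""
    value = annotation.get("characteristics", {}).get("date")
    if isinstance(value, dict):   # assumption: tolerate {"date": "..."} too
        value = value.get("date")
    if not value:
        return None
    return datetime.fromisoformat(value)


print(annotation_date(annotation))
```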