@adosib
Created October 22, 2024 18:09
# ICD-10-PCS
`\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b`

False positives exist! For example, "G0BBLDG" would be identified as an ICD-10-PCS code by this expression even though it isn't one.
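To make the structure of the expression explicit, here is a sketch of the same pattern rewritten with `re.VERBOSE` so each part carries a comment. The helper `find_pcs_candidates` and the sample strings (apart from "G0BBLDG") are illustrative additions, not from the original note; the samples only exercise the pattern's shape and are not claimed to be real codes.

```python
import re

# The ICD-10-PCS pattern from above, annotated with re.VERBOSE.
PCS_PATTERN = re.compile(
    r"""
    \b                  # start at a word boundary
    [0-9BCDFGHX]        # character 1: section (digit or one of B, C, D, F, G, H, X)
    [0-9A-HJ-NP-Z]{6}   # characters 2-7: digits or letters other than I and O
    \b                  # end at a word boundary
    """,
    re.VERBOSE,
)


def find_pcs_candidates(text: str) -> list[str]:
    """Return every substring shaped like an ICD-10-PCS code (hypothetical helper)."""
    return PCS_PATTERN.findall(text)


print(find_pcs_candidates("Procedure 0016070 was performed."))  # ['0016070']
print(find_pcs_candidates("G0BBLDG"))   # ['G0BBLDG'], the false positive
print(find_pcs_candidates("0I16070"))   # [] since I is never used after character 1
print(find_pcs_candidates("00160701"))  # [] since eight characters is too long
```

The regex only checks the shape of a code, so any 7-character token built from the allowed alphabet passes, which is exactly why "G0BBLDG" slips through.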

But there are no false negatives (tested against all 2023 CMS-approved codes):

```python
import re

PCS_PATTERN = r'\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b'

# Read the code file, one entry per line
with open("icd10pcs_codes_2023.txt") as pcs:
    pcs_codes = pcs.readlines()

print(len(pcs_codes))  # 78530

# Count the lines on which the pattern finds a match
matches = 0
for line in pcs_codes:
    if re.search(PCS_PATTERN, line):
        matches += 1

print(matches)  # 78530: every line matched
```
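Since the full code list is already loaded for the test, one way to weed out false positives like "G0BBLDG" is to keep only the regex hits that also appear in that list. This is a minimal sketch, assuming each line of the code file begins with the code itself (an assumption about the file layout). `load_code_set` and `confirmed_pcs_codes` are hypothetical helpers, and the toy line stands in for the real `pcs_codes` list.

```python
import re

PCS_PATTERN = re.compile(r'\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b')


def load_code_set(lines):
    """Collect the first whitespace-separated field of every non-blank line.

    Assumption: each line of the code file starts with the code.
    """
    return {line.split(maxsplit=1)[0] for line in lines if line.strip()}


def confirmed_pcs_codes(text, known_codes):
    """Return regex candidates from `text` that are also in `known_codes`."""
    return [c for c in PCS_PATTERN.findall(text) if c in known_codes]


# Toy stand-in for the file contents; with real data, pass `pcs_codes`.
toy_lines = ["0016070 placeholder description\n", "\n"]
known = load_code_set(toy_lines)
print(confirmed_pcs_codes("Codes: 0016070, G0BBLDG", known))  # ['0016070']
```

Membership tests on a Python `set` take constant time on average, so the extra check stays cheap even with tens of thousands of codes.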

# ICD-10-CM
`\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b`

This follows the specification:
- 3 to 7 characters
- Character 1 is alpha (all letters except U are used)
- Character 2 is numeric
- Characters 3 to 7 are alpha or numeric
- A decimal point follows the third character (the expression makes it optional, so undotted forms such as `E119` also match)
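Mapping each rule in the list above onto a piece of the expression, here is the same ICD-10-CM pattern rewritten with `re.VERBOSE`. The sample strings are illustrative additions used only to exercise the pattern's shape; note that the U-prefixed string is rejected because character 1 cannot be U.

```python
import re

# The ICD-10-CM pattern from above, annotated with re.VERBOSE.
CM_PATTERN = re.compile(
    r"""
    \b
    [A-TV-Z]       # character 1: any letter except U
    \d             # character 2: a digit
    [A-Z\d]        # character 3: a letter or digit
    \.?            # optional decimal point after the third character
    [A-Z\d]{0,4}   # characters 4-7: up to four more letters or digits
    \b
    """,
    re.VERBOSE,
)

print(CM_PATTERN.findall("Dx: E11.9, I10, and U07.1."))  # ['E11.9', 'I10']
print(CM_PATTERN.findall("S72.001A"))  # ['S72.001A'], seven characters plus the decimal
print(CM_PATTERN.findall("E119"))      # ['E119'], the decimal is optional
print(CM_PATTERN.findall("A0"))        # [] since at least three characters are required
```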

This was brute-force validated as well, though there were 3 false negatives:

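```python
import re

CM_PATTERN = r'\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b'

# Read the code file, one entry per line
with open("icd10cm_codes_2024.txt") as cm:
    icd10cm = cm.readlines()

print(len(icd10cm))  # 74044

# Count the lines on which the pattern finds a match
matches = 0
for line in icd10cm:
    if re.search(CM_PATTERN, line):
        matches += 1

print(matches)  # 3 fewer than len(icd10cm), per the 3 false negatives noted above
```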
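The note doesn't say which 3 codes were missed. Here is a sketch for listing them, assuming (a guess about the file layout) that each line starts with the code followed by whitespace and a description; `false_negatives` is a hypothetical helper. It checks only the code field with `fullmatch`, which is stricter than running `re.search` over the whole line. One plausible culprit, not confirmed in the note, is the handful of U-prefixed emergency-use codes in ICD-10-CM (for example U07.1, COVID-19), which the `[A-TV-Z]` class rejects by design.

```python
import re

CM_PATTERN = re.compile(r'\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b')


def false_negatives(lines, pattern):
    """Return the code field of each non-blank line the pattern does not fully match.

    Assumption: each line starts with the code, then whitespace and a description.
    """
    misses = []
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields and not pattern.fullmatch(fields[0]):
            misses.append(fields[0])
    return misses


# Toy lines standing in for the real file; with real data, pass `icd10cm`.
sample = ["A000 placeholder\n", "E119 placeholder\n", "U071 placeholder\n", "\n"]
print(false_negatives(sample, CM_PATTERN))  # ['U071']
```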