Created
October 22, 2024 18:09
adosib created this gist
Oct 22, 2024
# ICD-10 PCS

`\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b`

False positives exist! E.g. "G0BBLDG" would be identified as an ICD-10 PCS code by the expression even though it isn't one. But there are no false negatives (tested against all 2023 CMS-approved codes):

```python
In [13]: import re

In [14]: with open("icd10pcs_codes_2023.txt") as pcs:
    ...:     pcs_codes = pcs.readlines()
    ...:

In [15]: len(pcs_codes)
Out[15]: 78530

In [16]: matches = 0

In [17]: for line in pcs_codes:
    ...:     if re.search(r'\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b', line):
    ...:         matches += 1
    ...:

In [18]: matches
Out[18]: 78530
```

# ICD-10 CM

`\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b`

This follows the specification*:

- 3-7 characters
- Character 1 is alpha (all letters except U are used)
- Character 2 is numeric
- Characters 3-7 are alpha or numeric
- Use of decimal after 3 characters

Brute force validated as well, though there were 3 false negatives:

```python
In [1]: import re

In [2]: with open("icd10cm_codes_2024.txt") as cm:
   ...:     icd10cm = cm.readlines()
   ...:

In [3]: len(icd10cm)
Out[3]: 74044

In [5]: matches = 0

In [6]: for line in icd10cm:
   ...:     if re.search(r'\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b', line):
   ...:         matches += 1
   ...:
```
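
The brute-force checks above only count matches, so they don't show which codes were missed. Below is a minimal sketch of a helper that collects the misses instead. `find_misses` is a hypothetical name, not part of the original gist. It assumes each input entry holds a single bare code, and it uses `fullmatch` rather than `search` so that surrounding text cannot produce an accidental match. The sample codes are illustrative only, and the gist does not say which three ICD-10 CM codes were the false negatives.

```python
import re

# Patterns copied verbatim from the sections above.
ICD10_PCS = re.compile(r'\b[0-9BCDFGHX][0-9A-HJ-NP-Z]{6}\b')
ICD10_CM = re.compile(r'\b[A-TV-Z]\d[A-Z\d]\.?[A-Z\d]{0,4}\b')


def find_misses(pattern, codes):
    """Return the codes that the pattern fails to match in full.

    Assumption: each entry is one bare code, optionally with
    surrounding whitespace or a trailing newline.
    """
    misses = []
    for raw in codes:
        code = raw.strip()
        if code and not pattern.fullmatch(code):
            misses.append(code)
    return misses


# The false positive from the text: well-formed, but not a real PCS code.
print(bool(ICD10_PCS.fullmatch("G0BBLDG")))  # True

# Illustrative input: the CM pattern excludes U as a first character,
# so any code starting with U is rejected by construction.
print(find_misses(ICD10_CM, ["A00.0", "S72.001A", "U07.1"]))  # ['U07.1']
```

To list the actual misses, pass the lines read from the code file to `find_misses`. If the file carries descriptions after each code, split off the code column first.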