Thursday, December 1, 2016

Normalizing Data

How Do I Normalize Phone Numbers?

A recent question about how to remove non-numeric characters from a byte container led to some interesting solutions for normalizing phone numbers.

Consider the following data, in which the phone numbers appear in several different formats:

>in myphone
>list
>xeq
>IN myphone (0) >OUT $NULL (0)
PHONENUM        = #123.456.7890

>IN myphone (1) >OUT $NULL (1)
PHONENUM        = (123)567-1234

>IN myphone (2) >OUT $NULL (2)
PHONENUM        = (321).123.5678

IN=3, OUT=3. CPU-Sec=1. Wall-Sec=1.
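
The first step in normalizing the data is to remove the non-numeric characters. The two Clean ranges cover decimal character codes 0 through 47 and 58 through 255, which is everything except the ASCII digits 0 through 9 (codes 48 through 57). With Cleanchar set to an empty string, those characters are removed rather than replaced, as the output below shows: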
>in myphone
>set cleanchar ""
>clean "^0:^47","^58:^255"
>def newphone,1,14
>ext phonenum=$clean(phonenum)
>out newphone,link
>xeq
IN=3, OUT=3. CPU-Sec=1. Wall-Sec=1.

>in newphone
>list
>xeq
>IN newphone (0) >OUT $NULL (0)
PHONENUM        = 1234567890

>IN newphone (1) >OUT $NULL (1)
PHONENUM        = 1235671234

>IN newphone (2) >OUT $NULL (2)
PHONENUM        = 3211235678

IN=3, OUT=3. CPU-Sec=1. Wall-Sec=1.
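
For readers who want to check the same logic outside Suprtool, here is a minimal Python sketch of the cleaning step. It is only an illustration, not how Suprtool implements $clean: the function name clean_phone and the constant NON_DIGITS are hypothetical, and unlike the fixed-width 14-byte PHONENUM field, the sketch simply returns a shorter byte string.

```python
# Hypothetical illustration of the Clean ranges "^0:^47" and "^58:^255":
# every byte value except the ASCII digits (48 through 57) is deleted.
NON_DIGITS = bytes(range(0, 48)) + bytes(range(58, 256))


def clean_phone(raw: bytes) -> bytes:
    """Return raw with every non-digit byte removed."""
    # table=None means no byte is remapped; the second argument lists bytes to drop.
    return raw.translate(None, NON_DIGITS)


if __name__ == "__main__":
    for value in (b"#123.456.7890", b"(123)567-1234", b"(321).123.5678"):
        print(clean_phone(value).decode("ascii"))
```

Run against the three sample records, this should print the same ten-digit strings that the listing of newphone shows above.
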
You can then use an edit mask to put every number in the same format. The cleaned field is still 14 bytes wide, so first redefine it with a Define that covers only the ten digits of the phone number:
>in newphone
>form
    File: newphone     (SD Version B.00.00)  Has linefeeds
       Entry:                     Offset
          PHONENUM             X14     1
    Entry Length: 14  Blocking: 1
>def my,phonenum,10
>def targ,1,12
>ext targ=$edit(my,"xxx.xxx.xxxx")
>list
>xeq
>IN newphone (0) >OUT $NULL (0)
TARG            = 123.456.7890

>IN newphone (1) >OUT $NULL (1)
TARG            = 123.567.1234

>IN newphone (2) >OUT $NULL (2)
TARG            = 321.123.5678

IN=3, OUT=3. CPU-Sec=1. Wall-Sec=1.
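
The visible effect of the edit mask can also be sketched in Python. This is an assumption-laden illustration rather than a description of how $edit works internally: the function name apply_mask is hypothetical, and the sketch assumes each "x" in the mask takes the next character of the input while every other mask character is copied through unchanged.

```python
def apply_mask(digits: str, mask: str = "xxx.xxx.xxxx") -> str:
    """Fill each 'x' in mask with the next character of digits (hypothetical helper)."""
    # Mirror the redefine step: the input must be exactly as long as the
    # number of placeholders in the mask (ten for a phone number).
    if mask.count("x") != len(digits):
        raise ValueError("input length does not match the number of placeholders")
    source = iter(digits)
    # Placeholders consume input characters; literal characters are copied as-is.
    return "".join(next(source) if ch == "x" else ch for ch in mask)


if __name__ == "__main__":
    for value in ("1234567890", "1235671234", "3211235678"):
        print(apply_mask(value))
```

The length check plays the same role as defining a ten-byte field over PHONENUM: a 14-character input would not line up with the ten placeholders in the mask.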
