Developing new policies on data standards for constituent names is a crash course in what can go wrong in the development world. For every discussion over first names, last names, or any other field, stories abound about times when those rules failed. It seems everyone has been burned at some point: they followed the report they were given and addressed a prospect by the wrong name.
Nowhere does this get more offensive than getting gender incorrect. Refer to someone as “Mrs.” instead of “Mr.” and a chain of anger will eventually find its way to your desk. While an envelope addressed to “Mrs. Robert Smith” is obviously wrong, applying a hard-and-fast rule is difficult because names are so flexible.
If we accurately capture preferred name titles at the time a constituent is entered into the database, our work is already done. This is also the only way to account for more difficult situations, such as transgender or transitioning constituents, where nothing BUT a direct preference can be accurate.
But that still leaves situations where data is entered incorrectly or simply isn’t available. How do we check for possible errors? How do we fill in missing information? We’ll need either an army of interns or something a bit more automated.
Finding a Data Source
Fortunately, after searching for a while, I stumbled across the Social Security birth name data, a list of nearly every first name and gender of children born in the US in the last 100+ years. It is the perfect open data set to use as a starting point.
Social Security Birth Name Data
Created by the Social Security Administration
- Lists every name given to a child born in the US by gender and birth year.
- Excludes names given to fewer than 5 children of a given gender in a given year (for privacy reasons). For example, if there were 4 female Peters born in 2007, they will not show up.
- Covers the years 1880 and on, although not everyone was required to get a Social Security card until 1937.
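To make the idea concrete, here is a minimal sketch of how a name list like this could be turned into a gender lookup. It assumes each record is a comma-separated row of name, sex (F or M), and count; check that against the files you actually download. The function names, the 0.95 cutoff, and the sample rows are all made up for illustration, not taken from the real data.

```python
import csv
from collections import defaultdict
from io import StringIO


def load_counts(lines):
    """Sum births per name and sex from rows assumed to look like 'Mary,F,7065'."""
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        name, sex, n = row
        counts[name.lower()][sex] += int(n)
    return counts


def likely_gender(counts, first_name, threshold=0.95):
    """Return 'F', 'M', or None when the name is unknown or too ambiguous."""
    tally = counts.get(first_name.strip().lower())
    if tally is None:
        return None  # name not in the data; never guess
    share_f = tally["F"] / (tally["F"] + tally["M"])
    if share_f >= threshold:
        return "F"
    if share_f <= 1 - threshold:
        return "M"
    return None  # mixed usage; needs a human look


# Invented sample rows standing in for one or more yearly files.
SAMPLE = StringIO(
    "Mary,F,7065\nMary,M,27\nRobert,M,5000\nJordan,F,400\nJordan,M,600\n"
)
counts = load_counts(SAMPLE)

for name in ["Mary", "Robert", "Jordan", "Zyx"]:
    print(name, likely_gender(counts, name))
```

Returning None for unknown or mixed names is deliberate: a lookup like this can flag records for review or suggest a value, but a constituent's stated preference should always win over the guess.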