Choosing Identifiers for Your Application
Campus ID? Email Address? OH MY! Help for application owners who need to choose an identifier for the people in their applications.
A note on access to data: some of the data elements discussed in this document may not be immediately available to your application and are not publicly available by default. University of Wisconsin related applications can request access to attributes from NetID Login (and other systems) through the Identity Data Integration (IDI) request process. Note that this process does not always have to end with actual access to data and is a great place to start if you don't know where to start in integrating your application. Operators are standing by.
The choice of an identifier for your application may or may not be something you can control. Unfortunately, many applications are stuck with the things they understand and how they understand them, so you may simply have to bite your lip and work with what you have. Hopefully, even if this is the case, this document can help you work through issues.
First, it is important to understand that there is no immutable identifier that will absolutely never change. Anybody who tells you they can accomplish this is selling you something that doesn't exist. But, you can pick identifiers and use strategies that will enable you to have something that is more stable for particular uses, or allows you to more easily handle changes (programmatically or otherwise).
Second, there is no right answer. There may be answers that are better or worse for a situation (and answers that are definitely not going to work), but most options will work somehow.
Is there something you think is missing here? Do you have a question you'd like answered? Please comment on this doc and ask! We'll do our best to answer.
Identifier Concepts
This section borrows, heavily (to put it lightly), from the work done by Internet2 and REFEDS collaborators in the eduPerson Schema Definition. Any errors here are surely mine, and all credit goes to the authors.
Persistence
Persistence is a measure of the length of time during which an identifier can be reliably associated with a particular individual. A short-term identifier might be associated with an application session. A permanent identifier is associated with its entry for its lifetime. Bear in mind that it is very hard -- perhaps impossible -- to have an identifier that is associated with a person forever, can never change, and is always resolvable. Even identifiers created with the best design will have cases where they must change.
Privacy
Some identifiers are designed to preserve the individual's privacy and inhibit the ability of multiple unrelated services to correlate principal activity by comparing values. Such identifiers by nature must be opaque, having no particular relationship to the principal's other identifiers.
Uniqueness
Unique identifiers are those which are unique within the namespace of the identity provider and the namespace of the service provider(s) for whom the value is created. A globally-unique identifier is intended to be unique across all instances of that attribute in any provider.
Reassignment
Many identifiers do not specifically guarantee that a given value will never be reused. Reuse means assigning an identifier value to one person, and then assigning the same value to a different person at some point in the (possibly distant) future. For example, email addresses are occasionally reassigned, but NetIDs never will be (resulting in an ever-dwindling "user-friendly" NetID namespace).
Human Palatability
An identifier that is human-palatable is intended to be rememberable and reproducible by typical human users, in contrast to identifiers that are, for example, randomly generated sequences of numbers and letters. In general, human-palatable identifiers are more likely to need to be changed (or in extreme cases, reassigned) for a variety of reasons, but primarily because of their tendency to be based on name, occupation, or other things that change over time.
So, what options are there?
NB: these identifiers, with the exception of eduPersonPrincipalName, are locally specific and meaningless to people outside of the University of Wisconsin. Additionally, even those that may have an analog at other institutions ("NetID" or perhaps "Photo ID" or "Campus ID") are likely to overlap with values issued elsewhere, and are not unique globally or possibly even locally (outside UW-Madison).
PVI
UW-Madison's Identity system's generated identifier. This identifier is issued by UW-Madison's identity management (IdM) system (Person Hub or UDS -- among other names) to what it thinks is a single person. If what you know to be a single human being has "multiple PVIs" the IdM system doesn't realize that they are one person. This problem needs to be resolved by linking the disjoint identities before the person's digital experience will function well. PVIs change regularly as new data is introduced and linked to the identity.
Pros:
- Every person in the identity system has one.
- Not personally identifying (not name-based)
- Change sequence is well defined and easily evaluated going forward or backward.
Cons:
- People do not know and are not expected to know what their PVI is.
- Changes regularly, usually for reasons the person is unaware of.
If your application is using PVI, it must deal with PVIs changing. This may be as simple as recognizing that users will have a broken experience and there is nothing that can be done, or as advanced as consuming changes from either a table or a Web Service and taking appropriate action.
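A minimal sketch of the more advanced option, in Python. The record format here (a list of old PVI / new PVI pairs) is an assumption made for illustration and is not the actual layout of the change table or Web Service; the PVI values are made up. It follows a stored PVI forward through recorded changes to find the PVI or PVIs that currently stand for it.

```python
from collections import defaultdict


def build_forward_map(changes):
    """Index hypothetical (old_pvi, new_pvi) change records by old PVI."""
    forward = defaultdict(set)
    for old_pvi, new_pvi in changes:
        forward[old_pvi].add(new_pvi)
    return forward


def resolve_current_pvis(pvi, forward):
    """Follow a PVI forward until reaching PVIs with no further changes.

    Returns a set: one element in the usual case (unchanged or merged),
    more than one if the identity was split into multiple people.
    """
    current = set()
    seen = set()
    pending = [pvi]
    while pending:
        candidate = pending.pop()
        if candidate in seen:
            continue  # guard against cycles in bad data
        seen.add(candidate)
        successors = forward.get(candidate)
        if successors:
            pending.extend(successors)
        else:
            current.add(candidate)
    return current


# Made-up values: a and b merge into c, c later becomes d; e splits into f and g.
changes = [
    ("pvi-a", "pvi-c"), ("pvi-b", "pvi-c"), ("pvi-c", "pvi-d"),
    ("pvi-e", "pvi-f"), ("pvi-e", "pvi-g"),
]
forward = build_forward_map(changes)
merged = resolve_current_pvis("pvi-a", forward)
split = resolve_current_pvis("pvi-e", forward)
```

A result containing more than one PVI means the stored identity was split, and the application probably needs a human to decide which data belongs to whom.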
Why do PVIs change? Can't they just stay the same?
Note: PVIs change for a lot of underlying reasons. This section does not attempt to be exhaustive and may be missing a crucial reason. But the root cause is always the same: one of the nearly 20 sources of person data that feed Person Hub introduced data that caused us to realize that what we thought were two people were a single person (or the opposite, which is a bigger problem).
As with everything, please comment and contact us if you have questions!
For simplicity of our machinery and applications, it was decided early in the development of what became Person Hub that when we found what we thought were two people becoming one, we would throw away both old PVIs and create a new PVI. This eliminated the need for the infrastructure (which runs without human intervention) to decide on a "winner" and eliminated the risk of applications not realizing that something fundamental changed about a person (at the expense of needing to check for old PVIs in cases where continuity is important). Note that it is important to have an internal ID if you want to weather PVI changes as easily as possible.
The general case for PVIs changing is data about a person changing, causing the machine to realize that what it thought were two different people are actually the same person. Because PVIs are created and available to applications before the person creates a NetID (digital identity) and interacts with something, we unfortunately can't possibly know which PVI was "right" and which was "wrong" or if a PVI has been consumed by a downstream application. For a highly simplified example, imagine someone who is an employee and a student, but hasn't arrived yet: some systems get all employees, some get all students, some get both. What should we do when the "two people" suddenly become one? How would the machine possibly know which was "right"? This usually happens when names, birthdates or SSNs are updated as more information is known about the person.
Even more complicated and dangerous is the case where what we thought was one person becomes two. This is generally a catastrophic situation for applications: anything keyed by PVI in the application, and all data about the person, has unclear ownership and could be actions or data belonging to either natural person. On top of that, someone may have been able to see private information about another person. This is always disruptive for the user and can cause loss of data. It is for this reason that our machinery that identifies people is conservative in matching people, which causes more PVI changes than might strictly be necessary (but fewer catastrophic mistakes).
An aside on Self Link and the "Multiple NetID Tool"
To get around issues with data and assist people in situations where they may never be able to provide enough data to match, Self Link was born. This tool allows an HR officer or authorized staff in the Registrar's Office to issue a link to a person that the person can use to log in and map a role to their account that would not match up naturally. This is dangerous: because tearing people apart is hazardous, the person issuing the link has to be very sure that the person to whom they are issuing the link is actually the person for whom the role was created. The general case for this is an HR person who enters an employee and gives the person a link at the same time he or she is entered, so it can be certain that the role was created for him or her. There are safeties in the system, and if data is sufficiently mismatched a case is opened for IAM staff to review.
The Multiple NetID Tool was created to handle the unfortunately common case of people who accidentally activate multiple NetIDs when the system can't tell that they are a single natural person. As noted below, it is best to never activate two NetIDs, but if a user has, he or she can use this tool to prove ownership of the two NetIDs and generate a case for the accounts to be merged. Because of service limitations, this is an unfortunately manual process that requires user involvement for some services (or it can cause data loss). Unfortunately, because we cannot tell that the two people are the same, we cannot prevent someone from activating two accounts.
NetID
NetID is a branding for UW-Madison's login ID for people. People log in to NetID Login and other NetID-enabled applications using their NetID and password. Usernames and passwords at other institutions (eg UW-Whitewater or University of Chicago) are sometimes also called "NetID" but are not the same value and cannot be used in the same places. NetIDs are generated when someone goes through NetID Activation and are not assigned ahead of time. A single individual should never have multiple active NetIDs, and it is not desirable (although it is sometimes inevitable) for someone to knowingly have multiple NetIDs at all. NetIDs are fairly stable for applications to use when identifying users.
Pros:
- Changes more predictably for a person. NetIDs only normally change when a person asks that their NetID be changed, and the person is expecting that some services may have problems when they are changed. NOTE: it is not normal for a NetID to change unexpectedly. If this happens something has gone wrong and the user needs to contact the Help Desk.
- Human readable, and known to the person. People are aware of what their NetID is and are not going to be offended or confused when they see it.
Cons:
- Changes are not easily followed going forward or backward. When a NetID changes the old NetID ceases to exist and a new one suddenly exists.
- NetIDs do not exist until the person activates one. A brand new person will not have a NetID and cannot be identified using it until they go through the activation process.
eduPersonPrincipalName (ePPN) or userPrincipalName (UPN)
ePPN (also known as UPN in Campus AD) is a standard part of the eduPerson schema that identifies a person at a campus. For UW-Madison it is always NetID@wisc.edu, and as such behaves exactly the same as NetID. If your application has even the slightest chance of wanting to interact in a Federated world and incorporate people from other UW System or worldwide institutions, ePPN is the best option, as anybody logging in through SAML Federation (through Wisconsin Federation, InCommon and eduGAIN) worldwide will have one and it is guaranteed to be unique.
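As a rough illustration of working with the NetID@wisc.edu form, the Python sketch below splits an ePPN into its local part and scope. The function names and the example values are hypothetical, and lowercasing the scope is an assumption of this sketch, not something the eduPerson schema or NetID Login is claimed to do.

```python
def split_eppn(eppn):
    """Split an ePPN of the form local@scope into (local, scope)."""
    local, sep, scope = eppn.rpartition("@")
    if not sep or not local or not scope:
        raise ValueError(f"not a scoped identifier: {eppn!r}")
    # Assumption: treat the scope like a domain name and compare it lowercased.
    return local, scope.lower()


def netid_from_eppn(eppn, home_scope="wisc.edu"):
    """Return the NetID for a UW-Madison ePPN, or None for other institutions."""
    local, scope = split_eppn(eppn)
    return local if scope == home_scope else None


# Hypothetical values: the same local part at two institutions is two identities.
local_user = netid_from_eppn("bbadger@wisc.edu")
visitor = netid_from_eppn("bbadger@uchicago.edu")
```

An application that expects federated users would store the whole ePPN as its key rather than the stripped local part, so that people with the same local part at different institutions stay distinct.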
Emplid
Emplid is the name of the internal identifier in a PeopleSoft system. Every PeopleSoft System has them (eg, HRS, SIS, UW Health), and they are not the same across the systems. Using a particular system's Emplid will limit your application to only function with identities from that system, and is not recommended, even if your application right now interacts only with a specific backend. However, if your application does deal only with people from a particular backend system, using and tracking this identifier may help your application weather changes in other identifiers.
Campus ID
There is no common and settled definition for Campus ID at UW-Madison. There are contexts where various 10 or 11 digit numbers are used for specific purposes. See below for help thinking about how you might use Wiscard, SIS Campus ID, Wiscard Account Number, or Library Patron ID, but bear in mind that they are specifically created for specific purposes and you probably do not want to use them. Please do not try to use these identifiers without a careful understanding of them; a full understanding is beyond the scope of this document.
Wiscard Number
The number currently printed on a UW-Madison Photo ID Card that is neither expired nor revoked. It is 11 digits long and consists of the Wiscard Account Number and an issue code. When a card expires, is reported lost or stolen, or is otherwise revoked, this number does not exist. This number will only change when a person gets a new card printed at the Wiscard Office.
Wiscard Account Number
The number designating a user's account for charging to his or her Wiscard. This is 10 digits long (does not include the issue code because it does not indicate the card). When a person has never had a card this will be the same as a Predicted Photo ID, but if the person has a revoked, expired or otherwise invalid Wiscard, the person will not have an account number and cannot charge items.
Predicted Photo ID
The number that will be printed on a person's card if they are eligible to get a card and go get a card. This can change at any time as new role data collects on a person. This number does not exist when someone has a Wiscard Number.
Library Patron ID
The number used by the UW-Madison Library to identify a patron. It is 10 digits long and usually in line with other IDs, but it behaves as desired by the library.
SIS Campus ID
The number issued by SIS for a student. This number will be chosen when a new Wiscard is printed for someone who is active in SIS.
Note: It is possible (and not unusual) for a person to have a Wiscard number or Library Patron ID that is different than their SIS Campus ID number. These numbers will sync back up if the person gets a new Wiscard.
Please have an internal ID!
If possible, applications should have their own internal identifier and track PVI as well as another identifier (eg NetID) for a person, so that the application can as easily as possible detect changes to its primary identifier and recover gracefully, either by seeing that the other identifier did not change, or by evaluating changes to PVI.
Tracking against an internal identifier will allow the application abstraction to recover from situations where what the application thought were two people suddenly become one (necessitating picking a "winner") or when what the application thought was a single person becomes two.
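One way this could look, as a sketch only: the Python below keeps an application-owned internal ID and records both PVI and NetID against it. The class names, fields, in-memory storage, and example values are all hypothetical. On each login it looks the person up by either identifier, refreshes whichever one changed, and refuses to guess when the two identifiers point at different internal records.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    internal_id: int
    pvi: str
    netid: Optional[str] = None  # may not exist until NetID Activation


class PersonStore:
    """In-memory stand-in for an application's user table (illustrative only)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._people = {}

    def _find_by(self, field, value):
        if value is None:
            return None
        for person in self._people.values():
            if getattr(person, field) == value:
                return person
        return None

    def reconcile(self, pvi, netid=None):
        """Return the internal record for these identifiers, updating it if needed."""
        by_pvi = self._find_by("pvi", pvi)
        by_netid = self._find_by("netid", netid)
        if by_pvi and by_netid and by_pvi is not by_netid:
            # Two internal records now look like one person: do not pick a "winner" silently.
            raise LookupError("identifiers match different records; needs review")
        person = by_pvi or by_netid
        if person is None:
            person = Person(next(self._ids), pvi, netid)
            self._people[person.internal_id] = person
        else:
            person.pvi = pvi
            if netid is not None:
                person.netid = netid
        return person


store = PersonStore()
first = store.reconcile("pvi-a")                    # seen before NetID Activation
activated = store.reconcile("pvi-a", "bbadger")     # NetID now exists
after_change = store.reconcile("pvi-c", "bbadger")  # PVI changed, NetID did not
```

This sketch cannot recover when both identifiers change between two logins; in that case it would create a new record, which is where evaluating the PVI change history comes in.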
But what about email address?
We know that everybody does it, but on the UW-Madison campus, email address is a very unreliable and insecure identifier. For various reasons (that we know are erroneous, but are nonetheless real), email address is often user-provided and not confirmed. As such, if you restrict something to buckingham.badger@wisc.edu and look for that value in mail as delivered from NetID Login, it is trivial for someone to intentionally or unintentionally get in as that account.
Additionally, for the same (in this case good) reasons (especially that mail is meant to convey the address at which a person wants to be emailed), it is trivial for a user to change the value of mail to whatever he or she wishes. If your application is keying on buckingham.badger@wisc.edu and he decides he would rather use bucky.badger@wisc.edu, he would be locked out of your application.
What about netid@wisc.edu? Isn't that a valid email?
Using ePPN/UPN (netid@wisc.edu) as an email address is a tempting solution to the problem of knowing the owner of an email address, which it actually does do. However, it only works this way for people who have UW-Madison Office365. For other people (former employees, applicants, UW System employees, UW Health and most other affiliates) who have NetIDs but are not eligible for Office365, email to netid@wisc.edu will bounce.
Regardless of your population now, it is not recommended that you do this, since it will bind you in the future and you are depending on someone else's eligibility rules. If you decide to expand your population, you will have backed yourself into a corner and will need to re-engineer; worse, if UW-Madison decides to change the population that is eligible for Office365, your application may unexpectedly break.
How can I use email safely?
Aside from ePPN as email (above) and its extreme caveats, the only way to use email safely is as a way of contacting the user, while using NetID, PVI or another identifier for the person.
If appropriate, use invites to a Manifest group and release that group to your application for authorization or use that group to drive a feed. Alternately, your application could issue its own invitations to people and have them log in using their NetID (and the application uses NetID, PVI or another identifier for the person.)
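To make the split between identifying and contacting concrete, here is a small Python sketch. It assumes the application receives login attributes as a dictionary with hypothetical keys netid and mail; the key names, the allow-list, and the example values are illustrative and are not the actual attribute release from NetID Login.

```python
ALLOWED_NETIDS = {"bbadger"}  # hypothetical allow-list keyed by NetID, not by email


def is_authorized(attributes):
    """Decide access from the NetID only; the mail value is deliberately ignored."""
    return attributes.get("netid") in ALLOWED_NETIDS


def contact_address(attributes):
    """Return the address at which the person wants to be emailed, if one was released."""
    return attributes.get("mail")


# The same person keeps access after changing their preferred address.
before = {"netid": "bbadger", "mail": "buckingham.badger@wisc.edu"}
after = {"netid": "bbadger", "mail": "bucky.badger@wisc.edu"}
# Someone else who typed in Bucky's address does not gain access.
impostor = {"netid": "someone", "mail": "buckingham.badger@wisc.edu"}
```

Because the access decision never reads mail, a user changing that value neither locks themselves out nor lets anyone else in.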
Resources
If you're interested in learning more...
- The aforementioned eduPerson Schema Definition
- Identifiers and Usernames by Ian Glazer from the IDPro Body of Knowledge