Early Thoughts on Upside-Down Badge Stacking and Assessment Validity

Lately I’ve been blogging about an “upside-down” approach to stacking badges as a way to reputably self-issue badges. If you have no idea what I mean by upside-down stacking of badges, I invite you to check out my previous post on the topic. In a nutshell, the idea involves learners identifying goals or competencies in which they’re interested, and then acquiring or self-issuing badges as evidence of progress toward those goals or competencies. The learner creates a collection of “evidence” badges, and the more badges they acquire and the more they or their badges are endorsed, the more compelling their claim of competence begins to look.

In this post, I’d like to share some early thoughts about how collections of badges, and upside-down stacking in particular, relate to assessment validity.

An upside-down approach to stacking badges hinges on the idea that a collection of badges can represent one’s claim to possess a competency or to have progressed toward a goal as well as an individual badge can. I believe that’s an assumption of badge pathways in general. But if that’s the case, I wondered if concerns about badges, over things such as validity and rigor, applied in the same way to collections of badges. My gut sense is that collections of badges may be more resistant to potential problems around validity. However, that “gut sense” might also be called a bias, and so it deserves some scrutiny.

Assessment Validity

What got me thinking about this in the first place was a concern over the validity of claims made by digital badges, and especially of claims made by badges tied to competency-based assessments. This was a concern I saw raised by Dan Hickey in his 2016 blog post “Traditional Approaches to Validity in Assessment Innovation.” In his post, Hickey wrote that he’s surprised credential innovators don’t express more concern over the criterion validity of the claims some credentials make. Criterion validity, as I understand it, refers to how well the results or scores of an assessment correlate with some external or future metric. For instance, a competency-based assessment of a learner’s ability to write professional emails may show a high degree of criterion validity if its results correlate strongly with, say, that learner’s job performance reviews related to professional email writing.
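To make the idea of "correlating with an external metric" a little more concrete, here is a minimal sketch in Python. It assumes criterion validity is summarized with a Pearson correlation coefficient, which is one common choice but not the only one. The function name `pearson_r` and every number below are made up purely for illustration; they are not drawn from Hickey's post or from any real assessment.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cross / sqrt(ss_x * ss_y)


# HYPOTHETICAL data: rubric scores on the email-writing assessment, and
# later supervisor ratings of the same six learners' workplace emails.
assessment_scores = [62, 70, 75, 81, 88, 93]
review_ratings = [2.5, 3.0, 3.5, 3.0, 4.5, 4.0]

validity_coefficient = pearson_r(assessment_scores, review_ratings)
print(f"criterion validity (Pearson r): {validity_coefficient:.2f}")
```

A coefficient near 1 would suggest that the assessment tracks the workplace criterion closely, while a value near 0 would suggest the badge's claim is poorly supported by that criterion. With only six invented data points, the output here means nothing beyond showing the mechanics.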

Hickey’s concern about criterion validity resonates with me because, as a designer of competency-based micro-credentials, I’ve seen firsthand the tendency to make competency claims in micro-credentials that feel too strong for the evidence used to support them. As an example, imagine a learner who takes a competency-based micro-credential course on writing professional emails. The learner’s competence in this domain is evaluated by an authentic summative assessment: she has to write an email, the professionalism of which is evaluated by an assessor using a rubric. If the learner scores above a certain threshold, she earns a badge that claims she has demonstrated mastery at writing professional emails. But has she really demonstrated mastery? Is one email, written in the context of an assessment that may contain instructional prompts, examples, and maybe even a template, sufficient evidence on which to base a claim that our learner will be a competent email writer “out in the wilds” of the workplace?

I wondered if an upside-down approach to stacking badges may address concerns about the criterion-related validity of micro-credential assessments. In one sense I think it might, but in another sense I suspect it may simply shift the concern to the evaluation of collections of micro-credentials.

On the one hand, it may address concerns about criterion validity of competency-based micro-credentials because, with upside-down badge stacking, there needn’t be an artificial threshold that sharply divides learners who “possess” a given competency from those who “lack” that same competency. Learning usually isn’t an all-or-nothing proposition. It more often exists as a gradation that ranges from novice to master (Dreyfus & Dreyfus, 1980). Of course, badge and pathway designers aren’t naive about this, and there’s currently an effort to make badges reflect that graduated nature of learning by creating badge levels: Learner A earns the “Competent” badge, Learner B earns the “Really Competent” badge, and Learner C earns the “Holy Competence, Batman!” badge. When badges are stacked “upside-down” beneath a pathway node (which itself may be a goal or a competency statement), there’s less need for such badge leveling. The claim of the competency statement simply looks more and more convincing as the learner acquires more evidence badges and more endorsements to support those badges. That said, there could still be room for badges that recognize when a learner, based on the evidence she has been able to marshal, passes over a competency threshold. Such a badge might be used to recognize additional rights and responsibilities that the learner has earned in an organization, e.g., email mentor, resident expert on writing professional emails, or even the right to email on behalf of the organization.

So, in the case of an upside-down arrangement of badges, the overarching claim of whether or not an individual learner possesses a given competency needn’t reside within a particular badge or competency pathway. Instead, it can reside as a judgment in the mind of the badge consumer who reviews the competency claim and supporting evidence, and makes up her own mind about whether or not the learner is likely to be competent. Or if the badge consumer is an algorithm, it makes up its “mind” about the learner’s competence. In either case, it might be that a collection of upside-down badges makes a softer and more implicit claim about a competency or goal than do the explicit, all-or-nothing claims made by many competency-based micro-credentials.

On the other hand, upside-down badge stacking may simply shift concerns about the criterion validity of micro-credentials to concerns about the validity of collections of micro-credentials. However, badge pathways may do the same thing; I’m not sure it’s a characteristic unique to upside-down stacking. Before I continue, I confess that when I’m talking about assessment validity, there’s still too much that I don’t know. Maybe someone can clarify for me: can one talk about the validity of a collection of badge-related assessments in the same way one talks about the validity associated with a single summative assessment? There wouldn’t appear to be a single assessment result to correlate with a criterion, so it seems the answer should be “no.” Or perhaps a collection of badge-related assessments might show a degree of validity that is somehow the sum of the validity (validities?) of the assessments that make up the collection? In any case, I still think there’s a sense in which the sum of any collection of micro-credentials may or may not correlate with a given criterion.

Perhaps there can be a single score associated with a collection of badges or pathways that could be correlated with a criterion. For instance, I have heard that Concentric Sky/Badgr is currently experimenting with a search feature it calls “Badge Rank,” which ranks badges based on a number of criteria. Open badges are machine readable, so it’s not hard to imagine that someone could develop a ranking for collections of badges. In other words, as the owner of a collection of micro-credentials, I might get a score that represents the probability that I possess a given competency (side note: I don’t think that’s what Badgr’s Badge Rank does. I only mention it to point out that technologically the prospect isn’t so far-fetched). The service/program that generates that ranking might be thought of as another kind of badge consumer, and it would almost certainly be an algorithm.

I don’t think I’ve answered any of my questions about validity and collections of badge-related assessments. That’s okay; my goal in this post was simply to share some early thoughts on the topic. At least, that sounds better than saying I didn’t figure anything out.

Incidentally, the notion that collections of micro-credentials, and by extension the owners of those collections, may one day be evaluated and ranked by algorithms is something I hope to hear others eventually weigh in on (and perhaps others already have and I simply missed it). It’s a topic about which I feel ambivalent: if micro-credentialing proliferates then perhaps that kind of scoring will be needed by badge consumers, e.g., human resource professionals. At the same time, it feels like something that has the potential to run counter to the ethos of open badges. Plus, it’s tough to imagine what the far-reaching consequences of such scoring might be.



Dreyfus, S. E., & Dreyfus, H. L. (1980). A five-stage model of the mental activities involved in directed skill acquisition (No. ORC-80-2). California Univ Berkeley Operations Research Center. Retrieved from http://www.dtic.mil/docs/citations/ADA084551

Hickey, D. (2016, July 4). Traditional Approaches to Validity in Classroom Assessment and Innovative Credentialing (Part 1). [Blog]. Retrieved from http://remediatingassessment.blogspot.com/2016/07/modern-approaches-to-validity-in.html

