This is very helpful, and it speaks to an issue I'm currently struggling with. My evaluation covers individuals nested within village clusters, which are in turn nested within districts, and in our particular setting we believe there is clustering at both the village-cluster and district levels. Our comparison individuals are both within the treatment district and in neighboring untreated districts, so it's nice to hear that "clustering adjustments may not matter much."
For a typical impact measure in our data, the conventional and robust standard errors are identical at .120, the village-clustered standard errors go up to .157, and the district-clustered standard errors go up to .178. That moves the p-value for the result from .001 to .014 to .081 (effect size .2). Does this count as not mattering much? If I am unsure from a theoretical or program-setting point of view, how do I decide which to use from there? Should I choose the clustering that gives the largest standard errors to play it safe?
I've tentatively decided to use village-clustered errors based on our best intuition of the program setting, but is that just one more researcher degree of freedom that I'm tweaking for my own benefit?
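For what it's worth, here is a small Python sketch of how much the clustering inflates the standard errors quoted above, relative to the unclustered ones. It uses only the three figures from my data; the names `se` and `inflation` are just illustrative, and the squared ratio is only a rough, design-effect-style summary, not a formal test of which clustering level is right.

```python
# Standard errors reported above for the same impact estimate.
se = {
    "conventional/robust": 0.120,
    "village-clustered": 0.157,
    "district-clustered": 0.178,
}
baseline = se["conventional/robust"]


def inflation(se_clustered, se_baseline):
    """Return (SE ratio, variance ratio) relative to the baseline SE."""
    ratio = se_clustered / se_baseline
    return ratio, ratio ** 2


results = {name: inflation(value, baseline) for name, value in se.items()}

for name, (se_ratio, var_ratio) in results.items():
    print(f"{name:22s} SE ratio = {se_ratio:.2f}, variance ratio = {var_ratio:.2f}")
```

So village clustering raises the standard error by about 31% and district clustering by about 48%, which corresponds to the sampling variance being roughly 1.7 and 2.2 times the unclustered value.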