I like your second exercise, which would be a good project for someone to take on, but I agree it would be quite a bit of work to do it well. On your first point, Google Scholar tries to combine citations to the published and working paper versions of the same paper, so it would also take some work to separate these.