As I recall, the confusion and inconsistencies around game covers started when someone made a PAL cover pack by taking the USA pack and changing the region letter in each game ID... Some games didn't match because the publisher was different for the NTSC and PAL releases. That's when removing the last two characters of the ID (the publisher code) seemed like a good idea, so that the covers would match the games.
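Here is a minimal sketch of why the renamed pack failed and why trimming the ID helped. It assumes six-character Wii/GameCube-style IDs where the fourth character is the region letter (E for USA, P for PAL) and the last two are the publisher code. The IDs are hypothetical, and `usa_to_pal` and `cover_key` are illustrative names, not part of any existing tool.

```python
def usa_to_pal(game_id: str) -> str:
    """Mimic the cover-pack rename: swap the region letter for P (PAL)."""
    # Assumed layout: indices 0-2 identify the title, index 3 is the region,
    # indices 4-5 are the publisher code.
    return game_id[:3] + "P" + game_id[4:]


def cover_key(game_id: str) -> str:
    """Drop the two-character publisher code, keeping title + region."""
    return game_id[:4]


usa_cover = "ABCE01"  # hypothetical USA release, publisher code 01
pal_game = "ABCP52"   # hypothetical PAL release by a different publisher

renamed = usa_to_pal(usa_cover)
print(renamed, renamed == pal_game)  # ABCP01 False
print(cover_key(renamed), cover_key(renamed) == cover_key(pal_game))  # ABCP True
```

The full-ID comparison fails because the publisher code differs between regions, while the trimmed key matches.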
One advantage of aiming for compatibility with the MAME listXML format is that the main XML element for each game contains the full name, including the version number, which allows multiple versions (1.0, 1.1) of the same game to share the same game ID.
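As an illustration, the sketch below groups entries by game ID. It assumes a simplified, listXML-like layout in which each `game` element carries the full name (with version) in a `name` attribute and the ID in an `id` child. These element names and the sample entries are hypothetical and do not reproduce the actual listXML schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical entries: two versions of the same game sharing one ID.
SAMPLE = """<datafile>
  <game name="Example Game (USA) (v1.0)"><id>ABCE01</id></game>
  <game name="Example Game (USA) (v1.1)"><id>ABCE01</id></game>
  <game name="Other Game (Europe)"><id>XYZP52</id></game>
</datafile>"""


def versions_by_id(xml_text: str) -> dict[str, list[str]]:
    """Map each game ID to the full names of every entry that uses it."""
    root = ET.fromstring(xml_text)
    grouped: dict[str, list[str]] = defaultdict(list)
    for game in root.iter("game"):
        # The full name (including version) is what makes each entry unique;
        # the ID is allowed to repeat across versions.
        grouped[game.findtext("id")].append(game.get("name"))
    return dict(grouped)


for game_id, names in versions_by_id(SAMPLE).items():
    print(game_id, names)
```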
If the publisher code in the game ID is consistent (and it should be, unless Nintendo or the publisher made a mistake), adding a publisher field will only add to the bloat. (Actually, I'd like to call the file "bloatii.xml".)
The publisher field could remain optional, and loaders would still have the information from the ID when the field is missing. Maybe it should only be included when the code is known to be wrong; I really don't know...
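A loader could treat the field as an optional override. This is a sketch under two assumptions: entries look like the hypothetical `<game>` elements below, and the publisher code is the last two characters of a six-character ID. `publisher_code` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def publisher_code(game: ET.Element) -> Optional[str]:
    """Prefer an explicit <publisher> field; otherwise derive it from the ID."""
    explicit = game.findtext("publisher")
    if explicit:
        # Present only when the code embedded in the ID is known to be wrong.
        return explicit.strip()
    game_id = game.findtext("id") or ""
    # Assumed layout: the publisher code is the last two of six characters.
    return game_id[4:6] if len(game_id) == 6 else None


normal = ET.fromstring('<game name="Example Game"><id>ABCE01</id></game>')
override = ET.fromstring(
    '<game name="Odd Game"><id>DEFE01</id><publisher>52</publisher></game>'
)
print(publisher_code(normal))    # 01
print(publisher_code(override))  # 52
```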
Since the format is XML, people who want faster loading times can easily remove the content they don't need. As you said, a script could do this.
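Such a script could be as small as the following sketch. It assumes a hypothetical layout in which each `<game>` element holds its fields as child elements; `strip_fields` and the sample tags are illustrative, not part of any existing format.

```python
import xml.etree.ElementTree as ET


def strip_fields(xml_text: str, keep: set[str]) -> str:
    """Remove every child of each <game> element whose tag is not in `keep`."""
    root = ET.fromstring(xml_text)
    # findall returns a list, so the tree can be modified safely afterwards.
    for game in root.findall(".//game"):
        # Copy the child list first: removing while iterating skips elements.
        for child in list(game):
            if child.tag not in keep:
                game.remove(child)
    return ET.tostring(root, encoding="unicode")


FULL = """<datafile>
  <game name="Example Game (USA) (v1.0)">
    <id>ABCE01</id>
    <publisher>01</publisher>
    <synopsis>A long description nobody needs at load time.</synopsis>
  </game>
</datafile>"""

print(strip_fields(FULL, keep={"id"}))
```

For files on disk, `ET.parse(path).getroot()` and `ET.ElementTree(root).write(out_path, encoding="utf-8")` could replace the string handling.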
As for the project's long-term goal, I'm not sure No-Intro or redump.org would use it, because every checksum would need to be verified, twice.
If there is no way to tell whether a checksum has been confirmed, the data is of little use in the long run. It might still be useful to them as a point of reference; I'm not sure. To confirm checksums, such a project needs an official maintainer rather than wiki-style access for everybody, or else a forum with a team reviewing new entries. Such sites already exist, and I think it would be best to join forces. The idea is to build a more complete database, instead of having checksum-oriented dats on one hand and collection-oriented game databases (Offlinelist and online game lists) on the other.
Ideally, different groups could work on the same database... I'd like to avoid starting a separate project.
If it is a separate project and one of those groups later picks it up to add more information to their dat, it will have been useful to some extent. But if it is part of a bigger project, it will be useful right from the start, supported by more people, and its content will grow faster.
I'll check whether those groups are interested in going in this direction; if not, this will just be another project.