10 thoughts on “Same content, different file”

  1. Enforcing a whitespace/indentation rule for XML is arbitrary, since XML ignores both. Therefore I’d rather see it as yet another artificial, annoying rule. In addition: as soon as upstream metadata tags get used, even more metadata.xml files will differ.

  2. Most programming languages ignore whitespace and indentation. Is that a reason not to have conventions?

  3. @dev-zero thanks for proving my note about herding cats. Sure, XML ignores whitespace. But as I said, there are *very good technical reasons* to have the same indenting format for all the files, just for the sake of de-duplication and compression. But “yet another artificial annoying rule” (as in, the mindset that lets you write that) is the reason why Gentoo’s results are getting shittier by the year.

  4. @diego, for your purpose tab indentation is the only viable solution: 1) with spaces, the files will be bigger than with tabs; 2) spaces are more error-prone than tabs, which nullifies the de-duplication purpose and requires extra QA checks to validate the files. However, I agree with you, the new Gentoo trend seems to be: “WTF! A new rule? Why do we need it? The solution is to ignore the problem.”

  5. @dev-zero We already have this “artificial rule” in ebuilds. There is no reason for indenting with tabs in bash scripts (apart from the technical one); the same would just apply to metadata.xml. I guess you never had an issue with forced style on ebuild files…

  6. Maybe I’m missing something, but standardizing indentation and whitespace should be the easiest thing to do automatically – strip all indentation and whitespace and recreate them with some XML beautifier package. So why not just add a hook to the VCS that serves the tree that would do that to every metadata.xml file?

  7. A simple step towards this goal is to put a vim modeline in skel.metadata. I’m surprised no one has done this yet. I just use whatever indenting the previous person did and if there was a modeline, even better.

  8. @Jeremy: then we’ll add a kate modeline, and an emacs modeline, and … Why not rather make the vim, emacs and whatever else plugins handle those properly?

  9. It shouldn’t be too hard to write some analogue of dev-util/indent for XML (probably such a thing even exists already), so one could simply require its use (repoman?). Concerning tarballs, it is not so important to have identical files: the differences cost just a few bytes in the compression, and for many files the differences themselves are similar to each other and thus cost even less. For squashfs, the situation is different, because for identical files the problem of a huge distance in the archive is avoided.

  10. If identical files are hardlinked together, perhaps by some periodic deduplication scan (since VCSes don’t seem to care), that would reduce blocks and inodes used both, right? I wonder how expensive for the rsync server it would be if all clients used --hard-links, and I wonder if that’s a complete enough solution.
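
The automatic normalization suggested in comments 6 and 9 could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not an existing Gentoo tool: the function name `normalize_xml` is made up, tabs are picked as the canonical indent following comment 4, and `ET.indent` needs Python 3.9 or newer. The standard-library parser also drops the XML declaration, the DOCTYPE and comments, so a real hook would have to preserve those separately.

```python
import xml.etree.ElementTree as ET


def normalize_xml(source: str) -> bytes:
    """Re-indent an XML document with tabs so formatting-only differences vanish.

    Hypothetical helper for illustration; not part of repoman or any Gentoo tool.
    """
    root = ET.fromstring(source)

    # Drop whitespace-only text and tail nodes left over from the old indentation.
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None

    # Recreate the indentation from scratch, one tab per nesting level.
    ET.indent(root, space="\t")

    # Serialize to bytes with a single trailing newline.
    return ET.tostring(root, encoding="utf-8") + b"\n"
```

Two files that differ only in indentation (spaces in one, tabs in the other) should come out byte-identical from this function, which is what the de-duplication and squashfs arguments above rely on. Whitespace inside text content, such as a padded e-mail address, is left untouched, so files that differ there would still differ.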
