While writing my rant about last year’s SoC, I started to think of some advice for both students and mentors of Google Summer of Code.
I have to say first off that I didn’t participate actively as a mentor in 2006 (I was a backup), and I didn’t participate at all last year, so take these suggestions as an outsider’s suggestions, though by now, I think, a quite experienced outsider’s.
- Work in advance. Don’t wait until your application is accepted. Plan ahead: if you intend to participate, start working now! Find an interesting idea and start fleshing out the details. How are you going to work on it? Can you already prepare a few use case diagrams? Can you already design an interface? This is an investment even if you don’t get accepted: the goal of SoC is to put you into a real-world environment, and doing this work is the first step. Think of it like trying to sell an idea to your superior, or to a new company you want to work for. It should also give you quite an edge, as it shows you care about the experience more than the money (which in turn suggests you might actually continue working on the thing afterward).
- Ask around! An important skill for any developer, not just Free Software developers, is being able to ask the right questions. If you’re a free software developer, sooner or later you will have to ask people who worked on issues similar to your own, colleagues and the like, so that you don’t have to reinvent the wheel every time. Searching the documentation is cool, but it’s not always going to cut it, as you might not find any reference to what you want to know. If you need information, ask not only your mentor but whoever has the information you need. Your mentor is supposed to know more than you about the project you’re working on, but if that is not the case, you have to find someone who can give you the information. Trying to work without knowing the project’s conventions and the like is not going to produce good results.
- If you’re working on a testable project (a library, or a non-interactive tool), write test cases: doing so lets your mentor tell that your work is proceeding correctly, and gives you a way to make sure you don’t end up breaking what you wrote a week before. In addition, I’d suggest writing one test each time you find an error in the behaviour, even if that means you end up with hundreds of tests!
- Profile your code. Profiling is an important part of ensuring the quality of the code. In most universities you’re done once you have written a working solution, or a working and not overly complex solution; in the real world you have to write code that doesn’t suck, which means it has to run in a timely fashion without abusing memory. Try looking around my blog for posts about memory usage and cowstats, for instance. You need to learn how to use tools like that, smem, valgrind, and so on. This is the best time!
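To make the profiling point a bit more concrete, here is a minimal sketch of measuring both time and peak memory for two ways of computing the same result. It is only an illustration: it uses Python's standard `tracemalloc` and `time` modules rather than smem, valgrind or cowstats, and the function names (`sum_squares_list`, `sum_squares_stream`, `measure`) are made up for this example.

```python
import time
import tracemalloc


def sum_squares_list(n):
    """Build the whole list in memory first, then sum it."""
    return sum([i * i for i in range(n)])


def sum_squares_stream(n):
    """Sum the values one at a time, without storing them all."""
    return sum(i * i for i in range(n))


def measure(func, *args):
    """Return (result, elapsed seconds, peak traced bytes) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) in bytes since start().
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    for func in (sum_squares_list, sum_squares_stream):
        result, elapsed, peak = measure(func, 200_000)
        print(f"{func.__name__}: time={elapsed:.4f}s peak={peak} bytes")
```

Both functions return the same value, so a plain "does it work" check cannot tell them apart; only the measurement shows that one keeps every intermediate value in memory. The exact numbers depend on the machine and the interpreter, so treat them as relative, not absolute.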
Mentors should look at the points above and see what they can do to help their students. Be around to answer their questions, and point them to the right person to ask if you can’t answer yourself. Make sure you know how the test suites work in your student’s language of choice, so that you can judge whether they are writing them properly or just testing the behaviour that is known to work; also, try to figure out patterns that are not yet tested, and ask the student to test those.
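As a rough sketch of the difference between testing only known-good behaviour and adding one test per bug found, consider the following. It assumes Python and its standard `unittest` module; the helper `split_flags` and the two bug reports mentioned in the comments are hypothetical, invented only for this example.

```python
import unittest


def split_flags(text):
    """Split a whitespace-separated flag string into a list (hypothetical helper)."""
    return text.split()


class SplitFlagsTest(unittest.TestCase):
    def test_known_good_input(self):
        # The behaviour that was already known to work.
        self.assertEqual(split_flags("-O2 -pipe"), ["-O2", "-pipe"])

    def test_regression_empty_string(self):
        # Hypothetical bug report: an earlier version used text.split(" "),
        # which returns [""] for empty input. The test stays in the suite.
        self.assertEqual(split_flags(""), [])

    def test_regression_repeated_spaces(self):
        # Hypothetical bug report: repeated separators produced empty items.
        self.assertEqual(split_flags("-O2   -pipe "), ["-O2", "-pipe"])
```

Saved as, say, `test_flags.py`, this can be run with `python -m unittest test_flags`. A suite that contained only the first test would pass both before and after the two bugs were fixed, which is exactly the situation a mentor should look out for.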
Up to now the suggestions refer to any organisation and project involved in Summer of Code, not just Gentoo. So feel free to link this post (or quote it, please still reference my blog though) to your students.
As not all the developers signing up as mentors may have time to do all of the above, I’m at least going to help by trying to stay around for the students. This means that if you have any questions, especially related to C or Linux programming (ELF files, memory usage, compiled code analysis – both static and dynamic), feel free to contact me. In the worst case I’ll have to answer “ask me again tomorrow, I’m busy” or “sorry, I don’t know this, it’s not my area of expertise”. It’s worth a try 🙂
(Joshua, Alec, whoever, feel free to link this blog in the SoC page).
Caring about cow is something about which you shouldn’t even start thinking until you’ve gone through and tweaked your data structures and your allocators. Spending ages saving a single cow page when you’re using allocator-humping slow data structures all over the place is a great example of optimising completely the wrong things. Baselayout and OpenRC are perfect examples here — there’s no point even starting to look at cow there until the insanity that is strlist is replaced.
Moreover, what you are doing with cows has nothing to do with profiling. You haven’t posted how much time/memory is REALLY wasted. Profiling is not just measuring, which is what you did there.
- ferdy
Ferdy, you have no idea what I am doing, do you? Profiling is not about showing what to change but rather how the program behaves. Memory profiling shows you how much memory is used. cowstats _is_ a memory profiling tool. A static profiler, but still a profiler.
As for counting COW pages, for a library it is still _quite_ an important task: if you find yourself relying on 20MiB of COW pages in a library (which really wouldn’t surprise me much for _students_; not sure about you, but I have seen code from people fresh out of university with no real-life experience), then you can easily see you have a problem.
Or are you saying COW analysis is unneeded because C++ is notoriously bad with COW because of the vtables? And thus cowstats shows a huge amount of space in writable areas because of that?
Just for your information, I do plan to add a --ignore-vtables option to cowstats to make it more useful with C++; I just haven’t had time to look into the demangling.
Yes, I know what you are doing. And I understand every single thing you are trying to do and I still think you are measuring the wrong thing.
You keep saying it is a ‘quite important task’ without providing any single fact or number.
Not sure where I mentioned C++ in my post. Could you please point me to the exact paragraph?
What I’m actually saying is that you should provide some real facts as to why COW analysis is that important.
- ferdy
I assume you are at least able to read, as I wrote _more than once_ what is involved in COW pages and why it is important to reduce their number. If you can’t yet see why it is important to reduce them, I certainly hope you’re not going to do any work on libraries I use…
Yet you didn’t provide any real fact. Moreover, the fact that you write about something doesn’t make it either interesting or true.
However, now that you mention reading abilities, I’d like to know where I said reducing the number of COW pages was bad. You are just making all that up yourself. I just said you are measuring the completely wrong thing.
Also, may I ask you to be polite? You are being embarrassingly rude.
- ferdy
The number of COW pages in a final object (shared library or executable) is a direct representation of its in-memory usage. Sure, you need to check the improvement with smem case by case, but you can’t hit _all_ code paths at once to measure it dynamically, so the right way out is to measure it _statically_. So what else would _you_ measure?
And by the way, I do provide facts; if your reality makes anything I write untrue by default because you have a problem with it, then it’s not up to me to try to convince you. Feel free to try; I’m quite sure that most people agree COW pages are a problem and that cowstats is a tool that helps reduce them.
As for politeness, I tend to reserve it for those who deserve it. And no, I don’t think you deserve any from me; and I’m not half as rude as you have been in public, with me and others, so feel free to get the hell out of here if you don’t like the fact that here you won’t find somebody who’s going to limit his expression.
I’d measure the overhead they cause in real use cases. That’d give you more insight into the real picture. Absolute numbers with completely zero context mean nothing.
For an application that has a resident set of half a gig, you can measure how much of an overhead 20Mb of COW pages cause.
With respect to politeness, feel free to keep your insanity levels up there. Actually, being polite is not about limiting your own expression, quite the contrary.
- ferdy
So far OpenRC seems to be working fine and Roy is quite open to accepting patches. Ciaran, please submit a proposal for a nicer structure and substantiate why the current one is humping the memory allocator ^^
@lu_zero: Just read that code. Really.
- ferdy
Are you Ciaran? Anyway, file a bug if you agree with Ciaran that that kind of string list is ugly.
Of course I’m not Ciaran. And no, I won’t file the bug, I don’t care enough.
- ferdy
Honestly, if you can’t see straight off why rc_strlist is horrible without having it explained, you have no business discussing it. Or, for that matter, writing code that other people will use…
Roy has already said that he won’t change it because he doesn’t want to use “custom structures” rather than the “standard” char * * *… Says all you need to know really.
bug number?
Extensive discussion in the IRC channel whose name we don’t mention where the plebs can see it.
Hi Diego, not sure this is the right place for an SoC idea, but what about a portage backend for packagekit?