September 16
Worked on Intranet, learned about Reed-Solomon from "A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems".
September 18
Nearly done with Reed-Solomon encoding
September 22
Finished with Reed-Solomon, but it's not quite working.
September 23
Reed-Solomon now works. I can break up a file into N pieces and M "checksum" pieces, each of which is 1/N times the size of the original file, and reconstruct the original file from any N of the new files.
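A minimal sketch of the "any N of the N + M pieces" idea, restricted to the simplest case of M = 1. Here the single checksum piece is a plain byte-wise XOR of the data pieces (ordinary parity, as in RAID 4/5), not the Galois-field arithmetic that Reed-Solomon needs for M > 1. The function names are made up for illustration and are not taken from the project code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Checksum piece = byte-wise XOR of n data pieces, each len bytes long. */
void xor_checksum(const unsigned char *const *pieces, size_t n, size_t len,
                  unsigned char *checksum)
{
    memset(checksum, 0, len);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < len; k++)
            checksum[k] ^= pieces[i][k];
}

/* Rebuild piece number `missing` from the other n - 1 pieces and the checksum.
 * pieces[missing] is never read, so it may be NULL. */
void xor_recover(const unsigned char *const *pieces, size_t n, size_t len,
                 size_t missing, const unsigned char *checksum,
                 unsigned char *out)
{
    assert(missing < n);
    memcpy(out, checksum, len);
    for (size_t i = 0; i < n; i++) {
        if (i == missing)
            continue;
        for (size_t k = 0; k < len; k++)
            out[k] ^= pieces[i][k];
    }
}
```

With N data pieces and this one checksum piece, losing any single piece still leaves N pieces, which is enough to rebuild the file. Tolerating M > 1 losses is what the full Reed-Solomon coding adds.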
September 25
Fixed up RS so that it can work on arbitrarily large files with memory requirements based only on N and M, not the size of the file. However, testing showed it decodes about 1MB/s on a Guitarist, which comes out to about 27 hours for 100GB. The basic algorithm can probably be somewhat optimized, but distributed computing may be a better solution: the storage is distributed, why not the computing, too?
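The 27-hour figure can be checked with a few lines of arithmetic. This is only a sketch: the helper name is hypothetical, and it assumes 1 GB = 1000 MB.

```c
#include <assert.h>

/* Hours needed to process `gigabytes` of data at `mb_per_sec`,
 * assuming 1 GB = 1000 MB. */
double hours_to_process(double gigabytes, double mb_per_sec)
{
    assert(mb_per_sec > 0.0);
    double seconds = gigabytes * 1000.0 / mb_per_sec;
    return seconds / 3600.0;
}
```

hours_to_process(100.0, 1.0) comes out a little under 28 hours, in line with the estimate above.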
September 29
People talking about what they were doing; no productive work done.
September 30
Switched over to 16 bit wordsizes, which should, in theory, double the speed with little work. It seems to be working, but I didn't have time to thoroughly test it.
October 2
It doesn't work when it's split into large numbers of chunks, and I have no idea why. It seems to give different outputs for the same input, which is just wrong.
October 6
Still can't find the stupid bug. It doesn't seem to make any sense when it shows up.
October 7
Tracked down that mysterious bug - it was due to the first data character in the file being a newline or space - this confused fscanf, and caused problems. I switched to fread, which is better in almost every way, and succeeded in breaking up the RSraid tutorial Postscript into 25 + 15 parts, and recovering it using the odd ones and even ones 4-12 (as an arbitrary nontrivial test).
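The log does not show the original fscanf call, so the following is only a guess at how this kind of bug can arise. It assumes a hypothetical file layout of a text header "N M" followed by a newline and then raw data. A whitespace character in an fscanf format matches any amount of whitespace in the input, so a trailing "\n" in the format also swallows data bytes that happen to be spaces or newlines. Reading the separator as exactly one byte and the data with fread avoids that.

```c
#include <assert.h>
#include <stdio.h>

/* Copy `len` bytes into a temporary file and rewind it (hypothetical helper). */
static FILE *open_temp(const char *bytes, size_t len)
{
    assert(bytes != NULL);
    FILE *f = tmpfile();
    if (!f)
        return NULL;
    if (fwrite(bytes, 1, len, f) != len) {
        fclose(f);
        return NULL;
    }
    rewind(f);
    return f;
}

/* Header parsed with a trailing "\n" in the format: the directive eats
 * ALL following whitespace, including leading whitespace in the data. */
int first_data_byte_greedy(const char *bytes, size_t len)
{
    FILE *f = open_temp(bytes, len);
    int n, m, c = -1;
    if (!f)
        return -1;
    if (fscanf(f, "%d %d\n", &n, &m) == 2)
        c = fgetc(f);
    fclose(f);
    return c;
}

/* Header parsed without trailing whitespace in the format, exactly one
 * separator byte consumed, then the data read raw with fread. */
int first_data_byte_exact(const char *bytes, size_t len)
{
    FILE *f = open_temp(bytes, len);
    int n, m, c = -1;
    unsigned char b;
    if (!f)
        return -1;
    if (fscanf(f, "%d %d", &n, &m) == 2 && fgetc(f) == '\n' &&
        fread(&b, 1, 1, f) == 1)
        c = b;
    fclose(f);
    return c;
}
```

For the input "25 15\n\nXYZ" the real first data byte is the second newline. The first function skips it and returns 'X', while the second returns the newline.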
October 9
Worked on the project proposal, did more testing on the program. It really seems to work this time. Next week I should get started on the sockets bit. Based on what I remember from before, it shouldn't be too bad.
October 14
Tried to get sct grader working; failed.
October 16
Learned sockets, got quick socket test working. Now to apply that to rsraid...
October 20
Time flies when you're, um... doing whatever I was doing today.
October 23
Worked on non-code aspects of project - added the Project Proposal, added more description to code, and tweaked the web site to accommodate these changes.
October 27
This day never happened.
October 28
Got the basic part of sockets in rsraid working - sent it in two pieces over sockets, it seems to have been sent correctly. I should do more testing, then get the retrieving part working. Also, the ending is a bit weird... I should work out a better way to signal the end of the file.
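One common way to signal the end of a file over a stream socket is to send its length first, so the receiver knows exactly how many bytes to expect. The sketch below shows only the header encoding. The 8-byte big-endian layout and the function names are assumptions, not what rsraid actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store `len` as 8 bytes, most significant byte first, so that both
 * ends agree on the layout regardless of their native byte order. */
void put_length(unsigned char out[8], uint64_t len)
{
    assert(out != NULL);
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(len & 0xff);
        len >>= 8;
    }
}

/* Inverse of put_length: rebuild the length from the 8 header bytes. */
uint64_t get_length(const unsigned char in[8])
{
    assert(in != NULL);
    uint64_t len = 0;
    for (int i = 0; i < 8; i++)
        len = (len << 8) | in[i];
    return len;
}
```

The sender would write these 8 bytes before the data. The receiver reads the header, then keeps reading until it has that many bytes, with no special end marker needed inside the data.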
October 29
Was pointed to a slashdot article, where someone asked for a similar thing, and found, in comments, the Distributed Internet Backup System, which is scarily like what I'm doing, even using Reed-Solomon. Interesting.
November 18
Been lazy about the logs... I've done some non-coding stuff, as well as working more on code, almost have the full inter-computer backup done. I just have to put the final touches on getting data back; specifically, figuring out when the data ends. Once this is done, I should be nearly finished.
November 25
Learned a bunch about Kerberos.
December 9
More forgetting of logs... worked a bit more on the files, but today was trying to get the Cray (SV1) to run faster than a Guitarist (2.4GHz P4). Hopeless.
Lotsa time
Working on assorted things, such as project poster and paper, and trying to come up with ideas for extending the project to make it more interesting. Also spent a bit of time failing to make the Cray work faster than a Guitarist. Not a whole lot of work on the actual code, though.
January 13
January 15
Submitted info to Science Fair! Did a tiny bit of work on the code, but nothing really significant.
January 20
Poster poster poster poster poster poster printer poster poster poster gluestick poster poster papercutter poster scissors poster poster poster poster gluestick poster poster poster poster poster. Guess what I did today?
January 22
End of quarter! Finished everything (that needs to be done to get a decent grade)! No work on code! Finished 4th in US on most recent USACO! Again!
January 28
____ _* _ º_____ · * º__
/ ___|| \ | |/ _ \ \ ·*º / /
\___ \| \| | |*| \ \ /\ / /
·___) | |\ | |_| |\ V V /
|____/|_|·\_|\___/ º\_/\_/
January 29
Worked on actually making code work - distributing code seems to be working now, at least for encoding. Decoding should be pretty much the same thing.
February 2
Stuff seems to be working after finishing it over the weekend. Still need to work on my poster for the science fair, though...
February 3
Snow day! Work on poster! Yay!
February 5
Science fair was yesterday, went reasonably well. I was also contacted by a college student doing something like this for his thesis - sounds kind of interesting. Started on factoring out the code to make it nicer - all of the encoding/decoding stuff is now in rsStuff.h. This should make it much easier to fix stuff/add new features.
February 9
Physics Olympiad today, so I missed the first part of class. Over the weekend, I had an AIM conversation with Nathan, the college student mentioned in the last post. He's apparently doing this using standard block devices and such, so it will probably be a lot easier to use than mine. He also mentioned that he sped up the algorithm 8-10 times by replacing the log/antilog table with a single multiplication lookup table, which required reducing the word size to 8 bits rather than 16. And I did this, and it ran at exactly the same speed. Except, the user time was much smaller - it turns out that the problem is with AFS being pathetically slow. Working in /tmp (local, fast storage), encoding took about 1.3 seconds for a 10M file with N=50, M=20. So that translates to 500GB-1TB per day, rather than the 100GB/day that I originally thought. And when combined with distributed computing, this could actually be a very reasonable system. Now to get it all working perfectly...
Also, I found out this morning that I was selected to go on to the regional science fair.
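A sketch of the two multiplication strategies mentioned in the February 9 entry, for 8-bit words. The log/antilog version needs two table lookups, an addition and a zero check per multiply, while the full table is a single lookup. With 8-bit words the full table is 256 x 256 bytes (64 KiB); a full table for 16-bit words would need 2^32 entries, which is why the word size had to shrink. The primitive polynomial 0x11d and all names here are assumptions, since the log does not say what the real code uses.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t gf_log[256];            /* log table; gf_log[0] is unused   */
static uint8_t gf_exp[510];            /* antilog table, doubled so that a */
                                       /* sum of two logs needs no modulo  */
static uint8_t gf_mul_table[256][256]; /* full product table, 64 KiB       */
static int gf_ready = 0;

/* Multiply via log/antilog tables: a * b = exp(log a + log b). */
uint8_t gf_mul_log(uint8_t a, uint8_t b)
{
    assert(gf_ready);
    if (a == 0 || b == 0)
        return 0;
    return gf_exp[gf_log[a] + gf_log[b]];
}

/* Multiply with a single lookup in the precomputed product table. */
uint8_t gf_mul_lut(uint8_t a, uint8_t b)
{
    assert(gf_ready);
    return gf_mul_table[a][b];
}

/* Build all tables for GF(2^8) with polynomial x^8 + x^4 + x^3 + x^2 + 1. */
void gf_init(void)
{
    unsigned x = 1;
    for (int i = 0; i < 255; i++) {
        gf_exp[i] = (uint8_t)x;
        gf_exp[i + 255] = (uint8_t)x;
        gf_log[x] = (uint8_t)i;
        x <<= 1;                       /* multiply by the generator, 2 */
        if (x & 0x100)
            x ^= 0x11d;                /* reduce modulo the polynomial */
    }
    gf_ready = 1;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            gf_mul_table[a][b] = gf_mul_log((uint8_t)a, (uint8_t)b);
}
```

After gf_init(), both functions return the same product for every pair of bytes; only the cost per multiply differs.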
February 10
Worked on factoring stuff out - I now have an rsStuff.h that contains all of the actual interesting code. This should make it much easier to work with.
February 12
Much nicer code now - connecting client to server is a single call; the code looks much nicer, and I don't have to have it once for encoding and once for decoding. And encoding seems to generally work "distributed" over one computer; however, it seems that TCP is pathetically slow for small packets. I think buffering should help quite a bit. I probably should actually make a good buffering thingy.
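The log never pins down why small packets are slow. One well-known cause worth ruling out is Nagle's algorithm, which holds back small TCP writes while earlier data is unacknowledged and can interact badly with delayed ACKs on the receiving side. The sketch below turns it off with the standard TCP_NODELAY socket option; whether that is the actual problem here is an open question, and the helper name is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket so small writes are sent
 * immediately. Returns 0 on success, -1 on error (see errno). */
int set_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```

Buffering and TCP_NODELAY are complementary: buffering reduces the number of small writes, while TCP_NODELAY stops the remaining ones from being delayed.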
February 17
Worked on buffering - encoding is now almost entirely buffered (i.e., stuff is only sent when there's a lot of it). However, it still takes just as long to do. Encoding a 352 byte file takes 14 seconds and no CPU time on any of the parts. I need to figure out why.
February 19
USACO USACO USACO - trying to get dstats to work.
February 23
Working on buffered sockets - think I almost have it working!
February 24
Stupid stupidity... I was being silly with the buffered sockets implementation, which made it almost impossible to make it work. I redid the whole thing, so now the socket is included in the buffer - there is a single structure that has a buffer and a socket, and you just bread or bwrite or bflush it, and it all just works. I updated everything to use that, and it worked (on the first try!). However, when I tried to make encoding buffered (the last thing that wasn't), it all just broke. Oh well. That's what Thursday is for.
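A minimal sketch of what a combined buffer-and-socket structure could look like. Only the names bwrite and bflush come from the entry above; the struct layout, the binit helper, the 4096-byte capacity and the error handling are assumptions, and the read side (bread) is left out to keep it short.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define BSOCK_CAP 4096

/* One object holding both the descriptor and its pending output. */
struct bsock {
    int fd;
    size_t used;                    /* bytes currently waiting in buf */
    unsigned char buf[BSOCK_CAP];
};

void binit(struct bsock *b, int fd)
{
    b->fd = fd;
    b->used = 0;
}

/* Write out everything pending. Returns 0 on success, -1 on error. */
int bflush(struct bsock *b)
{
    size_t off = 0;
    while (off < b->used) {
        ssize_t n = write(b->fd, b->buf + off, b->used - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: just retry */
            return -1;
        }
        off += (size_t)n;
    }
    b->used = 0;
    return 0;
}

/* Queue len bytes, flushing only when the buffer becomes full. */
int bwrite(struct bsock *b, const void *data, size_t len)
{
    const unsigned char *p = data;
    assert(b->used <= BSOCK_CAP);
    while (len > 0) {
        size_t room = BSOCK_CAP - b->used;
        size_t chunk = len < room ? len : room;
        memcpy(b->buf + b->used, p, chunk);
        b->used += chunk;
        p += chunk;
        len -= chunk;
        if (b->used == BSOCK_CAP && bflush(b) < 0)
            return -1;
    }
    return 0;
}
```

Callers write as many small pieces as they like with bwrite and call bflush once at the end of a message, so the descriptor sees a few large writes instead of many tiny ones.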
March 4
Alright... after a bunch of time spent trying to make everything work, I'm starting over. Today I made sure the algorithm works, and it does, except for a stupid off by one error that I fixed. Now I just have to redo all the sockets stuff; however, this shouldn't take more than three to five hours. Then it all will work! Yay!
March 8
Working on Science Fair Techlab project. Mostly working, I think.
March 9
WHY IS THE NETWORKING SO SLOW??? I'm sending data over a 100Mbit network at ~1K/s. There's no reason for this.
March 11
There's still no reason for the slow network. However, everything else works, so I can get nice runtimes just ignoring the network issue by doing it on one computer.
March 15
Yay! I got a first place at the Science Fair. However, didn't get much done today.
March 16
Worked on newintranet. Made the directory.prototype work reasonably nicely. I'm kind of scared, though, since the classes are about 200 lines (good), but the PHP interface and Smarty template are about 20 lines apiece. That's good, but it seems too good to be true.
March 18
Worked on newintranet, specifically the groups part. Got some basic stuff worked out, but still lots of stuff to do.
March 22
Got started on the announcements system for newintranet.
March 23
Alright... the basic stuff seems to be working. A basic web interface is also up, though it's pretty primitive.
March 25
Well, announcements seem to be working... but it's kind of hacked together. I really need to go through and make it pretty. (Internally, not externally).
March 29
Hmm... multiple-group announcements killed it. I started redoing it focused on groups, so announcements can be approved per-group. But it's not done yet.
April 1
Yay! Announcements work! Using only the web interface, I was able to submit announcements for various groups that only sponsors and admins can approve. I suppose there are some features that would be nice, like viewing announcements you submitted, or permanently rejecting an announcement. But as it is, it's certainly usable by users. So anyone can submit an announcement, and any sponsor/admin can approve it. Much nicer than the current system.
...
Working on adding stuff to Intranet...
April 26
At Carnegie Mellon.
April 27
Taking the USAMO.
April 29
Couldn't find a reason to not be here today.
May 3
Worked on newintranet boxes.
May 5
Newintranet boxes almost finished - they work, except have to be hardcoded. Just have to put them in the database instead, and the bases should be finished! I think the next major step is working out how to have multiple people working on stuff with some semblance of security.
May 7
Newintranet boxes work! They move, and close, and shade, and everything. No way to add them back at the moment, but that's a separate issue.
May 10
Monday. Useless. Well, except for fixing up logcheck rules.
May 11
Looked into CVS for Newintranet - it seems like it should be quite helpful for keeping things organized.
May 13
Massive learning and setting up of CVS for Newintranet.
May 17
Working on making Newintranet cooperate with CVS to run out of people's homedirs.
May 18
Assorted Intranet/Newintranet stuff
May 20
Looked into optimizing MySQL queries (more for information than practical use), and it turns out it uses B-trees for storing indices. This reminded me that I wanted to learn how those work, so I went and learned how those work.
May 24
Monday.
May 25
Worked on documentation for newintranet - now /doc has some useful information.
May 27
Robustus now has an updated copy of newintranet at its root, and is updated every 15 minutes from CVS nicely. Also made a few changes to it.
June 1
The paper is finished! Completely and utterly. Most other stuff is done, too.
June 3
Finished! Everything is finished, printed out, stapled together, archived, filed, bent, folded, spindled, and mutilated! (Well, maybe not those last few). Anyway, I officially declare this project/log closed.