pmuellr is Patrick Mueller

Wednesday, February 15, 2006

David Byrne's blog

I love the fact that David Byrne has a blog. Makes him seem more ... approachable ... or something. Or maybe it's just that I agree with his politics.

Tuesday, February 14, 2006

web scraping fix up

I've been doing a bit of web scraping over the years. My pride and joy is a Slashdot scraper, which I've used in recent times to generate RSS, and which has been generating iSilo-friendly HTML for me for years. I finally ditched the RSS generator, as I finally found one that basically works and won't get me banned every so often. The code for my scraper is here. My latest effort at not getting banned from /. is to run off of the Coral shadow instead of /. directly. Not sure, now, why I wasn't doing this before; I had all kinds of elaborate checks to make sure I wasn't hitting /. too hard, but inevitably I'd screw up and get banned for a few days, a couple of times a year.

But that /. scraper has been running great for years. I run it at about 4:00am and 11:00am, and then run iSiloC 30 or so minutes after that. So I have fresh, hot /. articles, with full comments, on my Palm in the morning while I'm waiting for the kids' buses to come, and in the afternoon when I go for a walk. About 2 MB worth (that's compressed HTML). Now, if only there were some interesting articles!

I just had to fix up another one of my scrapers, for Harmony-Central, that generates RSS that includes the actual article text (and images) instead of the empty item bodies the site provides. This is the one you want, since Bloglines has a couple listed: Compare it to the one that HC provides itself:

What I had to fix was a typo they introduced in an article link. My Python script was throwing an exception and dying before writing out the RSS. The quick fix was to wrap a try/except around it, providing an error message in lieu of the content it was supposed to be getting. In a little while, Bloglines picked up the new copy, and I got a few days of H-C news to catch up on.
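A minimal sketch of that kind of fix, assuming the scraper fetches each article separately before assembling the feed. The names `fetch_article`, `item_body`, and `build_items` are hypothetical, not taken from the actual script. The key point is that the try/except sits around each item, so one bad link turns into an error message in that item's body while the rest of the RSS still gets written.

```python
import html
import urllib.request


def fetch_article(url, timeout=30):
    """Download the page at url (hypothetical helper; a real scraper
    would also pull the article text out of the page)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def item_body(url, fetch=fetch_article):
    """Return the body for one RSS item, or an error message in its
    place, so a single bad link can't kill the whole feed."""
    try:
        return fetch(url)
    except Exception as exc:  # malformed URL, HTTP error, timeout, ...
        return "<p>Error retrieving %s: %s</p>" % (
            html.escape(url), html.escape(str(exc)))


def build_items(urls, fetch=fetch_article):
    """Pair each article URL with its body (or error message)."""
    return [(url, item_body(url, fetch)) for url in urls]


if __name__ == "__main__":
    # A link with a typo ("http//" instead of "http://") makes urlopen
    # raise ValueError before any network access; it becomes a message.
    print(item_body("http//www.example.com/article"))
```

Passing the fetcher in as a parameter just makes it easy to try this without touching the network. Catching the broad `Exception` is deliberate: for a scraper, the main goal is that one odd page doesn't take down the whole run.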

That's the web scraper's life: constantly adding little checks for things that go wrong or change. In the end, though, it's worth it to me.

set up bloglines a bit more

Just set up Bloglines a little more, so that you can get to my blogroll. There's also what appears to be a way for me to post a link to a blog entry into my 'blog' over on Bloglines, referred to as a clipping blog. We'll see, since, of course, I'd rather post links here.

I switched from an 'app' feed reader, RSS Bandit, to Bloglines a few months ago. The only real downside of using the web site instead of an 'app' is that I can't look at stuff that Bloglines can't see, like IBM internal blogs. Oh well, my RSS Bandit list is basically just IBM blogs now ...

Monday, February 13, 2006

DC trip with Sam 2006

Sam and I visited DC this January. Flickr photoset available.

Peter and I had made this trip three years ago with our YMCA Indian Guides tribe. The Y's Guide/Princess program is a great excuse for Dads to spend time with their kids. Highly recommended. And fun. But I was getting a little tired of it, Sam wasn't all that interested in it anymore, and you can roll your own DC trip a lot cheaper than what the Y charges. A lot.

Here's what we did.

Took Amtrak up. From Cary NC to Alexandria VA, round-trip, for me and Sam, the total was $90. We specifically went to Alexandria, as there is a very convenient hotel across the street from the train stop (Embassy Suites), as well as a Metro stop right there. Very, very convenient. In the past, Sandy and I have taken the train to Metro Center, then taken the Metro to Dupont Circle and stayed up there. Lots of convenient options when taking the train up.

Taking the train itself is, as you might expect, rather boring. It's a little under 5 hours from Cary to Alexandria. You should plan on packing a lunch of some kind, to save a little on what you'd otherwise spend in the "dining car".

The Embassy Suites we stayed at was pretty run-of-the-mill, I thought. Nothing outstandingly bad or good about it. But Sam was awestruck. He thought it was the height of luxury. We stayed on the 8th floor, which contributed to the awe.

The train got in around 5pm, and we planned on taking a bus tour that night that started at about 8pm, so we headed out not too long after we arrived to eat at Metro Center, where we were to meet the bus. In the main food court, underground, we didn't see anything that knocked our socks off, so we ended up getting some burgers, which were fine. We then walked around a little before heading up to meet the bus.

It was pretty cold the weekend we went up, and even colder, of course, at night. The tour stopped at the 'new' WW II memorial (I'd never been there), and the Lincoln, Jefferson, and FDR memorials. Lincoln was packed, the others less so. We stopped by the Korean War memorial, which is near Lincoln. That one is a bit creepy, actually, especially at night: a bunch of life-size statues of soldiers. The bus driver gave a 15-20 minute tour of FDR, which was pretty good, but it was at the end, we were tired, and it was freezing, so we enjoyed it less than we might have otherwise. At the end of the tour, they were going to take us back to Metro Center. Eventually. After dropping other people off at their hotels; not us, since we were too far away. This was going to be a problem, as we probably wouldn't be back at Metro Center till midnight, and I had a 10-year-old kid on my hands. The very last thing we did was drive by the Iwo Jima memorial, so we were already in VA, and I asked the driver if he could drop us off near a Metro stop. Which he did, and we were back at the hotel 10 minutes later.

The next morning, we left at about 9:15 or so to go to the International Spy Museum. Pretty fun museum; you could spend a lot of time in there. We ended up spending about an hour and a half.

We then walked down to the Archives, 'cause I wanted to see the Declaration of Independence, since I never had, and we had recently seen 'National Treasure'. They actually have the DoI, the Constitution, and the Bill of Rights all together in the main area of the Archives. While you can take photos, you can't use a flash, and it's quite dim, so it's basically impossible to get a decent shot. I tried anyway. In fact, I tried propping my camera up against a door to get a long, stable exposure, and a guard came up and asked me what I was doing. Sigh.

After that, we went down to the National Gallery of Art, West wing, looking for "ugly baby" (a Van Gogh, I think, of a baby in a green-ish tint; quite ugly), but couldn't find it. We decided to christen a new painting as "ugly baby". We did find the guy being eaten by a shark. A classic.

We then got some lunch in the cafeteria in the basement between the East and West wings.

Then we wandered over to the East wing, and Sam's reaction to 'modern' art was hilarious, pretty much what you would expect, especially with stuff like Rothko's: "A first grader could have done that!" At one point, I got a picture of him next to something, rolling his eyes.

After that, we headed over to the National Air and Space Museum, primarily because Sam wanted to see the "talking trashcans" that his brother had told him about. I thought I remembered they were outside, but we didn't see them. Peter told us later they were in the cafeteria. Oh well. We walked around a bit, kind of aimless. We went to a couple of hands-on exhibits that were kind of broken down. Sad.

We were pretty pooped by this time, but I thought we might have just enough energy for the Hirshhorn, so we'd at least try one level. Well. Sam loved it. I loved it too. Lots of odd, bizarre stuff. That was the highlight of the day for him. I had never been in the 'basement' before, and they had some really strange, disturbing stuff down there. Not too creepy, but a little creepy.

Now we were seriously pooped. We headed back to the hotel, hung out for a while, then set out for dinner. We were right around the corner from King St, but a mile or so from the river. Not many restaurants right there, and nothing Sam was interested in. We ended up heading back and eating at Joe Theismann's restaurant, right next door to the hotel. Got to watch the Panthers blow their game against the Seahawks. Called Sandy from the restaurant, trying to tell her that Joe Theismann had invited us over to his house for dinner, etc., etc. Sandy is a big Redskins fan, you see.

And that was about it. Left on the train the next morning, and the trip home seemed like it was even longer than coming up. The train was a lot more crowded. Odd that it was more crowded on a Monday morning than a Saturday morning.

Got a trip coming up with Peter soon, more on that after it happens.

Effective Java - the book

Bill Higgins, a colleague of mine, reminded me just now of a great Java book, Effective Java, by Joshua Bloch.

This is a must-read for anyone writing classes that someone else is going to use, i.e., APIs. Lots of great examples and anti-examples of how to design classes. Highly recommended.

Sunday, February 12, 2006

starting a blog ... again

So, for the umpteenth time I'm creating a blog. In particular, I had some links to some interesting material I found, and I didn't want to lose them, and thought a blog would be a good way to do that. Sure, there are other, better, ways, like, but that never really stuck with me either. Blogs are cheap, I figure I'll give it another whirl.

HTTP caching

I need to start taking a closer look at HTTP caching for my project at work. Ran into two pretty good references today:
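As a rough illustration of the basic mechanism (not drawn from either of those references), here is a minimal Python sketch of client-side HTTP caching: reuse a cached response while its `Cache-Control: max-age` says it's still fresh, otherwise revalidate it with a conditional GET using `If-None-Match` / `If-Modified-Since`. The function names are hypothetical, the cached headers are assumed to be stored in a dict keyed by lowercase header names, and the sketch deliberately ignores `Expires`, `Vary`, heuristic freshness, and the rest of the caching rules in RFC 2616.

```python
import urllib.error
import urllib.request


def parse_cache_control(value):
    """Split a Cache-Control header into {directive: argument}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def is_fresh(cached_headers, fetched_at, now):
    """Can the cached copy be reused without asking the server?
    Only looks at Cache-Control max-age (times in seconds)."""
    cc = parse_cache_control(cached_headers.get("cache-control", ""))
    if "no-cache" in cc or "no-store" in cc:
        return False
    max_age = cc.get("max-age", "")
    if max_age.isdigit():
        return now - fetched_at < int(max_age)
    return False  # no explicit lifetime: revalidate


def conditional_headers(cached_headers):
    """Request headers for revalidating a cached response."""
    headers = {}
    if "etag" in cached_headers:
        headers["If-None-Match"] = cached_headers["etag"]
    if "last-modified" in cached_headers:
        headers["If-Modified-Since"] = cached_headers["last-modified"]
    return headers


def revalidate(url, cached_headers, cached_body):
    """Conditional GET: returns (headers, body), reusing the cached
    body when the server answers 304 Not Modified."""
    req = urllib.request.Request(url, headers=conditional_headers(cached_headers))
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            headers = {k.lower(): v for k, v in resp.headers.items()}
            return headers, resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return cached_headers, cached_body
        raise
```

A caller would check `is_fresh` first and only call `revalidate` when it returns False; a 304 still costs a round trip, but no body transfer.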