Archive of July, 2008
Say What You Mean, Mean What You Say
July 25, 2008:
I came home Wednesday night to a note stuck to my door from one of my neighbors. Apparently Ginger barks quite a bit when I leave, and she's hearable from the parking lot (that part I knew about). Unfortunately, my desire to be neighborly has been tempered by the fact that my neighbor is probably an idiot.
Y'see, the note said that my dog "literally barks for hours" when I leave. Now, I know that she barks, but I seriously doubt that she barks for hours, or even an hour. Though I'd be willing to be proven wrong.
It's an unfortunate effect of people hearing words and not thinking about what they mean. And more unfortunately the practice is widespread. People being people, they're sometimes prone to hyperbole and exaggeration. So if someone tells you "your dog barked for hours" you'd think they were doing it for effect -- it barked for a noticeably long time, and it was annoying.
So, you add the word "literally" to mean that you aren't exaggerating: You wish your words to be taken literally. There is no embellishing; you mean exactly what you're saying. People take this lack of exaggeration seriously -- if a usually hyperbolic statement is actually true as spoken that's a big thing.
But stupid people noticed this, and decided "literally," instead of meaning what it sounds like it means, is a modifier like "big-ass." So you get ridiculousness like, "he was literally green with envy." It's gotten to the point that, barring other evidence, I assume anyone using "literally" is probably dumber than driveway gravel.
I figure the dog barks for half an hour, 45 minutes tops. I still need to fix it of course, but now I'm fixing it for my retard neighbor instead of my intelligent neighbor.
Edit, 7/25, 6:52 PM: If this looks more like a draft than usual, that's because it is. I meant to come back to this one later, before it went live. Oops. Oh well, might as well add that for the last two days I've left the TV on while I was out, and no more nastygrams on the door. I guess that means it's working.
More Than I Ever Wanted to Know About Character Encoding
July 21, 2008:
This was on my Facebook news feed for a few days, so you can guess I spent a lot of time at work on it. You can find most of this information on Wikipedia, but I'm collecting it here, mostly for my own benefit later on.
Unicode
First off, let's cover what Unicode is and isn't. Unicode is basically a list of "code points" that represent concepts like letters, Chinese words, Japanese syllables, and stuff that has nothing to do with language like musical notes.
So I can say that a capital letter A is code point 65, or 0x41 in hexadecimal. 0x3041 is the small hiragana syllable "ぁ", 0x2E9D is the CJK (combined Chinese, Japanese and Korean characters) radical for "moon." As a non-language example, 0x01D11E is a G Clef. Lots of stuff to worry about.
But that's it, really. Unicode says nothing about how you get those code points from Point A to Point B. To do that you need to encode the numbers, and that's where a couple days' worth of time sink came into play.
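To make the "it's just a list of numbers" point concrete, here is a minimal Python sketch (Python is my choice for illustration, not anything used on this site). It only relies on the built-in `ord()` and `chr()`; the `examples` dictionary is a made-up name holding the code points mentioned above.

```python
# Code points are just integers. ord() goes from a one-character string
# to its code point, chr() goes the other way.
examples = {
    "A": 0x41,
    "\u3041": 0x3041,       # small hiragana a
    "\u2e9d": 0x2E9D,       # CJK radical "moon"
    "\U0001d11e": 0x1D11E,  # musical symbol G clef
}

for char, code_point in examples.items():
    assert ord(char) == code_point
    assert chr(code_point) == char
    print(f"U+{code_point:06X}")
```

Nothing here says how those integers turn into bytes, which is the whole point of the next section.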
Character Encoding
Well, great. We have these code points but now we have to send them over the wire. In theory we can just do a straight dump of a bunch of binary data, but we run into a problem. Look back at that Japanese character: 0x3041. It's going to come in as two bytes: 0x30 0x41. Is that hiragana, or a zero followed by a capital A? Maybe we can figure it out from the language but even that's no guarantee.
We need to work a little magic on these code points to make them unambiguous.
UTF-16
Most operating systems these days use UTF-16 to represent Unicode characters. Just take the code points and send them as 16-bit words. For everything other than English this works just fine; most of the rest of the world has been using double-width characters for years anyway. There's no way to mistake the hiragana "ぁ" for a "0A" combo, because the Japanese is sent as 0x3041, while the others would be sent as 0x0030 0x0041.
That also demonstrates the one real drawback: if you're doing a lot of stuff in English you "waste" bytes by sending 7-bit ASCII as 16 bits. You also have to worry about endianness, which tells you which byte within a word is sent first -- when that hiragana character is broken up for transfer, both 0x30 0x41 and 0x41 0x30 are legitimate; you have to figure out which order you're getting your bytes in. In theory your receiving program will take care of this for you. In theory.
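The byte-order problem is easy to see with Python's built-in codecs. This is only a sketch, assuming Python 3.8 or later (for `bytes.hex` with a separator); the variable names are mine.

```python
# The same code point serialized with each byte order.
small_a = "\u3041"  # the hiragana character from the text

big_endian = small_a.encode("utf-16-be")
little_endian = small_a.encode("utf-16-le")

print(big_endian.hex(" "))     # 30 41
print(little_endian.hex(" "))  # 41 30

# Reading big-endian bytes with the wrong byte order yields a different
# but perfectly valid code point, so nothing flags the mistake.
misread = big_endian.decode("utf-16-le")
print(f"U+{ord(misread):04X}")  # U+4130

# The plain "utf-16" codec prepends a byte order mark (U+FEFF) so the
# receiver can tell which order was used.
with_bom = small_a.encode("utf-16")
print(with_bom[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True
```

The byte order mark is the usual way the receiving program "takes care of this for you," when the sender bothers to include one.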
It's also a minor pain in the butt if you leave the Basic Multilingual Plane (BMP), the range between 0x0000 and 0xFFFF. Our G clef would get sent as a pair of 16-bit words. If you're in the 0x010000 to 0x10FFFF range:
- Subtract 0x010000 from your character. This will drop you to something between 0x000000 and 0x0FFFFF.
  For the G Clef this gives 0x00D11E.
- Take the right 20 bits (0x00000 to 0xFFFFF) and split them into two groups of 10.
  For the G Clef you take your bits (0000.11010001.00011110) and split them up into (00.00110100) and (01.00011110).
- To your first set of bits, add 0xD800.
  (11011000.00000000 | 00000000.00110100 = 11011000.00110100 = 0xD834)
  To the second set of bits, add 0xDC00.
  (11011100.00000000 | 00000001.00011110 = 11011101.00011110 = 0xDD1E)
  Unicode has blocked out code points 0xD800 through 0xDFFF so there will never be collisions.
- Now you just have to decide on endianness and send.
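The steps above can be sketched in a few lines of Python. The function name `to_surrogate_pair` is hypothetical; the last line cross-checks the result against Python's own UTF-16 encoder rather than trusting my arithmetic.

```python
def to_surrogate_pair(code_point):
    """Split a code point above the BMP into two 16-bit UTF-16 words."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a pair")
    offset = code_point - 0x10000       # now fits in 20 bits
    high = 0xD800 | (offset >> 10)      # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)     # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1D11E)  # the G clef
print(f"{high:04X} {low:04X}")          # D834 DD1E

# Cross-check against the built-in encoder (big-endian, no byte order mark).
assert "\U0001d11e".encode("utf-16-be") == bytes.fromhex("D834DD1E")
```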
Seems kinda complicated. But we're only getting started.
UTF-8
UTF-8 kills two birds with one stone: It lets you use single bytes for ASCII, and it gets around byte order problems by sending each byte individually in order.
Since we're dealing with individual bytes we need to tell the receiving system how many we plan on using for a given character. We do this by setting the high bits of the first byte.
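- For 1 byte: Starts with 0 (0xxxxxxx)
- For two bytes: Starts with 110 (110xxxxx)
- Three bytes is (1110xxxx) and four is (11110xxx). The original design kept going up to 6 bytes (1111110x), but the current standard stops at 4, which already covers everything up to 0x10FFFF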
Remaining bytes in a multi-byte character are formatted (10xxxxxx) -- since single-byte characters start with 0, there can never be collisions.
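Here is roughly how a receiver could read the length off the first byte. It's a sketch in Python with a made-up function name, and it only covers the 1 to 4 byte forms the current standard allows.

```python
def utf8_sequence_length(first_byte):
    """Return how many bytes the character starting with first_byte uses."""
    if (first_byte & 0b10000000) == 0b00000000:
        return 1  # 0xxxxxxx
    if (first_byte & 0b11100000) == 0b11000000:
        return 2  # 110xxxxx
    if (first_byte & 0b11110000) == 0b11100000:
        return 3  # 1110xxxx
    if (first_byte & 0b11111000) == 0b11110000:
        return 4  # 11110xxx
    # 10xxxxxx is a continuation byte, never the start of a character.
    raise ValueError("continuation byte or invalid lead byte")

print(utf8_sequence_length(0x41))  # 1
print(utf8_sequence_length(0xE3))  # 3
print(utf8_sequence_length(0xF0))  # 4
```

Because a continuation byte can never be mistaken for a lead byte, a decoder dropped into the middle of a stream can skip forward to the next lead byte and resynchronize.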
Since we start with a 0 for single bytes, that means we can only send up to 0x7F as normal ASCII. Everything else gets encoded the same way, regardless of the number of bytes:
- Break your character number up into groups of 6 bits starting on the right:
  ぁ = 0x3041 = (0011)(000001)(000001)
  G Clef = 0x1D11E = (00)(011101)(000100)(011110)
- To each byte group but the first, attach (10000000):
  ぁ = (0011)(10000001)(10000001)
  G Clef = (00)(10011101)(10000100)(10011110)
- To each first byte, set one bit for each byte you're sending, then a 0, then zero-pad what's left to fill in the byte:
  ぁ = (11100011)(10000001)(10000001)
  G Clef = (11110000)(10011101)(10000100)(10011110)
That's all, folks. OK, I over-simplified the process a little (it would be possible to "bump" up into an extra byte if you have the right character) but that's the basic idea.
Our final characters in UTF-8, then, are:
A = 0x41
ぁ = 0xE3 0x81 0x81
G Clef = 0xF0 0x9D 0x84 0x9E
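For anyone who wants to check those bytes, here is a hand-rolled encoder in Python that follows the same 6-bit chunking. It is a sketch, not a replacement for a real codec: `utf8_encode` is a hypothetical name, and it does no validation (it will happily encode surrogates or values past 0x10FFFF).

```python
def utf8_encode(code_point):
    """Encode one code point as UTF-8 by hand (1 to 4 bytes)."""
    if code_point < 0x80:
        return bytes([code_point])               # plain ASCII
    if code_point < 0x800:
        lead, extra = 0xC0, 1                    # 110xxxxx
    elif code_point < 0x10000:
        lead, extra = 0xE0, 2                    # 1110xxxx
    else:
        lead, extra = 0xF0, 3                    # 11110xxx
    tail = []
    for _ in range(extra):
        tail.append(0x80 | (code_point & 0x3F))  # 10xxxxxx, right to left
        code_point >>= 6
    return bytes([lead | code_point] + tail[::-1])

for cp in (0x41, 0x3041, 0x1D11E):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")    # agrees with the built-in codec
    print(f"U+{cp:04X} -> {encoded.hex(' ')}")
```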
Downside: If you're doing a lot with Asian languages you wind up using 3 bytes instead of 2, increasing file size by 50%.
But What About E-mail, Smart Guy?
Oh, you didn't think we were done, did you? Now we have to e-mail that crapola. The body of the e-mail is fine, because we can just use HTML entities like &#12353; for the ぁ. The subject is a different matter though, not being 8-bit clean. You have to work one more layer of voodoo on your UTF-8 if you have anything represented as multi-byte.
OK, just like before, we're gonna take our (now-converted) bytes and chunk 'em into groups of 6 bits, but now we're gonna start at the beginning instead of the end.
ぁ = 0xE3 0x81 0x81 = (111000)(111000)(000110)(000001)
G Clef = 0xF0 0x9D 0x84 0x9E = (111100)(001001)(110110)(000100)(100111)(10----)
We now have a bunch of groups that can store anything between 0 and 63. If only we had some way of representing this as a set of columns, like base-10 is 10^0 (ones place), then 10^1 (tens place), then 10^2 (hundreds place) and so on. It'd be like... base-64!
And that's what Base64 does. 0-25 are capital letters, 26-51 are lower case letters, 52-61 are 0-9, 62 is + and 63 is /. Base64 conversions have to come in groups of 4 characters, so if you have any gaps, like in the G Clef conversion, you pad it out with equals signs, so the decoding computer knows not to insert NULL characters.
So now we have:
ぁ = (111000)(111000)(000110)(000001) = (56)(56)(6)(1) = 44GB
G Clef = (111100)(001001)(110110)(000100)(100111)(10----) = (60)(9)(54)(4)(39)(32) = 8J2Eng==
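Doing this by hand is error-prone, so here is a small Python sketch of the same 6-bit grouping, checked against the standard library's `base64` module. `ALPHABET` and `base64_by_hand` are names I made up for the example.

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def base64_by_hand(data):
    """Encode bytes as Base64 by regrouping the bits six at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)    # zero-fill the last partial group
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    chars += "=" * (-len(chars) % 4)  # pad output to a multiple of 4 characters
    return "".join(chars)

for raw in (bytes.fromhex("E38181"), bytes.fromhex("F09D849E")):
    encoded = base64_by_hand(raw)
    assert encoded == base64.b64encode(raw).decode("ascii")
    print(encoded)
```

Three input bytes map exactly onto four output characters, which is why the 3-byte hiragana needs no padding and the 4-byte G clef needs two equals signs.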
OK, we've done the Base64 conversion. Now what? How does the receiving mail program know I didn't have a seizure while I was typing?
By a magic number. Magical character sequence, in this case: =?UTF-8?B?44GB8J2Eng==?=
That complicated mess gives the character encoding (UTF-8) and tells how it's encoded itself (B = Base64). The =? and ?= are markers to let us know when it starts and ends; you can have other stuff outside the escaped stuff. So if I wanted to send an e-mail with the subject of Aぁ it might come through as A=?UTF-8?B?44GB?=. It might also get encoded with the A inside the Base64 stuff, which would look different: =?UTF-8?B?QeOBgQ==?=
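A minimal sketch of building and reading back one of these in Python. `encode_subject` is a hypothetical helper, and it ignores details a real mailer has to handle (as far as I know the mail spec caps each encoded chunk at 75 characters, so long subjects get split); `decode_header` is the standard library's `email.header` function.

```python
import base64
from email.header import decode_header

def encode_subject(text):
    """Wrap text in a single Base64 encoded-word tagged as UTF-8."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{payload}?="

subject = encode_subject("\u3041")  # the hiragana character U+3041
print(subject)                      # =?UTF-8?B?44GB?=

# The standard library can reverse the process.
raw_bytes, charset = decode_header(subject)[0]
assert raw_bytes.decode(charset) == "\u3041"
```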
So What Was the Big Deal?
Well, I haven't had a chance to learn ASP.NET yet, so all my web sites are still written in ASP 3.0, which first came out about 10 years ago. ASP 3.0 is based on VBScript, which is based on VB6, which means it does 16-bit integers. That can cause problems if you're dealing with code points above 32,767 (0x7FFF), since negative numbers don't map to anything.
I already had code in place to handle stuff in the 0x7F to 0xFF range, so I could convert from Windows-1252 characters to Unicode. Since my sites are all served as either Windows-1252 or Latin-1, any other characters got converted to HTML entities by the browser when the user hits send.
That meant I had a bunch of text, with HTML entities in it. So I hacked together something quickly to parse it out. This bypasses the whole Int16 problem by just doing UTF-8. The part where I converted the UTF-8 to Base64 is left as an exercise for the reader.
Please note I'm not saying this is optimal. I'm just saying it works. As an FYI, the "and" keyword in VBScript is bitwise, and is the only bit operator (aside from "or") available to me. Everything else, like bit-shifting, has to be done mathematically.
    for i = 1 to Len(Text) ' string position is 1-indexed
        if Mid(Text, i, 2) = "&#" then
            ' collect the decimal digits of the entity, e.g. the 12353 in &#12353;
            TmpString = ""
            j = i + 2
            while IsNumeric(Mid(Text, j, 1))
                TmpString = TmpString & Mid(Text, j, 1)
                j = j + 1
            wend
            TmpString = CLng(TmpString) ' Long, so no 16-bit overflow
            if TmpString < 2^7 then
                ' U+000000 to U+00007F (ASCII): one byte
                RetString = RetString & Chr(TmpString)
            elseif TmpString < 2^11 then
                ' U+000080 to U+0007FF (Unicode, includes Arabic and non-ASCII European): two bytes
                RetString = RetString & Chr(&hC0 + ((TmpString and &h07C0) / 2^6)) & _
                    Chr(&h80 + (TmpString and &h003F))
            elseif TmpString < 2^16 then
                ' U+000800 to U+00FFFF (Unicode, includes Chinese and Japanese): three bytes
                RetString = RetString & Chr(&hE0 + ((TmpString and &hF000) / 2^12)) & _
                    Chr(&h80 + ((TmpString and &h0FC0) / 2^6)) & _
                    Chr(&h80 + (TmpString and &h003F))
            elseif TmpString >= 2^16 then
                ' doesn't handle U+010000 to U+10FFFF, but doesn't really need to.
            end if
            i = j ' j sits on the closing ";", which "next" then steps past
        else
            ' ordinary character: copy it through unchanged
            RetString = RetString & Mid(Text, i, 1)
        end if
    next
    Text = RetString
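For comparison, the same idea as a Python sketch, where the language's own codec does the bit-twiddling. This is not what runs on my sites, just a cross-check. The function name is made up, and it assumes well-formed decimal entities ending in a semicolon; out-of-range or surrogate values will raise an error.

```python
import re

def entities_to_utf8(text):
    """Replace decimal entities such as &#12353; with the characters
    they name, then return the whole string as UTF-8 bytes."""
    def replace(match):
        return chr(int(match.group(1)))
    return re.sub(r"&#(\d+);", replace, text).encode("utf-8")

print(entities_to_utf8("A&#12353;").hex(" "))  # 41 e3 81 81
```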
Now wasn't that fun?
I Owe My Neighbors a Beer
July 18, 2008:
It's damn handy to meet people who've been in the area for a while and know where all the cool stuff is.
One of the people at the dog park has lived around here... pretty much his whole life, I think, and has made two recommendations that make my stomach happy and my scale unhappy.
While the dogs run around and play, the humans tend to congregate and shoot the breeze. Once, while I was griping about the only breakfast around being at McDonald's or IHOP, he stepped in and pointed out the Yorkshire Restaurant. (No link, they have no web site.) Not the greatest food on earth, but it's country cooking with biscuits, country ham, sausage gravy, and the SOS I mentioned a little while ago. They serve other meals, but I've never tried them.
It's become part of my Saturday routine -- I eat breakfast there while I wait for places like Weber's or Anime Pavilion to open.
Last week I was complaining again -- I guess I do that a lot -- about my coffee snack options being limited to Starbucks. There's a Dunkin' Donuts right across the street, but I can't go there due to the terrorist scarf idiocy (scroll down to "Wardrobe Correctness" if you're not familiar with that story). Well, he told me that Shoppers grocery stores have a wonderful thing called the Colossal Donut that sells for 50¢ and is about two and a half inches thick.
They are truly wonderful confections that I probably shouldn't know about. But still, these are two things that I wouldn't have known about otherwise. Maybe being on good terms with one's neighbors can be a good thing after all.
Movie Review: Hancock
July 16, 2008:
Over the July 4th weekend I saw the Will Smith movie Hancock. I was a little iffy on this one; it had been pretty heavily panned.
Don't ask me why, it was a good movie. In most superhero movies, the good guys do everything right, and the public loves them. Anyone who doesn't (like in the X-Men titles) is a bigot of one kind or another.
But what if the hero is a fuck-up? What if his helping made things worse? Would he still be adulated? Or would people on the street call him an asshole?
In Hancock they call him an asshole. Frequently. And quite frankly, he is. But he's a redeemable asshole, and after meeting up with a PR guy (who he saves from a train by chucking his car on top of someone else's, then stops the train causing a derailment) and his son he tries to better himself by voluntarily going to prison on several warrants for destruction of property and the like.
Of course, with the superhero, even a lousy one, out of the picture crime skyrockets. And Hancock is released to help, and tries to act like a superhero. Even puts down the booze. And when he rescues some people without leveling a city block, people actually cheer him.
So yeah, it's a cliche. But it was done pretty well, and you have to love the image of a little kid calling a superhero an asshole to the guy's face.
(Recurring line: "Call me an asshole one more time." People frequently do, with cartoon-violent, humorous results.)
This is a superhero movie, so you're going to need to leave any knowledge you have of physics at the door. Momentum does not exist in Hancock's world, and we are all entertained for it.
There is a bit of a twist, and my explanation turned out to be wrong. There was one character who either got lucky or knew more than he should have. As the movie's main villain he got short shrift on screen time; I think there may be some scenes that explain him better and may help explain his role in the end of the movie. The second half of the movie felt rushed, like the director was under pressure to cut it to the typical 90 minutes, when a 1:45 or even two hour runtime probably would have been better.
And then there's the big issue some people seem to have: The explanation of where Hancock got his powers is lacking, they say. Well, it is. Because he can't remember them. Another character explains to him (and us -- amnesia is a great excuse for exposition) how things really went down, and it makes sense. At least as much sense as a superhero movie makes, anyway.
So yeah, a lot better than I thought it would be. Not as good as Iron Man but about as good as The Incredible Hulk. If you were OK with spending $10 on Hulk you'll be cool with $10 for Hancock. But see it as a matinee if nothing else. And of course wait for the inevitable video game, comic book and sequel.
Turnaround Time
July 14, 2008:
One of the webcomics I read doesn't really sell anything, but does offer computer desktop images in return for donations. So from time to time I "buy" a wallpaper JPEG.
Since I don't see any point in giving PayPal a cut for doing next to nothing in exchange for a handful of bits, I send my donation via snail mail. In order to avoid just sending an envelope full of cash (and to provide an e-mail address for the author to send the image to) I wrap the money in a short letter.
About half the time, the author responds to something in the letter in the wallpaper e-mail. But this person is always prompt.
I wrote my letter on Saturday afternoon, and put it and the money in an envelope. Then the techie part of my brain disassociated from the nature of the USPS -- by Sunday afternoon I was waiting for a reply to my letter, which was still sitting in the bottom of the mailbox and wouldn't be picked up until Monday afternoon.
(Just to wrap up, the letter would have been picked up on Monday, and probably didn't get to its destination until after the July 4th weekend. I got my e-mail on Wednesday the 9th, 7 USPS-days after it was sent. Given that the comic author provides a PO box instead of a street address, for obvious reasons, I'd say that's pretty punctual for a transfer protocol nicknamed "snail mail.")
WANT
July 11, 2008:
On Discovery Channel the other day, I saw two great shows: In the Shadow of the Moon and When We Left Earth: The NASA Missions. Even on my eight-year-old CRT TV, they looked incredible. I need to get them in hi-def.
Which means buying them on Blu-ray. Which means buying a Blu-ray player -- either a dedicated player, a PS3, or maybe a next-gen Xbox 360, assuming they shift from HD DVD support.
But then I'd still be showing it on my old TV, on which Blu-ray would do no good. So I'd need a nice HDTV, preferably at least a 40" display.
And that TV would need someplace to go, and my apartment, as it's laid out, is short of places to put something that size. So I need to buy a house, which I still plan on doing in the spring.
That's a big goddamn purchase just for a couple of documentaries.
Probably Time To Start that Diet Again
July 09, 2008:
From my Facebook status feed:
Jason has no willpower. And half a dozen donuts. 12:16pm
Jason still has no willpower. And no longer has the donuts. (But does have an uncomfortably full stomach). 1:46pm
(Sigh.)
Nostalgia
July 07, 2008:
While out for breakfast Saturday morning, I decided that I was in the mood for something I hadn't eaten in probably two decades -- SOS. The place I go serves it up as part of their breakfast menu, so I partook.
Now, SOS is usually prepared because it's cheap and easy. And that's probably why Dad made it from time to time when my sister and I were kids. But his version had a lot more of the chipped beef in it, compared to this place's, which was mostly gravy. Still good, mind you, but white gravy by itself doesn't confer much taste.
But I was happy to have had some old-style comfort food, and drove off to finish my weekend errands. The 80's station on Sirius was doing their Top 40 countdown; this week they were doing 1981. "Angel of the Morning," by Juice Newton, was on.
Before I go any farther, let me make this very clear: That song, as well as everything else in Juice Newton's repertoire, is crap. But Mom listened to it back then, so I'm at least familiar with it, unlike most of the other the-70s-aren't-quite-dead-yet garbage they played.
But then the illusion was shattered when I saw that gas was just slightly more than the 95¢/gallon I was expecting.
Now I need to go online and dig up some 80s music -- "Whip It," some stuff by Power Station/Robert Palmer, that kind of thing. This may take some doing.
"Happy Birthday, Dear America..."
July 04, 2008:
Using the official date of July 4th, even though the original document itself wasn't signed by everybody until August, this is the United States of America's 232nd birthday. Or the 0350th, or the 0xE8th, or the 0b11101000th, depending on your preferred numerical base.
Likewise, I have a friend who turns 0x21 tomorrow. I doubt this person would feel the need to take a redo on turning 21, though. Still: Happy birthday!
But anyway, getting back to the original topic of Independence Day: Take a moment to reflect on the principles this country was founded on as you shuffle between cookouts and baseball games and fireworks shows. Then think about maybe electing someone in the fall who might actually believe in them himself.
Facebook Wants
July 02, 2008:
I've gotten semi-addicted to Facebook lately, thanks to a critical mass of my friends being on it. There's one small place where it could stand improvement, though.
Currently, I can lock down some parts of my profile to be friends-only. That would be most of it; I've got my address on there for starters. This means that anyone I "friend" (that word is not supposed to be a verb) knows where I live.
Well, what about people I'm OK with, but don't necessarily want to see randomly show up at my apartment? There needs to be a second level of friend. Just call it what it is -- acquaintance. They show up in the friend list like everyone else; they just don't see things that are friends-only (as opposed to non-public, which both friends and acquaintances would see).
This would require some back-end work on Facebook's part -- the friend access flag (probably a yes/no bit) would need to change to a tinyint (8 bit) at least, possibly a 32-bit int if the conversion is more expensive than 3 bytes of database space, as would the flag in the friends list. Then there'd be testing of... oh... pretty much everything on the site. Including all the third-party widgets, though they could probably get by if all they're doing is seeing if Column1 = you and then getting a list of who's in Column2. (Assuming, of course, that the database has a table that just lists friend combos, which makes sense when I look at it quickly.)
But anyway, this would let me "friend" people I know, without putting some stuff out there I only want my (close) friends seeing. It'd just be a change that affects everything on the site. No problem, right?