
Cross-writing

During the 1800s, letters were sometimes written using cross-writing to save paper. The Boston Public Library has been scanning letters written by Abolitionists prior to the American Civil War. Here is an example of a letter that shows two levels of cross-writing. However, it is written with a cursive slant so that the ascenders and descenders seem to form additional diagonal layers:

In 1890, Lewis Carroll wrote a booklet titled Eight or Nine Wise Words About Letter-Writing, where he warned against cross-writing:

My ninth Rule. When you get to the end of a note-sheet, and find you have more to say, take another piece of paper — a whole sheet, or a scrap, as the case may demand: but, whatever you do, don’t cross! Remember the old proverb ‘Cross-writing makes cross reading’. “The old proverb?” you say, enquiringly. “How old?” Well, not so very ancient, I must confess. In fact, I’m afraid I invented it while writing this paragraph! Still, you know, ‘old’ is a comparative term. I think you would be quite justified in addressing a chicken, just out of the shell, as “Old boy!”, when compared with another chicken, that was only half-out!

Proofs of a Conspiracy

Boing Boing has an excellent post on the Birth of the Illuminati, which traces the Illuminati conspiracy theory back to a book called Proofs of a Conspiracy against all the Religions and Governments of Europe, published in 1797.

Here is a scan of Proofs of a Conspiracy, scanned from the John Adams Library by the Internet Archive:

Signing Amazon Web Services API Requests in Python

I wanted to ping the “Amazon Product Advertising API” which now requires an HMAC signature, and the pyAWS library doesn’t sign requests and is no longer maintained. Here is some Python code to create a signed request:

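# Written for Python 2 (urllib.quote, list.sort() on params.items()).
import base64
import hashlib
import hmac
import time
import urllib

# pyAWS no longer works with the AWS signed request requirement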
# Sign an AWS REST request using the method described here
# http://docs.amazonwebservices.com/AWSECommerceService/latest/DG/index.html?RequestAuthenticationArticle.html
#_______________________________________________________________________________
def getSignedUrl(accessKey, secretKey, params):
 
    #Step 0: add accessKey, Service, Timestamp, and Version to params
    params['AWSAccessKeyId'] = accessKey
    params['Service']        = 'AWSECommerceService'
 
    #Amazon adds hundredths of a second to the timestamp (always .000), so we do too.
    #(see http://associates-amazon.s3.amazonaws.com/signed-requests/helper/index.html)
    params['Timestamp']      = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime())
    params['Version']        = '2009-03-31'
 
    #Step 1a: sort params
    paramsList = params.items()
    paramsList.sort()
 
    #Step 1b-d: create canonicalizedQueryString
    # This code comes from http://blog.umlungu.co.uk/blog/2009/jul/12/pyaws-adding-request-authentication/
    # and the resulting discussion
    canonicalizedQueryString = '&'.join(['%s=%s' % (k,urllib.quote(str(v))) for (k,v) in paramsList if v])
 
    #Step 2: create string to sign
    host          = 'ecs.amazonaws.com'
    requestUri    = '/onca/xml'
    stringToSign  = 'GET\n'
    stringToSign += host +'\n'
    stringToSign += requestUri+'\n'
    stringToSign += canonicalizedQueryString.encode('utf-8')
 
    #Step 3: create HMAC
    digest = hmac.new(secretKey, stringToSign, hashlib.sha256).digest()
 
    #Step 4: base64 the hmac
    sig = base64.b64encode(digest)
 
    #Step 5: append signature to query
    url  = 'http://' + host + requestUri + '?'
    url += canonicalizedQueryString + "&Signature=" + urllib.quote(sig)
 
    return url
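
For anyone on Python 3, here is a rough port of the same steps. It is only a sketch: it keeps the host, request URI, and Version string from the code above, and the names get_signed_url and the optional timestamp argument are my own additions for illustration. Unlike the original, it copies params instead of modifying the caller's dict, and it percent-encodes everything except the unreserved characters (letters, digits, "-", "_", ".", "~").

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def get_signed_url(access_key, secret_key, params, timestamp=None):
    """Python 3 sketch of the signing steps above (same host, URI, and Version)."""
    host = 'ecs.amazonaws.com'
    request_uri = '/onca/xml'

    # Step 0: add the required parameters to a copy of params.
    query = dict(params)
    query['AWSAccessKeyId'] = access_key
    query['Service'] = 'AWSECommerceService'
    # The timestamp argument is hypothetical; it only exists to make testing repeatable.
    query['Timestamp'] = timestamp or time.strftime('%Y-%m-%dT%H:%M:%S.000Z', time.gmtime())
    query['Version'] = '2009-03-31'

    # Step 1: sort by parameter name and percent-encode names and values.
    canonical = '&'.join(
        '%s=%s' % (quote(str(k), safe='-_.~'), quote(str(v), safe='-_.~'))
        for k, v in sorted(query.items()) if v
    )

    # Step 2: build the string to sign.
    string_to_sign = '\n'.join(['GET', host, request_uri, canonical])

    # Steps 3 and 4: HMAC-SHA256 over the string, then base64.
    digest = hmac.new(secret_key.encode('utf-8'),
                      string_to_sign.encode('utf-8'),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode('ascii')

    # Step 5: append the encoded signature to the query.
    return 'http://%s%s?%s&Signature=%s' % (
        host, request_uri, canonical, quote(signature, safe='-_.~'))


# Placeholder keys, not real credentials.
print(get_signed_url('MY_ACCESS_KEY', 'MY_SECRET_KEY',
                     {'Operation': 'ItemLookup', 'ItemId': '0679722769'}))
```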

Fast manipulation of tar files using 7zip

tar always reads every byte in an archive (never calls seek()) and is very slow when trying to extract a single file from a large archive.

One solution is to use 7z instead, which is found in the p7zip-full Debian package. For some operations, 7z is three orders of magnitude faster than tar. Here are some timings that illustrate how much faster 7z is:

#4.5GB archive listing using tar
time tar tvf fifteenthcensus00reel2149_jp2.tar 
 
(output suppressed, timing stable whether file cache is warmed or not)
 
real	2m1.876s
user	0m0.444s
sys	0m7.740s
 
#4.5GB archive listing using 7z with cold file cache
time 7z l fifteenthcensus00reel2149_jp2.tar 
 
(output suppressed)
 
real	0m10.419s
user	0m0.080s
sys	0m0.124s
 
#4.5GB archive listing using 7z with hot file cache
time 7z l fifteenthcensus00reel2149_jp2.tar 
 
(output suppressed)
 
real	0m0.145s
user	0m0.052s
sys	0m0.040s
 
 
#extraction of last file in a 4.5GB archive using tar
time tar xvf fifteenthcensus00reel2149_jp2.tar fifteenthcensus00reel2149_jp2/fifteenthcensus00reel2149_0185.jp2
 
real	2m3.545s
user	0m0.436s
sys	0m7.824s
 
#extraction of last file in a 4.5GB archive using 7z and a hot file cache
time 7z e fifteenthcensus00reel2149_jp2.tar fifteenthcensus00reel2149_jp2/fifteenthcensus00reel2149_0185.jp2
 
real	0m0.104s
user	0m0.036s
sys	0m0.036s
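
To show why seeking helps so much, here is a small Python sketch. I have not checked how 7z does it internally, so treat this only as an illustration of the tar layout: every member starts with a 512-byte header that records its size, so a reader can jump over the data to the next header. The function name list_members_by_seeking is made up for this example, and it only handles plain ustar headers with short names (no long-name or pax extensions).

```python
import io
import tarfile


def list_members_by_seeking(fileobj):
    """Yield (name, size, data_offset), reading only the 512-byte headers."""
    while True:
        header = fileobj.read(512)
        if len(header) < 512 or header == b'\0' * 512:
            return  # end-of-archive marker (or truncated file)
        name = header[0:100].rstrip(b'\0').decode('utf-8')
        # The size field is octal text, padded with NULs or spaces.
        size_field = header[124:136].rstrip(b'\0 ').decode('ascii')
        size = int(size_field or '0', 8)
        data_offset = fileobj.tell()
        yield name, size, data_offset
        # Skip the member data, which is padded to a multiple of 512 bytes.
        fileobj.seek((size + 511) // 512 * 512, io.SEEK_CUR)


# Build a tiny archive in memory so the example is self-contained.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w', format=tarfile.USTAR_FORMAT) as tar:
    for member_name, payload in [('a.txt', b'x' * 1000), ('b.txt', b'hello')]:
        info = tarfile.TarInfo(member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
index = {name: (size, offset) for name, size, offset in list_members_by_seeking(buf)}

# Extract the last member by seeking straight to its data.
size, offset = index['b.txt']
buf.seek(offset)
last = buf.read(size)
print(index, last)
```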

The Church

The Internet Archive recently moved into a large Christian Science church.


This is, of course, completely absurd.  I mean, the Archive now has a full pipe organ, and a gathering room filled with wooden pews, and a stained-glass skylight with a cross in the middle.

But apart from the absurdity, there is the reality that, from 1923 until just a few days ago, this church was a community center, a gathering place, a place to celebrate.  Some of us had the chance to meet some of the church members during their celebration and farewell service two weeks ago; this post will try to honor them.

Homemade cookies and chocolates

As we spoke, I was surprised that they didn’t seem very sad about leaving.  Frances pointed out to me that the building is simply a building, and that the people who attend are the real church: a true Christian sentiment.  They expressed gratitude that someone had come along to purchase the building.

Farewell service at the church

After the farewell service at the church

Where would they go now for community?  They said that it was up to each member to find her or his own way to other churches.

Hopefully, the Archive can carry on their spirit and grace.

The Archive recently moved into a (former) Christian Science church.

Remembering GeoCities and KickTam

After fifteen years, GeoCities is shutting down for good today. The Internet Archive has been working with Yahoo to make sure that the Wayback Machine has a complete, final snapshot of GeoCities before it goes offline.

The Archive Team, another archivist group run by Jason Scott of textfiles.com, is also archiving GeoCities. Jason created a gallery of “under construction” animated GIFs to show the important cultural artifacts that we are going to lose with the GeoCities closure. I was browsing the gallery and found this little guy:

sesenshi-moonkawaii_construction

That penguin looks a lot like Tux, the Linux penguin, but he’s actually the old-school mascot of QuickTime. I think his name was KickTam, which was how someone’s kid pronounced QuickTime. I forget the details.

KickTam isn’t used by Apple marketing, but you can sometimes find him hanging out with the QuickTime developers. He’s seen less and less every day, and I was surprised to stumble upon him wearing a hard hat.

I think the only place that Apple still has KickTam up on apple.com is on the Letters from the Ice Floe page. The Icefloe letters were tech notes that were useful to developers. I remember pointing people to Icefloe #19 every once in a while, and was surprised to see it is ten years old now. It seems the last Icefloe was written in 2001, and I imagine these will slip off apple.com soon, and KickTam will be gone from the net forever.

skylight at the internet archive’s new bldg

may

Book Server Launch at the Internet Archive

may

E-reader Taste Test

We’ve tried a bunch, but have yet to find one that is actually tasty.

Also, I started a new blog about the Archive.

Invasion of the Book Scanners

We’re on a roll scanning books, and are always running out of space for book scanners. I arrived at HQ today and found five book scanners that had been set up in the conference room over the weekend!


GIANT STEPS

We switched to Blue Bottle beans at work. Just look at this giant bag of Giant Steps!

archive.org supports random seeking in their videos now!

i finally found the magic dead chicken to use
with flowplayer + lighttpd + mod_h264_streaming.
it turns out one needs to download and use an additional
.swf file to get the scrubber bar to send the
(already working) “?start=610” (to start 610 seconds in)
parameter to our litey on archive.org

phew! yay!

Brewster Kahle’s terracotta army

just part

TikiTV In Action

Here are some old pics of Sam and Peliom vj-ing with the open source TikiTV software at the Timothy Leary Archives event at 111 Minna. Way fun!


Announcing the Open Library Blog

Check out the new Open Library Blog! Development on openlibrary.org is happening at a fantastic pace, and hopefully the new blog will help people keep up with new features of the site.

I set up the Open Library Blog with the new WordPress 2.7 beta, which is a big improvement over the older WP version that we use on TikiRobot. The OL Blog also uses a clean, widget-capable theme, and a syntax highlighter plugin that is better maintained than the one we use on TR. The visual editor in 2.7 works well, which means I don’t have to install any markup plugins. I can’t wait to update TikiRobot to modern software and get rid of the clunky CSS that we’ve inherited!

Destination Earth

Here is a propaganda cartoon that the American Petroleum Institute made in 1956. Did you know Mars was populated by Commies??

This film is part of the Prelinger Archives. Direct link to archive.org page.

Mesh Day!



Mesh Day!, originally uploaded by tiki.robot.

Mesh Day is tomorrow. Ralf is busy unpacking these cute little nodes
from open-mesh.com

coredump

Seen outside archive HQ:

Today, while trying to burn a Knoppix DVD, I had four senior programmers standing around me, helping with cdrecord command line options.

me: I can’t believe I’m wasting four other developer’s time trying to burn a dvd…
sam: That’s the difference between Linux and Windows.

Photo credit: Paul!

Gag Order Lifted on Internet Archive, NSL Withdrawn

EFF Press Release:

FBI Withdraws Unconstitutional National Security Letter After ACLU and EFF Challenge

Gag Order Lifted on Internet Archive, Allowing Founder to Speak Out for First Time

San Francisco – The FBI has withdrawn an unconstitutional national security letter (NSL) issued to the Internet Archive after a legal challenge from the American Civil Liberties Union (ACLU) and the Electronic Frontier Foundation (EFF). As the result of a settlement agreement, the FBI withdrew the NSL and agreed to the unsealing of the case, finally allowing the Archive’s founder to speak out for the first time about his battle against the record demand.

“The free flow of information is at the heart of every library’s work. That’s why Congress passed a law limiting the FBI’s power to issue NSLs to America’s libraries,” said Brewster Kahle, founder and Digital Librarian of the Internet Archive. “While it’s never easy standing up to the government — particularly when I was barred from discussing it with anyone — I knew I had to challenge something that was clearly wrong. I’m grateful that I am able now to talk about what happened to me, so that other libraries can learn how they can fight back from these overreaching demands.”

Internet Archive Brings Free Ultra High-Speed Internet to Public Housing

Go Brewster and Ralf!!

The Internet Archive, a San Francisco-based organization dedicated to preserving a record of the Internet and to increasing access to the Internet, today began offering free Internet service to public housing projects at speeds far greater than any other city resident can receive.

Valencia Gardens Housing, with 240 units, is the first area to be connected in a pilot project that expects to wire more than 2,500 units in the city in the next eight months, according to Internet Archive founder Brewster Kahle.

What makes the project unique is that the apartments will be connected to the Internet, and to the educational resources at the Internet Archive, at 100 megabits per second (Mbits/second). That speed contrasts sharply with the normal Internet service offered by telephone companies, which is usually less than 6 Mbits/second.

The residents can instantly view DVD-quality videos of the thousands of lectures and other educational information from the Internet Archive’s collections, as well as traditional Internet access.

The Internet Archive is able to achieve this high speed by connecting the San Francisco municipal fiber optic network, which runs through the public housing developments, to an Archive switching center, which connects to the Internet.

“We are pleased to be the first non-profit organization to bring public housing online,” Kahle said.

He added: “We are excited to see much faster access to the Internet as a way to experiment with advanced applications, and are pleased that the underserved get first access to advanced technology.”

See also: NYTimes Bits Blog, The Reg, Cnet article by Greeter Dan.

Using git For Large Scale Digital Archiving: An Outline

Here are some notes on how one might re-architect Internet Archive infrastructure to meet some additional goals:

  • easy to set up and replicate
  • provide versioning and transactions
  • handle more media types well
  • better ingest/locate/read apis
  • better search

The current architecture looks like this:
iaarch.png

The diagram is simplified a lot. There are currently about 1800 nodes in the cluster, most of which are storage nodes (low power 1U nodes with four 1TB hard drives). The deriver nodes are used for crunching things like pdfs and h.264s, and there are about 300 of those. There are 5 www frontends, hidden behind a couple of load balancers, and the database server has at least one read-only secondary.

What I like about the current infrastructure:

  • Easy to add more storage. Some other archival solutions do not scale well, since they insist all hard drives be connected to the same machine. This starts to break down at the petabox scale.
  • Easy to add more bandwidth. Currently IA is pushing 5+Gbps of outbound bandwidth. Every storage node runs an Apache server, which lessens load on the homenode; homenode load is a problem with other archival systems.
  • Database hits are not required to locate an item on the cluster. When an item is requested through the Locator service, a multicast is sent, and machines that have the item will respond. This lessens load on the DB server, which is important when getting thousands of web requests per second.
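
Here is a toy sketch of the locate-by-multicast idea. It is not the Archive's Locator code; the class and function names and the hostnames are invented, and the "multicast" is simulated by simply asking every node in a list. The point is only that the answer comes from the nodes that hold the item, so no database lookup is involved.

```python
class StorageNode:
    """Hypothetical stand-in for a storage node listening for locate queries."""

    def __init__(self, hostname, items):
        self.hostname = hostname
        self.items = set(items)

    def handle_query(self, item_id):
        # Answer only if this node holds the item; otherwise stay silent.
        return self.hostname if item_id in self.items else None


def locate(item_id, nodes):
    """Ask every node and collect the ones that answer; no database involved."""
    replies = (node.handle_query(item_id) for node in nodes)
    return [hostname for hostname in replies if hostname is not None]


nodes = [
    StorageNode('node-001', ['item-a', 'item-b']),
    StorageNode('node-002', ['item-a']),  # second copy of item-a
    StorageNode('node-003', ['item-c']),
]
print(locate('item-a', nodes))
```

Because each item lives on two machines, a query for it normally gets two answers, and either machine can serve the content.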

What I find interesting about the current infrastructure:
  • RAID is not used. Items are backed up on to a secondary machine when added to the archive.
  • This is mostly due to “RAID is hard to get right” and cost
  • This means there are two machines (and two apaches) ready to serve the same content.
  • One machine can be taken down for repair while the content is still online.
  • I would like to see use of either RAID or maybe RAID-Z

An idea on how to re-architect things using git as a storage backend to provide versioning and transactions
  • git is the version control system used for the Linux kernel.
  • git is a totally new way to operate on data. Read this if you are a non-believer.
  • We could keep the infrastructure mostly the same as IA, but store items as git repositories. This would not be a large architecture change.
  • git would become a supported access protocol, in addition to http, ftp, and rsync. Backups could be as simple as a git pull. We could git clone the entire cluster.
  • We would get versioning!

Changes needed to repo.git to make it useful in an archive cluster:
  • Change reguser.cgi to tie into the existing user database (talk to dbserver)
  • Change regprog.cgi to work in a cluster environment. Repositories are inited in /{0-4}/items/id/id.git on a primary node (talk to catalog/homenode)
  • Use post-commit hook to queue backup and derive tasks (talk to catalog)
  • Change gitweb to show custom view of movie, audio, texts (book), and photo collections. Software collections would show standard gitweb view.
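
A rough sketch of two of the pieces above, just to make them concrete. Everything here is hypothetical: I am reading {0-4} as a disk number from 0 to 4, the function names are invented, and the task queue is a plain Python list standing in for whatever the catalog would actually use.

```python
def repo_path(disk, item_id):
    """Build a repository path in the /{0-4}/items/id/id.git layout."""
    if disk not in range(5):
        raise ValueError('disk must be between 0 and 4')
    return '/%d/items/%s/%s.git' % (disk, item_id, item_id)


def on_post_commit(item_id, task_queue):
    """Queue the follow-up work that a post-commit hook would ask the catalog for."""
    task_queue.append(('backup', item_id))
    task_queue.append(('derive', item_id))


task_queue = []
path = repo_path(2, 'example_item')
on_post_commit('example_item', task_queue)
print(path, task_queue)
```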

I don’t think this would take too long to implement, but I’m lacking co-conspirators these days.. Maybe when shag makes it to SF we will have to knock something out :)

Digitizing old books with large foldouts

This panorama is one of the first images from our test of digitizing books with foldouts. It is from this book. The full-size image is here.

These foldouts are hard to image.. This picture shows how it was placed under the camera.

Dr. Alexander Shulgin’s First Lab Notebook

Dr. Alexander Shulgin’s first laboratory notebook has been scanned and put online.
If you want higher resolution images, check out the JPEG 2000 files here.

How to tunnel VNC over SSH

Today I had to use VNC to debug a remote machine, but firewalls were blocking VNC ports.

After I failed to get my VNC client (Chicken of the VNC) to use a SOCKS proxy, I was able to use SSH port forwarding to get it working. On your local machine type:

ssh user@remotehost -L 5900/localhost/5900

This forwards port 5900 on localhost to port 5900 on the remotehost. Then in Chicken of the VNC, open a new connection to localhost. That’s it! EEZ!
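
If the remote VNC server is not on display :0, the remote port changes, since VNC display :N listens on port 5900+N. Here is a small helper that builds the ssh command for a given display. The function name is made up, and it uses the colon-separated form of -L (port:host:hostport) that the ssh man page documents.

```python
def vnc_tunnel_command(user, remote_host, display=0, local_port=5900):
    """Return the ssh argument list that forwards local_port to a remote VNC display."""
    remote_port = 5900 + display  # VNC display :N listens on TCP port 5900+N
    forward = '%d:localhost:%d' % (local_port, remote_port)
    return ['ssh', '%s@%s' % (user, remote_host), '-L', forward]


command = vnc_tunnel_command('user', 'remotehost')
print(' '.join(command))

# Display :1 on the remote side, still reachable as localhost:5900 locally.
command_display_1 = vnc_tunnel_command('user', 'remotehost', display=1)
print(' '.join(command_display_1))
```

The list can be passed to subprocess.run() as-is if you want to start the tunnel from a script.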

It turned out that Xorg was eating all available memory and invoking the oom killer. Sigh.

I guess I could have figured this out without VNC, but I couldn’t reproduce the bug locally, so I watched as a remote user was working on the machine.

Real-World Average Shutter Life for Canon 5D and 1Ds mkII

We take more than 5 million pictures every month using a pool of 250+ Canon 5D and Canon 1Ds mkII cameras. Recently, Jon was contemplating buying a 5D and wanted to know how long the shutter life was. Hey, we can answer that using real-world numbers!

The Canon 5D is rated for 50K+ shots, but they last much longer, and fail after an average of 150K shots.

The Canon 1Ds mkII is rated for 200K+ shots, but actually lasts for 750K shots before shutter failure!
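
To put those numbers in terms of time, here is a quick back-of-the-envelope calculation. It assumes exactly 5 million shots a month spread evenly over exactly 250 cameras, which is a simplification of "more than 5 million" and "250+".

```python
SHOTS_PER_MONTH = 5_000_000  # assumed: total pictures taken per month
CAMERAS = 250                # assumed: size of the camera pool

shots_per_camera_per_month = SHOTS_PER_MONTH / CAMERAS


def months_until_failure(average_shutter_life):
    """Months a shutter lasts at this workload, given its average life in shots."""
    return average_shutter_life / shots_per_camera_per_month


months_5d = months_until_failure(150_000)
months_1ds = months_until_failure(750_000)
print(shots_per_camera_per_month, months_5d, months_1ds)
```

At that rate each camera takes about 20,000 shots a month, so a 5D shutter lasts roughly 7.5 months and a 1Ds mkII shutter roughly 37.5 months.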

Someone else might find these numbers useful.
