
Destination Earth

Here is a propaganda cartoon that the American Petroleum Institute made in 1956. Did you know Mars was populated by Commies??

This film is part of the Prelinger Archives. Direct link to archive.org page.

coredump

Seen outside archive HQ:

Today, while trying to burn a Knoppix DVD, I had four senior programmers standing around me, helping with cdrecord command line options.

me: I can’t believe I’m wasting four other developers’ time trying to burn a dvd…
sam: That’s the difference between Linux and Windows.

Photo credit: Paul!

Gag Order Lifted on Internet Archive, NSL Withdrawn

EFF Press Release:

FBI Withdraws Unconstitutional National Security Letter After ACLU and EFF Challenge

Gag Order Lifted on Internet Archive, Allowing Founder to Speak Out for First Time

San Francisco - The FBI has withdrawn an unconstitutional national security letter (NSL) issued to the Internet Archive after a legal challenge from the American Civil Liberties Union (ACLU) and the Electronic Frontier Foundation (EFF). As the result of a settlement agreement, the FBI withdrew the NSL and agreed to the unsealing of the case, finally allowing the Archive’s founder to speak out for the first time about his battle against the record demand.

“The free flow of information is at the heart of every library’s work. That’s why Congress passed a law limiting the FBI’s power to issue NSLs to America’s libraries,” said Brewster Kahle, founder and Digital Librarian of the Internet Archive. “While it’s never easy standing up to the government — particularly when I was barred from discussing it with anyone — I knew I had to challenge something that was clearly wrong. I’m grateful that I am able now to talk about what happened to me, so that other libraries can learn how they can fight back from these overreaching demands.”

Internet Archive Brings Free Ultra High-Speed Internet to Public Housing

Go Brewster and Ralf!!

The Internet Archive, a San Francisco-based organization dedicated to preserving a record of the Internet and to increasing access to the Internet, today began offering free Internet service to public housing projects at speeds far greater than any other city resident can receive.

Valencia Gardens Housing, with 240 units, is the first area to be connected in a pilot project that expects to wire more than 2,500 units in the city in the next eight months, according to Internet Archive founder Brewster Kahle.

What makes the project unique is that the apartments will be connected to the Internet, and to the educational resources at the Internet Archive, at 100 megabits per second (Mbits/second). That speed contrasts sharply with the normal Internet service offered by telephone companies, which is usually less than 6 Mbits/second.

The residents can instantly view DVD-quality videos of the thousands of lectures and other educational information from the Internet Archive’s collections, as well as traditional Internet access.

The Internet Archive is able to achieve this high speed by connecting the San Francisco municipal fiber optic network, which runs through the public housing developments, to an Archive switching center, which connects to the Internet.

“We are pleased to be the first non-profit organization to bring public housing online,” Kahle said.

He added: “We are excited to see much faster access to the Internet as a way to experiment with advanced applications, and are pleased that the underserved get first access to advanced technology.”

See also: NYTimes Bits Blog, The Reg, Cnet article by Greeter Dan.

Using git For Large Scale Digital Archiving: An Outline

Here are some notes on how one might re-architect Internet Archive infrastructure to meet some additional goals:

  • easy to set up and replicate
  • provide versioning and transactions
  • handle more media types well
  • better ingest/locate/read apis
  • better search

The current architecture looks like this:
iaarch.png

The diagram is simplified a lot. There are currently about 1800 nodes in the cluster, most of which are storage nodes (low power 1U nodes with four 1TB hard drives). The deriver nodes are used for crunching things like pdfs and h.264s, and there are about 300 of those. There are 5 www frontends, hidden behind a couple of load balancers, and the database server has at least one read-only secondary.

What I like about the current infrastructure:

  • Easy to add more storage. Some other archival solutions do not scale well, since they insist all hard drives be connected to the same machine. This starts to break down at the petabox scale.
  • Easy to add more bandwidth. Currently IA is pushing 5+Gbps of outbound bandwidth. Every storage node runs an Apache server, which lessens load on the homenode, which is a problem with other archival systems.
  • Database hits are not required to locate an item on the cluster. When an item is requested through the Locator service, a multicast is sent, and machines that have the item will respond. This lessens load on the DB server, which is important when getting thousands of web requests per second.
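
To make the locator idea concrete, here is a tiny in-process simulation. It is only a sketch: the `StorageNode` class, the node names, and the item id are all made up for illustration, and a plain method call stands in for the real multicast packet. The point is that the answer comes from the nodes themselves, not from a database lookup.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical stand-in for one storage node and the items on its disks."""
    name: str
    items: set[str] = field(default_factory=set)
    online: bool = True

    def answer(self, item_id: str) -> bool:
        # A real node would reply to a multicast packet; here it is a method call.
        return self.online and item_id in self.items


def locate(nodes: list[StorageNode], item_id: str) -> list[str]:
    """Ask every node and collect the names of those that say they hold the item."""
    return [node.name for node in nodes if node.answer(item_id)]


# Hypothetical cluster: the same item lives on a primary and a secondary node.
nodes = [
    StorageNode("node-01", {"example_item"}),
    StorageNode("node-02", {"example_item"}),
    StorageNode("node-03", {"some_other_item"}),
]

print(locate(nodes, "example_item"))  # ['node-01', 'node-02']
```

If `node-01` is marked offline, `locate` still returns `node-02`, which is the same property that lets one machine be pulled for repair while the content stays online.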

What I find interesting about the current infrastructure:
  • RAID is not used. Items are backed up on to a secondary machine when added to the archive.
  • This is mostly due to “RAID is hard to get right” and cost
  • This means there are two machines (and two apaches) ready to serve the same content.
  • One machine can be taken down for repair while the content is still online.
  • I would like to see either RAID or maybe RAID-Z used here

An idea on how to re-architect things using git as a storage backend to provide versioning and transactions
  • git is the version control system used for the linux kernel.
  • git is a totally new way to operate on data. Read this if you are a non-believer.
  • We could keep the infrastructure mostly the same as IA, but store items as git repositories. This would not be a large architecture change.
  • git would become a supported access protocol, in addition to http, ftp, and rsync. Backups could be simply a git pull. We could git clone the entire cluster.
  • We would get versioning!

Changes needed to repo.git to make it useful in an archive cluster:
  • Change reguser.cgi to tie into the existing user database (talk to dbserver)
  • Change regprog.cgi to work in a cluster environment. Repositories are inited in /{0-4}/items/id/id.git on a primary node (talk to catalog/homenode)
  • Use post-commit hook to queue backup and derive tasks (talk to catalog)
  • Change gitweb to show custom view of movie, audio, texts (book), and photo collections. Software collections would show standard gitweb view.
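
Here is a rough sketch of what the repository layout could look like in code. It only builds paths and git command lines and does not run anything. I am assuming that `{0-4}` in the path above means a disk number from 0 to 4; the function names, the item id, and the host name are hypothetical.

```python
from pathlib import PurePosixPath

DISK_COUNT = 5  # assumption: "{0-4}" in the layout means disks numbered 0 through 4


def item_repo_path(item_id: str, disk: int) -> PurePosixPath:
    """Return /{disk}/items/{id}/{id}.git for an item stored on the given disk."""
    if not 0 <= disk < DISK_COUNT:
        raise ValueError(f"disk must be in 0..{DISK_COUNT - 1}, got {disk}")
    return PurePosixPath("/", str(disk), "items", item_id, f"{item_id}.git")


def init_command(item_id: str, disk: int) -> list[str]:
    """argv for creating an empty bare repository for a new item."""
    return ["git", "init", "--bare", str(item_repo_path(item_id, disk))]


def backup_command(primary_host: str, item_id: str, disk: int) -> list[str]:
    """argv for mirroring an item from its primary node to the same path locally."""
    path = str(item_repo_path(item_id, disk))
    return ["git", "clone", "--mirror", f"ssh://{primary_host}{path}", path]


# Hypothetical item id and host name, printed rather than executed.
print(" ".join(init_command("example_item", 2)))
print(" ".join(backup_command("primary.example.org", "example_item", 2)))
```

One thing to watch: if the item repositories are bare and changes arrive by push, git runs the post-receive hook on the server rather than post-commit, so the queueing step would probably hang off post-receive in a layout like this.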

I don’t think this would take too long to implement, but I’m lacking co-conspirators these days.. Maybe when shag makes it to SF we will have to knock something out :)

Digitizing old books with large foldouts

This panorama is one of the first images from our test of digitizing books with foldouts. It is from this book. The full-size image is here.

These foldouts are hard to image.. This picture shows how it was placed under the camera.

Dr. Alexander Shulgin’s First Lab Notebook

Dr. Alexander Shulgin’s first laboratory notebook has been scanned and put online.
p.jpg
If you want higher resolution images, check out the JPEG 2000 files here.

How to tunnel VNC over SSH

Today I had to use VNC to debug a remote machine, but firewalls were blocking VNC ports.

After I failed to get my VNC client (Chicken of the VNC) to use a SOCKS proxy, I was able to use SSH port forwarding to get it working. On your local machine type:

  ssh -L 5900:localhost:5900 user@remotehost

This forwards port 5900 on localhost to port 5900 on the remotehost. Then in Chicken of the VNC, open a new connection to localhost. That’s it! EEZ!
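
Before opening the VNC client, it can help to confirm that something is actually listening on the forwarded local port. This is a small sketch using only the Python standard library; the helper name `port_is_open` is mine, and 5900 is just the port used in this post.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host could not be resolved.
        return False


if __name__ == "__main__":
    if port_is_open("localhost", 5900):
        print("Port 5900 is accepting connections; the tunnel looks up.")
    else:
        print("Nothing is listening on port 5900; is the ssh session still running?")
```

Note that this only checks the local end of the tunnel. ssh accepts the local connection even if nothing is listening on port 5900 on the remote machine, so a True result does not prove the VNC server itself is running.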

It turned out that Xorg was eating all available memory and invoking the oom killer. Sigh.

I guess I could have figured this out without VNC, but I couldn’t reproduce the bug locally, so I watched as a remote user was working on the machine.

Real-World Average Shutter Life for Canon 5D and 1Ds mkII

We take more than 5 million pictures every month using a pool of 250+ Canon 5D and Canon 1Ds mkII cameras. Recently, Jon was contemplating buying a 5D and wanted to know how long the shutter life was. Hey, we can answer that using real-world numbers!

The Canon 5D is rated for 50K+ shots, but ours last much longer, failing after an average of 150K shots.

The Canon 1Ds mkII is rated for 200K+ shots, but actually lasts for 750K shots before shutter failure!

Someone else might find these numbers useful.
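
For anyone who wants to play with the numbers, here is the arithmetic as a short script. The shot counts are the ones quoted above; the assumptions are mine: I treat the pool as exactly 250 cameras and pretend the 5 million monthly shots are spread evenly across them, which is certainly a simplification.

```python
SHOTS_PER_MONTH = 5_000_000  # "more than 5 million pictures every month"
CAMERAS_IN_POOL = 250        # assumption: exactly 250, the post says 250+

# Assumption: the load is spread evenly across the pool.
shots_per_camera_per_month = SHOTS_PER_MONTH / CAMERAS_IN_POOL

# (rated shots, observed average shots before shutter failure), as quoted in the post
cameras = {
    "Canon 5D": (50_000, 150_000),
    "Canon 1Ds mkII": (200_000, 750_000),
}


def shutter_summary(rated: int, observed: int) -> dict[str, float]:
    """Compare observed shutter life with the rating and convert it to months of use."""
    return {
        "times_rating": observed / rated,
        "months_in_pool": observed / shots_per_camera_per_month,
    }


for name, (rated, observed) in cameras.items():
    s = shutter_summary(rated, observed)
    print(f"{name}: {s['times_rating']:.2f}x rating, "
          f"about {s['months_in_pool']:.1f} months at pool usage")
```

Under those assumptions each camera takes about 20,000 shots a month, so a 5D shutter lasts roughly 7.5 months and a 1Ds mkII shutter roughly 37.5 months.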

theinfo.org: for people with large data sets

Aaron Swartz has launched theinfo.org, a wiki for people who crawl and analyze large datasets.

This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It’s a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.

Creative Commons launches new CC Zero License

May, Shag, and I went to the Creative Commons 5-Year Birthday Party and got to hear Lawrence Lessig announce a new CC License, CC Zero. Licensing a work under CC0 is similar to placing it in the Public Domain, but CC0 is meant to work better internationally. Did you know Germany (and maybe other EU countries) doesn’t allow authors to dedicate their own works to the public domain? I’m glad smart people are working on this problem!

The CC 5-Year party was cool, but the sound system was turned down so low that it was hard to hear Lessig and Gilberto Gil. Fortunately, DJ Spooky turned up the sound for his set, but that caused others to complain that they could no longer talk over the music.

Here’s the press release and wiki page for CC0 (where I got the CC0 image above). The tool to generate the CC0 machine-readable license should be available on Jan 15.

Lawrence Lessig’s blog has a list of the amazing number of announcements at the CC 5-Year party.

Behind the scenes at the Internet Archive

When you are building a digital library to provide Universal Access to Human Knowledge, how do you hold all the data?

You start with a few racks of machines to hold the data using redundant storage:

The red boxes are built by Capricorn. Each one is a 1U half-depth low-power server that can hold four 1TB hard drives:

Add a bunch of homemade routers:

And some BigIron: (this thing pushed 6Gb/s today!)

Now you need to power it up:
IMG_3715.JPG

And cool it down:
IMG_3705.JPG

And fill it with books:
IMG_3714.JPG

For some reason, you need a 1980’s-era Connection Machine:
IMG_3712.JPG

Finally, no Archive is complete without a world-class Linux kernel hacker:
IMG_3703.JPG

IMG_3711.JPG

Creative Commons Radio from archive.org

Shag and I are testing a netradio station, streaming CC ShareAlike-licensed tracks from archive.org. Give these a listen and let us know what you think!

Ambient Drone Electronic Folk Indie IDM Pop


Something for you to listen to: CC-licensed ShareAlike Albums

Here are roughly 2000 albums, all licensed under a Creative Commons ShareAlike license. Tons of awesome stuff for you to listen to, all tagged by genre, artist, and label!

The code used to make the list of albums can be found at the NetLabelShareAlike wiki page.

What Will Libraries Look Like in the Future?

For the Open Content Alliance meeting two weeks ago, the conference room at the Internet Archive HQ was transformed into a prototype library that will soon be open to the public. Here are some pictures of what Brewster calls the Open Library.

When you enter, you are greeted with a sign that explains the library:

This is a prototype library of the future that has access to millions of books, videos, and audio items from thousands of libraries worldwide. This library fits into a small room but still can house music, videos, one of a kind or popular books, and a librarian. It has download capabilities for patrons with music players, e-books, audio books and storage devices, and a Print on Demand machine that can print and bind a book in ten minutes.

The purpose of the open library is to provide universal access to all published knowledge. By using digitizing equipment, computer storage, and the Internet, we can realize the dream of the Library of Alexandria.

IMG_1688.JPG

When you walk in, the first thing that grabs your attention is the Espresso Book Machine, which can print and bind a book in about ten minutes.

The EBM completely changes the physical structure of the library. Using the public access terminal in the library or your own laptop, you can order one of the 200,000+ books from the Internet Archive book collection. It takes about five minutes of preparation and another five minutes of printing, and then a perfect-bound book shoots out of the machine. Here is some video of the EBM in action.

Even though this prototype library is physically quite small, it has a collection larger than 80% of the libraries in the US. The Internet Archive book collection is growing at a rapid pace (15,000 books a month and rising). Soon, this might be the largest library in the world, and you will be able to put one in every town!

IMG_1699.JPG

In the two pictures above, you can see the ingredients of the Library of the Future:

  • Librarian’s Desk
  • Ten Minute Press
  • A public internet terminal, for ordering books from other libraries, printing books out, and filling up your iPod/ebook reader.
  • One-of-a-Kind Books, including:
  • E-Book Readers, in this case, the OLPC
  • Banned Books
  • Foreign-language books
  • Local-interest and technology books
  • 78 rpm records, and other non-book material
  • A comfy chair

What do you think? Anything we should add to the prototype Open Library?

Liveblogging an Ubuntu 7.10 installation

Photo_102.jpg

Bob, Shag and I are trying to move our book scanning hardware to Ubuntu 7.10 - the Gutsy Gibbon. It’s a ridiculous process, and our hardware is crap. Here are some notes:

  • chai:20 (4:20) - Started up the installer app on the live cd. Unfortunately the screen rez is 800×600, so we can’t see the important back/next/ok buttons on the bottom of the installer panel. What kind of installer requires greater than 800×600 screen rez?
  • chai:23 - Somehow, by logging the Live CD user out and fucking with the screen rez, we got the screen to display a larger screen res, but we can’t see the entire desktop on our screen. Moving the mouse around seems to pan the desktop, which would kinda work, if we could see the mouse cursor.
  • chai:25 - We are asked for the timezone, and San Francisco isn’t one of the available options. Los Angeles is. However, we opt to move to La Paz.
  • chai:30 - It is now officially time for chai.
  • chai:40 - We have found that starting a lot of xeyes processes lets us estimate where the invisible mouse cursor should be. There are fifty eyeballs on our screen
  • chai:45 - Bob starts playing minesweeper
  • chai:48 - Someone figures out that this version of xeyes lets us resize the window, so there is a GIANT EYEBALL staring at me
  • chai:50 - Installation done, rebooting!
  • Mouse works after reboot! Now to try and scan books!

Photo_101.jpg

Photo_10.jpg

Pics from the Prelinger Library

We went to the Illuminated Corridor event, Prelinger on Prelinger, at the Prelinger Library last night. Lots of video art! Some pics:

linky to pics on flickr

The Copyright Database has been set Free

Rick Prelinger of the Internet Archive, along with university libraries and other public interest groups, asked the Register of Copyrights to free the copyright cataloging database, which sells for $86,625.

Although the Copyright Office has decided to continue charging for the database, the fine folk at public.resource.org have set the copyright database free!

Archivists and researchers will be happy tonight! Download away!

Video of the Espresso Book Machine printing a book!

This is the first time I got the Espresso Book Machine to print and bind a book without human intervention! I happened to capture a video of Flatland being printed. Very cool!



Video and Pics of the Espresso Book Machine

Here is a short video of a test run of the Open Content Alliance’s Espresso Book Machine, an automatic print-on-demand robot that makes perfect-bound paperback books. The Espresso Book Machine was created by On Demand Books.

This video was shot during configuration of the machine, so you can see the printing/binding process, but the book gets stuck and comes out mangled.. I’ll upload another video after the machine is set up..



IMG_0994.JPG

IMG_0995.JPG

IMG_0998.JPG

Backup genny? We don’t need no backup genny!

It seems like half the net just got knocked out by six back-to-back power outages in downtown San Francisco. A bunch of great sites went down: archive.org, craigslist, LJ, yelp. Did Slide go down, too?

A bunch of our racks are still powered down…

Announcing the Open Library!


What if there was a library which held every book? Not every book on sale, or every important book, or even every book in English, but simply every book—our planet’s cultural legacy.

First, the library must be on the Internet. No physical space could be as big or as universally accessible as a public web site. The site would be like Wikipedia—a public resource that anyone in any country could access and that others could rework into different formats.

Second, it must be grandly comprehensive. It would take catalog entries from every library and publisher and random Internet user who is willing to donate them. It would link to places where each book could be bought, borrowed, or downloaded. It would collect reviews and references and discussions and every other piece of data about the book it could get its hands on.

But most importantly, such a library must be fully open. Not simply “free to the people,” as the grand banner across the Carnegie Library of Pittsburgh proclaims, but a product of the people: letting them create and curate its catalog, contribute to its content, participate in its governance, and have full, free access to its data. In an era where library data and Internet databases are being run by money-seeking companies behind closed doors, it’s more important than ever to be open.

So let us do just that: let us build the Open Library.

From Aaron Swartz’s blog:

I thought of the smartest programmers and designers I knew and gave them a ring, sat down for coffee with them, threatened to fly out to their homes and knock on their doors. In the end, we got together an amazing group of people — all sworn to secrecy of course — and in the past few months we’ve put together what’s probably the biggest project I ever worked on.

So today I’m extraordinarily proud to announce the Open Library project. Our goal is to build the world’s greatest library, then put it up on the Internet free for all to use and edit. Books are the place you go when you have something you want to share with the world — our planet’s cultural legacy. And never has there been a bigger attempt to bring them all together.

Congrats Aaron and team!

First Edition Principia Discordia Recovered from JFK Assassination Archive

This is highly weird. In April 2006, a First Edition copy of the Principia Discordia was recovered from the John F. Kennedy Archives (see routing slip). Here is a bit of detail on how it was found:

I stumbled upon knowledge of the Dead SeePresident Scrolls purely by chance - a reference number on a scan of a copy of something I did not believe I was looking at: so much so that I passed over the title page of the first edition of the Principia Discordia (How The West Was Lost) many times before it dawned on me what it was before my eyes.

On that sheet was an Accession Number. And that number pointed to a secret which has lain hidden for over 30 years, trapped unseen in a musty, dusty vault in Maryland.

As luck would have it, the Rev. Karl Musser happened to be in the neighbourhood of that very vault, and willing to do me a favour, All Blessings Unto Him.

But how did these papers end up in the Assassination Archive in the first place?

In the late sixties, founding Discordian Kerry Thornley, who had been in the Marines with Oswald, found himself under the microscope of those investigating the Assassination of John F. Kennedy. Such Official Investigations generate a Paper Trail - evidence proffered is indexed and stored… preserved against the erosion of time. (Well, mostly…)

How the Pentagon Papers Came to be Published by the Beacon Press: A Remarkable Story Told by Whistleblower Daniel Ellsberg, Dem Presidential Candidate Mike Gravel and Unitarian Leader Robert West

You should watch this episode of Democracy Now.

Thirty-five years ago this weekend, Beacon Press lost a Supreme Court case brought against it by the US government for publishing the first full edition of the Pentagon Papers. It is now well known how the New York Times first published excerpts of the top-secret documents in June 1971. But less well known is how the Beacon Press - a small, nonprofit publisher affiliated with the Unitarian Universalist Association - came to publish the complete 7,000 pages that exposed the true history of U.S. involvement in Vietnam. Their publication led the Press into a spiral of two and a half years of harassment, intimidation, near-bankruptcy, and the possibility of criminal prosecution.

Today, we hear the story from three men at the center of the storm: Former Pentagon and RAND Corporation analyst, famed whistleblower Daniel Ellsberg who leaked the Pentagon Papers to the New York Times. Mike Gravel - the former Alaska Senator who is now a Democratic Presidential candidate - who tells the dramatic story of how he entered the Pentagon Papers into the Congressional record and got them to the Beacon Press. And Robert West, the former president of the Unitarian Universalist Association which owned the Press and agreed to risk publication of the Pentagon Papers. [includes rush transcript]

This is a story that has rarely been told in its entirety. Last weekend I moderated an event at the Unitarian Universalist conference in Portland, Oregon commemorating the publication of the Pentagon Papers and its relevance today.

(via Tracey)

Lessig To Shift Efforts From Free Culture To Fighting Corruption

For the last 10 years, Lawrence Lessig has been at the forefront of the Free Culture movement. At the iCommons summit, Lessig announced that he will stop working on Free Culture issues, and shift his work to fighting corruption:

I don’t want to be a part of that business. And more importantly, I don’t want this kind of business to be a part of public policy making. We’ve all been whining about the “corruption” of government forever. We all should be whining about the corruption of professions too. But rather than whining, I want to work on this problem that I’ve come to believe is the most important problem in making government work.

Best of luck, Professor Lessig! You make the world a better place, and we are all thankful! (via brewster)
