Friday, August 12, 2011

Using RESTful APIs

Please note, this article is intended as an introduction to RESTful APIs and has been simplified for those who are new to programming. Comments, corrections, and feedback are not only appreciated but greatly encouraged, as I'm sure experts will find ways to improve this general overview. Happy hacking!


Problems with Non-RESTful APIs
APIs (application programming interfaces, though hardly anyone calls them that anymore; they're just 'APIs') are compact ways to use other people's code to extend the capabilities of a computer program. Traditionally, to use an API, one would install and then include/import the required software modules/packages within one's code and then interact with this third-party software via the programming interface defined by the package's author. Of course, this process requires (1) installing and importing their code and (2) using a programming language compatible with their libraries.


The RESTful Solution
RESTful APIs help solve this problem. Instead of downloading, installing, and including/importing someone else's package within your program (and hoping their software is supported by your language), you can leverage HTTP as well as the properties of REST to achieve a more elegant solution.

The Big Picture
What if, instead of importing other people's libraries, these libraries were available and responsive as web services? And what if your program could communicate with these web services using HTTP methods and have them perform the requested computations and tasks for you? This is the underlying notion behind RESTful web APIs.

A good example would be, instead of buying a mill saw, some blades, a nice workbench, cutting wood in your house, and cleaning up the mess, you can instead send the wood off to a service which will just return cleanly cut pieces of wood exactly to your specifications.

If you read on, you will learn more about HTTP and REST, as well as see examples demonstrating how your program can make HTTP requests to use a RESTful web API.


History and Intro
There's been a lot of buzz recently about RESTful APIs, but what does it all mean? The term REST, or Representational State Transfer, is not new: it dates back to 2000, when Roy Fielding, one of the authors of the HTTP 1.0 and 1.1 specs, introduced it in his PhD dissertation. While the idea of REST can be applied to many application layer protocols, it was designed in parallel with HTTP, and thus it makes sense to understand how HTTP works.


Understanding HTTP
The Hypertext Transfer Protocol (HTTP) is the underlying application layer protocol used for the World Wide Web. HTTP defines several request methods over which communication can occur between participants: HEAD, GET, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT, and PATCH. Every time you type a URL into your browser, the host name in that URL is resolved to the network address of a computer which will serve your request (or pass your request along to the correct server). When your request eventually reaches the correct destination server, a response will be generated and returned to you containing a response code and any resources which were requested. This response code tells you the outcome of your request (for example, 200 for success or 404 when the resource could not be found). These notions of responses and state are important to understanding the concept of REST (representational state transfer).
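
To make the idea of a response code a little more concrete, here is a small sketch that looks up the standard reason phrase for a few common codes. It uses Python 3's standard library (the http module), which is newer than the Python 2 used in the rest of this article, and it does not contact any server.

```python
from http import HTTPStatus

# A few response codes you are likely to run into.
for code in (200, 301, 404, 500):
    status = HTTPStatus(code)
    # .phrase is the short human-readable text that accompanies the code
    print(code, status.phrase)
```

Roughly speaking, codes in the 200s mean success, 300s mean redirection, 400s mean the client made a bad request, and 500s mean the server failed.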

Understanding REST
The best resource for understanding REST is Roy Fielding's PhD dissertation, Architectural Styles and the Design of Network-based Software Architectures. However, there are several blogs and articles which summarize Fielding's research and make the information more accessible. Peter Laird makes an important clarification that REST "is not a technology, a standard, or a product. REST is an architectural pattern that describes the underlying architecture of the World Wide Web and how it came to be such a massively scalable computer application" [Laird OnDemand].

Laird provides a nice summary:
In essence REST describes an architecture in which:
  • Application resources (objects, in the OO world) are exposed as URIs
  • HTTP requests are used to retrieve and update data on the server
  • The HTTP requests utilize the standard HTTP verbs (GET, POST, PUT, DELETE) to define the API operations, helping client developers by providing a consistent interaction model
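
As a rough sketch of that interaction model, the snippet below builds (but never sends) one request per verb against a made-up resource. The URL http://example.com/api/products and the product id 42 are placeholders of mine, not a real service, and the code uses Python 3's urllib.request rather than the Python 2 modules used later in this article.

```python
from urllib.request import Request

base = "http://example.com/api/products"   # hypothetical resource collection

operations = [
    Request(base, method="GET"),                                        # list products
    Request(base, data=b'{"name": "new"}', method="POST"),              # create a product
    Request(base + "/42", data=b'{"name": "renamed"}', method="PUT"),   # replace product 42
    Request(base + "/42", method="DELETE"),                             # remove product 42
]

# Nothing is sent over the network; we only inspect what would be sent.
for request in operations:
    print(request.get_method(), request.full_url)
```

The point is that the URI names the resource and the verb names the operation, so a client can guess how to use an unfamiliar API.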
Since I am not a networking expert, I will defer to the following three resources (in reading order):

1. tomayko (the basics)
2. technoracle (lite overview)
3. Laird OnDemand
General Wikipedia Supplemental
REST-style architectures consist of clients and servers. Clients initiate requests to servers; servers process requests and return appropriate responses. Requests and responses are built around the transfer of representations of resources. A resource can be essentially any coherent and meaningful concept that may be addressed. A representation of a resource is typically a document that captures the current or intended state of a resource.
The client begins sending requests when it is ready to make the transition to a new state. While one or more requests are outstanding, the client is considered to be in transition. The representation of each application state contains links that may be used next time the client chooses to initiate a new state transition.

Submitting a GET Request over HTTP
Sounds nice, but how does it work? In this tutorial, I'll show you how anyone can use a RESTful API without installing any software packages (assuming you already have Python installed, along with its standard urllib/urllib2 modules).

When you type a search on Google, you are actually submitting an HTML form. A form is a special type of HTML tag which specifies how information should be passed from one computer (or server) to another when an action such as 'submit' occurs. Forms generally have two attributes, a method and an action. Your web browser examines the HTML form's attributes and uses them to determine how it should pass data to the recipient.

An HTML form may look like this:

Code:
<html>
    <head>
        <title>Mek's form example</title>
    </head>
    <body>
        <form method="GET" action="some-url">
            <input type="text" name="username" />
            <input type="password" name="password" />
            <input type="submit" name="submit" />
        </form>
    </body>
</html>
First, before anyone jumps at my throat, yes, I am aware the method of this form is GET and that a 'password' field is being used. Whenever you are constructing a form, you have several options regarding what 'method' to use when transmitting data from location A to B. Each has its own advantages and disadvantages. I will explain this in greater detail later; however, in nearly all cases, if you are transmitting sensitive data like credit card information, passwords, or private keys, you will NOT want to use GET as your HTTP method (and you will also want to use HTTPS to encrypt your request). I will explain why this is the case using the code above.

First, I'd like to explain what the code above does and what it means (and then I'll mention the difference between two of the methods of the HTTP protocol, GET and POST).

In the code above, we create a form, just like any HTML tag, but we specify two attributes, a method and an action. The method determines how we send data from A to B, as I said before. The action determines who the recipient of the information is.

In order to understand the difference between GET and POST, let's pretend the 'action' field in our code above was Google, and not 'some-url'. I'll describe what's actually happening when a user (like you or me) visits the web page containing the code above, up to the point where the form is submitted and information is returned.

1. A user goes to our website (with the code above) and enters in a username and password within the input fields.

2. The user presses the submit button

3. The user's web browser examines the form's attributes and determines how the information should be sent to the recipient (according to the method)

3. a) If the method is GET, the browser understands you would like to GET a new webpage as a response to your request. As we know, the way to get a new webpage (or the way we request a webpage) is by asking for a URL.

The browser constructs a URL from our data which looks like this:

Code:
something.com/something?username=xxx&password=1234

Anyways. You'll notice there is a question mark in the url. This question mark denotes the start of a query. A query is how special parameters are specified within a URL. They are used to tell the server you want specific pieces of information. In this case, we are passing the server two parameters through our URL (and we're asking to GET a webpage which fits that criteria). For a google search, when you type a query in the textbox, maybe we're searching for 'python' and press submit, your browser turns your request into something like:

Code:
google.com/search?q=python
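
If you'd like to see how such a URL is assembled, here is a minimal sketch using Python 3's standard library (in the Python 2 used elsewhere in this article, the same function lives at urllib.urlencode). The host example.com and the field values are placeholders, not a real service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical form fields, mirroring the form shown earlier.
fields = {"username": "xxx", "password": "1234"}

# urlencode joins key=value pairs with '&' and escapes unsafe characters.
query = urlencode(fields)
url = "http://example.com/something?" + query
print(url)

# A server does the reverse: take everything after the '?' and parse it.
parsed = parse_qs(urlsplit(url).query)
print(parsed["username"][0])
```

Note that parse_qs returns a list for every key, because a query may legally repeat the same parameter name.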
Why wouldn't you want to have a password sent to a server via GET? Notice that with GET, all the information transmitted is sent visibly through the URL, and you wouldn't want others to see your password. In what cases is it preferable to use GET? And what alternative is there to GET? Good questions!

GET is useful because it allows us to share links which contain very specific information. For instance, not only can I share a link to Google, but I can share a link containing the exact query I used to get a Google result! So GET actually is extremely useful. It's how we GET every web page we want! But when it comes to passing sensitive information, GET is not the best solution because the data is passed visibly and the amount of information you can reliably send within a URL is limited in practice.
RFC 2068 states:
    Servers should be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations may not properly support these lengths.

Therefore, we can use the POST method to send information to a server without it being directly visible in our URL. Instead of telling the destination server we want to GET a resource back, we are telling it we want to POST information to them. Note, while POST solves the problem of sending information without displaying it in the URL, the use of the POST method is not inherently secure. In order to perform secure POST requests, the request needs to be encrypted and sent using HTTPS.
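
Here is a small sketch of what that looks like from a program's point of view, again in Python 3. The login URL and credentials are made up for illustration, and the request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and credentials, for illustration only.
url = "https://example.com/login"
body = urlencode({"username": "xxx", "password": "1234"}).encode("ascii")

# Supplying a body via `data` makes urllib issue a POST instead of a GET.
request = Request(url, data=body)

print(request.get_method())   # the HTTP method urllib would use
print(request.full_url)       # the credentials do not appear in the URL
print(request.data)           # they travel in the request body instead
# Passing `request` to urllib.request.urlopen() would actually send it.
```

The body is still plain text on the wire unless the URL uses https, which is why the two go together.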

But wait, maybe you're thinking, "How come when I POST information like my username and password on Google, I GET a new page?" This is another good question. When you POST information to a server, the server will handle your POST request however it is designed to. A POST doesn't have to direct you to a new page; however, traditionally this is preferable, so the recipient server will redirect you to the appropriate page.

Now that we understand the basics of HTTP requests, we'll write a simple python script which simulates a GET request to a web service acting as a RESTful API. We will request that this web service returns a JSON resource and will use python to capture the server's response so we can use this returned data!

Once you learn how to do this, you will be able to apply this knowledge to use any RESTful API on the web! You will no longer be restricted by your programming language.

But there's one last thing I didn't discuss. Fine, we make a request to a server using their RESTful API, but what format is their returned resource in? Oftentimes you can request that the resource be sent in a specific format. Two of the more popular (and often default) formats are JSON and XML. In my opinion, dealing with XML is a pain in the arse, so I'm only going to discuss JSON.

In order to understand JSON (JavaScript Object Notation -- actually a lot easier than it sounds) we must first learn about Python dictionaries (associative arrays). Aw, but we were just getting to the good part! Let's see some Python! Fine, we can hold off on learning how and why JSON is awesome until we get to the part in our Python code where the server has actually sent us the JSON code as a response.

So, without further ado, here's a snippet of python code which sends a GET request (without any data) to my website Baybo and retrieves a list of all the products on the platform. Oh, if you'd rather get info about our users instead, just replace the word 'products' in the url below with 'users':

Code:
import urllib2

url = "http://www.baybo.it/api/products"

# Now we have our url, let's open it in python
response = urllib2.urlopen(url)

# Now let's read that response into a variable
html = response.read()

# if you're feeling lazy, you could do this all in
# one line and just type:
# html = urllib2.urlopen(url).read()

# Now the variable 'html' contains our 
# JSON response but it's a String! Not JSON!

# Oh well, guess we have to learn about JSON now!

We got a string representation of the JSON data we want from the server. Now we want to tell Python to parse this data as JSON so we can interact with it.

In the last step we did

Code:
html = urllib2.urlopen(url).read()
In order to take this data (still in string form) and parse it as JSON, we will need to import the 'json' module (it ships with Python 2.6 and later) and use the json.loads() function. This is what the entire program would look like:

Code:
import urllib2
import json

url = "http://www.baybo.it/api/products"
html = urllib2.urlopen(url).read()
data = json.loads(html)
The variable 'data' is now an actual data structure within Python that we can natively access. In this case, the data the server transmitted to us is an array (a python list) of dictionaries. Uh oh, now we really need to learn what dictionaries are, or we won't be able to use this data effectively!
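
If you want to try json.loads() without touching the network, here is a tiny sketch. The JSON string below is invented for illustration (it is not what the Baybo API actually returns), and the print() calls are written so the snippet runs on Python 3.

```python
import json

# A made-up response body shaped like a list of products.
raw = '[{"id": 1, "name": "Example Guide", "price": 1.99}, {"id": 2, "name": "Example Song", "price": 0.99}]'

data = json.loads(raw)

print(type(raw).__name__)      # the response starts life as a plain string
print(type(data).__name__)     # a JSON array becomes a Python list
print(type(data[0]).__name__)  # each JSON object becomes a dictionary
print(data[1]["name"])
```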

A dictionary is similar to a list/array in that it is a data structure for storing values. However, the way values are indexed is different from a list. In fact, the complexity analysis for the entire data structure is different! By this I mean, a dictionary (associative array) does not have the same complexity properties as a list: the computational cost (the number of steps) to perform insertion, deletion, and resizing differs from that of a list.

So, before we tackle the complexity properties of dictionaries, what the heck is a dictionary? Think of a dictionary like... Well, a dictionary! Or a phone book, if you prefer. A dictionary is a collection (unordered) of key-value pairs. In a dictionary (the books we're used to) a word is used to reference a definition. If you want to know the definition for a word, you use the word as the key, find this word, and knowing the word gives you access to the definition. Same goes for a phone book. You use someone's name as the key, and with it can find someone's number (the value).

A dictionary in python looks like this:

Code:
mydict = { "key": 3,
           "mek": 13}
Values are accessed from a dictionary in a similar way as lists; however, the index is not a position (that is, not necessarily an integer), but a key. Here's an example of how I would access the value 13 in the above dictionary called 'mydict' using the key "mek", which is a string:

Code:
# This returns the value 13
mydict["mek"]
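
Beyond simple lookups, here is a short sketch of the other everyday dictionary operations, using the same 'mydict' (the extra keys "new" and "missing" are just for illustration). It is written with print() so it runs on Python 3.

```python
mydict = {"key": 3, "mek": 13}

# Look up a value by key.
print(mydict["mek"])

# Insert a new pair, or overwrite an existing one, by assignment.
mydict["new"] = 42

# Test for membership before indexing; indexing a missing key raises KeyError.
print("missing" in mydict)

# .get() returns a default instead of raising when the key is absent.
print(mydict.get("missing", 0))

# Remove a pair by key.
del mydict["key"]
print(sorted(mydict))
```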
Ok! Dictionaries have very interesting properties with respect to complexity analysis... But I suspect you folks would rather continue the example right now and do something useful with your data / see a full example program. If you're interested in learning more about the data structures, let me know! This is useful information which can be applied to nearly any programming language, and knowing it is useful for CS job interviews and school.

Anyways. Where did we leave off in our program? Ah yes, we just parsed our JSON response, and Python automatically loaded it into a native Python data structure, a list of dictionaries, which we can use!

Code:
# data is a list of dictionaries
data = json.loads(html)

# The first element of the list is a dictionary.
# This will print out the dictionary so we can see
# what keys and values it has.
data[0]

# This tells us the keys for the dictionary are:
# "content", "name", "created", "slug", "price", "modified",
# "users_id", "currency", "avatar", "id".

# What does this mean? For every product on Baybo,
# We can get all this information! If we wanted,
# we could make our own website and advertise / affiliate
# products with a tiny script.

# Q: How would we access the product id of the 1st product?
# A: get the 0th product in the list and
#     get the value where the key is "id"
data[0]["id"]

# We could write a loop to iterate over every product
# and do something useful:

# This will print out the name of every product.
# Each 'product' is already a dictionary, so we index it directly.
for product in data:
    print product["name"]
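
The code above targets Python 2, where urllib2 lives. For readers on Python 3, here is a rough equivalent of the whole program. The function name fetch_json and the 'opener' parameter are my own inventions for this sketch; the fake opener stands in for a live server (with a made-up response) so the example runs without a network connection.

```python
import io
import json
from urllib.request import urlopen

def fetch_json(url, opener=urlopen):
    """Open `url`, read the response body, and parse it as JSON."""
    response = opener(url)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()

def fake_opener(url):
    # Stand-in for a live server: returns a canned (made-up) JSON body.
    return io.BytesIO(b'[{"id": 7, "name": "Sample Product"}]')

products = fetch_json("http://example.com/api/products", opener=fake_opener)
for product in products:
    print(product["id"], product["name"])
```

Drop the opener argument and fetch_json() will make a real HTTP GET request to whatever URL you give it.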

Here are some examples from the Python Documentation which explain httplib and urllib2 in greater depth. This resource also explains how you would accomplish such a request via POST, as opposed to GET.

Following these steps, you can use any RESTful API out there. I just used my personal RESTful API as an example, that way if you folks have any questions, I can help you solve your problems.

Until next time, good luck and happy hacking!

Wednesday, July 6, 2011

Get Better at Python - Tips and Tricks

Interested in learning some of the lesser-known features of Python? Check out "The Hacker's Guide to Python on Ubuntu" on http://baybo.it/shop/p/502

Excerpt:
The Hacker's Guide to Python on Ubuntu. Anyone can go online and read the Python API. This guide is not intended to teach you Python but rather to share the lesser known secrets about the Python programming language, help you better master the language, and identify the strengths and weaknesses of the language. Learn elegant solutions to common python problems, all in one convenient place and with clear explanations.
 
Table of Contents:
1. Introduction
    a. Target Audience
    b. Definitions and Overview
    c. Tools and Setup
2. Python Built-ins
    a. Uncommon Operators
    b. Random generators
    c. dir(), __doc__, and other helpful builtins
3. “Strings”
    a. String Formatting Fun with Dictionaries
4. {Sets} and {“Python” : “Dictionaries”}
5. [“Python”, “Lists”]
6. Web Frameworks and Webpy
    a. What is a web framework?
    b. Web framework comparison
    c. Webpy
7. Extras, Fun, and Easter Eggs
    a. Fun hackery, jokes, and Easter eggs
    b. Supplemental Resources
8. Coming Soon: (Sample)
    a. Code for Writing a 2D Game - Tutorial In progress (view blog)
    b. Good Programming (and helpful/good practices)
    c. Decorators, and more
9. Supplemental Resources and Modules for Any Occasion

Selected Examples:
# Using stepping by fives
>>> "Pass your text here oh funnies!"[::5]
'Python!'


#Generate a List Containing Letters of the Alphabet
>>> import string
>>> list(string.uppercase)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

#Flooring Operator
>>> 3.2 // 1.3
2.0


Feel free to direct all suggestions, questions, or requests to mek@babolabs.com

Thanks for your interest,
- Mek



Saturday, June 4, 2011

Effective Text Processing - VIM & Emacs v. IDEs

Howdy hackers,

Do you lead a life of extensive text processing? Maybe you're a programmer and it's your job. Or maybe you write for a newspaper and currently use Microsoft Word for all of your editing (please don't... There's a better way...).

Programmers
Let me guess, if you're not a VIM or Emacs-er already, you use a fancy IDE to write code because it provides syntax highlighting, offers "magical code completion", and gives you "a unified environment" which automatically compiles your code. Chances are you've gotten into an argument (or several) with a VIM or Emacs 'elitist' and spent more time defending your choice of tools than learning about their advantages or disadvantages -- and I'm sure the emacs/vim users were just as guilty.

My Philosophy
I am not in the business of converting people from one tool to another. From personal experience, I know many people tend to become comfortable with tools with which they are familiar. I'm writing this essay because I spend a lot of time typing in front of a computer. I'd like to share my experience of what's worked well for me.

I've programmed in a variety of environments including NetBeans, Eclipse, Bloodshed's Dev-C++, emacs, vim, pico, nano, ed, notepad, etc., and find that different tools have different advantages. As a lisp programmer who spends most of his time on the CLI (command line interface), I count emacs as my favourite editor to date (though elisp's lack of tail recursion and TCO is disappointing).

That said, I must admit I feel a bit guilty and hypocritical about writing this post. For years my hubris clouded my judgment, preventing me from acknowledging the advantages of VI. Despite the steep learning curve, I find myself preferring VIM's macro system, and there really isn't much emacs can do that vim can't (and vice versa). I don't want to make this a VIM versus emacs religious battle, but I will say that emacs' abundance of modes, interactive and supported environments (like SLIME, python-shell, and sql-mysql, not like I'm in love with sql), bash-friendly key-bindings, lower learning curve, indentation, etc., make it a better fit for my lifestyle... Also, I am a heavy org-mode user, something I don't have with vim. However, if you see a talented VIM hacker magicking, it's difficult not to concede that VIM may indeed have the upper hand for text processing.

But what about IDEs? Why the heck would someone use a command line tool (or even X emacs) when perfectly good IDEs are available for specific languages? Before I answer, I invite you to read the hacker community's thoughts in the discussion "Why do some programmers hate IDEs or think programmers that use IDEs are bad programmers".

Why VIM or Emacs over an IDE?

In case you're not comfortable trusting my experience, see for yourself why Google recommends solutions like VIM.
  • Macros (available in vim and emacs)
Imagine being able to perform a sequence or pattern of operations over an entire document. Sure, you can perform search and replace on text, but what about deleting the third word in every paragraph? What about turning a comma separated value file into SQL queries? Emacs and VIM allow you to start recording a series of commands and keystrokes as a key-binding and play back the 'macro'. http://www.youtube.com/watch?v=D7WL5Hv_Cas

If you just recorded a macro in emacs via C-x ( to start and C-x ) to end, you can use C-u 5 C-x e to execute it 5 times in a row. (In vim you would use 'q' to start recording a macro, followed by the letter to which you wish the completed macro to be bound. Then, type the operations you wish to be invoked by the macro, type q to stop recording, and execute by typing: <# times to execute -- default 1> @ <key to which macro is bound>.)
  • Repeat Commands 
 This is a really important feature for me. With simple commands like C-u 30 C-k (emacs) you can cut 30 lines from a document.
  • Terminal Programs & Code, side-by-side
 If you program in Python (perhaps you use Windows and IDLE), there aren't many great tools for writing and testing your code side by side. Emacs' python-shell will let you run code in an interactive REPL in one buffer, and type code in the other (so you don't have to re-write your code when you close the shell). Also, it's nice to manage IRC, a terminal, your database session, a game of Tetris, chat with your doctor Eliza, write elisp, and organize yourself with org-mode, all within one program (text editor... or operating system? Hmm...)
  • Interactive REPLs (read eval print loop) 
 Some IDEs support multiple languages, but they tend to support a primary language (NetBeans - Java, Visual Studio - .NET languages, etc). Emacs supports a variety of modes (including python, perl, c[++], haskell, php, common lisp, scheme, elisp, and so on). In fact, most of these languages have built-in REPLs (interactive shells - read eval print loop) to let you test code without ever leaving your programming environment.
  • Superior Navigation and Context Switching
This is kind of a silly point, but I rarely use a mouse. I'd rather switch contexts / buffers / screens with key-bindings. Also, in this regard, I'd highly recommend looking into a tile window manager like awesome or stumpwm, as well as terminal multiplexing with a program like GNU Screen... But that's for a different essay.
  • Syntax highlighting, tabbing, and code completion
Emacs and VIM have great syntax highlighting, tabbing, whitespace notification, and code completion: http://stackoverflow.com/questions/1285971/emacs-code-completion-for-c-c. Vim even allows you to easily generate and clean up HTML tags with tools like tidy.


Conclusion
If you're really into programming, an IDE is a great tool. However, I prefer using a programmable text editor with tons of features added by the community, to accomplish exactly what I want.

Sincerely,

- Michael E. Karpeles

Python Tips & Tricks

Howdy hackers,

Some exciting news, Babolog is now Baybo -- just baybo.it

Here are a few helpful Python tips and tricks which I hope will save you time on menial tasks. If you have any feedback or suggestions, please let us know. We'd love to hear from you!

If you like these examples, consider searching for "The Hacker's Guide to Python on Ubuntu" on http://baybo.it. The resource explains many of the lesser-known features of the language which can save you a lot of time and hassle. HGPU should be available for purchase for $1.99 in the next few days, so keep your eyes open.


Generate a List Containing All Letters of the Alphabet
>>> import string
>>> ' '.join(string.uppercase).split(' ')
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 
'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R',
'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

Alternate Approach to Conditionals
# <expr> if <boolean> else <expr>
>>> y = 2
>>> True if (y == 1) else 3
3
 
Iterate in Steps (with slicing) [start:stop:step] 
Iterate over a list from a [start:end:by-step-increment]. This syntax allows you to specify both a start and end position, as well as specify an 
increment for stepping through the iteration.   
 
# Strip the first two items and everything from index 8 on,
# then iterate over the remaining elements in steps of 2
>>> for item in range(9)[2:8:2]:
...     print item
...
2
4
6
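
Here are a few more variations on the same [start:stop:step] idea, as a small sketch. It is written for Python 3, where range() is no longer a list, hence the explicit list() call; the variable names are just for illustration.

```python
items = list(range(9))       # the numbers 0 through 8

middle = items[2:8:2]        # start at index 2, stop before index 8, step by 2
evens = items[::2]           # omitted start/stop default to the whole list
backwards = items[::-1]      # a negative step walks from the end to the front

for item in middle:
    print(item)
```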

Hope you found these tips helpful.

Sincerely,
- Michael E. Karpeles

Wednesday, May 25, 2011

26 x 26 x 2 - #!/bin/bashing Your Way to a Good Domain

The state of the domain world is pretty grim.  There are some helpful sites such as impossibility.org (props to chaosmachine of HN), bustaname.com and namestation.com, but the real problem is not search methods but the search space itself.

Here's a quick little bash script I threw together to search through permutations of [a-z][a-z]name.com and name[a-z][a-z].com.  Feel free to adapt this script for your own needs. Happy Hacking.


#!/bin/bash

# All-permissive Copying License
#
# Copyright 2011 Babo Labs, LLC - http://babolabs.com/
#
# Copying and distribution of this file, with or without modification, are
# permitted in any medium without royalty provided the copyright notice and
# this notice are preserved.

verbose=0

# -v is a plain flag: also report domains that are already taken.
while getopts v OPT; do
    case "$OPT" in
        v)
            echo "VERBOSE MODE"
            verbose=1
            ;;
    esac
done
# Drop the parsed options so the word is always the first argument.
shift $((OPTIND - 1))
word=$1

echo "Finding all permutations of [a-z][a-z]${word}.com and ${word}[a-z][a-z].com"

# Report whether one domain looks unregistered, based on the
# "No match" text in the whois output for .com domains.
check_domain() {
    if whois "$1" | grep -q "No match"; then
        echo "$1: **AVAILABLE**"
    elif [ "$verbose" -eq 1 ]; then
        echo "$1: NO"
    fi
}

for fl in {a..z}; do
    for sl in {a..z}; do
        check_domain "${fl}${sl}${word}.com"
    done
    for sl in {a..z}; do
        check_domain "${word}${fl}${sl}.com"
    done
done
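
To get a feel for the size of the search space in the title (26 x 26 x 2), here is a small Python 3 sketch that only generates the candidate names; it does not run whois. The function name 'candidates' and the sample word "name" are placeholders of ours.

```python
from itertools import product
from string import ascii_lowercase

def candidates(word):
    """Yield every [a-z][a-z]word.com and word[a-z][a-z].com name."""
    for first, second in product(ascii_lowercase, repeat=2):
        yield first + second + word + ".com"
        yield word + first + second + ".com"

names = list(candidates("name"))
print(len(names))    # 26 * 26 * 2 candidates
print(names[:4])
```

Each of those names costs one whois lookup in the script above, so expect a full run to take a while.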

Best of luck in your own domain search.

sabalaba & mek

Friday, May 6, 2011

Python to get Media Metadata

At Babo Labs, we're interested in eliminating work for our digital merchants by providing them enabling technologies. An enabling technology is one that assists a user in completing a task more productively and efficiently, while minimizing intrusiveness or inconvenience. One example of an enabling technology is Google's instant search bar which shows search engine results as you type your query, in real time (statistics show this service saves 2-5 seconds per query on average).

One way our social e-commerce platform, Babolog, accomplishes this is by passively and dynamically collecting meta information about the digital media files our merchants upload, and then displaying these meaningful specifications to their customers.

Over the past month, Stephen and I have tested a variety of Python modules for extracting metadata from media files. Here are a few good ones:

1. kaa.metadata (freevo multimedia kaa metadata package)
    • Documentation
    • Installation: Ubuntu apt install
      • sudo apt-get install python-kaa-metadata 
#example: 
import kaa.metadata

def getKaaMetadata(filepath):
    meta = kaa.metadata.parse(filepath)
    print meta
    return meta 

2. pyPdf (for pdf files)
    • Documentation
    • Installation: ez_install.py installation
      • sudo python -m easy_install pypdf
#example: 
from pyPdf import PdfFileReader

def getPdfMetadata(filename):
    pdf = PdfFileReader(file(filename, "rb"))
    basic_info = pdf.getDocumentInfo()
    preview = []

    try:
        for outline in pdf.outlines:
            preview.append(outline['/Title'])
    except:
        preview = []

    return basic_info, preview
 
3. ID3 (for mp3 ID3 metadata)
    • Documentation
    • Installation: Ubuntu apt install 
      • sudo apt-get install python-id3 
#example: 
from ID3 import *
try:
    id3info = ID3('/some/file/moxy.mp3')
    print id3info
    id3info['TITLE'] = "Green Eggs and Ham"
    id3info['ARTIST'] = "Moxy Früvous"
    for k, v in id3info.items():
        print k, ":", v
except InvalidTagError, message:
    print "Invalid ID3 tag:", message

4. Magic (MIME inference)
#example: 
import magic

def getMimeType(filename):
    """
    Note that the magic package has been marked as deprecated.
    We still find it useful for our needs.
    """
    m = magic.open(magic.MAGIC_MIME)
    m.load()
    return m.file(filename)

Conclusion
We've found the kaa.metadata module to be pretty __awesome__. It provides valuable metadata for a variety of different file formats and media types including jpg, avi, and mp3 (covering id3 and exif tags). It's a great tool if you are looking for an easy all-in-one solution. For our purposes, we parse the results of several of these modules in order to obtain a tailored solution for our platform.
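
To give a rough idea of what combining several modules can look like, here is a purely hypothetical sketch (written for Python 3) that picks which of the modules above to try based on the file extension. The mapping, the function name choose_extractor, and the fallback choice are assumptions made for illustration; this is not the code running on our platform.

```python
import os.path

# Hypothetical mapping from file extension to the module best suited to it.
EXTRACTORS = {".pdf": "pyPdf", ".mp3": "ID3"}

def choose_extractor(filename):
    """Return the name of the module to try first for this file."""
    extension = os.path.splitext(filename)[1].lower()
    # Fall back to the general-purpose module for everything else.
    return EXTRACTORS.get(extension, "kaa.metadata")

for name in ["manual.pdf", "song.MP3", "clip.avi"]:
    print(name, "->", choose_extractor(name))
```

In a real pipeline you would probably infer the type from the file's contents (as the magic example does) rather than trusting the extension.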

If you'd like to learn more about getting metadata for a specific media type, or more about what metadata these modules can fetch, just leave a comment!

Sincerely,
- Michael E. Karpeles
- Stephen A. Balaban

Sunday, May 1, 2011

Why the world needs a digital e-commerce platform

Let the numbers speak
Before we started working on Babolog 6 months ago, we did research to see where the digital e-commerce industry was headed. For a start-up focusing specifically on social, digital e-commerce, the results looked appealing from all ends of the spectrum: sales, content production, and social tendencies.

1. Digital Sales - There's a rapidly growing market
From a sales perspective, a Pew study in Dec 2010 shows that nearly two-thirds of Internet users (65%) have paid to download or access some kind of online content, ranging from movies to games to news articles [1]. The worldwide market for digital content creation products has grown from $3.04 billion in 2008 to $6.5 billion, exceeding estimates for 2012 by over 2 billion dollars [2,3].

2. Content Production - Aggregating content producers
In terms of digital content production, eMarketer predicts the number of user-generated content creators in the US will rise to 114.5 million by 2013, up from the 82.5 million (42.8% of US Internet users) recorded in 2008. That is a gain of 9 percentage points, bringing creators to over half (51.8%) of all US Internet users [2].

3. Leveraging social networking
A survey of 2,221 consumers, conducted by VG Market and Playspan in July 2010, indicates 75% of customers have spent money on virtual goods, and 32% have made purchases within social networks [4].
                  
[1] Pew Research Center's Internet & American Life Project
[2] (eMarketer, 2009)
[3] (Computerworld - April 11, 2007)
[4] Women Spend More on Digital Goods ...

Sincerely,
Michael E. Karpeles
Stephen A. Balaban

http://babolog.com

Wednesday, April 13, 2011

Tech: On moving to Silicon Valley & Finding Investors

Stephen, Mike, and I are planning on heading to the San Francisco, CA area this summer to continue work on our startup, Babolog. We have quite a few contacts in the area, including some friends at Google and Accel. We're very interested in hearing feedback about people's experiences in the San Fran area. Is making the move worth the hype? We recently had a meeting with Douglas Eck, a Google employee in San Francisco, and Stephen recently met with the folks at Olark to discuss the reality of city life.

Our conclusion is that living in San Francisco can be very expensive, but it is manageable if you live just outside the city. However, a major reason people move to San Fran is the connections, and to maintain good connections and be able to meet at a moment's notice, it's good to be local. It's hard to meet with a client for drinks when you have to be on the last train home at 12 or 1am. Another thing to keep in mind is that there are different reasons for moving to Silicon Valley. Some people are looking for talent, some for venture capital or angels, and others simply want to make connections. With social media as prevalent as it is, one can potentially launch a startup from anywhere. Babolog, for instance, currently operates out of Vermont.

Software engineers have a unique advantage in their field: the cost of launching a startup is minimal, as writing software is cheap (though often time consuming) and there are many open source frameworks which add almost instant utility to your system. One of the biggest reasons not to move to the Valley is that you might not need to. If your project focuses on specific niche groups and you can successfully operate in an inexpensive area, go for it. The reason to still be in CA, and the reason we will almost certainly be heading out there this summer, is not to find talent, but connections, connections, connections. This brings up another important topic which perhaps I'll write more about when I have accrued some more personal experience: acquiring venture capital and funding for your project. For now, here are some interesting reads which I hope give you greater insight into the big question many tech companies lose sleep over: "Should we be moving to Silicon Valley?"

Friday, May 20, 2011 - Updates

If you are working with a small group and are deciding where to live in the San Francisco area, you may find this thread useful:

San Francisco or Palo Alto for a new startup?

http://news.ycombinator.com/item?id=12290



Additional Sources:

I'm sure most of the folks at HN (Y Combinator's Hacker News) have already read:
http://www.paulgraham.com/start.html

The cost of living in the San Francisco area:
http://www.jmorganmarketing.com/how-much-does-it-cost-to-live-in-san-francisco/

Some interesting perspective on incubators, angels, and VC:
http://gigaom.com/2008/09/10/5-reasons-to-move-your-startup-out-of-silicon-valley/

http://venturebeat.com/2011/04/07/5-reasons-not-to-take-a-big-vc-round/

Sincerely,
- Michael E. Karpeles
http://babolog.com

My Mashable Mashup

I decided to keep myself organised by maintaining a list of interesting reads from different news websites. Mashable is a fantastic Internet news site with a wealth of knowledge and insight on topics relating to marketing, advertising, business strategy, startups, design practice, and all that is social.

 

Dealing with competition 
Interesting Business Models 
Why Large-Scale Product Customization Is Finally Viable for Business 
by  J. P. Gownder

Where does social media stand?  
The Winners & Losers of Social Networking [INFOGRAPHIC] 
by  Jolie O'Dell

Getting the word out
HOW TO: Spread Your Business Footprint Around the Web 
by Josh Catone


Hope you find these helpful!

Sincerely,
Michael E. Karpeles

Amazon's Social Human Computation Framework

I read a very enjoyable and worthwhile article today on Salon.com by Katharine Mieszkowski called "I make $1.45 a week and I love it". It's about the crowdsourcing human computation system called Mechanical Turk, which Amazon released back in November 2005.

Here's a summary:

"On Amazon Mechanical Turk, thousands of people are happily being paid pennies to do mind-numbing work. Is it a boon for the bored or a virtual sweatshop?"

The article provides some really interesting case studies which speak volumes about trends and incentives in crowdsourcing. The most interesting involved "Aaron Koblin, a student in UCLA's Design/Media Arts program, who was writing his master's thesis about the site" [1]. He offered 2 cents per drawing and ended up collecting over 7,500, which he turned around and sold at 20 for $20. Read the whole article to see how the crowd reacted! I think the public's responses will surprise you.

Babo Labs has a few projects of its own up its sleeve which will allow everyone who uses our e-commerce platform to buy and sell more effectively. We can't wait to share them with you when they're done!

Sincerely,
- Michael E. Karpeles
http://babolog.com
Babo Labs