Playing With Sid


Wednesday, 8 July 2009

NLTK on Ubuntu Quick Start Guide

Posted on 09:59 by Unknown

Update July 19, 2009: You can now install NLTK as a Python egg instead; see the post NLTK Installation with Python setuptools.

While I was attending a short program in computational linguistics at Dravidian University, Dr. Arul introduced me to NLTK (Natural Language Toolkit). It was a full two years before I finally decided to take a close look at it. Like most linguists at the lab, I used the Perl programming language. With the new version 2.0 of NLTK released last month, NLTK now works with Python 2.6. Here is a quick start guide for NLTK on Ubuntu Linux.

Installing NLTK on Ubuntu with Python 2.6

At the time of writing, the Debian package on the NLTK download page is built for Python 2.5, but Ubuntu ships with Python 2.6 by default. So you need to download the source package from the NLTK download page instead.

NLTK depends on a few other packages, so let's install them first.

$ sudo apt-get install python-numpy python-matplotlib prover9

Uncompress the source package and run the NLTK setup.

$ unzip nltk-2.0b3.zip

$ cd nltk-2.0b3/

$ ls

build LICENSE.txt nltk PKG-INFO README.txt setup.py yaml

$ sudo python setup.py install
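Once the setup has finished, it can be handy to confirm that NLTK and the dependencies installed above are importable. The snippet below is a minimal sketch of such a check, not part of NLTK itself: the helper name check_modules is made up for illustration, and it assumes each module exposes a __version__ attribute (it falls back to "unknown" when one does not).

```python
def check_modules(names):
    """Try to import each module; map its name to a version string or None.

    check_modules is a hypothetical helper written for this post,
    not an NLTK function.
    """
    report = {}
    for name in names:
        try:
            module = __import__(name)
        except ImportError:
            # The module is not installed (or not on the Python path).
            report[name] = None
        else:
            # Not every module defines __version__, so fall back gracefully.
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Calling check_modules(["numpy", "matplotlib", "nltk"]) after the install should return a version string for each name; a value of None means that module still needs to be installed.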

After finishing the NLTK setup, download the NLTK data, which contains various corpora, tagsets, treebank data and so on.

$ python

Python 2.6.2+ (release26-maint, Jun 19 2009, 15:14:35)

[GCC 4.4.0] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>> import nltk

>>> nltk.download()

[Screenshot: NLTK Data downloader window]
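If you would rather fetch a single package from a script than click through the downloader window, nltk.download() also accepts a package identifier. The following is a rough sketch under a few assumptions: the helper name ensure_nltk_data is invented here, the "brown" corpus is only an example package, and the find and download parameters exist purely so the helper can be exercised without NLTK or a network connection.

```python
def ensure_nltk_data(resource_path, package_id, find=None, download=None):
    """Download package_id only if resource_path is not installed yet.

    Returns True if a download was triggered, False if the data was
    already present. ensure_nltk_data is a hypothetical helper.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be defined without NLTK present.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        find(resource_path)
        return False
    except LookupError:
        download(package_id)
        return True
```

For example, ensure_nltk_data("corpora/brown", "brown") would download the Brown corpus the first time it is called and do nothing on later calls.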

Learning NLTK

The best place to start is the NLTK book, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. The book is freely available under a Creative Commons license, so you can read it online on the NLTK website itself. I would recommend buying a copy, as the proceeds will go towards the future development of NLTK.





There aren't many videos about NLTK. I recently stumbled upon this video lecture by the trinity of NLTK: Steven Bird, Ewan Klein, and Edward Loper.

If you are new to computational linguistics and need a good grounding in the field, you should also consider reading these texts.

  • Speech and Language Processing (2nd Edition)
  • Natural Language Understanding (2nd Edition)
  • Foundations of Statistical Natural Language Processing



