Playing With Sid

Sunday, 19 July 2009

NLTK Installation with Python easy_install

Posted on 05:26 by Unknown

A few weeks ago I wrote the NLTK on Ubuntu Quick Start Guide. With today's release of NLTK (Natural Language Toolkit) 2.0b5, installation has been greatly simplified thanks to the NLTK Python egg (see the Changelog).




To get started with the NLTK install, you first need the python-setuptools package, which provides the easy_install program.



$ sudo apt-get install python-setuptools
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
python-setuptools
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 195kB of archives.
After this operation, 909kB of additional disk space will be used.
Get:1 http://in.archive.ubuntu.com karmic/main python-setuptools 0.6c9-0ubuntu4 [195kB]
Fetched 195kB in 9s (20.2kB/s)
Selecting previously deselected package python-setuptools.
(Reading database ... 106971 files and directories currently installed.)
Unpacking python-setuptools (from .../python-setuptools_0.6c9-0ubuntu4_all.deb) ...
Setting up python-setuptools (0.6c9-0ubuntu4) ...
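
If you want to confirm that setuptools landed where your interpreter can see it, a quick check like the one below may help. This is only a sketch and assumes a current Python 3 interpreter (the session in this post used Python 2.6); the helper name has_module is made up for illustration.

```python
import importlib.util
import shutil


def has_module(name):
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no such top-level module is on sys.path.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # setuptools is the module installed by the python-setuptools package.
    print("setuptools module:", "found" if has_module("setuptools") else "missing")
    # shutil.which looks the command up on PATH, much like the shell does.
    print("easy_install command:", shutil.which("easy_install") or "not on PATH")
```

The same helper can be pointed at nltk and yaml (the import name of PyYAML) once the next step has finished.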



Now let's install NLTK with the easy_install program.


$ sudo easy_install http://nltk.googlecode.com/files/nltk-2.0b5-py2.6.egg

Downloading http://nltk.googlecode.com/files/nltk-2.0b5-py2.6.egg
Processing nltk-2.0b5-py2.6.egg
creating /usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg
Extracting nltk-2.0b5-py2.6.egg to /usr/local/lib/python2.6/dist-packages
Adding nltk 2.0b5 to easy-install.pth file

Installed /usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg
Processing dependencies for nltk==2.0b5
Searching for PyYAML==3.08
Reading http://pypi.python.org/simple/PyYAML/
Reading http://pyyaml.org/wiki/PyYAML
Best match: PyYAML 3.08
Downloading http://pyyaml.org/download/pyyaml/PyYAML-3.08.zip
Processing PyYAML-3.08.zip
Running PyYAML-3.08/setup.py -q bdist_egg --dist-dir /tmp/easy_install-T7Y0La/PyYAML-3.08/egg-dist-tmp-vRjvDM
build/temp.linux-i686-2.6/check_libyaml.c:2:18: error: yaml.h: No such file or directory
build/temp.linux-i686-2.6/check_libyaml.c: In function ‘main’:
build/temp.linux-i686-2.6/check_libyaml.c:5: error: ‘yaml_parser_t’ undeclared (first use in this function)
build/temp.linux-i686-2.6/check_libyaml.c:5: error: (Each undeclared identifier is reported only once
build/temp.linux-i686-2.6/check_libyaml.c:5: error: for each function it appears in.)
build/temp.linux-i686-2.6/check_libyaml.c:5: error: expected ‘;’ before ‘parser’
build/temp.linux-i686-2.6/check_libyaml.c:6: error: ‘yaml_emitter_t’ undeclared (first use in this function)
build/temp.linux-i686-2.6/check_libyaml.c:6: error: expected ‘;’ before ‘emitter’
build/temp.linux-i686-2.6/check_libyaml.c:8: warning: implicit declaration of function ‘yaml_parser_initialize’
build/temp.linux-i686-2.6/check_libyaml.c:8: error: ‘parser’ undeclared (first use in this function)
build/temp.linux-i686-2.6/check_libyaml.c:9: warning: implicit declaration of function ‘yaml_parser_delete’
build/temp.linux-i686-2.6/check_libyaml.c:11: warning: implicit declaration of function ‘yaml_emitter_initialize’
build/temp.linux-i686-2.6/check_libyaml.c:11: error: ‘emitter’ undeclared (first use in this function)
build/temp.linux-i686-2.6/check_libyaml.c:12: warning: implicit declaration of function ‘yaml_emitter_delete’

libyaml is not found or a compiler error: forcing --without-libyaml
(if libyaml is installed correctly, you may need to
specify the option --include-dirs or uncomment and
modify the parameter include_dirs in setup.cfg)
zip_safe flag not set; analyzing archive contents...
Adding PyYAML 3.08 to easy-install.pth file

Installed /usr/local/lib/python2.6/dist-packages/PyYAML-3.08-py2.6-linux-i686.egg
Finished processing dependencies for nltk==2.0b5
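
The egg file names in the log above follow the usual name-version-pyX.Y[-platform].egg pattern, which is why the NLTK egg is tied to Python 2.6 and the PyYAML egg built here also carries a linux-i686 suffix. As a rough illustration, the pattern can be pulled apart with a regular expression. The function name parse_egg_name is hypothetical, and the sketch assumes project names and versions contain no hyphens, which holds for the two eggs shown but not in general.

```python
import re

# name-version-pyX.Y with an optional -platform part, then ".egg".
EGG_NAME = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<pyver>\d+\.\d+)"
    r"(?:-(?P<platform>.+))?\.egg$"
)


def parse_egg_name(filename):
    """Split an egg file name into (name, version, python version, platform).

    Returns None when the file name does not match the simple pattern.
    The platform element is None for pure-Python eggs.
    """
    match = EGG_NAME.match(filename)
    if match is None:
        return None
    return (
        match.group("name"),
        match.group("version"),
        match.group("pyver"),
        match.group("platform"),
    )


if __name__ == "__main__":
    for egg in ("nltk-2.0b5-py2.6.egg", "PyYAML-3.08-py2.6-linux-i686.egg"):
        print(egg, "->", parse_egg_name(egg))
```

If your interpreter is not Python 2.6, the egg URL used above would not match it, so it is worth checking the pyver part before installing.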



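The compiler errors in the log are harmless: the libyaml headers were not found, so PyYAML was built without the libyaml bindings and still installed successfully. Now you are done. Import NLTK and start downloading the NLTK data.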


$ python
Python 2.6.2+ (release26-maint, Jun 19 2009, 15:14:35)
[GCC 4.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> l

Packages:
/usr/local/lib/python2.6/dist-packages/nltk-2.0b5-py2.6.egg/nltk/__init__.py:588: DeprecationWarning: object.__new__() takes no parameters
[ ] maxent_ne_chunker... ACE Named Entity Chunker (Maximum entropy)
[ ] abc................. Australian Broadcasting Commission 2006
[ ] brown............... Brown Corpus
[ ] alpino.............. Alpino Dutch Treebank
[ ] cess_cat............ CESS-CAT Treebank
[ ] brown_tei........... Brown Corpus (TEI XML Version)
[ ] cmudict............. The Carnegie Mellon Pronouncing Dictionary (0.6)
[ ] biocreative_ppi..... BioCreAtIvE (Critical Assessment of Information
Extraction Systems in Biology)
[ ] cess_esp............ CESS-ESP Treebank
[ ] chat80.............. Chat-80 Data Files
[ ] city_database....... City Database
[ ] conll2002........... CONLL 2002 Named Entity Recognition Corpus
[ ] conll2000........... CONLL 2000 Chunking Corpus
[ ] conll2007........... Dependency Treebanks from CoNLL 2007 (Catalan
and Basque Subset)
[ ] dependency_treebank. Dependency Parsed Treebank
[ ] floresta............ Portuguese Treebank
[ ] genesis............. Genesis Corpus
[ ] gazetteers.......... Gazeteer Lists
Hit Enter to continue:
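
The package listing is plain text, so if you save it to a file you can turn it into something searchable. The sketch below is an illustration only, not part of NLTK: parse_listing is a hypothetical helper, and it assumes package identifiers contain no dots and that any line not starting with a bracketed marker continues the previous description (as with biocreative_ppi above). Feed it only the package lines, not the prompt or warning lines.

```python
import re

# "[ ] brown............... Brown Corpus": marker, id, dot padding, description.
ENTRY = re.compile(r"^\[(?P<mark>.)\]\s+(?P<id>[^\s.]+)\.*\s+(?P<desc>.+)$")


def parse_listing(text):
    """Parse downloader listing lines into a list of dicts.

    Each dict has the raw marker character, the package id and the
    description. The post only shows the blank marker, so the marker is
    kept as-is rather than interpreted.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match:
            entries.append({
                "mark": match.group("mark"),
                "id": match.group("id"),
                "description": match.group("desc"),
            })
        elif entries:
            # Wrapped line: append it to the previous entry's description.
            entries[-1]["description"] += " " + line
    return entries


SAMPLE = """\
[ ] brown............... Brown Corpus
[ ] biocreative_ppi..... BioCreAtIvE (Critical Assessment of Information
Extraction Systems in Biology)
[ ] dependency_treebank. Dependency Parsed Treebank
"""

if __name__ == "__main__":
    for entry in parse_listing(SAMPLE):
        print(entry["id"], "=>", entry["description"])
```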


Posted in Computational Linguistics, easy_install, Linux, NLP, NLTK, python, pythonegg, ubuntu | No comments