Release Log

Last edited by Shion Deysarkar 1 year, 5 months ago

 

Release - 1.4 - 20 April 2010


  • Crawl Packages released.  Purchase pre-configured crawls and start using 80legs in a few simple clicks.
  • Marketplace released.  The 80app Store has been merged with the Marketplace.  Products available:  Crawl Packages, 80apps.

 

Release - 1.31 - 12 April 2010


  • Python API released

 

Release - 1.3 - 19 January 2010


  • 80app Store: pre-built 80apps available for purchase.  Developers can sell their own 80apps to earn money.
  • 80app Packs: each subscription plan now includes select pre-built 80apps from the 80app Store absolutely free!

 

Release - 1.2 - 21 December 2009


  • Absolutely free crawling for crawls up to 100,000 pages
  • New monthly subscription plans for added features and larger crawls

 

Release - 1.0 - 22 September 2009


  • True web-scale crawling: crawl up to 2 billion pages per day
  • Usability: easily design your own crawls using an intuitive job form

 

Release - 0.90 - 25 August 2009


  • Crawling performance
    • Larger crawls now supported - max # of pages per job increased to 10 million
    • Implemented three types of crawls to support specific user needs - fast, comprehensive, and breadth-first
    • Can now crawl https:// pages
    • Better handling of crawls restricted to the current domain
    • The crawler now tries to fetch a page more than once before returning a NO_RESPONSE error
    • Crawls no longer hang and slow down the job when stuck on certain domains
  • API
    • Official release of the Java API - programmatically submit jobs, download results, etc.
    • Sample desktop application for using 80legs available (including source code)
  • 80apps
    • Users can now load external JARs into their 80apps
  • Jobs
    • New and easier-to-use job form in the web portal (as well as updated interface throughout the portal)
  • User accounts
    • Users can set notifications for when jobs complete and other events
  • Seed lists
    • Fixed several bugs resulting from special characters in the seed lists
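
The retry behavior listed under Crawling performance can be pictured with a minimal Python sketch. Only the NO_RESPONSE status name comes from this log; the function name, the "OK" status, the `fetch` callable, and the default of three attempts are assumptions made for illustration, not 80legs internals.

```python
from typing import Callable, Optional, Tuple

NO_RESPONSE = "NO_RESPONSE"  # status name taken from the release note


def fetch_with_retries(
    url: str,
    fetch: Callable[[str], Optional[bytes]],  # hypothetical fetcher: returns None on failure
    max_attempts: int = 3,                    # assumed value; the log only says "more than once"
) -> Tuple[str, Optional[bytes]]:
    """Try the fetch up to max_attempts times before reporting NO_RESPONSE."""
    for _ in range(max_attempts):
        body = fetch(url)
        if body is not None:
            return "OK", body  # "OK" is a placeholder status for this sketch
    return NO_RESPONSE, None
```

A fetcher that fails twice and succeeds on the third call would return the page body here, while one that never answers yields NO_RESPONSE only after every attempt is used up.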

 

 

Release - 0.83 - 8 July 2009


  • New seed list upload functionality
    • Seed lists can now be uploaded separately before a crawl
    • Up to 1 GB allowable for seed list size
  • Job form switches certain fields to defaults in a more convenient way
  • Code section in portal provides more information
  • Results section in portal provides more information
  • Several performance improvements for crawling and back-end data store
  • Calls to constructors in 80apps are only charged once per 1000 URLs (note: 80legs is still free to use for now)
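
As a rough illustration of the constructor charging rule above, the arithmetic might look like the following Python sketch. It assumes that a partial batch of URLs still counts as one charge; the log does not say how partial batches are handled, and the function and constant names are made up for this example.

```python
URLS_PER_CONSTRUCTOR_CHARGE = 1000  # batch size stated in the release note


def charged_constructor_calls(url_count: int) -> int:
    """Constructor calls billed for a crawl of url_count URLs.

    Assumption: a partial batch is rounded up to one full charge.
    """
    if url_count < 0:
        raise ValueError("url_count must be non-negative")
    # Integer ceiling division: 1..1000 URLs -> 1 charge, 1001..2000 -> 2, and so on.
    return (url_count + URLS_PER_CONSTRUCTOR_CHARGE - 1) // URLS_PER_CONSTRUCTOR_CHARGE
```

Under this assumption, a crawl of 100,000 URLs would be billed for 100 constructor calls rather than 100,000.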

 

 

Release - 0.82 - 26 June 2009


  • Smarter URL selection for larger crawls
  • Sandbox jobs run automatically and the user gets access to stdout from their 80app
  • Crawled results files additions: 
  • Several improvements for large job performance
  • Fixed problem with multiple Loading Code errors
  • Improved default link parsing
  • JAR approval process can now pass in uploaded data
  • Better web portal login behavior
  • Domain throttling information shown for runs in web portal
  • Estimated time to completion shown for runs in web portal

 

 

Release - 0.81 - 17 June 2009


  • Many different improvements to the handling of 80apps to help developers get started
  • Payment system beta (80legs will still be free for now, but we'll have our users add pretend money to their accounts)
  • Allow larger result sizes (1KB/page is still free with a 10KB/page max)
  • More settings for crawl control
  • More crawl status messages
  • Many more backend improvements based on user feedback from 0.80
  • Results returned faster after job completes
  • Improved 80app class loading time

 

 

Release - 0.80 - 3 June 2009


This major release is the first time 80legs customers are able to use their own 80apps (custom code).

  • 80apps initial release (first I80App release with parseLinks() and processDocument())
  • Option to analyze specific MIME types
  • Option to preserve query strings when crawling
  • Resulting crawl list shows status codes and other reasons for failing to crawl (e.g. robots.txt, DNS)
  • Better handling of failed URLs
  • Sandbox server for testing custom code on your own machine using the 80legs framework.
  • Stop problem jobs automatically

 

Release - 0.76 - 22 April 2009


  • Portal keeps you logged in until you're inactive for 30 minutes
  • Reduce duplicate pages crawled for small jobs (there are still quite a few, but reducing it further potentially increases the job runtime significantly)
  • Throttle based on parent domains and IP addresses
  • Option to only crawl on domains from seed list
  • Gzip/deflate crawling
  • Improved foreign language content selection (UTF-8)
  • Increase portal limit to 1M maxPagesToCrawl
  • Improved URL parsing
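
The "only crawl on domains from seed list" option above can be sketched in Python as a host filter. This is an illustration under assumptions: it matches exact hostnames only (so a seed on example.com would not admit www.example.com), and the function names are hypothetical rather than part of the 80legs job form or API.

```python
from urllib.parse import urlsplit


def seed_hosts(seed_urls):
    """Collect the lowercase hostnames that appear in the seed list."""
    hosts = set()
    for url in seed_urls:
        host = urlsplit(url).hostname  # None when the URL has no network location
        if host:
            hosts.add(host)
    return hosts


def allowed(url, hosts):
    """True when the URL's hostname exactly matches one of the seed hostnames."""
    host = urlsplit(url).hostname
    return host is not None and host in hosts
```

A real crawler would also have to decide how subdomains and redirects are treated; this sketch leaves both out.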

 

 

Release - 0.75 - 17 April 2009


  • Lots of internal improvements
  • Fix problem canceling jobs
  • Improve success rate on certain page reading failures
  • Other page reading and robots.txt improvements

 

 

Release - 0.711 - 14 April 2009


  • Higher performance for centralized robots.txt and DNS handling
  • Removed long pauses from some longer-running jobs
  • In the Portal, made seed lists writable when copying job
  • In the Portal, added help section on actions you can do on jobs (copy, cancel, delete)
  • Improved robots.txt handling - fixed empty disallow string
  • Fixed jobs that sometimes failed to start correctly and showed as completed with zero pages done
  • Refined crawler regular expressions (about: links not being ignored)
  • Fixed formatting in Portal result files
  • Cancel job bug in Portal fixed

 

 

Release - 0.71 - 13 April 2009


  • Compress Portal results files
  • Set user-agent as 80bot (Fix for Alibaba.com)
  • In the Portal, take an old job and resubmit, allowing for parameter changes ("Copy Job" link)
  • Global robots.txt and DNS Manager
  • Fix for ending conditions for some crawls
  • Option to "Include crawled but not analyzed" pages in results
  • Add change log to external website and within portal
  • Add explanation for result file naming convention
  • Refined crawler regular expressions (some mailto: and javascript: links were not being ignored)
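
The link filtering mentioned in the last item above (ignoring mailto: and javascript: links, and about: links as of 0.711) can be approximated in Python. Note the swap: the log says the crawler uses regular expressions, whereas this sketch reads the URL scheme with the standard library's `urlsplit`. The set of ignored schemes is limited to the three named in this log, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Schemes the release notes say should be ignored when parsing links.
IGNORED_SCHEMES = {"mailto", "javascript", "about"}


def crawlable_links(links):
    """Drop links whose scheme is in IGNORED_SCHEMES; keep everything else."""
    kept = []
    for link in links:
        scheme = urlsplit(link.strip()).scheme  # urlsplit returns the scheme in lowercase
        if scheme not in IGNORED_SCHEMES:
            kept.append(link)
    return kept
```

Relative links have an empty scheme and are therefore kept, on the assumption that they are resolved against the page URL later.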

 

 

Release - 0.7 - 9 April 2009


 

 

 

 
