Release Log
Last edited by Shion Deysarkar 1 year, 8 months ago
Release - 1.4 - 20 April 2010
- Crawl Packages released. Purchase pre-configured crawls and start using 80legs in a few simple clicks.
- Marketplace released. The 80app Store has been merged with the Marketplace. Products available: Crawl Packages, 80apps.
Release - 1.31 - 12 April 2010
Release - 1.3 - 19 January 2010
- 80app Store: pre-built 80apps available for purchase. Developers can sell their own 80apps to earn money.
- 80app Packs: each subscription plan now includes select pre-built 80apps from the 80app Store absolutely free!
Release - 1.2 - 21 December 2009
- Absolutely free crawling for crawls up to 100,000 pages
- New monthly subscription plans for added features and larger crawls
Release - 1.0 - 22 September 2009
- True web-scale crawling: crawl up to 2 billion pages per day
- Usability: easily design your own crawls using an intuitive job form
Release - 0.90 - 25 August 2009
- Crawling performance
  - Larger crawls now supported: maximum number of pages per job increased to 10 million
  - Implemented three types of crawls to support specific user needs: fast, comprehensive, and breadth-first
  - Can now crawl https:// pages
  - Better handling of crawls restricted to the current domain
  - Crawler now makes more than one attempt to fetch a page before returning a NO_RESPONSE error
  - Crawls no longer hang and slow down the job when stuck on certain domains
- API
  - Official release of the Java API: programmatically submit jobs, download results, etc.
  - Sample desktop application for using 80legs available (including source code)
- 80apps
  - Users can now load external JARs into their 80apps
- Jobs
  - New, easier-to-use job form in the web portal (plus an updated interface throughout the portal)
- User accounts
  - Users can set notifications for job completion and other events
- Seed lists
  - Fixed several bugs caused by special characters in seed lists
Release - 0.83 - 8 July 2009
- New seed list upload functionality
  - Seed lists can now be uploaded separately before a crawl
  - Seed lists of up to 1 GB are allowed
- Job form switches certain fields to defaults in a more convenient way
- Code section in portal provides more information
- Results section in portal provides more information
- Several performance improvements for crawling and back-end data store
- Calls to constructors in 80apps are only charged once per 1000 URLs (note: 80legs is still free to use for now)
Release - 0.82 - 26 June 2009
- Smarter URL selection for larger crawls
- Sandbox jobs run automatically and the user gets access to stdout from their 80app
- Domain throttling information in the portal
- Time estimates shown in the portal
- Crawled results files additions:
- Several improvements for large job performance
- Fixed problem with multiple Loading Code errors
- Improved default link parsing
- JAR approval process can now pass in uploaded data
- Better web portal login behavior
- Domain throttling information shown for runs in web portal
- Estimated time to completion shown for runs in web portal
Release - 0.81 - 17 June 2009
- Many different improvements to the handling of 80apps to help developers get started
- Payment system beta (80legs will still be free for now, but we'll have our users add pretend money to their accounts)
- Larger result sizes allowed (1 KB/page is still free, with a 10 KB/page maximum)
- More settings for crawl control
- More crawl status messages
- Many more backend improvements based on user feedback from 0.80
- Results returned faster after job completes
- Improved 80app class loading time
Release - 0.80 - 3 June 2009
This major release is the first time 80legs customers are able to use their own 80apps (custom code).
- 80apps initial release (first I80App release with parseLinks() and processDocument())
- Option to analyze specific MIME types
- Option to preserve query strings when crawling
- Resulting crawl list shows status codes and other reasons for failing to crawl (e.g., robots.txt, DNS)
- Better handling of failed URLs
- Sandbox server for testing custom code on your own machine using the 80legs framework.
- Stop problem jobs automatically
Release - 0.76 - 22 April 2009
- Portal keeps you logged in until you're inactive for 30 minutes
- Reduce duplicate pages crawled for small jobs (there are still quite a few, but reducing it further potentially increases the job runtime significantly)
- Throttle based on parent domains and IP addresses
- Option to only crawl on domains from seed list
- Gzip/deflate crawling
- Improved foreign language content selection (UTF-8)
- Portal limit for maxPagesToCrawl increased to 1M
- Improved URL parsing
Release - 0.75 - 17 April 2009
- Lots of internal improvements
- Fix problem canceling jobs
- Improve success rate on certain page reading failures
- Other page reading and robots.txt improvements
Release - 0.711 - 14 April 2009
- Higher-performance centralized robots.txt/DNS handling
- Removed long pauses from some longer-running jobs
- In the Portal, made seed lists writable when copying job
- In the Portal, added help section on actions you can do on jobs (copy, cancel, delete)
- Improved robots.txt handling - fixed empty disallow string
- Fix for jobs sometimes not starting correctly and showing as completed with zero pages crawled
- Refined crawler regular expressions (about: links were not being ignored)
- Fixed formatting in Portal result files
- Cancel job bug in Portal fixed
Release - 0.71 - 13 April 2009
- Compress Portal results files
- Set user-agent as 80bot (Fix for Alibaba.com)
- In the Portal, an old job can be resubmitted with changed parameters ("Copy Job" link)
- Global robots.txt and DNS Manager
- Fix for ending conditions for some crawls
- Option to "Include crawled but not analyzed" pages in results
- Add change log to external website and within portal
- Add explanation for result file naming convention
- Refined crawler regular expressions (some mailto: and javascript: links were not being ignored)
Release - 0.7 - 9 April 2009