 

80legs API

Page history last edited by Aliya 8 months, 1 week ago



 

Introduction


The 80legs API is a simple programmatic interface that allows you to access the functionality that 80legs provides outside of the web portal.  

 

The 80legs API is currently available in the following programming languages:

  • Java
  • .NET
  • Python

 

We expect to add support for other languages based on demand.  To request support for another language, contact us at support@80legs.com.

 

The API makes synchronous calls to our web service over a socket connection on port 80.

 

Getting Started


To get started with the 80legs API, follow these steps:

  1. Access the Web Portal - Access the 80legs web portal.  If you don't have access, please contact us and we'll send you instructions.
  2. Retrieve your API Security Token - To use the 80legs API, you will need your API credentials.  These credentials are sent to 80legs with each API call.  To retrieve your credentials:
    1. In the web portal, select "My Accounts" from the top menu. 
    2. Under "80legs Application and API Information", you should be able to see your security token.  This token is unique to your account and it will be used to authenticate with the 80legs API.
  3. Read the "Important Concepts" Section Below - The Important Concepts section introduces the basic concepts you need in order to communicate with the 80legs API.
  4. Write Code to Communicate with the API - Follow one of the links below for instructions on how to write code to communicate with the API:
    1. Java version
    2. .NET version
    3. Python version
  5. Add your IP Address(es) (Optional) - For additional security, add your IP address to the list of "Authorized IP Address(es)."  Once the list contains at least one address, only the listed IP addresses will be allowed to access the 80legs API.

 

Important Concepts


API Updates

The current version of the API is Version 1.0.  A version number must be specified with each request made to the API.   When there is a significant update to the API in the future, the version number will be changed and backwards compatibility will be maintained if possible.

 

Throttling

Request throttling is a common way to ensure QoS (Quality of Service) in networks and applications; 80legs uses it to avoid overwhelming the system.  API requests are currently throttled to 3 requests per second per API client.
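If your client issues many calls, it can pace itself to stay under this limit. The following is a minimal sketch of a client-side sliding-window limiter configured for 3 requests per second; `RequestThrottle` is a hypothetical helper, not a class in the 80legs API, and what the service does when the limit is exceeded is not covered here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical client-side limiter (not part of the 80legs API): permits at most
 * maxRequests calls within any sliding window of windowMillis milliseconds.
 */
class RequestThrottle {
    private final int maxRequests;
    private final long windowNanos;
    // Send times (System.nanoTime() scale) of requests still inside the window, oldest first.
    private final Deque<Long> sent = new ArrayDeque<>();

    RequestThrottle(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowNanos = windowMillis * 1_000_000L;
    }

    /**
     * Records a request at nowNanos and returns 0 if a slot is free; otherwise
     * records nothing and returns how many nanoseconds to wait before retrying.
     */
    synchronized long tryAcquire(long nowNanos) {
        // Forget requests that have slid out of the window.
        while (!sent.isEmpty() && nowNanos - sent.peekFirst() >= windowNanos) {
            sent.pollFirst();
        }
        if (sent.size() < maxRequests) {
            sent.addLast(nowNanos);
            return 0L;
        }
        return windowNanos - (nowNanos - sent.peekFirst());
    }

    /** Blocks the calling thread until a request slot is available. */
    void acquire() throws InterruptedException {
        long wait;
        while ((wait = tryAcquire(System.nanoTime())) > 0) {
            Thread.sleep(wait / 1_000_000L, (int) (wait % 1_000_000L));
        }
    }
}
```

Create one `new RequestThrottle(3, 1000)` per API client and call `acquire()` before each request. `tryAcquire` takes an explicit timestamp so the logic can be checked without sleeping.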

 

UTF-8 Character Encoding

The 80legs API assumes that all data is Unicode text encoded as UTF-8 (the 8-bit Unicode, or UCS, Transformation Format).  The API always returns data in UTF-8.
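In Java, the safest habit is to name the charset explicitly instead of relying on the platform default, which before Java 18 depends on the operating system. A minimal sketch; `Utf8Text` is a hypothetical helper, and the 80legs client library may already handle encoding for you.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers for UTF-8 request and response text. */
final class Utf8Text {
    private Utf8Text() {}

    /** Encodes outgoing text as UTF-8 bytes, regardless of the JVM's default charset. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes response bytes, which the API always returns as UTF-8. */
    static String decode(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }
}
```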

 

Security

The 80legs API is protected to ensure that only authorized 80legs users use it.  As mentioned earlier, there are two levels of security:
  • Authentication - Each request must contain authentication credentials in the form of an API Token.  This token is available in the web portal under Account > Settings > API Information.
  • IP Address Restriction (Optional) - If an IP address is added to your account from the web portal, then this feature will become active.  Only the authorized IP addresses will be able to access the API when this feature is active.  The feature is automatically disabled when no IP addresses are configured for the account.
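The IP restriction rule reduces to a simple check: an empty authorized list disables the feature, and a non-empty list admits only the listed addresses. The sketch below only models that documented behavior (the real check runs on the 80legs side); `IpRestriction` is a hypothetical name, and exact string matching of addresses is an assumption.

```java
import java.util.Set;

/** Hypothetical model of the documented IP restriction rule. */
final class IpRestriction {
    private IpRestriction() {}

    /** An empty authorized list means the feature is disabled, so every caller passes. */
    static boolean isAllowed(Set<String> authorizedIps, String callerIp) {
        return authorizedIps.isEmpty() || authorizedIps.contains(callerIp);
    }
}
```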

 

Basic Terminology

Some important terms for the API are below.  If you are familiar with the features available on the portal, you may skip this section.

 

Term
Description
Crawl Job The basic entity in 80legs.  To use 80legs, you must create jobs, configure their settings, and run them.  Jobs have four categories of settings: general settings, crawl settings, analysis settings, and result settings.
Environment The environment in which your job will run.  80legs provides a 'sandbox' environment and a 'live' environment.
  • Sandbox environment - Lets you test your jobs in a more limited environment by running them on a small number of sample pages at no cost.
  • Live environment - Runs your job normally at regular pricing.
Frequency Refers to how often you want your job to repeat. Available options are:
  • Do Not Repeat - Your job will only run once.
  • Repeat Daily - Your job will run on a daily basis.
  • Repeat Weekly - Your job will run on a weekly basis.
  • Repeat Monthly - Your job will run on a monthly basis.

 

For repeating options, you can choose a start date, end date, and frequency interval.  If you set 'Do Not Repeat', the job will be queued to start immediately.  It will typically start within 2-3 minutes. 

 

The frequency interval sets how many frequency units pass between repetitions.  For example, if the frequency type is DAILY, an interval of 1 means the job repeats every day; with WEEKLY, an interval of 1 means it repeats every week.  The interval should never be set to 0.

Job Runs Each time a job executes and performs a crawl, it is referred to as a run.  A job can have one or more job runs.
Job Settings

The basic job object, containing the fields required to create a job.  Its settings fall into four categories (general, crawl, analysis and result) and cover information such as name, environment, frequency, start date, end date, Crawl Settings, Analysis Settings and Result Settings.

  • Crawl Settings - Settings that will determine the URLs that the crawl will use.
  • Analysis Settings - Settings that specify which pages are actually analyzed during the crawl.
  • Result Settings - Contains information about what type of results 80legs will generate and file size limits.
Job Overview

Provides details about the overall status of the job, the status of the latest Run and other statistical information.

  • Status - A job can be in the following states: Created, Completed, Canceled, Deleted and Error.  
  • Run Status - A job run can be in the following states: Not in queue, In Queue, Running, Completed, Canceled and Error.
Run Results

Provides information about the results of a specific run (a job can have more than one run).  Each run can produce two types of result files:

  • Crawled URLs - A list of the URLs the run crawled, along with additional meta information.
  • Analyzed URLs - A file containing the URLs and results of your analysis on pages your run analyzed.
Seed Lists You have a choice of either adding a list of URLs to a job or setting a Seedlist Id in a job.  The Seedlist Id refers to a previously uploaded seed list file.
Data File You can upload data files that are needed by your custom code.  Once a data file is uploaded to 80legs, you can use it in your jobs.  This is an option which can be set when running custom code.
Analysis Method

Specify what type of analysis you want to run.  Available options are:

  • Regular expression list - 80legs will match the content of pages using a list of regular expressions you provide.
  • Keyword list - 80legs will match the content of pages using a list of keywords you provide.
  • Code - 80legs will run your custom analysis code on web content and return the results of that code.

Crawl Type

(Added in 0.90 Release)

Specifies what type of crawl is going to be run. Available options are:

  • Fast - The "Fast Crawl" option increases the speed of a crawl by crawling multiple depth levels at a time. 
  • Comprehensive - The "Comprehensive" option ensures that all pages are crawled in each depth level and is faster than the "True Breadth-First Crawl" option. 
  • Breadth-First - The "True Breadth-First Crawl" option can be very slow as it waits for each depth level to complete before proceeding.

Outgoing Links to Crawl

(Added in 0.90 Release)

(Updated August 2010)

80legs crawls by following the outgoing links it finds on URLs from your seed list. You can configure which outgoing links 80legs should follow as it is crawling. Available options are:

  • CRAWL_ALL_LINKS - Crawl all links found
  • LINKS_FROM_SAME_PARENT_DOMAIN - Crawl links from the same parent domain for each URL in my seed list - When crawling, 80legs will start with a URL in your seed list and follow links that are from the same parent domain as that URL. This will be done separately for each URL in your seed list. Example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the parent domain "msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the parent domain "yahoo.com".
  • LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN - Crawl links from the same fully qualified domain for each URL in my seed list and treat "www.domain.com" and "domain.com" as the same domains - When crawling, 80legs will start with a URL in your seed list and follow links that are from the same fully qualified domain as that URL.  This will be done separately for each URL in your seed list. Example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the fully qualified domain "test1.msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the fully qualified domain "test2.yahoo.com".  www.domain.com and domain.com will be treated as the same domain.
  • LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN_WITH_RESTRICTED_HOST - Crawl links from the same fully qualified domain for each URL in my seed list and treat "www.domain.com" and "domain.com" as different domains - When crawling, 80legs will start with a URL in your seed list and follow links that are from the same fully qualified domain as that URL.  This will be done separately for each URL in your seed list. Example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the fully qualified domain "test1.msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the fully qualified domain "test2.yahoo.com".  www.domain.com and domain.com will be treated as different domains.

Queue Number

(Added in 0.91 Release)

When a job is created, 80legs places it in a queue, after which the job runs.  If other items are already waiting in the queue, 80legs assigns each queue item a queue number based on the order in which it was created.  If no items are waiting to be processed, the value is simply 0.
Crawl Packages Crawl packages are pre-built crawls, designed and set up by the 80legs team, that crawl specific domains and retrieve specific information from those domains.  Crawl packages typically crawl websites with significant amounts of data, such as e-commerce sites, social networks, real estate sites, blogs and so on.  You can access the crawl package jobs using the API.
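The four Outgoing Links to Crawl options described above can be modeled as a host comparison between each seed URL and a candidate link. This is a minimal sketch, not 80legs' implementation: `OutgoingLinkScope` and `LinkFilter` are hypothetical, and the parent domain is naively taken as the last two host labels, which is wrong for suffixes such as co.uk (a production filter would consult the Public Suffix List).

```java
import java.net.URI;
import java.util.Locale;

/** Local stand-in using the documented option names; not the API's OutgoingLinkType enum. */
enum OutgoingLinkScope {
    CRAWL_ALL_LINKS,
    LINKS_FROM_SAME_PARENT_DOMAIN,
    LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN,
    LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN_WITH_RESTRICTED_HOST
}

final class LinkFilter {
    private LinkFilter() {}

    /** Decides whether a link found while crawling from seedUrl would be followed. */
    static boolean shouldFollow(OutgoingLinkScope scope, String seedUrl, String linkUrl) {
        if (scope == OutgoingLinkScope.CRAWL_ALL_LINKS) {
            return true;
        }
        String seedHost = host(seedUrl);
        String linkHost = host(linkUrl);
        switch (scope) {
            case LINKS_FROM_SAME_PARENT_DOMAIN:
                return parentDomain(seedHost).equals(parentDomain(linkHost));
            case LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN:
                // "www.domain.com" and "domain.com" count as the same domain.
                return stripWww(seedHost).equals(stripWww(linkHost));
            case LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN_WITH_RESTRICTED_HOST:
                // Hosts must match exactly, so a "www." prefix makes a difference.
                return seedHost.equals(linkHost);
            default:
                throw new IllegalArgumentException("unknown scope: " + scope);
        }
    }

    private static String host(String url) {
        String h = URI.create(url).getHost();
        if (h == null) {
            throw new IllegalArgumentException("no host in " + url);
        }
        return h.toLowerCase(Locale.ROOT);
    }

    private static String stripWww(String host) {
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    // Naive: keeps the last two labels ("test1.msn.com" -> "msn.com").
    private static String parentDomain(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }
}
```

With the seed http://test1.msn.com from the examples above, a link to http://news.msn.com/ is followed under LINKS_FROM_SAME_PARENT_DOMAIN but not under either fully qualified domain option.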

 

Job Object Mapping


 

Note: Job is a conceptual object that is not returned in the current version of the API.

 

Objects


 The following objects are specific to 80legs:

 

JobSetting

Fields
Required Description 
id   Identifier for the job
name x The name of your job.  Can be at most 256 characters.
environmentType x The environment in which your job will run.  80legs provides a 'sandbox' environment and a 'live' environment. Use EnvironmentType enum to set to "Live" or "Sandbox".
frequencyType x How often do you want the job to repeat.  Use FrequencyType enum to set to "Does not repeat", "Daily", "Weekly" or "Monthly".
frequencyInterval   Optional. For repeating options, you can add a frequency interval. If you choose 'Does not repeat', the job interval is set to -1. In all other cases, the job interval should be greater than or equal to 1. For example, if the frequency type is DAILY, an interval of 1 means the job repeats every day; with WEEKLY, it means the job repeats every week.   The interval should never be set to 0. Use setFrequencyInterval() to set the interval.
startDate   Calendar type
endDate   Calendar type.  This is the date when the job repetition should finish.  If the frequencyType is "Does not repeat", then the endDate is equal to the startDate.
crawlSetting x See CrawlSetting object
analysisSetting x See AnalysisSetting object
resultSetting x See ResultSetting object
initialCreditReserved   How much of the account credit is reserved for the job after it is created.
user   See User object
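To see how frequencyType, frequencyInterval, startDate and endDate combine, the sketch below lists the dates a job would run on. It is an illustration under assumptions, not 80legs' scheduler: `RepeatType` is a hypothetical stand-in for the API's FrequencyType, it uses java.time.LocalDate rather than Calendar, and it assumes each repetition is counted from startDate and that a run falling on endDate is included.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the FrequencyType options. */
enum RepeatType { DO_NOT_REPEAT, DAILY, WEEKLY, MONTHLY }

final class RunSchedule {
    private RunSchedule() {}

    /** Lists the dates a job would run on, from start through end inclusive. */
    static List<LocalDate> runDates(RepeatType type, int interval, LocalDate start, LocalDate end) {
        List<LocalDate> dates = new ArrayList<>();
        if (type == RepeatType.DO_NOT_REPEAT) {
            dates.add(start); // runs once; the -1 interval and the end date are ignored
            return dates;
        }
        if (interval < 1) {
            throw new IllegalArgumentException("repeating jobs need an interval >= 1");
        }
        LocalDate next = start;
        for (long n = 1; !next.isAfter(end); n++) {
            dates.add(next);
            // Offset from start each time, so monthly runs do not drift after a short month.
            next = offset(type, start, n * interval);
        }
        return dates;
    }

    private static LocalDate offset(RepeatType type, LocalDate start, long amount) {
        switch (type) {
            case DAILY:   return start.plusDays(amount);
            case WEEKLY:  return start.plusWeeks(amount);
            case MONTHLY: return start.plusMonths(amount);
            default:      throw new IllegalArgumentException("not a repeating type: " + type);
        }
    }
}
```

For example, WEEKLY with an interval of 2 from 2010-08-02 to 2010-08-31 yields 2010-08-02, 2010-08-16 and 2010-08-30, while an interval of 0 is rejected.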

 

 

CrawlSetting

Fields
Required Description 
id   int.  Identifies the crawl settings.
seedList or seedListId x ArrayList<String> or int.  These are the URLs from which your crawl will start. Add seed URLs in seed list Array or set seedListId. seedListId is for an already uploaded text file.
 
outgoingLinkToCrawl x

enum OutgoingLinkType.  Default is CRAWL_ALL_LINKS. The other options are LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN and LINKS_FROM_SAME_PARENT_DOMAIN.

 

LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN = When crawling, 80legs will start with a URL in your seed list and follow links that are from the same fully qualified domain as that URL. This will be done separately for each URL in your seed list. For example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the fully qualified domain "test1.msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the fully qualified domain "test2.yahoo.com".

 

LINKS_FROM_SAME_PARENT_DOMAIN = When crawling, 80legs will start with a URL in your seed list and follow links that are from the same parent domain as that URL. This will be done separately for each URL in your seed list. For example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the parent domain "msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the parent domain "yahoo.com".

crawlRegularExpression   String type. Use this field to specify which outgoing links you want to crawl from a page.  If outgoing link to crawl is set to CRAWL_ALL_LINKS, the crawl regular expression is optional.  If it is set to LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN or LINKS_FROM_SAME_PARENT_DOMAIN, the crawl regular expression will not be used. 
mimeTypeList x ArrayList<String> type. Add MIME types to this list; "text" is required if the analysis method is "Regular Expression List" or "Keyword List".  See http://en.wikipedia.org/wiki/Mime_type
isPreservingQueryStringWhenCrawling   Boolean type. Default is true. Setting to false will remove the query strings from URLs. (This is not available in the portal)
maxNumberOfUrls x Must be at least 1 and cannot be more than 10,000,000 URLs.  A maximum of 100 is allowed if the environment is Sandbox.
maxNumberOfUrlsPerPage x  
depthLevel x Use this field to specify how deep you want to crawl.   Must be greater than or equal to 0.
crawlType x Specifies whether the crawl type is FAST, COMPREHENSIVE or BREADTH_FIRST (Included in the new release of portal and API as of August 25th).
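The numeric and MIME type rules in the CrawlSetting table can be checked on the client before a job is submitted. A minimal sketch; `CrawlLimits` is hypothetical, it assumes the "text" requirement means a literal "text" entry in mimeTypeList, and the service remains the authority on what it accepts.

```java
import java.util.List;

/** Hypothetical pre-submission check for the documented CrawlSetting limits. */
final class CrawlLimits {
    static final int SANDBOX_MAX_URLS = 100;
    static final int MAX_URLS = 10_000_000;

    private CrawlLimits() {}

    /**
     * listAnalysis is true when the analysis method is "Regular Expression List"
     * or "Keyword List", which require "text" in mimeTypeList.
     */
    static boolean isValid(boolean sandbox, int maxNumberOfUrls, int depthLevel,
                           boolean listAnalysis, List<String> mimeTypeList) {
        int cap = sandbox ? SANDBOX_MAX_URLS : MAX_URLS;
        boolean urlsOk = maxNumberOfUrls >= 1 && maxNumberOfUrls <= cap;
        boolean depthOk = depthLevel >= 0;
        boolean mimeOk = !listAnalysis || mimeTypeList.contains("text");
        return urlsOk && depthOk && mimeOk;
    }
}
```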

 

 

AnalysisSetting

Fields
Required Description 
analysisRegularExpression   You can choose to only process documents with URLs that match a specific regular expression.  It specifies which pages are actually analyzed during your crawl: the crawl regular expression tells 80legs which pages to crawl, while this expression tells it which pages to run your analysis on.
 
mimeTypeList x This option allows you to specify exactly which types of pages you would like to run an analysis on.  If a page is not one of the specified MIME types, then it is skipped. Add MIME type to the Mime type list - "text" required if Analysis method is "Regular Expression List" or "Keyword List".
analysisMethod x The type of analysis to run. Set it using the Analysis Method enum to "Regular Expression List", "Keyword List", "Code" or "EIGHTY_APP".
analysisMethodList   Required IF either "Regular Expression List" or "Keyword List" is set for the Analysis Method.
codeId   Required IF analysisMethod is set to "Code". This is the code Id for a JAR file created by you that will be used to perform the analysis.  Default is 0.
dataId   The identifier for the data file that will be used during the analysis. Default is 0.
eightyAppVersionId   Required IF analysisMethod is set to EIGHTY_APP.

 

 

ResultSetting

Fields
Required Description 
hasCrawlUrlsInResult   boolean Type.  If true, 80legs will generate two sets of result files. The first set will contain results for pages that were analyzed. The second set will contain the URLs of pages that were crawled, but not analyzed. If false, 80legs will only generate result files that contain the URLs of pages that were analyzed. Default is true.
resultType x

Use the JobResultType enum.  If either "Regular Expression List" or "Keyword List" is set for the Analysis Method, set it to "Count Array", "Boolean Array" or "Unique or Total Count"; if "Code" is set for the Analysis Method, set it to "Code Result". 

  • Unique and total count - 80legs outputs the number of unique matches and the total number of matches for your content selection strings (i.e., keywords or regular expressions)
  • Boolean array - 80legs outputs the two numbers above plus a 1 or 0 for each string, depending on whether or not that string was found
  • Count array - 80legs outputs the unique and total count plus the total count for each string
  • Code results - If you select to analyze content using code, result type will default to this option
maxResultFileSizeInMB x The maximum size of each result file, in MB (unzipped).  Must be between 10 and 100.
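The difference between the keyword-list result types is easiest to see on one page. The sketch below is an illustration, not 80legs' analysis code: it assumes case-sensitive, non-overlapping substring matching and reads "unique matches" as the number of distinct keywords found at least once; `KeywordResult` is a hypothetical name.

```java
import java.util.List;

/** Hypothetical per-page summary mirroring the keyword-list result types. */
final class KeywordResult {
    final int uniqueCount;   // distinct keywords found at least once (assumed meaning)
    final int totalCount;    // occurrences of all keywords combined
    final int[] countArray;  // occurrences of each keyword, in list order

    private KeywordResult(int uniqueCount, int totalCount, int[] countArray) {
        this.uniqueCount = uniqueCount;
        this.totalCount = totalCount;
        this.countArray = countArray;
    }

    static KeywordResult analyze(String page, List<String> keywords) {
        int[] counts = new int[keywords.size()];
        int unique = 0;
        int total = 0;
        for (int i = 0; i < counts.length; i++) {
            counts[i] = occurrences(page, keywords.get(i));
            total += counts[i];
            if (counts[i] > 0) {
                unique++;
            }
        }
        return new KeywordResult(unique, total, counts);
    }

    /** Boolean array: 1 if the keyword was found on the page, otherwise 0. */
    int[] booleanArray() {
        int[] flags = new int[countArray.length];
        for (int i = 0; i < flags.length; i++) {
            flags[i] = countArray[i] > 0 ? 1 : 0;
        }
        return flags;
    }

    // Case-sensitive, non-overlapping substring count.
    private static int occurrences(String text, String keyword) {
        if (keyword.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int at = text.indexOf(keyword); at >= 0; at = text.indexOf(keyword, at + keyword.length())) {
            count++;
        }
        return count;
    }
}
```

For the page "crawl the web, crawl it fast" and the keywords crawl, web and slow, Unique and total count would report 2 and 3, Boolean array would add 1, 1, 0, and Count array would add 2, 1, 0.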

 

 

JobSummary

Fields
Required Description 
id   Identifies the job.
name   The name of the job.
environmentType   The environment in which your job will run.  80legs provides a 'sandbox' environment and a 'live' environment. It is of EnvironmentType enum type.  Possible values are "Live" or "Sandbox".
frequencyType   How often the job repeats. FrequencyType enum type = "Does not repeat", "Daily", "Weekly" or "Monthly".
frequencyType > interval   The frequency interval, available from the getInterval() method of the FrequencyType.
 
startDate   Calendar type. The date/time the job was created.
endDate   Calendar type.  This is the date when the job repetition should finish.  If the frequencyType is "Does not repeat", then the endDate is equal to the startDate.
jobStatus   Provides the current status of the job (JobStatusType enum type = Created, Completed, Cancelled, Deleted and Error)
user   A User object containing the ID of the user who created the job.
activeRuns   Number of job runs that are active
completedRuns   Number of job runs that have completed
jobErrorList   Job error list.  Contains the error message if the jobStatus = "ERROR"
latestRunStatusType   The status of the latest run that was completed for the job.

 

 

JobOverview

Fields
Required Description 
id   Identifies the job
jobStatus   Provides the current status of the job (JobStatusType enum type = Created, Completed, Cancelled, Deleted and Error)
latestRunStatusType   Provides the latest run status of the job (JobRunStatusType enum type = In Queue, Cancelled, Running, Completed, Error or Unknown)
latestRunQueueNumber   The queue number is only relevant when the latest run status is In Queue (not yet started). If the status is 1 (running) or 2 (completed), the queue number will be 0.
numRunsCompleted   Total number of runs completed for the job.
amountChargedAllRuns    
initialCreditReserved   This is the amount reserved from your account balance at the start of each run.  It is reflected in your available balance.  This reserved amount is an estimate and is calculated using the "Max number of URLs" field specified at the time of job creation - (double)
latestThrottledDomains   Provides information about the latest domains that were throttled.
totalPagesCrawled   Total pages crawled
totalPagesNeeded   Total number of pages needed
totalPagesProcessed   Total number of pages analyzed
totalSecondCpu   Total CPU hours used
totalSecondLeft   Total CPU hours left
totalGeneralExceptionsParse    
totalSecurityExceptionsParse    
totalGeneralExceptionsProcess    
totalSecurityExceptionsProcess  
 
totalExceptionsLoadCode   The number of times load code errors occurred.  A load code error happens when a node we send work to is unable to load the 80app being used.  For most apps this will be insignificant, but larger apps will have more of these errors.
startDate   Calendar type.  The date/time when the job was created.
endDate   Calendar type.  This is the date when the job repetition should finish.  If the frequencyType is "Does not repeat", then the endDate is equal to the startDate.
jobErrorList   Job Error List.  Contains error message if the jobStatus = "ERROR"
latestRunResultsPosted   Indicates whether all the results have been posted for the latest run.

 

 

JobRun

Fields
Required Description 
id   Identifier for the job run
dateCreated   Date the job run was created after you submitted the job.
 
dateStarted (new)   Date the job run actually started running. 
dateCompleted   Date the job run was completed. This is set when the job completes successfully.
dateEnded (new)   Date the job stopped running (whether it was canceled, deleted, or completed successfully). 
runStatus   Provides the run status of the job (JobRunStatusType enum type = In Queue, Cancelled, Running, Completed, Error or Unknown)
numberPagesCrawled   Number of pages crawled during the job run.  This continues to change until the job is completed.
numberPagesNeeded   Number of pages needed during the job run.  This continues to change until the job is completed.
numberPagesProcessed   Number of pages processed during the job run.  This continues to change until the job is completed.
numSecondCpu   Number of CPU seconds used
numGeneralExceptionsParse   This keeps track of the number of times the parseLinks() method threw a general Exception.
numSecurityExceptionsParse   This keeps track of the number of times the parseLinks() method threw a SecurityException. This is usually because the parseLinks() method attempted to do something that is not allowed, such as accessing the disk or making network requests.
numGeneralExceptionsProcess   This keeps track of the number of times the processDocument() method threw a general Exception.
numSecurityExceptionsProcess   This keeps track of the number of times the processDocument() method threw a SecurityException. This is usually because the processDocument() method attempted to do something that is not allowed, such as accessing the disk or making network requests.
numExceptionsLoadCode   Number of load code exceptions.  A load code error happens when a node we send work to is unable to load the 80app being used.  For most apps this will be insignificant, but larger apps will have more problems.
numConstructorTimeoutGood   Number of good constructor timeouts
numConstructorTimeoutBad   Number of bad constructor timeouts
numComputeTimeoutGood   Number of good compute timeouts. This means your code was taking too long to run, but we were able to stop the process.
numComputeTimeoutBad   Number of bad compute timeouts. This means your code was taking too long and we had to kill the JVM.
numSecondsLeft   Number of seconds left
throttledDomains   Domains that were throttled.  An entry such as "domain.com D0=04:51:46" basically means "your jobs are currently throttled on domain.com at this depth level".
pluraVersionGuid   PluraVersion GUID.  This is needed for getting standard output files for jobs created for sandbox server.
runResults   ArrayList of RunResult objects
contructorTimeouts   Constructor timeout status. Possible values "OK" or "BAD"
computeTimeouts   Compute timeout status. Possible values "OK" or "BAD"
allTimeouts   Overall timeouts based on values of constructor and compute timeouts.
queueNumber   The queue number is only relevant when the run status is "In Queue" (not yet started). If the status is 1 (running) or 2 (completed), the queue number will be 0.

donePostingResult   Indicates whether all results have been posted.  It may take up to 10 minutes for all of the results to be posted after a job completes; previously, there was no way to determine when that had happened.  If this field is set to 1, all the results have been posted for the job after it has completed.  Exception: if a job has 0 pages crawled (e.g., if it is canceled before it starts), the donePostingResult flag will *never* be set, because there will not be a result file.
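Because results can trail job completion by up to 10 minutes, a client typically polls the run until donePostingResult is 1, and must also stop for runs that ended with 0 pages crawled, since the flag is never set for them. A minimal sketch under assumptions: `RunSnapshot` is a hypothetical holder for the JobRun fields involved (with `ended` standing in for a set dateEnded), and `fetch` stands in for whatever API call retrieves the current JobRun.

```java
import java.util.function.Supplier;

/** Hypothetical snapshot of the JobRun fields this check needs. */
final class RunSnapshot {
    final boolean ended;            // true once dateEnded is set
    final long numberPagesCrawled;
    final int donePostingResult;    // 1 when all results are posted

    RunSnapshot(boolean ended, long numberPagesCrawled, int donePostingResult) {
        this.ended = ended;
        this.numberPagesCrawled = numberPagesCrawled;
        this.donePostingResult = donePostingResult;
    }
}

final class ResultWaiter {
    enum Outcome { POSTED, NO_RESULTS, TIMED_OUT, INTERRUPTED }

    private ResultWaiter() {}

    /** Polls up to maxPolls times, sleeping pollMillis between polls. */
    static Outcome waitForResults(Supplier<RunSnapshot> fetch, long pollMillis, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            RunSnapshot run = fetch.get();
            if (run.donePostingResult == 1) {
                return Outcome.POSTED;
            }
            if (run.ended && run.numberPagesCrawled == 0) {
                return Outcome.NO_RESULTS; // the flag will never be set: no result file exists
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Outcome.INTERRUPTED;
            }
        }
        return Outcome.TIMED_OUT;
    }
}
```

A long poll interval (for example, 30 seconds) keeps polling far below the 3-requests-per-second throttle.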

 

 

RunResult

Fields
Required Description 
id   Identifies the run result.
 
resultFileType   The file can contain information about the URLs that were crawled or analyzed for the job.
chunkNumber   Results are split into multiple files, numbered sequentially by this chunk number.  The size of each file is limited by the maximum result file size (maxResultFileSizeInMB) in ResultSetting.
resultFileName   The actual result file name.
resultLocation   The location where the result file can be found.
zippedFileSizeInBytes   Lets the user see the file size before downloading the file.
pluraVersionGuid   This can be used to get standard output information for jobs run in the Sandbox environment.
resultStatus   This specifies whether the result file has been CREATED or DELETED.  The result files will be deleted after a certain time based on the 80legs Storage Policy.  (Included in the new release of API as of August 25th).
fileCreationTime   The date/time the result file was zipped and posted.
noOfTimesDownloaded   The number of times the result has been downloaded.  Each result can be downloaded a maximum of 5 times. 
crawlPackageId   If the run result is for a crawl package job, this field has the crawl package id. 
eightyAppVersionId   If the run result is for a job that used an 80app, this field keeps track of the 80app version.  This is useful later for post-processing the results.
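When fetching results, a client usually wants the chunks that still exist and have downloads left, in chunk order. A minimal sketch, assuming the list holds the RunResult entries of one result file type (crawled or analyzed); `ResultChunk` and `ResultPicker` are hypothetical stand-ins that copy only the fields used here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical copy of the RunResult fields used below. */
final class ResultChunk {
    final int chunkNumber;
    final String resultStatus;       // "CREATED" or "DELETED"
    final int noOfTimesDownloaded;

    ResultChunk(int chunkNumber, String resultStatus, int noOfTimesDownloaded) {
        this.chunkNumber = chunkNumber;
        this.resultStatus = resultStatus;
        this.noOfTimesDownloaded = noOfTimesDownloaded;
    }
}

final class ResultPicker {
    static final int MAX_DOWNLOADS = 5;

    private ResultPicker() {}

    /** Keeps files that still exist and have downloads left, ordered by chunk number. */
    static List<ResultChunk> downloadable(List<ResultChunk> results) {
        List<ResultChunk> out = new ArrayList<>();
        for (ResultChunk r : results) {
            if ("CREATED".equals(r.resultStatus) && r.noOfTimesDownloaded < MAX_DOWNLOADS) {
                out.add(r);
            }
        }
        out.sort(Comparator.comparingInt(r -> r.chunkNumber));
        return out;
    }
}
```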

 

 

CodeFile

Fields
Required Description 
id   Identifier for code file
userId   Identifies the user who uploaded the code file.
 
name x Code name which is unique to the user
dateUploaded   Date code was uploaded
sha1Value   Hash key value of file
fileSizeInBytes   File size in bytes
fileName x The name of the actual jar file that was uploaded.
codeStatus   Provides the status of the code.  Uses CodeStatusType = Awaiting Approval, Approved, Denied, Unknown.
codeErrors   List of code errors that happened while running the code approval process.
maxNodeHeapSpaceMB   The max node heap size required for the code to run.  This is specified by the user during the code upload process.
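You can compute a JAR's hash locally and compare it with sha1Value to confirm that the uploaded file matches your build. This sketch assumes, from the field name, that sha1Value is a SHA-1 digest written as hexadecimal; check a real CodeFile before relying on that format. `Sha1` is a hypothetical helper; MessageDigest is the standard JDK API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: lowercase hex SHA-1 of bytes or of a file on disk. */
final class Sha1 {
    private Sha1() {}

    static String hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Reads the whole file (fine for typical JAR sizes) and hashes it. */
    static String ofFile(Path file) throws IOException {
        return hex(Files.readAllBytes(file));
    }
}
```

Compare the result with the reported sha1Value using equalsIgnoreCase, in case the service uses uppercase hex.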

 

 

DataFile

Fields
Required Description 
id   Identifier for the data file
userId   Identifies the user who uploaded the data file.
 
name x Data name which is unique to the user
fileName x The name of the actual file that was uploaded
fileSizeInBytes   File size in bytes
dateCreated   The date the data file was uploaded
isDeleted   Indicates whether the data file has been deleted.

 

 

SeedlistFile

Fields
Required Description 
id   Identifier for the seed list file
userId   Identifies the user who uploaded the seed list file.
 
name x Name which is unique to the user
fileName x The name of the actual file that was uploaded
fileSizeInBytes   File size in bytes 
dateCreated   When the seed list file was uploaded
isDeleted   Indicates whether the seed list file has been deleted.

 

 

AccountBalance

Fields
Required Description 
currentBalance   double Type.  The current balance of the account.
availableBalance   double Type.  This is the current balance minus the amount reserved for jobs and minus the amount currently requested for withdrawal.  When a job is created, an initial amount is reserved for the job to ensure that there is enough credit to allow that job to run to completion.  This initial reserved amount is only an estimate.  The actual charge for the job will be subtracted from the current balance after the job starts running.
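The availableBalance definition above is a subtraction; for example, a current balance of 100.00 with jobs reserving 12.50 and 7.25 and a pending withdrawal of 20.00 leaves 60.25 available. The sketch reproduces that arithmetic with BigDecimal, which avoids the rounding error of double for money (the API itself reports these fields as double); `Balance.available` is a hypothetical helper.

```java
import java.math.BigDecimal;
import java.util.List;

/** Hypothetical helper reproducing the availableBalance arithmetic. */
final class Balance {
    private Balance() {}

    static BigDecimal available(BigDecimal currentBalance, List<BigDecimal> jobReserves,
                                BigDecimal pendingWithdrawal) {
        BigDecimal reserved = BigDecimal.ZERO;
        for (BigDecimal r : jobReserves) {
            reserved = reserved.add(r); // initial amounts reserved for created jobs
        }
        return currentBalance.subtract(reserved).subtract(pendingWithdrawal);
    }
}
```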

 

User

Fields
Required Description 
email   The e-mail of the User.
firstName   The first Name of the User.
lastName   The last Name of the User.
address   The address of the User.
city   The city of the User.
state   The state abbreviation.
postalCode   The postal code in string format.
country   The country of the User.
primaryNumber   The primary number of the User that was provided.
secondaryNumber   The secondary number of the User.

 

EightyApp

Fields
Required Description 
name   The name of the 80app.
author   The organization or the author who wrote the 80app.
description   A description of what the 80app is intended for.
instructions   Instructions on how to use the 80app properly.
data   Data input that the author may provide, which can be used if no user input data is required.
requiresUserInputData   If this is true, the user is required to provide a data input. If not, the author may have a data input file that can be used.
visibility   Visibility enum. Only public 80apps are available through the app store. A PRIVATE 80app will not be available for purchase from the marketplace; it will only be available for the developer to use for testing purposes. This is useful for developers.
url   The url that may have more information about the 80app.
supportedStatus   Information whether the 80app is supported by the author or is deprecated.
releaseDate   The date/time the 80app was released to the public.
earnings   Provides information about how much the 80app has earned the developer (this information is not available through the API).
earningsByUsage   Provides information about how much the developer has earned through app usage (this information is not available through the API).
earningsByPurchases   Provides information about how much the developer has earned through app purchases (this information is not available through the API).
latestVersion   The latest version that is released.
includedInMyAppPack   Indicates that the eighty app is part of an App pack that is available to the user as part of their SLA Plan.
initialOneTimePrice   Initial purchase price of the 80app. Once purchased, the 80app will be available in the Analysis Settings > Analysis to Run > 80apps section of the Job Form.
pricePerUseFixed   This is a fixed price that will be charged each time you create a job using the 80app.
pricePerUsePerMpa   This is the price that will be charged based on the number of pages analyzed during a job run.
postProcessingCodeList   List of PostProcessingCodeFile objects that can be used to post-process the 80app's results. If there is more than one post-processing code file, at least one must be the default.  This list only contains information about the code and the id.
category   The category the 80app belongs to in the Marketplace: Business, Real Estate, Retail & Shopping or Social Networks.

 

Crawl Package

Fields
Required Description 
id   Identifier for the crawl package
userId   This is not provided through the API.
name   The Name of the Crawl package.
price   The price for the crawl package.
SLAPrice   The price for the crawl package if you are subscribed to an SLA Plan.
subscriptionDate   The date you subscribed to the crawl package.
subscriptionDeletedDate   The date your subscription was canceled.
category   The category assigned to the crawl package.
dateReleased   The date/time the crawl package was released.
updateFrequency   How often the crawl package is updated.
averageUnzippedFilesizeInMB   The size of each result file (unzipped).
averageZippedFilesizeInMB   The zipped file size of each zip file.
summary   This contains information such as the total result file size in MB, total number of files and the last time the crawl package was updated.

 

 

FAQ


Check out the FAQ for answers to a wide variety of questions.

 

Exception Handling


Exception handling in Java is done through a try-catch-finally block.  The "try" block contains the code that may throw an exception.  The "catch" block contains the code that executes after an exception is thrown in the try block; it is often used to display an error message or to mitigate problems caused by the exception.  The "finally" block runs whether or not an exception was thrown, and is typically used to release resources.

 

The API throws the following exceptions:

  •     EightyLegsFileNotFoundException - This exception is thrown when a file is not found.
  •     EightyLegsIOException - This exception is thrown when there is an error during file processing.
  •     EightyLegsParameterException - This exception is thrown when a parameter is of an incorrect type.
  •     EightyLegsXmlException - This exception is thrown when the system has a problem parsing the XML response from the internal web service.
  •     EightyLegsException - This exception is thrown when the API receives an exception from the internal web service.  The API creates and throws this exception.
  •     EightyLegsConnectionException - This exception is thrown when the API cannot connect to the internal web service.
  •     EightyLegsAPIException - This exception is thrown when the API incurs an error and it does not belong to any of the others provided.

 

As of March 2010, all of the above exceptions share a parent exception class: EightyLegsCommonException.  Please get the latest version of the API. 
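The sketch below shows the try-catch-finally pattern with these exceptions: specific subclasses are caught first, and EightyLegsCommonException catches everything else from the API. So that the example compiles on its own, it declares minimal stand-in classes with the documented names; in real code, use the classes shipped with the 80legs client library instead. Whether those are checked exceptions (extending Exception) is an assumption here, and `ApiCall` and `ApiCallDemo` are hypothetical.

```java
/** Stand-ins with the documented names; replace with the 80legs library's own classes. */
class EightyLegsCommonException extends Exception {
    EightyLegsCommonException(String message) { super(message); }
}

class EightyLegsConnectionException extends EightyLegsCommonException {
    EightyLegsConnectionException(String message) { super(message); }
}

class EightyLegsParameterException extends EightyLegsCommonException {
    EightyLegsParameterException(String message) { super(message); }
}

/** Any API call that may fail with an 80legs exception (hypothetical). */
interface ApiCall {
    void run() throws EightyLegsCommonException;
}

final class ApiCallDemo {
    static int finishedCalls = 0; // incremented in finally, whether or not the call failed

    private ApiCallDemo() {}

    static String describe(ApiCall call) {
        try {
            call.run();
            return "ok";
        } catch (EightyLegsConnectionException e) {
            // The web service was unreachable; retrying later may succeed.
            return "retry: " + e.getMessage();
        } catch (EightyLegsParameterException e) {
            // A bad argument fails the same way every time; fix the request instead.
            return "fix request: " + e.getMessage();
        } catch (EightyLegsCommonException e) {
            // The parent class catches every other 80legs exception.
            return "failed: " + e.getMessage();
        } finally {
            finishedCalls++; // runs on every path, before the method returns
        }
    }
}
```

Subclasses must be caught before EightyLegsCommonException; the Java compiler rejects a catch clause for a subclass placed after its parent.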

 

Note:  If you see an error on this page, please let us know by submitting a ticket through the portal.  Thank you.

 

 

 

