80legs API
Last edited by Aliya 8 months, 1 week ago
Introduction
The 80legs API is a simple programmatic interface that allows you to access the functionality that 80legs provides outside of the web portal.
The 80legs API is currently available for the following programming languages: Java, .NET, and Python.
We expect to add support for other languages based on demand. If you would like to request support for another language, let us know by contacting us at support@80legs.com.
The API makes synchronous calls to our web service. The API creates a socket connection over port 80.
Getting Started
To get started with the 80legs API, follow these steps:
- Access the Web Portal - Access the 80legs web portal. If you don't have access, please contact us and we'll send you instructions.
- Retrieve your API Security Token - To use the 80legs API, you will need your API credentials. These credentials are sent to 80legs with each API call. To retrieve your credentials:
- In the web portal, select "My Accounts" from the top menu.
- Under "80legs Application and API Information", you should be able to see your security token. This token is unique to your account and it will be used to authenticate with the 80legs API.
- Read the "Important Concepts" Section Below - The Important Concepts section introduces the basic concepts you need to understand to communicate with the 80legs API.
- Write Code to Communicate with the API - Follow one of the links below for instructions on how to write code to communicate with the API:
- Java version
- .NET version
- Python version
- Add your IP Address(es) (Optional) - If you would like additional security, add your IP address to the list of "Authorized IP Address(es)." Only the IP addresses provided will be allowed to access the 80legs API. Configuring your IPs will add an extra level of security for your account.
Important Concepts
API Updates
The current version of the API is Version 1.0. A version number must be specified with each request made to the API. When there is a significant update to the API in the future, the version number will be changed and backwards compatibility will be maintained if possible.
Throttling
80legs throttles API requests so that the system is not overwhelmed. Request throttling is a common way to ensure QoS (Quality of Service) for network and application services. Currently, API requests are throttled to 3 requests per second per API client.
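A client can stay under the limit above by pacing its own calls. The following is a minimal sketch of a sliding-window limiter, not part of the 80legs API: the class name RequestThrottle and its methods are hypothetical, and the window size is passed in by the caller.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical client-side limiter: at most maxRequests calls per sliding window. */
final class RequestThrottle {
    private final int maxRequests;
    private final long windowNanos;
    private final Deque<Long> sentAt = new ArrayDeque<>();

    RequestThrottle(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowNanos = windowMillis * 1_000_000L;
    }

    /** Milliseconds to wait before a request may be sent at nowNanos (0 means send now). */
    synchronized long millisUntilAllowed(long nowNanos) {
        // Forget requests that have left the window.
        while (!sentAt.isEmpty() && nowNanos - sentAt.peekFirst() >= windowNanos) {
            sentAt.removeFirst();
        }
        if (sentAt.size() < maxRequests) {
            return 0L;
        }
        long waitNanos = windowNanos - (nowNanos - sentAt.peekFirst());
        return (waitNanos + 999_999L) / 1_000_000L; // round up to whole milliseconds
    }

    /** Records that a request was sent at nowNanos. */
    synchronized void record(long nowNanos) {
        sentAt.addLast(nowNanos);
    }

    /** Blocks until a request may be sent, then records it. */
    void acquire() throws InterruptedException {
        while (true) {
            long wait;
            synchronized (this) {
                long now = System.nanoTime();
                wait = millisUntilAllowed(now);
                if (wait == 0L) {
                    record(now);
                    return;
                }
            }
            Thread.sleep(wait);
        }
    }
}
```

With new RequestThrottle(3, 1000), calling acquire() before each API call keeps a single client at or below 3 requests per second.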
UTF-8 Character Encoding
The 80legs API assumes that all data is in Unicode, specifically, the Unicode (or UCS) Transformation Format, 8-bit encoding form (UTF-8). The API will always return data in UTF-8.
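In Java, relying on the platform default charset can silently produce non-UTF-8 bytes. A minimal sketch of explicit UTF-8 conversion using the standard library follows; the class name Utf8Sketch is illustrative only.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Sketch {
    /** Encodes text as UTF-8 bytes, independent of the platform default charset. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes, such as data returned by the API, into a String. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```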
Security
The 80legs API is protected to ensure that only authorized 80legs users use it. As mentioned earlier, there are two levels of security:
- Authentication - Each request must contain authentication credentials in the form of an API Token. This token is available in the web portal under Account > Settings > API Information.
- IP Address Restriction (Optional) - If an IP address is added to your account from the web portal, then this feature will become active. Only the authorized IP addresses will be able to access the API when this feature is active. The feature is automatically disabled when no IP addresses are configured for the account.
Basic Terminology
Some important terms for the API are below. If you are familiar with the features available on the portal, you may skip this section.
Term
|
Description
|
| Crawl Job |
The basic entity in 80legs. To use 80legs, you must create jobs, configure their settings, and run them. Jobs have four categories of settings: general, crawl, analysis, and result. |
| Environment |
The environment in which your job will run. 80legs provides a 'sandbox' environment and a 'live' environment.
- Sandbox environment - Lets you test your jobs in a more limited environment by running them on a small number of sample pages at no cost.
- Live environment - Runs your job normally at regular pricing.
|
| Frequency |
Refers to how often you want your job to repeat. Available options are:
- Do Not Repeat - Your job will only run once.
- Repeat Daily - Your job will run on a daily basis.
- Repeat Weekly - Your job will run on a weekly basis.
- Repeat Monthly - Your job will run on a monthly basis.
For repeating options, you can choose a start date, end date, and frequency interval. If you set 'Do Not Repeat', the job will be queued to start immediately. It will typically start within 2-3 minutes.
The frequency interval refers to how often the job is repeated. For example, if the frequency type is DAILY, an interval of 1 means the job repeats every day. For WEEKLY, an interval of 1 means the job repeats every week. The interval should never be set to 0.
|
| Job Runs |
Each time a job executes and performs a crawl, it is referred to as a run. A job can have one or more job runs. |
| Job Settings |
Basic job object that has the fields required to create a job. A job has four categories of settings. It contains information such as name, environment, frequency, start date, end date, Analysis Settings, Crawl Settings and Result Settings.
- Crawl Settings - Settings that will determine the URLs that the crawl will use.
- Analysis Settings - Settings that specify which pages are actually analyzed during the crawl.
- Result Settings - Contains information about what type of results 80legs will generate and file size limits.
|
| Job Overview |
Provides details about the overall status of the job, the status of the latest Run and other statistical information.
- Status - A job can be in the following states: Created, Completed, Canceled, Deleted and Error.
- Run Status - A job run can be in the following states: Not in queue, In Queue, Running, Completed, Canceled and Error.
|
| Run Results |
A job can have more than one run. This provides the information about the results for a specific run. There can be two types of result files for each run:
- Crawled URLs - A list of the URLs the run crawled, along with additional meta information.
- Analyzed URLs - A file containing the URLs and results of your analysis on pages your run analyzed.
|
| Seed Lists |
You have a choice of either adding a list of URLs to a job or setting a Seedlist Id in a job. The Seedlist Id refers to a previously uploaded seed list file. |
| Data File |
You can upload data files that are needed by your custom code. Once a data file is uploaded to 80legs, you can use it in your jobs. This is an option which can be set when running custom code. |
| Analysis Method |
Specify what type of analysis you want to run. Available options are:
- Regular expression list - 80legs will match the content of pages using a list of regular expressions you provide.
- Keyword list - 80legs will match the content of pages using a list of keywords you provide.
- Code - 80legs will run your custom analysis code on web content and return the results of that code.
|
|
Crawl Type
(Added in 0.90 Release)
|
Specifies what type of crawl is going to be run. Available options are:
- Fast - The "Fast Crawl" option increases the speed of a crawl by crawling multiple depth levels at a time.
- Comprehensive - The "Comprehensive" option ensures that all pages are crawled in each depth level and is faster than the "True Breadth-First Crawl" option.
- Breadth-First - The "True Breadth-First Crawl" option can be very slow as it waits for each depth level to complete before proceeding.
|
|
Outgoing Links to Crawl
(Added in 0.90 Release)
(Update August, 2010)
|
80legs crawls by following the outgoing links it finds on URLs from your seed list. You can configure which outgoing links 80legs should follow as it is crawling. Available options are:
- CRAWL_ALL_LINKS - Crawl all links found
- LINKS_FROM_SAME_PARENT_DOMAIN - Crawl links from the same parent domain for each URL in my seed list. When crawling, 80legs will start with a URL in your seed list and follow links that are from the same parent domain as that URL. This will be done separately for each URL in your seed list. Example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the parent domain "msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the parent domain "yahoo.com".
- LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN - Crawl links from the same fully qualified domain for each URL in my seed list and treat "www.domain.com" and "domain.com" as the same domain. When crawling, 80legs will start with a URL in your seed list and follow links that are from the same fully qualified domain as that URL. This will be done separately for each URL in your seed list. Example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the fully qualified domain "test1.msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the fully qualified domain "test2.yahoo.com". www.domain.com and domain.com will be treated as the same domain.
- LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN_WITH_RESTRICTED_HOST - Crawl links from the same fully qualified domain for each URL in my seed list and treat "www.domain.com" and "domain.com" as different domains. When crawling, 80legs will start with a URL in your seed list and follow links that are from the same fully qualified domain as that URL. This will be done separately for each URL in your seed list. Example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the fully qualified domain "test1.msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the fully qualified domain "test2.yahoo.com". www.domain.com and domain.com will be treated as different domains.
|
|
Queue Number
(Added in 0.91 Release)
|
When a job is created, 80legs puts the job in a queue, after which the job runs. If there are queue items waiting, 80legs assigns each queue item a queue number based on the order in which the items were created. If there are no waiting queue items to be processed, the value is 0. |
| Crawl Packages |
Crawl packages are pre-built crawls, designed and set up by the 80legs team, that crawl specific domains and retrieve specific information from those domains. Crawl packages typically crawl websites with significant amounts of data, such as e-commerce sites, social networks, real estate sites, blogs and so on. You can access the crawl package jobs using the API.
|
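The "Outgoing Links to Crawl" options described in the table above can be illustrated with a small sketch. This is not the 80legs implementation: the Scope enum is a local stand-in that only mirrors the option names, the parent domain is approximated as the last two host labels (which is wrong for suffixes such as "co.uk"), and domains are compared for equality rather than containment.

```java
import java.net.URI;
import java.util.Locale;

final class LinkScopeSketch {
    /** Local stand-in that mirrors the option names listed above. */
    enum Scope {
        CRAWL_ALL_LINKS,
        LINKS_FROM_SAME_PARENT_DOMAIN,
        LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN,
        LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN_WITH_RESTRICTED_HOST
    }

    static String host(String url) {
        String h = URI.create(url).getHost();
        if (h == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        return h.toLowerCase(Locale.ROOT);
    }

    /** Assumption: the parent domain is the last two labels of the host. */
    static String parentDomain(String host) {
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    static String stripWww(String host) {
        return host.startsWith("www.") ? host.substring(4) : host;
    }

    /** Decides whether a link found while crawling from seedUrl would be followed. */
    static boolean shouldFollow(Scope scope, String seedUrl, String linkUrl) {
        String seed = host(seedUrl);
        String link = host(linkUrl);
        switch (scope) {
            case CRAWL_ALL_LINKS:
                return true;
            case LINKS_FROM_SAME_PARENT_DOMAIN:
                return parentDomain(seed).equals(parentDomain(link));
            case LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN:
                // "www.domain.com" and "domain.com" count as the same domain.
                return stripWww(seed).equals(stripWww(link));
            case LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN_WITH_RESTRICTED_HOST:
                // "www.domain.com" and "domain.com" count as different domains.
                return seed.equals(link);
            default:
                throw new IllegalArgumentException("Unknown scope: " + scope);
        }
    }
}
```

For the seed http://test1.msn.com, a link to a different msn.com subdomain is followed under the parent-domain option but not under either fully-qualified-domain option.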
Job Object Mapping

Note: Job is a conceptual object that is not being returned in the current version of the API.
Objects
The following objects are specific to 80legs:
JobSetting
Fields
|
Required |
Description
|
| id |
|
Identifier for the job |
| name |
x |
The name of your job. Can be at most 256 characters. |
| environmentType |
x |
The environment in which your job will run. 80legs provides a 'sandbox' environment and a 'live' environment. Use EnvironmentType enum to set to "Live" or "Sandbox". |
| frequencyType |
x |
How often the job should repeat. Use FrequencyType enum to set to "Does not repeat", "Daily", "Weekly" or "Monthly". |
| frequencyInterval |
|
Optional. For repeating options, you can add a frequency interval. If you choose 'Does not repeat', the job interval is set to -1. In every other case, the job interval should be greater than or equal to 1. For example, if the frequency type is DAILY, an interval of 1 means the job repeats every day. For WEEKLY, an interval of 1 means the job repeats every week. The interval should never be set to 0. Use setFrequencyInterval() to set the interval.
|
| startDate |
|
Calendar type |
| endDate |
|
Calendar type. This is the date when the job repetition should finish. If the frequencyType is "Does not repeat", then the endDate is equal to the startDate.
|
| crawlSetting |
x |
See CrawlSetting object |
| analysisSetting |
x |
See AnalysisSetting object |
| resultSetting |
x |
See ResultSetting object |
| initialCreditReserved |
|
How much of the account credit is reserved for the job after it is created. |
| user |
|
See User object |
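The name and frequencyInterval rules in the table above can be checked on the client before a job is submitted. The sketch below is a hypothetical helper, not part of the 80legs API: the Frequency enum is a local stand-in for FrequencyType, and rejecting an empty name is an added assumption.

```java
final class FrequencySketch {
    /** Local stand-in for the FrequencyType values described above. */
    enum Frequency { DOES_NOT_REPEAT, DAILY, WEEKLY, MONTHLY }

    static final int MAX_NAME_LENGTH = 256;

    /** Returns the interval to store: -1 for non-repeating jobs, otherwise a value of at least 1. */
    static int normalizeInterval(Frequency type, int requested) {
        if (type == Frequency.DOES_NOT_REPEAT) {
            return -1;
        }
        if (requested < 1) {
            throw new IllegalArgumentException(
                "Interval for repeating jobs must be at least 1, got " + requested);
        }
        return requested;
    }

    /** Assumption: an empty name is invalid. The documented limit is 256 characters. */
    static boolean isValidName(String name) {
        return name != null && !name.isEmpty() && name.length() <= MAX_NAME_LENGTH;
    }
}
```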
CrawlSetting
Fields
|
Required |
Description
|
| id |
|
int. Identifies the crawl settings. |
| seedList or seedListId |
x |
ArrayList<String> or int. These are the URLs from which your crawl will start. Either add seed URLs to the seedList array or set seedListId, which refers to a previously uploaded seed list text file. |
| outgoingLinkToCrawl |
x |
enum OutgoingLinkType. Default is CRAWL_ALL_LINKS. The other options are LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN and LINKS_FROM_SAME_PARENT_DOMAIN.
LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN = When crawling, 80legs will start with a URL in your seed list and follow links that are from the same fully qualified domain as that URL. This will be done separately for each URL in your seed list. For example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the fully qualified domain "test1.msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the fully qualified domain "test2.yahoo.com".
LINKS_FROM_SAME_PARENT_DOMAIN = When crawling, 80legs will start with a URL in your seed list and follow links that are from the same parent domain as that URL. This will be done separately for each URL in your seed list. For example: If your seed list has two items, http://test1.msn.com and http://test2.yahoo.com, the crawl starting from http://test1.msn.com will only follow links that contain the parent domain "msn.com" and the crawl from http://test2.yahoo.com will only follow links that contain the parent domain "yahoo.com".
|
| crawlRegularExpression |
|
String type. Use this field to specify which outgoing links you want to crawl from a page. If outgoingLinkToCrawl is set to CRAWL_ALL_LINKS, the crawl regular expression is optional. If it is set to LINKS_FROM_SAME_FULLY_QUALIFIED_DOMAIN or LINKS_FROM_SAME_PARENT_DOMAIN, the crawl regular expression will not be used. |
| mimeTypeList |
x |
ArrayList<String> type. Add MIME types to this list. "text" is required if the Analysis method is "Regular Expression List" or "Keyword List". See http://en.wikipedia.org/wiki/Mime_type |
| isPreservingQueryStringWhenCrawling |
|
Boolean type. Default is true. Setting to false will remove the query strings from URLs. (This is not available in the portal) |
| maxNumberOfUrls |
x |
Max of 100 allowed if Environment is Sandbox. Cannot be more than 10,000,000 URLs. Must be at least 1. |
| maxNumberOfUrlsPerPage |
x |
|
| depthLevel |
x |
Use this field to specify how deep you want to crawl. Must be greater than or equal to 0. |
| crawlType |
x |
Specifies whether the crawl type is FAST, COMPREHENSIVE or BREADTH_FIRST (Included in the new release of portal and API as of August 25th). |
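The numeric limits in the table above lend themselves to a pre-submission check. The following is a sketch under stated assumptions: CrawlLimitsSketch and its validate method are hypothetical names, and only the limits for maxNumberOfUrls and depthLevel given above are covered.

```java
import java.util.ArrayList;
import java.util.List;

final class CrawlLimitsSketch {
    static final int SANDBOX_MAX_URLS = 100;
    static final int ABSOLUTE_MAX_URLS = 10_000_000;

    /** Returns a list of problems; an empty list means the values are within the documented limits. */
    static List<String> validate(boolean sandbox, int maxNumberOfUrls, int depthLevel) {
        List<String> errors = new ArrayList<>();
        if (maxNumberOfUrls < 1) {
            errors.add("maxNumberOfUrls must be at least 1");
        } else if (maxNumberOfUrls > ABSOLUTE_MAX_URLS) {
            errors.add("maxNumberOfUrls cannot be more than " + ABSOLUTE_MAX_URLS);
        } else if (sandbox && maxNumberOfUrls > SANDBOX_MAX_URLS) {
            errors.add("maxNumberOfUrls cannot be more than " + SANDBOX_MAX_URLS + " in the Sandbox environment");
        }
        if (depthLevel < 0) {
            errors.add("depthLevel must be greater than or equal to 0");
        }
        return errors;
    }
}
```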
AnalysisSetting
Fields
|
Required |
Description
|
| analysisRegularExpression |
|
You can choose to only process documents with URLs that match a specific regular expression. This field specifies which pages are actually analyzed during your crawl. The crawl regular expression tells 80legs which pages to crawl, while this expression tells 80legs on which pages to run your analysis. |
| mimeTypeList |
x |
This option allows you to specify exactly which types of pages you would like to run an analysis on. If a page is not one of the specified MIME types, then it is skipped. Add MIME type to the Mime type list - "text" required if Analysis method is "Regular Expression List" or "Keyword List". |
| analysisMethod |
x |
The type of analysis you want to run. Set using the Analysis Method enum to "Regular Expression List", "Keyword List", "Code" or "EIGHTY_APP". |
| analysisMethodList |
|
Required IF either "Regular Expression List" or "Keyword List" is set for the Analysis Method. |
| codeId |
|
Required IF analysisMethod is set to "Code". This is the code Id for a JAR file created by you that will be used to perform the analysis. Default is 0. |
| dataId |
|
The identifier for the data file that will be used during the analysis. Default is 0. |
| eightyAppVersionId |
|
Required IF analysisMethod is set to EIGHTY_APP. |
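Several fields in the table above are required only for certain analysis methods. The sketch below shows one way to check those conditional requirements. It is illustrative only: the Method enum is a local stand-in, and treating an id of 0 as "not set" is an assumption based on the documented default of 0 for codeId.

```java
import java.util.ArrayList;
import java.util.List;

final class AnalysisRulesSketch {
    /** Local stand-in for the analysis method options described above. */
    enum Method { REGULAR_EXPRESSION_LIST, KEYWORD_LIST, CODE, EIGHTY_APP }

    /** Returns the names of fields that are required for the method but not set. */
    static List<String> missingFields(Method method, List<String> analysisMethodList,
                                      int codeId, int eightyAppVersionId) {
        List<String> missing = new ArrayList<>();
        switch (method) {
            case REGULAR_EXPRESSION_LIST:
            case KEYWORD_LIST:
                if (analysisMethodList == null || analysisMethodList.isEmpty()) {
                    missing.add("analysisMethodList");
                }
                break;
            case CODE:
                if (codeId == 0) { // assumption: 0 means no code file selected
                    missing.add("codeId");
                }
                break;
            case EIGHTY_APP:
                if (eightyAppVersionId == 0) { // assumption: 0 means no 80app version selected
                    missing.add("eightyAppVersionId");
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown method: " + method);
        }
        return missing;
    }
}
```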
ResultSetting
Fields
|
Required |
Description
|
| hasCrawlUrlsInResult |
|
boolean Type. If true, 80legs will generate two sets of result files. The first set will contain results for pages that were analyzed. The second set will contain the URLs of pages that were crawled, but not analyzed. If false, 80legs will only generate result files that contain the URLs of pages that were analyzed. Default is true. |
| resultType |
x |
Use the JobResultType enum to set the result type. If the Analysis Method is "Regular Expression List" or "Keyword List", set it to "Count Array", "Boolean Array" or "Unique and Total Count". If the Analysis Method is "Code", set it to "Code Result".
- Unique and total count - 80legs outputs the # of unique matches and total # of matches for your content selection strings (i.e., keywords or regular expressions)
- Boolean array - 80legs outputs the two numbers above plus a 1 or 0 for each string, depending on whether or not that string was found
- Count array - 80legs outputs the unique and total count plus the total count for each string
- Code results - If you select to analyze content using code, result type will default to this option
|
| maxResultFileSizeInMB |
x |
Must be between 10 and 100. This is the unzipped size. |
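To make the three keyword result types above concrete, the sketch below computes them for one page of text. It is an interpretation, not the 80legs implementation: it assumes that "unique matches" means the number of strings that matched at least once, that matching is case-sensitive and non-overlapping, and that the values are laid out as [unique, total, per-string values]. The actual result file format is not specified here.

```java
import java.util.List;

final class ResultTypeSketch {
    /** Counts non-overlapping, case-sensitive occurrences of keyword in text. */
    static int countOccurrences(String text, String keyword) {
        if (keyword.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = text.indexOf(keyword);
        while (from >= 0) {
            count++;
            from = text.indexOf(keyword, from + keyword.length());
        }
        return count;
    }

    /** "Count array": [unique, total, count for keyword 0, count for keyword 1, ...]. */
    static int[] countArray(String text, List<String> keywords) {
        int[] out = new int[keywords.size() + 2];
        for (int i = 0; i < keywords.size(); i++) {
            int c = countOccurrences(text, keywords.get(i));
            out[i + 2] = c;
            if (c > 0) {
                out[0]++; // one more keyword that matched at least once
            }
            out[1] += c;  // running total of all matches
        }
        return out;
    }

    /** "Boolean array": [unique, total, 1 or 0 for keyword 0, 1 or 0 for keyword 1, ...]. */
    static int[] booleanArray(String text, List<String> keywords) {
        int[] out = countArray(text, keywords);
        for (int i = 2; i < out.length; i++) {
            out[i] = out[i] > 0 ? 1 : 0;
        }
        return out;
    }
}
```

The "Unique and total count" result type corresponds to the first two elements of either array.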
JobSummary
Fields
|
Required |
Description
|
| id |
|
Identifies the job. |
| name |
|
The name of the job. |
| environmentType |
|
The environment in which your job will run. 80legs provides a 'sandbox' environment and a 'live' environment. It is of EnvironmentType enum type. Possible values are "Live" or "Sandbox". |
| frequencyType |
|
How often does the job repeat. FrequencyType enum type = "Does not repeat", "Daily", "Weekly" or "Monthly". |
| frequencyType > interval |
|
Available from the getInterval() method from the FrequencyType. |
| startDate |
|
Calendar type. The date/time the job was created.
|
| endDate |
|
Calendar type. This is the date when the job repetition should finish. If the frequencyType is "Does not repeat", then the endDate is equal to the startDate.
|
| jobStatus |
|
Provides current status of the job (JobStatusType enum type = Created, Completed, Cancelled, Deleted and Error. ) |
| user |
|
A User object, has User ID. This id belongs to the user who created the job. |
| activeRuns |
|
Number of job runs that are active |
| completedRuns |
|
Number of job runs that have completed |
| jobErrorList |
|
Job error list. Contains error message if the jobStatus = "ERROR" |
| latestRunStatusType |
|
The status of the latest run that was completed for the job. |
JobOverview
Fields
|
Required |
Description
|
| id |
|
Identifies the job |
| jobStatus |
|
Provides current status of the job (JobStatusType enum type = Created, Completed, Cancelled, Deleted and Error. ) |
| latestRunStatusType |
|
Provides latest run status of the job (JobRunStatusType enum type = In Queue, Cancelled, Running, Completed, Error or Unknown.) |
| latestRunQueueNumber |
|
The queue number is only relevant when the queue status is In Queue ( not yet started ). If the status is 1 (running), or 2 (completed), the queue number will be 0. |
| numRunsCompleted |
|
Total number of runs completed for the job. |
| amountChargedAllRuns |
|
|
| initialCreditReserved |
|
This is the amount reserved from your account balance at the start of each run. It is reflected in your available balance. This reserved amount is an estimate and is calculated using the "Max number of URLs" field specified at the time of job creation - (double) |
| latestThrottledDomains |
|
Provides information about the latest domains that were throttled. |
| totalPagesCrawled |
|
Total pages crawled |
| totalPagesNeeded |
|
Total number of pages needed |
| totalPagesProcessed |
|
Total number of pages analyzed |
| totalSecondCpu |
|
Total CPU hours used |
| totalSecondLeft |
|
Total CPU hours left |
| totalGeneralExceptionsParse |
|
|
| totalSecurityExceptionsParse |
|
|
| totalGeneralExceptionsProcess |
|
|
| totalSecurityExceptionsProcess |
|
|
| totalExceptionsLoadCode |
|
The number of times load code errors took place. A load code error happens when a node we send work to is unable to load the 80app being used. For most apps this will be insignificant, but larger apps will have more problems. |
| startDate |
|
Calendar type. The date/time when the job was created.
|
| endDate |
|
Calendar type. This is the date when the job repetition should finish. If the frequencyType is "Does not repeat", then the endDate is equal to the startDate.
|
| jobErrorList |
|
Job Error List. Contains error message if the jobStatus = "ERROR" |
| latestRunResultsPosted |
|
Indicates whether all the results have been posted for the latest run. |
JobRun
Fields
|
Required |
Description
|
| id |
|
Identifier for the job run |
| dateCreated |
|
Date the job run was created after you submitted the job. |
| dateStarted (new) |
|
Date the job run actually started running. |
| dateCompleted |
|
Date the job run was completed. This is set when the job completes successfully. |
| dateEnded (new) |
|
Date the job stopped running (whether it was canceled, deleted, or completed successfully).
|
| runStatus |
|
Provides run status of the job (JobRunStatusType enum type = In Queue, Cancelled, Running, Completed, Error or Unknown.) |
| numberPagesCrawled |
|
Number of pages crawled during the job run. This continues to change until the job is completed.
|
| numberPagesNeeded |
|
Number of pages needed during the job run. This continues to change until the job is completed. |
| numberPagesProcessed |
|
Number of pages processed during the job run. This continues to change until the job is completed. |
| numSecondCpu |
|
Number of CPU seconds used |
| numGeneralExceptionsParse |
|
This keeps track of the number of times the parseLinks() method threw a general Exception. |
| numSecurityExceptionsParse |
|
This keeps track of the number of times the parseLinks() method threw a SecurityException. This is usually because the parseLinks() method attempted to do something that is not allowed, such as accessing the disk or making network requests. |
| numGeneralExceptionsProcess |
|
This keeps track of the number of times the processDocument() method threw a general Exception. |
| numSecurityExceptionsProcess |
|
This keeps track of the number of times the processDocument() method threw a SecurityException. This is usually because the processDocument() method attempted to do something that is not allowed, such as accessing the disk or making network requests. |
| numExceptionsLoadCode |
|
Number of load code exceptions. A load code error happens when a node we send work to is unable to load the 80app being used. For most apps this will be insignificant, but larger apps will have more problems. |
| numConstructorTimeoutGood |
|
Number of good constructor timeouts |
| numConstructorTimeoutBad |
|
Number of bad constructor timeouts |
| numComputeTimeoutGood |
|
Number of good compute timeouts. This means your code was taking too long to run, but we were able to stop the process. |
| numComputeTimeoutBad |
|
Number of bad compute timeouts. This means your code was taking too long and we had to kill the JVM. |
| numSecondsLeft |
|
Number of seconds left |
| throttledDomains |
|
Domains that were throttled. An entry such as "domain.com D0=04:51:46" means that your jobs are currently throttled on domain.com at that depth level. |
| pluraVersionGuid |
|
PluraVersion GUID. This is needed for getting standard output files for jobs created for sandbox server. |
| runResults |
|
ArrayList of RunResult objects |
| contructorTimeouts |
|
Constructor timeout status. Possible values "OK" or "BAD" |
| computeTimeouts |
|
Compute timeout status. Possible values "OK" or "BAD" |
| allTimeouts |
|
Overall timeouts based on values of constructor and compute timeouts. |
| queueNumber |
|
The queue number is only relevant when the queue status is "In Queue" ( not yet started ). If the status is 1 (running), or 2 (completed), the queue number will be 0.
|
| donePostingResult |
|
It may take up to 10 minutes for all of the results to be posted. Previously, there was no way to determine whether all the results had been posted after a job completed. If this field is set to 1, all the results have been posted for the job after it has completed. Exception: If a job has 0 pages crawled (i.e. if it's canceled before it starts), then the donePostingResult flag will *never* be set (because there will not be a result file). |
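Because results can be posted some time after a run ends, a client typically polls the run until donePostingResult is set. The sketch below shows that loop under stated assumptions: RunSnapshot and RunState are local stand-ins for the JobRun fields above, the fetch supplier represents whatever API call returns the current run, and treating Cancelled and Error as terminal states is an interpretation. The loop is bounded so that it cannot wait forever.

```java
import java.util.function.Supplier;

final class RunPollingSketch {
    /** Local stand-in for the run status values listed above. */
    enum RunState { IN_QUEUE, RUNNING, COMPLETED, CANCELLED, ERROR, UNKNOWN }

    /** Local stand-in holding the three JobRun fields this sketch needs. */
    static final class RunSnapshot {
        final RunState state;
        final boolean donePostingResult;
        final long numberPagesCrawled;

        RunSnapshot(RunState state, boolean donePostingResult, long numberPagesCrawled) {
            this.state = state;
            this.donePostingResult = donePostingResult;
            this.numberPagesCrawled = numberPagesCrawled;
        }
    }

    /** True once the run has ended and no further result files are expected. */
    static boolean isFinished(RunSnapshot s) {
        boolean ended = s.state == RunState.COMPLETED
            || s.state == RunState.CANCELLED
            || s.state == RunState.ERROR;
        // With 0 pages crawled the flag is never set, so do not wait for it.
        return ended && (s.donePostingResult || s.numberPagesCrawled == 0);
    }

    /** Polls up to maxPolls times, sleeping pollMillis between polls; returns the last snapshot seen. */
    static RunSnapshot waitForResults(Supplier<RunSnapshot> fetch, long pollMillis, int maxPolls) {
        RunSnapshot last = fetch.get();
        for (int i = 1; i < maxPolls && !isFinished(last); i++) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                return last;
            }
            last = fetch.get();
        }
        return last;
    }
}
```

Callers should still check isFinished on the returned snapshot, since the loop may stop because maxPolls was reached. The poll interval should also respect the API request throttle.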
RunResult
Fields
|
Required |
Description
|
| id |
|
Identifies the run result. |
| resultFileType |
|
The file can contain information about the URLs that were crawled or analyzed for the job. |
| chunkNumber |
|
The results are divided into multiple files, which are numbered sequentially by this chunk number. The size limit of each file is set by the maximum result file size under Result Setting. |
| resultFileName |
|
The actual result file name. |
| resultLocation |
|
The location where the result file can be found. |
| zippedFileSizeInBytes |
|
Lets the user see the zipped file size before the file is downloaded. |
| pluraVersionGuid |
|
This can be used to get std out information for jobs that were for the Sandbox Environment. |
| resultStatus |
|
This specifies whether the result file has been CREATED or DELETED. The result files will be deleted after a certain time based on the 80legs Storage Policy. (Included in the new release of API as of August 25th). |
| fileCreationTime |
|
The date/time the result file was zipped and posted. |
| noOfTimesDownloaded |
|
The number of times the result has been downloaded. Each result can be downloaded a maximum of 5 times. |
| crawlPackageId |
|
If the run result is for a crawl package job, this field has the crawl package id. |
| eightyAppVersionId |
|
If the run result is for a job that used an 80 app, this field keeps track of the 80app version. This is useful later for post processing the results. |
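Before downloading, a client can select the chunks that are still available and put them in order. The sketch below is illustrative only: Chunk is a local stand-in that carries three of the RunResult fields above, and representing resultStatus as the strings "CREATED" and "DELETED" is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ChunkOrderSketch {
    static final int MAX_DOWNLOADS = 5;

    /** Local stand-in for the RunResult fields used here. */
    static final class Chunk {
        final int chunkNumber;
        final String resultStatus;
        final int noOfTimesDownloaded;

        Chunk(int chunkNumber, String resultStatus, int noOfTimesDownloaded) {
            this.chunkNumber = chunkNumber;
            this.resultStatus = resultStatus;
            this.noOfTimesDownloaded = noOfTimesDownloaded;
        }
    }

    /** Keeps chunks that still exist and have downloads left, ordered by chunk number. */
    static List<Chunk> downloadable(List<Chunk> chunks) {
        List<Chunk> out = new ArrayList<>();
        for (Chunk c : chunks) {
            if ("CREATED".equals(c.resultStatus) && c.noOfTimesDownloaded < MAX_DOWNLOADS) {
                out.add(c);
            }
        }
        out.sort(Comparator.comparingInt(c -> c.chunkNumber));
        return out;
    }
}
```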
CodeFile
Fields
|
Required |
Description
|
| id |
|
Identifier for code file |
| userId |
|
Identifies the user who uploaded the code file. |
| name |
x |
Code name which is unique to the user |
| dateUploaded |
|
Date code was uploaded |
| sha1Value |
|
Hash key value of file |
| fileSizeInBytes |
|
File size in bytes |
| fileName |
x |
The name of the actual jar file that was uploaded. |
| codeStatus |
|
Provides the status of the code. Uses CodeStatusType = Awaiting Approval, Approved, Denied, Unknown. |
| codeErrors |
|
List of code errors that happened while running the code approval process. |
| maxNodeHeapSpaceMB |
|
The max node heap size required for the code to run. This is specified by the user during the code upload process.
|
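The sha1Value field above makes it possible to compare an uploaded code file with a local copy. The sketch below computes a SHA-1 digest with the standard java.security.MessageDigest class. Whether 80legs reports the hash as lowercase hexadecimal is an assumption, and Sha1Sketch is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha1Sketch {
    /** Returns the SHA-1 digest of data as lowercase hexadecimal. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** Reads the whole file into memory and hashes it; suitable for small JAR files. */
    static String sha1Hex(Path file) throws IOException {
        return sha1Hex(Files.readAllBytes(file));
    }
}
```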
DataFile
Fields
|
Required |
Description
|
| id |
|
Identifier for the data file |
| userId |
|
Identifies the user who uploaded the data file. |
| name |
x |
Data name which is unique to the user |
| fileName |
x |
The name of the actual file that was uploaded |
| fileSizeInBytes |
|
File size in bytes |
| dateCreated |
|
The date the data file was uploaded |
| isDeleted |
|
Useful when the data file has been deleted. |
SeedlistFile
Fields
|
Required |
Description
|
| id |
|
Identifier for the seed list file |
| userId |
|
Identifies the user who uploaded the seed list file. |
| name |
x |
Name which is unique to the user |
| fileName |
x |
The name of the actual file that was uploaded |
| fileSizeInBytes |
|
File size in bytes |
| dateCreated |
|
When the seed list file was uploaded |
| isDeleted |
|
Useful when the seed list file has been deleted. |
AccountBalance
Fields
|
Required |
Description
|
| currentBalance |
|
double Type. The current balance of the account. |
| availableBalance |
|
double Type. This is the current balance minus the amount reserved for jobs and minus the amount currently requested for withdrawal. When a job is created, an initial amount is reserved for the job to ensure that there is enough credit to allow that job to run to completion. This initial reserved amount is only an estimate. The actual charge for the job will be subtracted from the current balance after the job starts running. |
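The relationship between the two balances described above is simple arithmetic. In the sketch below, the parameter names reservedForJobs and pendingWithdrawal are illustrative and are not fields returned by the API.

```java
final class BalanceSketch {
    /** Available balance = current balance - amount reserved for jobs - amount requested for withdrawal. */
    static double availableBalance(double currentBalance, double reservedForJobs, double pendingWithdrawal) {
        return currentBalance - reservedForJobs - pendingWithdrawal;
    }
}
```

For example, a current balance of 100.0 with 30.0 reserved for jobs and 20.0 requested for withdrawal leaves an available balance of 50.0.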
User
Fields
|
Required |
Description
|
| email |
|
The e-mail of the User. |
| firstName |
|
The first Name of the User. |
| lastName |
|
The last Name of the User. |
| address |
|
The address of the User. |
| city |
|
The city of the User. |
| state |
|
The state abbreviation. |
| postalCode |
|
The postal code in string format. |
| country |
|
The country of the User. |
| primaryNumber |
|
The primary number of the User that was provided. |
| secondaryNumber |
|
The secondary number of the User. |
EightyApp
Fields
|
Required |
Description
|
| name |
|
The name of the 80app.
|
| author |
|
The organization or the author who wrote the 80app.
|
| description |
|
The description or what the eighty app is intended for.
|
| instructions |
|
Instructions on how to use the 80app properly.
|
| data |
|
This is the data input that the author may provide. It can be used if no user input data is required.
|
| requiresUserInputData |
|
If this is true, the user is required to provide a data input. If not, the author may have a data input file that can be used. |
| visibility |
|
Visibility enum. Only PUBLIC 80apps are available through the app store. If an 80app is PRIVATE, it will not be available for purchase from the marketplace. It will only be available for the developer to use for testing purposes. |
| url |
|
The url that may have more information about the 80app.
|
| supportedStatus |
|
Information whether the 80app is supported by the author or is deprecated.
|
| releaseDate |
|
The date/time the 80app was released to the public. |
| earnings |
|
Provides information about how much the 80app has earned the developer (this information is not available through the api). |
| earningsByUsage |
|
Provides information about how much earnings have been accumulated for the developer through app usage. (this information is not available through the api). |
| earningsByPurchases |
|
Provides information about how much earnings have been accumulated for the developer through app purchases. (this information is not available through the api). |
| latestVersion |
|
The latest version that is released. |
| includedInMyAppPack |
|
Indicates that the eighty app is part of an App pack that is available to the user as part of their SLA Plan. |
| initialOneTimePrice |
|
Initial purchase price of the 80app. Once purchased, the 80app will be available in the Analysis Settings > Analysis to Run > 80apps section of the Job Form. |
| pricePerUseFixed |
|
This is a fixed price that will be charged each time you create a job using the 80app. |
| pricePerUsePerMpa |
|
This is the price that will be charged based on the number of pages analyzed during a job run. |
| postProcessingCodeList |
|
List of PostProcessingCodeFile that can be used to post process the 80app. If there is more than one post processing code file, at least one has to be the default one. This list only contains information about the code and the id. |
| category |
|
Category the eighty app belongs to in the Marketplace. It can be Business, Real Estate, Retail & Shopping and Social Networks. |
Crawl Package
Fields
|
Required |
Description
|
| id |
|
Identifier for the crawl package
|
| userId |
|
This is not provided through the API.
|
| name |
|
The Name of the Crawl package.
|
| price |
|
The price for the crawl package. |
| SLAPrice |
|
The price for the crawl package if you are subscribed to a SLA Plan.
|
| subscriptionDate |
|
The date you subscribed to the crawl package.
|
| subscriptionDeletedDate |
|
The date your subscription was canceled. |
| category |
|
The category assigned to the crawl package.
|
| dateReleased |
|
The date/time the crawl package was released.
|
| updateFrequency |
|
How often the crawl package is updated. |
| averageUnzippedFilesizeInMB |
|
The size of each result file (unzipped). |
| averageZippedFilesizeInMB |
|
The zipped file size of each zip file. |
| summary |
|
This contains information such as the total result file size in MB, total number of files and the last time the crawl package was updated. |
Check out the FAQ for answers to a wide variety of questions.
Exception Handling
Exception handling in Java is done through a try-catch-finally block. The "try" block of code is where you put the code which may throw an exception. The "catch" block of code is where you put the code which will execute after an exception is thrown in the try block. This is often used to display an error message or to mitigate problems caused by the exception.
The API throws the following exceptions:
- EightyLegsFileNotFoundException - This exception is thrown when a file is not found.
- EightyLegsIOException - This exception is thrown when there is an error during file processing.
- EightyLegsParameterException - This exception is thrown when a parameter is of an incorrect type.
- EightyLegsXmlException - This exception is thrown when the system has a problem parsing the XML response from the internal web service.
- EightyLegsException - This exception is thrown when the API receives an exception from the internal web service. The API creates and throws this exception.
- EightyLegsConnectionException - This exception is thrown when the API cannot connect to the internal web service.
- EightyLegsAPIException - This exception is thrown when the API incurs an error and it does not belong to any of the others provided.
As of March 2010, all the above exceptions have a parent exception class: EightyLegsCommonException. Please get the latest version of the API.
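The try-catch-finally pattern and the common parent class described above can be sketched as follows. The exception classes here are local stand-ins that only reuse the documented names so that the example is self-contained. Their constructors, and whether the real classes are checked exceptions, are assumptions. ApiCall represents any call made through the API.

```java
// Local stand-ins for the documented exception names (not the real API classes).
class EightyLegsCommonException extends Exception {
    EightyLegsCommonException(String message) { super(message); }
}

class EightyLegsConnectionException extends EightyLegsCommonException {
    EightyLegsConnectionException(String message) { super(message); }
}

class EightyLegsParameterException extends EightyLegsCommonException {
    EightyLegsParameterException(String message) { super(message); }
}

final class ExceptionHandlingSketch {
    /** Hypothetical wrapper around a single API call. */
    interface ApiCall {
        String run() throws EightyLegsCommonException;
    }

    static int cleanupCount = 0;

    static String describe(ApiCall call) {
        try {
            return "ok: " + call.run();
        } catch (EightyLegsConnectionException e) {
            // Specific handlers must come before the parent class.
            return "retry later: " + e.getMessage();
        } catch (EightyLegsParameterException e) {
            return "fix request: " + e.getMessage();
        } catch (EightyLegsCommonException e) {
            // The parent class catches every other API exception.
            return "other API error: " + e.getMessage();
        } finally {
            // Runs whether or not an exception was thrown; release resources here.
            cleanupCount++;
        }
    }
}
```

Catching the parent class last gives a single fallback for any exception type that is not handled individually.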
Note: If you see an error on this page, please let us know by submitting a ticket through the portal. Thank you.