Table of Contents
- General
- I get an error message of exception.EightyLegsException: c_mime type: You must select at least one MIME type. I am clearly setting the MIME type in my code. What am I missing?
- I get an exception of EightyLegsException: Unauthorized User. What does this Exception mean?
- How do I get all the jobs regardless of the status?
- I get an exception of EightyLegsException: Incorrect string value: '\xEF\xBF\xBD' for column 'crawl_expression' at row 1. What does this Exception mean?
- I get an exception of EightyLegsException: Incorrect string value: '\xEF\xBF\xBD' for column 'analysis_expression' at row 1. What does this Exception mean?
- I get an exception of Result file has Expired. What does this Exception mean?
- If I set ignoreBadUrl to true for the uploadSeedlists method, how do I get the information on the bad URLs that were removed?
- I try to provide the seedlist URLs using the crawlRequest.getSeedList().addAll(urls) statement rather than uploading a seedlist file, and I get an exception of com.eightylegs.customer.exception.EightyLegsAPIException: Error from API: Unknown server response status:HTTP/1.1 504 Gateway Time-out. What does this exception mean?
- I used the Java method int uploadSeedList(File file, String seedListName, boolean ignoreBadURLs, String ignoreBadUrlMessage) or the .NET method uploadSeedlists(FileInfo file, String seedlistName, bool ignoreBadURLs) and set the ignoreBadURLs option to true. I get an exception of EightyLegs.Domain.Exception.EightyLegsException: file content: Your Seed List has been uploaded, but the following Seeds were removed due to formatting problems:[Line Number 1: url .... ]. What does this mean?
- What is the difference between an 80app and the API?
- In the AnalysisSetting class, I see methods setCodeId and setEightyAppVersionId. To use my own 80app, what should I use?
- When I try to upload a large seedlist, I get a java.lang.OutOfMemoryError: Java heap space exception. How can I upload a large seedlist?
- I keep on getting com.eightylegs.customer.exception.EightyLegsIOException: Error processing file: Server returned HTTP response code: 502 for URL error. What is the problem?
- How can I download the jobs that are available for a subscribed Crawl Package?
- How do I get access to the Crawl Packages that I have access to using the API?
- I get an Exception: com.eightylegs.customer.exception.EightyLegsException: eighty app version id: 80app Version id is not valid for". What does this Exception mean?
- Java
- How do I set the "Repeat Forever" flag during the creation of a job?
- .NET
- I get an exception of EightyLegs.Domain.Exception.EightyLegsAPIException: Error from API: Could not find any recognizable digits. What does this mean?
- Python
- Is there any option to suppress the JVM activity report that gets displayed?
- How do I reinstall or uninstall a previous Python API?
- What are all the possible values of jobRun.runStatus? I was expecting to get Queue, Cancelled etc, but instead I get a numeric value.
- How do I get the name of the FrequencyType from a jobRun or jobSetting?
General
I get an error message of exception.EightyLegsException: c_mime type: You must select at least one MIME type. I am clearly setting the MIME type in my code. What am I missing?
A MIME type needs to be specified for both Crawl Setting and Analysis Setting. The two MIME types are different. The crawl MIME type controls which pages are searched for links via parseLinks(). The analysis MIME type controls which pages go into processDocument().
I get an exception of EightyLegsException: Unauthorized User. What does this Exception mean?
This exception is thrown when the user is trying to retrieve a job, code, data or a seedlist that does not belong to them or if the said object does not exist.
How do I get all the jobs regardless of the status?
To get all the jobs, pass in null instead of a specific JobStatus. In the new version, you can use ALL in the JobStatus enum to get all the jobs.
I get an exception of EightyLegsException: Incorrect string value: '\xEF\xBF\xBD' for column 'crawl_expression' at row 1. What does this Exception mean?
This problem occurs when the crawl regular expression has not been set. Set it explicitly, for example: crawlRequest.setCrawlRegularExpression(""); This has been fixed in the latest version of the API.
I get an exception of EightyLegsException: Incorrect string value: '\xEF\xBF\xBD' for column 'analysis_expression' at row 1. What does this Exception mean?
This problem occurs when the analysis regular expression has not been set. Set it explicitly, for example: analysisReq.setAnalysisRegularExpression(""); This has been fixed in the latest version of the API.
I get an exception of Result file has Expired. What does this Exception mean?
80legs keeps result files on our system for at least 7 days. After that, they are deleted. Please check out our Storage Policy for Result Files.
If I set ignoreBadUrl to true for the uploadSeedlists method, how do I get the information on the bad URLs that were removed?
This method now has an overload that takes an additional string parameter called validationMessage. The information on the URLs that were removed is assigned to this variable.
I try to provide the seedlist URLs using the crawlRequest.getSeedList().addAll(urls) statement rather than uploading a seedlist file, and I get an exception of com.eightylegs.customer.exception.EightyLegsAPIException: Error from API: Unknown server response status:HTTP/1.1 504 Gateway Time-out. What does this exception mean?
If you don't have a fast connection, sending the URLs this way can take a few minutes. In addition, the server processes each URL individually and inserts it into the database, which takes a considerable amount of time. In this case, it appears that the connection timed out because the server did not respond in time. A solution is to upload the URLs in a seedlist file and then associate the seedlist with the job. This way, you can also reuse the seedlist for future jobs.
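As a rough illustration of preparing such a seedlist file, the Python sketch below writes a list of URLs to a text file. It assumes a plain-text format with one URL per line, which is not confirmed on this page, and write_seed_list is a hypothetical helper, not part of the 80legs API. The upload itself is not shown; use the uploadSeedList method of your API client for that step.

```python
from pathlib import Path


def write_seed_list(urls, path):
    """Write one URL per line to `path`, skipping blanks and duplicates.

    Hypothetical helper; the one-URL-per-line layout is an assumption.
    Returns the number of URLs written.
    """
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        # Keep the first occurrence of each non-empty URL, in input order.
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)
```

For example, write_seed_list(urls, "seeds.txt") produces a file that can then be uploaded once and associated with as many jobs as needed.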
I used the Java method int uploadSeedList(File file, String seedListName, boolean ignoreBadURLs, String ignoreBadUrlMessage) or the .NET method uploadSeedlists(FileInfo file, String seedlistName, bool ignoreBadURLs) and set the ignoreBadURLs option to true. I get an exception of EightyLegs.Domain.Exception.EightyLegsException: file content: Your Seed List has been uploaded, but the following Seeds were removed due to formatting problems:[Line Number 1: url .... ]. What does this mean?
An error was incorrectly being thrown. The contents of the exception should have been returned as a message. This problem has been fixed. An overloaded method has been added that takes another parameter for validation messages. The method signature is: UploadSeedList(FileInfo file, String seedListName, bool ignoreBadURLs, String ignoreBadUrlMessage). If any URLs are bad, the seedlist is still added and the message about the bad URLs is returned in the ignoreBadUrlMessage String. The fix for this was uploaded on Nov 25th, 2009.
What is the difference between an 80app and the API?
The 80app is for controlling how you process pages you crawl. It has to be written in Java. The API is for controlling how you submit crawl jobs, download results, etc. It can be written in Java, Python, or .NET.
In the AnalysisSetting class, I see methods setCodeId and setEightyAppVersionId. To use my own 80app, what should I use?
To use your own app, you would need to upload code and setCodeId. setEightyAppVersionId should be used when you want to use an 80app that is available from 80legs. For a prebuilt 80app, you can find the version id's by going to the Marketplace and clicking on Show Mine link next to the 80apps. 13 is the eighty app version Id for the Return Page Content (Text/HTML only) - Version 1.0 80 app.
When I try to upload a large seedlist, I get a java.lang.OutOfMemoryError: Java heap space exception. How can I upload a large seedlist?
If you are using the uploadSeedList(File file, String seedListName, boolean ignoreBadURLs, StringBuilder ignoreBadUrlMessage) method with ignoreBadURLs set to true, the problem could be that the seedlist contains a huge number of bad URLs, all of which are being returned to you.
You can try doubling your maximum heap size. If this does not work, try using uploadSeedList(File file, String seedListName, boolean ignoreBadURLs). This method does not send back the bad URL message, but instead sends the details to you by e-mail.
I keep on getting com.eightylegs.customer.exception.EightyLegsIOException: Error processing file: Server returned HTTP response code: 502 for URL error. What is the problem?
Please make sure that you are using version 1.0.7 or above of the Java API. You can find the version information in the Manifest.mf file of the jar. If you require a .NET API, please contact support @ 80legs.com.
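One way to inspect the manifest without unpacking the jar by hand is sketched below in Python. A jar is a zip archive, so the standard zipfile module can read it. The helper name read_manifest is hypothetical, and this page does not say which manifest attribute carries the API version; Implementation-Version is a common convention, but treat that as an assumption and check the full output.

```python
import zipfile


def read_manifest(jar_path):
    """Return the main-section attributes of a jar's manifest as a dict."""
    with zipfile.ZipFile(jar_path) as jar:
        # Match the entry name case-insensitively (MANIFEST.MF vs Manifest.mf).
        names = [n for n in jar.namelist() if n.upper() == "META-INF/MANIFEST.MF"]
        if not names:
            raise KeyError("no manifest found in " + str(jar_path))
        text = jar.read(names[0]).decode("utf-8")

    attributes = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            attributes[key] += line[1:]  # continuation of the previous value
        elif ":" in line:
            key, _, value = line.partition(":")
            attributes[key] = value.strip()
    return attributes
```

Calling read_manifest on the API jar and printing the result shows every attribute, so you can look for the one that holds the version number.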
How can I download the jobs that are available for a subscribed Crawl Package?
There are overloaded methods available that take a CrawlPackageId. If the CrawlPackageId is -1, the method downloads your own jobs. If you provide a CrawlPackageId, the system checks whether you are subscribed to that crawl package and, if so, gives you access to its jobs. The job settings for the jobs in a crawl package cannot be accessed.
How do I get access to the Crawl Packages that I have access to using the API?
The retrieveAvailableCrawlPackagesByUser() method returns information about all the crawl packages that you have access to.
I get an Exception: com.eightylegs.customer.exception.EightyLegsException: eighty app version id: 80app Version id is not valid for". What does this Exception mean?
You need to set the eighty app version id for the 80app that you are using: analysisReq.setEightyAppVersionId(13); // this is the eighty app version id for Return Page Content (Text/HTML only). You can find the version ids by going to the Marketplace and clicking on the Show Mine link next to the 80apps. 13 is the eighty app version id for the Return Page Content (Text/HTML only) - Version 1.0 80app.
Java
How do I set the "Repeat Forever" flag during the creation of a job?
The method createJob(JobSetting job, boolean repeatForever) is available for this. If you have an older version of the API that does not have this method, set the End Date using DateFormatter.convertUnixTimestampToCalendar(-1);
.NET
I get an exception of EightyLegs.Domain.Exception.EightyLegsAPIException: Error from API: Could not find any recognizable digits. What does this mean?
This was caused by a problem in the processing of chunked responses. The issue has been fixed.
Python
Is there any option to suppress the JVM activity report that gets displayed?
We have not found a way to suppress the activity report. As for where the JVM activity report comes from: Python calls the startJVM method before any other jpype features can be used, and this call initializes the specified JVM.
How do I reinstall or uninstall a previous Python API?
All the files need to be removed manually. If you don't know the list of all files, you can reinstall with the --record option and look at the list it produces. Something like: python setup.py install --record files.txt (files.txt is just an example name for the output file).
You will probably also need to go to your Python package directory and remove the .egg file, e.g. in Python 2.5 (Ubuntu): /usr/lib/python2.5/site-packages/ and in Python 2.6 (Ubuntu): /usr/local/lib/python2.6/dist-packages/
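Once you have a record file, the manual removal can be scripted. The sketch below assumes the record file lists one installed path per line, which is what the --record option of setup.py writes; remove_recorded_files is a hypothetical helper name. Review the list before running it, since it deletes files, and run it with sufficient permissions for the install location.

```python
import os


def remove_recorded_files(record_path):
    """Delete every file listed in `record_path`; return the paths removed."""
    removed = []
    with open(record_path, encoding="utf-8") as record:
        for line in record:
            path = line.strip()
            # Skip blank lines and entries that no longer exist on disk.
            if path and os.path.isfile(path):
                os.remove(path)
                removed.append(path)
    return removed
```

Directories left empty afterwards, and any .egg file that is not listed in the record, still have to be removed by hand.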
What are all the possible values of jobRun.runStatus? I was expecting to get Queue, Cancelled etc, but instead I get a numeric value.
You should be able to get that information using:
domain.JobRunStatusType.whatis(jobRun.runStatus) where jobRun is the variable
How do I get the name of the FrequencyType from a jobRun or jobSetting?
The frequency enum has no integer value. In order to print out the name, you need to use jobSummary.frequencyType.name.