• Add a check in the C code that we have an appropriate R and C data type for each option, based on the option's type.
  • curlEscape the names of the parameters in getForm().
    See ~/Projects/CaseStudies/JobsWordMining/careerbuilder.R
  • Potential problem with httpheader in RCurl when there is only one element in the vector: the name seems to disappear. Compare
    curlPerform(url = "http://www.chemspider.com/MassSpecAPI.asmx", post = TRUE, postfields = body, verbose = TRUE, .opts = list(httpheader = c(SOAPAction = '"http://www.chemspider.com/GetExtendedCompoundInfo"', Accept = "text/xml")))
    with
    curlPerform(url = "http://www.chemspider.com/MassSpecAPI.asmx", post = TRUE, postfields = body, verbose = TRUE, .opts = list(httpheader = c(SOAPAction = '"http://www.chemspider.com/GetExtendedCompoundInfo"')))
  • In HTTP 3xx (redirect) replies, where we follow the Location header, we can end up with the wrong Content-Type. Does libcurl give us the new header?
    See tests/airports.R in RHTMLForms/
  • In addition to looking at Content-Type, we also have to look at Content-Encoding to see whether the content is gzip or otherwise binary.
    u = getURLContent('http://www.omegahat.net/SVGAnnotation/shape.svgz', .opts = list(verbose = TRUE))
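
    A minimal sketch of the Content-Encoding check, written as standalone C so it compiles without libcurl. The function names are hypothetical and not part of RCurl; treating "x-gzip" as equivalent to "gzip" is an assumption. In practice the header line would come from a header callback.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive prefix test: returns 1 when s starts with prefix. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical helper: returns 1 when a raw response header line is a
   Content-Encoding header whose value starts with gzip or x-gzip. */
int is_gzip_encoded(const char *header_line)
{
    static const char name[] = "Content-Encoding:";
    const char *p = header_line;

    if (!has_prefix_nocase(p, name))
        return 0;
    p += strlen(name);
    while (*p == ' ' || *p == '\t')   /* skip whitespace after the colon */
        p++;
    return has_prefix_nocase(p, "gzip") || has_prefix_nocase(p, "x-gzip");
}
```

    A caller could use the result to decide whether to collect the body as raw bytes rather than as text.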
    
  • Avoid repeatedly setting the same curl options in getURLContent(), initializing the curl option, etc.
    Put a break point in the C code that sets the options and/or in the central R code.
  • getURLContent() and parsing the response WHEN THE REQUEST IS FTP!!
  • FTP upload.
    Done?
  • CURLOPT_QUOTE and linked list.
  • Allow the analysis/reading of the header in the response to adapt the Curl handle to change the reader.
    e.g. the compression example in Examples/Example.xml in the Books/XMLTechnologies.
  • Use raw() to deal with binary content.
  • Free the multi curl handle.
    curlDupHandle might be very problematic by not copying its data.
    If we don't put this under our memory management, then we won't free those elements. Thinking about this, does it actually happen? I think not. Need more thought.
    Bring in the URI support, etc. from httpClient.
    Passwords with HTTPS don't seem to be working for me at present.
  • Next

    Check all the isProtected computations in R are correct.
    When we call curlPerform() from getURL(), we explicitly pass the curl object, so it is okay.

    We could add the .isProtected argument to indicate whether the caller wants to guarantee that the allocation is okay. Add in future releases.

    Support for different settings in the forms.
    e.g. files,
    Need structure for the data. Allow things like
    Look at libwww.
    Have a default set of CURL options that is used throughout.
    Put in a default value for .opts which is a call to, say, getDefaultCURLOptions()
    Compute the option constants from the C code just once and store them.
    Allow CURLOPT_WRITEDATA/CURLOPT_WRITEFUNCTION to be specified as a connection.
    This way we can use R's writing facilities in the C code without having to go to the top-level code. Mmmm. Maybe we should go up to the R level for this using a function.
    Automate the generation of the enums.
    GccTranslationUnit.
    Recursive calls
    e.g. calls from within the callback to gather text to download other files.
    Look at the asynchronous downloads. This can also be done directly if one knows the base URI.
    Simplify the code for the options.
    Add a reader for CURLOPT_READFUNCTION
    For uploading files.
    Could get more sophisticated by having an exception class for CURL with the error code and message.

    Done

    Illustrate how to use an R connection with the write callback for gathering text.
    xmlTreeParse() with a connection that the XML parser can call to get more text.
    Done. See tests/xmlParse.xml
  • C routines as handlers for the functions.
    i.e., for the writefunction, pass a C routine and deal with binary data coming down the pipe.
    Done. Could be done more elegantly.
    Tidy the package
    Make this a namespace and register the routines.
    Fill in the conversions for the opts.
    Do the coercion to the right type. Handle protected and not protected cases.

    We can tell what the target data type is from the number encoded in the option id. These jump into different ranges for the different types. So we can test for type, or coerce, in R. Unfortunately, only up to OBJECTPOINT
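
    The range test described above can be sketched in standalone C. The base offsets mirror libcurl's CURLOPTTYPE_LONG, CURLOPTTYPE_OBJECTPOINT, CURLOPTTYPE_FUNCTIONPOINT and CURLOPTTYPE_OFF_T values; the enum and function names are hypothetical, and the upper bound of 40000 on the off_t range is an assumption made for this sketch.

```c
/* Base offsets mirroring the CURLOPTTYPE_* constants in curl/curl.h,
   redefined here so the sketch compiles without libcurl. */
enum {
    OPT_BASE_LONG          = 0,
    OPT_BASE_OBJECTPOINT   = 10000,
    OPT_BASE_FUNCTIONPOINT = 20000,
    OPT_BASE_OFF_T         = 30000,
    OPT_BASE_END           = 40000   /* assumed upper bound for this sketch */
};

typedef enum {
    OPT_LONG,
    OPT_OBJECTPOINT,
    OPT_FUNCTIONPOINT,
    OPT_OFF_T,
    OPT_UNKNOWN
} OptKind;

/* Hypothetical helper: classify an option id by the range it falls in. */
OptKind opt_kind(long id)
{
    if (id < OPT_BASE_LONG)
        return OPT_UNKNOWN;
    if (id < OPT_BASE_OBJECTPOINT)
        return OPT_LONG;
    if (id < OPT_BASE_FUNCTIONPOINT)
        return OPT_OBJECTPOINT;
    if (id < OPT_BASE_OFF_T)
        return OPT_FUNCTIONPOINT;
    if (id < OPT_BASE_END)
        return OPT_OFF_T;
    return OPT_UNKNOWN;
}
```

    For example, CURLOPT_VERBOSE (41) falls in the long range and CURLOPT_URL (10002) in the object-pointer range. The range says only that an option takes an object pointer, not whether that object is a string, a list, or something else.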

    Check releasing the form HTTPPOST list.
    Should be ok. Make certain we release the memory for the slist elements. For HTTPPOST, this is protected because we don't leave it in the CURL because we reset the HTTPPOST field.
    setCurlHeaders via regular options
    Check releasing the form HTTPHEADER list.
    Bits for initializing the global state.
    CURL_GLOBAL_SSL, CURL_GLOBAL_WIN32, CURL_GLOBAL_NOTHING, CURL_GLOBAL_ALL

    Check if R initializes this on Windows and accordingly turn it off here.
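
    A small sketch of how the global-initialization bits might be chosen. The values mirror libcurl's global init flags (SSL is bit 0, WIN32 is bit 1, ALL is their union, NOTHING is 0); the enum names and the helper are hypothetical, and whether R has already initialized the Windows socket layer is assumed to be known by the caller.

```c
/* Values mirror the global init flags in curl/curl.h, redefined here
   so the sketch compiles without libcurl. */
enum {
    GLOBAL_NOTHING = 0,
    GLOBAL_SSL     = 1 << 0,
    GLOBAL_WIN32   = 1 << 1,
    GLOBAL_ALL     = GLOBAL_SSL | GLOBAL_WIN32
};

/* Hypothetical helper: start from ALL and clear the WIN32 bit when the
   host application has already initialized the socket layer. */
long global_init_flags(int host_initialized_sockets)
{
    long flags = GLOBAL_ALL;

    if (host_initialized_sockets)
        flags &= ~(long) GLOBAL_WIN32;
    return flags;
}
```

    The result would be what gets passed to the global initialization call.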

    Get the names of the curl error codes to put on the status.
    Already there. See asCurlErrorCode
    Memory management!!!
    Release the curl object if possible by knowing whether this is a local use of the data or whether it will persist.

    For persistence, we will have to collect data structures and know how to release them. We can collect them as linked lists and associate them with the CURL handle via a table or simple linked list.

    There doesn't seem to be a hook to tag something into the CURL handle.
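
    One way the table-plus-linked-list idea could look, as a sketch only. All type and function names are hypothetical; the handle is treated as an opaque pointer so the code compiles without libcurl, and the registry is a single global list with no locking.

```c
#include <stdlib.h>

typedef void (*ReleaseFn)(void *);

/* One tracked allocation and the routine that frees it. */
typedef struct Allocation {
    void *data;
    ReleaseFn release;
    struct Allocation *next;
} Allocation;

/* One entry per handle; the handle pointer stands in for a CURL *. */
typedef struct HandleEntry {
    const void *handle;
    Allocation *allocs;
    struct HandleEntry *next;
} HandleEntry;

static HandleEntry *registry = NULL;

/* Record that data belongs to handle. Returns 0 on success, -1 on failure. */
int track_allocation(const void *handle, void *data, ReleaseFn release)
{
    HandleEntry *e;
    Allocation *a = malloc(sizeof *a);

    if (a == NULL)
        return -1;
    for (e = registry; e != NULL && e->handle != handle; e = e->next)
        ;
    if (e == NULL) {                       /* first allocation for this handle */
        e = malloc(sizeof *e);
        if (e == NULL) {
            free(a);
            return -1;
        }
        e->handle = handle;
        e->allocs = NULL;
        e->next = registry;
        registry = e;
    }
    a->data = data;
    a->release = release;
    a->next = e->allocs;
    e->allocs = a;
    return 0;
}

/* Free everything tracked for handle; returns the number of items released. */
int release_allocations(const void *handle)
{
    HandleEntry **pe = &registry;
    HandleEntry *e;
    int count = 0;

    while (*pe != NULL && (*pe)->handle != handle)
        pe = &(*pe)->next;
    if (*pe == NULL)
        return 0;
    e = *pe;
    *pe = e->next;                         /* unlink the entry from the table */
    while (e->allocs != NULL) {
        Allocation *a = e->allocs;
        e->allocs = a->next;
        if (a->release != NULL)
            a->release(a->data);
        free(a);
        count++;
    }
    free(e);
    return count;
}
```

    A finalizer on the handle could call release_allocations() just before the handle itself is cleaned up.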

    Callback for the password for an HTTPS connection.
    Doesn't appear to be used in libcurl anymore. Do we use the ssl context callback?
    PASSWDFUNCTION is deprecated. So it appears that the caller has to set it in the USERPWD.
    Can read header separately with a headerfunction option in the same way as writefunction.
    curl_easy_getinfo()
    Done
    Passwords. Find out why they aren't behaving - in https.
    getURL("https://secureweb.ucdavis.edu:443")
    getURL("https://my.ucdavis.edu")
    netrc file.

    Without https, they are working, either via CURLOPT_USERPWD or the .netrc file.

    Keep alives for connection.
    Should be done by default in libcurl? If not, just add it to the httpheader option.
    Check the error handling from curl.
    R_CURL_CHECK_ERROR.
    Allow access to setting header information.
    Done via the options now (httpheader). This combines the names and values if there are names and all the entries don't have a : in them.

    See setCurlHeaders(). Note that we can include this directly in the converter for an R object to a CURL option. We just need to tidy up after this.

    curl_slist_append(). curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);
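
    The name/value combining rule can be sketched in standalone C. Both function names are hypothetical, and the sketch assumes one reading of the rule: names are combined only when none of the values already contains a colon. Each formatted string is what would then be appended to the header list.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 when no value contains ':', i.e. none already looks like a
   complete "Name: value" header line. */
int values_lack_colon(const char *const *values, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strchr(values[i], ':') != NULL)
            return 0;
    return 1;
}

/* Hypothetical helper: write one header line into out. When combine is
   nonzero and a name is present, emit "name: value"; otherwise emit the
   value unchanged. Returns 0 on success, -1 if out is too small. */
int format_header(char *out, size_t size, const char *name,
                  const char *value, int combine)
{
    int n;

    if (combine && name != NULL && name[0] != '\0')
        n = snprintf(out, size, "%s: %s", name, value);
    else
        n = snprintf(out, size, "%s", value);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

    Under this reading, a value that itself contains a colon, such as a quoted URL, would switch combining off for the whole vector.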

    HTTPS
    "https://sourceforge.net/" works.

    Not anymore.

    Version information
    Handle the features bitfield
    Automate this (unfortunately #defines)
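
    A sketch of decoding the features bitfield into names. The four bit positions shown mirror the first CURL_VERSION_* defines (IPv6, Kerberos 4, SSL, libz); the table is deliberately incomplete, and the struct, table and function names are hypothetical. Because the real values are #defines, a table like this would have to be maintained by hand or generated.

```c
#include <stddef.h>

typedef struct {
    int bit;
    const char *name;
} Feature;

/* Partial table mirroring the first few CURL_VERSION_* bits. */
static const Feature features[] = {
    { 1 << 0, "ipv6" },
    { 1 << 1, "kerberos4" },
    { 1 << 2, "ssl" },
    { 1 << 3, "libz" }
};

/* Hypothetical helper: store the names of the set bits in out (at most
   max entries) and return how many were stored. */
size_t feature_names(int bitfield, const char **out, size_t max)
{
    size_t i, n = 0;

    for (i = 0; i < sizeof features / sizeof features[0] && n < max; i++)
        if (bitfield & features[i].bit)
            out[n++] = features[i].name;
    return n;
}
```

    The returned names could be attached to the version information as a character vector.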
    Redirects and relocations for URIs.
    Set the FOLLOWLOCATION by default.
    Finalizer on the curls.
    Make the curl perform function take options.
    Done.
    Make the converters know whether the objects are protected or not.
    Form handling
    Test on winnie:cgi-bin/form1.pl and
     postForm("http://www.speakeasy.org/~cgires/perl_form.cgi",
                "some_text" = "Duncan",
                "choice" = "Ho",
                "radbut" = "eep",
    #            "box" = "box1"
                "box" = "box1, box2"
              # and try c("box1", "box2")
              )
    

    Duncan Temple Lang <duncan@research.bell-labs.com>
    Last modified: Tue Dec 11 15:19:07 PST 2012