Title: | General Network (HTTP/FTP/...) Client Interface for R |
---|---|
Description: | A wrapper for 'libcurl' <https://curl.se/libcurl/> Provides functions to allow one to compose general HTTP requests and provides convenient functions to fetch URIs, get & post forms, etc. and process the results returned by the Web server. This provides a great deal of control over the HTTP/FTP/... connection and the form of the request while providing a higher-level interface than is available just using R socket connections. Additionally, the underlying implementation is robust and extensive, supporting FTP/FTPS/TFTP (uploads and downloads), SSL/HTTPS, telnet, dict, ldap, and also supports cookies, redirects, authentication, etc. |
Authors: | CRAN Team [ctb, cre] (de facto maintainer since 2013), Duncan Temple Lang [aut] |
Maintainer: | CRAN Team <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 1.98-1.16 |
Built: | 2024-11-09 06:31:08 UTC |
Source: | CRAN |
These variables are symbolic constants that allow us to specify different combinations of schemes for HTTP authentication in a request to a Web server. We can combine them via the | operator to indicate that libcurl should try them in order until one works.
AUTH_BASIC | AUTH_DIGEST
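As a minimal sketch of how such a combination might be passed along with a request, the constants can be stored in a set of options under libcurl's httpauth option. The credentials below are placeholders chosen for illustration, and no request is actually sent.

```r
library(RCurl)

# Combine two authentication schemes with the | operator
schemes <- AUTH_BASIC | AUTH_DIGEST

# Store the combination in a reusable set of options.
# "user:secret" is a placeholder credential, assumed for illustration.
opts <- curlOptions(httpauth = schemes, userpwd = "user:secret")
names(opts)
```

The resulting options could then be supplied through the .opts argument of a request function such as getURL.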
These functions encode and decode strings using base64 representations. base64 can be used as a single entry point with an argument to encode or decode. The other two functions perform the specific action.
base64(txt, encode = !inherits(txt, "base64"), mode = "character")
txt |
character string to encode or decode |
encode |
logical value indicating whether the desired action is to encode or decode the object.
If |
mode |
a character string which is either "raw" or "character".
This controls the type of vector that is returned.
If this is "raw", a raw vector is created. Otherwise, a character
vector of length 1 is returned and its element is the text version
of the original data given in |
This calls the routines in libcurl. These are not declared in the curl header files. So the support may need to be handled carefully on some platforms, e.g. Microsoft Windows.
If encode is TRUE, a character vector with a class named base64.
If decode is TRUE, a simple string.
This is currently not vectorized.
We might extend this to work with raw objects.
Duncan Temple Lang
libcurl - https://curl.se/ Wikipedia's explanation of base 64 encoding - https://en.wikipedia.org/wiki/Base64
# encode and then decode a simple string.
txt = "Some simple text for base 64 to handle"
x = base64(txt)
base64(x)

# encode to a raw vector
x = base64("Simple text", TRUE, "raw")

# decode to a character string.
ans = base64Decode(x)
ans == "Simple text"
# decoded to a raw format.
ans = base64Decode(x, "raw")

# Binary data
# f = paste(R.home(), "doc", "html", "logo.jpg", sep = .Platform$file.sep)
f = system.file("examples", "logo.jpg", package = "RCurl")
img = readBin(f, "raw", file.info(f)[1, "size"])
b64 = base64Encode(img, "raw")
back = base64Decode(b64, "raw")
identical(img, back)

# alternatively, we can encode to a string and then decode back again
# to raw and see that we preserve the data.
enc = base64Encode(img, "character")
dec = base64Decode(enc, "raw")
identical(img, dec)

# The following would be the sort of computation we could do if we
# could have in-memory raw connections.
# We would save() some objects to such an in-memory binary/raw connection
# and then encode the resulting raw vector into a character vector.
# Then we can insert that into a message, e.g. an email message or
# an XML document and when we receive it in a different R session
# we would get the string and reverse the encoding from the string to
# a raw vector
# In the absence of that in-memory connection facility in save(),
# we can use a file.

x = 1:10

# save two objects - a function and a vector
f = paste(tempfile(), "rda", sep = ".")
save(base64, x, file = f)

# now read the results back from that file as a raw vector
data = readBin(f, "raw", file.info(f)[1,"size"])

# base64 encode it
txt = base64Encode(data, "character")

if(require(XML)) {
  tt = xmlTree("r:data", namespaces = c(r = "http://www.r-project.org"))
  tt$addNode(newXMLTextNode(txt))
  out = saveXML(tt)

  doc = xmlRoot(xmlTreeParse(out, asText = TRUE))
  rda = base64Decode(xmlValue(doc), "raw")
  f = tempfile()
  writeBin(rda, f)
  e = new.env()
  load(f, e)
  objects(e)
}

# we'd like to be able to do
# con = rawConnection(raw(), 'r+')
# save(base64, x, file = con)
# txt = base64Encode(rawConnectionValue(con), "character")
# ... write and read xml stuff
# val = xmlValue(doc)
# rda = base64Decode(val, "raw")
# e = new.env()
# input = rawConnection(rda, "r")
# load(input, e)
These two functions are used to collect the contents of the header of an HTTP response via the headerfunction option of a curl handle and then process that text into both the name: value pairs and the initial line of the response that provides the status of the request. basicHeaderGatherer is a simple special case of basicTextGatherer with the built-in post-processing step done by parseHTTPHeader.
basicHeaderGatherer(txt = character(), max = NA)
parseHTTPHeader(lines, multi = TRUE)
txt |
any initial text that we want included with the header.
This is passed to |
max |
This is passed directly to
|
lines |
the text as a character vector from the response header
that
|
multi |
a logical value controlling whether we check for
multiple HTTP headers in the lines of text. This is caused
by a Continue being concatenated with the actual response.
When this is |
The return value is the same as basicTextGatherer, i.e. a list with update, value and reset function elements. The value element will invoke parseHTTPHeader on the contents read during the processing of the libcurl request and return that value.
Duncan Temple Lang
Curl homepage https://curl.se/
basicTextGatherer
curlPerform
curlSetOpt
if(url.exists("https://www.omegahat.net/RCurl/index.html")) withAutoprint({
  h = basicHeaderGatherer()
  getURI("https://www.omegahat.net/RCurl/index.html",
         headerfunction = h$update)
  names(h$value())
  h$value()
})
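The parsing step can also be tried without a network connection. The sketch below feeds parseHTTPHeader a small, made-up response header; the header lines are assumptions chosen for illustration, and the element names used to look up the results reflect how the returned vector is commonly accessed rather than anything stated above.

```r
library(RCurl)

# Hypothetical header text, as it might arrive from a Web server
lines <- c("HTTP/1.1 200 OK",
           "Content-Type: text/html",
           "Content-Length: 1234")

hdr <- parseHTTPHeader(lines)

# The name: value pairs become named elements of the result
names(hdr)
hdr[["Content-Type"]]

# The status from the initial line is stored alongside them
hdr[["status"]]
```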
These functions create callback functions that can be used with the libcurl engine when it passes information to us as it becomes available as part of the HTTP response.
basicTextGatherer is a generator function that returns a closure which is used to cumulate text provided in callbacks from the libcurl engine when it reads the response from an HTTP request. debugGatherer can be used with the debugfunction libcurl option in a call and the associated update function is called whenever libcurl has information about the header, data and general messages about the request.
These functions return a list of functions. Each time one calls basicTextGatherer or debugGatherer, one gets a new, separate collection of functions. However, each collection of functions (or instance) shares the variables across the functions and across calls. This allows them to store data persistently across the calls without using a global variable. In this way, we can have multiple instances of the collection of functions, with each instance updating its own local state and not interfering with those of the others.
We use an S3 class named RCurlCallbackFunction to indicate that the collection of functions can be used as a callback.
The update function is the one that is actually used as the callback function in the CURL option. The value function can be invoked to get the current state that has been accumulated by the update function. This is typically used when the request is complete. One can reuse the same collection of functions across different requests. The information will be cumulated. Sometimes it is convenient to reuse the object but reset the state to its original empty value, as if it had been created afresh. The reset function in the collection permits this.
multiTextGatherer is used when we are downloading multiple URIs concurrently in a single libcurl operation. This merely uses the tools of basicTextGatherer applied to each of several URIs. See getURIAsynchronous.
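The shared-state behaviour described above can be illustrated in plain R without libcurl. The following is a simplified sketch of the closure pattern only; makeGatherer is a hypothetical stand-in written for this illustration and is not the RCurl implementation.

```r
# Hypothetical stand-in for a text gatherer: each call creates a new
# environment holding 'txt', shared by the three functions returned.
makeGatherer <- function(txt = character()) {
  initial <- txt
  update <- function(str) {
    txt <<- c(txt, str)          # accumulate in the enclosing environment
    nchar(str, type = "bytes")   # report how much was consumed
  }
  value <- function(collapse = "") paste(txt, collapse = collapse)
  reset <- function() {
    txt <<- initial              # back to the original, empty state
    invisible(NULL)
  }
  list(update = update, value = value, reset = reset)
}

a <- makeGatherer()
b <- makeGatherer()

a$update("Hello, ")
a$update("world")
b$update("separate state")      # does not interfere with 'a'

beforeReset <- a$value()
a$reset()
afterReset <- a$value()
```

Here a and b each keep their own accumulated text, and a$reset() empties only the state belonging to a.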
basicTextGatherer(txt = character(), max = NA, value = NULL, .mapUnicode = TRUE)
multiTextGatherer(uris, binary = rep(NA, length(uris)))
debugGatherer()
txt |
an initial character vector to start things. We allow this to be specified so that one can initialize the content. |
max |
if specified as an integer this controls the total number of characters that will be read. If more are read, the function tells libcurl to stop! |
uris |
for |
value |
if specified, a function that is called when retrieving the text usually after the completion of the request and the processing of the response. This function can be used to convert the result into a different format, e.g. parse an XML document, read values from table in the text. |
.mapUnicode |
a logical value that controls whether the resulting text is processed to map components of the form \uxxxx to their appropriate Unicode representation. |
binary |
a logical vector that indicates which URIs yield binary content |
This is called when the libcurl engine finds sufficient
data on the stream from which it is reading the response.
It cumulates these bytes and hands them to a C routine in
this package which calls the actual gathering function (or a suitable
replacement) returned as the update
component from this function.
Both the basicTextGatherer and debugGatherer functions return an object of class RCurlCallbackFunction. basicTextGatherer extends this with the class RCurlTextHandler and debugGatherer extends this with the class RCurlDebugHandler. Each of these has the same basic structure, being a list of 3 functions.
update |
the function that is called with the text from the callback routine and which processes this text by accumulating it into a vector |
value |
a function that returns the text cumulated across the
callbacks. This takes an argument |
reset |
a function that resets the internal state to its original, empty value. This can be used to reuse the same object across requests but to avoid cumulating new input with the material from previous requests. |
multiTextGatherer returns a list with an element corresponding to each URI. Each element is an object obtained by calling basicTextGatherer, i.e. a collection of 3 functions with shared state.
Duncan Temple Lang
Curl homepage https://curl.se/
if(url.exists("https://www.omegahat.net/RCurl/index.html")) withAutoprint({
  txt = getURL("https://www.omegahat.net/RCurl/index.html", write = basicTextGatherer())

  h = basicTextGatherer()
  txt = getURL("https://www.omegahat.net/RCurl/index.html", write = h$update)
  ## Cumulate across pages.
  txt = getURL("https://www.omegahat.net/index.html", write = h$update)

  headers = basicTextGatherer()
  txt = getURL("https://www.omegahat.net/RCurl/index.html",
               header = TRUE, headerfunction = headers$update)
  ## Now read the headers.
  cat(headers$value())
  headers$reset()

  ## Debugging callback
  d = debugGatherer()
  x = getURL("https://www.omegahat.net/RCurl/index.html",
             debugfunction = d$update, verbose = TRUE)
  cat(names(d$value()))
  d$value()[["headerIn"]]

  ## This hung on Solaris
  ## 2022-02-08 philosophy.html is malformed UTF-8
  uris = c("https://www.omegahat.net/RCurl/index.html",
           "https://www.omegahat.net/RCurl/philosophy.html")
  ## Not run:
  g = multiTextGatherer(uris)
  txt = getURIAsynchronous(uris, write = g)
  names(txt) # no names this way
  nchar(txt)

  # Now don't use names for the gatherer elements.
  g = multiTextGatherer(length(uris))
  txt = getURIAsynchronous(uris, write = g)
  names(txt)
  nchar(txt)
  ## End(Not run)
})

## Not run:
Sys.setlocale(,"en_US.latin1")
Sys.setlocale(,"en_US.UTF-8")
uris = c("https://www.omegahat.net/RCurl/index.html",
         "https://www.omegahat.net/RCurl/philosophy.html")
g = multiTextGatherer(uris)
txt = getURIAsynchronous(uris, write = g)
## End(Not run)
This is the constructor function for creating an internal data structure that is used when reading binary data from an HTTP request via RCurl. It is used with the native routine R_curl_write_binary_data for collecting the response from the HTTP query into a buffer that stores the bytes. The contents can then be brought back into R as a raw vector and then used in different ways, e.g. uncompressed with the Rcompression package, or written to a file via writeBin.
binaryBuffer(initialSize = 5000)
initialSize |
a number giving the size (number of bytes) to
allocate for the buffer. In most cases, the size won't make an
enormous difference. If this is small, the
|
An object of class RCurlBinaryBuffer which is to be treated as opaque data for the most part. When passing this as the value of the file option, one will have to pass the ref slot. After the contents have been read, one can convert this object to an R raw vector using as(buf, "raw").
Duncan Temple Lang
Curl homepage https://curl.se/
R_curl_write_binary_data
if(url.exists("https://www.omegahat.net/RCurl/xmlParse.html.gz")) {
  buf = binaryBuffer()

  # Now fetch the binary file.
  getURI("https://www.omegahat.net/RCurl/xmlParse.html.gz",
         write = getNativeSymbolInfo("R_curl_write_binary_data")$address,
         file = buf@ref)

  # Convert the internal data structure into an R raw vector
  b = as(buf, "raw")

  if (getRversion() >= "4")
    txt = memDecompress(b, asChar = TRUE)
  ## or txt = Rcompression::gunzip(b)
}
This function and class allow us to work with C-level FILE handles. The intent is to be able to pass these to libcurl as options so that it can read or write from or to the file. We can also do this with R connections and specify callback functions that manipulate these connections. But using the C-level FILE handle is likely to be significantly faster for large files.
The close method allows us to explicitly flush and close the file from within R.
CFILE(filename, mode = "r")
filename |
the name of the file on disk |
mode |
a string specifying how to open the file, read or write, text or binary. |
This is a simple interface to the C routine fopen.
An object of class CFILE which has a single slot named ref which is an external pointer holding the address of the FILE object in C.
Duncan Temple Lang
Man page for fopen
curlPerform
and the readdata
## Not run:
filename = system.file("tests", "amazon3.R", package = "RCurl")
f = CFILE(filename)

if(url.exists('http://s3.amazonaws.com/'))
  curlPerform(url = "http://s3.amazonaws.com/RRupload/duncan2",
              upload = TRUE,
              readdata = f@ref,
              infilesize = file.info(filename)[1, "size"])
## End(Not run)
When one provides an R function to process the body of the R rep
chunkToLineReader(f, verbose = FALSE)
f |
a function that is to be called each time
the |
verbose |
a logical value. If |
This constructs a closure and then processes each chunk as they are passed to the read function. It strips away any text that does not form a complete line at the end of the chunk and holds this to be added to the next chunk being processed.
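The buffering idea described here can be sketched in plain R. This is an illustration of the technique under simplifying assumptions, not the package's own code; makeLineReader and its pending element are hypothetical names introduced for the example.

```r
# Hypothetical line reader: passes only complete lines to f() and holds
# back any trailing partial line until the next chunk arrives.
makeLineReader <- function(f) {
  pending <- ""
  read <- function(chunk) {
    txt <- paste0(pending, chunk)
    nl <- gregexpr("\n", txt, fixed = TRUE)[[1]]
    if (nl[1] == -1L) {
      # No newline yet: keep everything for later
      pending <<- txt
    } else {
      last <- max(nl)
      complete <- substr(txt, 1, last - 1)
      pending <<- substr(txt, last + 1, nchar(txt))
      f(strsplit(complete, "\n", fixed = TRUE)[[1]])
    }
    invisible(nchar(chunk, type = "bytes"))
  }
  list(read = read, pending = function() pending)
}

# Collect the lines that the reader hands over
seen <- character()
reader <- makeLineReader(function(lines) seen <<- c(seen, lines))

reader$read("1 2\n3 4\n5")    # "5" is an incomplete line and is held back
reader$read(" 6\n7 8\n")      # completes "5 6" and delivers "7 8"
seen
```

The chunk boundaries above were chosen so that one line is split across two chunks, which is the case the held-back text exists to handle.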
A list with two components
read |
the function that will do the actual reading from the
HTTP response stream and call the function |
comp2 |
Description of 'comp2' |
...
Duncan Temple Lang
Curl homepage https://curl.se/
getURI
and the write
argument.
getForm
, postForm
curlPerform
# Read a rectangular table of data into R from the URL
# and add up the values and the number of values read.

summer = function() {
  total = 0.0
  numValues = 0

  list(read = function(txt) {
                con = textConnection(txt)
                on.exit(close(con))
                els = scan(con)
                numValues <<- numValues + length(els)
                total <<- total + sum(els)
                ""
              },
       result = function() c(total = total, numValues = numValues))
}

s = summer()

## Not run:
## broken, 2022-07-29
if(url.exists("https://www.omegahat.net/RCurl/matrix.data"))
  getURL("https://www.omegahat.net/RCurl/matrix.data",
         write = chunkToLineReader(s$read)$read)
## End(Not run)
This is a generic function and methods for making a copy of an object such as a curl handle, C-level pointer to a file, etc.
clone(x, ...)
x |
the object to be cloned. |
... |
additional parameters for methods |
Typically, an object of the same class and “value” as the input - x.
Duncan Temple Lang
h = getCurlHandle(verbose = TRUE)
other = dupCurlHandle(h)
curlSetOpt(curl = h, verbose = FALSE)
This is a generic function that is used within the RCurl package to force the completion of an HTTP request. If the request is asynchronous, this essentially blocks until the request is completed by repeatedly asking for more information to be retrieved from the HTTP connection.
complete(obj, ...)
obj |
the object which is to be completed. This is typically a
|
... |
additional arguments intended to be used by specific methods. |
The value is typically not of interest, but rather the side effect of processing the pending requests.
Duncan Temple Lang
https://curl.se/, specifically the multi interface of libcurl.
MultiCURLHandle-class
push
, pop
## Not run:
# it does not exist
if(url.exists("http://eeyore.ucdavis.edu/cgi-bin/testForm1.pl")) {
  f = system.file("NAMESPACE", package = "RCurl")
  postForm("http://eeyore.ucdavis.edu/cgi-bin/testForm1.pl",
           "fileData" = fileUpload(f))

  postForm("http://eeyore.ucdavis.edu/cgi-bin/testForm1.pl",
           "fileData" = fileUpload("", paste(readLines(f), collapse = "\n"), "text/plain"))

  postForm("http://eeyore.ucdavis.edu/cgi-bin/testForm1.pl",
           "fileData" = fileUpload(f, paste(readLines(f), collapse = "\n")),
           .opts = list(verbose = TRUE, header = TRUE))
}
## End(Not run)
These are classes and coercion methods for enumeration types in RCurl corresponding to symbolic constants in libcurl. The actual constants are not exported, but are defined within the package. So we can use them with code such as RCurl:::CURLINFO_DATA_IN.
Duncan Temple Lang
This function is called to raise an error or warning that arises from a curl operation when making a request. This is called from C code that encounters the error and this function is responsible for generating the error.
curlError(type, msg, asError = TRUE)
type |
the type of the error or a status code identifying the
type of the error. Typically this is an integer value that
identifies the type of the Curl error. The value corresponds
to one of the enumerated value of type |
msg |
the error message, as a character vector of length 1 |
asError |
a logical value that indicates whether to raise an error or a warning |
This calls warning or stop with the relevant condition object. The object is always of basic (S3) class GenericCurlError, error, condition or GenericCurlError, warning, condition. When the type value corresponds to a CURLCode value, the condition has the primary class given by that CURLCode's name, e.g. COULDNT_RESOLVE_HOST, TOO_MANY_REDIRECTS (with the CURLE prefix removed).
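Because the specific name comes first in the class vector, a handler can target one kind of failure and fall back to a general one. The sketch below shows this dispatch in plain R with hand-built conditions; makeCurlLikeError is a hypothetical helper that only mimics the class layout described above and does not call libcurl.

```r
# Hypothetical helper: builds a condition with the class layout described
# above (specific name first, then the generic classes).
makeCurlLikeError <- function(name, msg) {
  structure(list(message = msg, call = NULL),
            class = c(name, "GenericCurlError", "error", "condition"))
}

# Handlers are tried in order; the first whose class matches is used.
handle <- function(cond) {
  tryCatch(stop(cond),
           COULDNT_RESOLVE_HOST = function(e) "specific handler",
           GenericCurlError = function(e) "generic curl handler",
           error = function(e) "any other error")
}

specific <- handle(makeCurlLikeError("COULDNT_RESOLVE_HOST", "Could not resolve host"))
generic  <- handle(makeCurlLikeError("TOO_MANY_REDIRECTS", "Too many redirects"))
other    <- handle(simpleError("unrelated failure"))
```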
Duncan Temple Lang
libcurl documentation.
# This illustrates generating and catching an error.
# We intentionally give a mis-spelled URL.
tryCatch(curlPerform(url = "ftp.wcc.nrcs.usda.govx"),
         COULDNT_RESOLVE_HOST = function(x) cat("resolve problem\n"),
         error = function(x) cat(class(x), "got it\n"))
These functions convert between URLs that are human-readable and those that have special characters escaped. For example, to send a URL with a space, we need to represent the space as %20.
curlPercentEncode uses a different format than the curlEscape function and this is needed for x-www-form-urlencoded POST submissions.
curlEscape(urls)
curlUnescape(urls)
curlPercentEncode(x, amp = TRUE, codes = PercentCodes, post.amp = FALSE)
urls |
a character vector giving the strings to be escaped or unescaped. |
x |
the strings to be encoded via the percent-encoding method |
amp |
a logical value indicating whether to encode & characters. |
codes |
the named character vector giving the encoding map. The names are the characters we encode, the values are what we encode them as. |
post.amp |
a logical value controlling whether the resulting string is further processed to escape the percent (%) prefixes with the code for percent, i.e. %25. |
This calls curl_escape or curl_unescape in the libcurl library.
A character vector that has corresponding elements to the input with the characters escaped or not.
Duncan Temple Lang
Curl homepage https://curl.se/
Percent encoding explained in https://en.wikipedia.org/wiki/Percent-encoding
curlEscape("http://www.abc.com?x=a is a sentence&a b=and another")

# Reverse it should get back original
curlUnescape(curlEscape("http://www.abc.com?x=a is a sentence&a b=and another"))
These are enums and bit fields defining constants used in libcurl and used in R to specify values symbolically.
CurlFeatureBits
named integer vectors. The names give the symbolic constants that can be used in R and C code. These are mapped to their integer equivalents and used in C-level computations.
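As a rough sketch of how such a named vector of bit values might be used from R, the code below builds a bit mask and then recovers the symbolic names whose bits are set. The mask is made up from the first two entries purely for illustration, and the example assumes each entry of CurlFeatureBits is a distinct single bit.

```r
library(RCurl)

# Bit values as plain integers, keeping the symbolic names
bits <- as.integer(CurlFeatureBits)
names(bits) <- names(CurlFeatureBits)

# Hypothetical mask combining the first two features (illustration only)
mask <- bitwOr(bits[[1]], bits[[2]])

# Recover the names of the features whose bit is set in the mask
isSet <- bitwAnd(rep(mask, length(bits)), bits) != 0L
enabled <- names(bits)[isSet]
enabled
```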
libcurl (see https://curl.se/)
These functions provide a way to both start/initialize and stop/uninitialize the libcurl engine. There is no need to call curlGlobalInit as it is done implicitly the first time one uses the libcurl facilities. However, this function does permit one to explicitly control how it is initialized. Specifically, on Windows one might want to avoid re-initializing the Win32 socket facilities if the host application (e.g. R) has already done this.
curlGlobalInit should only be called once per R session. Subsequent calls will have no effect, or may confuse the libcurl engine.
One can reclaim the resources the libcurl engine is consuming via the curlGlobalCleanup function when one no longer needs the libcurl facilities in an R session.
curlGlobalInit(flags = c("ssl", "win32"))
curlGlobalCleanup()
flags |
flags indicating which features to activate.
These come from the |
curlGlobalInit returns a status code which should be 0.
curlGlobalCleanup returns NULL in all cases.
Duncan Temple Lang
Curl homepage https://curl.se/
# Activate only the SSL.
curlGlobalInit("ssl")

## Not run:
# Don't run these automatically as we should only call this function
# once per R session

# Everything, the default.
curlGlobalInit()

# Nothing.
curlGlobalInit("none")
curlGlobalInit(0)
## End(Not run)
This is the basic class used for performing HTTP requests in the RCurl package. In R, this is a reference to a C-level data structure so we treat it as an opaque data type. However, essentially we can think of this as an object with a set of options that persists across calls, i.e. HTTP requests. The numerous options that one can set can be seen via getCurlOptionsConstants.
The object can keep a connection to a Web server open for a period of time
across calls.
This class differs from MultiCURLHandle-class
as it
can handle only one request at a time and blocks until the request
is completed (normally or abnormally).
The other class can handle asynchronous, multiple connections.
A virtual Class: No objects may be created from it.
Class "oldClass"
, directly.
Duncan Temple Lang
https://curl.se/, the libcurl web site.
getURL
, getForm
, postForm
dupCurlHandle
,
getCurlHandle
,
MultiCURLHandle-class
These functions provide a constructor and accessor methods for the (currently S3) class CURLOptions. This class is a way to group and manage options settings for CURL. These functions manage a named list of options where the names are elements of a fixed set. Not all elements need be set, but these functions take care of expanding names to match the fixed set, while allowing callers to use abbreviated/partial names. Names that do not match (via pmatch) will cause an error.
The set of possible names is given by names(getCurlOptionsConstants()) or more directly with listCurlOptions().
mapCurlOptNames handles the partial matching and expansion of the names of the options for all the functions that handle CURL options. Currently this uses pmatch to perform the matching and so rejects words that are ambiguous, i.e. have multiple matches within the set of permissible option names. As a result, "head" is rejected because it matches both "header" and "headerfunction". We may change this behavior in the future, but we encourage using the full names for readability of code if nothing else.
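The matching rule can be seen directly with base R's pmatch, independently of RCurl. In this small sketch the vector known is a made-up subset of option names used only for illustration.

```r
# A small, hypothetical subset of option names
known <- c("header", "headerfunction", "verbose", "writefunction")

unique_hit <- pmatch("writefunc", known)  # unique partial match: an index
ambiguous  <- pmatch("head", known)       # matches two names: NA
exact_hit  <- pmatch("header", known)     # an exact match wins over partial ones

known[unique_hit]
is.na(ambiguous)
known[exact_hit]
```

An abbreviation is therefore only safe when it is a prefix of exactly one option name, which is one more reason to prefer the full names.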
curlOptions(..., .opts = list())
getCurlOptionsConstants()
## S3 replacement method for class 'CURLOptions'
x[i] <- value
## S3 replacement method for class 'CURLOptions'
x[[i]] <- value
listCurlOptions()
getCurlOptionTypes(opts = getCurlOptionsConstants())
... |
name-value pairs identifying the settings for the options of interest. |
.opts |
a named list of options, typically a previously created
|
x |
a |
i |
the name(s) of the option elements being accessed. These can be partial names matching elements in the set of known options. Other names will cause an error. |
value |
the values to assign to the options identified via |
opts |
the options whose type description are of interest in the call. |
These functions use mapCurlOptNames
to match and hence expand the names the callers
provide.
curlOptions
returns an object
of class CURLOptions
which is simply
a named list.
getCurlOptionsConstants
returns a named vector identifying
the names of the possible options and their associated
values. These values are used in the C code and also each integer
encodes the type of the argument expected by the C code
for that option.
getCurlOptionTypes
returns human-readable,
heuristic descriptions of the types expected for the different options.
These are integer/logical corresponding to "long" in the RCurl documentation;
string/object pointer corresponding to "char *" or ;
function corresponding to a function/routine pointer;
large number corresponding to a curl_off_t
.
Duncan Temple Lang
Curl homepage https://curl.se/
tt = basicTextGatherer()
myOpts = curlOptions(verbose = TRUE, header = TRUE, writefunc = tt[[1]])
# note that the names are expanded, e.g. writefunc is now writefunction.
names(myOpts)
myOpts[["header"]]
myOpts[["header"]] <- FALSE
# Using the abbreviation "hea" is an error as it matches both
# myOpts[["hea"]] <- FALSE
# Remove the option from the list
myOpts[["header"]] <- NULL
These functions cause the HTTP query, that has been specified
via the different options in this and other calls, to be sent and processed.
Unlike in curl itself,
for curlPerform
one can specify all the options
in this call as an atomic invocation.
This avoids having to set the options and then perform
the action. Instead, this is all done in one call.
For curlMultiPerform
, one must add the relevant
CURLHandle-class
objects to the
MultiCURLHandle-class
objects
before issuing the call to curlMultiPerform
.
curlPerform(..., .opts = list(), curl = getCurlHandle(), .encoding = integer())
curlMultiPerform(curl, multiple = TRUE)
curl |
for |
... |
a named list of curl options to set after the handle has been created. |
.opts |
a named list or |
multiple |
a logical value. If |
.encoding |
an integer or a string that explicitly identifies the
encoding of the content that is returned by the HTTP server in its
response to our query. The possible strings are
‘UTF-8’ or ‘ISO-8859-1’
and the integers should be specified symbolically
as Note that the encoding argument is not a regular libcurl option and
is handled specially by RCurl. But as a result, it is not unset
in subsequent uses of the curl handle ( |
An integer value indicating the status of the request. This should be 0, since any failure raises an R error rather than returning a non-zero status.
Duncan Temple Lang
Curl homepage https://curl.se/
getURL
postForm
getForm
curlSetOpt
if(url.exists("https://www.omegahat.net/RCurl")) withAutoprint({
  h = basicTextGatherer()
  curlPerform(url = "https://www.omegahat.net/RCurl", writefunction = h$update)
  # Now read the text that was cumulated during the query response.
  cat(h$value())
})

## this no longer exists
if(url.exists("http://services.soaplite.com/hibye.cgi")) withAutoprint({
  # SOAP request
  body = '<?xml version="1.0" encoding="UTF-8"?>\
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" \
                   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" \
                   xmlns:xsd="http://www.w3.org/1999/XMLSchema" \
                   xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" \
                   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">\
  <SOAP-ENV:Body>\
       <namesp1:hi xmlns:namesp1="http://www.soaplite.com/Demo"/>\
  </SOAP-ENV:Body>\
</SOAP-ENV:Envelope>\n'

  h$reset()
  curlPerform(url = "http://services.soaplite.com/hibye.cgi",
              httpheader = c(Accept = "text/xml", Accept = "multipart/*",
                             SOAPAction = '"http://www.soaplite.com/Demo#hi"',
                             'Content-Type' = "text/xml; charset=utf-8"),
              postfields = body,
              writefunction = h$update,
              verbose = TRUE
             )

  body = h$value()
})

# Using a C routine as the reader of the body of the response.
if(url.exists("https://www.omegahat.net/RCurl/index.html")) withAutoprint({
  routine = getNativeSymbolInfo("R_internalWriteTest", PACKAGE = "RCurl")$address
  curlPerform(URL = "https://www.omegahat.net/RCurl/index.html",
              writefunction = routine)
})
This function allows us to set values for the
possible options in the CURL data structure
that defines the HTTP request.
These options persist across calls in the
CURLHandle
object.
curlSetOpt(..., .opts = list(), curl = getCurlHandle(),
           .encoding = integer(), .forceHeaderNames = FALSE,
           .isProtected = FALSE)
... |
a named list of curl options to set after the handle has been created. |
.opts |
a named list or |
curl |
the |
.encoding |
an integer or a string that explicitly identifies the
encoding of the content that is returned by the HTTP server in its
response to our query. The possible strings are
‘UTF-8’ or ‘ISO-8859-1’
and the integers should be specified symbolically
as |
.forceHeaderNames |
a logical value which if |
.isProtected |
a logical vector (or value that is repeated) specifying which
of the values in ... and |
An integer value giving the status of the return. This should be 0, because an error in the libcurl mechanism is raised as an R error at that point.
Duncan Temple Lang
Curl homepage https://curl.se/
if(url.exists("https://www.omegahat.net")) {
  curl = getCurlHandle()
  # Note the header that extends across two lines with the second line
  # prefixed with white space.
  curlSetOpt(.opts = list(httpheader = c(Date = "Wed, 1/2/2000 10:01:01",
                                         foo = "abc\n extra line"),
                          verbose = TRUE),
             curl = curl)
  ans = getURL("https://www.omegahat.net", curl = curl)
}
This function queries the Curl library to provide information about its characteristics when it was compiled. This tells the user about its capabilities and can be used to determine strategies.
curlVersion(id = 0)
id |
an integer value between 0 and 3 inclusive. The idea is that one specifies the identifier for the version of interest. In fact, all values seem to yield the same result. |
A list
age |
integer giving the number of this libcurl, 0 is FIRST, 1 is SECOND, 2 is THIRD |
version |
the version identifier as a string, e.g. |
version_num |
the value as an integer |
host |
the machine on which the libcurl was configured/built. |
features |
a named integer vector of bits indicating what features of libcurl were configured and built into this version. These are features such as ipv6, ssl, libz, largefile, ntlm (Microsoft "authorization"). |
ssl_version |
the string identifying the SSL version. |
ssl_version_num |
the number identifying the SSL version |
libz_version |
the string identifying the version of libz. |
protocols |
a character vector of the supported protocols, e.g. http, https, ftp, ldap, gopher, telnet |
ares |
name of the asynchronous DNS (domain name service) lookup library. This is often simply the empty string indicating it is not there. |
ares_num |
the number for the ares library |
libidn |
the name of the IDN (internationalized domain names)
library being used. This field only appears in version 3 of libcurl.
If you are using version 2 (e.g. curl-7.11.2), this will be
|
See the man page for curl_version_info
for a description of these fields.
features
in R is a named integer vector
detailing the different features.
Duncan Temple Lang
Curl homepage https://curl.se/
curl_version_info
in the libcurl documentation.
curlVersion()
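Because the result describes how libcurl was built, it can be used to decide on a strategy before making a request. The following is a minimal sketch, assuming RCurl is installed; it relies only on the protocols and features fields described above and makes no network request. The variable names are illustrative.

```r
library(RCurl)

v <- curlVersion()

# Is HTTPS among the protocols this libcurl build supports?
hasHTTPS <- "https" %in% v$protocols

# Names of any SSL-related entries in the features vector (may be empty).
sslFeatures <- grep("ssl", names(v$features), ignore.case = TRUE, value = TRUE)

if (!hasHTTPS)
  message("This libcurl build does not support https; use plain http URLs.")
```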
This function is used for the writefunction
option in a curl HTTP request.
The idea is that we read the header of the HTTP response
and when our code determines that the header is complete
(the presence of a blank line), it examines the contents
of the header and finds a Content-Type field.
It uses the value of this to determine the nature of the
body of the HTTP response and dynamically (re)sets the reader
for the curl handle appropriately. If the content is binary,
it collects the content into a raw
vector;
if it is text, it sets the appropriate character encoding
and collects the content into a character vector.
This function is like basicTextGatherer
but behaves dynamically by determining how to read the content
based on the header of the HTTP response.
This function returns a list of functions that are used
to update and query a shared state across calls.
dynCurlReader(curl = getCurlHandle(), txt = character(), max = NA,
              value = NULL, verbose = FALSE, binary = NA, baseURL = NA,
              isHTTP = NA, encoding = NA)
curl |
the curl handle to be used for the request. It is
essential that this handle be used in the low-level call to
|
txt |
initial value of the text. This is almost always an empty character vector. |
max |
the maximum number of characters to read. This is almost
always |
value |
a function that can be specified which will be used to
convert the body of the response from text or raw in a customized
manner,
e.g. uncompress a gzip body. This can also be done explicitly
with a call |
verbose |
a logical value indicating whether messages about progress and operations are written on the console as the header and body are processed. |
binary |
a logical value indicating whether the caller knows
whether the resulting content is binary ( |
baseURL |
the URL of the request which can be used to follow links to other URLs that are described relative to this. |
isHTTP |
a logical value indicating whether the request/download uses
HTTP or not. If this is |
encoding |
a string that allows the caller to specify and override the encoding of the result. This is used to convert text returned from the server. |
A list with 5 elements all of which are functions. These are
update |
the function that does the actual reading/processing of the content that libcurl passes to it from the header and the body. This is the work-horse of the reader. |
value |
a function to get the body of the response |
header |
a function to get the content of the HTTP header |
reset |
a function to reset the internal contents which allows the same reader to be re-used in subsequent HTTP requests |
curl |
accessor function for the curl handle specified in the call to create this dynamic reader object. |
This list has the S3 class vector
c("DynamicRCurlTextHandler", "RCurlTextHandler", "RCurlCallbackFunction")
Duncan Temple Lang
libcurl https://curl.se/
basicTextGatherer
curlPerform
getURLContent
# Each of these examples can be done with getURLContent().
# These are here just to illustrate the dynamic reader.
if(url.exists("https://www.omegahat.net/Rcartogram/demo.jpg")) withAutoprint({
  header = dynCurlReader()
  curlPerform(url = "https://www.omegahat.net/Rcartogram/demo.jpg",
              headerfunction = header$update, curl = header$curl())
  class( header$value() )
  length( header$value() )
})

if(url.exists("https://www.omegahat.net/dd.gz")) withAutoprint({
  # gzip example.
  header = dynCurlReader()
  curlPerform(url = "https://www.omegahat.net/dd.gz",
              headerfunction = header$update, curl = header$curl())
  class( header$value() )
  length( header$value() )
  if (getRversion() >= "4")
    cat(memDecompress(header$value(), asChar = TRUE))
  ## or cat(Rcompression::gunzip(header$value()))
})

# Character encoding example
## Not run:
header = dynCurlReader()
curlPerform(url = "http://www.razorvine.net/test/utf8form/formaccepter.sn",
            postfields = c(text = "ABC", outputencoding = "UTF-8"),
            verbose = TRUE,
            writefunction = header$update, curl = header$curl())
class( header$value() )
Encoding( header$value() )
## End(Not run)
This function creates an object that describes all of the details
needed to include the contents of a file in the submission of an
HTTP request, typically a multi-part form submitted via
postForm
.
The idea is that we want to transfer the contents of a file or a
buffer of data within R that is not actually stored on the file
system but is resident in the R session. We want to be able to
specify either the name of the file and have RCurl read the contents
when they are needed, or alternatively specify the contents ourselves
if it makes sense that we already have the contents in R, e.g. that
they are dynamically generated. Additionally, we may need to specify
the type of data in the file/buffer via the Content-Type field for
this parameter in the request.
This function allows us to specify either the file name or contents
and optionally the content type.
This is used as an element of the params
argument of
postForm
and the native C code understands and processes
objects returned from this function.
fileUpload(filename = character(), contents = character(), contentType = character())
filename |
the name of the file that RCurl is to pass in the form
submission/HTTP request. If this is specified and no value for
|
contents |
either a character vector or a |
contentType |
a character string (vector of length 1) giving the type of the content, e.g. text/plain, text/html, which helps the server receiving the data to interpret the contents. If omitted, this is omitted from the form submission and the recipient left to guess. |
An object of (S3) class FileUploadInfo
with fields
filename
, contents
and contentType
.
Duncan Temple Lang
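As an illustration of the second case described above (content that already lives in the R session), the following sketch builds an upload description from an in-memory string. It assumes RCurl is installed; the file name and text are made up for the example, and nothing is sent over the network.

```r
library(RCurl)

# Describe an upload whose contents are generated in R rather than read
# from disk. "notes.txt" is only the name reported to the server.
up <- fileUpload(filename = "notes.txt",
                 contents = "line one\nline two",
                 contentType = "text/plain")

class(up)
names(up)
```

The resulting object would then be supplied as one of the form parameters in a call to postForm.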
This function is currently made available so it can be called from C code to find the charset from the HTTP header in the response from an HTTP server. It maps this charset value to one of the known R encodings (UTF-8 or LATIN1) or returns the native encoding.
This will most likely be removed in the future.
findHTTPHeaderEncoding(str)
str |
one or more lines from the HTTP header |
NA
or an integer value indicating the encoding
to be used. This integer corresponds to the cetype_t
enumeration
in Rinternals.h.
Duncan Temple Lang
Writing R Extensions Manual and the section(s) on character encodings
findHTTPHeaderEncoding("Content-Type: text/html;charset=ISO-8859-1\r\n")
findHTTPHeaderEncoding("Content-Type: text/html; charset=utf-8\r\n")
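To make the idea of locating the charset in a header line concrete, here is a rough base R sketch. It is not the package's implementation: charsetOf is a hypothetical helper that only extracts the charset label as a string, whereas findHTTPHeaderEncoding goes on to map it to an integer encoding code.

```r
# Hypothetical helper: pull the charset label out of a Content-Type line.
charsetOf <- function(header) {
  m <- regmatches(header,
                  regexec("charset=([^;[:space:]]+)", header,
                          ignore.case = TRUE))[[1]]
  # m[1] is the whole match, m[2] the captured label; empty if no match.
  if (length(m) < 2) NA_character_ else toupper(m[2])
}

latin1 <- charsetOf("Content-Type: text/html;charset=ISO-8859-1\r\n")
utf8   <- charsetOf("Content-Type: text/html; charset=utf-8\r\n")
none   <- charsetOf("Content-Type: image/png\r\n")
```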
This function is a relatively simple wrapper for curlPerform
which allows the caller to upload a file to an FTP server.
One can upload the contents of a file from the local file system or
the contents already in memory.
One specifies the FTP server and the fully-qualified file name and path where the contents are
to be stored.
One can specify the user login and password via the userpwd
option
for curlPerform
via the ... parameter, or
one can put this information directly in the target URL (i.e. to
)
in the form ftp://login:[email protected]/path/to/file
.
This function can handle binary or text content.
ftpUpload(what, to, asText = inherits(what, "AsIs") || is.raw(what), ..., curl = getCurlHandle())
what |
the name of a local file or the contents to be uploaded. This
can be text or binary content. This can also be an open connection.
If this value is |
to |
the URL to which the content is to be uploaded. This should be the ftp server
with the prefix |
asText |
a logical value indicating whether to treat the value of |
... |
additional arguments passed on to |
curl |
the curl handle to use for the |
The result of the curlPerform
call.
One can also provide additional FTP commands that are executed
before and after the upload as part of the request.
Use the prequote, quote, and postquote options in curlPerform
for these.
Duncan Temple Lang
FTP, libcurl
## Not run:
ftpUpload(I("Some text to be uploaded into a file\nwith several lines"),
          "ftp://login:password@laptop17/ftp/zoe")

ftpUpload(I("Some text to be uploaded into a file\nwith several lines"),
          "ftp://laptop17/ftp/zoe",
          userpwd = "login:password")

ftpUpload(system.file("examples", "system.png", package = "RCurl"),
          "ftp://login:password@laptop17/ftp/Election.rda",
          postquote = c("CWD subdir", "RNFR Election.rda", "RNTO ElectionPolls.rda"))
## End(Not run)
This function allows one to download binary content.
This is a convenience function that is a call to
getURL
with suitable values
for the write
and file
options
for the Curl handle. These take care of processing
the body of the response to the Curl request into a
vector of "raw" elements.
Binary content from POST forms or other requests that are not simple
URL requests can be implemented using the same approach as this
function, i.e., specifying the same values as in the body of this function for
write
and file
in the call to curlPerform
.
getBinaryURL(url, ..., .opts = list(), curl = getCurlHandle(), .buf = binaryBuffer(.len), .len = 5000)
url |
the URL identifying the content to download.
This can be a regular URL or a
|
... |
additional arguments that are passed to |
.opts |
a list of named values that are passed to
|
curl |
an optional curl handle used in |
.buf |
a raw vector in which to insert the body of the response. This is a parameter to allow the caller to reuse an existing buffer. |
.len |
a non-negative integer which is used as the length for the buffer in which to store the binary data in the response. The buffer is extended if it is not big enough, but this allows the caller to provide context-specific knowledge about the length of the response, e.g. the size of the file being downloaded, and so avoid expanding the buffer as the material is being processed. |
A "raw" vector.
Duncan Temple Lang
u = "https://www.omegahat.net/RCurl/data.gz"

if(url.exists(u)) withAutoprint({

  content = getBinaryURL(u)

  if (getRversion() >= "4") withAutoprint({
    x <- memDecompress(content, asChar = TRUE)
    read.csv(textConnection(x))
  }) else withAutoprint({
    tmp = tempfile()
    writeBin(content, con = tmp)
    read.csv(gzfile(tmp))
    unlink(tmp)
  })

  # Working from the Content-Type in the header of the HTTP response.
  h = basicTextGatherer()
  content = getBinaryURL(u, .opts = list(headerfunction = h$update))
  header = parseHTTPHeader(h$value())
  type = strsplit(header["Content-Type"], "/")[[1]]

  if(type[2] %in% c("x-gzip", "gzip")) {
    if (getRversion() >= "4") {
      cat(memDecompress(content, asChar = TRUE))
    } else {
      tmp = tempfile()
      writeBin(content, con = tmp)
      writeLines(readLines(gzfile(tmp)))
      unlink(tmp)
    }
  }
})
The getBitIndicators
function decomposes a value into its respective
bit components.
The setBitIndicators
function combines individual components into a
single number
to "set" a bit field value.
getBitIndicators(val, defs)
setBitIndicators(vals, defs)
val |
the value to break up into the bit field components. |
defs |
the named integer vector that defines the bit field elements. |
vals |
the individual components that are to be combined into a single integer value representing the collection of components. These can be given as names or integer values that correspond to the elements in the defs, either by name or value. |
getBitIndicators
returns a named integer vector representing
the components of the bit field in the value.
The names of the vector give the symbolic elements that were set in
the value.
setBitIndicators
returns a single integer value representing
the value from combining the different components (e.g. ORing the bits
of the different values).
Duncan Temple Lang
Curl homepage https://curl.se/
The features field in curlVersion
.
getBitIndicators(7, c(A = 1, B = 2, C = 4))
getBitIndicators(3, c(A = 1, B = 2, C = 4))
getBitIndicators(5, c(A = 1, B = 2, C = 4))
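The decomposition and recombination these two functions perform can be sketched with base R's bitwise operators. This is an illustration of the technique under the same definitions as the examples above, not the package's own code; the variable names are made up.

```r
defs <- c(A = 1L, B = 2L, C = 4L)
val  <- 5L

# Decompose: keep the definitions whose bit is set in val.
isSet <- vapply(defs, function(d) bitwAnd(val, d) != 0L, logical(1))
components <- defs[isSet]

# Recombine: OR the selected components back into a single integer.
combined <- Reduce(bitwOr, defs[c("A", "C")])
```

Here the value 5 decomposes into the A and C components, and ORing A and C together gives 5 again.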
This function returns the names of all of the
error classes that curl can raise as a result
of a request. You can use these names
in calls to tryCatch
to identify the class of the error for which you
want to provide an error handler.
getCurlErrorClassNames()
A character vector
Duncan Temple Lang
libcurl documentation
tryCatch
curlPerform
and higher-level functions
for making requests.
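The way such class names are used as handler names in tryCatch can be sketched without a network connection by signalling a simulated condition. The class names COULDNT_RESOLVE_HOST and GenericCurlError below are assumptions made for illustration; consult getCurlErrorClassNames() for the names actually available.

```r
# A simulated error condition carrying an assumed curl error class.
fakeCurlError <- structure(
  class = c("COULDNT_RESOLVE_HOST", "GenericCurlError", "error", "condition"),
  list(message = "Could not resolve host: no.such.host", call = NULL))

result <- tryCatch(
  stop(fakeCurlError),   # stands in for a failing getURL()/curlPerform() call
  # A handler named after the specific error class is tried first ...
  COULDNT_RESOLVE_HOST = function(e) "host lookup failed",
  # ... and any other error falls through to the generic handler.
  error = function(e) "some other error")
```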
These functions create and duplicate curl handles for use in calls to the HTTP facilities provided by the low-level libcurl library and this R-level interface. A curl handle is an opaque data type that contains a reference to the internal C-level data structure of libcurl for performing HTTP requests.
The getCurlMultiHandle
returns an object
that can be used for concurrent, multiple requests.
It is quite different from the regular curl handle
and again, should be treated as an opaque data type.
getCurlHandle(..., .opts = NULL, .encoding = integer(),
              .defaults = getOption("RCurlOptions"))
dupCurlHandle(curl, ..., .opts = NULL, .encoding = integer())
getCurlMultiHandle(..., .handles = list(...))
curl |
the existing curl handle that is to be duplicated |
... |
a named list of curl options to set after the handle has
been created. For |
.opts |
a named list or |
.encoding |
an integer or a string that explicitly identifies the
encoding of the content that is returned by the HTTP server in its
response to our query. The possible strings are
‘UTF-8’ or ‘ISO-8859-1’
and the integers should be specified symbolically
as |
.defaults |
a collection of default values taken from R's global/session options. This is a parameter so that one can override it if necessary. |
.handles |
a list of curl handle objects that are used as the individual request handles within the multi-asynchronous requests |
These functions create C-level data structures.
An object of class CURLHandle
which is simply a pointer to the memory for the C
structure.
Duncan Temple Lang
Curl homepage https://curl.se/
options(RCurlOptions = list(verbose = TRUE, followlocation = TRUE,
                            autoreferer = TRUE, nosignal = TRUE))
if(url.exists("https://www.omegahat.net/RCurl")) {
  x = getURL("https://www.omegahat.net/RCurl")
  # here we override one of these.
  x = getURL("https://www.omegahat.net/RCurl", verbose = FALSE)
}
This function provides access to data about a previously
executed CURL request that is accessible via a
CURLHandle
object.
This means, of course, that one must have access to the CURLHandle
object.
The information one can get includes items such as the name of the
file (potentially containing redirects), download time, and so on.
See getCurlInfoConstants
for the names of the possible fields.
getCurlInfo(curl, which = getCurlInfoConstants())
getCurlInfoConstants()
curl |
the |
which |
identifiers for the elements of interest.
These can be specified by integer value or by name.
The names are matched against those in the
|
.
This is an interface to the curl_easy_getinfo
routine in
the libcurl library.
A named list whose elements correspond to the requested fields. The names are expanded to match the names of these fields and the values are either strings or integer values.
Duncan Temple Lang
Curl homepage https://curl.se/
curlPerform
getURL
getCurlHandle
if(url.exists("https://www.omegahat.net/RCurl/index.html")) withAutoprint({
  curl = getCurlHandle()
  txt = getURL("https://www.omegahat.net/RCurl/index.html", curl = curl)
  getCurlInfo(curl)
  rm(curl) # release the curl!
})
This function facilitates getting the parameter names and values from a URL that is a parameterized HTML query.
This is motivated by a function from Chris Davis and Delft University.
getFormParams(query, isURL = grepl("^(http|\\?)", query))
query |
the query string or full URL containing the query |
isURL |
a logical value. If |
A named character vector giving the parameter values. The names are the parameter names.
Duncan Temple Lang
if(url.exists("https://www.omegahat.net/foo/bob.R")) withAutoprint({
  getFormParams("https://www.omegahat.net/foo/bob.R?xyz=1&abc=verylong")
  getFormParams("xyz=1&abc=verylong")
  getFormParams("xyz=1&abc=&on=true")
  getFormParams("xyz=1&abc=")
})
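For readers who want to see what splitting a query into names and values involves, the following base R sketch mimics the idea. parseQuery is a hypothetical helper written for this illustration, not the package's code, and it does not decode percent-escaped characters.

```r
# Hypothetical helper: split "a=1&b=2" (optionally preceded by "url?")
# into a named character vector.
parseQuery <- function(query) {
  query <- sub("^[^?]*\\?", "", query)            # drop everything up to "?"
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]
  kv    <- strsplit(pairs, "=", fixed = TRUE)
  # A parameter with no value (e.g. "abc=") yields an empty string.
  vals  <- vapply(kv, function(p) if (length(p) > 1) p[2] else "", character(1))
  names(vals) <- vapply(kv, `[`, character(1), 1)
  vals
}

fromURL   <- parseQuery("https://www.omegahat.net/foo/bob.R?xyz=1&abc=verylong")
fromQuery <- parseQuery("xyz=1&abc=&on=true")
```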
This function allows the caller to specify multiple URIs to download at the same time. All the requests are submitted and then the replies are processed as data becomes available on each connection. In this way, the responses are processed in an inter-leaved fashion, with a chunk from one response from one request being processed and then followed by a chunk from a different request.
Downloading documents asynchronously involves some trade-offs. The switching between different streams, detecting when input is available on any of them involves a little more processing and so increases the consumption of CPU cycles. On the other hand, there is a potentially large saving of time when one considers total time to download. See https://www.omegahat.net/RCurl/concurrent.xml for more details. This is a common trade-off that arises in concurrent/parallel/asynchronous computing.
getURI
calls this function if more than one
URI is specified and async
is TRUE
, the default in this case.
One can also download the (contents of the) multiple URIs
serially, i.e. one after the other using getURI
with a value of FALSE
for async
.
getURIAsynchronous(url, ..., .opts = list(), write = NULL,
                   curl = getCurlHandle(),
                   multiHandle = getCurlMultiHandle(), perform = Inf,
                   .encoding = integer(), binary = rep(NA, length(url)))
url |
a character vector identifying the URIs to download. |
... |
named arguments to be passed to |
.opts |
a named list or |
write |
an object giving the functions or routines that are to be called when input is waiting on the different HTTP response streams. By default, a separate callback function is associated with each input stream. This is necessary for the results to be meaningful as if we use a single reader, it will be called for all streams in a haphazard order and the content interleaved. One can do interesting things however using a single object. |
curl |
the prototypical curlHandle that is duplicated and used in |
multiHandle |
this is a curl handle for performing asynchronous requests. |
perform |
a number which specifies the maximum number of calls to
|
.encoding |
an integer or a string that explicitly identifies the
encoding of the content that is returned by the HTTP server in its
response to our query. The possible strings are
‘UTF-8’ or ‘ISO-8859-1’
and the integers should be specified symbolically
as |
binary |
a logical vector identifying whether each URI has binary content or simple text. |
This uses curlMultiPerform and the multi/asynchronous interface for libcurl.
The return value depends on the run-time characteristics of the call. If the call merely specifies the URIs to be downloaded, the result is a named character vector. The names identify the URIs and the elements of the vector are the contents of the corresponding URI.
If the requests are not performed or completed (i.e. perform is zero or too small a value to process all the chunks), a list with 2 elements is returned. These elements are:
multiHandle |
the curl multi-handle, of class
|
write |
the |
Duncan Temple Lang
Curl homepage https://curl.se/
getURL
getCurlMultiHandle
curlMultiPerform
uris = c("https://www.omegahat.net/RCurl/index.html",
         "https://www.omegahat.net/RCurl/philosophy.xml")
txt = getURIAsynchronous(uris)
names(txt)
nchar(txt)
These functions download one or more URIs (a.k.a. URLs). They use libcurl under the hood to perform the request and retrieve the response. A myriad of options can be specified using the ... mechanism to control the creation and submission of the request and the processing of the response.
getURLContent has been added as a high-level function like getURL and getBinaryURL but which determines the type of the content being downloaded by looking at the resulting HTTP header's Content-Type field. It uses this to determine whether the bytes are binary or "text".
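The idea of classifying a response by its Content-Type can be illustrated with a small base R sketch. Both is_text_content and the list of text-like types are assumptions made for illustration; the rule that getURLContent actually applies may differ.

```r
# Hypothetical helper: decide whether a Content-Type value describes text.
# The set of "text-like" application types below is an illustrative assumption.
is_text_content <- function(content_type) {
  # Drop parameters such as "; charset=UTF-8" and normalise case
  type <- tolower(trimws(sub(";.*$", "", content_type)))
  text_like <- c("application/json", "application/xml", "application/javascript")
  grepl("^text/", type) | type %in% text_like
}

is_text_content("text/html; charset=UTF-8")
is_text_content("image/png")
```

Anything not recognised as text would be kept as a raw vector in this scheme, which matches the binary versus "text" distinction described above.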
The request supports any of the facilities within the version of libcurl that was installed. One can examine these via curlVersion.
getURLContent doesn't perform asynchronous or multiple concurrent requests at present.
getURL(url, ..., .opts = list(),
       write = basicTextGatherer(.mapUnicode = .mapUnicode),
       curl = getCurlHandle(), async = length(url) > 1,
       .encoding = integer(), .mapUnicode = TRUE)
getURI(url, ..., .opts = list(),
       write = basicTextGatherer(.mapUnicode = .mapUnicode),
       curl = getCurlHandle(), async = length(url) > 1,
       .encoding = integer(), .mapUnicode = TRUE)
getURLContent(url, ..., curl = getCurlHandle(.opts = .opts),
              .encoding = NA, binary = NA, .opts = list(...),
              header = dynCurlReader(curl, binary = binary,
                                     baseURL = url, isHTTP = isHTTP,
                                     encoding = .encoding),
              isHTTP = length(grep('^[[:space:]]*http', url)) > 0)
url |
a string giving the URI |
... |
named values that are interpreted as CURL options governing the HTTP request. |
.opts |
a named list or |
write |
if explicitly supplied, this is a function that is called with a single argument each time the HTTP response handler has gathered sufficient text. The argument to the function is a single string. The default argument provides a function for accumulating this text and is then also used to retrieve it as the return value for this function. |
curl |
the previously initialized CURL context/handle which can be used for multiple requests. |
async |
a logical value that determines whether the download
request should be done via asynchronous, concurrent downloading or a serial
download. This really only arises when we are trying to download
multiple URIs in a single call. There are trade-offs between
concurrent and serial downloads, essentially trading CPU cycles
for shorter elapsed times. Concurrent downloads reduce the overall
time waiting for |
.encoding |
an integer or a string that explicitly identifies the
encoding of the content that is returned by the HTTP server in its
response to our query. The possible strings are
‘UTF-8’ or ‘ISO-8859-1’
and the integers should be specified symbolically
as |
.mapUnicode |
a logical value that controls whether the resulting text is processed to map components of the form \uxxxx to their appropriate Unicode representation. |
binary |
a logical value indicating whether the caller knows
whether the resulting content is binary ( |
header |
this is made available as a parameter of the function
to allow callers to construct different readers for processing
the header and body of the (HTTP) response.
Callers specifying this will typically only adjust the
call to The caller can specify a value of |
isHTTP |
a logical value that indicates whether the request is an HTTP request. This is used when determining how to process the response. |
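The .mapUnicode argument described above concerns escapes of the form \uxxxx in the returned text. A minimal base R sketch of such a mapping follows; map_unicode_escapes is a hypothetical name, the code is not the package's implementation, and it does not handle surrogate pairs.

```r
# Hypothetical helper: replace literal \uXXXX escapes with the characters
# they denote. A base R illustration only, not RCurl's own code.
map_unicode_escapes <- function(txt) {
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", txt)
  regmatches(txt, m) <- lapply(regmatches(txt, m), function(esc) {
    # Drop the leading "\u", read the four hex digits, convert to a character
    vapply(esc,
           function(e) intToUtf8(strtoi(substring(e, 3), base = 16L)),
           character(1), USE.NAMES = FALSE)
  })
  txt
}

map_unicode_escapes("x\\u0041y")
```

Strings without any escape are returned unchanged, so the helper can be applied to every response regardless of its content.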
If no value is supplied for write, the result is the text that is the HTTP response. (HTTP header information is included if the header option for CURL is set to TRUE and no handler for headerfunction is supplied in the CURL options.)
Alternatively, if a value is supplied for the write parameter, this is returned. This allows the caller to create a handler within the call and get it back. This avoids having to explicitly create and assign it and then call getURL and then access the result. Instead, the 3 steps can be inlined in a single call.
Duncan Temple Lang
Curl homepage https://curl.se/
getBinaryURL
curlPerform
curlOptions
omegahatExists = url.exists("https://www.omegahat.net")

# Regular HTTP
if(omegahatExists && requireNamespace("XML", quietly = TRUE)) withAutoprint({
   txt = getURL("https://www.omegahat.net/RCurl/")
   ## Then we could parse the result.
   XML::htmlTreeParse(txt, asText = TRUE)
})

# HTTPS. First check to see that we have support compiled into
# libcurl for ssl.
if(interactive() && ("ssl" %in% names(curlVersion()$features))
   && url.exists("https://sourceforge.net/")) {
   txt = tryCatch(getURL("https://sourceforge.net/"),
                  error = function(e) {
                      getURL("https://sourceforge.net/",
                             ssl.verifypeer = FALSE)
                  })
}

# Create a CURL handle that we will reuse.
if(interactive() && omegahatExists) {
   curl = getCurlHandle()
   pages = list()
   for(u in c("https://www.omegahat.net/RCurl/index.html",
              "https://www.omegahat.net/RGtk/index.html")) {
       pages[[u]] = getURL(u, curl = curl)
   }
}

# Set additional fields in the header of the HTTP request.
# verbose option allows us to see that they were included.
if(omegahatExists)
   getURL("https://www.omegahat.net",
          httpheader = c(Accept = "text/html", MyField = "Duncan"),
          verbose = TRUE)

# Arrange to read the header of the response from the HTTP server as
# a separate "stream". Then we can break it into name-value
# pairs. (The first line is the HTTP/1.1 200 Ok or 301 Moved Permanently
# status line)
if(omegahatExists) withAutoprint({
   h = basicTextGatherer()
   txt = getURL("https://www.omegahat.net/RCurl/index.html",
                header= TRUE, headerfunction = h$update,
                httpheader = c(Accept="text/html", Test=1),
                verbose = TRUE)
   print(paste(h$value(NULL)[-1], collapse=""))
   con <- textConnection(paste(h$value(NULL)[-1], collapse=""))
   read.dcf(con)
   close(con)
})

# Test the passwords.
if(omegahatExists) withAutoprint({
   x = getURL("https://www.omegahat.net/RCurl/testPassword/index.html",
              userpwd = "bob:duncantl")

   # Catch an error because no authorization
   # We catch the generic HTTPError, but we could catch the more specific
   # "Unauthorized" error type.
   x = tryCatch(getURLContent("https://www.omegahat.net/RCurl/testPassword/index.html"),
                HTTPError = function(e) {
                    cat("HTTP error: ", e$message, "\n")
                })
})

## Not run: 
# Needs specific information from the cookie file on a per user basis
# with a registration to the NY times.
x = getURL("https://www.nytimes.com",
           header = TRUE, verbose = TRUE,
           cookiefile = "/home/duncan/Rcookies",
           netrc = TRUE,
           maxredirs = as.integer(20),
           netrc.file = "/home2/duncan/.netrc1",
           followlocation = TRUE)
## End(Not run)

if(interactive() && omegahatExists) {
   d = debugGatherer()
   x = getURL("https://www.omegahat.net", debugfunction = d$update,
              verbose = TRUE)
   d$value()
}

#############################################
# Using an option set in R
if(interactive() && omegahatExists) {
   opts = curlOptions(header = TRUE, userpwd = "bob:duncantl", netrc = TRUE)
   getURL("https://www.omegahat.net/RCurl/testPassword/index.html",
          verbose = TRUE, .opts = opts)

   # Using options in the CURL handle.
   h = getCurlHandle(header = TRUE, userpwd = "bob:duncantl", netrc = TRUE)
   getURL("https://www.omegahat.net/RCurl/testPassword/index.html",
          verbose = TRUE, curl = h)
}

# Use a C routine as the reader. Currently gives a warning.
if(interactive() && omegahatExists) {
   routine = getNativeSymbolInfo("R_internalWriteTest",
                                 PACKAGE = "RCurl")$address
   getURL("https://www.omegahat.net/RCurl/index.html",
          writefunction = routine)
}

# Example
if(interactive() && omegahatExists) {
   uris = c("https://www.omegahat.net/RCurl/index.html",
            "https://www.omegahat.net/RCurl/philosophy.xml")
   txt = getURI(uris)
   names(txt)
   nchar(txt)

   txt = getURI(uris, async = FALSE)
   names(txt)
   nchar(txt)

   routine = getNativeSymbolInfo("R_internalWriteTest",
                                 PACKAGE = "RCurl")$address
   txt = getURI(uris, write = routine, async = FALSE)
   names(txt)
   nchar(txt)

   # getURLContent() for text and binary
   x = getURLContent("https://www.omegahat.net/RCurl/index.html")
   class(x)

   x = getURLContent("https://www.omegahat.net/RCurl/data.gz")
   class(x)
   attr(x, "Content-Type")

   x = getURLContent("https://www.omegahat.net/Rcartogram/demo.jpg")
   class(x)
   attr(x, "Content-Type")

   curl = getCurlHandle()
   dd = getURLContent("https://www.omegahat.net/RJSONIO/RJSONIO.pdf",
                      curl = curl,
                      header = dynCurlReader(curl, binary = TRUE,
                                             value = function(x) {
                                                 print(attributes(x))
                                                 x}))
}

# FTP
# Download the files within a directory.
if(interactive() && url.exists('ftp://ftp.wcc.nrcs.usda.gov')) {
   url = 'ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/'
   filenames = getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)

   # Deal with newlines as \n or \r\n. (BDR)
   # Or alternatively, instruct libcurl to change \n's to \r\n's for us with crlf = TRUE
   # filenames = getURL(url, ftp.use.epsv = FALSE, ftplistonly = TRUE, crlf = TRUE)
   filenames = paste(url, strsplit(filenames, "\r*\n")[[1]], sep = "")
   con = getCurlHandle( ftp.use.epsv = FALSE)

   # there is a slight possibility that some of the files that are
   # returned in the directory listing and in filenames will disappear
   # when we go back to get them. So we use a try() in the call getURL.
   contents = sapply(filenames[1:5], function(x) try(getURL(x, curl = con)))
   names(contents) = filenames[1:length(contents)]
}
This function returns the MIME type, i.e. part of the value used in the Content-Type for an HTTP request/response or in email to identify the nature of the content. This is a string such as "text/plain" or "text/xml" or "image/png".
The function consults an R object constructed by reading a Web site of known MIME types (not necessarily all) and matching the extension of the file name to the names of that table.
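The extension lookup described above can be sketched in base R. The three-entry table and the name guess_mime are assumptions made for illustration; the package consults its own, much larger mimeTypeExtensions table.

```r
# Hypothetical miniature table: names are extensions, values are MIME types.
mime_table <- c(txt = "text/plain", png = "image/png", jpeg = "image/jpeg")

# Hypothetical helper mirroring the described lookup: take the file
# extension, match it against the table names, fall back to `default`.
guess_mime <- function(name, default = NA_character_) {
  ext <- tolower(tools::file_ext(name))
  ans <- unname(mime_table[ext])
  ans[is.na(ans)] <- default
  ans
}

guess_mime(c("foo.txt", "foo.png", "foo.bob"))
guess_mime("foo.bob", "application/x-binary")
```

Because the lookup is vectorised over name, one call handles a whole vector of file names, and unknown extensions receive the default.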
guessMIMEType(name, default = NA)
name |
character vector of file names |
default |
the value to use if no MIME type is found in the table for the given file name/extension. |
A character vector giving the MIME types for each element of name.
Duncan Temple Lang
The table of MIME types and extensions was programmatically extracted
from ‘http://www.webmaster-toolkit.com/mime-types.shtml’ with
tbls = readHTMLTable("http://www.webmaster-toolkit.com/mime-types.shtml")
tmp = tbls[[1]][-1,]
mimeTypeExtensions = structure(as.character(tmp[[2]]), names = gsub("^\\.", "", tmp[[1]]))
save(mimeTypeExtensions, file = "data/mimeTypeExtensions.rda")
The IANA list is not as convenient to programmatically compile.
Uploading file.
guessMIMEType(c("foo.txt", "foo.png", "foo.jpeg", "foo.Z", "foo.R"))
guessMIMEType("foo.bob")
guessMIMEType("foo.bob", "application/x-binary")
These are values that can be used to set the http.version and sslversion options of curlPerform.
HTTP_VERSION_1_0
https://curl.se/libcurl/c/curl_easy_setopt.html
These functions are simple, high-level functions that implement HTTP request methods such as PUT and DELETE. These can also be done by specifying the method type using the curl option customrequest. These functions merely provide a convenience wrapper for getURLContent with the HTTP method specified.
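As an illustration of the customrequest route mentioned above, the following sketch names the method explicitly in a call to getURLContent. The helper name delete_via_customrequest is hypothetical, and the code assumes RCurl is installed; it is meant to show the shape of such a call rather than how httpDELETE is implemented.

```r
# Hypothetical helper: issue a DELETE request by naming the method through
# the customrequest curl option instead of calling httpDELETE().
delete_via_customrequest <- function(url, ..., curl = RCurl::getCurlHandle()) {
  RCurl::getURLContent(url, customrequest = "DELETE", ..., curl = curl)
}
```

The same pattern applies to other methods by changing the string passed as customrequest.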
httpPUT(url, content, ..., curl = getCurlHandle())
httpPOST(url, ..., curl = getCurlHandle())
httpDELETE(url, ..., curl = getCurlHandle())
httpGET(url, ..., curl = getCurlHandle())
httpHEAD(url, ..., curl = getCurlHandle())
httpOPTIONS(url, ..., curl = getCurlHandle())
url |
the URL of the server to which the HTTP request is to be made |
content |
the value that is to be used as the content of the
|
... |
additional arguments passed to |
curl |
the curl handle to be used to make the request |
The content returned by the server as a result of the request.
Duncan Temple Lang
## Not run: 
# create a database in a CouchDB server
httpPUT("http://127.0.0.1:5984/temp_db")

# Insert an entry into an ElasticSearch database.
httpPUT("http://localhost:9200/a/b/axyz", '{"abc" : 123}')

# Then delete the database
httpDELETE("http://127.0.0.1:5984/temp_db")
## End(Not run)
This is a method that merges the contents of one list with another by adding the named elements in the second that are not in the first. In other words, the first list is the target template, and the second one adds any extra elements that it has.
merge.list(x, y, ...)
x |
the list to which elements will be added |
y |
the list which will supply additional elements to |
... |
not used. |
A named list whose names are the union of the names of x and y. Where a name appears in both lists, the value from x is kept; the remaining elements are taken from y.
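The merging rule can be illustrated with a few lines of base R. The function merge_lists below is a hypothetical stand-in written for this example, not the package's (unexported) merge.list.

```r
# Hypothetical stand-in for the described behaviour: keep everything in x
# and add only those named elements of y that x does not already have.
merge_lists <- function(x, y) {
  extra <- setdiff(names(y), names(x))
  if (length(extra)) x[extra] <- y[extra]
  x
}

res <- merge_lists(list(a = 1, b = "xyz"), list(a = 2, z = "a string"))
names(res)
res$a
```

Here the element a keeps the value from the first list, while z is added from the second.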
Duncan Temple Lang
Curl homepage https://curl.se/
## Not run: 
# Not exported.
merge.list(list(a=1, b = "xyz", c = function(x, y) {x+y}),
           list(a = 2, z = "a string"))

# No values in y
merge.list(list(a=1, b = "xyz", c = function(x, y) {x+y}), list())

# No values in x
merge.list(list(), list(a=1, b = "xyz", c = function(x, y) {x+y}))
## End(Not run)
This is a programmatically generated character vector whose names are file extensions and whose values are the MIME types typically associated with those extensions, as the construction code shown under the source of the table indicates. This is used in guessMIMEType. We can match an extension and then find the corresponding MIME type. There are duplicates.
data(mimeTypeExtensions)
The format is a named character vector where the names are the file extensions and the values are the MIME types.
The table of MIME types and extensions was programmatically extracted
from ‘http://www.webmaster-toolkit.com/mime-types.shtml’ with
tbls = readHTMLTable("http://www.webmaster-toolkit.com/mime-types.shtml")
tmp = tbls[[1]][-1,]
mimeTypeExtensions = structure(as.character(tmp[[2]]), names = gsub("^\\.", "", tmp[[1]]))
save(mimeTypeExtensions, file = "data/mimeTypeExtensions.rda")
The IANA list is not as convenient to programmatically compile.
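To make the shape of the resulting object concrete, the same structure() call can be applied to a small hand-made data frame. The data frame tmp_demo and its three rows are invented for this sketch and stand in for the scraped table.

```r
# Invented stand-in for the scraped table: first column holds extensions
# with a leading dot, second column holds the MIME types.
tmp_demo <- data.frame(ext = c(".txt", ".png", ".jpeg"),
                       type = c("text/plain", "image/png", "image/jpeg"),
                       stringsAsFactors = FALSE)

# Same construction as above: values are MIME types, names are the
# extensions with the leading dot removed.
demo_table <- structure(as.character(tmp_demo[[2]]),
                        names = gsub("^\\.", "", tmp_demo[[1]]))

demo_table[["png"]]
names(demo_table)
```

Indexing by extension then yields the MIME type, which is how a lookup by file name extension can proceed.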
data(mimeTypeExtensions)
This is a class that represents a handle to an internal C-level data structure provided by libcurl to perform multiple HTTP requests in a single operation and process the responses in an inter-leaved fashion, i.e. a chunk from one, followed by a chunk from another.
Objects of this class contain not only a reference to the internal C-level data structure, but also have a list of the CURLHandle-class objects that represent the individual HTTP requests that make up the collection of concurrent requests. These are maintained for garbage collection reasons. Essentially, the data in objects of this class are for internal use; this is an opaque class in R.
The constructor function getCurlMultiHandle is the only way to create meaningful instances of this class.
ref: Object of class "externalptr". This is a reference to the instance of the libcurl data structure CURLM pointer.
subhandles: Object of class "list". This is a list of CURLHandle-class instances that have been push()ed onto the multi-handle stack.
signature(obj = "MultiCURLHandle", val = "CURLHandle"): ...
signature(obj = "MultiCURLHandle", val = "character"): ...
signature(obj = "MultiCURLHandle", val = "CURLHandle"): ...
Duncan Temple Lang
Curl homepage https://curl.se/ https://www.omegahat.net/RCurl/
getCurlMultiHandle
curlMultiPerform
multiTextGatherer
These functions provide facilities for submitting an HTML form using either the simple GET mechanism (appending the name-value pairs of parameters in the URL) or the POST method which puts the name-value pairs as separate sections in the body of the HTTP request. The choice of action is defined by the form, not the caller.
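For the GET mechanism described above, the name-value pairs end up in the query part of the URL. The base R sketch below shows one way such a URL could be assembled; build_query_url is a hypothetical helper, and getForm itself may encode values differently.

```r
# Hypothetical helper: append name=value pairs to a URI as a query string,
# percent-encoding names and values with utils::URLencode().
build_query_url <- function(uri, ...) {
  params <- c(...)
  enc <- function(s) utils::URLencode(as.character(s), reserved = TRUE)
  keys <- vapply(names(params), enc, character(1), USE.NAMES = FALSE)
  vals <- vapply(params, enc, character(1), USE.NAMES = FALSE)
  paste0(uri, "?", paste(keys, vals, sep = "=", collapse = "&"))
}

build_query_url("http://www.google.com/search", hl = "en", q = "RCurl")
```

With POST, the same pairs would instead travel in the body of the request, so the URL stays unchanged.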
postForm(uri, ..., .params = list(), .opts = curlOptions(url = uri),
         curl = getCurlHandle(), style = 'HTTPPOST',
         .encoding = integer(), binary = NA, .checkParams = TRUE,
         .contentEncodeFun = curlEscape)
.postForm(curl, .opts, .params, style = 'HTTPPOST')
getForm(uri, ..., .params = character(), .opts = list(),
        curl = getCurlHandle(), .encoding = integer(), binary = NA,
        .checkParams = TRUE)
uri |
the full URI to which the form is to be posted. This includes the host and the specific file or script which will process the form. |
... |
the name-value pairs of parameters. Note that these are not the CURL options. |
.params |
instead of specifying the name-value parameters in "free" form via the ... argument, one can specify them as named list or character vector. |
.opts |
an object representing the CURL options for this call. |
curl |
the |
style |
this is typically a string
and controls how the form data is posted, specifically
the value for the Content-Type header and the particular
representation.
Use 'httppost' to use a |
.encoding |
the encoding of the result, if it is known a priori. This can be an integer between 0 and 4 or more appropriately a string identifying the encoding as one of "utf-8" or "ISO-8859-1". |
binary |
a logical value indicating whether the caller knows
whether the resulting content is binary ( |
.checkParams |
a logical value that indicates whether we should perform a check/test
to identify if any of the arguments passed to the form correspond to Curl options.
This is useful to identify potential errors in specifying the Curl options in the
wrong place (in the way we would for |
.contentEncodeFun |
a function which encodes strings in a
suitable manner. For x-www-form-encoded submissions, this
most likely should be |
Creating a new CURLHandle allows the C-level code to more efficiently map the R-level values to their C equivalents needed to make the call. However, reusing the handle across calls can be more efficient in that the connection to a server can be maintained and thus, the sometimes expensive task of establishing it is avoided in subsequent calls.
By default, the text from the HTTP response is returned.
if(url.exists("http://www.google.com")) withAutoprint({
   # Two ways to submit a query to google. Searching for RCurl
   getURL("http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=RCurl&btnG=Search")

   # Here we let getForm do the hard work of combining the names and values.
   getForm("http://www.google.com/search", hl="en", lr="",
           ie="ISO-8859-1", q="RCurl", btnG="Search")

   # And here if we already have the parameters as a list/vector.
   getForm("http://www.google.com/search",
           .params = c(hl="en", lr="", ie="ISO-8859-1",
                       q="RCurl", btnG="Search"))
})

# Now looking at POST method for forms.
url <- "http://wwwx.cs.unc.edu/~jbs/aw-wwwp/docs/resources/perl/perl-cgi/programs/cgi_stdin.cgi"
if(url.exists(url))
   postForm(url,
            name = "Bob", "checkedbox" = "spinich",
            submitButton = "Now!",
            textarea = "Some text to send",
            selectitem = "The item",
            radiobutton = "a", style = "POST")

# Genetic database via the Web.
if(url.exists('http://www.wormbase.org/db/searches/advanced/dumper')) withAutoprint({
   x = postForm('http://www.wormbase.org/db/searches/advanced/dumper',
                species="briggsae",
                list="",
                flank3="0",
                flank5="0",
                feature="Gene Models",
                dump = "Plain TEXT",
                orientation = "Relative to feature",
                relative = "Chromsome",
                DNA ="flanking sequences only",
                .cgifields = paste(c("feature", "orientation", "DNA",
                                     "dump","relative"), collapse=", "))

   # Note that we don't have to paste multiple values together ourselves,
   # e.g. the .cgifields can be specified as a character vector rather
   # than a string.
   x = postForm('http://www.wormbase.org/db/searches/advanced/dumper',
                species="briggsae",
                list="",
                flank3="0",
                flank5="0",
                feature="Gene Models",
                dump = "Plain TEXT",
                orientation = "Relative to feature",
                relative = "Chromsome",
                DNA ="flanking sequences only",
                .cgifields =c("feature", "orientation", "DNA",
                              "dump", "relative"))
})
This generic and the associated method for a CURLHandle allows one to reset the state of the Curl object to its default state. This is convenient if we want to reuse the same connection, but want to ensure that it is in a particular state.
Unfortunately, we cannot query the state of different fields in an existing Curl handle and so we need to be able to reset the state and then update it with any particular settings we would have liked to keep.
reset(x, ...)
x |
the object to be reset. For our method, this is an object of
class |
... |
additional arguments for methods |
This calls the C routine curl_easy_reset in libcurl.
Methods typically return the updated version of the object passed to it. This allows the caller to assign the new result to the same variable rather than relying on mutating the content of the object in place. In other words, the object should not be treated as a reference but a new object with the updated contents should be returned.
Duncan Temple Lang
Curl homepage https://curl.se/
h = getCurlHandle()
curlSetOpt(customrequest = "DELETE", curl = h)
reset(h)
This function allows us to retrieve the contents of a file from a remote host via SCP. This is done entirely within R, rather than a command line application and the contents of the file are never written to disc. The function allows the
scp(host, path, keypasswd = NA, user = getUserName(), rsa = TRUE, key = sprintf(c("~/.ssh/id_%s.pub", "~/.ssh/id_%s"), if (rsa) "rsa" else "dsa"), binary = NA, size = 5000, curl = getCurlHandle(), ...)
host |
the name of the remote host or its IP address |
path |
the path of the file of interest on the remote host's file systems |
keypasswd |
a password for accessing the local SSH key. This is the passphrase for the key. |
user |
the name of the user on the remote machine |
rsa |
a logical value indicating whether to use the RSA or DSA key |
key |
the path giving the location of the SSH key. |
binary |
a logical value giving |
size |
an estimate of the size of the buffer needed to store the contents of the file. This is used to initialize the buffer and potentially avoid resizing it as needed. |
curl |
a curl handle ( |
... |
additional parameters handed to |
This uses libcurl's facilities for scp.
Use "scp" %in% curlVersion()$protocols to see if SCP is supported.
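As a small sketch, the support check can be used to guard any code that calls scp. The variable name `has_scp` and the message text are illustrative assumptions.

```r
library(RCurl)

# TRUE if the libcurl that RCurl was built against lists SCP as a protocol.
has_scp <- "scp" %in% curlVersion()$protocols

if (!has_scp) {
  message("This build of libcurl does not support SCP; scp() cannot be used.")
}
```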
Either a raw or character vector giving the contents of the file.
Duncan Temple Lang
libcurl https://curl.se/
curlPerform
getCurlOptionsConstants
## Not run:
x = scp("eeyore.ucdavis.edu", "/home/duncan/OmegaWeb/index.html", "My.SCP.Passphrase", binary = FALSE)
x = scp("eeyore.ucdavis.edu", "/home/duncan/OmegaWeb/RCurl/xmlParse.bz2", "My.SCP.Passphrase")
o = memDecompress(x, asChar = TRUE)
## End(Not run)
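The decompression step in the example above can be tried without a remote host. In this sketch a locally compressed raw vector stands in for the value scp would return for a bzip2 file; the text and variable names are made up for illustration, and only base R functions are used.

```r
# Stand-in for the raw vector that scp() would return for a .bz2 file.
original <- "contents of a remote text file"
x <- memCompress(charToRaw(original), type = "bzip2")

# Decompress in memory and convert the result to a character string.
o <- memDecompress(x, type = "bzip2", asChar = TRUE)

# Check that the round trip recovered the original text.
identical(o, original)
```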
This function is analogous to file.exists and determines whether a request for a specific URL responds without error. We make the request but ask the server not to return the body. We just process the header.
url.exists(url, ..., .opts = list(...), curl = getCurlHandle(.opts = .opts), .header = FALSE)
url |
a vector of one or more URLs whose existence we are to test |
... |
name = value pairs of Curl options. |
.opts |
a list of name = value pairs of Curl options. |
curl |
a Curl handle that the caller can specify if she wants to reuse an existing handle, e.g. with different options already specified or that has previously established a connection to the Web server |
.header |
a logical value that, if TRUE, causes the header information to be returned |
This makes an HTTP request but with the nobody option set to TRUE so that we don't actually retrieve the contents of the URL.
If .header is FALSE, this returns TRUE or FALSE for each URL indicating whether the request was successful (had a status with a value in the 200 range).
If .header is TRUE, the header is returned for the request for each URL.
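The "status in the 200 range" test can be illustrated offline. The sketch below uses RCurl's parseHTTPHeader on made-up header lines; whether url.exists performs exactly these steps internally is an assumption, and the names `raw_header`, `status` and `ok` are illustrative.

```r
library(RCurl)

# Made-up header lines standing in for a server's reply to a
# request that asked for no body (illustrative only).
raw_header <- c("HTTP/1.1 404 Not Found",
                "Content-Type: text/html")

# Parse the lines into a named character vector that includes "status".
hdr <- parseHTTPHeader(raw_header)

# A status in the 200 range counts as success.
status <- as.integer(hdr[["status"]])
ok <- status %/% 100L == 2L
```

With `.header = TRUE`, the caller receives the header itself and can apply a different test, for example treating redirects as success.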
Duncan Temple Lang
HTTP specification
url.exists("https://www.omegahat.net/RCurl")
try(url.exists("https://www.omegahat.net/RCurl-xxx"))