Title: | Machine Learning for S.E.O |
---|---|
Description: | Measures different aspects of page content, structure and performance for SEO (Search Engine Optimization). Aspects covered include HTML tags used in SEO, duplicate and near-duplicate content, structured data, and on-site linking structure and popularity transfer, among others. This package can be used to generate a full SEO audit report, which detects errors or inefficiencies on a page that can be corrected to optimise its performance on search engines. |
Authors: | Vincent Terrasi [aut, cre], OnCrawl [cph, fnd] |
Maintainer: | Vincent Terrasi |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2024-12-23 06:46:14 UTC |
Source: | CRAN |
Transform an HTML widget into a picture
export_formattableWidget(w, file, width = 400, background = "white", delay = 0.2)
w | HTML content to print |
file | A vector of names of output files. Each should end with .png, .pdf, or .jpeg. If several screenshots have to be taken and only one filename is provided, the function appends the index number of each screenshot to the file name. |
width | Viewport width. This is the width of the browser "window". |
background | Background color for the web page |
delay | Time to wait before taking the screenshot, in seconds. Sometimes a longer delay is needed for all assets to display properly. |
Get a crawl
getCrawl(crawlId)
crawlId | ID of your crawl |
<http://developer.oncrawl.com/#Get-a-crawl>
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
The HTTP response is a JSON object with a single crawl key containing the crawl's data.
JSON
Vincent Terrasi
## Not run:
initAPI("oncrawl_configuration.txt")
crawl <- getCrawl("YOURCRAWLID")
## End(Not run)
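The response codes listed above are shared by every API function in this package. As a hedged illustration only, the base R sketch below maps a status code to its documented meaning. describe_oncrawl_status() is a hypothetical helper, not part of the package, and it assumes you already have the numeric status code in hand. Because 403 carries two documented meanings, the helper returns both.

```r
# Hypothetical helper (not part of oncrawlR): map an HTTP status code
# returned by the OnCrawl API to the meaning documented in this manual.
describe_oncrawl_status <- function(code) {
  messages <- list(
    "400" = "Incompatible values, or request does not match the API specification",
    "401" = "Request is not authenticated",
    # 403 is documented with two distinct meanings, so both are returned
    "403" = c("Current quota does not allow the action",
              "Request is authenticated but the action is not allowed"),
    "404" = "A resource referred to in the request was not found",
    "409" = "Operation not allowed for the current state of the resource",
    "500" = "Internal error"
  )
  key <- as.character(code)
  # [[ on a list returns NULL for a missing name
  if (is.null(messages[[key]])) {
    return("Undocumented status code")
  }
  messages[[key]]
}

describe_oncrawl_status(403)
```

A lookup list keeps the documented meanings in one place, so any undocumented code falls through to a single default message.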
List all available fields from a crawl
getPageFields(crawlId)
crawlId | ID of your crawl |
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
A character vector
Vincent Terrasi
## Not run:
fields <- getPageFields("YOURCRAWLID")
## End(Not run)
List all available fields from logs
getPageFieldsLogs(projectId)
projectId | ID of your project |
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
A character vector
Vincent Terrasi
## Not run:
logsFields <- getPageFieldsLogs("YOURPROJECTID")
## End(Not run)
Get a project
getProject(projectId)
projectId | ID of your project |
<http://developer.oncrawl.com/#Get-a-project>
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
The HTTP response is a JSON object with three keys: a project key with the project's data; a crawl_configs key with a list of all the project's crawl configurations; and a crawls key with a list of all the project's crawls.
JSON
Vincent Terrasi
## Not run:
initAPI("oncrawl_configuration.txt")
project <- getProject("YOURPROJECTID")
## End(Not run)
Prepare the token for API calls
initAPI(path)
path | Path of your configuration file |
Example oncrawl_configuration.txt file:
key = 5516LP29W5Q9XXXXXXXXXXXXOEUGWHM9
debug = FALSE
api = https://app.oncrawl.com/api/v2/
ok if API authentication completes without error
Vincent Terrasi
## Not run:
initAPI("oncrawl_configuration.txt")
## End(Not run)
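The configuration file holds three settings: key, debug and api. Assuming one key = value pair per line, such a file could be read with the base R sketch below. read_oncrawl_conf() is a hypothetical helper for illustration only. It is not how initAPI() is implemented.

```r
# Hypothetical helper, not the package's own parser: read a configuration
# file with one "key = value" setting per line into a named list.
read_oncrawl_conf <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]                  # drop blank lines
  # Split on the first "=" only, so values may themselves contain "="
  keys   <- trimws(sub("=.*$", "", lines))
  values <- trimws(sub("^[^=]*=", "", lines))
  conf <- as.list(values)
  names(conf) <- keys
  # Convert the debug flag from text to a logical value
  conf$debug <- identical(toupper(conf$debug), "TRUE")
  conf
}

# Usage: write the example settings to a temporary file, then read them back
conf_file <- tempfile(fileext = ".txt")
writeLines(c("key = 5516LP29W5Q9XXXXXXXXXXXXOEUGWHM9",
             "debug = FALSE",
             "api = https://app.oncrawl.com/api/v2/"), conf_file)
conf <- read_oncrawl_conf(conf_file)
str(conf)
```

Splitting on the first "=" only is a deliberate choice. It keeps values such as URLs with query strings intact.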
List all links from a crawl
listLinks(crawlId, originFilter = "", targetFilter = "")
crawlId | ID of your crawl |
originFilter | Select a specific source |
targetFilter | Select a specific target |
<http://developer.oncrawl.com/#Data-types>
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
JSON
Vincent Terrasi
## Not run:
links <- listLinks("YOURCRAWLID")
## End(Not run)
List all pages from logs monitoring
listLogs(projectId)
projectId | ID of your project |
<http://developer.oncrawl.com/#Data-types>
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
JSON
Vincent Terrasi
## Not run:
pages <- listLogs("YOURPROJECTID")
## End(Not run)
List all pages from a crawl
listPages(crawlId)
crawlId | ID of your crawl |
<http://developer.oncrawl.com/#Data-types>
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
JSON
Vincent Terrasi
## Not run:
pages <- listPages("YOURCRAWLID")
## End(Not run)
Get aggregate queries for a specific OQL query
listPagesAggs(crawlId, oqlList)
crawlId | ID of your crawl |
oqlList | Your OQL query, as JSON |
<http://developer.oncrawl.com/#OnCrawl-Query-Language>
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
JSON
Vincent Terrasi
## Not run:
agg <- listPagesAggs("YOURCRAWLID", YOURJSON)
page_crawled <- agg[[1]]$rows
## End(Not run)
List all projects
listProjects(limit = 100)
limit | Maximum number of projects to list |
<http://developer.oncrawl.com/#List-projects>
Response codes: 400: Returned when the request has incompatible values or does not match the API specification. 401: Returned when the request is not authenticated. 403: Returned when the current quota does not allow the action to be performed. 404: Returned when any of the resources referred to in the request is not found. 403: Returned when the request is authenticated but the action is not allowed. 409: Returned when the requested operation is not allowed for the current state of the resource. 500: Internal error.
JSON
Vincent Terrasi
initAPI("oncrawl_configuration.txt")
projects <- listProjects()
Create a dashboard
oncrawlCreateDashboard(dataset, namefile, width, pathfile = tempdir())
dataset | Data frame with three columns: date, unique metric name, and value |
namefile | Filename for the export |
width | Width of your picture |
pathfile | String (optional). Directory in which the intermediate files are created; defaults to tempdir(). |
a graph
Vincent Terrasi
## Not run:
oncrawlCreateDashboard(res, "metric.png", 500, ".")
## End(Not run)
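The dataset argument is described as having three columns: a date, a unique metric name, and a value. That implies a long layout with one row per date and metric. The base R sketch below builds an input of that shape. The column names date, metric and value, and every number in it, are illustrative assumptions rather than requirements confirmed by the package.

```r
# Illustrative input only: one row per (date, metric) pair.
# Column names and values are assumptions made up for this example.
dates <- as.Date("2024-01-01") + 0:2
dataset <- data.frame(
  date   = rep(dates, times = 2),
  metric = rep(c("pages_crawled", "pages_indexable"), each = 3),
  value  = c(1200, 1250, 1300, 900, 950, 980),
  stringsAsFactors = FALSE
)
# Each (date, metric) pair should appear at most once
stopifnot(anyDuplicated(dataset[, c("date", "metric")]) == 0)
str(dataset)
```

A data frame of this shape would then take the place of res in the example call.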
Create a graph
oncrawlCreateGraph(dataset, namefile, width, height, pathfile = tempdir())
dataset | Dataset generated by the DALEX package |
namefile | The filename for the export |
width | Width of your picture |
height | Height of your picture |
pathfile | String (optional). Directory in which the intermediate files are created; defaults to tempdir(). |
file
Vincent Terrasi
## Not run:
oncrawlCreateGraph(res, "metric.png", width = 5, height = 4, pathfile = ".")
## End(Not run)
Transform a character vector of URLs into a JSON file for the OnCrawl platform
oncrawlCreateSegmentation(list_urls, namefile, pathfile = tempdir())
list_urls | Your URLs |
namefile | The filename for the JSON export |
pathfile | String (optional). Directory in which the intermediate files are created; defaults to tempdir(). |
JSON file
Vincent Terrasi
mylist <- c("/cat/domain", "/cat/")
oncrawlCreateSegmentation(mylist, "test.json")
Explain an XGBoost model by displaying its most important variables
oncrawlExplainModel(model, x, y, max = 10, path = tempdir())
model | Your XGBoost model |
x | Your training data |
y | Your predicted data |
max | The number of most important variables to explain |
path | Path of your configuration file |
graphs
Vincent Terrasi
## Not run:
list <- oncrawlTrainModel(dataset, 200)
oncrawlExplainModel(list$model, list$x, list$y, 3)
## End(Not run)
Split URLs
oncrawlSplitURL(list_urls, limit = 15)
list_urls | Your URLs |
limit | The maximum number of URLs you want |
data.frame
Vincent Terrasi
mylist <- c("/cat/domain/web/", "/cat/", "/cat/domain/")
oncrawlSplitURL(mylist, 2)
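oncrawlSplitURL() is documented only through its arguments and its data.frame return value. As a rough illustration of the general technique, the base R sketch below splits each URL path into directory levels, with one column per level. This is an assumption about what splitting URLs can look like, not the package's implementation. split_url_levels() and its max_levels cap are hypothetical, and the package's limit argument may mean something different.

```r
# Illustrative sketch only; not the oncrawlSplitURL() implementation.
# Splits each path on "/" and returns one column per directory level.
split_url_levels <- function(list_urls, max_levels = 15) {
  parts <- strsplit(list_urls, "/", fixed = TRUE)
  parts <- lapply(parts, function(p) p[nzchar(p)])   # drop empty segments
  n <- min(max_levels, max(lengths(parts)))          # number of level columns
  rows <- lapply(parts, function(p) {
    p <- p[seq_len(min(length(p), n))]               # truncate deep paths
    c(p, rep(NA_character_, n - length(p)))          # pad short paths with NA
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- paste0("level", seq_len(n))
  out$url <- list_urls                               # keep the original URL
  out
}

split_url_levels(c("/cat/domain/web/", "/cat/", "/cat/domain/"))
```

Padding with NA keeps every row the same width, so shallow and deep URLs can share one data frame.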
Train XGBoost Model
oncrawlTrainModel(dataset, nround = 300, verbose = 1)
dataset | Your data frame |
nround | Number of iterations |
verbose | Whether to display errors |
A list containing your ML model and your training data (the examples use its elements model, x, y, roc and matrix)
Vincent Terrasi
## Not run:
list <- oncrawlTrainModel(dataset)
plot(list$roc)
print(list$matrix)
## End(Not run)