api-key
POST https://ml.nexosis.com/v1/imports/url { "dataSetName" : "AmazonReviews", "url" : "https://raw.githubusercontent.com/couchbaselabs/blog-source-code/master/Groves/101MachineLearningNexosis/src/modified5000.csv" }
{ "importId": "< guid >", "type": "url", "status": "requested", "dataSetName": "AmazonReviews", "parameters": { "url": "https://raw.githubusercontent.com/" }, "requestedDate": "2018-02-19T19:04:13.012859+00:00", "statusHistory": [ { "date": "2018-02-19T19:04:13.012859+00:00", "status": "requested" } ], "messages": [], "columns": {}, "links": [ { "rel": "self", "href": "https://ml.nexosis.com/v1/imports/url" }, { "rel": "data", "href": "https://ml.nexosis.com/v1/data/AmazonReviews" } ] }
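If you'd rather script the import than use a REST client, here is a minimal Python sketch of the same request. It only builds the request and does not send it. The function name `build_import_request` and the `API_KEY` placeholder are mine, not part of Nexosis; the URL, the `api-key` header, and the body fields are the ones used in this post.

```python
import json
import urllib.request

API_KEY = "<my API key>"  # placeholder: substitute your own Nexosis key

def build_import_request(dataset_name, csv_url, api_key=API_KEY):
    """Build (but do not send) the POST request that imports a CSV by URL."""
    body = json.dumps({"dataSetName": dataset_name, "url": csv_url}).encode("utf-8")
    return urllib.request.Request(
        "https://ml.nexosis.com/v1/imports/url",
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )

req = build_import_request(
    "AmazonReviews",
    "https://raw.githubusercontent.com/couchbaselabs/blog-source-code/master/"
    "Groves/101MachineLearningNexosis/src/modified5000.csv",
)
# To actually send it: urllib.request.urlopen(req)
```

Keeping the build step separate from the send step makes it easy to inspect the headers and body before anything goes over the wire.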
POST https://ml.nexosis.com/v1/sessions/model { "predictionDomain":"classification", "dataSourceName" : "AmazonReviews", "targetColumn": "review_sentiment", "extraParameters" : { "balance": true } }
"predictionDomain":"classification"
– Remember when I said I was going to use “classification” for sentiment analysis?
"dataSourceName" : "AmazonReviews"
– I gave the data source this name, so I’m telling it to use this data source for training.
"targetColumn": "review_sentiment"
– The ‘review_sentiment’ column contains the “__label__1” or “__label__2” values. This is the value that I want Nexosis to learn how to generate.
"extraParameters" : { "balance": true }
– If your data set is unbalanced (meaning, for instance, it contains a lot more negative reviews than positive ones), that could disproportionately influence the machine learning. Set balance to “true” to adjust for this.
{ "columns": { "text": { "dataType": "text", "role": "feature" }, "review_sentiment": { "dataType": "string", "role": "target", "imputation": "mode", "aggregation": "mode" } }, "sessionId": "< guid >", "type": "model", "status": "requested", "predictionDomain": "classification", "availablePredictionIntervals": [], "requestedDate": "2018-02-19T19:28:41.812052+00:00", "statusHistory": [ { "date": "2018-02-19T19:28:41.812052+00:00", "status": "requested" } ], "extraParameters": { "balance": true }, "messages": [], "name": "Classification on AmazonReviews", "dataSourceName": "AmazonReviews", "dataSetName": "AmazonReviews", "targetColumn": "review_sentiment", "isEstimate": false, "links": [ { "rel": "results", "href": "https://ml.nexosis.com/v1/sessions/< guid >/results" }, { "rel": "data", "href": "https://ml.nexosis.com/v1/data/AmazonReviews" }, { "rel": "vocabularies", "href": "https://ml.nexosis.com/v1/vocabulary?createdFromSessionid=< guid >" } ] }
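The session response carries both the current status and a `results` link. Here is a small Python sketch that pulls those two things out of a response shaped like the one above. The helper names `latest_status` and `results_link` are my own, and the sample dictionary is a trimmed copy of the response shown here (with the same `< guid >` placeholder), not live output.

```python
def latest_status(session_response):
    """Return the most recent status recorded in statusHistory, if any."""
    history = session_response.get("statusHistory", [])
    return history[-1]["status"] if history else session_response.get("status")

def results_link(session_response):
    """Return the href of the link whose rel is 'results', or None."""
    for link in session_response.get("links", []):
        if link.get("rel") == "results":
            return link["href"]
    return None

# Trimmed copy of the session response shown in this post
session_response = {
    "sessionId": "< guid >",
    "status": "requested",
    "statusHistory": [
        {"date": "2018-02-19T19:28:41.812052+00:00", "status": "requested"}
    ],
    "links": [
        {"rel": "results", "href": "https://ml.nexosis.com/v1/sessions/< guid >/results"},
        {"rel": "data", "href": "https://ml.nexosis.com/v1/data/AmazonReviews"},
    ],
}

print(latest_status(session_response))
print(results_link(session_response))
```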
modelId
in the results, that will be another GUID. You will need this to proceed.
metrics
field. The results for mine look like:
"metrics": { "macroAverageF1Score": 0.81341486902927584, "rocAreaUnderCurve": 0.88777613666838784, "accuracy": 0.814, "macroAveragePrecision": 0.81521331769769212, "macroAverageRecall": 0.81309365130828448, "matthewsCorrelationCoefficient": 0.62830339352567144 },
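If those metric names are unfamiliar, this Python sketch shows how accuracy, macro-averaged precision/recall/F1, and the Matthews correlation coefficient are conventionally computed for a two-class problem. These are the textbook definitions; I'm assuming, not asserting, that Nexosis computes them the same way. The counts in the example are made up for illustration and are not from my model.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Standard two-class metrics from confusion-matrix counts.

    Assumes no row or column of the confusion matrix is all zeros
    (otherwise some of the divisions below are undefined).
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total

    # Precision and recall, treating each class as "positive" in turn
    prec_a, rec_a = tp / (tp + fp), tp / (tp + fn)
    prec_b, rec_b = tn / (tn + fn), tn / (tn + fp)
    f1_a = 2 * prec_a * rec_a / (prec_a + rec_a)
    f1_b = 2 * prec_b * rec_b / (prec_b + rec_b)

    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "macroAveragePrecision": (prec_a + prec_b) / 2,
        "macroAverageRecall": (rec_a + rec_b) / 2,
        "macroAverageF1Score": (f1_a + f1_b) / 2,
        "matthewsCorrelationCoefficient": mcc,
    }

# Hypothetical counts, for illustration only
example = binary_metrics(tp=40, fn=10, fp=10, tn=40)
print(example)
```

Note that the Matthews coefficient ranges from -1 to 1, so it sits lower than accuracy for the same model; that is expected, not a sign of trouble.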
POST https://ml.nexosis.com/v1/models/<model ID guid>/predict
{ "data":[{ "text" : "Junk! Don't waste your money! ... It worked great for about two weeks then progressively got worse for the next four until it now barely works" }], "extraParameters" :{ "includeClassScores" : false } }
{ "data": [ { "text": "Junk! Don't waste your money! ... It worked great for about two weeks then progressively got worse for the next four until it now barely works", "review_sentiment": "__label__1" } ], // ... etc ... }
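To use the prediction in code, you only need the target column from each row of `data`. A minimal Python sketch, assuming the response has the shape shown above; the function name `extract_sentiment` is mine, and the sample string is a shortened copy of that response.

```python
import json

def extract_sentiment(response_text, target="review_sentiment"):
    """Return the predicted label for every row in the response's data array."""
    payload = json.loads(response_text)
    return [row[target] for row in payload["data"]]

# Shortened copy of the response shown in this post
sample_response = json.dumps({
    "data": [
        {"text": "Junk! Don't waste your money!", "review_sentiment": "__label__1"}
    ]
})

labels = extract_sentiment(sample_response)
print(labels)
```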
C:\Program Files\Couchbase\Server\var\lib\couchbase\n1qlcerts
folder.
{ "all_access":false, "allowed_urls":["https://ml.nexosis.com/v1/models/< modelId guid >/predict"], "disallowed_urls":[] }
CREATE PRIMARY INDEX on reviews
).
SELECT
and build from there.
SELECT CURL(url, { "header": headers, "data": body, "request":requestType}) AS nexosis FROM reviews r LET url = 'https://ml.nexosis.com/v1/models/< modelId guid >/predict', headers = ["Content-Type: application/json", "api-key: < my API key >"], body = '{ "data": [{ "text": "' || r.text || '" }], "extraParameters": { "includeClassScores": false }}', requestType = "POST";
body
is pulling the text directly from the review document in Couchbase. The only thing I’ve selected (so far) is the entire response from Nexosis, which looks similar to this:
SELECT r.actual, r.text, CURL(url, { "header": headers, "data": body, "request":requestType}).data[0].review_sentiment AS nexosis FROM reviews r LET url = 'https://ml.nexosis.com/v1/models/< modelId GUID >/predict', headers = ["Content-Type: application/json", "api-key: < my api key >"], body = '{ "data": [{ "text": "' || r.text || '" }], "extraParameters": { "includeClassScores": false }}', requestType = "POST";
[ { "actual": 1, "nexosis": "__label__1", "text": "Junk! Don't waste your money..." }, { "actual": 5, "nexosis": "__label__2", "text": "This is the greatest thing since sliced bread..." }, { "actual": 3, "nexosis": "__label__1", "text": "The most confusing RC I've ever seen..." } ]
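One caveat about the queries above: `body` is assembled by gluing `r.text` between two string literals, so a review that contains a double quote would presumably produce a request body that is not valid JSON. The Python sketch below illustrates the problem and the usual fix (let a JSON encoder do the escaping). The function names and the sample review are mine, for illustration only.

```python
import json

def body_by_concatenation(text):
    """Mirror the string concatenation used for `body` in the N1QL queries."""
    return ('{ "data": [{ "text": "' + text +
            '" }], "extraParameters": { "includeClassScores": false }}')

def body_by_encoding(text):
    """Build the same body, letting the JSON encoder escape the text."""
    return json.dumps({
        "data": [{"text": text}],
        "extraParameters": {"includeClassScores": False},
    })

def is_valid_json(s):
    """Return True if s parses as JSON, False otherwise."""
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

review = 'He said "junk" and I agree'  # hypothetical review text
print(is_valid_json(body_by_concatenation(review)))
print(is_valid_json(body_by_encoding(review)))
```

N1QL has an `ENCODE_JSON` function that may be worth trying for the same purpose inside the query; the sketch here only demonstrates the idea in Python.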
UPDATE
and INSERT
instead of a live SELECT
like I’m doing.