MySQL and R

August 2011, by Christopher Bare

(This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers.)

Using MySQL with R is pretty easy, with RMySQL. Here are a few notes to keep me straight on a few things I always get snagged on.

Typically, most folks are going to want to analyze data that's already in a MySQL database. Being a little bass-ackwards, I often want to go the other way. One reason to do this is to do some analysis in R and make the results available dynamically in a web app, which necessitates writing data from R to a database. As of this writing, INSERT isn't even mentioned in the RMySQL docs, sadly for me, but it works just fine.

The docs are a bit clearer for RS-DBI, which is the standard R interface to relational databases and of which RMySQL is one implementation.

Opening and closing connections

The best way to close DB connections, like you would do in a finally clause in Java, is to call on.exit inside the function that opens the connection, like this:

library(RMySQL)

con <- dbConnect(MySQL(),
                 user="me", password="nuts2u",
                 dbname="my_db", host="localhost")
# runs when the enclosing function exits, normally or via an error
on.exit(dbDisconnect(con))

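If you open connections in many places, the on.exit pattern can be wrapped in a small helper. This is a minimal sketch assuming the DBI package; with_db is a hypothetical name, not something RMySQL or DBI provides. It takes one function that opens a connection and another that uses it, and disconnects however the second one exits.

```r
library(DBI)

# Hypothetical helper (not part of DBI or RMySQL): open a connection with
# `connect`, pass it to `f`, and disconnect when with_db() exits, whether
# f() returns normally or throws an error.
with_db <- function(connect, f) {
  con <- connect()
  on.exit(dbDisconnect(con), add = TRUE)
  f(con)
}

# Usage with RMySQL (placeholder credentials as above):
# n <- with_db(
#   function() dbConnect(RMySQL::MySQL(), user = "me", password = "nuts2u",
#                        dbname = "my_db", host = "localhost"),
#   function(con) dbGetQuery(con, "select count(*) from genes;")[1, 1]
# )
```

Passing a connect function rather than a connection keeps the open and the close in the same place, which is the point of the on.exit idiom.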
Building queries

Using sprintf to build the queries feels a little primitive. As far as I can tell, there are no prepared statements in RMySQL. I don't suppose SQL injection is a concern here, but prepared statements might be a little tidier anyway.
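If you stick with sprintf, it's worth at least letting the driver quote string values, so that a name containing an apostrophe doesn't break the statement. This is a sketch assuming a DBI driver that implements dbQuoteString() (older RMySQL releases offered dbEscapeStrings() for the same job); insert_network_sql is a hypothetical helper built around the networks INSERT used later in this post.

```r
library(DBI)

# Hypothetical helper: build the networks INSERT with sprintf, but let the
# driver quote string values via dbQuoteString(), so an embedded apostrophe
# (as in "O'Brien") becomes a properly escaped SQL literal.
insert_network_sql <- function(con, species.id, network.name, data.source, description) {
  # dbQuoteString() adds the surrounding quotes itself, so the format
  # string below uses a bare %s rather than '%s'.
  q <- function(x) as.character(dbQuoteString(con, x))
  sprintf("insert into networks (species_id, name, data_source, description, created_at)
           values (%d, %s, %s, %s, now());",
          as.integer(species.id), q(network.name), q(data.source), q(description))
}
```

Newer DBI backends such as RMariaDB also accept bound values through the params argument of dbGetQuery() and dbExecute(), which avoids building the SQL string at all.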

Processing query results

You can process query results row by row, in blocks, or all at once. The highly useful function dbGetQuery(con, sql) returns all query results as a data frame. With dbSendQuery, you can get all or partial results with fetch.

library(RMySQL)

# db.name holds the name of the database to connect to
con <- dbConnect(MySQL(), user="network_portal", password="monkey2us",
                 dbname=db.name, host="localhost")
rs <- dbSendQuery(con, "select name from genes limit 10;")
data <- fetch(rs, n=10)
huh <- dbHasCompleted(rs)
dbClearResult(rs)
dbDisconnect(con)

If there are no more results, fetch returns a data frame with 0 columns and 0 rows. dbHasCompleted is supposed to indicate whether there are more records to be fetched, but it seems broken. The value of huh in the code above is FALSE, which seems wrong to me.
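For processing in blocks, a loop that stops on an empty block as well as on dbHasCompleted() sidesteps the question of whether dbHasCompleted() can be trusted. This is a minimal sketch using the DBI generic dbFetch() (older RMySQL code, like the snippet above, calls fetch()); fetch_in_blocks and its process argument are hypothetical names.

```r
library(DBI)

# Fetch a query's results in blocks of `n` rows and apply `process` to each
# block, collecting the return values in a list. The loop stops on a
# zero-row block as well as on dbHasCompleted(), so it does not depend
# solely on dbHasCompleted() being accurate.
fetch_in_blocks <- function(con, sql, n = 1000, process = nrow) {
  rs <- dbSendQuery(con, sql)
  on.exit(dbClearResult(rs), add = TRUE)  # always release the result set
  out <- list()
  repeat {
    block <- dbFetch(rs, n = n)
    if (nrow(block) == 0) break
    out[[length(out) + 1]] <- process(block)
    if (dbHasCompleted(rs)) break
  }
  out
}
```

With the default process = nrow, the function returns the size of each block; passing process = identity and combining the list with do.call(rbind, ...) reassembles the full result.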

Retrieving AUTO_INCREMENT IDs

A standard newbie question with MySQL is how to retrieve freshly generated primary keys from AUTO_INCREMENT fields. That's what MySQL's LAST_INSERT_ID() is for.

You can retrieve the most recent AUTO_INCREMENT value with the LAST_INSERT_ID() SQL function or the mysql_insert_id() C API function. These functions are connection-specific, so their return values are not affected by another connection which is also performing inserts.

The same works with RMySQL, but there are some traps to watch out for. Let's say you're inserting a row into a table of networks. Don't worry about the specifics. You want to insert related data in another table, so you need the ID of the newly inserted row.

library(RMySQL)

create.network <- function(species.id, network.name, data.source, description) {
  con <- dbConnect(MySQL(),
                   user="super_schmuck", password="nuts2u",
                   dbname="my_db", host="localhost")
  on.exit(dbDisconnect(con))
  sql <- sprintf("insert into networks
                  (species_id, name, data_source, description, created_at)
                  values (%d, '%s', '%s', '%s', now());",
                 species.id, network.name, data.source, description)
  rs <- dbSendQuery(con, sql)
  # clear the INSERT's result before asking for last_insert_id()
  dbClearResult(rs)
  id <- dbGetQuery(con, "select last_insert_id();")[1,1]
  return(id)
}

Don't forget to clear the result of the insert. If you don't, you'll get 0 from last_insert_id(). Also, using dbGetQuery for the insert produces a strange error when you go to call last_insert_id():

Error in mysqlExecStatement(conn, statement, ...) :
  RS-DBI driver: (could not run statement: Commands out of sync; you can't run this command now)

Alternatively, you can combine both SQL statements into one call to dbSendQuery, but then you have to remember to set a flag when you make the connection: client.flag=CLIENT_MULTI_STATEMENTS. Trying to use multiple queries doesn't seem to work with dbGetQuery.

library(RMySQL)

create.network <- function(species.id, network.name, data.source, description) {
  con <- dbConnect(MySQL(),
                   user="super_schmuck", password="nuts2u",
                   dbname="my_db", host="localhost",
                   client.flag=CLIENT_MULTI_STATEMENTS)
  on.exit(dbDisconnect(con))
  sql <- sprintf("insert into networks
                  (species_id, name, data_source, description, created_at)
                  values (%d, '%s', '%s', '%s', now());
                  select last_insert_id();",
                 species.id, network.name, data.source, description)
  rs <- dbSendQuery(con, sql)
  # the second statement's result set holds the new ID
  if (dbMoreResults(con)) {
    rs <- dbNextResult(con)
    id <- fetch(rs)[1,1]
  } else {
    stop('Error getting last inserted ID.')
  }
  dbClearResult(rs)
  return(id)
}

Any effort saved by combining the SQL queries is lost in the extra housekeeping, so I prefer the first method.
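The first method can be packaged as a small helper. This is a minimal sketch assuming a DBI version that provides dbExecute() (added to DBI well after this post was written); insert_and_get_id is a hypothetical name. dbExecute() sends the statement and clears its result before returning, which takes care of the "clear the result" trap described above, and the ID query then runs on the same connection, as LAST_INSERT_ID() requires.

```r
library(DBI)

# Hypothetical helper: run an INSERT with dbExecute(), which clears the
# statement's result before returning, then read the generated key on the
# same connection. `id_sql` defaults to MySQL's LAST_INSERT_ID(); other
# databases need their own equivalent query.
insert_and_get_id <- function(con, insert_sql, id_sql = "select last_insert_id();") {
  dbExecute(con, insert_sql)
  dbGetQuery(con, id_sql)[1, 1]
}
```

Making the ID query a parameter is a design choice for portability only; with RMySQL the default is the same last_insert_id() call used in create.network.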

In spite of these few quirks, RMySQL generally works fine and is pretty straightforward.
