Final: Question 1

Please download the Enron email dataset enron.zip, unzip it and then restore it using mongorestore. It should restore to a collection called “messages” in a database called “enron”. Note that this is an abbreviated version of the full corpus. There should be 120,477 documents after restore.

Inspect a few of the documents to get a basic understanding of the structure. Enron was an American corporation that engaged in a widespread accounting fraud and subsequently failed.

In this dataset, each document is an email message. As with any email message, there is one sender but there can be multiple recipients.
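Because the recipients are stored as an array, "sent to X" means "X appears somewhere in the To list", not "the To list equals X". Here is a minimal sketch of that idea in plain JavaScript, with no database needed. The sample documents, the addresses, and the helper names isFromTo and countFromTo are made up for illustration and are not taken from the dataset.

```javascript
// Hypothetical sample documents shaped like the "messages" collection.
// They are invented for illustration and do not come from the Enron data.
const sampleMessages = [
  { headers: { From: "alice@example.com", To: ["bob@example.com"] } },
  { headers: { From: "alice@example.com", To: ["carol@example.com", "bob@example.com"] } },
  { headers: { From: "bob@example.com", To: ["alice@example.com"] } },
];

// A message counts if the sender matches exactly and the recipient
// appears anywhere in the To array (other recipients are allowed).
function isFromTo(message, sender, recipient) {
  const to = message.headers.To || []; // assume To may be missing
  return message.headers.From === sender && to.includes(recipient);
}

// Count the messages from `sender` that list `recipient` in To.
function countFromTo(messages, sender, recipient) {
  return messages.filter((m) => isFromTo(m, sender, recipient)).length;
}

console.log(countFromTo(sampleMessages, "alice@example.com", "bob@example.com")); // prints 2
```

The second sample message has two recipients and still counts, which is the behaviour you want when counting messages between two people.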

Construct a query to calculate the number of messages sent by Andrew Fastow, CFO, to Jeff Skilling, the president. Andrew Fastow’s email address was andrew.fastow@enron.com. Jeff Skilling’s email was jeff.skilling@enron.com.

For reference, the number of email messages from Andrew Fastow to John Lavorato (john.lavorato@enron.com) was 1.

Solution:
3
  • Download enron.zip and extract it
  • Start the server by running “mongod”
  • Now restore the extracted dump
    mongorestore --drop --db enron dump/enron
  • Open a shell by running “mongo”
  • Run “show databases” to check that the enron database was restored
  • If enron appears in the list, you are ready to go.
  • Now run “use enron”
  • Then run “show collections”
  • You should see the messages collection
  • Check your data with “db.messages.findOne()”
    {
            "_id" : ObjectId("4f16fc97d1e2d32371003f02"),
            "body" : "COURTYARD\n\nMESQUITE\n2300 HWY 67\nMESQUITE, TX  75150\ntel: 972-681-3300\nfax: 972-681-3324\n\nHotel Information: http://courtyard.com/DALCM
    \n\n\nARRIVAL CONFIRMATION:\n Confirmation Number:84029698\nGuests in Room: 2\nNAME: MR ERIC  BASS \nGuest Phone: 7138530977\nNumber of Rooms:1\nArrive: Oct 6 2
    001\nDepart: Oct 7 2001\nRoom Type: ROOM - QUALITY\nGuarantee Method:\n Credit card guarantee\nCANCELLATION PERMITTED-BEFORE 1800 DAY OF ARRIVAL\n\nRATE INFORMA
    TION:\nRate(s) Quoted in: US DOLLAR\nArrival Date: Oct 6 2001\nRoom Rate: 62.10  per night. Plus tax when applicable\nRate Program: AAA AMERICAN AUTO ASSN\n\nSP
    ECIAL REQUEST:\n NON-SMOKING ROOM, GUARANTEED\n   \n\n\nPLEASE DO NOT REPLY TO THIS EMAIL \nAny Inquiries Please call 1-800-321-2211 or your local\ninternationa
    l toll free number.\n \nConfirmation Sent: Mon Jul 30 18:19:39 2001\n\nLegal Disclaimer:\nThis confirmation notice has been transmitted to you by electronic\nma
    il for your convenience. Marriott's record of this confirmation\nnotice is the official record of this reservation. Subsequent\nalterations to this electronic m
    essage after its transmission\nwill be disregarded.\n\nMarriott is pleased to announce that High Speed Internet Access is\nbeing rolled out in all Marriott hote
    l brands around the world.\nTo learn more or to find out whether your hotel has the service\navailable, please visit Marriott.com.\n\nEarn points toward free va
    cations, or frequent flyer miles\nfor every stay you make!  Just provide your Marriott Rewards\nmembership number at check in.  Not yet a member?  Join for free
     at\nhttps://member.marriottrewards.com/Enrollments/enroll.asp?source=MCRE\n\n",
            "filename" : "2.",
            "headers" : {
                    "Content-Transfer-Encoding" : "7bit",
                    "Content-Type" : "text/plain; charset=us-ascii",
                    "Date" : ISODate("2001-07-30T22:19:40Z"),
                    "From" : "reservations@marriott.com",
                    "Message-ID" : "<32788362.1075840323896.JavaMail.evans@thyme>",
                    "Mime-Version" : "1.0",
                    "Subject" : "84029698 Marriott  Reservation Confirmation Number",
                    "To" : [
                            "ebass@enron.com"
                    ],
                    "X-FileName" : "eric bass 6-25-02.PST",
                    "X-Folder" : "\\ExMerge - Bass, Eric\\Personal",
                    "X-From" : "Reservations@Marriott.com",
                    "X-Origin" : "BASS-E",
                    "X-To" : "EBASS@ENRON.COM",
                    "X-bcc" : "",
                    "X-cc" : ""
            },
            "mailbox" : "bass-e",
            "subFolder" : "personal"
    }
  • Now run the following query to get the count
    db.messages.find({"headers.From":"andrew.fastow@enron.com","headers.To":"jeff.skilling@enron.com"}).count()
  • Confirm the result; I got “3” as the answer.
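As a sanity check, you can reuse the same filter for the reference pair given in the question (Fastow to Lavorato, which should give 1). Below is a small sketch of how that could look. fromToFilter is a hypothetical helper name of my own, not part of MongoDB, and the commented shell lines assume you have already run “use enron” in a shell that provides countDocuments.

```javascript
// Builds the filter document for "messages from `sender` to `recipient`".
// `fromToFilter` is a hypothetical helper, not a MongoDB function.
function fromToFilter(sender, recipient) {
  // Equality on "headers.To" matches when any element of the array equals `recipient`.
  return { "headers.From": sender, "headers.To": recipient };
}

const lavorato = fromToFilter("andrew.fastow@enron.com", "john.lavorato@enron.com");
const skilling = fromToFilter("andrew.fastow@enron.com", "jeff.skilling@enron.com");

// In the shell (assumed: "use enron" already run):
//   db.messages.countDocuments(lavorato)   // the question states this should be 1
//   db.messages.countDocuments(skilling)   // the count asked for in the question
console.log(JSON.stringify(skilling));
```

If the Lavorato count comes back as 1, the same filter shape can be trusted for the Skilling count.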