I am planning on adding comments, tutorials, code, and information about Artificial Intelligence, even as I explore the subject myself. While it will not be all-encompassing, I think it will be valuable because I will be posting comments, ideas, and discoveries I make as I go along. My comments will be structured as "Day 1, Day 2, etc.," where each day marks a point at which there was a new discovery or event on the path of learning . . .

Day 1: My plan was to collect data from an online database and code a Naive Bayes Classifier . . . So I found several data sources; here are some good ones, by the way:

Dataset Sources: https://github.com/caesar0301/ . . . https://www.kaggle.com/datasets . . . http://reddit.com/r/datasets

However, I ran into a brick wall: the data I wanted to use, when I found it, was in a HUGE table, with thousands of columns and rows! This cost me a lot of time, first searching for a smaller dataset, and eventually concluding that the real solution was either to find a way to reduce the size of the dataset, or to find a smaller dataset on a different subject than what I was originally looking for. (I wanted to do something original.) Of the two solutions, I wanted to discover how to reduce the size of a dataset, to perhaps two to three columns (only the interesting ones) and perhaps 100 rows. Through my own investigation, I figured out that you can open a CSV (comma separated values) file in Open Office and filter or delete rows and columns. At that point it was later in the day, I had reached the point of saturation, and I was done for the day. But there is one more thing I want to add: Siraj Raval has a video on YouTube about how to reduce datasets. I would suppose there is also more info on this subject via a YouTube or Google search, but here is his video: https://www.youtube.com/watch?v=0xVqLJe9_CY&index=4&list=PL2-dafEMk2A7EEME489DsI468AB0wQsMV   By the way, Siraj is an excellent source for tutorials on AI! Spent the rest of the day watching more AI tutorials on YouTube . . . loading information . . . loading . . . loading . . . loading . . .
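The same column-and-row trimming can also be done in a few lines of Python. Here is a minimal sketch using pandas, assuming a hypothetical input file data.csv and made-up column names (col_a, col_b); substitute whichever columns are the interesting ones:

```python
import pandas as pd

# Load the full dataset (hypothetical file name).
df = pd.read_csv("data.csv")

# Keep only the interesting columns (made-up names for illustration).
df = df[["col_a", "col_b"]]

# Keep a random sample of roughly 100 rows; min() guards against
# files that are already smaller than the sample size.
df = df.sample(n=min(100, len(df)), random_state=42)

# Write the reduced dataset back out.
df.to_csv("data_small.csv", index=False)
```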

Day 2: I have two servers, with 16 quad-core processors, so it was time to install Python and TensorFlow on them. No problem . . . ha ha. Because this is cutting-edge stuff, there are a lot of dependencies! And, because you can't just install a pre-built version of TensorFlow on a multiprocessor server and expect it to use all of the processors, you have to compile TensorFlow! This is not a concern on a single-processor PC. So far it has been a battle so epic that I am going to start setting up the second server tomorrow, while what I have done so far is still fresh in my mind; that will help me solidify my knowledge of what needs to be done. (The first server is almost there, but it appears I will need to uninstall TensorFlow, then compile.) Watching more of Siraj's videos, and I found a pretty clear explanation of what exactly a neural network is here: Introduction To Tensorflow. So a tensor here is really just a weight between nodes, and the weight matrix is built from those tensors. While I have done some coding, I have not been pushing that end of it very hard, because I want to keep loading more information, get a better grasp on how it all works, and learn the tools better, so Siraj's videos help a lot . . . still loading . . .
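To make the weights-between-nodes picture concrete, here is a minimal sketch, assuming the TensorFlow 1.x releases that were current at the time: a 3x2 weight matrix connects a 3-node input layer to a 2-node output layer, and multiplying an input row by it produces the next layer's values.

```python
import tensorflow as tf  # TensorFlow 1.x style

# A 3x2 weight matrix: each entry is the weight (an edge "tensor")
# between one of 3 input nodes and one of 2 output nodes.
weights = tf.Variable(tf.random_normal([3, 2]), name="weights")
inputs = tf.placeholder(tf.float32, shape=[None, 3], name="inputs")
outputs = tf.matmul(inputs, weights)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # One input row with 3 values produces one output row with 2 values.
    print(sess.run(outputs, feed_dict={inputs: [[1.0, 2.0, 3.0]]}))
```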

Day 3: Windows and TensorFlow are like archenemies . . . they seem to hate each other. I have yet to get them to work together in PyCharm, although I did succeed in getting the console to work with TensorFlow. For TensorFlow in PyCharm, I have so far resorted to my Linux machine (Fedora 25, to be exact), and that was easier to work with. With Siraj's videos, I am discovering that some of the code is dysfunctional, so I have been spending some time cleaning it up to get it to work, which is OK, as it gets me to look at the code and consider what it is doing, so I am learning more about Python and its packages. The packages are not straightforward in some cases . . . first, if you are required to use pip to install them, it appears that you need to make sure you use the version of pip that matches the version of Python you are using. (Versions of pip are located in the /usr/bin folder, as far as I know.) And some of the projects you download may be done in Python 2.7, some in Python 3.5.2, and some in Python 3.6! A valuable thing I learned previously is that you can do searches for packages and use wild cards. Interestingly, many references use yum to install, but Fedora 25 uses dnf. I will not comment on yum, other than that I don't quite understand why some Linux versions use it and some use dnf. What I do know is that to do a search with dnf, the syntax is: dnf search keyword . . . and you can use wild cards, or quotation marks. If you are searching for something elusive, like cv2, you can also search using keywords you would expect to find in the description, in quotes; searching for "computer vision" is what found the cv2 package for me. I was able to get Siraj's web scraper up and running in PyCharm, although it seemed to balk at loading Russian text, which was a disappointment. I want to address that issue, but not now, as I will move on to another project so I can progress as fast as possible.
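One trick for keeping Python and pip matched (a small sketch of my own, not from the videos) is to ask the running interpreter to identify itself, and then invoke pip through that same interpreter:

```python
import sys

# Print the version and path of the interpreter that is running, so you
# can match it to the right pip (pip2.7, pip3.5, pip3.6, ...).
print(sys.version)      # e.g. "3.5.2 (default, ...)"
print(sys.executable)   # e.g. "/usr/bin/python3.5"

# From the shell, "python3.5 -m pip install <package>" guarantees the
# package lands in that same interpreter's site-packages.
```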

Day 4: If you are wanting to get up and running in this field, I have a book recommendation for you: Python for Data Science For Dummies (John Wiley & Sons, 2015). This book has a lot of useful information, including how to get and format data, what to look out for in data (like duplicate records), and a lot of code in simple-to-understand form. Additionally, I want to mention that you can register with and contact Robert Half to gain free access to Books 24x7, which then gives you access to a lot of books for free, so you can study online. The instructions on how to access Skillport are here, but you will likely need to contact Robert Half to get registered, using the email address in the following information: https://www.roberthalf.ca/sites/roberthalf.ca/files/RH-PDFs/logon_instructions_113007_2.pdf   At this point, I am reading a lot of information from the aforementioned book, still loading more information . . .

Day 5: Data! My, my . . . you'll be wondering what you got yourself into when you start looking at the data. Right now, I am working on a program that can scrape data off of a PDF file, and it has proven to be quite difficult to implement, although certainly not impossible. The difficult part has been tweaking the data to land it in either a comma separated value (CSV) file or an Excel file. The problem with CSV files seems to be a lack of standardization, and that really slowed things down when it came to learning about them and how to create them. I can't yet say that it has been a total success, but progress has definitely been made. Compared to CSV files, it looks like Excel files are much easier to deal with, and there is a Python package you can download to help you out with it, along with instructions . . . XlsxWriter . . . So while I am working on the data end of things, I am also learning more linear algebra and taking an online Machine Learning course . . . Coursera . . . from Stanford University, which seems to be quite interesting, and I would recommend it so far . . . In conclusion, clearly, the biggest challenge appears to be dealing with data. For that reason, having a working Python program that can import the data and create both CSV files and (preferably) Excel files will be one of my priorities. Without a good way to wrangle data, not much is likely to happen.
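Here is a minimal sketch of the output end of that kind of program, with made-up rows standing in for data scraped from a PDF (the file names are hypothetical too). Spelling out the CSV dialect options explicitly is one way around the standardization problem, and the Excel side shows the XlsxWriter package writing cell by cell:

```python
import csv
import xlsxwriter

# Made-up rows standing in for data scraped from a PDF.
rows = [
    ["Name", "Value"],
    ["alpha", 1],
    ["beta", 2],
]

# CSV: state the delimiter and quoting rules explicitly, so the output
# is predictable despite the lack of a firm CSV standard.
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)

# Excel: XlsxWriter writes one cell at a time as (row, column, value).
workbook = xlsxwriter.Workbook("output.xlsx")
worksheet = workbook.add_worksheet()
for r, row in enumerate(rows):
    for c, value in enumerate(row):
        worksheet.write(r, c, value)
workbook.close()
```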
