Friday, April 24, 2015

How to install SciPy and matplotlib on Windows

Installing SciPy and matplotlib on Windows is a real hassle. After a quick search online, I decided to try

http://conda.pydata.org/miniconda.html


Miniconda is a tool from the Anaconda project. Anaconda provides pre-built binaries for many packages that would otherwise need to be compiled on Windows, so you can just install them directly. It saves a lot of work; compared with the earlier pain, this is thoroughly pleasant.

I scrapped the old Python install and started over: downloaded Miniconda and reinstalled from scratch....

Packages are installed as follows:

$ conda install numpy

You can also create virtualenv-like virtual environments, which solves dependency problems.

See https://gist.github.com/ccwang002/449159cc2a05b1011467

> conda create -n ngs python=2.7 pip
Using it is simple: just activate ngs / deactivate; see the conda documentation for the details. In short:
> activate ngs
Activating environment ngs ...
[ngs]> conda install numpy scipy
...
Proceed ([y]/n)? 



Then just reset the interpreter path in PyCharm:

http://unlikenoise.com/setup-pycharm-anaconda-python-windows/
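With the environment in place, a quick sanity check that the compiled packages actually import and run (a minimal sketch; the version numbers you see will depend on what conda installed):

import numpy
import scipy
import matplotlib

print('numpy', numpy.__version__)
print('scipy', scipy.__version__)
print('matplotlib', matplotlib.__version__)

# A tiny numeric check that the compiled binaries really work.
print(numpy.linalg.inv(numpy.array([[1.0, 2.0], [3.0, 4.0]])))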



Acer S7 keyboard problem solution

Update 2015/5/13:
Download and run zechofix; this way is more convenient and saves all the fiddly setup....

https://dl.dropbox.com/u/26748522/zechofix.html

Reference:
http://community.acer.com/t5/Ultra-Thin/Acer-s7-keyboard-issue-with-repeating-characters/td-p/24407



The Acer S7 keyboard often repeats the previous character, e.g. talkk, yearr..., which is a serious problem when typing.
A quick web search shows many users abroad reporting the same issue; the current fix is:





This basically solves the problem; the drawback is that you can no longer hold down the delete key to delete continuously, so you have to adjust your habits.

Monday, April 20, 2015

Installing the Windows builds of NumPy and SciPy

Installing NumPy and SciPy from source on Windows always runs into compile errors...

http://www.scipy.org/scipylib/download.html

If you need to install them, I recommend the pre-compiled all-in-one builds on SourceForge:

http://sourceforge.net/projects/numpy/files/

http://sourceforge.net/projects/scipy/files/

Postscript: because installing matplotlib and similar packages this way is inconvenient, I now recommend switching to Miniconda instead....

Python Text Classification using Naive Bayes and scikit-learn


Feature extraction [5]

CountVectorizer implements both tokenization and occurrence counting in a single class:
>>> from sklearn.feature_extraction.text import CountVectorizer
This model has many parameters, however the default values are quite reasonable (please see the reference documentation for the details):
>>> vectorizer = CountVectorizer(min_df=1)
>>> vectorizer                     
CountVectorizer(analyzer=...'word', binary=False, decode_error=...'strict',
        dtype=<... 'numpy.int64'>, encoding=...'utf-8', input=...'content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 1), preprocessor=None, stop_words=None,
        strip_accents=None, token_pattern=...'(?u)\\b\\w\\w+\\b',
        tokenizer=None, vocabulary=None)
Let’s use it to tokenize and count the word occurrences of a minimalistic corpus of text documents:
Explanation:
fit() does the tokenizing and adds the tokens to the vocabulary dictionary.
fit(raw_documents[, y]): Learn a vocabulary dictionary of all tokens in the raw documents.
transform() counts how many times each token occurs in each document.
transform(raw_documents): Transform documents to document-term matrix.
fit_transform(raw_documents[, y]): Learn the vocabulary dictionary and return term-document matrix.
>>> corpus = [
...     'This is the first document.',
...     'This is the second second document.',
...     'And the third one.',
...     'Is this the first document?',
... ]
>>> X = vectorizer.fit_transform(corpus)
>>> X                              
<4x9 sparse matrix of type '<... 'numpy.int64'>'
    with 19 stored elements in Compressed Sparse ... format>
The default configuration tokenizes the string by extracting words of at least 2 letters. Each term found by the analyzer during the fit is assigned a unique integer index corresponding to a column in the resulting matrix. This interpretation of the columns can be retrieved as follows:
>>> vectorizer.get_feature_names() == (
...     ['and', 'document', 'first', 'is', 'one',
...      'second', 'the', 'third', 'this'])
True

>>> X.toarray()           
array([[0, 1, 1, 1, 0, 0, 1, 0, 1],
       [0, 1, 0, 1, 0, 2, 1, 0, 1],
       [1, 0, 0, 0, 1, 0, 1, 1, 0],
       [0, 1, 1, 1, 0, 0, 1, 0, 1]]...)

Explanation:
After feature extraction, get_feature_names() returns the feature-index-to-string dictionary, which maps each column of the resulting count matrix.
Calling X.toarray() shows, for each document, the count of every word in the vocabulary; e.g. [0, 1, 1, 1, 0, 0, 1, 0, 1] is
'This is the first document.' -> ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
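One detail worth noting (a short sketch continuing with the same fitted vectorizer): transform() on new text only counts words that are already in the learned vocabulary, so words never seen during fit are silently ignored:

>>> vectorizer.transform(['Something completely new.']).toarray()
array([[0, 0, 0, 0, 0, 0, 0, 0, 0]]...)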


get_feature_names(): Array mapping from feature integer indices to feature name.
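Since the post is about Naive Bayes text classification, the count matrix above can be fed straight into scikit-learn's MultinomialNB. A minimal sketch; the labels y below are made up purely for illustration:

>>> from sklearn.naive_bayes import MultinomialNB
>>> y = [1, 1, 0, 1]  # hypothetical labels for the four corpus documents
>>> clf = MultinomialNB().fit(X, y)
>>> clf.predict(vectorizer.transform(['Is this the first document?']))
array([1])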



References:

1. Machine Learning Tutorial: The Naive Bayes Text Classifier

2. Naive Bayes

3. Working With Text Data — scikit-learn 0.16.1 documentation

4. Text Classification

5. Feature extraction

Videos:

Senior high math (grade 10, semester 2) 3-0 Introduction 01: What is probability?

Senior high math (grade 10, semester 2) 3-3A Concept 01: The concept of conditional probability


Thursday, April 16, 2015

Node.js TCP socket: the difference between end and close

Good Reference: http://maxogden.com/node-streams.html


end(): Half-closes the socket, i.e. it sends a FIN packet. It is possible the server will still send some data: end() only closes the writing stream of the socket, so the remote host can keep its writing stream open and keep sending you data (see the sketch below for the same half-close idea).
destroy(): Ensures that no more I/O activity happens on this socket. Only necessary in case of errors (a parse error or so).
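The half-close that end() performs is not Node-specific. For intuition, here is a minimal Python sketch of the same idea using shutdown(SHUT_WR); the host and request are placeholder assumptions, not from the original post:

import socket

# Connect and send a request, then half-close: shutdown(SHUT_WR) sends the
# FIN, like Node's socket.end(), but leaves the read side open.
sock = socket.create_connection(('example.com', 80))
sock.sendall(b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')
sock.shutdown(socket.SHUT_WR)

# The peer can still send data; recv() returning b'' means the peer closed
# its side -- the equivalent of Node's 'end' event.
chunks = []
while True:
    data = sock.recv(4096)
    if not data:
        break
    chunks.append(data)

sock.close()  # fully closed now, analogous to Node's 'close' event
print(len(b''.join(chunks)), 'bytes received')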


Event: 'end'
Emitted when the other end of the socket sends a FIN packet.

By default (allowHalfOpen == false) the socket will destroy its file descriptor once it has written out its pending write queue. However, by setting allowHalfOpen == true the socket will not automatically end() its side allowing the user to write arbitrary amounts of data, with the caveat that the user is required to end() their side now.

Event: 'error'
Error object
Emitted when an error occurs. The 'close' event will be called directly following this event.

Event: 'close'
had_error (Boolean): true if the socket had a transmission error
Emitted once the socket is fully closed. The argument had_error is a boolean which says if the socket was closed due to a transmission error.