还原论 Reductionism

懒惰和纯净

懒惰 lazy 是一个值如果不需要,则不会计算,如

f x = f x   -- 会陷入无限循环
g x y = x

g 2 (f 1) ==> 2 -- 不会陷入无限循环,因为 (f 1) 没有求值

等价推理

对于相同的输入总是返回相同的值

无限列表

Prelude> repeat 1 -- 会陷入死循环
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
^C

Prelude> take 10 $ repeat 1 -- 不会陷入死循环
[1,1,1,1,1,1,1,1,1,1]
Prelude> repeat 1 !! 13337
1

运行原理

  • C 等语言从内向外求值
  • Haskell 从外向内求值

这是 Haskell 执行 length 的过程

length (map not (True:False:[]))
==> length (not True : map not (False:[]))
==> 1 + length (map not (False:[]))
==> 1 + length (not False : map not ([]))
==> 1 + (1 + length (map not []))
==> 1 + (1 + length [])
==> 1 + (1 + 0)
==> 1 + 1
==> 2

Haskell 将值求解为弱头范式 weak head normal form (WHNF),即一个可以被匹配的值,有三种情况:

  • 是一个常数,如 1
  • 有一个顶级构造,如 FalseJust (1+1)0:filter f xs
  • 是一个函数,如 (\x -> 1+x)

注:

  • 表达式的类不是 WHNF
  • 如果一个函数跟着参数,也不是 WHNF,必须能够模式匹配

当给一个名字值的时候,这个值被分享 share 了,如

f :: Int -> Int
f i = if i>10 then 10 else i
_______shared________
| |
f (1+1) ==> if (1+1)>10 then 10 else (1+1)
==> if 2>10 then 10 else 2
==> if False then 10 else 2
==> 2

自动分享等价表达式是一种叫 公共子表达式删除 Common Subexpression Elimination (CSE) 的优化

和懒惰结合起来,即一个名称最多计算一次

(||) :: Bool -> Bool -> Bool
True || _ = True
_ || x = x

可以看出,|| 强制左边的参数,即对它的左参数严格 strict in its left argument

所以

even :: Int -> Bool
even x = x == 0 || not (even (x-1)) -- 可行

even' x = not (even' (x-1)) || x == 0 -- 死循环

作用于无限列表

一些函数利用了懒惰求值,可以作用于无限列表:

everySecond :: [a] -> [a]
everySecond [] = []
everySecond (x:y:xs) = x : everySecond xs

take 10 (everySecond [0..]) ==> [0,2,4,6,8,10,12,14,16,18]

添加限制

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f y [] = y
foldr f y (x:xs) = f x (foldr f y xs)

foldr (+) 0 [1,2,3] ==> 1+(2+(3+0))
foldl (+) 0 [1,2,3] ==> ((0+1)+2)+3

foldr 能懒惰求值,如

    foldr (&&) True [False,False,False]
==> False && (foldr (&&) True [False,False])
==> False

foldl (&&) True [False,False,False]
==> foldl (&&) (True&&False) [False,False]
==> foldl (&&) ((True&&False)&&False) [False]
==> foldl (&&) (((True&&False)&&False)&&False) []
==> ((True&&False)&&False)&&False
==> ( False &&False)&&False
==> False &&False
==> False

但是两者都需要在内存中建立表达式,造成了空间泄漏 space leak 的问题

解决方法是强制求值:

foldl'Int :: (Int -> Int -> Int) -> Int -> [Int] -> Int
foldl'Int f z [] = z
foldl'Int f 0 (x:xs) = foldl'Int f (f 0 x) xs
foldl'Int f z (x:xs) = foldl'Int f (f z x) xs

foldl'Int (+) 0 [1,2,3]
==> foldl'Int (+) (0+1) [2,3]
==> foldl'Int (+) 1 [2,3]
==> foldl'Int (+) (1+2) [3]
==> foldl'Int (+) 3 [3]
==> foldl'Int (+) (3+3) []
==> 3+3
==> 6

seq a b 函数返回 b 但会求值 a,如

Prelude> seq (not True) 3
3
Prelude> seq undefined 3
*** Exception: Prelude.undefined

新类型声明

newtype 在一定程度上可以代替 data

newtype Money = Cents Int

但其仅仅接受一个构造和一个参数,相当于在编译过程中,新定义的类型消失了

-- 代码:                                 内存:

data Money = Cents Int x --> Cents --> 100
x = Cents 100


newtype Money = Cents Int x --> 100
x = Cents 100

打结

  code                     memory

let xs = 1:2:xs xs -> (1:) -> (2:) -+
in xs ^ |
+------------+

Debug.Trace

trace "message" xx 一样,但是当计算时会打印 message

Debug.Trace 有很多 trace的变式,如 traceShowId x,它求值并打印 x.

RealWorld -> (a,RealWorld)

IO

IO a 是一个产生类型 a 的值的操作,如 getLine 是一个产生一个字符串的 IO 操作

IO 操作可以使用 do 来合并成更大的操作:

query :: IO ()
query = do
putStrLn "Write something!" -- 运行一个操作
s <- getLine -- 运行一个操作并获取返回值
let n = length s -- 运行一个纯函数
putStrLn ("You wrote "++show n++" characters") -- 运行一个操作,传递产生的值

和普通的函数一样,IO 操作也可以传递参数或返回一个 IO 操作

ask :: String -> IO String
ask question = do
putStrLn question
getLine

return

return :: a -> IO a 函数接受一个值,并把它变成一个产生那个值的操作

注意:return 不会停止操作的执行,如

produceTwo :: IO Int
produceTwo = do return 1
return 2
Prelude> produceTwo
2

do 和类型

do 块的最后一行决定了整个操作产生什么,所以必须是一个操作

你可以暂时打开 IO 盒子,但你必须返回它,即

What happens in IO, stays in IO.

控制结构

函数注释 使用实例 表示意义
print :: Show a => a -> IO () print x 相当于使用了 showputStrln
readLn :: Read a => IO a x <- readLn 相当于使用了 readgetLine
when :: Bool -> IO () -> IO () when b op 如果 b 为真,则执行 op
unless :: Bool -> IO () -> IO () unless b op 如果 b 为假,则执行 op
replicateM :: Int -> IO a -> IO [a] replicateM m op 执行 op m 次,收集结果
replicateM_ :: Int -> IO a -> IO () replicateM_ m op 执行 op m 次,抛弃结果
mapM :: (a -> IO b) -> [a] -> IO [b] 对每个列表元素做一些事情
mapM_ :: (a -> IO b) -> [a] -> IO () 对每个列表元素做一些事情,抛弃结果
forM :: [a] -> (a -> IO b) -> IO [b] mapM 相同,但是参数反转
forM_ :: [a] -> (a -> IO b) -> IO () mapM_ 相同,但是参数反转

do 和缩进

在一个 do 块中的所有操作必须在同一列开始

如果一个操作有多行,缩进紧跟着的行

它究竟意味着什么

一个操作是对一连串的副作用的纯净描述,只有执行操作才有副作用

在一个 Haskell 程序运行时,只有一个操作执行,即 main :: IO (),其它的操作必须链接到 main 中才可运行

IORef

实现对类型 a 的值的可变 mutable 引用 IORef a

newIORef :: a -> IO (IORef a)                -- 新建一个有一个值的 IORef
readIORef :: IORef a -> IO a -- 生成一个在 IORef 中的新值
writeIORef :: IORef a -> a -> IO () -- 在 IORef 中设置值
modifyIORef :: IORef a -> (a -> a) -> IO () -- 用一个纯函数修改 IORef 中的值

fmap fmap fmap

仿函数 functors

保持结构

map :: (a -> b) -> [a] -> [b] 可以写成 (a -> b) -> ([a] -> [b]) 表示 map 把函数 g :: a -> b 转成 map g :: [a] -> [b],即 map 是把函数转为函数的高阶函数 higher-order function

对于二叉树

data Tree a = Leaf | Node a (Tree a) (Tree a)

mapTree :: (a -> b) -> Tree a -> Tree b
mapTree f Leaf = Leaf
mapTree f (Node val left right) = Node (f val) (mapTree f left) (mapTree f right)

二叉树看起来像这样:

mapTree g 后,这棵树看起来像这样:

Functor

f类型构造函数

class Functor f where
fmap :: (a -> b) -> f a -> f b

Maybe 实例:

instance Functor Maybe where
fmap f Nothing = Nothing
fmap f (Just x) = Just (f x)

fmap 的中缀版本:<$>

reverse . tail  $       "hello"       ==>  "olle"
reverse . tail <$> Just "hello" ==> Just "olle"
-- 和以下相同
fmap (reverse . tail) (Just "hello") ==> Just "olle"

合法的实例

对于任何仿函数实例 f,必须满足:

  • fmap id === id
  • fmap (f . g) === fmap f . fmap g

但可惜的是,编译器不会检查我们的实现是否满足这两个规则

种类 kinds

Kinds 是类型的类型

  • IntBoolMaybe Int 能够容纳 kind 为 * 的值

  • Maybe 能容纳 kind * -> *

Functor 的实例的 kind 为 * -> *

Foldable

class Foldable (t :: *->*) where
foldr :: (a -> b -> b) -> b -> t a -> b

Foldable 从左到右处理元素,即 Functor 是容器的类,Foldable有序容器的类

问题范畴中的一个幺半群 A Monoid in the Category of Problems

Monad 类

由来:我们需要连接函数

class Monad m where
(>>=) :: m a -> (a -> m b) -> m b
-- 把一个正常值变成 monad
return :: a -> m a
-- 更简单的链接
(>>) :: m a -> m b -> m b
a >> b = a >>= \_ -> b -- 即忽略了前者参数的 >>=

Maybe 是 monad:

instance  Monad Maybe  where
(Just x) >>= k = k x
Nothing >>= _ = Nothing

-- 由默认定义可以看出,>> 可以不用写
(Just _) >> k = k
Nothing >> _ = Nothing

return x = Just x

do 的返回

这是用 monad 写的

f = op1 >>= \x ->
op2 >>
op3 >>= \y ->
op4 >>
op5 x y

这是用 do 写的

f = do x <- op1
op2
y <- op3
op4
op5 x y

可以观察到,do 就是 monad 和 lambda 的一种语法,两者一一对应

do x <- op a       ~~~>       op a >>= \x -> do ...
do op a ~~~> op a >> do ...
do let x = expr ~~~> let x = expr in do ...
do finalOp ~~~> finalOp

monad 的一个实例

import Control.Monad

data Logger a = Logger [String] a deriving Show

msg :: String -> Logger ()
msg s = Logger [s] ()

-- The Functor instance just maps over the stored value
instance Functor Logger where
fmap f (Logger log x) = Logger log (f x)

-- This is an Applicative instance that works for any
-- monad, you can just ignore it for now. We'll get back
-- to Applicative later.
instance Applicative Logger where
pure = return
(<*>) = ap

-- Finally, the Monad instance
instance Monad Logger where
return x = Logger [] x
Logger la a >>= f = Logger (la++lb) b -- 连接两个 monad 时会合并 msg,保留第二个值
where Logger lb b = f a

State Monad

data State s a = State (s -> (a,s))

runState (State f) s = f s

-- 覆盖 state 的操作 (并生成 ())
put :: s -> State s ()
put state = State (\oldState -> ((),state))

-- 生成当前 state 的操作
get :: State s s
get = State (\state -> (state,state))

-- 用一个函数修改 state 的操作 (并生成 ())
modify :: (s -> s) -> State s ()
modify f = State (\state -> ((), f state))

-- Functor and Applicative instances skipped

instance Monad (State s) where
return x = State (\s -> (x,s))

op >>= f = State h
where h state0 = let (val,state1) = runState op state0
op2 = f val
in runState op2 state1

mapM 的回归

这里说过的所有控制结构都支持 monad

更进一步说,IO 也是 monad

monad 是仿函数

liftM :: Monad m => (a->b) -> m a -> m b
liftM f op = do x <- op
return (f x)

看起来很眼熟,fmap = liftM

[] Monad

列表也是一个 monad

findSum xs k = do a <- xs
b <- xs
if (a+b==k) then [(a,b)] else []

findSum :: [Int] -> Int -> [(Int,Int)]
findSum xs k = [(a,b) | a <- xs, b <- xs, a+b==k ]

可以看出,列表解释本质上就是一个 monad 的简写

让我们使用一些库 Let's Use Some Libraries

TextByteString

因为 String 只是表示 [Char],不够高效

  • Data.Text 表示 Unicode 字符序列,类似 String,但更高效
  • Data.ByteString 表示字节序列,用于处理二进制数据

两者都有懒惰严格的变式

两者都有 packunpack 函数将普通的字符串转成该类型或取消,也有和列表类似的函数,如 maptakereverse

导入方法:``import qualified Data.Text as T`

Text 有 monoid 实例,可以使用 T.append<>T.concat 连接

也支持模式匹配:(T.uncons :: T.Text -> Maybe (Char, T.Text)

countLetter :: Char -> T.Text -> Int
countLetter c t =
case T.uncons t of
Nothing -> 0
Just (x,rest) -> (if x == c then 1 else 0) + countLetter c rest

懒惰的变式:Data.Text.Lazy

TL.toStrict 变成严格,TL.fromStrict 变成懒惰

ByteString 是由 Word8 组成的,只支持 0-255

encodeUtf8 将文本编码成使用 UTF-8 的 ByteString

写一个 HTTP 服务器: WAI 和 Warp

module Examples.HelloServer where

import qualified Data.ByteString.Lazy.Char8 as BL
import Network.HTTP.Types.Status (status200)
import Network.Wai (Application, responseLBS)
import Network.Wai.Handler.Warp (run)

port :: Int
port = 3421

main :: IO ()
main = run port application

-- type Application = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
application :: Application
application request respond =
respond (responseLBS status200 [] (BL.pack "Hello World!"))

pathInfo :: Request -> [Text] 将请求 http://example.com/abcd/ef/file 转为文本 /abcd/ef/file

数据库:sqlite-simple

Prelude> import Database.SQLite.Simple
Prelude Database.SQLite.Simple> :t open
open :: String -> IO Connection
Prelude Database.SQLite.Simple> db <- open "example.sqlite"

-- 数据库查询
Prelude Database.SQLite.Simple> import qualified Data.Text as T
Prelude Database.SQLite.Simple T> q = Query (T.pack "SELECT 1;")
Prelude Database.SQLite.Simple T> res <- query_ db q :: IO [[Int]]
Prelude Database.SQLite.Simple T> res

Prelude Database.SQLite.Simple T> input = (1,"hello") :: (Int,String)
-- 参数化查询,参数用 ?
Prelude Database.SQLite.Simple T> parameterized = Query (T.pack "SELECT ?+1, true, ?;")
Prelude Database.SQLite.Simple T> query db parameterized input :: IO [(Int,Bool,String)]
[(2,True,"hello")]

-- 如果不需要查询的结果,即把东西插入数据库中
execute_ :: Connection -> Query -> IO ()
execute :: ToRow q => Connection -> Query -> q -> IO ()

即使没有单子你也是有效的 You're Valid Even Without Monads

介绍 Applicatives

class Functor f => Applicative f where
pure :: a -> f a
liftA2 :: (a -> b -> c) -> f a -> f b -> f c

可以看出,Applicative 类介于 FunctorMonad 之间

liftA2 接受两个值,而不是 fmap :: (a -> b) -> f a -> f b 接受一个值

对于 Maybe 的实例

instance Applicative Maybe where
pure x = Just x
liftA2 f (Just x) (Just y) = Just (f x y)
liftA2 f _ _ = Nothing

列表 Applicative

instance Applicative [] where
pure x = [x]
liftA2 f xs ys = [f x y | x <- xs, y <- ys]

新的操作符

<$>fmap 的前缀版本

(<$>) :: Functor f => (a -> b) -> f a -> f b
f <$> x = fmap f x
not <$> Just True ==> Just False

<*> 是 applicative 版本的函数应用

(<*>) :: Applicative f => f (a -> b) -> f a -> f b
Just not <*> Just True ==> Just False

两者组合:

say :: String -> Int -> String -> String
say x i y = x ++ " has " ++ show i ++ " " ++ y
say <$> Just "haskell" <*> Just 99 <*> Just "operators"
==> Just "haskell has 99 operators"

另外两个操作符

(*>) :: Applicative f => f a -> f b -> f b
x *> y = liftA2 (\a b -> b) x y

(<*) :: Applicative f => f a -> f b -> f a
x <* y = liftA2 (\a b -> a) x y

意思是两个操作都执行,但是只保留一个结果

Validation Applicative

Validation 用于收集错误信息

data Validation a = Ok a | Errors [String]
deriving (Show,Eq)

两个函数常用于构造的函数

invalid :: String -> Validation a
invalid err = Errors [err]

check :: Bool -> String -> a -> Validation a
check b err x
| b = pure x
| otherwise = invalid err

有效化列表: traverse

想要实现一个

allPositive [1,-2,3,-4]
==> Errors ["Not positive: -2","Not positive: -4"]

如果是一个 monad,可以直接 mapM

mapM (\x -> if x>=0 then Just x else Nothing) [1,2,3]
==> Just [1,2,3]

Applicative 的等价函数是 traverse,是类 Traversable 的成员

traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)

Traversable

class (Functor t, Foldable t) => Traversable t where
traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
mapM :: Monad m => (a -> m b) -> t a -> m (t b)

处理失败:Alternative

class Applicative f => Alternative f where
empty :: f a -- 没有结果
(<|>) :: f a -> f a -> f a -- 结合结果

Monads and Applicatives

如果一个 Applicative 同时是一个 Monad,则

pure             === return

fmap === liftM

fmap f op === do x <- op
return (f x)

liftA2 === liftM2

liftA2 f op1 op2 === do x <- op1
y <- op2
return (f x y)

op1 *> op2 === op1 >> op2

op1 <*> op2 === do f <- op1
x <- op2
return (f x)

零碎 Odds and Ends

QuickCheck 测试

QuickCheck 被设计用来基于性质的测试 property-based testing,如 reverse 有性质应用两次后列表不变,即

propRevTwice :: [Int] -> Property
propRevTwice xs = rev (rev xs) === xs

*Examples.QuickCheck> quickCheck propRevTwice
+++ OK, passed 100 tests.

使用 verboseCheck 查看运行的值:

*Examples.QuickCheck> verboseCheck propRevTwice
Passed:
[]
[] == []

Passed:
[1]
[1] == [1]

Passed:
[-2,1,-1]
[-2,1,-1] == [-2,1,-1]
-- lots of output
+++ OK, passed 100 tests.

修饰符 ModifiersNonEmptyList 保证了给定的列表非空

propLastFixed :: NonEmptyList Int -> Property
propLastFixed (NonEmpty xs) = last xs === head (reverse xs)

还有其它的修饰符,如 PositiveNonNegativeSortedList

propToUpperChangesLetter :: Property
propToUpperChangesLetter = forAll (elements ['a'..'z']) propToUpperChanges

Gen 是一个 monad

someLetters :: Gen String
someLetters = do
c <- elements "xyzw"
n <- choose (1,10)
return (replicate n c)

可以用 sample 检查输出

*Examples.QuickCheck> sample someLetters
"yyyyyyyy"
"zzzzzzzzz"
"xxxxxxxxx"
"yyyyyyy"
"yyy"
"ww"
"xxxxxx"
"yyy"
"yyyyyyy"
"xxxxxxxxxx"
"y"

Arbitrary 类是 quickcheck 自动生成所有输入的方法

class Arbitrary a where
arbitrary :: Gen a
shrink :: a -> [a]

对于自定义类型,要么使用 forAll,要么实现 Arbitrary 实例

data Switch = On | Off
deriving (Show, Eq)

toggle :: Switch -> Switch
toggle On = Off
toggle Off = On

propToggleTwice :: Switch -> Property
propToggleTwice s = s === toggle (toggle s)

*Examples.QuickCheck> quickCheck propToggleTwice -- 会出错

*Examples.QuickCheck> quickCheck (forAll (elements [On,Off]) propToggleTwice) -- 要么使用 forAll

instance Arbitrary Switch where -- 要么实现 Arbitrary 实例
arbitrary = elements [On,Off]

幻影类型

data EUR
data USD
data Money currency = Money Double -- currency 是约束
deriving Show

dollar :: Money USD
dollar = Money 1

twoEuros :: Money EUR
twoEuros = Money 2

-- 必须要函数注释,接受两个相同的 Money,返回的 Money 类型相同
addMoney :: Money currency -> Money currency -> Money currency
addMoney (Money a) (Money b) = Money (a+b)

应用:确保输入数据库的名称是安全的

data Safe
data Unsafe

data Input a = Input String

-- Public constructor function for Input, only allows constructing
-- Unsafe Inputs from Strings.
makeInput :: String -> Input Unsafe
makeInput xs = Input xs

-- Adds comment to the database.
addForumComment :: Input Safe -> IO Result
addForumComment = ...

-- We can combine inputs, but that won't change their safety
concatInputs :: Input a -> Input a -> Input a
concatInputs (Input xs) (Input ys) = Input (xs++ys)

-- Strip bad characters to turn an unsafe input safe
escapeInput :: Input Unsafe -> Input Safe
escapeInput (Input xs) = Input (filter (\c -> isAlpha c || isSpace c) xs)

同时性

平行 Parallelism:同时运行两个独立的计算,即平行是纯净

并发 Concurrent:多个交互的线程

因为函数是纯净的,平行实现很简单:

stack ghci --ghci-options "+RTS -N" -- 以平行方式打开 gchi
GHCi> fib 0 = 1; fib 1 = 1; fib n = fib (n-1) + fib (n-2)

GHCi> :set +s -- 设定计时器
GHCi> map fib [29,29,29,29,29]
[832040,832040,832040,832040,832040]
(7.54 secs, 2,440,860,632 bytes)

GHCi> import Control.Parallel.Strategies
Prelude Control.Parallel.Strategies> withStrategy (parList rseq) (map fib [29,29,29,29,29])
[832040,832040,832040,832040,832040]
(4.80 secs, 488,531,384 bytes)

可以发现,耗时明显减少了

一个简单的并发:

printA :: IO ()
printA = putStrLn (replicate 40 'A')

printB :: IO ()
printB = putStrLn (replicate 40 'B')

concurrency :: IO ()
concurrency = do
forkIO printA
forkIO printB
return ()

forkIO :: IO () -> IO ThreadId 接受 IO 操作并在背景中运行

两个线程之间交互的例子:

takeMVar :: MVar a -> IO a
putMVar :: MVar a -> a -> IO ()
newEmptyMVar :: IO (MVar a)
send :: [String] -> MVar String -> IO ()
send values var = mapM_ (putMVar var) values

receive :: MVar String -> IO ()
receive var = do val <- takeMVar var
print val
-- loop unless at last value
when (val/="end") (receive var)

concurrency2 :: IO ()
concurrency2 = do
var <- newEmptyMVar
forkIO (send ["hello","world","and","goodbye","end"] var)
forkIO (receive var)
return ()