刷题: November 2014

Wednesday, November 26, 2014

JMockit

    final static String HTTP_RESP = "Test HTTP response\nHello World!\n";

    @Test
    public void test(@Mocked final URLConnection conn, @Mocked final URL url) throws IOException {

        new NonStrictExpectations() {
            {
                url.openConnection(); returns(conn);
                conn.getInputStream(); returns(new ByteArrayInputStream(HTTP_RESP.getBytes()));
            }
        };

        Assert.assertEquals(HTTP_RESP, getResponseStr());
    }

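    // Reads the whole response body line by line. In the test above both URL
    // and URLConnection are mocked, so no real network call is made.
    public String getResponseStr() throws IOException {
        StringBuilder sb = new StringBuilder();
        URL url = new URL("http://www.google.com");
        URLConnection conn = url.openConnection();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append("\n");
            }
        }
        return sb.toString();
    }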

http://abhinandanmk.blogspot.com/2012/06/jmockit-tutoriallearn-it-today-with.html

Type Safety

Type safety is closely linked to memory safety, a restriction on the ability to copy arbitrary bit patterns from one memory location to another. For instance, in an implementation of a language that has some type t, such that some sequence of bits (of the appropriate length) does not represent a legitimate member of t, if that language allows data to be copied into a variable of type t, then it is not type-safe because such an operation might assign a non-t value to that variable. Conversely, if the language is type-unsafe to the extent of allowing an arbitrary integer to be used as a pointer, then it is not memory-safe.
Java
The Java language is designed to enforce type safety. Anything in Java happens inside an object and each object is an instance of a class.

To implement the type safety enforcement, each object, before usage, needs to be allocated. Java allows usage of primitive types but only inside properly allocated objects.

Sometimes a part of the type safety is implemented indirectly: e.g. the class BigDecimal represents a floating point number of arbitrary precision, but handles only numbers that can be expressed with a finite representation. The operation BigDecimal.divide() calculates a new object as the division of two numbers expressed as BigDecimal.

In this case, if the quotient has no finite representation, as when one computes e.g. 1/3=0.33333..., the divide() method throws an ArithmeticException unless a rounding mode is defined for the operation. Hence the library, rather than the language, guarantees that the object respects the contract implicit in the class definition.
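
The BigDecimal behavior described above can be tried with a small sketch. The class and method names (BigDecimalDivideDemo, exactDivisionFails, roundedDivide) are made up for illustration; only BigDecimal.divide and RoundingMode come from the JDK.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class BigDecimalDivideDemo {

    // Exact division: divide(BigDecimal) throws ArithmeticException when the
    // quotient has a non-terminating decimal expansion (for example 1/3).
    // Note that division by zero also raises ArithmeticException.
    static boolean exactDivisionFails(BigDecimal a, BigDecimal b) {
        try {
            a.divide(b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // With an explicit scale and rounding mode the result is always finite.
    static BigDecimal roundedDivide(BigDecimal a, BigDecimal b) {
        return a.divide(b, 5, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal one = BigDecimal.ONE;
        BigDecimal three = new BigDecimal("3");
        System.out.println(exactDivisionFails(one, three)); // true
        System.out.println(roundedDivide(one, three));      // 0.33333
    }
}
```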

C++
Some features of C++ that promote more type-safe code:

    The new operator returns a pointer of type based on operand, whereas malloc returns a void pointer.
    C++ code can use virtual functions and templates to achieve polymorphism without void pointers.
    Preprocessor constants (without type) can be rewritten as const variables (typed).
    Preprocessor macro functions (without type) can be rewritten as inline functions (typed). The flexibility of accepting and returning different types can still be obtained by function overloading.
    Safer casting operators, such as dynamic_cast that performs run-time type checking.

Friday, November 21, 2014

Java 7 new features

Diamond Operator
Map<String, List<Trade>> trades = new TreeMap<>();

Using strings in switch statements
The switch expression is compared against each case label as if by the String.equals() method, so the match is case-sensitive.
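
A minimal sketch of a switch on strings follows. The status values and the describe method are invented for the example. A null argument would throw a NullPointerException, so callers should check for null first.

```java
class StringSwitchDemo {

    // Returns a description for a (hypothetical) trade status.
    static String describe(String status) {
        switch (status) {
            case "NEW":
                return "new trade";
            case "EXECUTED":
                return "executed trade";
            default:
                // Reached for any other value, including "new" in lower case.
                return "unknown status";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("NEW")); // new trade
        System.out.println(describe("new")); // unknown status
    }
}
```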

Automatic resource management
Before Java 7, resources such as Connections, Files, Input/OutputStreams, etc. had to be closed manually by the developer by writing bog-standard code. The try-with-resources statement closes them automatically when the block exits.

try (BufferedReader br = new BufferedReader(new FileReader(path)); PrintWriter pw = ...) {
pw.println(br.readLine());
}

Behind the scenes, any resource that should be closed automatically must implement the java.lang.AutoCloseable interface.
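
To see what the statement does, here is a small sketch with a made-up Resource class that implements AutoCloseable and records its lifecycle in a list. It shows that resources are closed in the reverse order of their declaration.

```java
import java.util.ArrayList;
import java.util.List;

class AutoCloseDemo {

    // Hypothetical resource that logs when it is opened and closed.
    static class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Resource a = new Resource("a", log);
             Resource b = new Resource("b", log)) {
            log.add("body");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [open a, open b, body, close b, close a]
    }
}
```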

Numeric literals with underscores
int million = 1_000_000;

Improved exception handling
try {
} catch(ExceptionOne | ExceptionTwo | ExceptionThree e) {
}
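
A runnable sketch of a multi-catch block is shown below. The parse method is hypothetical. The alternatives in a multi-catch must not be subclasses of one another, and the exception parameter is implicitly final.

```java
class MultiCatchDemo {

    // Parses an integer and reports which exception type, if any, was caught.
    static String parse(String text) {
        try {
            return "parsed " + Integer.parseInt(text.trim());
        } catch (NumberFormatException | NullPointerException e) {
            // One handler for two unrelated exception types.
            return "rejected: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" 42 ")); // parsed 42
        System.out.println(parse("x"));    // rejected: NumberFormatException
        System.out.println(parse(null));   // rejected: NullPointerException
    }
}
```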

New file system API (NIO 2.0)
In the old java.io.File API, methods such as delete or rename behaved unexpectedly in many cases. Working with symbolic links was another issue.
The NIO 2.0 has come forward with many enhancements. It’s also introduced new classes to ease the life of a developer when working with multiple file systems.
Path path = Paths.get("c:\\Temp\\temp");
Files.deleteIfExists(path);
Files.copy(..)
Files.move(..)
Files.createSymbolicLink(..)
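
The calls listed above can be exercised in a temporary directory. The sketch below is only an illustration: the file names are made up, and symbolic links are left out because creating them may need extra privileges on some systems.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class Nio2Demo {

    // Writes, copies, moves and deletes a file; returns true if every step
    // left the file system in the expected state.
    static boolean roundTrip() {
        try {
            Path dir = Files.createTempDirectory("nio2-demo");
            Path source = dir.resolve("source.txt");
            Path copy = dir.resolve("copy.txt");
            Path moved = dir.resolve("moved.txt");

            Files.write(source, "hello".getBytes(StandardCharsets.UTF_8));
            Files.copy(source, copy, StandardCopyOption.REPLACE_EXISTING);
            Files.move(copy, moved, StandardCopyOption.REPLACE_EXISTING);

            boolean ok = Files.exists(source)
                    && !Files.exists(copy)
                    && new String(Files.readAllBytes(moved), StandardCharsets.UTF_8).equals("hello");

            // deleteIfExists returns false instead of throwing when nothing is there.
            boolean first = Files.deleteIfExists(moved);
            boolean second = Files.deleteIfExists(moved);
            Files.deleteIfExists(source);
            Files.deleteIfExists(dir);
            return ok && first && !second;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```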

File change notifications
The WatchService API lets you receive notification events upon changes to the subject (directory or file).

Fork and Join
Basically the Fork-Join breaks the task at hand into mini-tasks until the mini-task is simple enough that it can be solved without further breakups. It’s like a divide-and-conquer algorithm. One important concept to note in this framework is that ideally no worker thread is idle. They implement a work-stealing algorithm in that idle workers “steal” the work from those workers who are busy.
The core classes supporting the Fork-Join mechanism are ForkJoinPool and ForkJoinTask. The ForkJoinPool is basically a specialized implementation of ExecutorService.
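
As a rough illustration of the divide-and-conquer idea, the sketch below sums an int array with a RecursiveTask (a ForkJoinTask subclass that returns a result). The SumTask class and the threshold of 1,000 elements are arbitrary choices for the example, not recommended values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinSumDemo {

    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // arbitrary cut-off
        private final int[] data;
        private final int from;
        private final int to; // exclusive

        SumTask(int[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: solve directly.
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                     // schedule the left half asynchronously
            long rightSum = right.compute(); // work on the right half in this thread
            return left.join() + rightSum;
        }
    }

    static long sum(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // 50005000
    }
}
```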

Supporting dynamism
Java 7 changes the VM to better support languages other than Java, in particular dynamically typed ones. A new package, java.lang.invoke, consisting of classes such as MethodHandle, CallSite and others, has been created to extend the support of dynamic languages.
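
A method handle is a typed, directly executable reference to a method. The sketch below looks up String.length() and invokes it through a MethodHandle. The wrapper method lengthViaHandle is a hypothetical helper written for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleDemo {

    static int lengthViaHandle(String s) {
        try {
            // Look up the virtual method String.length() with type ()int.
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // invokeExact needs the call site to match (String)int exactly,
            // hence the explicit cast of the result.
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("hello")); // 5
    }
}
```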

http://radar.oreilly.com/2011/09/java7-features.html

Tuesday, November 11, 2014

Two phase commit

2PC - A feature of transaction processing systems that enables databases to be returned to the pre-transaction state if some error condition occurs. A single transaction can update many different databases. The two-phase commit strategy is designed to ensure that either all the databases are updated or none of them, so that the databases remain synchronized.

Database changes required by a transaction are initially stored temporarily by each database. The transaction monitor then issues a "pre-commit" command to each database which requires an acknowledgment. If the monitor receives the appropriate response from each database, the monitor issues the "commit" command, which causes all databases to simultaneously make the transaction changes permanent.

Commit request phase (or voting phase)

The coordinator sends a query to commit message to all cohorts and waits until it has received a reply from all cohorts.
The cohorts execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log.
Each cohort replies with an agreement message (cohort votes Yes to commit), if the cohort's actions succeeded, or an abort message (cohort votes No, not to commit), if the cohort experiences a failure that will make it impossible to commit.

Commit phase (or completion phase)

Success

If the coordinator received an agreement message from all cohorts during the commit-request phase:

The coordinator sends a commit message to all the cohorts.
Each cohort completes the operation, and releases all the locks and resources held during the transaction.
Each cohort sends an acknowledgment to the coordinator.
The coordinator completes the transaction when all acknowledgments have been received.

Failure

If any cohort votes No during the commit-request phase (or the coordinator's timeout expires):

The coordinator sends a rollback message to all the cohorts.
Each cohort undoes the transaction using the undo log, and releases the resources and locks held during the transaction.
Each cohort sends an acknowledgement to the coordinator.
The coordinator undoes the transaction when all acknowledgements have been received.
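
The two phases can be sketched in a few lines of Java. This is a toy, single-process model under strong assumptions: the Cohort interface and InMemoryCohort class are hypothetical, and there are no timeouts, no undo/redo logs, no acknowledgements and no network. It only shows the voting rule: commit if every cohort votes Yes, otherwise roll back everywhere.

```java
import java.util.Arrays;
import java.util.List;

class TwoPhaseCommitSketch {

    /** Hypothetical cohort interface; the names are illustrative only. */
    interface Cohort {
        boolean prepare(); // phase 1: do the work, vote Yes (true) or No (false)
        void commit();     // phase 2, success path
        void rollback();   // phase 2, failure path
    }

    /** Toy cohort that only tracks its state in memory. */
    static class InMemoryCohort implements Cohort {
        private final boolean canCommit;
        String state = "INIT";

        InMemoryCohort(boolean canCommit) {
            this.canCommit = canCommit;
        }

        @Override public boolean prepare() {
            state = canCommit ? "PREPARED" : "FAILED";
            return canCommit;
        }
        @Override public void commit() { state = "COMMITTED"; }
        @Override public void rollback() { state = "ROLLED_BACK"; }
    }

    /** Coordinator: returns true if the transaction was committed everywhere. */
    static boolean run(List<? extends Cohort> cohorts) {
        boolean allVotedYes = true;
        for (Cohort cohort : cohorts) {
            // Collect every vote; a single No aborts the whole transaction.
            if (!cohort.prepare()) {
                allVotedYes = false;
            }
        }
        for (Cohort cohort : cohorts) {
            if (allVotedYes) {
                cohort.commit();
            } else {
                cohort.rollback();
            }
        }
        return allVotedYes;
    }

    public static void main(String[] args) {
        List<InMemoryCohort> ok = Arrays.asList(new InMemoryCohort(true), new InMemoryCohort(true));
        System.out.println(run(ok) + " " + ok.get(0).state);           // true COMMITTED

        List<InMemoryCohort> failing = Arrays.asList(new InMemoryCohort(true), new InMemoryCohort(false));
        System.out.println(run(failing) + " " + failing.get(0).state); // false ROLLED_BACK
    }
}
```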

http://en.wikipedia.org/wiki/Two-phase_commit_protocol

Y2038 Bug

The latest time that can be represented in Unix time with a signed 32-bit integer is 03:14:07 UTC on Tuesday, 19 January 2038 (2147483647 seconds after 1 January 1970).
As of 2012, most embedded systems use 8-bit or 16-bit microprocessors, even as desktop systems are transitioning to 64-bit systems.
MySQL database's inbuilt functions like UNIX_TIMESTAMP() will return 0 after 03:14:07 UTC on 19 January 2038.
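
The limit is easy to reproduce in Java, whose int is a signed 32-bit integer. The sketch below (the class and method names are made up) converts a 32-bit second count to a UTC timestamp with java.time.Instant and shows what happens one second after the limit.

```java
import java.time.Instant;

class Y2038Demo {

    // Interprets a signed 32-bit value as seconds since 1970-01-01T00:00:00Z.
    static String asUtc(int secondsSinceEpoch) {
        return Instant.ofEpochSecond(secondsSinceEpoch).toString();
    }

    public static void main(String[] args) {
        int last = Integer.MAX_VALUE;      // 2147483647
        System.out.println(asUtc(last));   // 2038-01-19T03:14:07Z

        int wrapped = last + 1;            // overflows to Integer.MIN_VALUE
        System.out.println(asUtc(wrapped)); // 1901-12-13T20:45:52Z
    }
}
```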

Many data structures in use today have 32-bit time representations embedded into their structure. A full list of these data structures is virtually impossible to derive but there are well-known data structures that have the Unix time problem:

    file systems (many filesystems use only 32 bits to represent times in inodes)
    binary file formats (that use 32-bit time fields)
    databases (that have 32-bit time fields)
    database query languages, such as SQL, that have UNIX_TIMESTAMP()-like commands
    COBOL systems of 1970s - 1990s vintage that have not been replaced by 2038-compliant systems
    embedded factory, refinery control and monitoring subsystems
    assorted medical devices
    assorted military devices

Each one of these places where data structures using 32-bit time are in place has its own risks related to failure of the product to perform as designed.

There is no universal solution for the Year 2038 problem. Any change to the definition of the time_t data type would result in code compatibility problems in any application in which date and time representations are dependent on the nature of the signed 32-bit time_t integer. For example, changing time_t to an unsigned 32-bit integer, which would extend the range to the year 2106, would adversely affect programs that store, retrieve, or manipulate dates prior to 1970, as such dates are represented by negative numbers. Increasing the size of the time_t type to 64-bit in an existing system would cause incompatible changes to the layout of structures and the binary interface of functions.
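
The trade-off of the unsigned variant can be shown with the same 32 bits read two ways. This is only an illustration in Java (which has no unsigned int type, so the value is widened to long with a mask). The class and method names are hypothetical.

```java
import java.time.Instant;

class UnsignedTimeDemo {

    // Reads the 32 bits as a signed count of seconds since the epoch.
    static String signed32(int raw) {
        return Instant.ofEpochSecond(raw).toString();
    }

    // Reads the same 32 bits as an unsigned count of seconds since the epoch.
    static String unsigned32(int raw) {
        return Instant.ofEpochSecond(raw & 0xFFFFFFFFL).toString();
    }

    public static void main(String[] args) {
        int allOnes = 0xFFFFFFFF;
        System.out.println(signed32(allOnes));   // 1969-12-31T23:59:59Z
        System.out.println(unsigned32(allOnes)); // 2106-02-07T06:28:15Z
    }
}
```

With the unsigned reading, the bit pattern that used to mean one second before the epoch becomes the last representable moment in 2106, which is why dates prior to 1970 can no longer be stored.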

Friday, November 7, 2014

LRU cache

import java.util.LinkedHashMap;
import java.util.Collection;
import java.util.Map;
import java.util.ArrayList;

/**
* The generic implementation of an LRU (least recently used) cache
* @author Yaniv Erel
*
* @param <K> Key
* @param <V> Value
*/
public class LRUCache<K, V> {

    private static final float hashTableLoadFactor = 0.75f;

    private LinkedHashMap<K, V> map;
    private int cacheSize;

    /**
    * Creates a new LRU (least recently used) cache.
    *
    * @param cacheSize
    *            the maximum number of entries that will be kept in this cache.
    */
    public LRUCache(int cacheSize) {
        this.cacheSize = cacheSize;
        int hashTableCapacity = (int) Math.ceil(cacheSize / hashTableLoadFactor) + 1;
        map = new LinkedHashMap<K, V>(hashTableCapacity, hashTableLoadFactor,
                true) { // accessOrder==true
            // (an anonymous inner class)
            private static final long serialVersionUID = 1;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LRUCache.this.cacheSize;
            }
        };
    }

    /**
    * Retrieves an entry from the cache.<br>
    * The retrieved entry becomes the MRU (most recently used) entry.
    *
    * @param key
    *            the key whose associated value is to be returned.
    * @return the value associated to this key, or null if no value with this
    *         key exists in the cache.
    */
    public synchronized V get(K key) {
        return map.get(key);
    }

    /**
    * Adds an entry to this cache. The new entry becomes the MRU (most recently
    * used) entry. If an entry with the specified key already exists in the
    * cache, it is replaced by the new entry. If the cache is full, the LRU
    * (least recently used) entry is removed from the cache.
    *
    * @param key
    *            the key with which the specified value is to be associated.
    * @param value
    *            a value to be associated with the specified key.
    */
    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    /**
    * Clears the cache.
    */
    public synchronized void clear() {
        map.clear();
    }

    /**
    * Returns the number of used entries in the cache.
    *
    * @return the number of entries currently in the cache.
    */
    public synchronized int usedEntries() {
        return map.size();
    }

    /**
    * Returns a <code>Collection</code> that contains a copy of all cache
    * entries.
    *
    * @return a <code>Collection</code> with a copy of the cache content.
    */
    public synchronized Collection<Map.Entry<K, V>> getAll() {
        return new ArrayList<Map.Entry<K, V>>(map.entrySet());
    }
}

Doubly linked list + Hashtable solution
http://www.programcreek.com/2013/03/leetcode-lru-cache-java/
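
For comparison with the LinkedHashMap version, here is a minimal sketch of the doubly linked list + hash table idea. It is not the code from the linked page: the class name LinkedLruCache and its layout are my own. The map gives O(1) lookup, the list keeps entries in recency order, and two sentinel nodes avoid null checks. Unlike the class above, this sketch is not synchronized.

```java
import java.util.HashMap;
import java.util.Map;

class LinkedLruCache<K, V> {

    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> prev;
        Node<K, V> next;

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int capacity;
    private final Map<K, Node<K, V>> index = new HashMap<>();
    private final Node<K, V> head = new Node<>(null, null); // sentinel, MRU side
    private final Node<K, V> tail = new Node<>(null, null); // sentinel, LRU side

    LinkedLruCache(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        head.next = tail;
        tail.prev = head;
    }

    V get(K key) {
        Node<K, V> node = index.get(key);
        if (node == null) {
            return null;
        }
        unlink(node);
        addFirst(node); // the entry becomes the most recently used
        return node.value;
    }

    void put(K key, V value) {
        Node<K, V> node = index.get(key);
        if (node != null) {
            node.value = value;
            unlink(node);
            addFirst(node);
            return;
        }
        if (index.size() == capacity) {
            Node<K, V> eldest = tail.prev; // least recently used entry
            unlink(eldest);
            index.remove(eldest.key);
        }
        node = new Node<>(key, value);
        index.put(key, node);
        addFirst(node);
    }

    int size() {
        return index.size();
    }

    private void unlink(Node<K, V> node) {
        node.prev.next = node.next;
        node.next.prev = node.prev;
    }

    private void addFirst(Node<K, V> node) {
        node.next = head.next;
        node.prev = head;
        head.next.prev = node;
        head.next = node;
    }

    public static void main(String[] args) {
        LinkedLruCache<String, Integer> cache = new LinkedLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" is now the most recently used
        cache.put("c", 3); // evicts "b"
        System.out.println(cache.get("b")); // null
        System.out.println(cache.get("a")); // 1
    }
}
```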

Wednesday, November 5, 2014

JVM parameters

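All established HotSpot memory management and garbage collection algorithms rely on the same basic partitioning of the heap: the young generation holds newly allocated and relatively young objects, and the old generation holds long-lived objects. In addition, the permanent generation holds objects that live for the entire lifetime of the JVM, such as the class definitions of loaded classes and the internal cache of String objects.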

-XX:InitialHeapSize/-Xms & -XX:MaxHeapSize/-Xmx
$ java -XX:InitialHeapSize=128m -XX:MaxHeapSize=2g MyApp
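
One way to check the effect of these flags from inside the application is the Runtime API. In the sketch below (hypothetical class name), maxMemory() roughly corresponds to the -Xmx limit and totalMemory() to the heap currently reserved by the JVM. The exact numbers depend on the JVM and collector in use, so treat them as approximate.

```java
class HeapInfoDemo {

    // Upper bound the JVM will attempt to use for the heap, in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap memory currently reserved by the JVM, in bytes.
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap (MB):       " + maxHeapBytes() / mb);
        System.out.println("committed heap (MB): " + committedHeapBytes() / mb);
    }
}
```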

-XX:+HeapDumpOnOutOfMemoryError & -XX:HeapDumpPath
$ java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof -XX:OnOutOfMemoryError="sh ~/cleanup.sh" MyApp

-XX:PermSize and -XX:MaxPermSize
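The permanent generation is a separate heap area that holds the object representations of all classes loaded by the JVM. Note that the permanent generation size set here is not included in the heap size set with -XX:MaxHeapSize.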

-XX:InitialCodeCacheSize and -XX:ReservedCodeCacheSize
An interesting but often neglected memory area of the JVM is the code cache, which stores the native code generated for compiled methods.

-XX:+UseCodeCacheFlushing
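If the code cache keeps growing, for example because of a memory leak caused by hot deployments, increasing its size only delays the overflow. To avoid the overflow we can try an interesting new option: with this flag set, the JVM discards some compiled code when the code cache fills up. I still recommend finding and fixing the root cause of the code cache problem as soon as possible, for example by tracking down the memory leak.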

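The only reason the young generation exists is to optimize garbage collection (GC) performance. More specifically, splitting the heap into a young and an old generation has two benefits: it simplifies the allocation of new objects (memory is allocated only in the young generation), and it allows objects that are no longer needed (dead objects) to be cleaned up more efficiently (the two generations use different GC algorithms).
Many objects are short-lived. Studies have also found that young objects are rarely referenced by long-lived objects. Combining these two observations, it makes sense to give the GC quick access to young objects by keeping them in a dedicated heap area, called the young generation. There the GC can quickly identify and reclaim dead objects without scanning the long-lived objects spread across the whole heap.
Sun/Oracle's HotSpot JVM further divides the young generation into three areas: one relatively large area called Eden, and two smaller areas called the From survivor space and the To survivor space. As a rule, new objects are first allocated in Eden (an object that is too large is allocated directly in the old generation). During a GC, live objects in Eden are moved to a survivor space, and once an object reaches a certain age (defined as the number of GCs it has survived) it is moved to the old generation.
Young generation GC is based on the assumption that most young objects are reclaimed in a collection, so it uses a copying algorithm. Before a GC the To survivor space is empty and objects live in Eden and in the From survivor space. When the GC runs, live objects in Eden are copied to the To survivor space. For live objects in the From survivor space the age is taken into account: if the age is below the tenuring threshold, the object is copied to the To survivor space; if it has reached the threshold, the object is copied to the old generation. When copying is complete, Eden and the From survivor space contain only dead objects and can be treated as empty. If the To survivor space fills up during copying, the remaining objects are copied to the old generation. Finally the From and To survivor spaces swap roles, so in the next GC the former To space serves as the From space.
To summarize, objects are usually born in Eden, move between the two survivor spaces during young generation GCs, and are moved to the old generation if they live long enough. When objects die in the old generation, a heavier GC algorithm is needed to reclaim them (the copying algorithm is not suitable for the old generation because there is no spare space to copy into).
If the young generation is too small, young objects are promoted to the old generation too quickly, where they are harder to reclaim. If it is too large, too much copying takes place. We need to find a suitable size, and unfortunately that can only be found through repeated measurement and tuning.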

-XX:NewSize & -XX:MaxNewSize
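When setting -XX:MaxNewSize, keep in mind that the young generation is only part of the heap: the larger the young generation, the smaller the old generation. Making the young generation larger than the old generation is generally not allowed, because in the worst case a GC has to promote all objects to the old generation (translator's note: this would cause an OutOfMemoryError). -XX:MaxNewSize can be set to at most -Xmx/2.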

-XX:NewRatio
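This flag sets the ratio of the old generation to the young generation. The advantage of this approach is that the young generation grows and shrinks with the heap. For example, -XX:NewRatio=3 sets old/young to 3:1, so the old generation takes 3/4 of the heap and the young generation 1/4.
If both absolute and relative values are set for the young generation, the absolute values take precedence.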
$ java -XX:NewSize=32m -XX:MaxNewSize=512m -XX:NewRatio=3 MyApp
With these settings the JVM tries to size the young generation at one quarter of the heap, but never smaller than 32 MB or larger than 512 MB.
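
The interaction of the three flags can be written as simple arithmetic. The sketch below is a simplified model, not the JVM's actual sizing code (which also applies alignment and other adjustments). The class and method names are made up.

```java
class NewRatioDemo {

    // Young generation size in MB for a given heap size:
    // heap / (NewRatio + 1), clamped to [NewSize, MaxNewSize].
    static long youngGenMb(long heapMb, int newRatio, long newSizeMb, long maxNewSizeMb) {
        long byRatio = heapMb / (newRatio + 1);
        return Math.max(newSizeMb, Math.min(maxNewSizeMb, byRatio));
    }

    public static void main(String[] args) {
        // -XX:NewSize=32m -XX:MaxNewSize=512m -XX:NewRatio=3
        System.out.println(youngGenMb(1024, 3, 32, 512)); // 256 (one quarter of the heap)
        System.out.println(youngGenMb(64, 3, 32, 512));   // 32  (lower bound applies)
        System.out.println(youngGenMb(4096, 3, 32, 512)); // 512 (upper bound applies)
    }
}
```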

-XX:SurvivorRatio
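Specifies the ratio of the Eden size to the size of one survivor space. For example, -XX:SurvivorRatio=10 makes Eden 10 times the size of the To survivor space (and also 10 times the size of the From survivor space).
Suppose the survivor spaces are too small relative to Eden. New objects then always have plenty of room in Eden, and if all of them are reclaimed in a GC, Eden is emptied and everything is fine. If some objects survive the GC, however, the survivor spaces have little room to hold them, so most survivors are moved to the old generation after a single GC, which is not what we want. Now consider the opposite case, where the survivor spaces are too large relative to Eden. There is plenty of room for objects that survive a GC, but the smaller Eden fills up faster, which increases the number of young generation GCs. That is not acceptable either.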
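
As a back-of-the-envelope check, the young generation consists of Eden plus two survivor spaces, so with a ratio r it splits into r + 2 equal parts. The sketch below is a simplified model under that assumption (the real JVM rounds and aligns sizes). The names are hypothetical.

```java
class SurvivorRatioDemo {

    // Size of one survivor space: young generation / (SurvivorRatio + 2).
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    // Eden takes whatever the two survivor spaces leave over.
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        // -XX:SurvivorRatio=10 with a 120 MB young generation
        System.out.println(survivorMb(120, 10)); // 10
        System.out.println(edenMb(120, 10));     // 100
    }
}
```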

-XX:+PrintTenuringDistribution
Tells the JVM to print the age distribution of the objects in the survivor spaces at every young generation GC.

-XX:InitialTenuringThreshold, -XX:MaxTenuringThreshold and -XX:TargetSurvivorRatio
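-XX:MaxTenuringThreshold=10 -XX:TargetSurvivorRatio=90 sets the upper limit of the tenuring threshold to 10 and the target utilization of the survivor space to 90%.
There are many ways to tune the behavior of the young generation, and there is no general rule. We should be aware of the following two cases:
1 If the age distribution shows many objects getting older and older before finally reaching the tenuring threshold, -XX:MaxTenuringThreshold is set too high.
2 If -XX:MaxTenuringThreshold is greater than 1 but most objects never reach an age greater than 1, look at the target utilization of the survivor space. If the target utilization is never reached, the objects are being reclaimed by the GC, which is exactly what we want. If the target utilization is reached frequently, at least some objects older than 1 are being moved to the old generation. In that case, try adjusting the survivor space size or the target utilization.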

-XX:+NeverTenure and -XX:+AlwaysTenure
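Two rarely used flags that correspond to the two extremes of young generation GC. With -XX:+NeverTenure, objects are never promoted to the old generation. With -XX:+AlwaysTenure, the survivor spaces are not used and all live objects are promoted to the old generation at their first GC.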

http://ifeve.com/useful-jvm-flags-part-4-heap-tuning/
http://ifeve.com/useful-jvm-flags-part-5-young-generation-garbage-collection/